Data Management in the Cloud
“Consistency Rationing in the Cloud: Pay Only When It Matters”
Authors: Tim Kraska, Martin Hentschel, Gustavo Alonso, and Donald Kossmann
VLDB ’09, August
Latasha A. Gibbs
CSCE 824 – Secure Database Systems
Spring 2013
University of South Carolina

AGENDA
• INTRODUCTION
• USE CASES
• CONSISTENCY RATIONING
• POLICIES
• IMPLEMENTATION
• SUMMARY
• FUTURE WORK & QUESTIONS

INTRODUCTION
• Cloud storage promises high scalability and low cost
• Existing solutions differ in the level of consistency they provide
• Database-like facilities are implemented on top of cloud storage
• High consistency means a high cost per transaction
• Lower consistency is cheaper
• Not all data needs to be treated with the same level of consistency

AT WHAT PRICE?
• The cost of a consistency level is measured by the number of service calls needed to enforce it

CONSISTENCY LEVELS A & B
• Category A – Serializability
  • Expensive in terms of monetary cost and performance
  • Serializability is provided via 2PL
  • Data should be put in Category A when up-to-date views are a must
• Category B – Adaptive
  • The level of consistency depends on the situation
  • Switches between session consistency and serializability at runtime
  • Policies are designed to make the switch automatic and dynamic

CONSISTENCY LEVEL C
• Category C – Session Consistency
  • Session consistency has been identified as the minimum consistency level that does not result in excessive complexity for the developer
  • After some time the system converges and becomes eventually consistent
  • Session consistency is cheap
  • Permits extensive caching
  • When inconsistencies cannot occur, cloud databases should place data in Category C

USE CASES CONTINUED…
Collaborative Editing
• Strategy based on update frequency
• The consistency protocol is selected based on the likelihood of conflicts
• Parts of the document that are updated frequently would be handled by strong
consistency guarantees (for instance, Category A)

CONSISTENCY RATIONING?
Since strong consistency is expensive…
1. Use the analysis of categories A, B, and C to categorize the data
2. Apply a different consistency strategy to each category

POLICIES
Five different policies are created to adapt the consistency guarantees for data items in Category B
• General Policy
• Time Policy
• Fixed Threshold Policy
• Demarcation Policy
• Dynamic Policy

GENERAL POLICY
• Works on the basis of conflict probability
• Looks at the probability of a conflict on a given data item and switches to serializability if that probability is high enough
• The probability of a conflicting update on a record is given by a formula on the original slide (not reproduced in this text)

IMPLEMENTATION
BASIC PROTOCOL

HOW IS DATA ABOUT DATA USED?
• Each collection contains metadata about its type
• Given the collection a record belongs to, the system checks which consistency level should be enforced
• For example, if a record is classified as A data, serializability with strong guarantees is enforced
• For B data, the metadata contains the name of the policy and additional parameters

EXPERIMENTS
• The database is hosted on S3; clients connect to it via application servers that run on Amazon’s EC2
• Based on the TPC-W benchmark
• The requirement for strong consistency guarantees is relaxed
• All experiments were scheduled to run for 300 seconds and were repeated 10 times
• Consistency categories were assigned to the data types of the TPC-W benchmark for A data, C data, and mixed data

OPTIMIZATION PENALTIES

SUMMARY
• It is possible to assign a very precise monetary cost to consistency protocols
• Optimization is based on allowing the database to exhibit inconsistencies if doing so reduces the cost of a transaction and does not cause higher penalty costs
• Consistency Rationing lowers overall cost and improves performance in cloud-based database systems
• A step towards probabilistic consistency guarantees… “One small step for man, one giant leap for mankind.” –Neil Armstrong

FUTURE OUTLOOK
• Faster statistical methods
• Automatic optimization
• New policies
• Implementation on other platforms
• Emergency rationing

REFERENCES
• Abadi, Daniel J. “Data Management in the Cloud: Limitations and Opportunities”. In IEEE Data Engineering Bulletin, 32 (1). Yale University. 2009.
• Coy, Steven P. “Security Implications of the Choice of Distributed Database Management System Model: Relational vs. Object-Oriented”. University of Maryland. 2008.
• Niccolai, James. “Four Companies Rethink Databases for the Cloud”. Computer World. 23 June 2011. Web. 16 February 2013.
• Abbadi, Amr El, Agrawal, Divyakant, and Das, Sudipto. “Big Data and Cloud Computing: Current State and Future Opportunities”. In the Proceedings of EDBT 2011, ACM. March 22-24, 2011.
• Valduriez, Patrick. “Principles of Distributed Data Management in 2020?”. In the Proceedings of the 22nd International Conference on Database and Expert Systems Applications (DEXA ’11). Volume 1.
• Lu, Yanbin and Tsudik, Gene. “Privacy-Preserving Cloud Database Querying”. In the Journal of Internet Services and Information Security (JISIS). Vol. 1, No. 4. November 2011.
• http://www.hadoop.apahe.org/
• http://www.relationalcloud.com/
• Curino, C., Jones, E., Popa, R., Malviya, N., Wu, E., Madden, S., Balakrishnan, H., and Zeldovich, N. “Relational Cloud: A Database-as-a-Service for the Cloud”. In the Proceedings of the 5th Biennial Conference on Innovative Data Systems Research. January 2011.
• Özsu, M. Tamer and Valduriez, Patrick. Principles of Distributed Database Systems. New York: Pearson Education, Inc., 2011. Print.
• Alonso, Gustavo, Hentschel, Martin, Kossmann, Donald, and Kraska, Tim. “Consistency Rationing in the Cloud: Pay Only When It Matters”. In the Proceedings of the International Conference on Very Large Data Bases (VLDB). 2009.

QUESTIONS?
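
APPENDIX: SKETCH OF METADATA-DRIVEN CONSISTENCY SELECTION
The slide on how metadata is used describes a lookup from a record's collection to a consistency level, with a named policy and parameters for B data. The following is a minimal sketch of that idea, not the authors' implementation. The class `CollectionMeta`, the collection names, the `threshold` parameter, and the Poisson-based estimate in `conflict_probability` are all assumptions made for illustration; in particular, the estimate is a stand-in and is not the formula from the General Policy slide.

```python
import math
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CollectionMeta:
    """Hypothetical per-collection metadata (names are illustrative only)."""
    category: str                      # "A", "B", or "C"
    policy: Optional[str] = None       # policy name, used for B data only
    params: dict = field(default_factory=dict)


def conflict_probability(mean_updates: float) -> float:
    """Illustrative stand-in: chance of two or more updates in one window,
    assuming updates arrive as a Poisson process with the given mean."""
    return 1.0 - math.exp(-mean_updates) * (1.0 + mean_updates)


def consistency_level(meta: CollectionMeta, mean_updates: float = 0.0) -> str:
    """Pick the protocol to enforce for a record in this collection."""
    if meta.category == "A":
        return "serializability"
    if meta.category == "C":
        return "session"
    if meta.category == "B":
        if meta.policy == "general":
            # Switch to serializability once a conflict is likely enough.
            threshold = meta.params["threshold"]
            if conflict_probability(mean_updates) >= threshold:
                return "serializability"
            return "session"
        raise ValueError(f"unknown policy: {meta.policy!r}")
    raise ValueError(f"unknown category: {meta.category!r}")


# Hypothetical catalog mapping collection names to their metadata.
CATALOG = {
    "accounts": CollectionMeta("A"),
    "stock": CollectionMeta("B", policy="general", params={"threshold": 0.5}),
    "profiles": CollectionMeta("C"),
}
```

In this sketch, a call such as `consistency_level(CATALOG["stock"], mean_updates=2.0)` selects serializability because the estimated conflict probability exceeds the assumed threshold of 0.5, while a rarely updated item in the same collection stays on session consistency.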