Transaction Processing on Top of Hadoop

Spring 2012
Aviram Rehana
Lior Zeno
Supervisor: Edward Bortnikov
• The problem and motivation
• Background on databases’ access
• What is done so far
• Project goals
• Effort assessment and roadmap
The problem and motivation
• Combining transactional and non-transactional operations on
top of a distributed NoSQL database.
• A little bit of motivation:
• A new approach in the distributed databases field.
• Better performance in comparison to related solutions.
• Ease of use from a programmer’s perspective.
Background on databases
• Historically, data processing was done with SQL
databases, which supported transactions.
• As computing and the Internet evolved, data volumes grew,
and databases had to scale to much larger sizes.
• Because of these scalability demands, Web-scale NoSQL stores
displaced SQL databases for many large workloads in the 2000s.
These stores dropped transaction support, though.
Background on databases
• Lately, some NoSQL databases like HBase started supporting
• Application designers noticed the potential of transactions in
distributed systems, and several solutions have emerged over the years.
• Our goal is to extend the mechanism to support both types of
access: transactional and non-transactional.
What is done so far
• Existing open-source stack:
• HDFS – a distributed file system designed primarily for write-once,
read-many access (in use by Facebook and others).
• HBase – a distributed read/write database on top of HDFS, without
support for multi-row transactions.
• OMID – a transaction processing service on top of HBase, designed by
the Yahoo! Research team.
• OMID does not guarantee consistency between transactional and
non-transactional access to the same data at the same time.
• We wish to extend OMID to support transactional as well as
traditional non-transactional clients.
The OMID model
An example of mixed access
• What happens if there are no rules:
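One possible anomaly can be sketched as follows. This is a toy single-process model in Python, not OMID or HBase code; the `Store` and `Txn` classes and the `balance` key are hypothetical names chosen for illustration. The transaction buffers its writes and applies them at commit, and since no rule covers non-transactional writers, a plain write that lands in between is silently overwritten.

```python
class Store:
    """Toy in-memory key-value store (hypothetical stand-in for a table)."""
    def __init__(self):
        self.data = {}

    def put(self, key, value):
        # Non-transactional write: applied immediately.
        self.data[key] = value

    def get(self, key):
        return self.data.get(key)


class Txn:
    """Toy transaction: reads go to the store, writes are buffered until commit."""
    def __init__(self, store):
        self.store = store
        self.writes = {}

    def read(self, key):
        return self.writes.get(key, self.store.get(key))

    def write(self, key, value):
        self.writes[key] = value

    def commit(self):
        # No rule covers non-transactional writers, so nothing is checked here.
        self.store.data.update(self.writes)


store = Store()
store.put("balance", 100)

t = Txn(store)
seen = t.read("balance")        # transaction reads 100
store.put("balance", 150)       # plain client deposits 50 in between
t.write("balance", seen - 30)   # transaction withdraws 30 from the stale value
t.commit()

print(store.get("balance"))     # 70: the plain client's deposit is lost
```

Any serial order of the two clients would leave a balance of 120, so the result above corresponds to no consistent execution.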
What is a transaction?
• A transaction is a collection of operations in an application
that fulfill the ACID properties:
• Atomicity - Either all or none of the transaction's operations are
performed.
• Consistency - T(x) is a correct transformation of the state.
• Isolation - Even though transactions execute concurrently, it appears to
each transaction T that others executed either before T or after T.
• Durability - Results of a committed transaction are permanent, in
spite of possible failures.
Levels of Isolation
• Achieving the highest level of isolation (serializability) is
expensive: it typically requires locking, which reduces concurrency
and scalability.
• How do we overcome this issue?
• Snapshot Isolation - A guarantee that all reads made in a
transaction will see a consistent snapshot of the database (in
practice it reads the last committed values that existed at the
time it started), and the transaction itself will successfully
commit only if no updates it has made conflict with any
concurrent updates made since that snapshot.
• SI gives weaker guarantees than serializability, but it is
much more scalable!
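The snapshot isolation rule above can be illustrated with a minimal sketch. It assumes a single-process, multi-version in-memory store with a logical clock; `SIStore` and `SITxn` are hypothetical names, and this is not how OMID itself is implemented. Reads return the newest version committed at or before the transaction's start, and commit fails if another transaction committed a write to the same key after that point.

```python
class SIStore:
    """Toy multi-version store (hypothetical illustration, not OMID's design)."""
    def __init__(self):
        self.versions = {}   # key -> list of (commit_ts, value), oldest first
        self.clock = 0       # last commit timestamp handed out

    def begin(self):
        return SITxn(self, self.clock)


class SITxn:
    def __init__(self, store, start_ts):
        self.store = store
        self.start_ts = start_ts   # snapshot: everything committed up to here
        self.writes = {}           # buffered writes, visible only to this txn

    def read(self, key):
        if key in self.writes:
            return self.writes[key]
        # Newest version committed at or before our snapshot.
        for ts, value in reversed(self.store.versions.get(key, [])):
            if ts <= self.start_ts:
                return value
        return None

    def write(self, key, value):
        self.writes[key] = value

    def commit(self):
        # Write-write conflict: someone committed this key after our snapshot.
        for key in self.writes:
            history = self.store.versions.get(key, [])
            if history and history[-1][0] > self.start_ts:
                return False
        self.store.clock += 1
        for key, value in self.writes.items():
            self.store.versions.setdefault(key, []).append((self.store.clock, value))
        return True


store = SIStore()
t0 = store.begin()
t0.write("x", 1)
t0_ok = t0.commit()            # installs x = 1

t1 = store.begin()             # t1 and t2 share the same snapshot
t2 = store.begin()
t2.write("x", 2)
t2_ok = t2.commit()            # first committer wins

t1_read = t1.read("x")         # still sees the snapshot value, 1
t1.write("x", 3)
t1_ok = t1.commit()            # conflicts with t2's committed write
print(t0_ok, t2_ok, t1_read, t1_ok)   # True True 1 False
```

Note that only write sets are compared at commit, which is why snapshot isolation can run without read locks.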
Project goals
• Extend OMID to support both transactional and
non-transactional accesses against the distributed database.
• Implement the base algorithms used in the project:
• Lazy Snapshot Algorithm (LSA)
• RingSTM
• Achieve better performance than OMID and
other solutions for regular transactional clients.
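For readers unfamiliar with RingSTM, its core idea is a global, ordered ring of commit records, each summarizing what a committed transaction wrote; a transaction validates by checking its reads against the records added since it started. The sketch below is a simplified, single-threaded illustration of that idea only. As published, RingSTM uses Bloom-filter signatures and a fixed-size ring updated atomically; here plain Python sets and an unbounded list are substituted, and the `Ring` and `RingTxn` names are hypothetical. How the project adapts this to OMID is not shown.

```python
class Ring:
    """Ordered log of committed write sets (hypothetical, simplified)."""
    def __init__(self):
        self.entries = []   # entries[i] = set of keys written by commit i


class RingTxn:
    def __init__(self, ring):
        self.ring = ring
        self.start = len(ring.entries)   # ring position when we began
        self.read_set = set()
        self.write_set = set()

    def on_read(self, key):
        self.read_set.add(key)

    def on_write(self, key):
        self.write_set.add(key)

    def validate(self):
        # Conflict if any commit since our start wrote something we read.
        newer = self.ring.entries[self.start:]
        return all(not (self.read_set & written) for written in newer)

    def commit(self):
        if not self.validate():
            return False
        if self.write_set:               # read-only transactions add no entry
            self.ring.entries.append(set(self.write_set))
        return True


ring = Ring()
a = RingTxn(ring)
b = RingTxn(ring)
a.on_read("x"); a.on_write("y")
b.on_read("y"); b.on_write("z")

a_ok = a.commit()   # nothing committed since a started
b_ok = b.commit()   # a wrote "y", which b read, so b must abort
print(a_ok, b_ok)   # True False
```

Sets give exact answers; real signatures trade some false conflicts for constant-size records.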
Roadmap of the project
• Phase 1:
• Design RingSTM + LSA components – 2 weeks
• Integrate the design into OMID – 5 weeks
• Simulations and benchmarking – 2 weeks
• Phase 2:
• Design the module for write conflicts against non-transactional
clients – 2 weeks
• Integrate the above design into OMID – 3 weeks
• Simulations and benchmarking of whole project – 2 weeks
• Final conclusions – 1 week
• Future work:
• Implementing serializability and dynamic separation.
Special thanks: Eshcar Hillel.
