Causal Consistency Without
Dependency Check Messages
Willy Zwaenepoel
INTRODUCTION
Geo-replicated data stores
• Geo-replicated data centers
• Full replication between data centers
• Data in each data center partitioned
[Diagram: five geo-replicated data centers, each holding a full replica with data partitioned inside the data center]
The Consistency Dilemma
• Strong Consistency
  – synchronous replication
  – all replicas share the same consistent view
  – sacrifices availability
• Causal Consistency
  – asynchronous replication
  – all replicas eventually converge
  – sacrifices consistency, but replication respects causality
• Eventual Consistency
  – asynchronous replication
  – all replicas eventually converge
  – sacrifices consistency
The cost of causal consistency
[Figure: throughput (Kops/sec) vs. number of partitions (1–6); eventual consistency sustains markedly higher throughput than causal consistency]
Can we close the throughput gap?
• The answer is: yes, but there is a price
STATE OF THE ART: WHY THE GAP?
How is causality enforced?
• Each update has associated dependencies
• What are dependencies?
– metadata to establish causality relations between operations
– used only for data replication
• Internal dependencies
  – the previous updates of the same client session
• External dependencies
  – updates from other client sessions that this session has read
Internal dependencies
[Diagram: two users (Alice and Bob) performing operations at different datacenters (US and Europe) on the same partition]
External dependencies
[Diagram: three users (Alice, Bob, and Charlie) performing operations at the US and Europe datacenters on the same partition]
How dependencies are tracked & checked
• In current implementations
– COPS [SOSP ’11], ChainReaction [EuroSys ’13], Eiger [NSDI ’13], Orbe [SoCC ’13]
• DepCheck(A) – “Do you have A installed yet?”
[Diagram: a client and partitions 0…N in the US and Europe datacenters; after the client’s Read(B), its replicated update triggers DepCheck(A) and DepCheck(B) messages before being installed]
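To make the message pattern concrete, here is a minimal Python sketch of a receiving partition blocking a replicated update behind its dependency checks. All names are hypothetical; this is an illustration of the idea, not the actual API of COPS, ChainReaction, Eiger, or Orbe.

```python
# Receiver-side dependency checking, sketched. A replicated update is
# installed only after every dependency passes a DepCheck; otherwise it
# waits in a pending buffer and is retried.

class Partition:
    def __init__(self, name, peers):
        self.name = name
        self.peers = peers            # partition name -> Partition (filled in after construction)
        self.installed = set()        # ids of updates applied locally
        self.pending = []             # updates still waiting on dependency checks

    def dep_check(self, update_id):
        """DepCheck: 'Do you have update_id installed yet?'"""
        return update_id in self.installed

    def receive_replicated(self, update_id, deps):
        """deps: list of (partition_name, dep_update_id) pairs; each
        entry costs one DepCheck message in a real deployment."""
        if all(self.peers[p].dep_check(d) for p, d in deps):
            self._install(update_id)
        else:
            self.pending.append((update_id, deps))

    def _install(self, update_id):
        self.installed.add(update_id)
        # Installing an update may unblock pending ones; one retry pass
        # for brevity (a real system would loop or notify asynchronously).
        still_pending = []
        for uid, deps in self.pending:
            if all(self.peers[p].dep_check(d) for p, d in deps):
                self.installed.add(uid)
            else:
                still_pending.append((uid, deps))
        self.pending = still_pending
```

Every entry in deps translates into one DepCheck round trip, which is exactly the per-update message overhead the next slides quantify.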
Encoding of dependencies
• COPS [SOSP ’11], ChainReaction [EuroSys ’13], Eiger [NSDI ’13]
  – “direct” dependencies
  – worst case: O( reads before a write )
• Orbe [SoCC ’13]
  – dependency matrix
  – worst case: O( partitions )
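The difference between the two encodings can be sketched as follows. Both classes are hypothetical simplifications (each system's actual bookkeeping has more detail, and Orbe's per-client matrix is reduced here to a single row for one datacenter):

```python
# Two ways to encode the dependencies carried by a write, sketched.

class DirectDependencies:
    """COPS/ChainReaction/Eiger-style: remember every version read
    since the last write; worst case O( reads before a write )."""
    def __init__(self):
        self.deps = set()

    def on_read(self, item, version):
        self.deps.add((item, version))

    def on_write(self):
        deps, self.deps = self.deps, set()   # attach to the update, then reset
        return deps

class DependencyMatrix:
    """Orbe-style: one counter per partition; worst case O( partitions )."""
    def __init__(self, num_partitions):
        self.row = [0] * num_partitions

    def on_read(self, partition, version):
        self.row[partition] = max(self.row[partition], version)

    def on_write(self):
        row, self.row = self.row, [0] * len(self.row)
        return row
```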
The main issues
• Metadata size is considerable
– for both storage and communication
• Remote dependency checks are expensive
– multiple partitions are queried for each update
The cost of causal consistency (repeated)
[Figure repeated: throughput (Kops/sec) vs. partitions (1–6) for eventual vs. causal consistency]
The cost of dependency check messages
[Figure: throughput (Kops/sec) vs. partitions (1–6) for Eventual, Causal, and Causal with dependency check messages removed]
CAUSAL CONSISTENCY WITH
0/1 DEPENDENCY CHECK MESSAGES
Getting rid of external dependencies
• Partitions serve only fully replicated updates
– Replication Confirmation messages broadcast periodically
• External dependencies are removed
– replication information implies dependency installation
• Internal dependencies are minimized
  – only the client’s previous write is tracked
  – at most one remote check is required
    • zero if the previous write is local, one if it is remote (see the sketch below)
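A minimal sketch of a partition under this scheme, with hypothetical names and a stubbed-out RPC; this illustrates the idea above, not the talk's actual implementation:

```python
# 0/1-message scheme, sketched: remote updates become visible only once
# fully replicated, so external dependencies never need checking; each
# write depends only on the writer's previous write.

class Partition01:
    def __init__(self, name):
        self.name = name
        self.store = {}    # key -> (value, update_id): visible data
        self.hidden = {}   # update_id -> (key, value): replicated but
                           # not yet confirmed as fully replicated

    def on_remote_update(self, key, value, update_id, prev):
        # prev = (partition_name, update_id) of the writer's previous
        # write, or None. This is the only dependency ever checked:
        # zero messages if it is local (or absent), one if it is remote.
        if prev is not None and prev[0] != self.name:
            self.remote_dep_check(prev)        # the at-most-one message
        self.hidden[update_id] = (key, value)  # not yet visible

    def on_replication_confirmation(self, update_ids):
        # Periodic broadcast: these updates are installed at every
        # replica, hence so are all their causal predecessors, so they
        # are safe to expose without any external dependency checks.
        for uid in update_ids:
            if uid in self.hidden:
                key, value = self.hidden.pop(uid)
                self.store[key] = (value, uid)

    def remote_dep_check(self, prev):
        """Hypothetical blocking DepCheck RPC; stubbed for the sketch."""
        pass
```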
The new replication workflow
[Diagram: two users (Alice and Bob) performing operations at different datacenters (US, Europe, Asia) on the same partition; Replication Confirmation messages are broadcast periodically]
Reading your own writes
• Clients need not wait for the replication confirmation
– they can see their own updates immediately
– other clients’ updates are visible once they are fully replicated
• Multiple logical update spaces
  – Alice’s update space (visible to Alice)
  – Bob’s update space (visible to Bob)
  – …
  – Replication update space (not yet visible)
  – Global update space (fully visible)
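Continuing the sketch, the read path implied by these update spaces could look as follows. The structure is hypothetical: client.own_writes holds the session's still-unconfirmed writes, and partition.store (from the sketch above) holds the fully replicated ones:

```python
# Read path over the logical update spaces: a client sees its own
# writes immediately (read-your-own-writes); everyone else's writes
# appear only once they land in the global, fully replicated space.

def read(partition, client, key):
    if key in client.own_writes:         # the client's own update space
        return client.own_writes[key]
    entry = partition.store.get(key)     # the global update space
    return entry[0] if entry else None
```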
The cost of causal consistency
[Figure: throughput (Kops/sec) vs. partitions (1–6) for Eventual, 01-msg Causal, and Conventional Causal; the 01-msg scheme nearly matches Eventual]
The price paid: update visibility increased
• With the new implementation:
  ~ network latency from origin to furthest replica
    + network latency from furthest replica to destination
    + replication confirmation broadcast interval
• With the conventional implementation:
  ~ network latency from origin to destination
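As a hypothetical illustration: with 100 ms from the origin to its furthest replica, 80 ms from that replica to the destination, and a 50 ms broadcast interval, an update becomes visible after roughly 100 + 80 + 50 = 230 ms, whereas the conventional scheme needs only the direct origin-to-destination latency, say 60 ms.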
CAUSAL CONSISTENCY WITHOUT
DEPENDENCY CHECK MESSAGES ?!
Is it possible?
• Make an update visible only when a partition can determine locally that no causally preceding update will become visible later at another partition
How to do that?
• Encode causality by means of Lamport clocks
• Each partition maintains its own Lamport clock
• Each update is timestamped with its partition’s Lamport clock
• An update becomes visible once
  – update.timestamp ≤ min( Lamport clocks of all partitions )
• The minimum is computed periodically (see the sketch below)
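A minimal sketch of this rule, with hypothetical names. It also assumes updates from each partition are delivered in timestamp order, which the stability argument below relies on:

```python
# 0-message visibility, sketched: stamp every update with the local
# Lamport clock and expose a remote update only when its timestamp is
# at or below the periodically computed minimum of all clocks.

class Partition0:
    def __init__(self):
        self.clock = 0     # this partition's Lamport clock
        self.store = {}    # visible data: key -> (value, timestamp)
        self.hidden = []   # remote updates awaiting stability

    def local_write(self, key, value):
        self.clock += 1                       # Lamport tick
        self.store[key] = (value, self.clock)
        return self.clock                     # shipped with the update

    def on_remote_update(self, ts, key, value):
        self.clock = max(self.clock, ts)      # Lamport merge
        self.hidden.append((ts, key, value))  # not yet visible

    def on_min_clock(self, minimum):
        # Periodically computed min over all partitions' clocks. A
        # causal predecessor always carries a strictly smaller Lamport
        # timestamp, so once minimum >= ts every predecessor has
        # already arrived: the update is safe to expose.
        still_hidden = []
        for ts, key, value in self.hidden:
            if ts <= minimum:
                self.store[key] = (value, ts)
            else:
                still_hidden.append((ts, key, value))
        self.hidden = still_hidden
```

Note that no per-update messages are sent at all; the only cross-partition traffic is the periodic minimum computation.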
0-msg causal consistency - throughput
[Figure: throughput (Kops/sec) vs. partitions (1–16) for Eventual, 0-msg Causal, and Conventional Causal; the 0-msg scheme tracks Eventual closely]
The price paid: update visibility increased
• With the new implementation:
  ~ network latency from origin to furthest replica
    + network latency from furthest replica to destination
    + minimum computation interval
• With the conventional implementation:
  ~ network latency from origin to destination
How to deal with a “stagnant” Lamport clock?
• A partition’s Lamport clock stagnates if the partition receives no updates, holding back the minimum and delaying visibility everywhere
• Combine
  – the Lamport clock
  – a loosely synchronized physical clock
  – (easy to do; see the sketch below)
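One way to realize the combination, sketched with a hypothetical HybridClock class: stamping and reporting always take the maximum of the Lamport value and the local physical time, so an idle partition's clock keeps advancing:

```python
import time

# Lamport clock merged with a loosely synchronized physical clock: the
# value reported to the minimum computation never lags physical time,
# so an idle partition no longer stalls visibility.

class HybridClock:
    def __init__(self):
        self.lamport = 0

    def now(self):
        # Physical time in microseconds; any loosely synchronized
        # source works for the sketch.
        return int(time.time() * 1e6)

    def tick(self):
        """Stamp a local update."""
        self.lamport = max(self.lamport + 1, self.now())
        return self.lamport

    def observe(self, remote_ts):
        """Merge the timestamp carried by a replicated update."""
        self.lamport = max(self.lamport, remote_ts)

    def current(self):
        """Value reported to the periodic minimum computation; it
        advances with physical time even when no updates occur."""
        return max(self.lamport, self.now())
```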
More on loosely synchronized physical clocks
• Periodically broadcast the physical clock
• Reduces update visibility latency to
  ~ network latency from furthest replica to destination
    + maximum clock skew
    + clock broadcast interval
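As a hypothetical illustration: with 80 ms from the furthest replica to the destination, at most 5 ms of clock skew, and a 10 ms broadcast interval, visibility latency drops to roughly 80 + 5 + 10 = 95 ms, about one Dmax rather than two.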
Can we close the throughput gap?
• The answer is: yes, but there is a price
• The price is increased update visibility
Conclusion: Throughput, messages, latency
Method                        Throughput           # of dep. check messages   Update visibility
Conventional (direct deps)    < Evt. consistency   O( reads before a write )  D
Conventional (dep. matrix)    < Evt. consistency   O( partitions )            D
01-msg                        ~ Evt. consistency   0 or 1                     ~ 2 Dmax
0-msg                         ~ Evt. consistency   0                          ~ 2 Dmax
0-msg + physical clock        ~ Evt. consistency   0                          ~ Dmax

(D = network latency from origin to destination; Dmax = maximum inter-datacenter network latency)
