Distributed Databases

Distributed databases
A brief introduction
(Figure numbers may not be the same as in the book)
Distributed database concepts
• Distributed database (DDB)
– Collection of multiple logically interrelated
databases distributed over a computer network
• Distributed database management systems
– Software systems managing a distributed
database, making distribution transparent to the user
Transparency
• Hiding implementation details from the users
• Data organization transparency
– Location transparency
• Use does not depend on location
– Naming transparency
• Naming is independent from location
• Replication transparency
– Copies can be kept for availability, performance, and reliability
• Users are unaware of the existence of these copies
• Fragmentation transparency
– One table is divided across several locations
– Horizontal fragmentation
• Table divided by rows
– Vertical fragmentation
• Table divided by columns
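Location and replication transparency can be illustrated with a small catalog lookup. This is a minimal sketch, not how any particular DDBMS works: the names `CATALOG`, `SITES`, `read_table`, and the site and table names are all hypothetical. The caller names only the logical table; which site answers, and whether copies exist, stays hidden.

```python
# Hypothetical catalog: logical table name -> sites that hold a copy.
CATALOG = {
    "EMPLOYEE": ["site_a", "site_b"],  # replicated at two sites
    "PROJECT": ["site_c"],             # stored at one site only
}

# Hypothetical per-site storage: site -> table name -> rows.
SITES = {
    "site_a": {"EMPLOYEE": [{"id": 1, "name": "Ann"}]},
    "site_b": {"EMPLOYEE": [{"id": 1, "name": "Ann"}]},
    "site_c": {"PROJECT": [{"pno": 10, "title": "Audit"}]},
}


def read_table(name, unavailable=frozenset()):
    """Return the rows of a logical table from the first reachable copy.

    The caller never mentions a site (location transparency) and does
    not know how many copies exist (replication transparency).
    """
    for site in CATALOG[name]:
        if site not in unavailable:
            return SITES[site][name]
    raise RuntimeError(f"no reachable copy of {name}")
```

If `site_a` is down, `read_table("EMPLOYEE", unavailable={"site_a"})` still succeeds by reading the copy at `site_b`, which is the availability benefit of replication.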
Example: Replication and horizontal fragmentation (figure)
Reliability and Availability
• Two common advantages of distributed databases
• Reliability
– The probability that a system is running at a
certain time point
• Availability
– The probability that a system is continuously
available during a time interval
Advantages of distributed databases
1. Improved ease and flexibility of application development
– Transparency: Developers do not have to know …
2. Increased reliability and availability
– Faults are isolated to a single site
3. Improved performance
– Data localization means less network traffic
– Parallelism
4. Easier expansion
– Easy to add more data, processors, etc.
Types of distributed database systems
• Degree of homogeneity
– Homogeneous: All local DBMSs run identical software
– Heterogeneous: Local DBMSs run different software
• Autonomy
– Local autonomy: Local site can function as a
standalone DBMS
– No autonomy: Local site cannot function as a
standalone DBMS
Classification of distributed databases (figure)
Database system architectures (figure)
General architecture (figure)
Component architecture of distributed databases (figure)
Data fragmentation
• Which site should store which portion of the database?
• Simple fragmentation
– Each site has a whole relation
• Horizontal fragmentation
– Subset of rows in each site
• Sometimes based on location
• Vertical fragmentation
– Subset of columns in each site
• Primary key must be included in every fragment
• Mixed / hybrid fragmentation
– Horizontal + vertical fragmentation
– Described by fragmentation schema
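The fragmentation types above can be sketched on an in-memory table. This is an illustrative sketch only: the `EMPLOYEE` rows, the column names, and the helper functions are hypothetical. It shows why the primary key is kept in every vertical fragment: the original table is rebuilt by joining on it, while horizontal fragments are rebuilt by a union.

```python
# Hypothetical relation; "id" is the primary key.
EMPLOYEE = [
    {"id": 1, "name": "Ann", "city": "Oslo", "salary": 50},
    {"id": 2, "name": "Bob", "city": "Bergen", "salary": 60},
    {"id": 3, "name": "Cy", "city": "Oslo", "salary": 70},
]


def horizontal_fragment(rows, predicate):
    """Keep whole rows that satisfy the predicate (a subset of rows)."""
    return [dict(r) for r in rows if predicate(r)]


def vertical_fragment(rows, key, columns):
    """Keep the primary key plus the chosen columns (a subset of columns)."""
    return [{c: r[c] for c in (key, *columns)} for r in rows]


def join_on_key(left, right, key):
    """Rebuild rows from two vertical fragments by matching the key."""
    by_key = {r[key]: r for r in right}
    return [{**l, **by_key[l[key]]} for l in left]


# Horizontal fragmentation based on location.
oslo = horizontal_fragment(EMPLOYEE, lambda r: r["city"] == "Oslo")
bergen = horizontal_fragment(EMPLOYEE, lambda r: r["city"] == "Bergen")

# Vertical fragmentation; both fragments carry the primary key.
names = vertical_fragment(EMPLOYEE, "id", ["name", "city"])
pay = vertical_fragment(EMPLOYEE, "id", ["salary"])

# Reconstruction: union for horizontal, join on the key for vertical.
rebuilt_from_rows = sorted(oslo + bergen, key=lambda r: r["id"])
rebuilt_from_columns = join_on_key(names, pay, "id")
```

Mixed fragmentation would apply `vertical_fragment` to the output of `horizontal_fragment` (or the reverse).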
Example fragmentation (figure)
Example fragmentation, continued (figure)
Data replication
• Replication to improve availability
• Fully replicated database
– All data is replicated to each site
• No replication
– All data is stored at exactly one site
• Partial replication
– Some data is replicated to some sites
– Described by replication schema
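A replication schema can be pictured as a mapping from each fragment to the sites that store a copy of it. The sketch below assumes that representation; the function name `classify_replication` and the fragment and site names are hypothetical. It classifies a schema into the three cases listed above.

```python
def classify_replication(schema, sites):
    """Classify a replication schema as "full", "none", or "partial".

    schema: dict mapping fragment name -> collection of sites holding a copy
    sites:  collection of all sites in the system
    """
    all_sites = set(sites)
    copies = [set(holders) for holders in schema.values()]
    if any(not c or not c <= all_sites for c in copies):
        raise ValueError("every fragment needs at least one known site")
    if all(c == all_sites for c in copies):
        return "full"      # all data at every site
    if all(len(c) == 1 for c in copies):
        return "none"      # all data at exactly one site
    return "partial"       # some data replicated to some sites


SITES = ["s1", "s2", "s3"]
partial = {"EMP_OSLO": ["s1", "s2"], "EMP_BERGEN": ["s3"]}
kind = classify_replication(partial, SITES)
```

With a single site the "full" and "none" cases coincide; this sketch reports "full" in that case.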
Distributed query processing
1. Query mapping
– Query mapped from SQL to relational algebra using
the global conceptual schema
2. Localization
– Map query on the global schema to separate queries
on the local schemas
– Using fragmentation and replication information
3. Global query optimization
– Cost = CPU time + I/O time + communication time
4. Local query optimization
– Same as in centralized databases
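The cost formula in step 3 can be made concrete with a toy comparison of two execution strategies. All numbers below are invented for illustration, and `Plan`, `cost`, and `cheapest` are hypothetical names; communication time is assumed to be proportional to the number of bytes shipped between sites.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    name: str
    cpu: float          # CPU time in seconds
    io: float           # I/O time in seconds
    bytes_shipped: int  # data sent over the network


def cost(plan, seconds_per_byte):
    """Cost = CPU time + I/O time + communication time."""
    return plan.cpu + plan.io + plan.bytes_shipped * seconds_per_byte


def cheapest(plans, seconds_per_byte):
    """Pick the plan with the lowest total cost."""
    return min(plans, key=lambda p: cost(p, seconds_per_byte))


# Two hypothetical strategies for the same query.
ship_all = Plan("ship whole relation", cpu=0.1, io=0.2, bytes_shipped=1_000_000)
filter_first = Plan("filter at remote site", cpu=0.3, io=0.2, bytes_shipped=10_000)

slow_network = cheapest([ship_all, filter_first], seconds_per_byte=1e-6)
free_network = cheapest([ship_all, filter_first], seconds_per_byte=0.0)
```

On the slow network the communication term dominates, so filtering before shipping wins; if communication were free, the plan with less CPU work would win instead. This is why the global optimizer must account for communication time, which a centralized optimizer can ignore.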
Distributed transaction management,
Two-phase commit protocol (2PC)
• Global transaction manager / coordinator
– Coordinates the results of the local transaction managers.
– All local transaction managers must be able to "commit" before
any of them actually performs the "commit"
• Two-phase commit protocol (2PC)
– Phase 1
• Individual databases tell the coordinator that they have finished their part of the transaction
• When all individual databases have finished: Coordinator sends "prepare for
commit" to all databases
• Individual databases answer "ready to commit" or "cannot commit"
– Phase 2
• If all databases answered "ready to commit", coordinator sends "commit" to
all databases
• If one (or more) databases answered "cannot commit", coordinator sends
"abort" to all databases.
• Timeout: if one (or more) databases do not answer within a given amount
of time, coordinator sends "abort".
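The coordinator logic of the two phases can be sketched in a few lines. This is a single-process simulation under simplifying assumptions, not a production protocol: there is no network, no logging to stable storage, and a timeout is modeled by a participant whose `prepare()` returns `None`. The class and function names are hypothetical.

```python
from enum import Enum


class Vote(Enum):
    READY = "ready to commit"
    CANNOT = "cannot commit"


class Participant:
    """A local database taking part in the global transaction."""

    def __init__(self, name, vote=Vote.READY):
        self.name = name
        self._vote = vote        # None simulates "no answer in time"
        self.state = "finished"  # local work done, nothing committed yet

    def prepare(self):
        """Answer the coordinator's "prepare for commit" message."""
        return self._vote

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    """Run both phases and return the global decision."""
    # Phase 1: collect one vote per participant.
    votes = [p.prepare() for p in participants]

    # Phase 2: commit only if every vote is READY; a CANNOT vote or a
    # missing answer (timeout) leads to a global abort.
    decision = "commit" if all(v is Vote.READY for v in votes) else "abort"
    for p in participants:
        if decision == "commit":
            p.commit()
        else:
            p.abort()
    return decision
```

For example, `two_phase_commit([Participant("a"), Participant("b", Vote.CANNOT)])` returns `"abort"` and leaves both participants in the `"aborted"` state, so no site commits on its own.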
Two-phase commit protocol (2PC)
• Problems with 2PC
– Coordinator crashes: All participating sites are
blocked until the coordinator recovers
– No way of knowing whether participating sites
really got the "commit" / "abort"
Three-phase commit (3PC)