
Data Center
Scale Computing
If computers of the kind I have advocated become the computers
of the future, then computing may someday be organized as a
public utility just as the telephone system is a public utility. . . .
The computer utility could become the basis of a new and
important industry.
John McCarthy
MIT centennial celebration (1961)
Presentation by:
Ken Bakke
Samantha Orogvany
John Greene
Data Center System Components
Design and Storage Considerations
Data Center Power Supply
Data Center Cooling
Data Center Failures and Fault Tolerance
Data Center Repairs
Current Challenges
Current Research and Trends
Data Center vs. Warehouse-Scale Computer
Data center
• Provide colocated equipment
• Consolidate heterogeneous hardware and software
• Serve a wide variety of customers
• Binaries typically run on a small number of computers
• Resources are partitioned and separately managed
• Facility and computing resources are designed separately
• Share security, environmental, and maintenance resources
Warehouse-scale computer
• Designed to run massive internet services
• Individual applications run on thousands of computers
• Homogeneous hardware and system software
• Central management of a common resource pool
• The design of the facility and the computer hardware is integrated
Need for Warehouse-scale Computers
Renewed focus on client-side consumption of web resources
Constantly increasing numbers of web users
Constantly expanding amounts of data
Desire for rapid response for end users
Focus on cost reduction while delivering massive scale
Increased interest in Infrastructure as a Service (IaaS)
Performance and Availability Techniques
Reed-Solomon codes
Health checking
Application-specific compression
Eventual consistency
Centralized control
Redundant execution and tail tolerance
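Redundant execution (hedged requests) can be sketched in a few lines: issue the request to one replica, and if it has not answered within a short deadline, duplicate it to a second replica and take whichever answer arrives first. This is an illustrative toy, not any production implementation; the replica functions and the 50 ms hedge deadline are made-up stand-ins.

```python
import concurrent.futures
import time

def hedged_request(replicas, hedge_after=0.05):
    """Send the request to replicas[0]; if no answer arrives within
    `hedge_after` seconds, also send it to replicas[1] and return
    whichever response completes first."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(replicas[0])]
        done, _ = concurrent.futures.wait(futures, timeout=hedge_after)
        if not done:  # primary replica is a straggler: hedge the request
            futures.append(pool.submit(replicas[1]))
        done, _ = concurrent.futures.wait(
            futures, return_when=concurrent.futures.FIRST_COMPLETED)
        return done.pop().result()

# One slow replica and one fast replica: the hedge hides the straggler.
slow = lambda: (time.sleep(0.5), "slow")[1]
fast = lambda: "fast"
print(hedged_request([slow, fast]))   # -> fast
```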
Major system components
Typical server: 4 CPUs × 8 dual-threaded cores, yielding 32 cores
Typical rack: 40 servers plus a 1 or 10 Gbps Ethernet switch
Cluster: a cluster switch plus 16-64 racks
A cluster may contain tens of thousands of processing threads
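The thread count follows from simple arithmetic; the sketch below assumes the low end (16 racks) of the stated range.

```python
# Rough capacity arithmetic for the cluster described above.
cores_per_server = 4 * 8                    # 4 CPUs x 8 cores = 32 cores
threads_per_server = cores_per_server * 2   # dual-threaded cores
servers = 40 * 16                           # 40 servers/rack x 16 racks
threads = servers * threads_per_server
print(threads)   # -> 40960, i.e. tens of thousands of threads
```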
Low-end Server vs SMP
• Communication latency is roughly 1,000× lower within an SMP
• The SMP advantage shrinks for applications too large to fit on a single server
Figure: Performance advantage of a cluster built with large SMP server nodes (128-core SMP) over a cluster with the same number of processor cores built with low-end server nodes (four-core SMP), for clusters of varying size.
Brawny vs Wimpy
Advantages of wimpy computers
• Multicore CPUs carry a price premium of 2-5× vs. multiple smaller CPUs
• Memory- and I/O-bound applications do not take advantage of faster CPUs
• Slower CPUs are more power efficient
Disadvantages of wimpy computers
• Increasing parallelism is programmatically difficult
• Programming costs increase
• Networking requirements increase
• Fewer tasks of smaller size create load-balancing difficulties
• Amdahl’s law limits achievable speedup
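Amdahl's law makes the wimpy-core trade-off concrete: if even a small fraction of the work is serial, adding many slow cores yields diminishing returns. A quick worked example (the 5% serial fraction is an illustrative assumption):

```python
def amdahl_speedup(parallel_fraction, n):
    """Amdahl's law: speedup from running the parallel fraction of a
    workload on n processors, with the rest remaining serial."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / n)

# With 5% serial work, speedup is capped near 20x no matter how many
# wimpy cores are added, so slow cores cannot match fewer fast ones.
print(round(amdahl_speedup(0.95, 1024), 1))   # -> 19.6
```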
Design Considerations
Software design and improvements can be made to align with the architecture
Resource requirements and utilization can be balanced across workloads
o Spare CPU cycles can be used for compute-intensive applications
o Spare storage can be used for archival purposes
Fungible resources are more efficient
Workloads can be distributed to fully utilize servers
Focus on cost-effectiveness
Smart programmers may be able to restructure algorithms to match a less expensive design.
Storage Considerations
Private Data
• Local DRAM, SSD, or disk
Shared State Data
• High throughput for thousands of users
• Robust performance, tolerant of errors
• Unstructured storage (Google’s GFS)
  o Master plus thousands of “chunk” servers
  o Utilizes every system with a disk drive
  o Cross-machine replication
• Structured storage
  o Bigtable provides a (row, column, timestamp) mapping to a byte array
  o Trade-offs favor high performance and massive availability
  o Eventual consistency model leaves applications managing consistency issues
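Bigtable's data model can be pictured as a sparse map from (row key, column key, timestamp) to an uninterpreted byte string. The toy sketch below illustrates only that mapping; the keys and helper names are made up for illustration and bear no relation to the real API.

```python
# Toy model of Bigtable's data model: a sparse in-memory map from
# (row key, column key, timestamp) to an uninterpreted byte string.
table = {}

def put(row, column, timestamp, value: bytes):
    table[(row, column, timestamp)] = value

def get_latest(row, column):
    """Return the value with the highest timestamp for this cell."""
    versions = [(ts, v) for (r, c, ts), v in table.items()
                if r == row and c == column]
    return max(versions)[1] if versions else None

put("com.example/www", "contents:html", 1, b"<html>v1</html>")
put("com.example/www", "contents:html", 2, b"<html>v2</html>")
print(get_latest("com.example/www", "contents:html"))   # -> b'<html>v2</html>'
```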
Google File System
WSC Network Architecture
Leaf Bandwidth
• Bandwidth between servers in a common rack
• Typically provided by a commodity switch
• Easily increased by adding ports or increasing port speed
Bisection Bandwidth
• Bandwidth between the two halves of a cluster
• Matching leaf bandwidth requires as many uplinks to the fabric as links within a rack
• Since distances are longer, optical interfaces are required
Three-Stage Topology
Required to maintain the same throughput as a single switch.
Network Design
• Oversubscription ratios of 4-10 are common
• Limit network cost per server
• Offload traffic to special-purpose networks
• Centralized management
Service-Level Response Times
• Even if only a small fraction of individual server responses exceed 1 s (the 99th, 99.9th, or 99.99th percentile), requests that fan out to many servers are frequently slow
• Selective replication is one mitigating strategy
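The fan-out effect on tail latency can be quantified: if each server independently exceeds the latency budget with probability p, a request touching N servers is slow with probability 1 - (1 - p)^N. A minimal sketch:

```python
def prob_slow_request(p_server_slow, fanout):
    """Probability that a request touching `fanout` servers hits at
    least one server in its slow latency tail."""
    return 1.0 - (1.0 - p_server_slow) ** fanout

# Even if only 1% of individual server responses exceed 1 s,
# a request fanned out to 100 servers is slow ~63% of the time.
print(round(prob_slow_request(0.01, 100), 2))   # -> 0.63
```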
Power Supply Distribution
Uninterruptible Power Systems
● Transfer switch chooses the active power input from either the utility source or a generator
● After a power failure, the transfer switch detects the generator and, after 10-15 seconds, switches over to it
● Energy storage in the UPS bridges the gap between the utility failure and the generators taking the full load
● Conditions the incoming feed, removing spikes and sags from the AC supply
Example of Power Distribution Units
Traditional PDU
• Takes in the power output from the UPS
• Regulates power with transformers and distributes it to the servers
• Typically handles 75-225 kW
• Provides redundancy by switching between two power sources
Examples of Power Distribution
Facebook’s power distribution system
• Designed to increase power efficiency by reducing energy loss to about 15%
• Eliminates the central UPS and PDU, adding an on-board 12 V battery for each cabinet
Power Supply Cooling Needs
Air Flow Considerations
• Fresh-air cooling
  o Essentially “opening the windows”: outside air cools the servers
• Closed-loop systems
  o Underfloor air distribution
  o Servers sit on raised concrete tile floors
Power Cooling Systems
2-Loop Systems
• Loop 1: hot-air/cool-air circuit (red/blue arrows)
• Loop 2: liquid supply to the Computer Room Air Conditioning (CRAC) units and heat discharge
Example of Cooling System Design
3-Loop System
• Chiller sends cooled water to the CRACs
• Heated water is sent from the building back to the chiller to discharge its heat
• The condenser water loop flows into the cooling tower
Cooling System for Google
Estimated Annual Costs
Estimated Carbon Costs for Power
Based on whether local utility power is generated from oil, natural gas, coal, or renewable sources such as hydroelectric, solar, and wind
Power Efficiency
Sources of Efficiency Loss
Overheading cooling systems,
such as chillers
Improvements to
Air movement
IT Equipment
Power distribution unit
Handling air flow more carefully.
Keep cooling path short and
separate hot air from servers
from system
Consider raising cooling
Employ “free cooling” by locating
datacenter in cooler climates
Select more efficient power
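A common yardstick for these losses is Power Usage Effectiveness (PUE): total facility power divided by the power delivered to IT equipment, with 1.0 as the ideal. A minimal sketch with hypothetical numbers:

```python
def pue(total_facility_kw, it_equipment_kw):
    """Power Usage Effectiveness: total facility power divided by
    power reaching the IT equipment (1.0 is the ideal)."""
    return total_facility_kw / it_equipment_kw

# Hypothetical facility: 10 MW total draw, 6.25 MW reaching servers;
# the rest is lost to cooling, air movement, and power distribution.
print(pue(10_000, 6_250))   # -> 1.6
```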
Data Center Failures
Reliability of Data Center
Reliability is a trade-off between the cost of failures (including repairing them) and the cost of preventing them.
Fault Tolerance
• Traditional servers require a high degree of reliability and redundancy to prevent failures as much as possible
• For warehouse-scale computers, this is not practical
  o Example: a cluster of 10,000 servers will average one server failure per day
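That failure rate follows directly from per-server reliability arithmetic: even very dependable machines produce daily failures at this scale.

```python
# One failure per day across 10,000 servers implies a mean time
# between failures of roughly 27 years for any individual server,
# so the failures reflect scale, not unreliable hardware.
servers = 10_000
failures_per_day = 1
mtbf_years = servers / failures_per_day / 365
print(round(mtbf_years, 1))   # -> 27.4
```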
Data Center Failures
Fault Severity Categories
o Data is lost, corrupted, or cannot be regenerated
o Service is down
o Service is available, but limited
o Faults occur but, due to fault tolerance, are masked from the user
Data Center Fault Causes
• Software errors
• Faulty configs
• Human error
• Networking faults
• Faulty hardware
It’s easier to tolerate known hardware issues than software bugs or human error.
• It’s not critical to quickly repair individual servers; in reality, repairs are batched and scheduled daily
• Individual failures mostly do not affect overall data center health
• The system is designed to tolerate faults
Google Restarts and Downtime
Relatively New Class of Computers
Facebook founded in 2004
Google’s Modular Data Center in 2005
Microsoft’s Online Services Division in 2005
Amazon Web Services in 2006
Netflix added streaming in 2007
Balanced System
Nature of workload at this scale is:
Large volume
Large variety
This means no servers (or parts of servers)
get to slack while others do the work.
Keep servers busy to amortize cost
Need high performance from all
Imbalanced Parts
Latency lags bandwidth
Imbalanced Parts
CPUs have been the historical focus
Focus Needs to Shift
The push toward SaaS will highlight these imbalances
Requires concentrating research on:
Improving non-CPU components
Improving responsiveness
Improving end-to-end experience
Why does latency matter?
Responsiveness dictated by latency
Productivity affected by responsiveness
Real Estate Considerations
Google’s Data Centers
Economical Efficiency
• The data center building is a non-trivial cost
  o Does not include land
• Servers are the bigger cost
  o More servers are desirable
  o Busy servers are desirable
Improving Efficiency
• Better components
• Energy proportionality (less use == less energy)
• Power-saving modes
  o Transparent (e.g., clock gating)
  o Active (e.g., CPU throttling)
  o Inactive (e.g., idle drives stop spinning)
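Energy proportionality can be illustrated with a simple linear power model; the 350 W peak, the idle fractions, and the 30% utilization figure are illustrative assumptions, not measurements.

```python
def power_draw(utilization, idle_fraction, peak_watts=350):
    """Power draw of a server whose idle power is `idle_fraction` of
    peak, rising linearly to peak at 100% utilization."""
    return peak_watts * (idle_fraction + (1 - idle_fraction) * utilization)

# At a modest 30% utilization, a server that idles at 50% of peak
# burns far more than a perfectly energy-proportional one.
print(round(power_draw(0.30, 0.50), 1))   # -> 227.5 (watts)
print(round(power_draw(0.30, 0.0), 1))    # -> 105.0 (watts)
```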
Changing Workloads
• Workloads are more agile in nature
• SaaS enables shorter release cycles
• Even major software gets rewritten
  o Office 365 updates several times per year
  o Some Google services update weekly
  o The Google search engine has been re-written from scratch four times
• Internet services are still young
  o Usage can be unpredictable
  o Started in 2005; fifth most popular site within its first year
• Strike a balance between the need to deploy quickly and longevity
  o Need it fast and good
  o Design to make software easy to create
  o Easier to find programmers
  o Redesign when warranted
  o Google Search’s rewrites removed inefficiencies
  o Contrast with Intel’s backwards compatibility spanning decades
Future Trends
● Continued emphasis on:
○ Networking, both within and to/from datacenters
○ Reliability via redundancy
○ Optimizing efficiency (energy proportionality)
  ○ Environmental impact
  ○ Energy costs
● Amdahl’s law will remain a major factor
● Need increased focus on end-to-end performance
● Computing as a utility?
“Anyone can build a fast CPU. The trick is to build a fast system.”
- Seymour Cray
