Report
Storage Overview
and IT-DM Lessons Learned
Luca Canali, IT-DM
DM Group Meeting
10-3-2009
CERN IT Department
CH-1211 Genève 23
Switzerland
www.cern.ch/it
Outline
• Goal: review of storage technology
– HW layer (HDs, storage arrays)
– Interconnect (how to attach storage to the server)
– Service layer (filesystems)
• Expose current hot topics in storage
– Identify challenges
– Stimulate ideas for the management of large data volumes
Why storage is a very interesting
area in the coming years
• Storage market is very conservative
– A few vendors share the market for large enterprise solutions
– Enterprise storage typically carries a high price premium
• Opportunities
– Commodity HW / grid-like solutions provide an order-of-magnitude gain in cost/performance
– New products coming to the market promise many changes:
– Solid state disks, high-capacity disks, high-performance and low-cost interconnects
HW layer – HD, the basic element
• Hard disk technology
– The basic building block of storage for 40 years
– Main intrinsic limitation: latency
HD specs
• HDs are limited
– In particular, seek and rotational latency are unavoidable (7.2k to 15k rpm, ~2-10 ms)
– 100-200 random IOPS
– Throughput ~100 MB/s, typically limited by the interface
– Capacity range 300 GB - 2 TB
– Failures: mechanical, electric, magnetic, firmware issues. MTBF: 500k - 1.5M hours
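The service-time arithmetic behind these IOPS figures can be sketched as follows (the seek times used are illustrative values within the ranges quoted above, not measurements):

```python
# Back-of-the-envelope random-read IOPS estimate for a spinning disk.
# Service time per I/O = average seek time + average rotational latency.

def random_read_iops(rpm, avg_seek_ms):
    # Average rotational latency is the time for half a revolution.
    rotational_latency_ms = 0.5 * 60_000 / rpm
    service_time_ms = avg_seek_ms + rotational_latency_ms
    return 1000 / service_time_ms

print(round(random_read_iops(7200, 8.5)))   # consumer SATA: ~79 IOPS
print(round(random_read_iops(15000, 3.5)))  # enterprise SAS: ~182 IOPS
```

This is why the 100-200 IOPS figure is essentially fixed by mechanics: neither seek time nor spindle speed improves much from one disk generation to the next.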
Enterprise disks
• Performance
– Enterprise disks offer more performance:
– They spin faster and use better interconnect protocols (e.g. SAS vs. SATA)
– Typically lower capacity
– Our experience: often not competitive in cost/performance vs. SATA
HD failure rates
• Failure rate
– Our experience: it depends on vendor, temperature, infant mortality, and age
– At FAST’07, two papers (one from Google) showed that vendor specs often need to be ‘adjusted’ in real life
– Google’s data seriously questioned the usefulness of SMART probes and the correlation of temperature/age/usage with MTBF
– Another study showed that consumer and enterprise disks have similar failure patterns and lifetimes. Moreover, HD failures in RAID sets are correlated
HD wrap-up
• The HD is an old but evergreen technology
– In particular, disk capacities have increased by one order of magnitude in just a few years
– At the same time prices have gone down (below 0.1 USD per GB for consumer products)
– 1.5 TB consumer disks and 450 GB enterprise disks are common
– 2.5’’ drives are becoming standard to reduce power consumption
Scaling out the disk
• The challenge for storage systems
– Scale out disk performance to meet demands:
– Throughput
– IOPS
– Latency
– Capacity
• Sizing storage systems
– Must focus on the critical metric(s)
– Avoid the ‘capacity trap’: sizing for capacity alone while under-provisioning IOPS or throughput
RAID and redundancy
• Storage arrays are the traditional approach
– They implement RAID to protect data
– Parity based: RAID5, RAID6
– Stripe and mirror: RAID10
• Scalability problem of this method
– In very large configurations, the expected time between disk failures approaches the RAID rebuild time (!)
– Challenge: RAID does not scale
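A back-of-the-envelope model of why rebuilds become risky at scale (all numbers are illustrative assumptions, not measurements; the exponential failure model is a simplification):

```python
import math

# Time to rebuild a failed disk, and the chance that a second disk
# in the same RAID set fails before the rebuild finishes.

def rebuild_hours(capacity_tb, rebuild_mb_per_s):
    # The rebuild must reconstruct the full capacity of the replaced disk.
    return capacity_tb * 1e6 / rebuild_mb_per_s / 3600

def p_second_failure(n_disks, mtbf_hours, rebuild_h):
    # Exponential failure model applied to the surviving disks.
    return 1 - math.exp(-(n_disks - 1) * rebuild_h / mtbf_hours)

t = rebuild_hours(2, 50)             # 2 TB disk, 50 MB/s rebuild rate: ~11 h
p = p_second_failure(8, 500_000, t)  # hypothetical 8-disk RAID5 set
```

For RAID5 a second failure during the rebuild window means data loss; as capacities grow, the rebuild window grows with them while per-disk MTBF does not, which is the scaling problem stated above.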
Beyond RAID
• Google and Amazon don’t use RAID
• Main idea:
– Divide data in ‘chunks’
– Write multiple copies of the chunks
– Google File System writes chunks in 3 copies
– Amazon S3 writes copies to different destinations, i.e. data-center mirroring
• Additional advantages:
– Removes the constraint of storing the redundancy locally inside one storage array
– Data chunks can be moved, refreshed, or relocated easily
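A toy sketch of the chunking idea: hash-based placement of each chunk on several distinct nodes. The node names and placement function are hypothetical; this is not GFS’s or S3’s actual algorithm, just the shape of the scheme:

```python
import hashlib

def place_chunks(data, chunk_size, nodes, copies=3):
    """Split data into fixed-size chunks and assign each chunk to
    `copies` distinct nodes. Losing one node loses no chunk, and a
    lost replica can be re-copied from any surviving one."""
    placement = {}
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        h = int(hashlib.sha1(chunk).hexdigest(), 16)
        # Pick `copies` consecutive nodes starting at a hash-chosen offset.
        placement[i // chunk_size] = [nodes[(h + k) % len(nodes)]
                                      for k in range(copies)]
    return placement

p = place_chunks(b"x" * 1000, 256, ["n1", "n2", "n3", "n4", "n5"])
```

Because redundancy lives at the chunk level rather than inside one array, re-replication after a failure is a copy between ordinary servers, not a RAID rebuild.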
Our experience
• Physics DB storage uses ASM
– Volume manager and cluster file system
integrated with Oracle
– Soon to be also a general-purpose cluster file
system (11gR2 beta testing)
– Oracle files are divided into chunks
– Chunks are distributed evenly across the storage
– Chunks are written in multiple copies (2 or 3, depending on file type and configuration)
– Allows the use of low-cost storage arrays: no RAID support needed
Scalable and distributed file systems
on commodity HW
• Allow large volumes of data to be managed and protected
• Solutions proven by Google and Amazon, Sun’s ZFS, Oracle’s ASM
• Can provide order-of-magnitude savings on HW acquisition
• Additional economies of scale from the deployment of cloud and virtualization models
• Challenge: solid and scalable distributed file systems are hard to build
The interconnect
• Several technologies available
– SAN
– NAS
– iSCSI
– Direct attach
The interconnect
• Throughput challenge
– It takes ~3 hours to copy or back up 1 TB over a 1 Gbps network
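The ~3 hours figure follows directly from the link arithmetic; the 80% link efficiency used here is an assumed allowance for protocol overhead, not a measured value:

```python
def transfer_hours(size_tb, link_gbps, efficiency=0.8):
    # Bits to move, divided by the effective link speed.
    bits = size_tb * 1e12 * 8
    return bits / (link_gbps * 1e9 * efficiency) / 3600

print(round(transfer_hours(1, 1), 1))   # ~2.8 h for 1 TB over 1 Gbps
print(round(transfer_hours(1, 10), 1))  # ~0.3 h over 10 Gbps
```

The same arithmetic shows why 10 Gbps Ethernet, discussed next, changes the picture for IP-based storage.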
IP based connectivity
• NAS and iSCSI suffer from the poor performance of Gbps Ethernet
• 10 Gbps may/will(?) change the picture
• At present not widely deployed on servers because of cost
• Moreover, TCP/IP processing adds CPU overhead
Specialized storage networks
• SAN is the de facto standard for most
enterprise level storage
• Fast, low overhead on server CPU, easy to
configure
• Our experience (and Tier1s): SAN networks
with max 64 ports at low cost
– Measured: 8 Gbps transfer rate (4+4 dual-ported HBAs for redundancy and load balancing)
– Proof of concept FC backup (LAN-free) reached full utilization of the tape heads
– Scalable: proof of concept ‘Oracle supercluster’ of 410 SATA disks and 14 dual quad-core servers
NAS
• CERN’s experience with NAS for databases
• A Netapp filer can use several protocols, the main one being NFS
– Throughput is limited because of TCP/IP
– Trunking can alleviate the problem; the main solution may/will(?) be the move to 10 Gbps
• The filer contains a server with its own CPU and OS
– In particular, the proprietary WAFL filesystem can create read-only snapshots
– The proprietary Data ONTAP OS runs on the filer box
– The additional features worsen the cost/performance ratio
iSCSI
• iSCSI is interesting for cost reduction
• There are many performance concerns, though, due to the IP interconnect
– Adoption seems to be limited to low-end systems at the moment
• Our experience:
– IT-FIO is acquiring some test units; we have been informed that some test HW will be available for IT-DM databases
The quest for ultimate latency
reduction
• Solid state disks provide unique specs
– Access times are at least one order of magnitude better than the best HDs
– A single disk can provide >10k random-read IOPS
– High read throughput
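With no seek or rotation, random-read IOPS is simply the inverse of the access time. The 100 µs figure below is an assumed flash read latency of that order, used only to show where the >10k number comes from:

```python
def ssd_random_read_iops(access_time_us):
    # No mechanical delays: IOPS is just 1 / access time.
    return 1e6 / access_time_us

print(round(ssd_random_read_iops(100)))  # ~10,000 IOPS at 100 µs
```

Compare this with the ~100-200 IOPS that the service-time arithmetic gives for spinning disks: the gap is the order of magnitude quoted above.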
SSD (flash) problems
• Flash-based SSDs still suffer from major problems for enterprise solutions
– Cost/GB: more than 10 times that of ‘normal’ HDs
– Small capacity compared to HDs
– Several issues with write performance
– Limited number of erase cycles
– Entire cells must be rewritten (an issue for transactional workloads)
– Some workarounds for write performance and cell lifetime are being implemented; quality differs across vendors and product grades
– A field in rapid evolution
Conclusions
• Storage technologies are in a very
interesting evolution phase
– On one side, ‘old-fashioned’ storage technologies deliver more capacity and performance for a lower price every year
– New technologies are emerging for scaling out very large data sets (see Google, Amazon, Oracle’s ASM, Sun’s ZFS)
– 10 Gbps Ethernet and SSDs have the potential to change storage in the coming years (but are not mature yet)
Acknowledgments
• Many thanks to Jacek, Dawid and Maria
• Eric and Nilo
• Helge, Tim Bell and Bernd