Data Latency
Rich Altmaier
Software and Services Group
CPU Architecture Contribution
• Data-intensive == memory-latency bound
– Cache lines used sparsely and rarely reused
– Often pointer chasing – hard to prefetch
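A minimal sketch of why pointer chasing defeats the prefetcher: in a linked-list walk each load address depends on the previous load, while an array walk issues independent, sequential loads the hardware can stream ahead of. The types and function names here are illustrative, not from the slides.

```c
#include <stdlib.h>

typedef struct node { long value; struct node *next; } node;

/* Pointer chasing: each load depends on the one before it, so the
 * core waits out a full memory latency per node. */
long sum_list(const node *head) {
    long sum = 0;
    for (const node *p = head; p; p = p->next)
        sum += p->value;
    return sum;
}

/* Sequential access: addresses are predictable, so the hardware
 * prefetcher can fetch cache lines well ahead of use. */
long sum_array(const long *a, size_t n) {
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += a[i];
    return sum;
}
```

Both functions compute the same sum; only the memory-access pattern, and hence the latency exposure, differs.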
CPU Architecture Contribution
• Large Instruction cache
– Captures a sophisticated code loop, esp. a database engine
• Share last level cache across cores
– Nehalem added this for both I and D
– Without it, each core keeps its own copy of instructions, and
data and lock lines must move between per-core caches
• Integrated Memory Controller
– Big win for latency in Nehalem
• QPI for socket-to-socket cache line movement
– Introduced in Nehalem, faster than FSB
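When a data or lock line is written from multiple cores, the line must move between caches on each write. A minimal sketch of the usual software countermeasure, padding per-core data to separate cache lines; the 64-byte line size is an assumption typical of these processors, and the names are illustrative:

```c
#include <stdalign.h>
#include <assert.h>

#define CACHE_LINE 64  /* assumed line size for these parts */

/* Without the padding, several counters can share one cache line,
 * and the line ping-pongs between the cores' private caches on
 * every write ("false sharing"). Aligning each slot to the line
 * size keeps each core's counter on its own line. */
struct padded_counter {
    alignas(CACHE_LINE) long count;
};

struct padded_counter per_core[4];  /* one slot per core */
```

The alignment makes each array element occupy a full cache line, so updates from different cores never contend for the same line.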
CPU Architecture Contribution
• Improvements in branch prediction
– Successful prediction of more complex branching
• Total number of outstanding cache line reads per core
– Improved in Nehalem
– Exploited by Out of Order execution
– Exploited by Hyper-Threading (database benchmarks
usually enable it and win)
– Opportunity to tune data structures for parallel reading
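One way to tune a data structure for parallel reading, sketched under illustrative names: interleave two independent traversals so the out-of-order core can keep two cache-line reads in flight at once instead of serializing on a single dependent chain.

```c
#include <stddef.h>

typedef struct lnode { long value; struct lnode *next; } lnode;

/* The loads on list a and list b do not depend on each other, so
 * the core can have two outstanding cache line reads at a time,
 * overlapping their memory latencies. */
long sum_two_lists(const lnode *a, const lnode *b) {
    long sum = 0;
    while (a && b) {
        sum += a->value + b->value;  /* two independent load chains */
        a = a->next;
        b = b->next;
    }
    for (; a; a = a->next) sum += a->value;  /* drain the longer list */
    for (; b; b = b->next) sum += b->value;
    return sum;
}
```

Splitting one list into several independently walkable lists is the data-structure change this exploits.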
System Architecture Contribution
• Larger physical memory
• Faster memory (lower latency)
• Faster I/O, and more ports, for data movement
• SSDs – big boost to IOPS (I/Os per second)
– Filesystem reads and writes are usually small and scattered
– Few large sequential ops
• Faster networking
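The small-and-scattered read pattern can be sketched as independent positioned reads; each one is a random 4 KiB I/O, which costs a seek on a disk but is exactly the kind of operation SSDs serve at high IOPS. The block size and helper name are assumptions for illustration.

```c
#include <fcntl.h>
#include <unistd.h>

#define BLOCK 4096  /* assumed filesystem block size */

/* Positioned read of one block: pread() takes an explicit offset,
 * so scattered reads need no lseek() between them. */
ssize_t read_block(int fd, void *buf, long block_no) {
    return pread(fd, buf, BLOCK, (off_t)block_no * BLOCK);
}
```

A database doing many such reads at random offsets is IOPS-bound, not bandwidth-bound, which is why SSDs help far more than faster sequential transfer rates.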
Summary
• Large & shared caches
• Latency reduction with the Integrated Memory
Controller, and socket-to-socket QPI
• Total number of outstanding reads
• Branch prediction
• Storage configured for IOPS