Data Latency
Rich Altmaier
Software and Services Group

CPU Architecture Contribution
• Data intensive == memory latency bound
  – Minimal cache line use and reuse
  – Often pointer chasing, which is hard to prefetch

CPU Architecture Contribution
• Large instruction cache
  – Captures a sophisticated code loop, especially in a database
• Last-level cache shared across cores
  – Nehalem added this for both instructions and data
  – Without it, each core holds its own copy of the instructions, and the cache lines holding data locks have to move between caches (see the ping-pong sketch below)
• Integrated memory controller
  – A big latency win in Nehalem
• QPI for socket-to-socket cache line movement
  – Introduced in Nehalem; faster than the front-side bus

CPU Architecture Contribution
• Improvements in branch prediction
  – Successful prediction of more complex branching structures
• Total number of outstanding cache line reads per socket
  – Improved in Nehalem
  – Exploited by out-of-order execution
  – Exploited by Hyper-Threading (database benchmarks usually enable it and win)
  – An opportunity to tune data structures for parallel reading (see the pointer-chasing sketch below)

System Architecture Contribution
• Larger physical memory
• Faster memory (lower latency)
• Faster I/O, and more ports, for data movement
• SSDs: a big boost to IOPS (I/Os per second)
  – Filesystem reads and writes are usually small and scattered
  – No big sequential operations
• Faster networking

Summary
• Large, shared cache
• Latency reduction from the integrated memory controller and socket-to-socket QPI
• Total number of outstanding reads
• Branch prediction
• Storage configured for IOPS
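
The point about outstanding reads and tuning data structures for parallel reading can be made concrete with a microbenchmark. The C sketch below is illustrative only; the node size, pool size, chain count, and timing method are assumptions, not anything from the deck. A single dependent pointer chain serializes cache misses, while interleaving several independent chains lets out-of-order execution keep multiple cache line reads in flight at once.

```c
/* Minimal sketch, assuming 64-byte cache lines and a pool much larger
 * than the last-level cache. One dependent pointer chain serializes
 * misses; interleaved independent chains overlap them. */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define NODES (1u << 19)   /* 512K nodes x 64B = 32MB, beyond the LLC */
#define CHAINS 4           /* independent chains traversed together */

typedef struct node { struct node *next; long pad[7]; } node_t; /* one 64B line */

/* Link nodes into a randomly permuted cycle so the hardware
 * prefetcher cannot guess the next address (pointer chasing). */
static node_t *build_chain(node_t *pool, size_t n) {
    size_t *perm = malloc(n * sizeof *perm);
    for (size_t i = 0; i < n; i++) perm[i] = i;
    for (size_t i = n - 1; i > 0; i--) {        /* Fisher-Yates shuffle */
        size_t j = (size_t)rand() % (i + 1);
        size_t t = perm[i]; perm[i] = perm[j]; perm[j] = t;
    }
    for (size_t i = 0; i + 1 < n; i++)
        pool[perm[i]].next = &pool[perm[i + 1]];
    pool[perm[n - 1]].next = &pool[perm[0]];
    node_t *head = &pool[perm[0]];
    free(perm);
    return head;
}

static double now_sec(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec * 1e-9;
}

int main(void) {
    const size_t hops = (size_t)NODES * 8;

    /* One dependent chain: each load must complete before the next starts. */
    node_t *pool = malloc(NODES * sizeof *pool);
    node_t *p = build_chain(pool, NODES);
    double t0 = now_sec();
    for (size_t i = 0; i < hops; i++) p = p->next;
    double serial = now_sec() - t0;

    /* CHAINS independent chains, same total hops: the loads within one
     * iteration do not depend on each other, so they overlap. */
    node_t *pools[CHAINS], *c[CHAINS];
    for (int k = 0; k < CHAINS; k++) {
        pools[k] = malloc(NODES * sizeof *pools[k]);
        c[k] = build_chain(pools[k], NODES);
    }
    t0 = now_sec();
    for (size_t i = 0; i < hops / CHAINS; i++)
        for (int k = 0; k < CHAINS; k++) c[k] = c[k]->next;
    double parallel = now_sec() - t0;

    /* Fold the final pointers into the output so the compiler cannot
     * delete the traversal loops. */
    uintptr_t sink = (uintptr_t)p;
    for (int k = 0; k < CHAINS; k++) sink ^= (uintptr_t)c[k];
    printf("serial: %.2fs  %d-way: %.2fs  (sink %lx)\n",
           serial, CHAINS, parallel, (unsigned long)sink);
    return 0;
}
```

On hardware of the class the slides describe, one would expect the interleaved traversal to approach CHAINS times the throughput of the single chain, flattening out as it hits the per-socket limit on outstanding cache line reads mentioned above.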
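Similarly, the remark about data lock lines moving between caches can be illustrated with a false-sharing sketch. Again this is a minimal example under assumed conditions (two cores, 64-byte lines, C11 atomics); none of the names come from the deck.

```c
/* Minimal sketch, assuming 64-byte cache lines: two threads hammering
 * counters in the same line force that line to migrate between cores
 * on every write; giving each counter its own line removes the
 * ping-pong. A contended lock produces the same traffic pattern. */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <time.h>

#define ITERS 50000000L

static struct { _Alignas(64) atomic_long a, b; } same; /* one shared line */
static struct { _Alignas(64) atomic_long v; } own[2];  /* one line each  */

static void *bump(void *arg) {
    atomic_long *c = arg;
    for (long i = 0; i < ITERS; i++)
        atomic_fetch_add_explicit(c, 1, memory_order_relaxed);
    return NULL;
}

/* Run two incrementing threads on the given counters and time them. */
static double timed_pair(atomic_long *x, atomic_long *y) {
    struct timespec t0, t1;
    pthread_t p, q;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    pthread_create(&p, NULL, bump, x);
    pthread_create(&q, NULL, bump, y);
    pthread_join(p, NULL);
    pthread_join(q, NULL);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) * 1e-9;
}

int main(void) {
    printf("shared line: %.2fs\n", timed_pair(&same.a, &same.b));
    printf("own lines:   %.2fs\n", timed_pair(&own[0].v, &own[1].v));
    return 0;
}
```

The padded version typically runs several times faster: each core keeps its line in its own cache instead of trading it back and forth, which is the cost a shared last-level cache (and careful data layout) helps avoid.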