
Breaking the Memory Wall!

Tuesday November 5, 2002
Hamerschlag Hall D-210
4:00 p.m.

Babak Falsafi

Carnegie Mellon University

Increasing processor clock speeds along with microarchitectural innovation have led to a tremendous gap between processor and memory performance. While custom streaming/vector architectures have made substantial progress in bridging this gap for specialized applications (e.g., graphics), far less progress has been made in alleviating the bottleneck in general-purpose desktop/server systems. General-purpose computer system designers have primarily relied on cache memory hierarchies, in which each successive cache level trades off lookup speed for capacity, to reduce the performance gap. Unfortunately, the effectiveness of cache hierarchies is reaching a point of diminishing returns, especially in applications with adverse memory access patterns and large memory footprints -- e.g., commercial server workloads. The gap is further exacerbated in multiprocessor servers, where sharing data may require traversing multiple cache hierarchies and thousands of processor clock cycles.

In this talk, I will first describe the memory bottleneck in modern desktop/server systems. I will then propose the PUMA (Proactively Uniform Memory Access) architecture we are developing at CMU, in which the memory system relies on hardware prediction/speculation to hide or tolerate latency. PUMA enhances the programmability of modern systems with deep cache hierarchies by presenting to software a memory system that appears flat, with uniform access latency. PUMA helps bridge the processor/memory performance gap by hiding latency whenever memory access patterns are repetitive, even if arbitrarily irregular. I will present preliminary results from software simulation indicating PUMA's potential.
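To make the idea of hiding latency through prediction concrete, here is a minimal sketch in Python of one classic prediction mechanism, a per-instruction stride predictor. This is purely illustrative; the abstract does not describe PUMA's actual hardware, and the class and its interface below are hypothetical. The point is only that repetitive access patterns let hardware anticipate (and thus prefetch) future addresses before the processor asks for them.

```python
class StridePredictor:
    """Toy per-instruction stride predictor (hypothetical illustration,
    not PUMA's actual mechanism). Tracks the last address and stride seen
    by each memory instruction (keyed by its PC) and, once the same stride
    repeats, predicts the next address so it could be prefetched early."""

    def __init__(self):
        self.last_addr = {}  # pc -> last address accessed by this instruction
        self.stride = {}     # pc -> last observed stride

    def observe(self, pc, addr):
        """Record an access; return the predicted next address, or None."""
        pred = None
        if pc in self.last_addr:
            new_stride = addr - self.last_addr[pc]
            if self.stride.get(pc) == new_stride:
                # Stride confirmed twice in a row: confident enough
                # to predict (and, in hardware, prefetch) the next line.
                pred = addr + new_stride
            self.stride[pc] = new_stride
        self.last_addr[pc] = addr
        return pred


# A regular 64-byte stride stream: after warm-up, every access is predicted.
p = StridePredictor()
hits, pred = 0, None
for addr in range(0, 640, 64):  # 10 accesses by one load instruction
    if pred == addr:
        hits += 1
    pred = p.observe(pc=0x400, addr=addr)
print(hits)  # prints 7: all accesses after the two-access warm-up
```

In this toy run the predictor needs two observations to learn the stride, then correctly anticipates every subsequent access; real designs generalize this idea to correlated and irregular-but-repetitive patterns.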

Babak Falsafi joined the Electrical and Computer Engineering Department at CMU as an Assistant Professor in January 2001. Prior to joining CMU, he was an Assistant Professor in the School of Electrical and Computer Engineering at Purdue University. His research interests include prediction and speculation in high-performance memory systems, power-aware processor and memory architectures, single-chip multi-processor/multi-threaded architectures, and analytic and simulation tools for computer system performance evaluation. He has made several contributions to the design of distributed shared-memory multiprocessors and memory systems, including a recent result indicating that hardware speculation can bridge the performance gap among memory consistency models, and an adaptive and scalable caching architecture, Reactive NUMA, that lays the foundation for a family of multiprocessors built by Sun Microsystems code-named WildFire. He received an NSF CAREER award in 2000 and an IBM Faculty Partnership Award in 2001.


