the Memory Wall!
Tuesday November 5, 2002
Hamerschlag Hall D-210
Carnegie Mellon University

Increasing processor clock speeds and microarchitectural innovations have
led to a tremendous gap between processor and memory performance.
While considerable progress has been made in bridging this performance
gap in specialized applications (e.g., graphics) using custom streaming/vector
architectures, less progress has been made in alleviating this bottleneck
in general-purpose desktop/server systems. General-purpose computer system
designers have primarily relied on cache memory hierarchies, where each
cache level trades off faster lookup speed for larger capacity, to reduce
the performance gap. Unfortunately, the effectiveness of cache hierarchies
is reaching a point of diminishing returns, especially in applications
with adverse memory access patterns and large memory footprints -- e.g.,
commercial server workloads. The gap is further exacerbated
in multiprocessor servers, where sharing data may require traversing multiple
cache hierarchies at a cost of thousands of processor clock cycles. In this talk,
I will first describe the memory bottleneck in modern desktop/server systems.
I will then propose the PUMA (Proactively Uniform Memory Access) architecture
we are developing at CMU, in which the memory system relies on prediction/speculation
in hardware to hide or tolerate latency. PUMA enhances programmability
of modern systems with deep cache hierarchies by presenting to software
a memory system that appears to be flat with a uniform access latency.
PUMA helps bridge the processor/memory performance gap by hiding latency
when memory access patterns are repetitive, albeit arbitrarily irregular.
I will conclude by presenting preliminary results from our software simulations.

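To give a flavor of the kind of hardware prediction the abstract alludes to, here is a minimal sketch of an address-correlation predictor; the class name and the miss-stream trace are illustrative assumptions, not PUMA's actual mechanism. The idea: record, for each cache-miss address, the address that missed next, so that on a repeat of the same (arbitrarily irregular) traversal the next block can be prefetched before the processor asks for it.

```python
# Illustrative sketch only -- not the PUMA design itself.  A correlation
# table maps each miss address to the address that missed right after it.
class CorrelationPredictor:
    def __init__(self):
        self.table = {}    # last miss address -> next miss address
        self.last = None   # most recent miss seen

    def observe(self, addr):
        """Record a miss and return the predicted next miss (or None)."""
        if self.last is not None:
            self.table[self.last] = addr   # learn the correlation
        self.last = addr
        return self.table.get(addr)        # predict from history

# A pointer-chasing style miss stream: irregular addresses, but the
# traversal order repeats (hypothetical values for illustration).
trace = [0x1A0, 0x7F8, 0x230, 0x9C4]
p = CorrelationPredictor()

first_pass = [p.observe(a) for a in trace]   # no history yet: all None
second_pass = [p.observe(a) for a in trace]  # every prediction correct
```

On the first traversal the table is empty, so nothing can be predicted; on the second, each miss is predicted one step ahead, which is exactly the window a hardware prefetcher needs to hide the latency of the next access.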
Babak Falsafi joined the Electrical and Computer Engineering Department
at CMU as an Assistant Professor in January 2001. Prior to joining CMU,
he held a position as an Assistant Professor in the School of Electrical
and Computer Engineering at Purdue University. His research interests
include prediction and speculation in high-performance memory systems,
power-aware processor and memory architectures, single-chip multi-processor/multi-threaded
architectures, and analytic and simulation tools for computer system performance
evaluation. He has made several contributions in the design of distributed
shared-memory multiprocessors and memory systems, including a recent result
indicating that hardware speculation can bridge the performance gap among
memory consistency models, and an adaptive and scalable caching architecture,
Reactive NUMA, that lays the foundation for a family of multiprocessors
built by Sun Microsystems code-named WildFire. He is a recipient of an
NSF CAREER award in 2000 and an IBM Faculty Partnership Award in 2001.
You may contact him at email@example.com.