Goals

While memory capacity in recent years has increased commensurately with processor speeds, memory speeds have lagged behind. This project proposes to bridge the processor/memory performance gap using a novel memory system that: (1) "proactively" moves and places data among the hierarchy levels in anticipation of a processor read, to hide the long read latency, and (2) overlaps processor write latency with other operations to keep the write latency off the critical path of execution. The key mechanisms enabling proactive memory are (a) memory access predictors that monitor program execution, capture processor read access patterns, and accurately predict subsequent reads, and (b) efficient techniques for processor state checkpointing and recovery that allow relaxing the memory order and overlapping the write latency.

Click here for the PUMA2 Talk @ UW, Seattle on 6/6/2002

Current contributions:

Relaxing Memory Order Using Transactional Execution

This work proposes wait-free implementations of memory consistency models, in which efficient hardware for checkpointing and recovery allows the memory order to be relaxed dynamically. This obviates the need to wait for store acknowledgements or memory fence/barrier instructions in the common case where there are no races among processors to enter a critical section. A wait-free sequentially-consistent system can improve performance over a conventional release-consistent system.
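The checkpoint-and-recover idea can be illustrated with a small software model. This is only a sketch under stated assumptions: the class `SpeculativeCore`, its method names, and the rule that a remote write to a speculatively read address triggers recovery are all hypothetical choices made for illustration, not the hardware design proposed in this work.

```python
class SpeculativeCore:
    """Toy model of one processor that speculates past a pending store.

    Assumption: a race is detected when another processor writes an
    address this core has read speculatively. Real hardware would track
    this through the coherence protocol.
    """

    def __init__(self, registers):
        self.registers = dict(registers)
        self.checkpoint = None          # saved register state, or None
        self.speculative_reads = set()  # addresses read since the checkpoint

    def begin_speculation(self):
        # Take a checkpoint instead of stalling for the store acknowledgement.
        self.checkpoint = dict(self.registers)
        self.speculative_reads.clear()

    def speculative_load(self, reg, addr, memory):
        self.speculative_reads.add(addr)
        self.registers[reg] = memory[addr]

    def on_remote_write(self, addr):
        """Return True if the write raced with speculation and forced a rollback."""
        if self.checkpoint is not None and addr in self.speculative_reads:
            self.registers = dict(self.checkpoint)  # recover the checkpoint
            self.checkpoint = None
            self.speculative_reads.clear()
            return True
        return False

    def commit(self):
        # Pending stores completed with no race: discard the checkpoint.
        self.checkpoint = None
        self.speculative_reads.clear()
```

In the common race-free case only `begin_speculation` and `commit` run, so the core never waits; the rollback path is taken only when a conflicting remote write actually arrives.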

Dead-Block Predictors & Dead-Block Correlating Prefetchers

This work proposes instruction-trace-based predictors that track repetitive instruction sequences from a cache fill to an eviction in order to: (1) predict the eviction early and replace the current block, and (2) subsequently fetch a to-be-referenced block. These predictors replace/fetch data orders of magnitude earlier, in terms of latency, than the processor reference, hiding virtually all of the memory access latency.
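A minimal software sketch of a trace-based dead-block predictor follows. The signature encoding (a truncated sum of the program counters that touch a block between fill and eviction), the class `DeadBlockPredictor`, and its table organization are assumptions made for illustration and are not taken from this page.

```python
class DeadBlockPredictor:
    """Toy trace-based dead-block predictor (illustrative only)."""

    def __init__(self, signature_bits=16):
        self.mask = (1 << signature_bits) - 1
        self.live_trace = {}           # block -> running signature since fill
        self.dead_signatures = set()   # (block, signature) pairs seen at eviction

    def on_fill(self, block):
        # A new generation of the block starts with an empty trace.
        self.live_trace[block] = 0

    def on_access(self, block, pc):
        """Extend the trace with pc; return True if this looks like the last touch."""
        signature = (self.live_trace.get(block, 0) + pc) & self.mask
        self.live_trace[block] = signature
        return (block, signature) in self.dead_signatures

    def on_evict(self, block):
        # Learn the signature that preceded this eviction.
        signature = self.live_trace.pop(block, None)
        if signature is not None:
            self.dead_signatures.add((block, signature))
```

Once a block has been evicted after a given instruction sequence, the same sequence in a later generation makes `on_access` return `True` at the final access. At that point a cache could replace the block early and start fetching its predicted successor.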

Self-Invalidation Using Last-Touch Prediction

This work proposes instruction-trace-based predictors that track repetitive instruction sequences in a shared-memory multiprocessor to predict and self-invalidate a shared cache block early. Early self-invalidation improves performance by turning a three-hop coherence protocol transaction between a producer and a consumer into a two-hop transaction.
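The hop-count benefit can be shown with a toy directory model. This sketch assumes a home-based directory protocol in which a block held dirty by a producer must be forwarded from the producer's cache; the dictionary-based directory and the function names `read_miss_hops` and `self_invalidate` are hypothetical and only count messages on the critical path.

```python
def read_miss_hops(directory, block):
    """Count critical-path hops for a consumer's read miss on `block`.

    `directory` maps a block to the node holding it dirty; a missing
    entry means the home node's memory has the valid copy.
    """
    owner = directory.get(block)
    if owner is None:
        # consumer -> home -> consumer
        return 2
    # consumer -> home -> producer (owner) -> consumer
    return 3


def self_invalidate(directory, block):
    """Producer writes the block back to home after its predicted last touch."""
    directory.pop(block, None)
```

If the producer self-invalidates before the consumer's request arrives, the home node can answer directly, so the consumer's miss takes two hops instead of three.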

Memory Sharing Predictors

This work proposes sharing-signature-based predictors that track repetitive memory sharing patterns in a shared-memory multiprocessor.
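One way to picture a sharing predictor is as a pattern table keyed by a block's recent sharing history. The sketch below is an assumption-laden illustration: the signature is taken to be the last `depth` reader sets of a block, and the class `SharingPredictor` with its `train` and `predict` methods is hypothetical rather than the design proposed in this work.

```python
from collections import defaultdict, deque


class SharingPredictor:
    """Toy predictor that learns which readers follow a sharing history."""

    def __init__(self, depth=2):
        # block -> the most recent `depth` reader sets (the sharing signature)
        self.history = defaultdict(lambda: deque(maxlen=depth))
        # (block, signature) -> reader set that followed that signature
        self.patterns = {}

    def predict(self, block):
        """Return the predicted next reader set, or None if the pattern is unseen."""
        key = (block, tuple(self.history[block]))
        return self.patterns.get(key)

    def train(self, block, readers):
        """Record that `readers` read `block` after the current signature."""
        readers = frozenset(readers)
        key = (block, tuple(self.history[block]))
        self.patterns[key] = readers
        self.history[block].append(readers)
```

When the same sequence of reader sets repeats, `predict` returns the set that followed it last time. A protocol could use that prediction to forward data to the expected consumers ahead of their requests.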