Growing on-chip cache sizes and the widespread use of shared caches in chip multiprocessors (CMPs) have revived cache management as a hot research topic in both industry and academia. This talk addresses on-chip cache performance by casting the cache management problem in a novel framework called Re-Reference Interval Prediction (RRIP). The first part of the talk targets the performance of the last-level cache (LLC). We propose RRIP to address the drawbacks of the commonly used LRU replacement policy. LRU performs poorly when an application's working set is larger than the available cache or when an application issues frequent bursts of references to non-temporal data (called scans). To improve the performance of such applications, we propose Static RRIP (SRRIP) and Dynamic RRIP (DRRIP). We show that SRRIP and DRRIP require no changes to the existing cache structure, add negligible hardware overhead, and integrate easily into the cache designs of modern high-performance processors.
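A small simulation shows why scans hurt LRU and how re-reference prediction resists them. The sketch below assumes the hit-priority SRRIP formulation as it is commonly described: each line carries a 2-bit re-reference prediction value (RRPV); a hit resets it to 0, a miss inserts the new line at 2 ("long"), the victim is a line at 3 ("distant"), and if no line is at 3 every line is aged and the search repeats. It models one 4-way set, and the trace (three hot blocks reused every round, separated by a two-block scan) is invented for illustration. The names `LRUSet`, `SRRIPSet`, `mixed_trace`, and `hits` are hypothetical. DRRIP's dynamic choice between SRRIP and a bimodal variant is not shown.

```python
from collections import OrderedDict
from typing import List, Optional


class LRUSet:
    """One cache set with true LRU replacement (the baseline)."""

    def __init__(self, ways: int) -> None:
        self.ways = ways
        self.lines = OrderedDict()  # tag -> None, least recently used first

    def access(self, tag: int) -> bool:
        if tag in self.lines:
            self.lines.move_to_end(tag)  # hit: becomes most recently used
            return True
        if len(self.lines) == self.ways:
            self.lines.popitem(last=False)  # evict the least recently used line
        self.lines[tag] = None
        return False


class SRRIPSet:
    """One cache set with hit-priority SRRIP (assumed formulation, M-bit RRPV per line)."""

    def __init__(self, ways: int, rrpv_bits: int = 2) -> None:
        self.max_rrpv = (1 << rrpv_bits) - 1  # "distant" re-reference prediction
        self.tags: List[Optional[int]] = [None] * ways
        self.rrpv = [self.max_rrpv] * ways  # empty ways start as ideal victims

    def _find_victim(self) -> int:
        while True:
            for way, value in enumerate(self.rrpv):
                if value == self.max_rrpv:
                    return way
            # No line is predicted "distant": age every line and search again.
            self.rrpv = [value + 1 for value in self.rrpv]

    def access(self, tag: int) -> bool:
        if tag in self.tags:
            self.rrpv[self.tags.index(tag)] = 0  # hit: predict near-immediate re-reference
            return True
        way = self._find_victim()
        self.tags[way] = tag
        self.rrpv[way] = self.max_rrpv - 1  # miss: insert as "long", not "distant"
        return False


def mixed_trace(rounds: int, scan_len: int) -> List[int]:
    """Hot blocks 0-2 are touched twice per round, then a burst of never-reused blocks."""
    trace, next_scan = [], 1000
    for _ in range(rounds):
        trace += [0, 1, 2, 0, 1, 2]
        for _ in range(scan_len):
            trace.append(next_scan)
            next_scan += 1
    return trace


def hits(cache, trace: List[int]) -> int:
    return sum(cache.access(tag) for tag in trace)


trace = mixed_trace(rounds=10, scan_len=2)
print("LRU hits:  ", hits(LRUSet(4), trace))    # 30 of 80
print("SRRIP hits:", hits(SRRIPSet(4), trace))  # 57 of 80
```

On this trace LRU loses the hot blocks to every scan burst, while SRRIP keeps them because scan lines enter with a "long" prediction and become victims before the recently hit hot lines. The trace is chosen to make the effect visible; longer scans can still age hot lines out of a 4-way set.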
The next part of the talk goes beyond the LLC to the performance of the multi-level cache hierarchy as a whole. In particular, we focus on inclusive cache hierarchies. Microprocessors commonly use inclusive caches to simplify cache coherence, but the trade-off has been lower performance than non-inclusive and exclusive caches. Contrary to conventional wisdom, we show that the limited performance of inclusive caches stems from inclusion victims (lines evicted from the core caches to satisfy the inclusion property), not from the reduced capacity of the hierarchy caused by duplicating data. These inclusion victims are chosen for replacement incorrectly because the LLC is unaware of the temporal locality of lines in the core caches. We propose Temporal Locality Aware (TLA) cache management policies that make an inclusive LLC aware of the temporal locality of lines in the core caches. We show that, at no additional hardware cost, TLA cache management substantially narrows the performance gap between inclusive and non-inclusive caches.
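The inclusion-victim effect can be reproduced in a toy model. The sketch below is a hypothetical illustration, not the talk's TLA design: it assumes one fully associative L1 and one LLC set, both LRU, and compares plain LRU victim selection in the LLC with a query-based variant in which the LLC skips (and promotes) any candidate still resident in the core cache. The class `InclusiveHierarchy`, the `query_core` switch, and `hot_misses` are invented names for this example.

```python
from collections import OrderedDict


class InclusiveHierarchy:
    """Toy inclusive hierarchy: one fully associative L1 and one LLC set, both LRU."""

    def __init__(self, l1_ways: int, llc_ways: int, query_core: bool) -> None:
        self.l1 = OrderedDict()   # least recently used first
        self.llc = OrderedDict()  # least recently used first
        self.l1_ways, self.llc_ways = l1_ways, llc_ways
        self.query_core = query_core  # hypothetical: check the L1 before evicting from the LLC

    def _llc_victim(self) -> int:
        for tag in list(self.llc):  # walk candidates from LRU toward MRU
            if not (self.query_core and tag in self.l1):
                return tag
            self.llc.move_to_end(tag)  # still live in the core cache: promote, don't evict
        return next(iter(self.llc))  # every line is core-resident: fall back to LRU

    def access(self, tag: int) -> str:
        if tag in self.l1:
            self.l1.move_to_end(tag)
            return "L1"  # the LLC never observes this hit
        if tag in self.llc:
            self.llc.move_to_end(tag)
            where = "LLC"
        else:
            if len(self.llc) == self.llc_ways:
                victim = self._llc_victim()
                del self.llc[victim]
                self.l1.pop(victim, None)  # back-invalidate to keep inclusion
            self.llc[tag] = None
            where = "MEM"
        if len(self.l1) == self.l1_ways:
            self.l1.popitem(last=False)  # L1 eviction; the line stays in the LLC
        self.l1[tag] = None
        return where


def hot_misses(query_core: bool, rounds: int = 12) -> int:
    """Count L1 misses of hot block 0 when it is interleaved with one-off streaming blocks."""
    h = InclusiveHierarchy(l1_ways=2, llc_ways=4, query_core=query_core)
    misses = 0
    for i in range(rounds):
        if h.access(0) != "L1":
            misses += 1
        h.access(100 + i)  # streaming block, never reused
    return misses


print("LRU LLC, hot-block misses:        ", hot_misses(query_core=False))  # 3
print("query-based LLC, hot-block misses:", hot_misses(query_core=True))   # 1
```

Because block 0 keeps hitting in the L1, the LLC sees only the streaming misses and ages block 0 to its LRU position; with plain LRU it becomes an inclusion victim every fourth round. Consulting the core cache before evicting keeps it resident, so only the cold miss remains. The model ignores the cost of querying the core caches and any limit on how many candidates may be queried.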
Joel Emer is an Intel Fellow and Director of Microarchitecture Research at Intel in Hudson, Massachusetts. Previously, he worked at Compaq and Digital Equipment Corporation, where he held various research and advanced development positions investigating the microarchitecture of a variety of VAX and Alpha processors and developing performance modeling and evaluation techniques. His research included pioneering efforts in simultaneous multithreading, analysis of the architectural impact of soft errors, and early contributions to the now-pervasive quantitative approach to processor evaluation. His current research interests include memory hierarchy design, processor reliability, reconfigurable logic-based computation, and performance modeling. In his spare time, he serves as a part-time professor at MIT.
Emer holds over 25 patents and has published more than 35 papers. He received a bachelor's degree with highest honors in electrical engineering in 1974 and a master's degree in 1975, both from Purdue University. He earned a doctorate in electrical engineering from the University of Illinois in 1979. He is a Fellow of both the ACM and the IEEE and was the 2009 recipient of the Eckert-Mauchly Award for lifetime contributions to computer architecture.