
Redesigning Database Hash Join Algorithms in Light of CPU Cache Prefetching

Tuesday February 14, 2006
Hamerschlag Hall D-210
4:00 pm

Shimin Chen
Intel Research Pittsburgh

The latency gap between CPU caches and main memory has become a major performance problem in database systems. Recent studies have demonstrated that on commercial databases, 50% or more of in-memory execution time is often wasted on cache misses. In light of this problem, a number of recent studies have focused on reducing the number of cache misses incurred by database algorithms. In this talk, I investigate a different approach: reducing the impact of cache misses through a technique called CPU cache prefetching.

Specifically, I will focus on one of the core database algorithms, the hash join algorithm, and re-examine its design in light of CPU cache prefetching. A major challenge here is prefetching for hash table accesses, which are random in nature. Our solutions, Group Prefetching and Software-Pipelined Prefetching, exploit inter-tuple parallelism to overlap cache misses across the processing of multiple tuples. Moreover, we exploit the two-pass structure of the hash join algorithm, using what is learned in the earlier pass to further improve the performance of the later pass. Our experimental results demonstrate that these prefetching-enabled techniques deliver dramatic performance gains over state-of-the-art cache-friendly hash join algorithms.

Shimin Chen received his PhD in Computer Science from Carnegie Mellon University in December 2005, under the supervision of Prof. Todd C. Mowry and Prof. Anastassia Ailamaki, and joined Intel Research Pittsburgh after graduation. His research interests are database systems, computer architecture, and software systems. He received his master's degree in 1999 and his bachelor's degree in 1997, both from Tsinghua University in China.


Department of Electrical and Computer Engineering · Carnegie Mellon University · School of Computer Science