
Big Wins With Small Application-aware Software-managed Caches

Tuesday, March 7, 2006
Hamerschlag Hall D-210
4:30 pm

Julio Lopez
Carnegie Mellon University

Large datasets, on the order of gigabytes to petabytes, are increasingly common as abundant computational resources allow practitioners to collect, produce and store data at higher rates. As dataset sizes grow, it becomes more challenging to interactively manipulate and analyze these datasets due to the large amounts of data that need to be moved and processed.

Application-independent software-managed caches, such as operating system page caches and database buffers, are present throughout the memory hierarchy to reduce data access times and alleviate transfer overheads.

We claim that an application-aware cache with relatively modest memory requirements can effectively exploit dataset structure and application information to speed access to large datasets. We demonstrate this idea with a system called the tree cache, which reduces query latency by up to an order of magnitude for accesses to large octree datasets used in the earth sciences.
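To make the contrast with application-independent caches concrete, here is a minimal sketch of what "exploiting dataset structure" can mean for octree point queries. It assumes an adaptive octree over the unit cube whose leaves are identified by a hypothetical key (level, ix, iy, iz). Because the cache knows the tree's geometry, it can compute which cached leaf would contain a query point and answer without consulting an index. The class TreeAwareCache, the cell_key scheme, and the fetch_leaf callback are illustrative assumptions, not the tree cache's actual design.

```python
from collections import OrderedDict
from typing import Callable, Tuple

# Hypothetical node key: (level, ix, iy, iz), where (ix, iy, iz) are the integer
# coordinates of the node's cell on a 2**level grid over the unit cube [0, 1)^3.
Key = Tuple[int, int, int, int]
Point = Tuple[float, float, float]


def cell_key(point: Point, level: int) -> Key:
    """Return the key of the octree cell at `level` that contains `point`."""
    n = 1 << level  # cells per axis at this level
    return (level,) + tuple(min(int(c * n), n - 1) for c in point)


class TreeAwareCache:
    """Small LRU cache of octree leaves that uses the tree's geometry to answer
    point queries (an illustrative sketch, not the tree cache itself)."""

    def __init__(self, capacity: int, max_level: int,
                 fetch_leaf: Callable[[Point], Tuple[Key, object]]):
        self.capacity = capacity
        self.max_level = max_level
        self.fetch_leaf = fetch_leaf  # slow path: reads the leaf from the dataset
        self.leaves: "OrderedDict[Key, object]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def query(self, point: Point):
        # Leaves of an adaptive octree sit at different depths, so probe every
        # level; since leaves are disjoint, at most one cached key can match.
        for level in range(self.max_level, -1, -1):
            key = cell_key(point, level)
            if key in self.leaves:
                self.leaves.move_to_end(key)  # mark as most recently used
                self.hits += 1
                return self.leaves[key]
        self.misses += 1
        key, value = self.fetch_leaf(point)
        self.leaves[key] = value
        if len(self.leaves) > self.capacity:
            self.leaves.popitem(last=False)  # evict the least recently used leaf
        return value


def fetch_leaf(point: Point) -> Tuple[Key, object]:
    # Hypothetical dataset: the octant nearest the origin is refined to level 2;
    # every other octant is a level-1 leaf. The stored value is just the key.
    level = 2 if all(c < 0.5 for c in point) else 1
    key = cell_key(point, level)
    return key, key


cache = TreeAwareCache(capacity=4, max_level=2, fetch_leaf=fetch_leaf)
cache.query((0.9, 0.9, 0.9))  # miss: loads level-1 leaf (1, 1, 1, 1)
cache.query((0.6, 0.7, 0.8))  # hit: a different point in the same leaf
cache.query((0.1, 0.1, 0.1))  # miss: loads level-2 leaf (2, 0, 0, 0)
print(cache.hits, cache.misses)  # prints "1 2"
```

The second query is the interesting one: a cache keyed only by disk page or block would need the index to learn which page holds (0.6, 0.7, 0.8), whereas this cache derives the answer from the point's coordinates. A real system would also have to weigh which leaves to keep, which this sketch leaves to plain LRU.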

Julio is a Ph.D. candidate in the Department of Electrical and Computer Engineering at Carnegie Mellon. He is a member of the Computing Media and Communications Laboratory (CMCL) and the Parallel Data Laboratory (PDL).

His research interests lie at the intersection of Scientific Computing, Databases and Storage Systems. His thesis work focuses on new, efficient query capabilities for large scientific datasets based on compact data representations and layouts. This is joint work with his adviser, Prof. David O'Hallaron, as part of the CMU Quake project.

