Big Wins With Small Application-aware Software-managed Caches
Tuesday March 7, 2006
Hamerschlag Hall D-210
4:30 pm
Julio Lopez
Carnegie Mellon University
Large datasets, on the order of gigabytes to petabytes, are increasingly
common as abundant computational resources allow practitioners to
collect, produce and store data at higher rates. As dataset sizes grow,
it becomes more challenging to interactively manipulate and analyze
these datasets due to the large amounts of data that need to be moved
and processed.
Application-independent software-managed caches, such as operating
system page caches and database buffers, are present throughout the
memory hierarchy to reduce data access times and alleviate transfer
overheads.
We claim that an application-aware cache with relatively modest memory
requirements can effectively exploit dataset structure and application
information to speed access to large datasets. We demonstrate this idea
with a system named the tree cache, which reduces query latency by up to
an order of magnitude for accesses to large octree datasets used in the
earth sciences.
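
The general idea of exploiting dataset structure can be illustrated with
a small sketch. It is not the tree cache's actual design, which this
abstract does not describe. It assumes a hypothetical octree whose leaf
octants are identified by (level, x, y, z), integer point coordinates at
the finest resolution, and a hypothetical slow lookup function
fetch_leaf. The class name OctantCache and all parameters are
illustrative only. Because the cache knows that one leaf octant covers a
whole cube of points, a single cached entry can answer many distinct
point queries, which a structure-unaware page or block cache cannot do.

```python
from collections import OrderedDict


class OctantCache:
    """Illustrative LRU cache of octree leaf octants (hypothetical design).

    Keys are (level, x, y, z), with x, y, z expressed in units of the
    octant size at that level. Points use integer coordinates at the
    finest level, max_level.
    """

    def __init__(self, capacity, fetch_leaf, max_level):
        self.capacity = capacity
        # Assumed slow path: returns (level, value) for the leaf octant
        # that contains the point, e.g. by reading from disk.
        self.fetch_leaf = fetch_leaf
        self.max_level = max_level
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def _key(self, level, x, y, z):
        # Coarser levels drop low-order coordinate bits.
        shift = self.max_level - level
        return (level, x >> shift, y >> shift, z >> shift)

    def query(self, x, y, z):
        # Application knowledge: the leaf containing the point may sit at
        # any level, so probe the cache from finest to coarsest.
        for level in range(self.max_level, -1, -1):
            key = self._key(level, x, y, z)
            if key in self.entries:
                self.entries.move_to_end(key)  # mark as recently used
                self.hits += 1
                return self.entries[key]

        # Miss: take the slow path and remember the whole octant.
        self.misses += 1
        level, value = self.fetch_leaf(x, y, z)
        self.entries[self._key(level, x, y, z)] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used
        return value
```

In this sketch, a query for any point inside an already cached octant is
served from memory, so the cache can stay small relative to the dataset
as long as queries cluster spatially.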
Julio is a Ph.D. candidate in the Department of Electrical and Computer
Engineering at Carnegie Mellon. He is a member of the Computing Media
and Communications Laboratory (CMCL) and the Parallel Data Laboratory
(PDL).
His research interests lie at the intersection of Scientific Computing,
Databases and Storage Systems. His thesis work focuses on new, efficient
query capabilities for large scientific datasets using compact data
representations and layouts. This is joint work with his adviser, Prof.
David O'Hallaron, in the context of the CMU Quake project.