Preserving disk locality for multidimensional datasets
Tuesday April 10, 2007
Hamerschlag Hall D-210
Carnegie Mellon University
MultiMap is an algorithm for mapping multidimensional datasets so as to preserve the
data's spatial locality on disks. Without revealing disk-specific details to
applications, MultiMap exploits modern disk characteristics to provide full streaming
bandwidth for one (primary) dimension and maximally efficient non-sequential access
(i.e., minimal seek and no rotational latency) for the other dimensions.
This is in contrast to existing approaches, which either severely penalize
non-primary dimensions or fail to provide full streaming bandwidth for any dimension.
Experimental evaluation of a prototype implementation demonstrates MultiMap’s
superior performance for range and beam queries. On average, MultiMap reduces total
I/O time by over 50% when compared to traditional linearized layouts and by over 30%
when compared to space-filling curve approaches such as Z-ordering and Hilbert
curves. For scans of the primary dimension, MultiMap and traditional linearized
layouts provide almost two orders of magnitude higher throughput than space-filling
Minglong Shao is a Ph.D. candidate in the Computer Science Department at Carnegie Mellon
University, working with Prof. Anastassia Ailamaki. Her research interests focus on
new data organization and management for database systems.