Link to CALCM Home  

MultiMap: Preserving disk locality for multidimensional datasets

Tuesday April 10, 2007
Hamerschlag Hall D-210
4:30 pm



Minglong Shao
Carnegie Mellon University

MultiMap is an algorithm for mapping multidimensional datasets so as to preserve the data's spatial locality on disks. Without revealing disk-specific details to applications, MultiMap exploits modern disk characteristics to provide full streaming bandwidth for one (primary) dimension and maximally efficient non-sequential access (i.e., minimal seek and no rotational latency) for the other dimensions.

This is in contrast to existing approaches, which either severely penalize non-primary dimensions or fail to provide full streaming bandwidth for any dimension. Experimental evaluation of a prototype implementation demonstrates MultiMap’s superior performance for range and beam queries. On average, MultiMap reduces total I/O time by over 50% when compared to traditional linearized layouts and by over 30% when compared to space-filling curve approaches such as Z-ordering and Hilbert curves. For scans of the primary dimension, MultiMap and traditional linearized layouts provide almost two orders of magnitude higher throughput than space-filling curve approaches.


Minglong Shao is a Ph.D. candidate in the Computer Science Department at Carnegie Mellon University, working with Prof. Anastassia Ailamaki. Her research interests focus on new data organization and management for database systems.

 

Department of Electrical and Computer EngineeringCarnegie Mellon UniversitySchool of Computer Science