Loop Merging for Signal Transforms
Tuesday May 3, 2005
Hamerschlag Hall D-210
Carnegie Mellon University
A critical optimization in the domain of linear signal transforms,
such as the discrete Fourier transform (DFT), is loop merging, which
increases data locality and reuse and thus performance. In particular,
this includes the conversion of shuffle operations into array reindexings.
To date, loop merging is well understood only for the DFT, and only
for Cooley-Tukey FFT based algorithms, which excludes DFT sizes
divisible by large primes. In this talk, we present a formal loop
merging framework for general signal transforms and its implementation
within the SPIRAL code generator. The framework consists of Sigma-SPL,
a mathematical language to express loops and index mappings; a rewriting
system to merge loops in Sigma-SPL; and a compiler that translates
Sigma-SPL into code. We apply the framework to DFT sizes that cannot
be handled using only the Cooley-Tukey FFT and compare our method
to FFTW 3.0.1 and the vendor library Intel MKL 7.2.1. Compared to
FFTW our generated code is a factor of 2-4 faster under equal implementation
conditions (same algorithms, same unrolling threshold). Further,
we give a detailed comparison against the Intel vendor library MKL;
our generated code is between 2 times faster and 4.5 times slower.
Yevgen Voronenko is an ECE Ph.D. candidate at Carnegie Mellon,
where his research interests include code generation, domain-specific
compilation, computer algebra, rewriting systems and software engineering.
He holds a B.S. degree in Computer Science from Drexel University.
He is currently working on the SPIRAL project, that aims to automate
code generation and platform adaptation for linear signal transforms,
that include the discrete Fourier transform, discrete cosine transforms,
and many other transforms.