Formal Vectorization of Digital Signal Processing Transforms ------------------------------------------------------------ Recently major vendors of general purpose microprocessors have included short vector SIMD (single instruction, multiple data) extensions into their instruction set architecture. Examples of SIMD extensions include Intel's MMX and SSE family, AMD's 3DNow! family as well Motorola's AltiVec extension, and last but not least IBM's "Double FPU" floating-point unit for BlueGene/L machines. SIMD extensions have the potential to speed up implementations in areas where the relevant algorithms exhibit fine grain parallelism but are a major challenge to software developers. In this talk we use a mathematical approach to generate high performance short vector SIMD code for digital signal processing (DSP) transforms such as the fast Fourier transform (FFT) and multidimensional discrete trigonometric transforms (DCT and DST). We extend Spiral (www.spiral.net) to support short vector extensions and thus enable automatic performance tuning for SIMD extensions. Spiral represents and generates fast DSP transform algorithms as mathematical formulas, and translates them into code. Adaptation is achieved by searching in the space of algorithmic and coding alternatives for the fastest implementation on the given platform. We explain the mathematical foundation that relates formula constructs to vector code, and overview the vector code extension of Spiral. Experimental results show excellent speed-ups compared to ordinary C code for a variety of transforms and computing platforms. For the DFT on Pentium 4, our automatically generated code compares to the hand-tuned Intel MKL vendor library.