Scheduling and Register Allocation for Inner Kernels
Tuesday September 19, 2006
Hamerschlag Hall D-210
Carnegie Mellon University
Intel performance libraries IPP and MKL contain a large number of highly optimized
functions for numeric computations. To attain peak performance, these functions rely on
handwritten assembly code in all inner loops (inner kernels). This approach is very
costly, since each inner kernel has to be tuned for every new processor architecture.
The goal of our work is to automate part of the developer effort in optimizing the
inner kernels. The main idea is to convert the problem of optimal scheduling across
multiple functional units into an integer linear programming (ILP) problem, which can be
solved by a commercial off-the-shelf ILP solver.
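For orientation, a textbook time-indexed ILP for scheduling a loop body might look as
follows. This is a generic sketch and not necessarily the formulation presented in the
talk. All symbols are assumptions introduced here: I is the set of instructions, E the set
of dependences (i, j), T the number of candidate cycles, I_u the instructions that execute
on functional unit type u, c_u the number of instructions that unit type can issue per
cycle, l_i the latency of instruction i, and x_{i,t} a binary variable that is 1 when
instruction i issues in cycle t.

```latex
\begin{align*}
\text{minimize}\quad & C \\
\text{subject to}\quad
& \sum_{t=0}^{T-1} x_{i,t} = 1 && \forall i \in I
  && \text{(each instruction issues exactly once)} \\
& \sum_{i \in I_u} x_{i,t} \le c_u && \forall u,\; \forall t
  && \text{(functional unit capacity)} \\
& \sum_{t=0}^{T-1} t\,x_{j,t} \;\ge\; \sum_{t=0}^{T-1} t\,x_{i,t} + \ell_i
  && \forall (i,j) \in E && \text{(dependences and latencies)} \\
& C \;\ge\; \sum_{t=0}^{T-1} t\,x_{i,t} + \ell_i && \forall i \in I
  && \text{(schedule length)} \\
& x_{i,t} \in \{0,1\}
\end{align*}
```

Software pipelining and register constraints are deliberately left out of this sketch.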
We have extended a previously published ILP formulation of scheduling with software
pipelining, register allocation, and register coldness constraints. We have built an ILP
model generator, which creates an input for the ILP solver from the user-specified loop
body. The resulting ILP problem can be solved for the minimal number of cycles, minimal
number of registers, or minimal number of registers under a given number of cycles.
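To make the idea of a model generator concrete, the following minimal Python sketch
writes a cycle-minimizing scheduling model as text in the CPLEX LP format, which several
ILP solvers can read. It is an illustration under stated assumptions, not the generator
described in the talk: the function name generate_lp, the input layout, the variable
names (x, s, C), and the example loop body are all hypothetical, and software pipelining
and register constraints are omitted.

```python
def generate_lp(instrs, deps, units, horizon):
    """Emit a time-indexed scheduling ILP as LP-format text (illustrative sketch).

    instrs:  dict name -> (unit_type, latency)   (hypothetical input layout)
    deps:    list of (producer, consumer) pairs
    units:   dict unit_type -> instructions issued per cycle
    horizon: number of candidate issue cycles 0 .. horizon-1
    """
    def x(i, t):
        # Binary variable: instruction i issues in cycle t.
        return f"x_{i}_{t}"

    cycles = range(horizon)
    lines = ["Minimize", " obj: C", "Subject To"]

    for i, (_, latency) in instrs.items():
        # Each instruction issues in exactly one cycle.
        lines.append(f" issue_{i}: " + " + ".join(x(i, t) for t in cycles) + " = 1")
        # s_i is the issue cycle of i (the t = 0 term has coefficient 0 and is skipped).
        link = "".join(f" - {t} {x(i, t)}" for t in cycles if t)
        lines.append(f" start_{i}: s_{i}{link} = 0")
        # The schedule length C covers the completion of every instruction.
        lines.append(f" finish_{i}: C - s_{i} >= {latency}")

    for unit, capacity in units.items():
        users = [i for i in instrs if instrs[i][0] == unit]
        if not users:
            continue
        for t in cycles:
            # At most `capacity` instructions per cycle on this unit type.
            lines.append(f" cap_{unit}_{t}: "
                         + " + ".join(x(i, t) for i in users)
                         + f" <= {capacity}")

    for producer, consumer in deps:
        # The consumer may issue only after the producer's latency has elapsed.
        lines.append(f" dep_{producer}_{consumer}: "
                     f"s_{consumer} - s_{producer} >= {instrs[producer][1]}")

    lines.append("Binary")
    lines.extend(" " + x(i, t) for i in instrs for t in cycles)
    lines.append("General")
    lines.append(" C " + " ".join(f"s_{i}" for i in instrs))
    lines.append("End")
    return "\n".join(lines) + "\n"


# Hypothetical three-instruction loop body: load, multiply, store.
EXAMPLE_INSTRS = {"ld": ("mem", 3), "mul": ("fp", 4), "st": ("mem", 1)}
EXAMPLE_DEPS = [("ld", "mul"), ("mul", "st")]
EXAMPLE_UNITS = {"mem": 1, "fp": 1}

if __name__ == "__main__":
    print(generate_lp(EXAMPLE_INSTRS, EXAMPLE_DEPS, EXAMPLE_UNITS, horizon=8))
```

The printed text can be saved to a file and passed to an ILP solver. Minimizing the
number of registers instead would require additional variables and a different objective,
which this sketch does not attempt.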
Yevgen Voronenko is a graduate student at Carnegie Mellon University and the principal
engineer of the Spiral code generation system. His research interests include automatic
performance optimization, compiler design, and software architecture. He worked at Intel as a
research intern in the summers of 2005 and 2006.