
Optimal Scheduling and Register Allocation for Inner Kernels

Tuesday September 19, 2006
Hamerschlag Hall D-210
4:30 pm

Yevgen Voronenko
Carnegie Mellon University

The Intel performance libraries IPP and MKL contain a large number of highly optimized functions for numeric computations. To attain peak performance, these functions rely on handwritten assembly code in all inner loops (inner kernels). This approach is very costly, since each inner kernel has to be retuned for every new processor architecture.

The goal of our work is to automate part of the developer effort in optimizing the inner kernels. The main idea is to convert the problem of optimal scheduling across multiple functional units into an integer linear programming (ILP) problem, which can be solved by a commercial off-the-shelf ILP solver.
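
As a rough illustration of the idea (not the formulation used in this work), the sketch below states a tiny scheduling problem in the form an ILP would encode: one integer issue cycle per instruction, precedence constraints derived from latencies, and a per-cycle issue limit for each functional unit. The loop body, latencies, unit names, and horizon are all hypothetical, and an exhaustive search stands in for the ILP solver so that the example needs no external packages.

```python
from itertools import product

# Hypothetical straight-line kernel: op -> (functional unit, latency in cycles).
OPS = {
    "load_a": ("mem", 2),
    "load_b": ("mem", 2),
    "mul": ("alu", 3),
    "add": ("alu", 1),
}
# Data dependences as (producer, consumer) pairs.
DEPS = [("load_a", "mul"), ("load_b", "mul"), ("mul", "add")]
# Assumed issue slots per cycle for each (fully pipelined) functional unit.
UNITS = {"mem": 1, "alu": 1}
HORIZON = 8  # issue cycles 0 .. HORIZON-1 are considered


def feasible(start):
    """Check the constraints that an ILP would state as linear inequalities."""
    # Precedence: a consumer issues only after its producer's result is ready.
    for producer, consumer in DEPS:
        if start[consumer] < start[producer] + OPS[producer][1]:
            return False
    # Resources: no unit issues more ops in one cycle than it has slots.
    for cycle in range(HORIZON):
        for unit, slots in UNITS.items():
            used = sum(1 for op, s in start.items()
                       if s == cycle and OPS[op][0] == unit)
            if used > slots:
                return False
    return True


def shortest_schedule():
    """Brute-force stand-in for the solver: minimize the completion cycle."""
    names = list(OPS)
    best = None
    for cycles in product(range(HORIZON), repeat=len(names)):
        start = dict(zip(names, cycles))
        if not feasible(start):
            continue
        length = max(start[op] + OPS[op][1] for op in names)
        if best is None or length < best[0]:
            best = (length, start)
    return best


if __name__ == "__main__":
    length, start = shortest_schedule()
    print("schedule length:", length)
    for op in sorted(start, key=start.get):
        print(f"  cycle {start[op]}: {op}")
```

In an actual ILP model, each issue cycle would typically be encoded with 0/1 decision variables and handed to a solver; enumeration is only practical here because the example has four instructions.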

We have extended a previously published ILP formulation of scheduling with software pipelining, register allocation, and register coldness constraints. We have built an ILP model generator, which creates an input for the ILP solver from a user-specified loop body. The resulting ILP problem can be solved for the minimal number of cycles, the minimal number of registers, or the minimal number of registers under a given number of cycles.
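
The three objectives can be illustrated on a toy software-pipelined loop. The sketch below is only an assumption-laden stand-in for the model generator and solver described above: the loop body, latencies, and unit counts are hypothetical, "number of cycles" is interpreted as the initiation interval (cycles between the starts of consecutive iterations), a value is assumed to occupy a register from the cycle its result is ready through the cycle its last consumer issues, and exhaustive search replaces the ILP solver.

```python
from itertools import product

# Hypothetical loop body: op -> (functional unit, latency in cycles).
OPS = {
    "load_a": ("mem", 2),
    "load_b": ("mem", 2),
    "mul": ("alu", 3),
    "store": ("mem", 1),
}
# Data dependences within one iteration (no loop-carried dependences assumed).
DEPS = [("load_a", "mul"), ("load_b", "mul"), ("mul", "store")]
UNITS = {"mem": 1, "alu": 1}  # assumed issue slots per cycle
HORIZON = 10  # issue cycles 0 .. HORIZON-1 within one iteration
MAX_II = 6    # largest initiation interval considered


def schedules(ii):
    """Yield every feasible issue-cycle assignment for initiation interval ii."""
    names = list(OPS)
    for cycles in product(range(HORIZON), repeat=len(names)):
        start = dict(zip(names, cycles))
        # Precedence: consumers wait for the producer's latency.
        if any(start[c] < start[p] + OPS[p][1] for p, c in DEPS):
            continue
        # Modulo resource constraint: ops from overlapped iterations that
        # land in the same slot (cycle mod ii) compete for the same unit.
        ok = all(
            sum(1 for op in names
                if OPS[op][0] == unit and start[op] % ii == slot) <= cap
            for unit, cap in UNITS.items()
            for slot in range(ii)
        )
        if ok:
            yield start


def registers(start, ii):
    """Peak number of simultaneously live values in the steady state."""
    live = [0] * ii
    for producer in OPS:
        uses = [start[c] for p, c in DEPS if p == producer]
        if not uses:
            continue  # result is not kept in a register (e.g. a store)
        ready = start[producer] + OPS[producer][1]
        for cycle in range(ready, max(uses) + 1):
            live[cycle % ii] += 1
    return max(live)


def min_cycles():
    """Objective 1: smallest initiation interval with a feasible schedule."""
    for ii in range(1, MAX_II + 1):
        if next(schedules(ii), None) is not None:
            return ii
    return None


def min_registers(ii):
    """Objective 3: fewest registers for a given initiation interval."""
    return min((registers(s, ii) for s in schedules(ii)), default=None)


def min_registers_any():
    """Objective 2: fewest registers over all intervals up to MAX_II."""
    found = [(min_registers(ii), ii) for ii in range(1, MAX_II + 1)]
    return min((r, ii) for r, ii in found if r is not None)


if __name__ == "__main__":
    print("minimal cycles per iteration:", min_cycles())
    print("minimal registers at that rate:", min_registers(min_cycles()))
    print("minimal (registers, cycles) overall:", min_registers_any())
```

Under these assumptions the toy loop shows why the objectives differ: the fastest schedule needs more registers than a slightly slower one, so minimizing registers alone and minimizing registers at a fixed cycle count give different answers.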

Yevgen Voronenko is a graduate student at Carnegie Mellon University and the principal engineer of the Spiral code generation system. His research interests include automatic performance optimization, compiler design, and software architecture. He worked at Intel as a research intern in the summers of 2005 and 2006.

