CALCM - Computer Architecture Lab at Carnegie Mellon



HASTE: Hybrid Architectures with a Single Transformable Executable

Tuesday January 28, 2003
Hamerschlag Hall 1112
4:30 p.m.

Benjamin A. Levine
Carnegie Mellon University

Reconfigurable computing architectures can implement custom datapaths and other logic with performance close to that of a comparable ASIC, while retaining the ability to be quickly reprogrammed for new tasks, like a conventional CPU. In particular, reconfigurable fabrics, which are composed of a large number of identical tiled units, have many advantages in design reuse and efficiency, manufacturability, testing, and other areas that present particular challenges in current and future process technologies. However, reconfigurable fabrics are inefficient for applications with heavy branching or irregular data access patterns. This limitation has typically been addressed by coupling a general-purpose processor with a reconfigurable fabric. Such hybrid architectures have had some success, but they have also raised new problems. The interface between the processor and the fabric is crucial to performance and difficult to implement well. Partitioning the application between the processor and the fabric is also difficult, and is typically complicated by entirely different programming models, heterogeneous interfaces to external resources, and incompatible representations of applications. Arguably, these interface and partitioning problems, along with the requirement to produce and maintain two separate executables, have delayed the widespread acceptance of reconfigurable computing hardware.

A novel hybrid architecture and an associated representation of applications, called HASTE (Hybrid Architecture with a Single Transformable Executable), solves many of these difficulties. Using a technique we call "hardware compilation in hardware", a single executable can represent all parts of an application: both those that are control bound and better suited to a traditional processor, and those that have abundant parallelism and can run efficiently on a reconfigurable fabric. This executable can execute in its entirety on the processor, but for best performance a hardware compilation unit maps suitable portions of the application onto a reconfigurable fabric at run-time. All portions of the executable have valid sequential semantics, and thus a single programming model can be used for the entire application.

The application representation is key to this concept. It must be able to represent a sufficiently wide range of configurations to fully utilize the computing power of the fabric, while still having valid sequential semantics. The overhead of representing the spatial information needed for the fabric cannot be so large that it makes the sequential code slow and inefficient. Several different application representations have been examined, using both a conventional register instruction set architecture (ISA) and a queue ISA. The queue ISA targets a queue processor, which resembles a stack processor with the stack replaced by a queue. The queue ISA enables very simple hardware compilation, but can be quite inefficient in representing applications. Register ISAs represent applications efficiently, but require more complex compilation hardware. An ISA using a modified form of register addressing has hardware compilation nearly as simple as that of the queue ISA, while allowing for efficient application representation and a wide range of compatible fabric architectures.
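
To make the comparison between a queue processor and a stack processor concrete, the following is a minimal sketch in Python of how the same expression, (a + b) * (c - d), might be evaluated on each. The instruction names (load, add, sub, mul), the tuple encoding, and the function names are hypothetical and chosen only for illustration; they are not the HASTE ISA or any representation described in the talk.

```python
from collections import deque
import operator

# Hypothetical two-operand operations shared by both toy machines.
OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}

def run_queue(program, env):
    """Toy queue machine: operands leave the front, results join the rear."""
    queue = deque()
    for instr in program:
        if instr[0] == "load":
            queue.append(env[instr[1]])
        else:
            lhs = queue.popleft()
            rhs = queue.popleft()
            queue.append(OPS[instr[0]](lhs, rhs))
    return queue.popleft()

def run_stack(program, env):
    """Toy stack machine: operands and results share the top of the stack."""
    stack = []
    for instr in program:
        if instr[0] == "load":
            stack.append(env[instr[1]])
        else:
            rhs = stack.pop()
            lhs = stack.pop()
            stack.append(OPS[instr[0]](lhs, rhs))
    return stack.pop()

ENV = {"a": 2, "b": 3, "c": 7, "d": 4}

# Queue order: all leaves first, then operators level by level.
QUEUE_PROGRAM = [("load", "a"), ("load", "b"), ("load", "c"), ("load", "d"),
                 ("add",), ("sub",), ("mul",)]

# Stack order: conventional postfix.
STACK_PROGRAM = [("load", "a"), ("load", "b"), ("add",),
                 ("load", "c"), ("load", "d"), ("sub",), ("mul",)]

if __name__ == "__main__":
    print(run_queue(QUEUE_PROGRAM, ENV))  # (2 + 3) * (7 - 4) = 15
    print(run_stack(STACK_PROGRAM, ENV))  # same value from the postfix order
```

In the queue ordering, the independent add and sub sit next to each other in the instruction stream, whereas the stack ordering interleaves them with loads. This grouping of independent operations is one plausible reason a queue representation could be simple to map onto spatial hardware, though the sketch says nothing about the code-size inefficiency mentioned above.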

Benjamin Levine received his BS in 1997 and his MS in 1999, both in Electrical Engineering, from the University of Tennessee, Knoxville. He is now completing his Ph.D. in Electrical and Computer Engineering at Carnegie Mellon University, where he is an IBM/SRC Graduate Fellow advised by Dr. Herman Schmit. His Ph.D. thesis explores HASTE, a novel reconfigurable processor architecture for high-performance embedded systems. He previously worked on the PipeRench reconfigurable architecture project at CMU, where he helped design, implement, and test the 3.7-million-transistor PipeRench prototype.