The historical improvements in the performance of general-purpose processors have long provided opportunities for application innovation. Word processing, spreadsheets, desktop publishing, networking and various game genres are just some of the many applications that have arisen because of the increasing capabilities and the versatility of general-purpose processors. Key to these innovations is the fact that general-purpose processors do not predefine the applications that they are going to run.
The constantly improving performance of conventional single-core processors, aided significantly by Moore's law, has fueled these application innovations. However, as Moore's law begins to slow, we are seeing alternative approaches for improving performance. These include programmable accelerators, such as GPUs, as well as the increasing use of dedicated accelerators. Unfortunately, while these approaches improve performance, they sacrifice generality. More specifically, the time, difficulty and cost of special-purpose design preclude dedicated logic from serving as a viable avenue for application innovation.
There has recently been interest in addressing this dilemma between programmability and higher performance via an interesting middle ground between fully general-purpose computing and dedicated logic. Specifically, spatial computing uses arrays of small programmable processing elements (PEs) that operate in dataflow fashion to provide a very efficient execution engine. We have been exploring the possibilities of spatial computing as an ingredient of general-purpose computation, and we will discuss the intrinsic operational characteristics of such an approach.
In this talk, we will also describe a specific architecture that uses the twin ideas of triggered instructions and latency-insensitive channels to create an efficient control and communication environment for a spatial processor. Triggered instructions completely eliminate the program counter and allow programs to transition concisely between states without explicit branch instructions. Latency-insensitive channels allow efficient communication of inter-PE control information, while simultaneously enabling flexible code placement and improving tolerance for variable-latency events, such as cache accesses. Our analysis shows that a spatial accelerator using triggered instructions can achieve over 8X greater area-normalized performance than a traditional general-purpose processor.
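The control model described above can be sketched in a few lines. The following is a hypothetical illustration, not the talk's actual ISA: each instruction carries a trigger, a predicate over the PE's boolean predicate registers and the occupancy of its latency-insensitive channels. There is no program counter; a scheduler fires any instruction whose trigger currently holds, and "branching" happens by updating predicates. All names (`PE`, `preds`, `in_ch`, `out_ch`) are invented for this sketch.

```python
from collections import deque

class PE:
    """Toy triggered-instruction processing element (illustrative only)."""
    def __init__(self):
        self.preds = {}        # boolean predicate registers
        self.in_ch = deque()   # latency-insensitive input channel
        self.out_ch = deque()  # latency-insensitive output channel
        self.insts = []        # list of (trigger, action) pairs

    def add(self, trigger, action):
        self.insts.append((trigger, action))

    def step(self):
        # No program counter: fire any instruction whose trigger holds.
        for trig, act in self.insts:
            if trig(self):
                act(self)
                return True
        return False           # nothing ready this cycle

# Example program: double each input value until a None sentinel arrives.
# The state change is expressed by setting a predicate, not by branching.
pe = PE()
pe.preds["done"] = False
pe.add(lambda p: p.in_ch and p.in_ch[0] is None and not p.preds["done"],
       lambda p: (p.in_ch.popleft(), p.preds.__setitem__("done", True)))
pe.add(lambda p: p.in_ch and p.in_ch[0] is not None and not p.preds["done"],
       lambda p: p.out_ch.append(p.in_ch.popleft() * 2))

pe.in_ch.extend([1, 2, 3, None])
while pe.step():
    pass
print(list(pe.out_ch))  # -> [2, 4, 6]
```

Because triggers also test channel occupancy, the same scheduler naturally tolerates variable-latency producers: if the input channel is empty, no trigger fires and the PE simply waits.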
Dr. Joel S. Emer is a Senior Distinguished Research Scientist in Nvidia's Architecture Research group. He is responsible for exploration of future architectures as well as modeling and analysis methodologies. In his spare time, he is a Professor of the Practice at MIT, where he teaches computer architecture and supervises graduate students. Prior to joining Nvidia he worked at Intel where he was an Intel Fellow and Director of Microarchitecture Research. Even earlier, he worked at Compaq and Digital Equipment Corporation.
Dr. Emer has held various research and advanced development positions investigating processor microarchitecture and developing performance modeling and evaluation techniques. He has made architectural contributions to a number of VAX, Alpha and x86 processors and is recognized as one of the developers of the widely employed quantitative approach to processor performance evaluation. More recently, he has been recognized for his contributions to the advancement of simultaneous multithreading technology, processor reliability analysis, cache organization and spatial architectures.
Dr. Emer received a bachelor's degree with highest honors in electrical engineering in 1974, and his master's degree in 1975 – both from Purdue University. He earned a doctorate in electrical engineering from the University of Illinois in 1979. He has received numerous public recognitions, including being named a Fellow of both the ACM and the IEEE, and he was the 2009 recipient of the Eckert-Mauchly Award for lifetime contributions in computer architecture.