All of computing today relies on an abstraction in which software expects the hardware to behave flawlessly for all inputs under all conditions. This abstraction has worked historically because variations in hardware and environment were relatively small. Increasingly, however, computing will be done with devices and circuits that are either inherently stochastic or behave stochastically because of manufacturing and environmental uncertainties. Future computing devices will also face unprecedented cost and power pressure. Together, these trends will make the cost of maintaining the abstraction of flawless hardware for such emerging circuits and devices prohibitive, and we will need to fundamentally rethink the correctness contract between hardware and software.
In our group, we are exploring a vision of computing systems in which a) hardware is allowed to produce errors that are exposed to the highest layers of software, and b) hardware and software are optimized to maximize the power savings afforded by relaxed correctness. We call such under-designed processors, which produce stochastically correct results even under nominal conditions, stochastic processors. In this talk, I will present two example methodologies for building processors that are optimized for non-zero error rates. In the first example, the processor is optimized for timing errors that are assumed to be detected and corrected by a hardware error-resilience mechanism. In the second example, a GPU is allowed to produce certain control and data errors that error-resilient GPU applications can tolerate. I will also discuss two examples of building applications for stochastic processors. In the first example, applications are reformulated as stochastic optimization problems that can tolerate numerical errors. In the second example, algorithmic techniques are used to derive approximate error detection and correction schemes for sparse linear algebra problems. The significant power and reliability benefits across these scenarios suggest that software may indeed be able to save hardware from the power and reliability problems of the future.
Rakesh Kumar is an Assistant Professor in the Electrical and Computer Engineering Department at the University of Illinois at Urbana-Champaign. He received a B.Tech. degree in Computer Science and Engineering from the Indian Institute of Technology (IIT), Kharagpur in 2001 and a Ph.D. degree in Computer Engineering from the University of California, San Diego in September 2006. Before moving to Champaign in 2007, he was a visiting researcher with Microsoft Research in Redmond. His past research on heterogeneous multi-core and conjoined-core architectures has directly influenced processor products and roadmaps at several companies. He hopes to have a similar impact with his current research on error-resilient computer systems and low-power computer architectures for emerging workloads.