Reliable Processors and Systems

From James Hoe

This research investigates the impact of soft-error tolerance on future deep-submicron microprocessor designs and evaluates different options for achieving the desired level of protection against soft errors. This research effort is supported in part by NSF through a CAREER Award.

The TRUSS Project (Total Reliability Using Scalable Servers) develops a reliable, available, and serviceable (RAS) hardware platform based on a distributed cluster of commodity blade servers. The goal of the project is to leverage the cost-effectiveness of commodity processor and memory modules in a reliable server design that achieves both performance and cost scalability. This research effort is supported in part by NSF through an ITR Award and by Intel Corp. (Go to the TRUSS Project Page.)

  • OpenSPARC: An Open Platform for Hardware Reliability Experimentation. Ishwar Parulkar, Alan Wood, James C. Hoe, Babak Falsafi, Sarita V. Adve, and Josep Torrellas. Fourth Workshop on Silicon Errors in Logic-System Effects (SELSE), April 2008. (pdf)
  • Multi-bit Error Tolerant Caches Using Two-Dimensional Error Coding. Jangwoo Kim, Nikos Hardavellas, Ken Mai, Babak Falsafi, and James C. Hoe. ACM/IEEE International Symposium on Microarchitecture (MICRO), December 2007. (pdf)
  • PAI: A Lightweight Mechanism for Single-Node Memory Recovery in DSM Servers. Jangwoo Kim, Jared C. Smolens, Babak Falsafi, and James C. Hoe. IEEE Pacific Rim International Symposium on Dependable Computing (PRDC), December 2007. (pdf)
  • Detecting Emerging Wearout Faults. Jared C. Smolens, Brian T. Gold, James C. Hoe, Babak Falsafi, and Ken Mai. Third Workshop on Silicon Errors in Logic-System Effects (SELSE), April 2007. (pdf)
  • Reunion: Complexity-Effective Multicore Redundancy. Jared C. Smolens, Brian T. Gold, Babak Falsafi, and James C. Hoe. International Symposium on Microarchitecture (MICRO), December 2006. (pdf)
  • TRUSS: Reliable, Scalable Server Architecture. Brian T. Gold, Jared C. Smolens, Jangwoo Kim, Eric S. Chung, Vasileios Liaskovitis, Eriko Nurvitadhi, Babak Falsafi, James C. Hoe, and Andreas G. Nowatzyk. IEEE Micro, Volume 25, Number 6, November/December 2005. (pdf)
  • Understanding the Performance of Concurrent Error Detecting Superscalar Microarchitectures. Jared C. Smolens, Jangwoo Kim, James C. Hoe, and Babak Falsafi. Invited paper at IEEE Symposium on Signal Processing and Information Technology, December 2005. (pdf)
  • Fingerprinting: Bounding Soft-Error-Detection Latency and Bandwidth. Jared C. Smolens, Brian T. Gold, Jangwoo Kim, Babak Falsafi, James C. Hoe, and Andreas G. Nowatzyk. IEEE Micro, Volume 24, Number 6, November/December 2004. (pdf)
  • Efficient Resource Sharing in Concurrent Error Detecting Superscalar Microarchitectures. Jared C. Smolens, Jangwoo Kim, James C. Hoe, and Babak Falsafi. International Symposium on Microarchitecture (MICRO), November 2004. (pdf)
  • Fingerprinting: Bounding Soft-Error Detection Latency and Bandwidth. Jared C. Smolens, Brian T. Gold, Jangwoo Kim, Babak Falsafi, James C. Hoe, and Andreas G. Nowatzyk. International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), October 2004. (pdf)
  • Dual Use of Superscalar Datapath for Transient-Fault Detection and Recovery. Joydeep Ray, James C. Hoe, and Babak Falsafi. International Symposium on Microarchitecture (MICRO), December 2001. (pdf)