January 17, 2006
"TRUSS: A Reliable, Scalable Server Architecture" was published in the November/December 2005 special issue of IEEE Micro, Reliability-Aware Microarchitecture. The paper describes the Total Reliability Using Scalable Servers (TRUSS) project and how it will address reliability in future computer systems. The article's co-authors are ECE graduate students Brian Gold, Jangwoo Kim, Jared Smolens, Eric Chung, Vasileios Liaskovitis, and Eriko Nurvitadhi, and faculty members Babak Falsafi, James Hoe, and Andreas Nowatzyk. A committee of industrial researchers and experts chose only six manuscripts on the reliability-aware microarchitecture theme for this edition of the computer hardware periodical.
As information processing and storage have become a key pillar of modern society's infrastructure, server availability and reliability are now critical aspects of computing systems. Unfortunately, just as availability and reliability are becoming more crucial, technology-scaling trends are making it ever more challenging to design, manufacture, and market reliable server platforms. TRUSS is an industry- and government-funded research project designing scalable non-stop computer systems.
The key premise behind TRUSS is to use inexpensive hardware building blocks (e.g., Intel-based computer blades) redundantly to build systems that have no single point of failure while scaling in both cost and performance. Today's reliable mainframes (e.g., IBM z900 or HP Himalayas) carry huge price premiums, scale only to a small number of processors, and have many single points of failure (e.g., a monolithic memory controller). In short, TRUSS plans to lower the cost and improve the reliability of server computing just as the Redundant Arrays of Independent Disks (RAID) project did for disk storage.
TRUSS research is conducted in the Computer Architecture Lab at Carnegie Mellon (CALCM). Last year, another paper from the TRUSS project appeared in IEEE Micro's November/December year-end issue, Micro's Top Picks from Computer Architecture Conferences. That work was also presented at the ACM/IEEE International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS).