Bounding Soft-Error Detection Latency and Bandwidth
Tuesday October 5, 2004
Hamerschlag Hall D-210
This is a practice talk for the upcoming ASPLOS XI conference
in Boston, MA, October 9-13.
Carnegie Mellon University
Recent studies have suggested that the soft-error rate in microprocessor
logic will become a reliability concern by 2010. This paper proposes
an efficient error detection technique, called fingerprinting, that detects
differences in execution across a dual modular redundant (DMR) processor
pair. Fingerprinting summarizes a processor's execution history in a
hash-based signature; differences between two mirrored processors are
exposed by comparing their fingerprints. Fingerprinting tightly bounds
detection latency and greatly reduces the interprocessor communication
bandwidth required for checking. This paper presents a study that evaluates
fingerprinting against a range of current approaches to error detection.
The results of this study show that fingerprinting is the only error
detection mechanism that simultaneously provides high error coverage, low
error detection bandwidth, and high I/O performance.
Jared Smolens is a third-year PhD student in the Computer Architecture
Laboratory at Carnegie Mellon, where he is advised by Prof. James Hoe.
His time is primarily devoted to the TRUSS project. His research interests
include multiprocessor and microprocessor architecture, fault tolerance,
and performance modeling.