Fault-Tolerant Communication in Networks-on-Chip
The network-on-chip (NoC) architecture connects multiple heterogeneous cores through an on-chip network instead of a shared bus, and it requires network protocols with end-to-end reliability guarantees. Designing NoC protocols means revisiting the core assumptions of large-scale networking: because bandwidth is plentiful and computational resources are scarce on chip, NoC communication can exploit excess network capacity rather than implement sophisticated fault-tolerance schemes [ASP-DAC 2003]. We introduced stochastic communication, the first pragmatic approach to fault-tolerant communication in NoCs, based on randomized gossip protocols. Stochastic communication sustains throughput and degrades gracefully in latency even when up to 70% of network packets are corrupted by soft errors [DATE 2003; VLSI Design 2007]. By tolerating network-on-chip faults at the system level, stochastic communication advocates a fundamental paradigm shift away from traditional chip-design approaches, which guarantee the correctness of devices and interconnects.
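To make the idea of gossip-based delivery concrete, the following is a minimal simulation sketch, not the protocol from the papers listed below. It assumes a 2D mesh of tiles, a per-link forwarding probability, and a receiver that silently discards corrupted packets (for example after a failed checksum). The function name `gossip_rounds` and all of its parameters are hypothetical and chosen only for illustration.

```python
import random


def gossip_rounds(width, height, source, dest,
                  forward_prob, corrupt_prob, max_rounds, rng):
    """Simulate randomized gossip on a width x height mesh (illustrative only).

    Each round, every tile that already holds the message offers it to each
    of its four mesh neighbours with probability forward_prob. A packet is
    corrupted in flight with probability corrupt_prob; the receiver is
    assumed to detect this and drop the packet.

    Returns the round in which dest first holds the message, or None if
    that does not happen within max_rounds.
    """
    if source == dest:
        return 0
    informed = {source}
    for round_no in range(1, max_rounds + 1):
        newly_informed = set()
        for (x, y) in informed:
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nx, ny = x + dx, y + dy
                if not (0 <= nx < width and 0 <= ny < height):
                    continue  # no neighbour beyond the mesh edge
                if rng.random() >= forward_prob:
                    continue  # tile did not gossip on this link
                if rng.random() < corrupt_prob:
                    continue  # packet corrupted; receiver discards it
                newly_informed.add((nx, ny))
        informed |= newly_informed
        if dest in informed:
            return round_no
    return None


if __name__ == "__main__":
    rng = random.Random(0)
    # Heavy corruption: redundancy from many gossiping tiles may still deliver.
    rounds = gossip_rounds(4, 4, (0, 0), (3, 3),
                           forward_prob=0.5, corrupt_prob=0.7,
                           max_rounds=200, rng=rng)
    print("delivered in round:", rounds)
```

With `forward_prob=1.0` and `corrupt_prob=0.0` the sketch reduces to flooding, so the message reaches the destination in exactly as many rounds as the Manhattan distance between the two tiles. Raising `corrupt_prob` can only delay delivery in this model, because redundant copies keep arriving over other paths and in later rounds; this is meant to illustrate the intuition behind trading excess bandwidth for fault tolerance, not to reproduce the published results.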
Publications
Journal Articles and Book Chapters
- P. Bogdan, T. Dumitraş, R. Mărculescu
Stochastic Communication: A New Paradigm for Fault-Tolerant Networks-on-Chip
VLSI Design, special issue on Networks-on-Chip, Hindawi, 2007
- T. Dumitraş and R. Mărculescu
On-Chip Stochastic Communication
Embedded Software for SoC, A. Jerraya et al., eds., Kluwer, 2003
Conference Papers
- T. Dumitraş, S. Kerner and R. Mărculescu
Enabling On-Chip Diversity through Architectural Communication Design
IEEE/ACM ASP-DAC, Jan. 2004
- T. Dumitraş and R. Mărculescu
On-Chip Stochastic Communication
EDAA/IEEE/ACM DATE Conference, Mar. 2003
- T. Dumitraş, S. Kerner and R. Mărculescu
Toward On-Chip Fault-Tolerant Communication
IEEE/ACM ASP-DAC, Jan. 2003
Best Paper Award
Theses
- T. Dumitraş
On-Chip Stochastic Communication
MS Thesis, Carnegie Mellon University, May 2003