Tuesday Feb. 8, 2011
Hamerschlag Hall D-210
Carnegie Mellon University
As Chip Multiprocessors (CMPs) scale to tens or hundreds of nodes, the interconnect becomes a significant factor in cost, energy consumption and performance. Recent work has explored many design tradeoffs for networks-on-chip (NoCs) with novel router architectures to reduce hardware cost. In particular, recent work proposes bufferless deflection routing to eliminate router buffers. The high cost of buffers makes this choice potentially appealing, especially for low-to-medium network loads. However, current bufferless designs usually add complexity to control logic. Deflection routing introduces a sequential dependence in port allocation, yielding a slow critical path. Explicit mechanisms are required for livelock freedom due to the non-minimal nature of deflection. Finally, deflection routing can fragment packets, and the reassembly buffers require large worst-case sizing to avoid deadlock, due to the lack of network backpressure. The complexity that arises out of these three problems has discouraged practical adoption of bufferless routing.
To counter this, we propose CHIPPER (Cheap-Interconnect Partially Permuting Router), a simplified router microarchitecture that eliminates in-router buffers and the crossbar. We introduce three key insights: first, that deflection routing port allocation maps naturally to a permutation network within the router; second, that livelock freedom requires only an implicit token-passing scheme, eliminating expensive age-based priorities; and finally, that flow control can provide correctness in the absence of network backpressure, avoiding deadlock and allowing cache miss buffers (MSHRs) to be used as reassembly buffers. Using multiprogrammed SPEC CPU2006, server, and desktop application workloads and SPLASH-2 multithreaded workloads, we achieve an average 54.9% network power reduction for 13.6% average performance degradation (multiprogrammed) and 73.4% power reduction for 1.9% slowdown (multithreaded), with minimal degradation and large power savings at low-to-medium load. Finally, we show 36.2% router area reduction relative to buffered routing, with comparable timing.
Chris Fallin is a second-year Ph.D. student in Electrical & Computer
Engineering at Carnegie Mellon University. He is advised by Dr. Onur
Mutlu and is a member of the SAFARI research group within CALCM
(Computer Architecture Laboratory at Carnegie Mellon). He studies
interconnect and memory system design in large CMPs. Chris is
currently supported by an NSF Graduate Research Fellowship. He
received a B.S. in Computer Engineering from the University of Notre
Dame in 2009.