A Communication-driven Approach to Designing Flexible DNN Accelerators

Abstract

Deep neural networks (DNNs) have demonstrated highly promising results across computer vision and speech recognition, and are becoming foundational for ubiquitous AI. The high computational demands of DNNs, coupled with their pervasiveness across both cloud and IoT platforms, have led to a rise in specialized DNN accelerators (e.g., Google TPU, Apple Neural Engine, and MIT Eyeriss). Most of these accelerators are spatial in nature, i.e., they are built with hundreds of processing elements (PEs) to provide high throughput, and rely on local data forwarding and reuse to provide energy efficiency (due to reduced memory accesses). Keeping the PEs fully utilized is imperative to the performance and energy efficiency of these accelerators.

Most DNN accelerator datapaths today are highly optimized for regular dataflows, such as those emanating from dense matrix multiplications in the convolutional layers of a DNN. However, continued innovations in DNN structures (myriad layer types, cross-layer fusion, sparsity, and so on) lead to irregular dataflow patterns within the accelerator substrate that current architectures are ill-equipped to handle due to rigid, tightly coupled connections between the PEs and buffers. This leads to under-utilization and stalls.

In this talk, we will present MAERI* (Multiply-Accumulate Engine with Reconfigurable Interconnects), a modular design flow for architecting DNN accelerators that are future-proofed against arbitrary irregular dataflows. MAERI’s unique approach is to provide a highly flexible interconnect between on-chip compute and memory resources. We augment each compute and memory unit with tiny switches that are connected together via novel tree-based topologies that provide full non-blocking bandwidth between on-chip SRAM and the compute blocks. A reconfiguration controller enables on-demand allocation of arbitrarily sized pools of multipliers and adders as required by the current dataflow. MAERI provides 8-459% better utilization across multiple dataflow mappings over state-of-the-art baselines, leading to 42% runtime and 28% energy reduction, with modest area and power overheads. MAERI is written in synthesizable Bluespec SystemVerilog, and will be released at ISCA 2018 for open-source download and use (http://synergy.ece.gatech.edu/maeri).
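
To give a rough feel for why arbitrarily sized pools of multipliers can improve utilization, here is a toy Python model. It is only an illustrative sketch and not MAERI's actual mapping algorithm: the function names, the fixed-cluster baseline, and the numbers (64 multipliers, clusters of 16, a 3x3 filter needing 9 multiplies) are all assumptions made for this example and do not come from the talk.

```python
from math import ceil


def rigid_utilization(num_multipliers, cluster_size, neuron_size):
    """Hypothetical baseline: multipliers are hard-wired into fixed clusters.

    Each output neuron must occupy whole clusters, so any multipliers
    left over in its last cluster sit idle.
    """
    clusters = num_multipliers // cluster_size
    clusters_per_neuron = ceil(neuron_size / cluster_size)
    neurons_mapped = clusters // clusters_per_neuron
    return neurons_mapped * neuron_size / num_multipliers


def flexible_utilization(num_multipliers, neuron_size):
    """Hypothetical flexible design: pools of any size can be carved out.

    Neurons are packed back to back; only the final remainder idles.
    """
    neurons_mapped = num_multipliers // neuron_size
    return neurons_mapped * neuron_size / num_multipliers


if __name__ == "__main__":
    # Assumed example: 64 multipliers, clusters of 16, 3x3 filter (9 multiplies).
    print(rigid_utilization(64, 16, 9))   # 4 neurons * 9 / 64
    print(flexible_utilization(64, 9))    # 7 neurons * 9 / 64
```

In this assumed setting the fixed clusters map 4 neurons (36 of 64 multipliers busy), while flexible partitioning maps 7 (63 of 64 busy). The model ignores bandwidth, reduction-tree structure, and sparsity, all of which matter in a real accelerator.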

Bio

Hyoukjun Kwon is a Ph.D. student in the School of Computer Science at the Georgia Institute of Technology. He received a B.S. from Seoul National University in computer science and engineering and environmental material science. His research interests include computer architecture, network-on-chip, and spatial accelerators for deep learning and graph applications.