18742: Reading List and Course Plan
(Required readings are marked with a *)
Part I: Parallel Computer Architectures
Course Intro, Architecture Review, Amdahl's Law (1/17)
*Cramming More Components onto Integrated Circuits (AKA: Moore's Law)
*Parallel Architectures (AKA: Flynn's Taxonomy)
Parallel Architectures (1/19)
*The Case for a Single-chip Multiprocessor
Parallel Execution Strategies
Dataflow and Tiled Architectures (1/24)
*An Evaluation of the TRIPS computer system
Dataflow execution of sequential imperative programs on multicore architectures
Evaluation of the RAW Microprocessor: An Exposed Wire-delay Architecture for ILP and Streams
Throughput Computing (1/26)
*Larrabee: a many-core x86 architecture for visual computing
*Debunking the 100X GPU vs. CPU myth: an evaluation of throughput computing on CPU and GPU
Writing and Executing Parallel Programs
Lecture: Parallel programming overview (1/31)
*How to make a multiprocessor computer that correctly executes multiprocess programs
*Time, clocks and the ordering of events in a distributed system
Cache Coherence and Memory Consistency (2/9)
*Why On-chip Cache Coherence is here to stay
*Token Coherence: Decoupling Performance and Correctness
Memory consistency and event ordering in scalable shared-memory multiprocessors
Memory Consistency Models (2/14)
*Foundations of the C++ concurrency Memory Model
*x86-TSO: a rigorous and usable programmer’s model for x86 multiprocessors
Synchronization and Transactional Memory
Optimizing Synchronization (2/16)
*Speculative lock elision: enabling highly concurrent multithreaded execution
*Inferential queueing and speculative push for reducing critical communication latencies
Hardware Transactional Memory (2/21)
*Making the fast case common and the uncommon case simple in unbounded transactional memory
Hardware Transactional Memory Implementations (2/23)
*Evaluation of AMD's advanced synchronization facility within a complete transactional memory stack
Software Transactional Memory (2/28)
*Software Transactional Memory
*Software Transactional Memory: Why is it only a research toy?
Synthesis Lectures on Transactional Memory (AKA: the TM Book)
Memory Consistency Enforcement Mechanisms
Data-race-free and Speculative Models (3/2)
*DeNovo: Rethinking the Memory Hierarchy for Disciplined Parallelism
*Transactional Memory Coherence and Consistency
BulkSC: bulk enforcement of sequential consistency
SARC Coherence: Scaling Directory Cache Coherence in Performance and Power
Threads gone wild: Dealing with Concurrent Software Bugs
Lecture: Overview of Concurrency Bugs (3/7)
*Learning from mistakes: a comprehensive study on real world concurrency bug characteristics
Deterministic Execution (3/9)
*DMP: deterministic shared memory multiprocessing
*Grace: safe multithreaded programming for C/C++
CoreDet: a compiler and runtime system for deterministic multithreaded execution
A type and effect system for deterministic parallel java
A "flight data recorder" for enabling full-system multiprocessor deterministic replay
Spring Break - No Class (3/14)
Spring Break - No Class (3/16)
Detecting and Avoiding Concurrency Bugs (3/21)
*AVIO: detecting atomicity violations via access interleaving invariants
*A Case for an interleaving constrained shared-memory multi-processor
Cooperative, Empirical Failure Avoidance for Multithreaded Programs
Finding Concurrency Bugs with Context-aware Communication Graphs
Flexible, Hardware Acceleration for Instruction-Grain Lifeguards
Atom-aid: detecting and surviving atomicity violations
Power and Energy
Energy Modeling, Profiling, Analysis (3/23)
*Power: A First-class Architectural Design Constraint
Energy-performance tradeoffs in processor architecture and circuit design: a marginal cost analysis
Flicker: a dynamically adaptive architecture for power limited multicore systems
Thread Motion: fine-grained power management for multi-core systems
Dark Silicon: The beginning of the end (3/28)
*Amdahl's Law in the Multicore Era
*Dark Silicon and the End of Multicore Scaling
(*Skim) Design of Ion-Implanted MOSFET’s with Very Small Physical Dimensions (AKA: Dennard Scaling)
Power Challenges May End the Multicore Era
Part II: Heterogeneity, Specialization, and Acceleration
Fused and Composable Heterogeneous Cores (3/30)
*Core-fusion: accommodating software diversity in chip multiprocessors
*Composable, light-weight processors
CoreGenesis: erasing core boundaries for robust and configurable performance
Enabling Dynamic Heterogeneity Through Core-on-core Stacking
Dynamic heterogeneity and the need for multicore virtualization
Specialization
Accelerators for Everything (4/4)
*Conservation cores: reducing the energy of mature computations
*QsCores: Trading Dark Silicon for Scalable Energy with Quasi-specific Cores
CHARM: a composable, heterogeneous, accelerator-rich microprocessor
Instructor Travel - Guest Lecture by Mike Bond (4/6)
*DRFx: a simple and efficient memory model for concurrent programming languages
Hyper-optimized Application-specific Accelerators (4/11)
*Q100: The Architecture and Design of a Database Processing Unit
*EIE: efficient inference engine on compressed deep neural network
Reconfigurable Accelerators
Reconfigurable Accelerators (4/13)
*Single-Chip Heterogeneous Computing: Does the Future Include Custom Logic, FPGAs, and GPGPUs?
*A reconfigurable fabric for accelerating large-scale datacenter services (AKA: The Bing Paper)
Instructor Travel - No Class (4/18)
Carnival - No Class (4/20)
Reconfigurable Memory Systems (4/25)
*LEAP scratchpads: automatic memory and cache management for reconfigurable logic
*CoRAM: an in-fabric memory architecture for FPGA-based computing
Part IV: Emerging and Alternative Computing
Intermittent Computing
Lecture: Programming intermittent computers (4/27)
*Mementos: system support for long-running computation on RFID-scale devices
*A simpler, safer programming and execution model for intermittent systems
*Ambient Energy-harvesting nonvolatile processors: from circuit to system
An Energy-interference-free Hardware-Software Debugger for Intermittent Energy-harvesting Systems