====== Readings ======

Post your paper reviews for the papers marked **[Review Required]** on the [[http://sourcery.cmcl.cs.cmu.edu:3500/collections/show/50|paper review site]]. The expectations for these reviews are described in module 0. You can view reviews by other students only after the deadline. 

===== Module 0-1 =====
**Mentioned during module:**
  * {{moscibroda.pdf|Moscibroda, T., & Mutlu, O. (2007). Memory performance attacks: denial of memory service in multi-core systems. Proceedings of 16th USENIX Security Symposium.}}

===== Module 0-3 =====
**Required:**
  * {{reviewing-smith.pdf|Smith, "The Task of the Referee," IEEE Computer 1990.}}

**Mentioned during module:**
  * {{http://www.cs.utexas.edu/users/mckinley/notes/reviewing.html|Hill and McKinley, "Notes on Constructive and Positive Reviewing".}}
  * {{writing-papers.pdf|Levin and Redell, "How (and how not) to write a good systems paper," OSR 1983.}}
  * {{http://research.microsoft.com/en-us/um/people/simonpj/papers/giving-a-talk/writing-a-paper-slides.pdf|Jones, "How to Write a Great Research Paper".}}

===== Module 0-4 =====
**Required:**
  * {{http://www.cs.virginia.edu/~robins/YouAndYourResearch.html|Hamming, "You and Your Research," Bell Communications Research Colloquium Seminar, 7 March 1986.}} **[Review Required]**
  * {{memory-scaling_memcon13.pdf|Mutlu, "Memory Scaling: A Systems Architecture Perspective," Technical talk at MemCon 2013 (MEMCON), Santa Clara, CA, August 2013.}} **[Review Required]**
  * {{r0_patt.pdf|Patt, "Requirements, bottlenecks, and good fortune: agents for microprocessor evolution," Proceedings of the IEEE, vol. 89, no. 11, 2001.}} **[Review Required]**

**Mentioned during module:**
  * {{r1_amdahl.pdf|Amdahl, "Validity of the single processor approach to achieving large scale computing capabilities," AFIPS Conference, April 1967.}}
  * {{r1_moore.pdf|G. E. Moore, "Cramming more components onto integrated circuits," Electronics, April 1965.}}
  * {{r0_ronen.pdf|Ronen et al., "Coming Challenges in Microarchitecture and Architecture," Proceedings of the IEEE, vol. 89, no. 11, 2001.}}

===== Module 0-5 =====
**Mentioned during module:**
  * {{rmeta_fong.pdf|Fong, "How to Write a CS Research Paper: A Bibliography".}}


===== Module 1-1 =====
**Required:**
  * {{:reading_hill_551_560.pdf|Hill, Jouppi, Sohi, “Multiprocessors and Multicomputers,” pp. 551-560 in Readings in Computer Architecture.}}
  * {{:reading_hill_309_314.pdf|Hill, Jouppi, Sohi, “Dataflow and Multithreading,” pp. 309-314 in Readings in Computer Architecture.}}
  * {{:suleman09-acs.pdf|Suleman et al., “Accelerating Critical Section Execution with Asymmetric Multi-Core Architectures,” ASPLOS 2009.}} **[Review Required] due 9/18**
  * {{:joao12-bottleneck.pdf|Joao et al., “Bottleneck Identification and Scheduling in Multithreaded Applications,” ASPLOS 2012.}}
  * {{:amdahl67_singleproc.pdf|Amdahl, “Validity of the single processor approach to achieving large scale computing capabilities,” AFIPS 1967.}} **[Review Required] due 9/20**

**Mentioned during module:**
  * {{:flynn66_computing.pdf|Mike Flynn, “Very High-Speed Computing Systems,” Proc. of IEEE, 1966.}}
  * {{:thornton_cdc6600.pdf|Thornton, “Design of a Computer: The Control Data 6600,” 1970.}}
  * {{:annavaram05_amdahl.pdf|Annavaram et al., “Mitigating Amdahl’s Law Through EPI Throttling,” ISCA 2005.}}
  * {{:ipek07-fusion.pdf|Ipek et al., “Core Fusion: Accommodating Software Diversity in Chip Multiprocessors,” ISCA 2007.}}
  * {{:hill08_amdahl.pdf|Hill and Marty, “Amdahl’s Law in the Multi-Core Era,” IEEE Computer 2008.}}
  * {{:eyerman_critsectamdahl.pdf|Eyerman and Eeckhout, “Modeling critical sections in Amdahl's law and its implications for multicore design,” ISCA 2010.}}
  * {{:suleman_feedback.pdf|Suleman et al., “Feedback-driven threading: power-efficient and high-performance execution of multi-threaded workloads on CMPs,” ASPLOS 2008.}}
===== Module 1-2 =====
**Mentioned during module:**
  * {{:kumar07-carbon.pdf|Kumar et al., “Carbon: Architectural Support for Fine-Grained Parallelism on Chip Multiprocessors,” ISCA 2007.}}

===== Module 1-3 =====
**Mentioned during module:**

  * {{:joao12-bottleneck.pdf|Joao et al., “Bottleneck Identification and Scheduling in Multithreaded Applications,” ASPLOS 2012.}}
  * {{:r1_moore.pdf|Moore, "Cramming more components onto integrated circuits," Electronics, 1965.}}
  * {{:olukutun96_cmp.pdf|Olukotun et al., "The Case for a Single-Chip Multiprocessor," ASPLOS 1996.}}
  * {{:tullsen_simulmthd95.pdf|Tullsen et al., “Simultaneous Multithreading: Maximizing On-Chip Parallelism,” ISCA 1995.}}
  * {{:kessler99-alpha21264.pdf|Kessler, "The Alpha 21264 Microprocessor," IEEE Micro 1999.}}
  * {{:tr-hps-2005-003.pdf|Brown, “Reducing Critical Path Execution Time by Breaking Critical Loops,” UT-Austin 2005.}}
  * {{:palacharla97-complexity.pdf|Palacharla et al., "Complexity-effective superscalar processors," ISCA 1997.}}

===== Module 1-4 =====
**Mentioned during module:**

  * {{:grochowski_latthrough.pdf|Grochowski et al., "Best of both Latency and Throughput," ICCD 2004.}}
  * {{:barroso00_piranha.pdf|Barroso et al., "Piranha: A Scalable Architecture Based on Single-Chip Multiprocessing," ISCA 2000.}}
  * {{:barroso98-workloads.pdf|Barroso et al., "Memory system characterization of commercial workloads," ISCA 1998.}}
  * {{:ranganathan98-workloads.pdf|Ranganathan et al., "Performance of database workloads on shared-memory systems with out-of-order processors," ASPLOS 1998.}}
  * {{:kongetira05_niagara.pdf|Kongetira et al., “Niagara: A 32-Way Multithreaded SPARC Processor,” IEEE Micro 2005.}}
  * {{:chaudhry_rock.pdf|Chaudhry et al., “Rock: A High-Performance Sparc CMT Processor,” IEEE Micro, 2009.}}
  * {{:chaudhry_specthread.pdf|Chaudhry et al., “Simultaneous Speculative Threading: A Novel Pipeline Architecture Implemented in Sun's ROCK Processor,” ISCA 2009.}}
  * {{:mutlu_runahead.pdf|Mutlu et al., “Runahead Execution: An Alternative to Very Large Instruction Windows for Out-of-order Processors,” HPCA 2003.}}
  * {{:mutlu06_efficient.pdf|Mutlu et al., “Efficient Runahead Execution: Power-Efficient Memory Latency Tolerance,” IEEE Micro Jan/Feb 2006.}}
  * {{:tendler_power4.pdf|Tendler et al., "POWER4 system microarchitecture," IBM J R&D, 2002.}}
  * {{:kalla04_power5.pdf|Kalla et al., "IBM Power5 Chip: A Dual-Core Multithreaded Processor," IEEE Micro 2004.}}
  * {{:le_power6.pdf|Le et al., "IBM POWER6 Microarchitecture," IBM J R&D, 2007.}}
  * {{:kalla_power7.pdf|Kalla et al., "Power7: IBM’s Next-Generation Server Processor," IEEE Micro 2010.}}
  * {{:annavaram05_amdahl.pdf|Annavaram et al., “Mitigating Amdahl’s Law Through EPI Throttling,” ISCA 2005.}}
  * {{:kumar_singleisaheterog.pdf|Kumar et al., “Single-ISA Heterogeneous Multi-Core Architectures: The Potential for Processor Power Reduction,” MICRO 2003.}}

===== Module 2-1 =====
**Required:**
  * {{:suleman09-acs.pdf|Suleman et al., “Accelerating Critical Section Execution with Asymmetric Multi-Core Architectures,” ASPLOS 2009.}}
  * {{:suleman10-marshaling.pdf|Suleman et al., "Data marshaling for multi-core architectures," ISCA 2010.}}
  * {{:suleman11-marshaling.pdf|Suleman et al., "Data Marshaling for Multicore Systems," IEEE Micro 2011.}}
  * {{:joao12-bottleneck.pdf|Joao et al., “Bottleneck Identification and Scheduling in Multithreaded Applications,” ASPLOS 2012.}}
  * {{:joao_uba_acmp.pdf|Joao et al., “Utility-Based Acceleration of Multithreaded Applications on Asymmetric CMPs,” ISCA 2013.}}

**Mentioned during module:**
  * {{:r1_amdahl.pdf|Amdahl, "Validity of the single processor approach to achieving large scale computing capabilities," AFIPS Conference, April 1967.}}
  * {{:olukutun96_cmp.pdf|Olukotun et al., "The Case for a Single-Chip Multiprocessor," ASPLOS 1996.}}
  * {{:mutlu_runahead.pdf|Mutlu et al., “Runahead Execution: An Alternative to Very Large Instruction Windows for Out-of-order Processors,” HPCA 2003.}}
  * {{:mutlu_efficient_processing_runahead.pdf|Mutlu et al., "Techniques for Efficient Processing in Runahead Execution Engines," ISCA 2005.}}


===== Module 2-2 =====
**Mentioned during module:**
  * {{:reading_hill_551_560.pdf|Hill, Jouppi, Sohi, “Multiprocessors and Multicomputers,” pp. 551-560 in Readings in Computer Architecture.}}
  * Culler & Singh, Chapter 1
  * {{:li_coherencesharedmem.pdf|Li and Hudak, "Memory Coherence in Shared Virtual Memory Systems," ACM TOCS 1989.}}
  * {{:hillis_cm5.pdf|Hillis and Tucker, "The CM-5 Connection Machine: a scalable supercomputer," CACM 1993.}}
  * {{:batcher_massparproc.pdf|Batcher, "Architecture of a massively parallel processor," ISCA 1980.}}
  * {{:tucker_connection.pdf|Tucker and Robertson, "Architecture and Applications of the Connection Machine," IEEE Computer 1988.}}
  * {{:seitz_cosmiccube.pdf|Seitz, “The Cosmic Cube,” CACM 1985.}}

===== Module 2-3 =====
**Required:**
  * {{:lamport_multiprocess_1979.pdf|Lamport, “How to Make a Multiprocessor Computer That Correctly Executes Multiprocess Programs,” IEEE Transactions on Computers, 1979.}} **[Review Required] due 9/20**

**Mentioned during module:**
  * {{:gharachorloo_mem_consistency.pdf|Gharachorloo et al., "Memory Consistency and Event Ordering in Scalable Shared-Memory Multiprocessors," ISCA 1990.}}
  * {{:gharachorloo_icpp.pdf|Gharachorloo et al., "Two Techniques to Enhance the Performance of Memory Consistency Models," ICPP 1991.}}
  * {{:ceze_bulksc.pdf|Ceze et al., “BulkSC: bulk enforcement of sequential consistency,” ISCA 2007.}}
===== Module 2-4 =====
**Required:**
  * Culler & Singh, Chapter 5.1
  * Culler & Singh, Chapter 5.3
  * Patterson & Hennessy, Chapter 5.8
  * {{:p348-papamarcos.pdf|Papamarcos and Patel, “A low-overhead coherence solution for multiprocessors with private cache memories,” ISCA 1984.}} **[Review Required] due 9/22**
  * {{:p241-laudon.pdf|Laudon and Lenoski, "The SGI Origin: a ccNUMA highly scalable server," ISCA 1997.}}
  * {{:p182-martin.pdf|Martin et al., "Token coherence: decoupling performance and correctness," ISCA 2003.}} **[Review Required] due 9/22**

**Mentioned during module:**
  * {{:censier_multicache.pdf|Censier and Feautrier, "A new solution to coherence problems in multicache systems," IEEE Transactions on Computers, 1978.}}
  * {{:p124-goodman.pdf|Goodman, "Using cache memory to reduce processor-memory traffic," ISCA 1983.}}
  * {{:lenoski_stanford_dash.pdf|Lenoski et al., "The Stanford DASH Multiprocessor," IEEE Computer, 1992.}}
  * {{:p73-baer.pdf|Baer and Wang, "On the inclusion properties for multi-level cache hierarchies," ISCA 1988.}}
  * {{:bloom_space_time.pdf|Bloom, "Space/time trade-offs in hash coding with allowable errors," CACM 1970.}}
  * {{:pact12_seshadri.pdf|Seshadri et al., "The Evicted-Address Filter: A Unified Mechanism to Address Both Cache Pollution and Thrashing," PACT 2012.}}
  * {{:liu_raidr.pdf|Liu et al., "RAIDR: Retention-Aware Intelligent DRAM Refresh," ISCA 2012.}}
===== Module 2-5 =====
**Required:**
  * {{:sohi95.pdf|Sohi et al., “Multiscalar Processors,” ISCA 1995.}} **[Review Required] due 9/24**
  * {{:zhou_scaleinstwindow00.pdf|Zhou, “Dual-Core Execution: Building a Highly Scalable Single-Thread Instruction Window,” PACT 2005.}} **[Review Required] due 9/26**
  * {{:rajwar01.pdf|Rajwar and Goodman, “Speculative Lock Elision: Enabling Highly Concurrent Multithreaded Execution,” MICRO 2001.}} **[Review Required] due 9/28**
  * {{:herlihy93.pdf|Herlihy and Moss, “Transactional Memory: Architectural Support for Lock-Free Data Structures,” ISCA 1993.}} **[Review Required] due 9/30**
**Mentioned during module:**
  * {{:akkary_dynmthread98.pdf|Akkary and Driscoll, “A dynamic multithreading processor,” MICRO 1998.}}
  * {{:smith78_hep.pdf|Burton Smith, “A pipelined, shared resource MIMD computer,” ICPP 1978.}}
  * {{:gopal_speculativeversioning98.pdf|Gopal et al., “Speculative Versioning Cache,” HPCA 1998.}}
  * {{:colohan00.pdf|Steffan et al., “A Scalable Approach to Thread-Level Speculation,” ISCA 2000.}}
  * {{:franklin_arb96.pdf|Franklin and Sohi, “ARB: A hardware mechanism for dynamic reordering of memory references,” IEEE TC 1996.}}
  * {{:moshovos_datadep97.pdf|Moshovos et al., “Dynamic Speculation and Synchronization of Data Dependences,” ISCA 1997.}}
  * {{:chrysos_memorydependence98.pdf|Chrysos and Emer, “Memory Dependence Prediction using Store Sets,” ISCA 1998.}}
  * {{:dubois_assisted98.pdf|Dubois and Song, “Assisted Execution,” USC Tech Report 1998.}}
  * {{:chappell_ssmt99.pdf|Chappell et al., “Simultaneous Subordinate Microthreading (SSMT),” ISCA 1999.}}
  * {{:zilles_specprediction01.pdf|Zilles and Sohi, “Execution-based Prediction Using Speculative Slices,” ISCA 2001.}}
  * {{:mutlu_runahead.pdf|Mutlu et al., “Runahead Execution: An Alternative to Very Large Instruction Windows for Out-of-order Processors,” HPCA 2003.}}
  * {{:sundaramoorthy_slipstream00.pdf|Sundaramoorthy et al., “Slipstream Processors: Improving both Performance and Fault Tolerance,” ASPLOS 2000.}}
  * {{:zhou_scaleinstwindow00.pdf|Zhou, “Dual-Core Execution: Building a Highly Scalable Single-Thread Instruction Window,” PACT 2005.}}
  * {{:martinez_specsync02.pdf|Martinez and Torrellas, "Speculative Synchronization: Applying Thread-Level Speculation to Explicitly Parallel Applications," ASPLOS 2002.}}
  * {{:rajwar_tlr02.pdf|Rajwar and Goodman, "Transactional Lock-Free Execution of Lock-Based Programs," ASPLOS 2002.}}
  * {{:suleman09-acs.pdf|Suleman et al., “Accelerating Critical Section Execution with Asymmetric Multi-Core Architectures,” ASPLOS 2009.}}
  * {{:dice09-transactional.pdf|Dice et al., "Early experience with a commercial hardware transactional memory implementation," ASPLOS 2009.}}
  * {{:wang12-transactional.pdf|Wang et al., "Evaluation of Blue Gene/Q hardware support for transactional memories," PACT 2012.}}
  * {{:212157.pdf|Jensen et al., “A New Approach to Exclusive Data Access in Shared Memory Multiprocessors,” LLNL Tech Report 1987.}}
  * {{:jsg12_tx.pdf|Jacobi et al., “Transactional Memory Architecture and Implementation for IBM System z,” MICRO 2012.}}
  * {{:asf_micro2010.pdf|Chung et al., “ASF: AMD64 Extension for Lock-Free Data Structures and Transactional Memory,” MICRO 2010.}}

===== Module 3-1 =====
**Mentioned during module:**
  * {{denning_-_1970_-_virtual_memory.pdf|Denning, P. J. (1970). Virtual Memory. ACM Computing Surveys, 2(3).}}
  * {{00710872.pdf|Jacob, B., & Mudge, T. (1998). Virtual memory in contemporary microprocessors. IEEE Micro.}}
  * Patterson & Hennessy's Computer Organization and Design: The Hardware/Software Interface Chapter 5.4
  * Computer Organization by Hamacher, Vranesic, and Zaky, McGraw-Hill. Chapter 8.8

===== Module 3-2 =====
**Required:**
  * {{wilkes_-_1965_-_slave_memories_and_dynamic_storage_allocation.pdf|Wilkes, M. V. (1965). Slave Memories and Dynamic Storage Allocation. IEEE Transactions on Electronic Computers.}} **[Review Required] due 10/1**
**Mentioned during module:**
  * Patterson & Hennessy's Computer Organization and Design: The Hardware/Software Interface Chapter 5.1-5.3
  * Computer Organization by Hamacher, Vranesic, and Zaky, McGraw-Hill. Chapter 8.1-8.7
  * {{liptay_-_1968_-_structural_aspects_of_the_system360_model_85_ii_the_cache.pdf|Liptay, J. S. (1968). Structural aspects of the system/360 model 85: II the cache. IBM Syst. J.}}

===== Module 3-3 =====
**Required:**
  * {{05388441.pdf|Belady, L. A. (1966). A study of replacement algorithms for a virtual-storage computer. IBM Syst. J.}} **[Review Required] due 10/1**
**Mentioned during module:**
  * {{26080167.pdf|Qureshi, M. K., Lynch, D. N., Mutlu, O., & Patt, Y. N. (2006). A Case for MLP-Aware Cache Replacement. Proceedings of the 33rd annual international symposium on Computer Architecture.}}
  * {{jouppi_-_1990_-_improving_direct-mapped_cache_performance_by_the_addition_of_a_small_fully-associative_cache_and_prefetch_buffers.pdf|Jouppi, N. P. (1990). Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers. Proceedings of the 17th annual international symposium on Computer Architecture.}}
  * {{p74-rau.pdf|Rau, B. R. (1991). Pseudo-randomly interleaved memory. Proceedings of the 18th annual international symposium on Computer architecture.}}
  * {{p169-seznec.pdf|Seznec, A. (1993). A case for two-way skewed-associative caches. Proceedings of the 20th annual international symposium on computer architecture.}}

===== Module 3-4 =====
**Required:**
  * {{:papers-mlp.pdf|Qureshi et al., “A Case for MLP-Aware Cache Replacement,” ISCA 2006.}} **[Review Required] due 10/3**
  * {{:pekhimenko12-bdi.pdf|Pekhimenko et al., "Base-Delta-Immediate Compression: Practical Data Compression for On-Chip Caches," PACT 2012.}} **[Review Required] due 10/6**
  * {{:qureshi06-UCP.pdf|Qureshi and Patt, “Utility-Based Cache Partitioning: A Low-Overhead, High-Performance, Runtime Mechanism to Partition Shared Caches,” MICRO 2006.}} **[Review Required] due 10/8**
  * {{:pact12_seshadri.pdf|Seshadri et al., “The Evicted-Address Filter: A Unified Mechanism to Address both Cache Pollution and Thrashing,” PACT 2012.}}

**Mentioned during module:**
  * {{:lcp.pdf|Pekhimenko et al., “Linearly Compressed Pages: A Main Memory Compression Framework with Low Complexity and Low Latency,” MICRO 2013.}}
  * {{:suh02-partitioning.pdf|Suh et al., “A New Memory Monitoring Scheme for Memory-Aware Scheduling and Partitioning,” HPCA 2002.}}
  * {{:kim04-faircache.pdf|Kim et al., “Fair Cache Sharing and Partitioning in a Chip Multiprocessor Architecture,” PACT 2004.}}
  * {{:lin08-partitioning.pdf|Lin et al., “Gaining Insights into Multi-Core Cache Partitioning: Bridging the Gap between Simulation and Real Systems,” HPCA 2008.}}
  * {{:cho06-coloring.pdf|Cho and Jin, “Managing Distributed, Shared L2 Caches through OS-Level Page Allocation,” MICRO 2006.}}
  * {{:qureshi09-asr.pdf|Qureshi, “Adaptive Spill-Receive for Robust High-Performance Caching in CMPs,” HPCA 2009.}}
  * {{:hardavellas09_rnuca.pdf|Hardavellas et al., “Reactive NUCA: Near-Optimal Block Placement and Replication in Distributed Caches,” ISCA 2009.}}
  * {{:fairsched.pdf|Fedorova et al., “Improving Performance Isolation on Chip Multiprocessors via an Operating System Scheduler,” PACT 2007.}}


===== Module 3-5 =====
**Required:**
  * {{:TLDRAM-Lee.pdf|Lee et al., “Tiered-Latency DRAM: A Low Latency and Low Cost DRAM Architecture,” HPCA 2013.}} **[Review Required] due 10/10**
  * {{:isca08.pdf|Ipek et al., “Self Optimizing Memory Controllers: A Reinforcement Learning Approach,” ISCA 2008.}} **[Review Required] due 10/13**

**Mentioned during module:**
  * {{:raidr-isca12.pdf|Liu et al., “RAIDR: Retention-Aware Intelligent DRAM Refresh,” ISCA 2012.}}
  * {{:2012_isca_salp.pdf|Kim et al., “A Case for Exploiting Subarray-Level Parallelism in DRAM,” ISCA 2012.}}
  * {{:dram-retention-time-characterization_isca13.pdf|Liu et al., “An Experimental Study of Data Retention Behavior in Modern DRAM Devices,” ISCA 2013.}}
  * {{:CMU-CS-13-108.pdf|Seshadri et al., “RowClone: Fast and Efficient In-DRAM Copy and Initialization of Bulk Data,” CMU CS Tech Report 2013.}}
  * {{:memory-dvfs_icac11.pdf|David et al., “Memory Power Management via Dynamic Voltage/Frequency Scaling,” ICAC 2011.}}
  * {{:flash-error-analysis-and-management_itj13.pdf|Cai et al., “Error Analysis and Retention-Aware Error Management for NAND Flash Memory,” ITJ 2013.}}
  * {{:a2c-hpca13.pdf|Das et al., “Application-to-Core Mapping Policies to Reduce Memory System Interference in Multi-Core Systems,” HPCA 2013.}}
  * {{:ebrahimi_asplos10.pdf|Ebrahimi et al., “Fairness via Source Throttling,” ASPLOS 2010, ACM TOCS 2012.}}
  * {{:prefetch_aware_dram_controllers.pdf|Lee et al., “Prefetch-Aware DRAM Controllers,” MICRO 2008, IEEE TC 2011.}}

===== Module 3-6 =====
**Required:**
  * {{:pcm.pdf|Qureshi et al., “Scalable high performance main memory system using phase-change memory technology,” ISCA 2009.}} **[Review Required] due 10/15**

**Mentioned during module:**
  * {{:persistent-memory-management_weed13.pdf|Meza et al., “A Case for Efficient Hardware-Software Cooperative Management of Storage and Memory,” WEED 2013.}}
  * {{:sttram_ispass13.pdf|Kultursay et al., “Evaluating STT-RAM as an Energy-Efficient Main Memory Alternative,” ISPASS 2013.}}
  * {{:ISCA09.pdf|Lee, Ipek, Mutlu, Burger, “Architecting Phase Change Memory as a Scalable DRAM Alternative,” ISCA 2009, CACM 2010, Top Picks 2010.}}
  * {{:timber_cal12.pdf|Meza et al., “Enabling Efficient and Scalable Hybrid Memories,” IEEE Comp. Arch. Letters 2012.}}
  * {{:RBLA_ICCD12.pdf|Yoon et al., “Row Buffer Locality Aware Caching Policies for Hybrid Memories,” ICCD 2012 Best Paper Award.}}

===== Module 3-7 =====
**Required:**
  * {{:parbs_isca08.pdf|Mutlu and Moscibroda, “Parallelism-Aware Batch Scheduling,” ISCA 2008.}} **[Review Required] due 10/20**
  * {{:mise-hpca13.pdf|Subramanian et al., “MISE: Providing Performance Predictability and Improving Fairness in Shared Main Memory Systems,” HPCA 2013.}} **[Review Required] due 10/26**
  * {{:ebrahimi_asplos10.pdf|Ebrahimi et al., “Fairness via Source Throttling,” ASPLOS 2010, ACM TOCS 2012.}} **[Review Required] due 10/30**
**Mentioned during module:** 
  * {{:ebrahimi_micro11.pdf|Ebrahimi et al., “Parallel Application Memory Scheduling,” MICRO 2011.}}
  * {{:mph_usenix_security07.pdf|Moscibroda and Mutlu, “Memory Performance Attacks,” USENIX Security 2007.}}
  * {{:MICRO2007.pdf|Mutlu and Moscibroda, “Stall-Time Fair Memory Access Scheduling,” MICRO 2007.}}
  * {{:HPCA16_ATLAS.pdf|Kim et al., “ATLAS: A Scalable and High-Performance Scheduling Algorithm for Multiple Memory Controllers,” HPCA 2010.}}
  * {{:tcm_micro10.pdf|Kim et al., “Thread Cluster Memory Scheduling,” MICRO 2010, IEEE Micro 2011.}}
  * {{:MICRO_2011.pdf|Muralidhara et al., “Memory Channel Partitioning,” MICRO 2011.}}
  * {{:isca2012_sms.pdf|Ausavarungnirun et al., “Staged Memory Scheduling,” ISCA 2012.}}
  * {{:ebrahimi_isca11.pdf|Ebrahimi et al., “Prefetch-Aware Shared Resource Management for Multi-Core Systems,” ISCA 2011.}}

===== Module 4-1 and 4-2 =====
**Required:**
  * {{:bless_isca09.pdf|Moscibroda and Mutlu, “A Case for Bufferless Routing in On-Chip Networks,” ISCA 2009.}} **[Review Required] due 11/1**
  * {{:chipper_hpca2011.pdf|Fallin et al., “CHIPPER: A Low-Complexity Bufferless Deflection Router,” HPCA 2011.}} **[Review Required] due 11/1**
  * {{:minbd_nocs2012.pdf|Fallin et al., “MinBD: Minimally-Buffered Deflection Routing for Energy-Efficient Interconnect,” NOCS 2012.}}
  * {{:micro2009.pdf|Das et al., “Application-Aware Prioritization Mechanisms for On-Chip Networks,” MICRO 2009.}}
  * {{:isca2010.pdf|Das et al., “Aergia: Exploiting Packet Latency Slack in On-Chip Networks,” ISCA 2010, IEEE Micro 2011.}} **[Review Required] due 10/28**
  * {{:hat_sbacpad2012.pdf|Kevin Chang et al., "HAT: Heterogeneous Adaptive Throttling for On-Chip Networks," SBAC-PAD 2012}}
  * {{:41_4.pdf|Dally, “Route Packets, Not Wires: On-Chip Interconnection Networks,” DAC 2001.}}
**Recommended:**
  * {{:wentzlaff_tile64_noc_ieeemicro2007.pdf|Wentzlaff et al., “On-Chip Interconnection Architecture of the Tile Processor,” IEEE Micro 2007.}}
  * {{:mullins04.pdf|Mullins et al., “Low-Latency Virtual-Channel Routers for On-Chip Networks,” ISCA 2004.}}
  * {{:an_alternate_dimension_order_collective_communications_scheme_on_packet_switched.pdf|Tobias Bjerregaard, Shankar Mahadevan, “A Survey of Research and Practices of Network-on-Chip,” ACM Computing Surveys (CSUR) 2006.}}
**Mentioned during module:**
  * {{:pvc-qos_micro09.pdf|Grot et al., “Preemptive Virtual Clock: A Flexible, Efficient, and Cost-effective QOS Scheme for Networks-on-Chip,” MICRO 2009.}}
  * {{:kilonoc_isca11.pdf|Grot et al., “Kilo-NOC: A Heterogeneous Network-on-Chip Architecture for Scalability and Service Guarantees,” ISCA 2011, IEEE Micro 2012.}}
  * {{:p168-patel.pdf|Janak H. Patel, "Processor-Memory Interconnections for Multiprocessors," ISCA 1979}}
  * {{:nyuultracomputer.pdf|Gottlieb et al., "The NYU Ultracomputer - designing a MIMD, shared-memory parallel machine," ISCA 1982}}
  * {{:tr-2012-004.pdf|Chris Fallin et al., "HiRD: A Low-Complexity, Energy-Efficient Hierarchical Ring Interconnect," SAFARI Technical Report}}
  * {{:titlecprt.ps|Thinking Machines Corp., "The Connection Machine CM-5 Technical Summary," Jan. 1992}}
  * {{:r01-cosmic-cube-seitz-1985.pdf|Seitz, "The Cosmic Cube," CACM 1985}}
  * {{:l8-turnmodel-isca92.pdf|Glass and Ni, "The Turn Model for Adaptive Routing," ISCA 1992}}
  * {{:sigcomm2012_onchip.pdf|George Nychis et al., "On-Chip Network from a Networking Perspective: Congestion and Scalability in Many-core Interconnects," SIGCOMM 2012}}
  * {{:hotnets_2010.pdf|George Nychis et al., "Next Generation On-Chip Networks: What Kind of Congestion Control Do We Need?," HOTNETS 2010}}
  * {{:spider.pdf|Galles, “Spider: A High-Speed Network Interconnect,” IEEE Micro 1997}}
  * {{:virtual_channel.pdf|Dally, “Virtual Channel Flow Control,” ISCA 1990}}
  * {{:|M. Karol et al., "Input Versus Output Queueing on a Space-Division Packet Switch," IEEE Transactions on Communications 1987}}


===== Module 5-1 =====

**Review Required:**
  * {{04523358.pdf|Lindholm, E., Nickolls, J., Oberman, S., & Montrym, J. (2008). NVIDIA Tesla: A Unified Graphics and Computing Architecture. IEEE Micro.}} **[Review Required] due 11/4**
  * {{large-gpu-warps_micro11.pdf|Veynu Narasiman, Michael Shebanow, Chang Joo Lee, Rustam Miftakhutdinov, Onur Mutlu, Yale N. Patt: Improving GPU performance via large warps and two-level warp scheduling. MICRO 2011: 308-317}} **[Review Required] due 11/4**

**Mentioned during module:**
  * {{p50-fatahalian.pdf|Fatahalian, K., & Houston, M. (2008). A closer look at GPUs. Commun. ACM.}}
  * {{01447203.pdf|Flynn, M. J. (1966). Very high-speed computing systems. Proceedings of the IEEE.}}
  * {{russell_-_1978_-_the_cray-1_computer_system.pdf|Russell, “The CRAY-1 computer system,” CACM 1978.}}
  * {{00526924.pdf|Peleg and Weiser, “MMX Technology Extension to the Intel Architecture,” IEEE Micro, 1996.}}
  * {{30470407.pdf|Fung, W. W. L., Sham, I., Yuan, G., & Aamodt, T. M. (2007). Dynamic Warp Formation and Scheduling for Efficient GPU Control Flow. Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture.}}
  * {{p34-gurd.pdf|Gurd, J. R., Kirkham, C. C., & Watson, I. (1985). The Manchester prototype dataflow computer. Commun. ACM.}}
===== Module 5-2 =====
**Required:**
  * {{dennis74.pdf|Dennis and Misunas, “A preliminary architecture for a basic data flow processor,” ISCA 1974.}} **[Review Required] due 11/8**
  * {{06045685.pdf|Keckler et al., “GPUs and the Future of Parallel Computing,” IEEE Micro 2011.}} **[Review Required] due 11/8**
  * {{:arvind90.pdf|Arvind and Nikhil, "Executing a program on the MIT tagged-token dataflow architecture," IEEE TC 1990.}}
  * {{:patt85.pdf|Patt et al., "HPS, a new microarchitecture: rationale and introduction," MICRO 1985.}}
  * {{:patt85-hpsissues.pdf|Patt et al., "Critical issues regarding HPS, a high performance microarchitecture," MICRO 1985.}}

**Mentioned during module:**

  * {{:gurd95.pdf|Gurd et al., "The Manchester prototype dataflow computer," CACM 1985.}}
  * {{:lee_dataflow94.pdf|Lee and Hurson, "Dataflow Architectures and Multithreading," IEEE Computer 1994.}}
  * {{:sankaralingam_itdlp03.pdf|Sankaralingam et al., “Exploiting ILP, TLP and DLP with the Polymorphous TRIPS Architecture,” ISCA 2003.}}
  * {{:burger_edge04.pdf|Burger et al., “Scaling to the End of Silicon with EDGE Architectures,” IEEE Computer 2004.}}
  * {{:dennis74.pdf|Dennis and Misunas, "A preliminary architecture for a basic data flow processor," ISCA 1974.}}
  * {{:treleaven82.pdf|Treleaven et al., “Data-Driven and Demand-Driven Computer Architecture,” ACM Computing Surveys 1982.}}
  * {{:veen86.pdf|Veen, “Dataflow Machine Architecture,” ACM Computing Surveys 1986.}}
  * {{:hwu86-hpsm.pdf|Hwu and Patt, “HPSm, a high performance restricted data flow architecture having minimal functionality,” ISCA 1986.}}
===== Module 5-3 =====
**Required:**
  * {{kung_-_1982_-_why_systolic_architectures.pdf|Kung, H. T. (1982). Why Systolic Architectures? IEEE Computer.}} **[due 11/11]**

**Mentioned during module:**

  * {{fisher_-_1983_-_very_long_instruction_word_architectures_and_the_eli-512.pdf|Fisher, J. A. (1983). Very Long Instruction Word architectures and the ELI-512. Proceedings of the 10th annual international symposium on Computer architecture.}}
  * {{smith-1982-decoupled-access-execute-computer-architectures.pdf|Smith, J. E. (1982). Decoupled access/execute computer architectures. Proceedings of the 9th annual symposium on Computer Architecture.}}
  * {{p289-smith.pdf|Smith, J. E. (1984). Decoupled access/execute computer architectures. ACM Trans. Comput. Syst.}}
  * {{p199-smith.pdf|Smith, J. E., Dermer, G. E., Vanderwarn, B. D., Klinger, S. D., & Rozewski, C. M. (1987). The ZS-1 central processor. Proceedings of the second international conference on Architectural support for programming languages and operating systems.}}
  * {{00030730.pdf|Smith, J. E. (1989). Dynamic instruction scheduling and the Astronautics ZS-1. IEEE Computer.}}
  * {{annaratone_et_al._-_1986_-_warp_architecture_and_implementation.pdf|Annaratone, M., Arnould, E., Gross, T., Kung, H. T., & Lam, M. S. (1986). Warp architecture and implementation. Proceedings of the 13th annual international symposium on Computer architecture.}}
  * {{annaratone_et_al._-_1987_-_the_warp_computer_architecture_implementation_and_performance.pdf|Annaratone, M., Arnould, E., Gross, T., Kung, H. T., & Lam, M. (1987). The warp computer: Architecture, implementation, and performance. IEEE Transactions on Computers.}}
  * {{HPL-92-132.pdf|Rau and Fisher, “Instruction-level parallel processing: history, overview, and perspective,” Journal of Supercomputing, 1993.}}
  * {{00964443.pdf|Faraboschi et al., “Instruction Scheduling for Instruction Level Parallel Processors,” Proc. IEEE, Nov. 2001.}}


===== Module 6-1 =====

**Review Required:**
  * {{:smith78_hep.pdf|Smith, “A pipelined, shared resource MIMD computer,” ICPP 1978.}} **[due 11/13]**
  * {{:tullsen96_smt.pdf|Tullsen et al., “Exploiting choice: instruction fetch and issue on an implementable simultaneous multithreading processor,” ISCA 1996.}} **[due 11/13]**

**Recommended:**
  * {{hep_arch.pdf|Burton J. Smith, "Architecture and applications of the HEP multiprocessor computer system."}}

**Mentioned during module:**
  * {{:spracklen05_mt.pdf|Spracklen and Abraham, “Chip Multithreading: Opportunities and Challenges,” HPCA Industrial Session 2005.}}
  * {{:kalla04_power5.pdf|Kalla et al., “IBM Power5 Chip: A Dual-Core Multithreaded Processor,” IEEE Micro 2004.}}
  * {{:eyerman07_mlp.pdf|Eyerman and Eeckhout, “A Memory-Level Parallelism Aware Fetch Policy for SMT Processors,” HPCA 2007.}}
  * {{:hirata92_smt.pdf|Hirata et al., “An Elementary Processor Architecture with Simultaneous Instruction Issuing from Multiple Threads,” ISCA 1992.}}
  * {{:gabor_fairthru06.pdf|Gabor et al., “Fairness and Throughput in Switch on Event Multithreading,” MICRO 2006.}}
  * {{:agarwal90-april.pdf|Agarwal et al., “APRIL: A Processor Architecture for Multiprocessing,” ISCA 1990.}}
  * {{:kim10-tcm.pdf|Kim et al., “Thread Cluster Memory Scheduling,” MICRO 2010.}}
  * {{:kim11-tcm.pdf|Kim et al., “Thread Cluster Memory Scheduling,” IEEE Micro 2011.}}
  * {{:ausavarungnirun12-sms.pdf|Ausavarungnirun et al., “Staged memory scheduling: achieving high performance and scalability in heterogeneous systems,” ISCA 2012.}}
  * {{:ebrahimi11-parallel.pdf|Ebrahimi et al., “Parallel Application Memory Scheduling,” MICRO 2011.}}
  * {{:meza12-timber.pdf|Meza et al., “Enabling Efficient and Scalable Hybrid Memories Using Fine-Granularity DRAM Cache Management,” IEEE CAL 2012.}}
  * {{:yoon12-rbla.pdf|Yoon et al., “Row Buffer Locality Aware Caching Policies for Hybrid Memories,” ICCD 2012.}}
  * {{:thornton_cdc6600.pdf|Thornton, "Design of a Computer: The Control Data 6600," 1970.}}
  * {{:thornton_parallelcd64.pdf|Thornton, "Parallel Operation in the Control Data 6600," AFIPS 1964.}}
  * {{:mcnairy_montecito05.pdf|McNairy and Bhatia, “Montecito: A Dual-Core, Dual-Thread Itanium Processor,” IEEE Micro 2005.}}

=====Module 6-2=====

**Mentioned during module:**
  * {{:yamamoto_perfest94.pdf|Yamamoto et al., “Performance Estimation of Multistreamed, Superscalar Processors,” HICSS 1994.}}
  * {{:tullsen_simulmthd95.pdf|Tullsen et al., “Simultaneous Multithreading: Maximizing On-Chip Parallelism,” ISCA 1995.}}
  * {{:snavely_symbioticsched00.pdf|Snavely and Tullsen, "Symbiotic Jobscheduling for a Simultaneous Multithreading Processor," ASPLOS 2000.}}
  * {{:jacobsen96-confidence.pdf|Jacobsen et al., "Assigning confidence to conditional branch predictions," MICRO 1996.}}
  * {{:brown_longlatsmt01.pdf|Brown and Tullsen, “Handling Long-latency Loads in a Simultaneous Multithreading Processor,” MICRO 2001.}}
  * {{:elmoursy_frontendsmt03.pdf|El-Moursy and Albonesi, “Front-End Policies for Improved Issue Efficiency in SMT Processors,” HPCA 2003.}}
  * {{:raasch_resourcepartsmt03.pdf|Raasch and Reinhardt, “The Impact of Resource Partitioning on SMT Processors,” PACT 2003.}}
  * {{:eyerman07_mlp.pdf|Eyerman and Eeckhout, “A Memory-Level Parallelism Aware Fetch Policy for SMT Processors,” HPCA 2007.}}
  * {{:ramirez_runaheadsmt08.pdf|Ramirez et al., “Runahead Threads to Improve SMT Performance,” HPCA 2008.}}
  * {{:vancraeynest09-mlprunahead.pdf|Van Craeynest et al., "MLP-Aware Runahead Threads in a Simultaneous Multithreading Processor," HiPEAC 2009.}}
  * {{:kalla04_power5.pdf|Kalla et al., “IBM Power5 Chip: A Dual-Core Multithreaded Processor,” IEEE Micro 2004.}}
  * {{:lebeck02-wib.pdf|Lebeck et al., "A Large, Fast Instruction Window for Tolerating Cache Misses," ISCA 2002.}}
  * {{:marr_hyperthread02.pdf|Marr et al., “Hyper-Threading Technology Architecture and Microarchitecture,” Intel Technology Journal 2002.}}

=====Module 6-3=====

**Review Required:**
  * {{:chappell_ssmt99.pdf|Chappell et al., “Simultaneous Subordinate Microthreading (SSMT),” ISCA 1999.}} **[due 11/15]**
  * {{:snavely_symbioticsched00.pdf|Snavely and Tullsen, “Symbiotic Jobscheduling for a Simultaneous Multithreading Processor,” ASPLOS 2000.}} **[due 11/15]**

**Mentioned during module:**
  * {{:reinhardt_faultdetectsmt00.pdf|Reinhardt and Mukherjee, “Transient Fault Detection via Simultaneous Multithreading,” ISCA 2000.}}
  * {{:rotenberg99-ar-smt.pdf|Rotenberg, "AR-SMT: a microarchitectural approach to fault tolerance in microprocessors," Fault-Tolerant Computing 1999.}}
  * {{:mukherjee_redunmt02.pdf|Mukherjee et al., “Detailed Design and Evaluation of Redundant Multithreading Alternatives,” ISCA 2002.}}
  * {{:kessler99-alpha21264.pdf|Kessler, “The Alpha 21264 Microprocessor,” IEEE Micro 1999.}}
  * {{:austin_diva99.pdf|Austin, “DIVA: A Reliable Substrate for Deep Submicron Microarchitecture Design,” MICRO 1999.}}
  * {{:qureshi_faulttol05.pdf|Qureshi et al., “Microarchitecture-Based Introspection: A Technique for Transient-Fault Tolerance in Microprocessors,” DSN 2005.}}
  * {{:zilles_exception99.pdf|Zilles et al., “The use of multithreading for exception handling,” MICRO 1999.}}
  * {{:dubois_assisted98.pdf|Dubois and Song, “Assisted Execution,” USC Tech Report 1998.}}
  * {{:chappell02-prediction.pdf|Chappell et al., "Difficult-path branch prediction using subordinate microthreads," ISCA 2002.}}
  * {{:zilles_specprediction01.pdf|Zilles and Sohi, “Execution-based Prediction Using Speculative Slices,” ISCA 2001.}}

=====Module 6-4=====

**Mentioned during module:**
  * {{:colohan00.pdf|Colohan et al., “A Scalable Approach to Thread-Level Speculation,” ISCA 2000.}}
  * {{:sundaramoorthy_slipstream00.pdf|Sundaramoorthy et al., “Slipstream Processors: Improving both Performance and Fault Tolerance,” ASPLOS 2000.}}
  * {{:zhou_scaleinstwindow00.pdf|Zhou, “Dual-Core Execution: Building a Highly Scalable Single-Thread Instruction Window,” PACT 2005.}}

=====Module 7-1=====

**Review Required:**
  * {{00476078.pdf|Smith, J. E., & Sohi, G. S. (1995). The microarchitecture of superscalar processors. Proceedings of the IEEE.}} **[due 11/18]**

**Mentioned during module:**
  * {{00004607.pdf|Smith, J. E., & Pleszkun, A. R. (1988). Implementing precise interrupts in pipelined processors. IEEE Transactions on Computers.}}
  * {{p18-hwu.pdf|Hwu, W. W., & Patt, Y. N. (1987). Checkpoint repair for out-of-order execution machines. Proceedings of the 14th annual international symposium on Computer architecture.}}
  * {{patt_hwu_shebanow_-_1985_-_hps_a_new_microarchitecture_rationale_and_introduction.pdf|Patt, Y. N., Hwu, W. M., & Shebanow, M. (1985). HPS, a new microarchitecture: rationale and introduction. Proceedings of the 18th annual workshop on Microprogramming.}}
  * {{tomasulo_-_1967_-_an_efficient_algorithm_for_exploiting_multiple_arithmetic_units.pdf|Tomasulo, R. M. (1967). An Efficient Algorithm for Exploiting Multiple Arithmetic Units. IBM Journal of Research and Development.}}
  * {{p109-patt.pdf|Patt, Y. N., Melvin, S. W., Hwu, W. M., & Shebanow, M. C. (1985). Critical issues regarding HPS, a high performance microarchitecture. Proceedings of the 18th annual workshop on Microprogramming.}}
  * {{p248-sazeides.pdf|Sazeides, Y., & Smith, J. E. (1997). The predictability of data values. Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture.}}

=====Module 7-2=====

**Review Required:**
  * {{kessler_-_1999_-_the_alpha_21264_microprocessor.pdf|Kessler, R. E. (1999). The Alpha 21264 Microprocessor. IEEE Micro.}} **[due 11/20]**
  * {{complexityeffectivesuperscalar.pdf|Palacharla et al., "Complexity-Effective Superscalar Processors," ISCA 1997.}} **[due 11/27]**


**Mentioned during module:**
  * {{:moshovos_datadep97.pdf|Moshovos et al., “Dynamic Speculation and Synchronization of Data Dependences,” ISCA 1997.}}
  * {{:chrysos_memorydependence98.pdf|Chrysos and Emer, “Memory Dependence Prediction using Store Sets,” ISCA 1998.}}
  * {{hinton_et_al._-_2001_-_the_microarchitecture_of_the_pentium_4_processor.pdf|Hinton, G., Sager, D., Upton, M., Boggs, D., Carmean, D., Kyker, A., & Roussel, P. (2001). The Microarchitecture of the Pentium 4 Processor. Intel Technology Journal.}}
  * {{00491460.pdf|Yeager, K. C. (1996). The MIPS R10000 Superscalar Microprocessor. IEEE Micro.}}
  * {{:tendler_power4.pdf|Tendler et al., "POWER4 system microarchitecture," IBM J R&D, 2002.}}
  * {{p34-gurd.pdf|Gurd, J. R., Kirkham, C. C., & Watson, I. (1985). The Manchester prototype dataflow computer. Commun. ACM.}}

=====Module 7-3=====

**Review Required:**
  * {{hwu_jsuper93.pdf|Hwu et al., "The Superblock: An Effective Technique for VLIW and Superscalar Compilation," Journal of Supercomputing, 1993.}} **[due 11/22]**

**Mentioned during module:**
  * {{kung_-_1982_-_why_systolic_architectures.pdf|Kung, H. T. (1982). Why Systolic Architectures? IEEE Computer.}}
  * {{smith-1982-decoupled-access-execute-computer-architectures.pdf|Smith, J. E. (1982). Decoupled access/execute computer architectures. Proceedings of the 9th annual symposium on Computer Architecture.}}
  * {{p289-smith.pdf|Smith, J. E. (1984). Decoupled access/execute computer architectures. ACM Trans. Comput. Syst.}}
  * {{fisher_trace.pdf|Fisher, "Trace scheduling: A technique for global microcode compaction," IEEE TC 1981.}}
  * {{mahlke_hyperblock.pdf|Mahlke et al., "Effective Compiler Support for Predicated Execution Using the Hyperblock," MICRO 1992.}}
  * {{melvin_patt_eis_block.pdf|Melvin and Patt, "Enhancing Instruction Scheduling with a Block-Structured ISA," IJPP 1995.}}
  * {{hao_fetchrate.pdf|Hao et al., "Increasing the Instruction Fetch Rate via Block-Structured Instruction Set Architectures," MICRO 1996.}}
  * {{huck_ia64.pdf|Huck et al., "Introducing the IA-64 Architecture," IEEE Micro 2000.}}

=====Module 7-4=====

**Review Required:**
  * {{mcfarling1993.pdf|S. McFarling, "Combining branch predictors," WRL Technical Note TN-36 (1993).}} **[due 12/2]**

**Mentioned during module:**
  * {{:pldi93.pdf|Ball and Larus, "Branch Prediction for Free," PLDI 1993.}}
  * {{p135-smith.pdf|Smith, J. E. (1981). A study of branch prediction strategies. Proceedings of the 8th annual symposium on Computer Architecture.}}
  * {{yeh_patt_-_1991_-_two-level_adaptive_training_branch_prediction.pdf|Yeh, T.-Y., & Patt, Y. N. (1991). Two-level adaptive training branch prediction. Proceedings of the 24th annual international symposium on Microarchitecture.}}
  * {{:kessler99-alpha21264.pdf|Kessler, "The Alpha 21264 Microprocessor," IEEE Micro 1999.}}
  * {{p22-chang.pdf|Chang, P.-Y., Hao, E., Yeh, T.-Y., & Patt, Y. (1994). Branch classification: a new mechanism for improving branch predictor performance. Proceedings of the 27th annual international symposium on Microarchitecture.}}
  * {{24400043.pdf|Kim, H., Mutlu, O., Stark, J., & Patt, Y. N. (2005). Wish Branches: Combining Conditional Branching and Predication for Adaptive Predicated Execution. Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture.}}
  * {{kim_micro06.pdf|Kim et al., "Diverge-Merge Processor (DMP): Dynamic Predicated Execution of Complex Control-Flow Graphs Based on Frequently Executed Paths," MICRO 2006.}}
  * {{Riseman.1972.TC.pdf|Riseman and Foster, "The Inhibition of Potential Parallelism by Conditional Jumps," IEEE Transactions on Computers 1972.}}
  * {{Chang.1997.ISCA.pdf|Chang, Hao, and Patt, "Target Prediction for Indirect Jumps," ISCA 1997.}}

=====Module 7-5=====

**Review Required:**
  * {{mutlu_ieee_micro03.pdf|Onur Mutlu, Jared Stark, Chris Wilkerson, and Yale N. Patt, "Runahead Execution: An Effective Alternative to Large Instruction Windows," IEEE Micro Top Picks 2003.}} **[due 12/4]**
  * {{mutlu_ieee_micro06.pdf|Onur Mutlu, Hyesoon Kim, and Yale N. Patt, "Efficient Runahead Execution: Power-Efficient Memory Latency Tolerance," IEEE Micro Top Picks 2006.}} **[due 12/4]**
  * {{mutlu_micro05.pdf|Onur Mutlu, Hyesoon Kim, and Yale N. Patt, "Address-Value Delta (AVD) Prediction: Increasing the Effectiveness of Runahead Execution by Exploiting Regular Memory Allocation Patterns," MICRO 2005.}} **[Optional, due 12/6]**

**Mentioned during module:**
  * {{:mutlu_runahead.pdf|Mutlu et al., “Runahead Execution: An Alternative to Very Large Instruction Windows for Out-of-order Processors,” HPCA 2003.}}
  * {{wrongpath.pdf|Armstrong et al., "Wrong Path Events: Exploiting Unusual and Illegal Program Behavior for Early Misprediction Detection and Recovery," MICRO 2004.}}