===== Paper Reviews and Discussion =====

Post your reviews for the required readings in the paper review system.

===== For Lecture 1 =====

== Required Readings ==
  * {{:lecture1-amdahl.pdf|G. M. Amdahl, "Validity of the single processor approach to achieving large scale computing capabilities," AFIPS Conference, April 1967.}}
  * {{:lecture1-moore.1965.electronics.pdf|G. E. Moore, "Cramming more components onto integrated circuits," Electronics, April 1965.}}
  * {{:lecture1-ronen.2001.ieee.pdf|Ronen et al., "Coming Challenges in Microarchitecture and Architecture," Proceedings of the IEEE, vol. 89, no. 11, 2001.}}
  * {{:lecture1-requirementsbottlenecksandgoodfortune-patt.pdf|Y. N. Patt, "Requirements, bottlenecks, and good fortune: agents for microprocessor evolution," Proceedings of the IEEE, vol. 89, no. 11, 2001.}}

===== For Lecture 2 =====

== Required Reading ==
  * {{instructionssetsandbeyond.pdf|Colwell et al., "Instruction Sets and Beyond: Computers, Complexity, and Controversy," Computer, September 1985.}}

== Suggested Readings ==

On-Chip Networks
  * {{routepacketsnotwires.pdf|Dally et al., "Route Packets, Not Wires: On-Chip Interconnection Networks," DAC, June 2001.}}
  * {{onchipinterconnectionarchitectureoftileprocessor.pdf|Wentzlaff et al., "On-Chip Interconnection Architecture of the Tile Processor," Micro, IEEE, November 2007.}}
  * {{preemptivevirtualclock.pdf|Grot et al., "Preemptive Virtual Clock: A Flexible, Efficient, and Cost-effective QOS Scheme for Networks-on-a-Chip," Micro, December 2009.}}

Main Memory Controllers
  * {{memoryperformanceattacks.pdf|Moscibroda et al., "Memory performance attacks: Denial of memory service in multi-core systems," Usenix Security Symposium, 2007.}}
  * {{memoryaccessscheduling.pdf|Rixner et al., "Memory Access Scheduling," ISCA, 2000.}}

Architecture Reference Manuals
  * [[http://www.bitsavers.org/pdf/dec/vax/VAX_archHbkVol1_1977.pdf|Digital Equipment Corp., “VAX11 780 Architecture Handbook,” 1977-78]]
  * [[http://www.intel.com/products/processor/manuals/|Intel Corp., “Intel 64 and IA-32 Architectures Software Developer’s Manual”]]

Compilers
  * {{compilersandcomputerarchitecture.pdf|Wulf, "Compilers and Computer Architecture," IEEE Computer, 1981.}}

===== For Lecture 3 =====

== Required Reading ==
  * {{TransactionalMemory.pdf|Herlihy et al., "Transactional Memory: Architectural Support for Lock-free Data Structures," ISCA, 1993.}}

== Suggested Readings ==
  * {{electroniccomputingvonneumann.pdf|Burks, Goldstine, and von Neumann, "Preliminary discussion of the logical design of an electronic computing instrument," Institute for Advanced Study, 1946.}}

===== For Lecture 4 =====

== Required Reading ==
  * {{pipelinedmimdcomputer.pdf|Burton Smith, "A pipelined, shared resource MIMD computer," ICPP 1978.}}
  * {{onchipopticaltechnology.pdf|Kirman et al., "On-Chip Optical Technology in Future Bus-Based Multicore Designs," IEEE Micro Top Picks 2007.}}

== Suggested Readings ==
  * {{predictabilityofdatavalues.pdf|Yiannakis Sazeides, James E. Smith: "The Predictability of Data Values," MICRO 1997: 248-258}}
  * {{valuelocalityandloadvalueprediction.pdf|Mikko H. Lipasti, Christopher B. Wilkerson, John Paul Shen: "Value Locality and Load Value Prediction," ASPLOS 1996: 138-147}}

===== For Lecture 5 =====

== Required Reading ==
  * {{implementingpreciseinterrupts.pdf|Smith and Pleszkun, “Implementing Precise Interrupts in Pipelined Processors,” IEEE Trans on Computers 1988 and ISCA 1985}}
  * {{microarchitectureofsuperscalar.pdf|Smith and Sohi, "The Microarchitecture of Superscalar Processors," Proc IEEE 1995}}

== Suggested Readings ==
  * {{checkpointrepairforoutoforder.pdf|Hwu and Patt, "Checkpoint Repair for Out-of-order Execution Machines," ISCA 1987}}

===== For Lecture 6 =====

== Required Reading ==
  * {{virtualmemory.pdf|Jacob and Mudge, "Virtual Memory in Contemporary Microprocessors," IEEE Micro, vol. 18, no. 4, 1998}}

===== For Lecture 7 =====

  * Hennessy and Patterson, Sections 2.1-2.10 (inclusive)

== Modern Designs - Required Readings ==
  * {{onpipeliningdynamicinstructionschedulinglogic.pdf|Stark, Brown, Patt, “On pipelining dynamic instruction scheduling logic,” MICRO 2000}}
  * {{Themicroarchitectureofthepentium4processor.pdf|Boggs et al., “The microarchitecture of the Pentium 4 processor,” Intel Technology Journal, 2001}}
  * {{21264microprocessor.pdf|Kessler, “The Alpha 21264 microprocessor,” IEEE Micro, March-April 1999}}
  * {{themipsr10000superscalarmicroprocessor.pdf|Yeager, “The MIPS R10000 Superscalar Microprocessor,” IEEE Micro, April 1996}}

== Seminal Papers - Recommended Readings ==
  * {{hps.pdf|Patt, Hwu, Shebanow, “HPS, a new microarchitecture: rationale and introduction,” MICRO 1985}}
  * {{criticalissuesregardinghps.pdf|Patt et al., “Critical issues regarding HPS, a high performance microarchitecture,” MICRO 1985}}
  * {{ibmsystem360.pdf|Anderson, Sparacio, Tomasulo, “The IBM System/360 Model 91: Machine Philosophy and Instruction Handling,” IBM Journal of R&D, Jan. 1967}}
  * {{anefficientalgorithmforexploitingmultiplearithmeticunits.pdf|Tomasulo, “An Efficient Algorithm for Exploiting Multiple Arithmetic Units,” IBM Journal of R&D, Jan. 1967}}

===== For Lecture 9 =====

== Required Readings ==
  * {{improvingdirectmappedcacheperformance.pdf|Jouppi, “Improving Direct-Mapped Cache Performance by the Addition of a Small Fully-Associative Cache and Prefetch Buffers,” ISCA 1990}}
  * {{acaseformlpawarecachereplacement.pdf|Qureshi et al., “A Case for MLP-Aware Cache Replacement,” ISCA 2006}}
  * {{slavememoriesanddynamicstorageallocation.pdf|Wilkes, “Slave Memories and Dynamic Storage Allocation,” IEEE Trans. On Electronic Computers, 1965}}

===== For Lecture 10 =====

== Required Readings ==
  * {{runaheadexecution.pdf|Mutlu et al., "Runahead Execution: An Alternative to Very Large Instruction Windows for Out-of-order Processors," HPCA 2003}}
  * {{dualcoreexecution.pdf|Zhou, "Dual-Core Execution: Building a Highly Scalable Single-Thread Instruction Window," PACT 2005}}
  * {{efficientrunaheadexecution.pdf|Mutlu et al., "Efficient Runahead Execution: Power-Efficient Memory Latency Tolerance," IEEE Micro Top Picks 2006}}

== Suggested Readings ==
  * {{memorydependenceprediction.pdf|Chrysos and Emer, "Memory Dependence Prediction Using Store Sets," ISCA 1998}}

===== For Lecture 11 =====

== Required Readings ==
  * Hennessy and Patterson, Appendix C.1-C.3
  * {{improvingdirectmappedcacheperformance.pdf|Jouppi, “Improving Direct-Mapped Cache Performance by the Addition of a Small Fully-Associative Cache and Prefetch Buffers,” ISCA 1990}}
  * {{acaseformlpawarecachereplacement.pdf|Qureshi et al., “A Case for MLP-Aware Cache Replacement,” ISCA 2006}}

== Suggested Readings ==
  * {{twowayskewedassociativecaches.pdf|Seznec, "A Case for Two-way Skewed Associative Caches," ISCA 1993}}
  * {{cacheconsciousstructuredefinition.pdf|Chilimbi et al., "Cache-conscious Structure Definition," PLDI 1999}}
  * {{cacheconsciousstructurelayout.pdf|Chilimbi et al., "Cache-conscious Structure Layout," PLDI 1999}}

===== For Lecture 13 =====

== Required Readings ==
  * {{codetransformationsformlp.pdf|Pai et al., "Code Transformations to Improve Memory Parallelism," MICRO 1999}}

== Recommended Readings ==
  * {{datacachesforsuperscalar.pdf|Juan et al., "Data Caches for Superscalar Processors," ICS 1997}}

===== For Lecture 14 =====

== Required Readings ==
  * {{markovpredictors.pdf|Joseph and Grunwald, "Prefetching using Markov Predictors," ISCA 1997}}

== Recommended Readings ==
  * {{compileralgorithmforprefetching.pdf|Mowry et al., "Design and Evaluation of a Compiler Algorithm for Prefetching," ASPLOS 1992}}
  * {{feedbackdirectedprefetching.pdf|Srinath et al., "Feedback Directed Prefetching: Improving the Performance and Bandwidth-Efficiency of Hardware Prefetchers", HPCA 2007}}
  * {{runaheadexecution.pdf|Mutlu et al., "Runahead Execution: An Alternative to Very Large Instruction Windows for Out-of-order Processors," HPCA 2003}}

===== For Lecture 15 =====

Same as previous lecture

===== For Lecture 16 =====

== Recommended Readings ==
  * {{statelesscontentdirectedprefetching.pdf|Cooksey et al., "A stateless, content-directed data prefetching mechanism," ASPLOS 2002}}
  * {{bandwidthefficientprefetching.pdf|Ebrahimi et al., "Techniques for Bandwidth-Efficient Prefetching of Linked Data Structures in Hybrid Prefetching Systems," HPCA 2009}}
  * {{softwarecontrolledpreexecution.pdf|Luk, "Tolerating Memory Latency through Software-Controlled Pre-Execution in Simultaneous Multithreading Processors," ISCA 2001}}

===== Guest Lecture by Thomas Moscibroda =====

== Recommended Readings ==
  * {{bless.pdf|Moscibroda and Mutlu, "A Case for Bufferless Routing in On-Chip Networks", ISCA 2009}}
  * {{appawareprioritizationmechanismfornocs.pdf|Das et al., "Application-Aware Prioritization Mechanism for On-Chip Networks", MICRO 2009}}
  * {{aergia.pdf|Das et al., "Aergia: Exploiting Packet-Latency Slack in On-Chip Networks", ISCA 2010}}
  * {{nextgenerationnoc.pdf|Nychis et al., "Next Generation On-Chip Networks: What Kind of Congestion Control do we Need?", Hotnets 2010}}

===== For Lecture 17 =====

== Recommended Readings ==
  * {{coordinatedprefetchermanagement.pdf|Ebrahimi et al., "Coordinated Management of Multiple Prefetchers in Multi-Core Systems," MICRO 2009}}

===== For Lecture 18 =====

== Required Readings ==
  * {{utilitybasedcachepartitioning.pdf|Qureshi and Patt, "Utility-Based Cache Partitioning: A Low-Overhead, High-Performance, Runtime Mechanism to Partition Shared Caches," MICRO 2006}}

== Recommended Readings ==
  * {{gaininginsightsintocachepartitioning.pdf|Lin et al., "Gaining Insights into Multi-Core Cache Partitioning: Bridging the Gap between Simulation and Real Systems," HPCA 2008}}
  * {{adaptiveinsertionpolicies.pdf|Qureshi et al., "Adaptive Insertion Policies for High-Performance Caching," ISCA 2007}}

===== For Lecture 19 =====

== Required Readings ==
  * {{parbs.pdf|Mutlu and Moscibroda, "Parallelism-Aware Batch Scheduling: Enabling High-Performance and Fair Memory Controllers," IEEE Micro Top Picks 2009}}
  * {{stalltimefairmemoryscheduling.pdf|Mutlu and Moscibroda, "Stall-Time Fair Memory Access Scheduling for Chip Multiprocessors," MICRO 2007}}

== Recommended Readings ==
  * {{permutationbasedpageinterleaving.pdf|Zhang et al., "A Permutation-based Page Interleaving Scheme to Reduce Row-buffer Conflicts and Exploit Data Locality," MICRO 2000}}
  * {{prefetchawaredramcontrollers.pdf|Lee et al., "Prefetch-Aware DRAM Controllers," MICRO 2008}}
  * {{memoryaccessscheduling.pdf|Rixner et al., "Memory Access Scheduling," ISCA 2000}}

===== For Lecture 20 =====

Same as previous lecture

===== For Lecture 21 =====

== Required Readings ==
  * {{evaluationoftracecachefetchmechanisms.pdf|Patel et al., "Evaluation of design options for the trace cache fetch mechanism," IEEE TC 1999}}
  * {{complexityeffectivesuperscalar.pdf|Palacharla et al., "Complexity Effective Superscalar Processors," ISCA 1997}}

== Required Readings (old) ==
  * {{microarchitectureofsuperscalar.pdf|Smith and Sohi, "The Microarchitecture of Superscalar Processors," Proc IEEE 1995}}
  * {{onpipeliningdynamicinstructionschedulinglogic.pdf|Stark, Brown, Patt, "On pipelining dynamic instruction scheduling logic," MICRO 2000}}
  * {{Themicroarchitectureofthepentium4processor.pdf|Boggs et al., "The microarchitecture of the Pentium 4 processor," Intel Technology Journal, 2001}}
  * {{21264microprocessor.pdf|Kessler, "The Alpha 21264 microprocessor," IEEE Micro, March-April 1999}}

== Recommended Readings ==
  * {{tracecache.pdf|Rotenberg et al., "Trace Cache: a Low Latency Approach to High Bandwidth Instruction Fetching," MICRO 1996}}

===== For Lecture 21 =====

Same as previous lecture

===== For Lecture 22 =====

Same as previous lecture

===== For Lecture 23 =====

Same as previous lecture

===== For Lecture 24 =====

== Required Readings ==
  * {{conbiningbranchpredictors.pdf|McFarling, "Combining Branch Predictors," DEC WRL TR, 1993}}
  * {{increasingprocessorperformance.pdf|Carmean and Sprangle, "Increasing Processor Performance by Implementing Deeper Pipelines," ISCA 2002}}

== Recommended Readings ==
  * {{analysisofcorrelationandpredictability.pdf|Evers et al., "An Analysis of Correlation and Predictability: What Makes Two-Level Branch Predictors Work," ISCA 1998}}
  * {{alternativeimplementationoftwolevelbp.pdf|Yeh and Patt, "Alternative Implementations of Two-Level Adaptive Branch Prediction," ISCA 1992}}
  * {{availableilpforsuperscalar.pdf|Jouppi and Wall, "Available instruction-level parallelism for superscalar and superpipelined machines," ASPLOS 1989}}
  * {{divergemergeprocessors.pdf|Kim et al., "Diverge-Merge Processor (DMP): Dynamic Predicated Execution of Complex Control-Flow Graphs Based on Frequently Executed Paths," MICRO 2006}}
  * {{dynamicbranchpredictionwithperceptrons.pdf|Jimenez and Lin, "Dynamic Branch Prediction with Perceptrons," HPCA 2001}}

===== For Lecture 25 =====

Same as previous lecture

===== For Lecture 26 =====

=== Control Flow III ===

== Recommended Readings ==
  * {{wishbranches.pdf|Kim et al., "Wish Branches: Enabling Adaptive and Aggressive Predicated Execution," IEEE Micro Top Picks, Jan/Feb 2006}}
  * {{divergemergeprocessors.pdf|Kim et al., "Diverge-Merge Processor: Generalized and Energy-Efficient Dynamic Predication," IEEE Micro Top Picks, Jan/Feb 2007}}

=== Alternative Approaches to Concurrency ===

== Required Readings ==
  * {{vliweli.pdf|Fisher, "Very Long Instruction Word architectures and the ELI-512," ISCA 1983}}
  * {{introducingia64.pdf|Huck et al., "Introducing the IA-64 Architecture," IEEE Micro 2000}}

== Recommended Readings ==
  * {{cray1computersystem.pdf|Russell, "The CRAY-1 computer system," CACM 1978}}
  * {{ilpprocessing.pdf|Rau and Fisher, "Instruction-level parallel processing: history, overview, and perspective," Journal of Supercomputing, 1993}}
  * {{instructionschedulingforilpprocessors.pdf|Faraboschi et al., "Instruction Scheduling for Instruction Level Parallel Processors," Proc. IEEE, Nov. 2001}}

===== For Lecture 26 =====

Same as previous lecture (Alternative Approaches to Concurrency)

===== For Lecture 27 =====

== Required Readings ==
  * {{nvidiatesla.pdf|Lindholm et al., "NVIDIA Tesla: A Unified Graphics and Computing Architecture," IEEE Micro 2008}}
  * {{cray1computersystem.pdf|Russell, "The CRAY-1 computer system," CACM 1978}}

== Recommended Readings ==
  * {{dynamicwarpformation.pdf|Fung et al., "Dynamic Warp Formation and Scheduling for Efficient GPU Control Flow," MICRO 2007}}
  * {{qilin.pdf|Luk et al., "Qilin: Exploiting Parallelism on Heterogeneous Multiprocessors with Adaptive Mapping," MICRO 2009}}