Readings

Lecture 1 (Wed, Jan 18, 2012)

Required:

  • (none)

Mentioned/Recommended during lecture:

  • Joseph Fisher's ISCA 1983 paper was mentioned in lecture today during the discussion of vector machines and VLIW. For those who are curious, the original paper can be found here.
  • Moscibroda and Mutlu, “Memory Performance Attacks: Denial of memory service in multi-core systems,” USENIX Security 2007. pdf
  • Pettis and Hansen, “Profile Guided Code Positioning,” PLDI 1990. pdf

Lecture 2 (Mon, Jan 23, 2012)

Required:

  • Patt, “Requirements, bottlenecks, and good fortune: agents for microprocessor evolution,” Proceedings of the IEEE, vol. 89, no. 11, 2001. pdf
  • Patt and Patel, chapter 1 (Fundamentals) (Scanned copy placed on blackboard under assignments)
  • Patterson and Hennessy, chapters 1 and 2 (Intro, Abstractions, ISA, MIPS)

Mentioned/Recommended during lecture:

  • Moore, “Cramming more components onto integrated circuits,” Electronics Magazine, 1965. pdf
  • Book: Kuhn, “The structure of scientific revolutions” (1962).
  • Burks, Goldstine, and von Neumann, “Preliminary discussion of the logical design of an electronic computing instrument,” 1946. pdf

Lecture 3 (Wed, Jan 25, 2012)

Required:

  • Patt and Patel, chapter 4 (The von Neumann Model) (Scanned copy placed on blackboard under assignments)

Mentioned/Recommended during lecture:

  • Digital Equipment Corp., “VAX11 780 Architecture Handbook,” 1977-78. pdf

Lecture 4 (Mon, Jan 30, 2012)

Required:

  • Patterson and Hennessy, chapter 4.1-4.4
  • Patt and Patel, appendix A (The LC-3b ISA) pdf
  • Patt and Patel, appendix C (The Microarchitecture of the LC-3b) pdf

Mentioned during lecture:

  • Henry M. Levy, “Capability-Based Computer Systems,” online book.
  • Dynamic/Static interface: Stephen W. Melvin and Yale Patt, “A Clarification of the Dynamic/Static Interface”, HICSS'87. pdf
  • Wulf, “Compilers and Computer Architecture”. pdf
  • Radin, “The 801 Minicomputer”. pdf
  • Klaiber, “The Technology Behind Crusoe Processors” (Transmeta Whitepaper). pdf

Lecture 5 (Wed, Feb 1, 2012)

Required:

  • Patt and Patel, appendix C (The Microarchitecture of the LC-3b) pdf
  • Patterson and Hennessy, appendix D

Recommended:

  • Maurice Wilkes, “The Best Way to Design an Automatic Calculating Machine,” Manchester Univ. Computer Inaugural Conf., 1951. pdf

Lecture 7 (Wed, Feb 8, 2012)

Required:

  • Patterson & Hennessy, chapter 4.5-4.8

Lecture 9 (Wed, Feb 15, 2012)

Required:

  • Pipelined LC-3b Microarchitecture Handout pdf

Recommended:

  • Hamacher et al. book, Chapter 6, “Pipelining”.

Lecture 11 (Wed, Feb 22, 2012)

Recommended/Mentioned during lecture:

  • Book: “The soul of a new machine”, Tracy Kidder (1981)
  • Smith, “A Study of Branch Prediction Strategies,” ISCA 1981. pdf
  • Yeh and Patt, “Two-Level Adaptive Training Branch Prediction,” MICRO 1991. pdf
  • McFarling, “Combining Branch Predictors,” DEC WRL TR 1993. pdf

Lecture 12 (Mon, Feb 27, 2012)

Required:

  • Patterson & Hennessy, Chapter 4.9-4.11
  • Smith and Sohi, “The Microarchitecture of Superscalar Processors,” Proceedings of the IEEE, 1995. pdf

Recommended/Mentioned during lecture:

  • Kessler, “The Alpha 21264 Microprocessor,” IEEE Micro 1999. pdf
  • Riseman and Foster, “The inhibition of potential parallelism by conditional jumps,” IEEE Transactions on Computers, 1972. pdf
  • Chang et al., “Target prediction for indirect jumps,” ISCA 1997. pdf

Lecture 13 (Wed, Feb 29, 2012)

Recommended/Mentioned during lecture:

  • Smith and Pleszkun, “Implementing Precise Interrupts in Pipelined Processors,” IEEE Trans. on Computers 1988 pdf and ISCA 1985. pdf
  • Hwu and Patt, “Checkpoint Repair for Out-of-order Execution Machines,” ISCA 1987. pdf
  • Tomasulo, “An Efficient Algorithm for Exploiting Multiple Arithmetic Units,” IBM Journal of R&D, Jan. 1967. pdf
  • Patt et al., “HPS, a new microarchitecture: rationale and introduction,” MICRO 1985. pdf
  • Patt et al., “Critical issues regarding HPS, a high performance microarchitecture,” MICRO 1985. pdf

Lecture 15 (Mon, Mar 19, 2012)

Recommended/Mentioned during lecture:

  • Chrysos and Emer, “Memory Dependence Prediction Using Store Sets,” ISCA 1998. pdf
  • Moshovos et al., “Dynamic speculation and synchronization of data dependences,” ISCA 1997. pdf
  • Kessler, “The Alpha 21264 Microprocessor,” IEEE Micro, March-April 1999. pdf
  • Smith and Sohi, “The Microarchitecture of Superscalar Processors,” Proc. IEEE, Dec. 1995. pdf
  • Hwu and Patt, “HPSm, a high performance restricted data flow architecture having minimal functionality,” ISCA 1986. pdf
  • Gurd et al., “The Manchester prototype dataflow computer,” CACM 1985. pdf
  • Mike Flynn, “Very High-Speed Computing Systems,” Proc. of IEEE, 1966. pdf

Lecture 16 (Wed, Mar 21, 2012)

Required:

  • Cache chapters from Patterson & Hennessy: 5.1-5.3
  • Memory/cache chapters from Hamacher et al.: 8.1-8.7 (Scanned copy placed on blackboard under assignments)
  • Wilkes, “Slave Memories and Dynamic Storage Allocation,” IEEE Trans. on Electronic Computers, 1965. pdf

Mentioned/Recommended during lecture:

  • Fisher, “Very Long Instruction Word architectures and the ELI-512,” ISCA 1983. pdf
  • Russell, “The CRAY-1 computer system,” CACM 1978. pdf

Lecture 17 (Mon, Mar 26, 2012)

Mentioned/Recommended during lecture:

  • Liptay, “Structural aspects of the System/360 Model 85, II: The cache,” IBM Systems Journal 7(1), 1968. pdf
  • Qureshi et al., “A Case for MLP-Aware Cache Replacement,” ISCA 2006. pdf

Lecture 18 (Wed, Mar 28, 2012)

Mentioned/Recommended during lecture:

  • Kroft, “Lockup-Free Instruction Fetch/Prefetch Cache Organization,” ISCA 1981. pdf
  • Juan et al., “Data caches for superscalar processors,” ICS 1997. pdf

Lecture 19 (Mon, Apr 2, 2012)

Mentioned/Recommended during lecture:

  • Liu et al., “RAIDR: Retention-Aware Intelligent DRAM Refresh,” ISCA 2012. (on blackboard under assignments - raidr.pdf)
  • B. H. Bloom, “Space/time trade-offs in hash coding with allowable errors,” Communications of the ACM, vol. 13, no. 7, 1970. pdf

Lecture 20 (Wed, Apr 4, 2012)

Required:

  • Mutlu and Moscibroda, “Parallelism-Aware Batch Scheduling: Enhancing both Performance and Fairness of Shared DRAM Systems,” ISCA 2008. pdf

Mentioned/Recommended during lecture:

  • Seshadri et al., “The Evicted-Address Filter: A Unified Mechanism to Address Both Cache Pollution and Thrashing,” CMU Technical Report, 2012 (on blackboard under assignments - eaf-cache.pdf).
  • Moscibroda and Mutlu, “Memory Performance Attacks: Denial of memory service in multi-core systems,” USENIX Security 2007. pdf
  • Mutlu and Moscibroda, “Stall-Time Fair Memory Access Scheduling for Chip Multiprocessors,” MICRO 2007. pdf
  • W. E. Smith, “Various optimizers for single-stage production,” Naval Research Logistics Quarterly, 1956. pdf
  • Muralidhara et al., “Reducing Memory Interference in Multicore Systems via Application-Aware Memory Channel Partitioning,” MICRO 2011. pdf
  • Ebrahimi et al., “Fairness via Source Throttling: A Configurable and High-Performance Fairness Substrate for Multi-Core Memory Systems,” ASPLOS 2010. pdf

Lecture 21 (Mon, Apr 9, 2012)

Required:

  • Section 5.4 in Patterson & Hennessy

Mentioned/Recommended during lecture:

  • Section 8.8 in Hamacher et al.

Lecture 22 (Mon, Apr 16, 2012)

Mentioned/Recommended during lecture:

  • Mowry et al., “Design and Evaluation of a Compiler Algorithm for Prefetching,” ASPLOS 1992. pdf
  • Jouppi, “Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers,” ISCA 1990. pdf
  • Baer and Chen, “An effective on-chip preloading scheme to reduce data access penalty,” SC 1991. pdf
  • Srinath et al., “Feedback Directed Prefetching: Improving the Performance and Bandwidth-Efficiency of Hardware Prefetchers,” HPCA 2007. pdf

Lecture 23 (Wed, Apr 18, 2012)

Required:

  • Mutlu et al., “Runahead Execution: An Alternative to Very Large Instruction Windows for Out-of-order Processors”, HPCA 2003. pdf

Mentioned/Recommended during lecture:

  • Joseph and Grunwald, “Prefetching using Markov Predictors,” ISCA 1997. pdf
  • Cooksey et al., “A stateless, content-directed data prefetching mechanism,” ASPLOS 2002. pdf
  • Ebrahimi et al., “Techniques for Bandwidth-Efficient Prefetching of Linked Data Structures in Hybrid Prefetching Systems,” HPCA 2009. pdf
  • Dubois and Song, “Assisted Execution,” USC Tech Report 1998. pdf
  • Chappell et al., “Simultaneous Subordinate Microthreading (SSMT),” ISCA 1999. pdf
  • Zilles and Sohi, “Execution-based Prediction Using Speculative Slices”, ISCA 2001. pdf
  • Luk, “Tolerating Memory Latency through Software-Controlled Pre-Execution in Simultaneous Multithreading Processors,” ISCA 2001. pdf
  • Zilles and Sohi, “Understanding the backward slices of performance degrading instructions,” ISCA 2000. pdf

Lecture 24 (Mon, Apr 23, 2012)

Required:

  • Amdahl, “Validity of the single processor approach to achieving large scale computing capabilities,” AFIPS 1967. pdf
  • Hill, Jouppi, Sohi, “Multiprocessors and Multicomputers,” pp. 551-560 in Readings in Computer Architecture (PDF in AFS: /afs/ece/class/ece447/readings/hill_551_560.pdf).
  • Hill, Jouppi, Sohi, “Dataflow and Multithreading,” pp. 309-314 in Readings in Computer Architecture (PDF in AFS: /afs/ece/class/ece447/readings/hill_309_314.pdf).
  • Culler and Singh, Parallel Computer Architecture: Chapters 5.1, 5.3. (PDF in AFS: /afs/ece/class/ece447/readings/culler-mesi.pdf)
  • P&H, Chapter 5.8 (pp. 534 – 538).
  • Lamport, “How to Make a Multiprocessor Computer That Correctly Executes Multiprocess Programs,” IEEE Transactions on Computers, 1979. pdf

Recommended:

  • Papamarcos and Patel, “A low-overhead coherence solution for multiprocessors with private cache memories,” ISCA 1984. pdf
  • Mutlu et al., “Efficient Runahead Execution: Power-Efficient Memory Latency Tolerance,” ISCA 2005, IEEE Micro Top Picks 2006. pdf
  • Mutlu et al., “Address-Value Delta (AVD) Prediction,” MICRO 2005. pdf
  • Armstrong et al., “Wrong Path Events,” MICRO 2004. pdf
  • Horner, “A new method of solving numerical equations of all orders, by continuous approximation,” Philosophical Transactions of the Royal Society, 1819. pdf

Lecture 26 (Mon, Apr 30, 2012)

  • Goodman, “Using cache memory to reduce processor-memory traffic,” ISCA 1983. pdf
  • Censier and Feautrier, “A New Solution to Coherence Problems in Multicache Systems,” IEEE Transactions on Computers, Dec. 1978. pdf
  • Janak H. Patel, “Processor-Memory Interconnections for Multiprocessors,” ISCA 1979. pdf
  • Gottlieb et al., “The NYU Ultracomputer: designing a MIMD, shared-memory parallel machine,” ISCA 1982. pdf
  • Thinking Machines Corp., “The Connection Machine CM-5 Technical Summary,” Jan. 1992.
  • Seitz, “The Cosmic Cube,” CACM 1985. pdf

Lecture 27 (Wed, May 2, 2012)

  • Baran, “On Distributed Communications Networks,” IEEE Transactions on Communications Systems, March 1964. pdf
  • Grochowski et al., “Best of both Latency and Throughput,” ICCD 2004. pdf
  • Tendler et al., “POWER4 system microarchitecture,” IBM Journal of R&D, 2002. pdf
  • Kalla et al., “IBM Power5 Chip: A Dual-Core Multithreaded Processor,” IEEE Micro 2004. pdf
  • Kongetira et al., “Niagara: A 32-Way Multithreaded SPARC Processor,” IEEE Micro 2005. pdf
  • Suleman et al., “Accelerating Critical Section Execution with Asymmetric Multi-Core Architectures,” ASPLOS 2009, IEEE Micro Top Picks 2010. pdf
  • Suleman et al., “Data Marshaling for Multi-Core Architectures,” ISCA 2010, IEEE Micro Top Picks 2011. pdf