Readings
Lecture 1 (Wed, Jan 18, 2012)
Required:
- (none)
Mentioned/Recommended during lecture:
- Joseph Fisher's ISCA 1983 paper was mentioned in lecture today during the discussion of vector machines and VLIW. For those who are curious, the original paper can be found here.
- Moscibroda and Mutlu, “Memory Performance Attacks: Denial of memory service in multi-core systems,” USENIX Security 2007. pdf
- Pettis and Hansen, “Profile Guided Code Positioning,” PLDI 1990. pdf
Lecture 2 (Mon, Jan 23, 2012)
Required:
- Patt, “Requirements, bottlenecks, and good fortune: agents for microprocessor evolution,” Proceedings of the IEEE, vol. 89, no. 11, 2001. pdf
- Patt and Patel, chapter 1 (Fundamentals) (Scanned copy placed on blackboard under assignments)
- Patterson and Hennessy, chapters 1 and 2 (Intro, Abstractions, ISA, MIPS)
Mentioned/Recommended during lecture:
Lecture 3 (Wed, Jan 25, 2012)
Required:
- Patt and Patel, chapter 4 (The von Neumann Model) (Scanned copy placed on blackboard under assignments)
Mentioned/Recommended during lecture:
- Digital Equipment Corp., “VAX11 780 Architecture Handbook,” 1977-78. pdf
Lecture 4 (Mon, Jan 30, 2012)
Required:
- Patterson and Hennessy, chapter 4.1-4.4
- Patt and Patel, appendix A (The LC-3b ISA) pdf
- Patt and Patel, appendix C (The Microarchitecture of the LC-3b) pdf
Mentioned during lecture:
- Henry M. Levy, “Capability-Based Computer Systems,” online book.
- Dynamic/Static interface: Stephen W. Melvin and Yale Patt, “A Clarification of the Dynamic/Static Interface”, HICSS'87. pdf
- Wulf, “Compilers and Computer Architecture”. pdf
- Radin, “The 801 Minicomputer”. pdf
- Klaiber, “The Technology Behind Crusoe Processors” (Transmeta Whitepaper). pdf
Lecture 5 (Wed, Feb 1, 2012)
Required:
- Patt and Patel, appendix C (The Microarchitecture of the LC-3b) pdf
- Patterson and Hennessy, appendix D
Recommended:
- Maurice Wilkes, “The Best Way to Design an Automatic Calculating Machine,” Manchester Univ. Computer Inaugural Conf., 1951. pdf
Lecture 7 (Wed, Feb 8, 2012)
Lecture 9 (Wed, Feb 15, 2012)
Required:
- Pipelined LC-3b Microarchitecture Handout pdf
Recommended:
- Hamacher et al. book, Chapter 6, “Pipelining”.
Lecture 11 (Wed, Feb 22, 2012)
Lecture 12 (Mon, Feb 27, 2012)
Required:
- Patterson & Hennessy, Chapter 4.9-4.11
- Smith and Sohi, “The Microarchitecture of Superscalar Processors,” Proceedings of the IEEE, 1995. pdf
Recommended/Mentioned during lecture:
Lecture 13 (Wed, Feb 29, 2012)
Recommended/Mentioned during lecture:
- Hwu and Patt, “Checkpoint Repair for Out-of-order Execution Machines,” ISCA 1987. pdf
- Tomasulo, “An Efficient Algorithm for Exploiting Multiple Arithmetic Units,” IBM Journal of R&D, Jan. 1967. pdf
- Patt et al., “HPS, a new microarchitecture: rationale and introduction,” MICRO 1985. pdf
- Patt et al., “Critical issues regarding HPS, a high performance microarchitecture,” MICRO 1985. pdf
Lecture 15 (Mon, Mar 19, 2012)
Recommended/Mentioned during lecture:
- Chrysos and Emer, “Memory Dependence Prediction Using Store Sets,” ISCA 1998. pdf
- Moshovos et al., “Dynamic speculation and synchronization of data dependences,” ISCA 1997. pdf
- Kessler, “The Alpha 21264 Microprocessor,” IEEE Micro, March-April 1999. pdf
- Smith and Sohi, “The Microarchitecture of Superscalar Processors,” Proc. IEEE, Dec. 1995. pdf
- Hwu and Patt, “HPSm, a high performance restricted data flow architecture having minimal functionality,” ISCA 1986. pdf
- Gurd et al., “The Manchester prototype dataflow computer,” CACM 1985. pdf
- Mike Flynn, “Very High-Speed Computing Systems,” Proc. of IEEE, 1966. pdf
Lecture 16 (Wed, Mar 21, 2012)
Required:
- Cache chapters from Patterson & Hennessy: 5.1-5.3
- Memory/cache chapters from Hamacher et al.: 8.1-8.7 (Scanned copy placed on blackboard under assignments)
- Wilkes, “Slave Memories and Dynamic Storage Allocation,” IEEE Trans. on Electronic Computers, 1965. pdf
Mentioned/Recommended during lecture:
Lecture 17 (Mon, Mar 26, 2012)
Lecture 18 (Wed, Mar 28, 2012)
Lecture 19 (Mon, Apr 2, 2012)
Mentioned/Recommended during lecture:
- Liu et al., “RAIDR: Retention-Aware Intelligent DRAM Refresh,” ISCA 2012. (on blackboard under assignments - raidr.pdf)
- B. H. Bloom, “Space/time trade-offs in hash coding with allowable errors,” Communications of the ACM, vol. 13, no. 7, 1970. pdf
Lecture 20 (Wed, Apr 4, 2012)
Required:
- Mutlu and Moscibroda, “Parallelism-Aware Batch Scheduling: Enhancing both Performance and Fairness of Shared DRAM Systems,” ISCA 2008. pdf
Mentioned/Recommended during lecture:
- Seshadri et al., “The Evicted-Address Filter: A Unified Mechanism to Address Both Cache Pollution and Thrashing,” CMU Technical Report, 2012 (on blackboard under assignments - eaf-cache.pdf).
- Moscibroda and Mutlu, “Memory Performance Attacks: Denial of memory service in multi-core systems,” USENIX Security 2007. pdf
- Mutlu and Moscibroda, “Stall-Time Fair Memory Access Scheduling for Chip Multiprocessors,” MICRO 2007. pdf
- W. E. Smith, “Various optimizers for single-stage production,” Naval Research Logistics Quarterly, 1956. pdf
- Muralidhara et al., “Reducing Memory Interference in Multicore Systems via Application-Aware Memory Channel Partitioning,” MICRO 2011. pdf
- Ebrahimi et al., “Fairness via Source Throttling: A Configurable and High-Performance Fairness Substrate for Multi-Core Memory Systems,” ASPLOS 2010. pdf
Lecture 21 (Mon, Apr 9, 2012)
Required:
- Section 5.4 in Patterson & Hennessy
Mentioned/Recommended during lecture:
- Section 8.8 in Hamacher et al.
Lecture 22 (Mon, Apr 16, 2012)
Mentioned/Recommended during lecture:
- Mowry et al., “Design and Evaluation of a Compiler Algorithm for Prefetching,” ASPLOS 1992. pdf
- Jouppi, “Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers,” ISCA 1990. pdf
- Baer and Chen, “An effective on-chip preloading scheme to reduce data access penalty,” SC 1991. pdf
- Srinath et al., “Feedback Directed Prefetching: Improving the Performance and Bandwidth-Efficiency of Hardware Prefetchers,” HPCA 2007. pdf
Lecture 23 (Wed, Apr 18, 2012)
Required:
- Mutlu et al., “Runahead Execution: An Alternative to Very Large Instruction Windows for Out-of-order Processors”, HPCA 2003. pdf
Mentioned/Recommended during lecture:
- Joseph and Grunwald, “Prefetching using Markov Predictors,” ISCA 1997. pdf
- Cooksey et al., “A stateless, content-directed data prefetching mechanism,” ASPLOS 2002. pdf
- Ebrahimi et al., “Techniques for Bandwidth-Efficient Prefetching of Linked Data Structures in Hybrid Prefetching Systems,” HPCA 2009. pdf
- Dubois and Song, “Assisted Execution,” USC Tech Report 1998. pdf
- Chappell et al., “Simultaneous Subordinate Microthreading (SSMT),” ISCA 1999. pdf
- Zilles and Sohi, “Execution-based Prediction Using Speculative Slices”, ISCA 2001. pdf
- Luk, “Tolerating Memory Latency through Software-Controlled Pre-Execution in Simultaneous Multithreading Processors,” ISCA 2001. pdf
- Zilles and Sohi, “Understanding the backward slices of performance degrading instructions,” ISCA 2000. pdf
Lecture 24 (Mon, Apr 23, 2012)
Required:
- Amdahl, “Validity of the single processor approach to achieving large scale computing capabilities,” AFIPS 1967. pdf
- Hill, Jouppi, Sohi, “Multiprocessors and Multicomputers,” pp. 551-560 in Readings in Computer Architecture (PDF in AFS: /afs/ece/class/ece447/readings/hill_551_560.pdf).
- Hill, Jouppi, Sohi, “Dataflow and Multithreading,” pp. 309-314 in Readings in Computer Architecture (PDF in AFS: /afs/ece/class/ece447/readings/hill_309_314.pdf).
- Culler and Singh, Parallel Computer Architecture: Chapters 5.1, 5.3. (PDF in AFS: /afs/ece/class/ece447/readings/culler-mesi.pdf)
- Patterson & Hennessy, Chapter 5.8 (pp. 534-538).
- Lamport, “How to Make a Multiprocessor Computer That Correctly Executes Multiprocess Programs,” IEEE Transactions on Computers, 1979. pdf
Recommended:
- Papamarcos and Patel, “A low-overhead coherence solution for multiprocessors with private cache memories,” ISCA 1984. pdf
- Mutlu et al., “Efficient Runahead Execution: Power-Efficient Memory Latency Tolerance,” ISCA 2005, IEEE Micro Top Picks 2006. pdf
- Mutlu et al., “Address-Value Delta (AVD) Prediction,” MICRO 2005. pdf
- Armstrong et al., “Wrong Path Events,” MICRO 2004. pdf
- Horner, “A new method of solving numerical equations of all orders, by continuous approximation,” Philosophical Transactions of the Royal Society, 1819. pdf
Lecture 26 (Mon, Apr 30, 2012)
- Goodman, “Using cache memory to reduce processor-memory traffic,” ISCA 1983. pdf
- Censier and Feutrier, “A New Solution to Coherence Problems in Multicache Systems,” IEEE Transactions on Computers, Dec. 1978. pdf
- Janak H. Patel, “Processor-Memory Interconnections for Multiprocessors,” ISCA 1979. pdf
- Gottlieb et al., “The NYU Ultracomputer - designing a MIMD, shared-memory parallel machine,” ISCA 1982. pdf
- Thinking Machines Corp., “The Connection Machine CM-5 Technical Summary,” Jan. 1992.
- Seitz, “The Cosmic Cube,” CACM 1985. pdf
Lecture 27 (Wed, May 2, 2012)
- Baran, “On Distributed Communications Networks,” IEEE Transactions on Communications Systems, March 1964. pdf
- Grochowski et al., “Best of both Latency and Throughput,” ICCD 2004. pdf
- Tendler et al., “POWER4 system microarchitecture,” IBM Journal of R&D, 2002. pdf
- Kalla et al., “IBM Power5 Chip: A Dual-Core Multithreaded Processor,” IEEE Micro 2004. pdf
- Kongetira et al., “Niagara: A 32-Way Multithreaded SPARC Processor,” IEEE Micro 2005. pdf
- Suleman et al., “Accelerating Critical Section Execution with Asymmetric Multi-Core Architectures,” ASPLOS 2009, IEEE Micro Top Picks 2010. pdf
- Suleman et al., “Data Marshaling for Multi-Core Architectures,” ISCA 2010, IEEE Micro Top Picks 2011. pdf