

# **Preview**

- Course information
  - Goals
  - Administrative info
  - Materials
  - Grading
- Computer trends
  - Why is memory hierarchy such a big deal?
- Preview of course

# **Course Goals**

- Deep understanding of data flows & storage within computer systems
  - Key architectural principles:
    - Latency / Bandwidth / Replication / Balance / Hierarchy
  - Applied to:
    - Cache memory; Main memory; Buses; Vector processing; Virtual memory; Multiprocessor coherence
- Demonstrate and apply principles
  - Design tradeoffs: prediction & measurement
  - Software speedup using advanced memory architecture understanding
- Practice research/architecture skills
  - Running simulations & interpreting experimental data
  - Focus more on analysis + experimentation than on design synthesis

# **Administrative Notes**

- People (instructor and TA):
  - Prof. Phil Koopman, koopman@cmu.edu
  - Erik Riedel, riedel+@CMU.EDU
  - Greg Mann, gm3g+@andrew.cmu.edu
  - Karen Lindenfelser, karen@ece.cmu.edu
  - Offices listed: HH D-202, WeH 8114, HH D-204
- Course web page:
  - http://www.ece.cmu.edu/~ece548
  - Contains office hours and other important information
- Communications:
  - cmu.ece.class.ece548
  - Mandatory reading for announcements
  - ...lerated; OK to use for discussion of class-related issues

# **Required Textbook: Cragon**

- *Memory Systems and Pipelined Processors*, Cragon
  - Newer textbook that concentrates on memory first
  - Good details, but better read as a "reference" than as a novel
  - If there is some obscure issue, Cragon probably discusses it...

### Coverage:

- 1) Memory Systems
- 2) Caches
- 3) Virtual Memory
- 4) Memory Addressing and I/O Coherency
- 5) Interleaved Memory and Disk Systems
- 11) Vector Processors

# **Recommended Text: Hennessy & Patterson**

- *Computer Architecture: A Quantitative Approach*, Hennessy & Patterson (2nd edition)
  - An update of a classic in the field; based on quantitative simulation
  - For our purposes, more breadth than depth
  - Use for orientation to area; but skips many details
  - You ought to have read the first half of the book (or have equivalent knowledge) as a prerequisite for this course
  - We're *not* going to go into the DLX architecture

### Coverage:

- Memory-Hierarchy Design
- Storage Systems
- Multiprocessors
- App. B) Vector Processors

# **Simulation Tools**

### Dinero -- cache simulator

- Special version will be used
  - Not limited to 32-bit operation
  - Modified for multi-level cache simulations
- Compiles & runs on most Unix platforms
  - The only supported versions will be on the IBM PPC workstations in the "HP Lab" and on the DEC Alphas, for which we will provide accounts
  - Must run on a 64-bit processor to handle 64-bit address traces

### Atom -- program annotation tool

- Annotates programs to record/generate information
  - We'll be using it primarily to generate address traces for cache simulations
  - Does many other nifty things (see the man page...)
- Only works on DEC Alphas -- we will provide accounts
  - ECE students must have an ECE account
  - We're working with ECE facilities to create accounts for CS students

# **Grading**

- Grade distribution:
  - 20% first test
  - 25% second test
  - 25% third test
  - 10% distributed among approximately 11 weekly homework sets
  - 20% distributed among 5 lab assignments (note: no physical "lab" room)
  - 5-point grading system per question
- Assignments must be handed in on time
  - Homeworks due Wednesday *in class* on due date; solutions handed out Fridays
  - Labs due Friday afternoons at 3 PM to course secretary; solutions next Friday
  - Late materials accepted until solutions handed out; 10% penalty/day late
  - Don't run computer simulations at the last minute!
    - Expect machines to be overloaded the night before an assignment is due
- All products must be a result of your own efforts
  - However, you may ask for general guidance from staff & fellow students



# **Assignments**

- By next class read about Key Concepts:
  - Hennessy & Patterson: page 7, Section 1.7
  - Cragon, Chapter 1
  - Supplemental reading: All of Hennessy & Patterson Chapter 1
- Homework due by next class: send e-mail information to riedel+@cmu.edu, koopman@cmu.edu
  - Full name & preferred nickname + pronunciation
  - Preferred e-mail address (not necessarily ECE or even CMU)
  - ECE account name if ECE student
  - CS and Andrew account name if CS student
  - One sentence on area of graduate research (if any)







# **Computer Performance**

- task time = number of instructions executed * clocks per instruction (CPI) * clock period
- Number of instructions depends on ISA, language, compiler
- Overall clocks per instruction:
  - Instruction complexity (usually 1 clock, but can be longer)
  - Instruction issue rate (superscalar may be > 1 issue/clock)
  - Data dependence & resource stalls
  - \*Instruction fetch latency
  - \*Data fetch/store latency
- Clock period
  - Logic critical path (e.g., hardware multiplier) & clock distribution tree
  - \*First level cache cycle time
- \* = emphasized in this course
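The task-time relation above can be tried out numerically. The following is a minimal Python sketch, not course material: the function names, the workload numbers, and the simple stall model (base CPI plus memory accesses per instruction times miss ratio times miss penalty) are all assumptions chosen for illustration.

```python
def task_time(instructions: int, cpi: float, clock_period_s: float) -> float:
    """task time = instructions executed * clocks per instruction * clock period."""
    return instructions * cpi * clock_period_s


def effective_cpi(base_cpi: float, accesses_per_instr: float,
                  miss_ratio: float, miss_penalty_clocks: float) -> float:
    """Assumed stall model: base CPI plus average memory stall clocks per instruction."""
    return base_cpi + accesses_per_instr * miss_ratio * miss_penalty_clocks


# Hypothetical workload: 1e9 instructions, 5 ns clock period (200 MHz).
INSTRUCTIONS = 1_000_000_000
CLOCK_PERIOD_S = 5e-9

ideal = task_time(INSTRUCTIONS, 1.0, CLOCK_PERIOD_S)
# Hypothetical memory behavior: 1.3 accesses/instruction, 5% misses, 20-clock penalty.
stalled_cpi = effective_cpi(1.0, 1.3, 0.05, 20)
stalled = task_time(INSTRUCTIONS, stalled_cpi, CLOCK_PERIOD_S)

print(f"ideal: {ideal:.2f} s, with misses: {stalled:.2f} s (CPI {stalled_cpi:.2f})")
```

With these made-up numbers, fetch latency alone more than doubles the CPI, which is one way to see why the starred items get the emphasis.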





















# **Cache Organization**

- Cache organized as sectors, blocks, and sets
- Each sector corresponds to a location in memory (tag holds address)
- Cache lookup searches set to find matching address
- Miss penalty: clock cycles to process a miss

[Figure: cache with 8 sets (SET 0 to SET 7); each set holds SECTOR 0 and SECTOR 1; each sector has one TAG and two blocks (BLOCK 0, BLOCK 1); each block has V and D bits and two WORDs.]
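To make the sector/block/set organization concrete, here is a minimal Python sketch of how a byte address might be split into tag, set, block, and word fields, followed by a lookup. The geometry follows the figure (8 sets, 2 sectors per set, 2 blocks per sector, 2 words per block); the 4-byte word size, the field order, and the names `Fields`, `split_address`, and `lookup` are assumptions for illustration only.

```python
from typing import NamedTuple


class Fields(NamedTuple):
    tag: int
    set_index: int
    block: int
    word: int


def split_address(addr: int, num_sets: int = 8, blocks_per_sector: int = 2,
                  words_per_block: int = 2, bytes_per_word: int = 4) -> Fields:
    """Split a byte address into tag / set / block / word fields."""
    word_addr = addr // bytes_per_word
    word = word_addr % words_per_block
    block_addr = word_addr // words_per_block
    block = block_addr % blocks_per_sector
    sector_addr = block_addr // blocks_per_sector
    return Fields(sector_addr // num_sets, sector_addr % num_sets, block, word)


def lookup(cache: list, addr: int) -> bool:
    """Hit if some sector in the selected set matches the tag and the block is valid."""
    f = split_address(addr)
    return any(s["tag"] == f.tag and s["valid"][f.block] for s in cache[f.set_index])


# 8 sets x 2 sectors, all empty: no tag, both blocks invalid.
cache = [[{"tag": None, "valid": [False, False]} for _ in range(2)] for _ in range(8)]

# Load only BLOCK 0 of the sector that holds address 0x1234.
f = split_address(0x1234)
cache[f.set_index][0] = {"tag": f.tag, "valid": [True, False]}

print(f)                           # tag=36, set_index=3, block=0, word=1
print(lookup(cache, 0x1234))       # same sector, valid block: hit
print(lookup(cache, 0x1234 + 8))   # same sector and tag, but BLOCK 1 is invalid: miss
```

The last lookup shows the point of per-block valid bits: a tag match on the sector is not enough for a hit if the addressed block has not been loaded.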

# **Major Cache Design Decisions**

- Cache Size -- in bytes
- Split/Unified -- instructions and data in same cache?
- Associativity -- how many sectors in each set?
- Sector/Block size -- how many bytes grouped together as a unit?
- Management policies
  - Choosing victim for replacement on cache miss
  - Handling writes (when is write accomplished?; is written data cached?)
- How many levels of cache?
  - Perhaps L1 cache on-chip; L2 cache on-module; L3 cache on motherboard
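The effect of these decisions can be explored with a small trace-driven model. The sketch below is an illustration under stated assumptions, not the course's Dinero simulator: it assumes LRU replacement, a write-back write-allocate policy, and treats each sector as a single block. The class name `Cache` and helper `run` are hypothetical.

```python
from collections import OrderedDict


class Cache:
    """Set-associative cache model: LRU replacement, write-back, write-allocate."""

    def __init__(self, size_bytes: int, associativity: int, block_bytes: int):
        self.block_bytes = block_bytes
        self.associativity = associativity
        self.num_sets = size_bytes // (associativity * block_bytes)
        # One OrderedDict per set: key = tag, value = dirty flag, ordered LRU -> MRU.
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.hits = self.misses = self.writebacks = 0

    def access(self, addr: int, is_write: bool = False) -> bool:
        block_addr = addr // self.block_bytes
        index, tag = block_addr % self.num_sets, block_addr // self.num_sets
        lines = self.sets[index]
        if tag in lines:
            self.hits += 1
            lines.move_to_end(tag)                 # mark as most recently used
            lines[tag] = lines[tag] or is_write    # a write makes the line dirty
            return True
        self.misses += 1
        if len(lines) >= self.associativity:
            _, dirty = lines.popitem(last=False)   # evict the least recently used line
            self.writebacks += int(dirty)          # dirty victims are written back
        lines[tag] = is_write                      # allocate on read and write misses
        return False


def run(trace, **geometry):
    """Replay a list of byte addresses (reads only); return (hits, misses)."""
    cache = Cache(**geometry)
    for addr in trace:
        cache.access(addr)
    return cache.hits, cache.misses


# Hypothetical trace: three addresses that collide in one set, then a re-reference.
trace = [0, 128, 256, 0]
two_way = run(trace, size_bytes=256, associativity=2, block_bytes=16)
four_way = run(trace, size_bytes=256, associativity=4, block_bytes=16)
print("2-way:", two_way, "4-way:", four_way)
```

On this made-up trace the two caches have the same size, but only the 4-way cache still holds address 0 when it is re-referenced, so associativity alone changes the hit count.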

















# **Important Lessons in System Architecture**

- Every level of memory hierarchy employs similar principles
  - Registers
  - Cache
  - Main memory
  - Disks
  - Multiprocessors
- Address translation permits automatic management of memory
  - Caches
  - Virtual memory
  - Multiprocessing
- A balance between latency and bandwidth is crucial
- Concurrency is an effective tool for improving performance
  - Replication of resources
  - Pipelining of individual resources
  - Prefetching and delayed storing of data



# **Review**

- CPU / memory speed gap is increasing in size
  - Clock speed & superscalar CPUs require more from memory
  - But, memory isn't keeping up with the demands
  - Memory hierarchies are the obvious solution
    - Registers
    - Cache
    - Main memory
    - Disks
    - Multiprocessor memories
- Key concepts apply at all levels
  - Latency -- delay
  - Bandwidth -- data moved per unit time
  - Concurrency -- multiple units (or pipelining to re-use a single unit)
  - Balance -- avoiding performance bottlenecks
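As a rough illustration of how latency and bandwidth interact, the sketch below models one transfer as a fixed startup delay plus size divided by bandwidth. This simple model, the function names, and the numbers (100 ns latency, 100 MB/s peak bandwidth) are assumptions for illustration and do not describe any particular memory system.

```python
def transfer_time(size_bytes: float, latency_s: float, bandwidth_bytes_per_s: float) -> float:
    """Assumed model: fixed latency, then data streams at the peak bandwidth."""
    return latency_s + size_bytes / bandwidth_bytes_per_s


def effective_bandwidth(size_bytes: float, latency_s: float,
                        bandwidth_bytes_per_s: float) -> float:
    """Bytes actually delivered per second once latency is included."""
    return size_bytes / transfer_time(size_bytes, latency_s, bandwidth_bytes_per_s)


LATENCY_S = 100e-9   # hypothetical 100 ns access latency
PEAK_BW = 100e6      # hypothetical 100 MB/s peak bandwidth

for size in (4, 32, 256):
    t = transfer_time(size, LATENCY_S, PEAK_BW)
    bw = effective_bandwidth(size, LATENCY_S, PEAK_BW)
    print(f"{size:4d} bytes: {t * 1e9:7.1f} ns, {bw / 1e6:6.2f} MB/s effective")
```

Under this model small transfers are dominated by latency and large ones by bandwidth, which is the balance question in miniature.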