Differences
This shows you the differences between two versions of the page.
buzzwords [2010/11/11 05:13] lsubrama |
buzzwords [2010/12/01 23:06] (current) lsubrama |
||
---|---|---|---|
Line 311: | Line 311: | ||
===== Lecture 19 ==== | ===== Lecture 19 ==== | ||
- | + | Main memory system | |
- | Main memory system | + | |
* Memory hierarchy | * Memory hierarchy | ||
Line 330: | Line 329: | ||
- Memory controller placement | - Memory controller placement | ||
- | ==== Lecture 20 ===== | + | ===== Lecture 20 ===== |
* DRAM controller functions | * DRAM controller functions | ||
Line 355: | Line 354: | ||
===== Lecture 21 ==== | ===== Lecture 21 ==== | ||
- | Super scalar processing (I) | + | Super scalar processing I |
* Types of parallelism | * Types of parallelism | ||
Line 374: | Line 373: | ||
===== Lecture 22 ==== | ===== Lecture 22 ==== | ||
- | Super scalar processing (II) | + | Super scalar processing II |
* Trace Caches | * Trace Caches | ||
Line 400: | Line 399: | ||
- Micro op sequencer | - Micro op sequencer | ||
- Instruction buffering fetch and decode | - Instruction buffering fetch and decode | ||
+ | |||
+ | ===== Lecture 23 ==== | ||
+ | Superscalar Processing III | ||
+ | |||
+ | * Renaming multiple instructions | ||
+ | - dependency check logic (n^2 comparators) | ||
+ | - help from compiler | ||
+ | * ensure instructions are independent (difficult for wide fetches) | ||
+ | * hardware-software co-design to simplify dependency logic | ||
+ | |||
+ | * Dispatching multiple instructions | ||
+ | - wakeup logic (compare all tags in reservation station with all the tags that are broadcast) | ||
+ | - select logic (hierarchical tree based selection) | ||
+ | |||
+ | * Execute | ||
+ | - enough execution units | ||
+ | - enough forwarding paths (broadcast tag/value to all functional units) | ||
+ | |||
+ | * Reducing dispatch+bypass delays | ||
+ | - clustering (divide window into multiple clusters) | ||
+ | - intra-cluster bypass is fast | ||
+ | - inter-cluster bypass can be slow | ||
+ | |||
+ | * Register file | ||
+ | - need multiple reads/writes per cycle | ||
+ | - Replicate or partition the register files | ||
+ | - using block-structured ISA | ||
+ | |||
+ | * Retirement | ||
+ | - updating architectural register map | ||
+ | |||
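The "dependency check logic (n^2 comparators)" item in Lecture 23 can be illustrated with a small sketch. It assumes a hypothetical instruction encoding of `(dest_reg, [src_regs])` and models each hardware comparator as one equality test; the function name and register names are illustrative and do not come from the lecture. Every source operand is compared against the destination of every older instruction in the same rename group, which is why the comparator count grows quadratically with group width.

```python
from typing import List, Optional, Tuple

# Hypothetical encoding: (destination register, list of source registers).
Instr = Tuple[Optional[str], List[str]]


def intra_group_dependencies(
    group: List[Instr],
) -> Tuple[List[List[Optional[int]]], int]:
    """Find, for each source operand, the youngest older writer in the group.

    Returns (producers, comparisons). producers[i][k] is the index of the
    instruction in the group that produces source k of instruction i, or
    None if the value comes from outside the group. comparisons counts the
    equality tests performed, one per modeled comparator.
    """
    producers: List[List[Optional[int]]] = []
    comparisons = 0
    for i, (_, srcs) in enumerate(group):
        row: List[Optional[int]] = []
        for src in srcs:
            producer = None
            for j in range(i):  # compare against every older instruction
                comparisons += 1
                if group[j][0] == src:
                    producer = j  # a later match overrides: youngest writer wins
            row.append(producer)
        producers.append(row)
    return producers, comparisons


group = [
    ("r1", ["r2", "r3"]),
    ("r4", ["r1", "r5"]),
    ("r1", ["r4", "r4"]),
    ("r6", ["r1", "r2"]),
]
producers, comparisons = intra_group_dependencies(group)
print(producers)
print(comparisons)
```

With two sources per instruction and a group of n instructions, the sketch performs 2 * n(n-1)/2 comparisons, so doubling the rename width roughly quadruples the comparator count.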
+ | |||
+ | ===== Lecture 24 ==== | ||
+ | Control Flow | ||
+ | |||
+ | * Problem of branches | ||
+ | * Types | ||
+ | * conditional, unconditional, call, return, indirect branches | ||
+ | * Handling conditional branches | ||
+ | * Predicate combining | ||
+ | * condition codes vs condition registers | ||
+ | * Delayed branching | ||
+ | * Fine-grained multi-threading | ||
+ | * Branch prediction | ||
+ | * predicting if an instruction is a branch (predecoding) | ||
+ | * predicting the direction of the branch | ||
+ | * predicting the target address of a branch | ||
+ | * Static branch prediction | |
+ | * always taken/not taken | ||
+ | * backward taken, forward not taken | ||
+ | * by compiler based on profiling | ||
+ | * Dynamic branch prediction | ||
+ | * last time predictor | ||
+ | * history based predictors | ||
+ | * two-level predictors | ||
+ | ===== Lecture 25 ==== | ||
+ | Control Flow - II | ||
+ | |||
+ | * 2-bit counter based prediction | ||
+ | * Global branch prediction | ||
+ | * Global branch correlation | ||
+ | * Global two-level prediction | ||
+ | - Global history register | ||
+ | * Local two-level prediction | ||
+ | - Pattern history table | ||
+ | - Interference in the pattern history table | ||
+ | - Randomizing the index into the pattern history table | ||
+ | - Agree prediction | ||
+ | * Alpha 21264 Tournament Predictor | ||
+ | * Perceptron branch predictor | ||
+ | - Perceptron - learns a target boolean function of N inputs | ||
+ | * Call and Return Prediction | ||
+ | * Indirect branch prediction | ||
+ | - Virtual Conditional Branch prediction | ||
+ | * Branch prediction issues | ||
+ | - Need to know a branch as soon as it is fetched | ||
+ | - Latency | ||
+ | - State recovery upon misprediction | ||
+ | * Predicated execution | ||
+ | |||
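The "2-bit counter based prediction" item in Lecture 25 can be sketched as follows. This is a minimal illustration, assuming a table of saturating counters indexed by the low bits of the branch PC; the class name, table size, and initial counter value are assumptions rather than details from the lecture.

```python
class TwoBitCounterPredictor:
    """Table of 2-bit saturating counters indexed by the low bits of the PC.

    Counter values 0 and 1 predict not taken; 2 and 3 predict taken.
    """

    def __init__(self, index_bits: int = 4) -> None:
        self.mask = (1 << index_bits) - 1
        # Assumption: every counter starts at 1 (weakly not taken).
        self.table = [1] * (1 << index_bits)

    def predict(self, pc: int) -> bool:
        return self.table[pc & self.mask] >= 2

    def update(self, pc: int, taken: bool) -> None:
        i = pc & self.mask
        if taken:
            self.table[i] = min(3, self.table[i] + 1)  # saturate at 3
        else:
            self.table[i] = max(0, self.table[i] - 1)  # saturate at 0


def count_mispredictions(predictor, pc, outcomes):
    """Predict each outcome, then train on it; return the number of misses."""
    misses = 0
    for taken in outcomes:
        if predictor.predict(pc) != taken:
            misses += 1
        predictor.update(pc, taken)
    return misses


outcomes = [True, True, True, False, True, True]
misses = count_mispredictions(TwoBitCounterPredictor(), 0x40, outcomes)
print(misses)
```

On this outcome sequence the single not-taken outcome costs one misprediction but does not flip the following prediction, which is the hysteresis that separates a 2-bit counter from a last-time predictor.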
+ | ===== Lecture 26 ===== | |
+ | Control Flow - III & Concurrency | ||
+ | |||
+ | * Predicated Execution | ||
+ | - Predication decisions at the compiler | ||
+ | - Rename stage modifications | ||
+ | * Limitations of predication | ||
+ | - Adaptivity | ||
+ | - Complex Control Flow Graphs | ||
+ | - ISA support | ||
+ | * Wish branches | ||
+ | - Wish jump/join | ||
+ | - Wish loop | ||
+ | * Wish branches vs Predicated Execution | ||
+ | * Wish branches vs Branch prediction | ||
+ | * Diverge-Merge Processor | ||
+ | * Dynamic-Hammock | ||
+ | * Multi-path Execution | ||
+ | * Research issues in control flow handling | ||
+ | - Hardware/software cooperation | ||
+ | - Fetch gating | ||
+ | - Recycling useful work done on wrong path | ||
+ | Concurrency | ||
+ | * Classification of machines | ||
+ | - SISD | ||
+ | - SIMD | ||
+ | - MIMD | ||
+ | * Decoupled Access/Execute | ||
+ | * Astronautics ZS-1 | ||
+ | * Loop unrolling | ||
+ | |||
+ | ===== Lecture 27 ===== | |
+ | VLIW | ||
+ | |||
+ | * Each VLIW instruction - a bundle of independent instructions (identified by compiler) | ||
+ | * Each instruction bundle executed by hardware in lockstep | ||
+ | * Commercial VLIW machines | ||
+ | - TI C6000, TriMedia, STMicro | |
+ | * Intel IA-64 - Partially VLIW | ||
+ | * Encoding VLIW NOPs | ||
+ | * Static Instruction Scheduling for VLIW | ||
+ | * Code motion - Safety & Legality | ||
+ | * Trace scheduling | ||
+ | * List scheduling | ||
+ | * Super block scheduling | ||
+ | * Hyperblock scheduling | ||
+ | * The Intel IA-64 architecture | ||
+ | - No lockstep execution of a bundle | |
+ | - Specify dependencies between instructions within a bundle | ||
+ | - Template bits | ||
+ | * What hinders static code motion? | |
+ | - Exceptions | ||
+ | - Loads/Stores |
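
The "List scheduling" and "Encoding VLIW NOPs" items in Lecture 27 can be illustrated together with a small sketch. It assumes unit-latency operations, a fixed bundle width, and program order as the scheduling priority (real list schedulers usually rank ready operations by a heuristic such as critical-path length). The function name and the `deps` format are hypothetical.

```python
from typing import Dict, List


def list_schedule(deps: Dict[str, List[str]], width: int) -> List[List[str]]:
    """Greedily pack operations into fixed-width bundles.

    deps maps each operation to the operations it depends on, listed in
    program order. An operation is ready once all of its dependencies were
    issued in an earlier bundle. Unused slots are filled with "nop".
    """
    remaining = list(deps)  # dicts preserve insertion (program) order
    done = set()
    bundles: List[List[str]] = []
    while remaining:
        ready = [op for op in remaining if all(d in done for d in deps[op])]
        if not ready:
            raise ValueError("dependency cycle or unknown dependency")
        issued = ready[:width]  # priority: earliest in program order
        bundles.append(issued + ["nop"] * (width - len(issued)))
        done.update(issued)
        remaining = [op for op in remaining if op not in done]
    return bundles


deps = {"a": [], "b": [], "c": ["a", "b"], "d": ["c"], "e": []}
bundles = list_schedule(deps, width=2)
print(bundles)
```

The last bundle carries an explicit `nop` because only one operation is ready in that cycle, which is the slot waste that NOP-encoding schemes try to reduce.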