This shows you the differences between two versions of the page.
|
buzzwords [2013/01/25 15:19] justinme [Lecture 5 (1/25 Fri.)] |
buzzwords [2013/05/01 14:05] (current) jasonli1 |
||
|---|---|---|---|
| Line 113: | Line 113: | ||
| * Program execution time: #instructions * average_CPI * clock_cycle_time | * Program execution time: #instructions * average_CPI * clock_cycle_time | ||
| * Branch delay slot | * Branch delay slot | ||
| + | |||
| + | ===== Lecture 6 (1/28 Mon.) ===== | ||
| + | * Single-cycle microarchitecture | ||
| + | * Instruction processing cycle | ||
| + | * Combinational logic - hardwired control | ||
| + | * Sequential logic - microprogrammed control | ||
| + | * Critical path | ||
| + | * Memory latency | ||
| + | * Microarchitecture design principles - critical path design, common case design, balanced design | ||
| + | * Cycles per Instruction (CPI) vs frequency | ||
| + | * Pipelining | ||
| + | * Program execution time: #instructions * average_CPI * clock_cycle_time | ||
| + | * Multi-cycle microarchitecture | ||
| + | * Instruction processing cycle | ||
| + | * Microinstruction | ||
| + | * Microsequencing | ||
| + | * Control store | ||
| + | * Microsequencer | ||
| + | * Condition codes | ||
| + | * Simple LC-3b control datapath | ||
| + | |||
| + | ===== Lecture 7 (1/30 Wed.) ===== | ||
| + | * Multi-cycle microarchitecture | ||
| + | * Instruction processing cycle | ||
| + | * Behavior of the entire [multi-cycle microarchitecture] processor is specified by a finite state machine | ||
| + | * Microinstruction | ||
| + | * Microsequencing | ||
| + | * Control store | ||
| + | * Microsequencer | ||
| + | * Tri-state buffer | ||
| + | * Bus gating | ||
| + | * Difference between gating and loading | ||
| + | |||
| + | ===== Lecture 8 (2/4 Mon.) ===== | ||
| + | * Interrupt checking | ||
| + | * Unaligned memory accesses | ||
| + | * Memory-mapped I/O | ||
| + | * Updating/patching microcode in the field | ||
| + | * Horizontal/vertical microcode | ||
| + | * Nanocode and millicode | ||
| + | * Pipelining | ||
| + | * Ideal pipeline - identical operations, independent operations, uniformly partitionable suboperations | ||
| + | * Pipeline registers | ||
| + | * Pipeline control signals - decode once and buffer, or carry instructions and decode locally | ||
| + | * Pipeline external fragmentation (pipeline stages idle for some instructions) | ||
| + | * Pipeline internal fragmentation (some pipeline stages are too fast, but the clock cycle stays the same) | ||
| + | * Inter-instruction dependencies need to be detected and handled | ||
| + | * Issues in pipeline design - number of stages, keeping pipeline correct and full, handling exceptions and interrupts | ||
| + | * Causes of pipeline stalls - resource contention, dependencies (control, data) | ||
| + | * Handling resource contention - duplicate resource, increase throughput, detect contention and stall contending stage | ||
| + | * Data dependencies - flow dependence (true dependence, read after write), output dependence (write after write), anti dependence (write after read) | ||
| + | |||
| + | ===== Lecture 9 (2/6 Wed.) ===== | ||
| + | * Anti and output dependence - limited number of architectural registers | ||
| + | * Control flow graph | ||
| + | * Compiler profiling | ||
| + | * Profile input set vs runtime input set | ||
| + | * Load hoisting | ||
| + | * Handling flow dependencies - detect and wait, detect and forward, detect and eliminate, predict and verify | ||
| + | * Fine-grained multithreading | ||
| + | * Software vs hardware based interlocking | ||
| + | * Scoreboarding | ||
| + | * Register renaming | ||
| + | * Combinational dependence check logic | ||
| + | * Register data forwarding | ||
| + | * Control dependence | ||
| + | * Pipeline stalls | ||
| + | * Data forwarding distance | ||
| + | * Data forwarding logic | ||
| + | |||
| + | ===== Lecture 10 (2/8 Fri.) ===== | ||
| + | * Register renaming | ||
| + | * Static vs dynamic scheduling | ||
| + | * Branch types - conditional, unconditional, call, return, indirect | ||
| + | * Handling control dependencies | ||
| + | * Branch prediction | ||
| + | * Stalling | ||
| + | * Branch delay slot | ||
| + | * Predicated execution | ||
| + | * Fine-grained multithreading | ||
| + | * Multipath execution | ||
| + | * Pipeline flushing | ||
| + | * Branch misprediction penalty | ||
| + | * Forward vs backward control flow | ||
| + | * Decrease number of branches - get rid of control flow instructions, convert control dependence to data dependence, predicate combining | ||
| + | * Wish branches - choose predicated execution or branch prediction | ||
| + | * Delayed branching with squashing | ||
| + | * Enhanced branch prediction - need to predict target address, branch direction, whether instruction is branch | ||
| + | * Branch Target Buffer (BTB) or Branch Target Address Cache | ||
| + | * Compile time vs run time branch direction prediction | ||
| + | |||
| + | ===== Lecture 11 (2/11 Mon.) ===== | ||
| + | * Static branch prediction | ||
| + | * Always not-taken | ||
| + | * Always taken | ||
| + | * Backward taken, forward not taken | ||
| + | * Profile-based (compiler) | ||
| + | * Program-based (program analysis based) | ||
| + | * Pragmas - programmer conveys hints | ||
| + | * Dynamic branch prediction | ||
| + | * Last time predictor | ||
| + | * Two-bit counter based prediction (saturating counter) | ||
| + | * Global branch correlation | ||
| + | * Global History Register (GHR) | ||
| + | * Pattern History Table (PHT) | ||
| + | * Intel Pentium Pro Branch Predictor - multiple PHTs | ||
| + | * Gshare predictor - GHR hashed with Branch PC | ||
| + | * Two-level Global History Predictor | ||
| + | * Local branch correlation | ||
| + | * Per-branch history register | ||
| + | * Hybrid branch predictor - multiple algorithms, choose "best" prediction | ||
| + | * Branch confidence estimation | ||
| + | * Branch misprediction penalty | ||
| + | * Alpha 21264 Tournament Predictor | ||
| + | * SPEC - Standard Performance Evaluation Corporation (CPU benchmark) | ||
| + | |||
| + | ===== Lecture 12 (2/13 Wed.) ===== | ||
| + | * Predicated execution - compiler converts control dependence into data dependence | ||
| + | * Conditional execution in ARM ISA | ||
| + | * Hammock branch | ||
| + | * Wish jump/join | ||
| + | * Multi-path execution | ||
| + | * Call and return prediction | ||
| + | * Direct calls - easy to predict | ||
| + | * Returns are indirect branches | ||
| + | * Prediction - Return Address Stack | ||
| + | * Indirect branch prediction | ||
| + | * Last resolved target | ||
| + | * History based target prediction | ||
| + | * Superscalar processor | ||
| + | * Multiple instruction fetch | ||
| + | * Multi-cycle execution | ||
| + | * Exceptions vs interrupts | ||
| + | * Precise exceptions/interrupts | ||
| + | * Make each operation take the same amount of time | ||
| + | * Reorder buffer (ROB) | ||
| + | * History buffer | ||
| + | * Future register file | ||
| + | * Checkpointing | ||
| + | * Instruction retirement (commit) | ||
| + | |||
| + | ===== Lecture 13 (2/15 Fri.) ===== | ||
| + | * Reorder buffer (ROB) | ||
| + | * Accessing ROB with register file | ||
| + | * Use indirection from RF to ROB | ||
| + | * Register renaming with ROB | ||
| + | * Architectural register ID -> Physical register ID | ||
| + | * Eliminates false dependencies | ||
| + | * In-order execution, out-of-order completion, in-order retirement | ||
| + | * History buffer (HB) | ||
| + | * Future file + ROB | ||
| + | * Checkpointing | ||
| + | * Maintaining speculative memory states | ||
| + | * Use reorder buffer to handle out-of-order memory operations | ||
| + | * Store/write buffer | ||
| + | |||
| + | ===== Lecture 14 (2/18 Mon.) ===== | ||
| + | * Preventing dispatch stalls | ||
| + | * Fine-grained multithreading | ||
| + | * Value prediction | ||
| + | * Stride predictor | ||
| + | * Compile-time instruction reordering | ||
| + | * Out of order execution (Dynamic scheduling) | ||
| + | * Restricted dataflow | ||
| + | * Latency tolerance | ||
| + | * Reservation station | ||
| + | * Reservation station entry | ||
| + | * Register renaming | ||
| + | * Tags associated with register value | ||
| + | * Register alias table (RAT) | ||
| + | * Tomasulo's algorithm | ||
| + | * Instruction window size | ||
| + | * Registers vs memory | ||
| + | * Memory dependence handling | ||
| + | * Memory disambiguation / unknown address problem | ||
| + | * Dependence of memory instructions (loads/stores) | ||
| + | |||
| + | ===== Lecture 15 (2/20 Wed.) ===== | ||
| + | * Out of order execution | ||
| + | * Memory dependence handling | ||
| + | * Content addressable memory | ||
| + | * Memory disambiguation | ||
| + | * Reservation stations | ||
| + | * Branch prediction | ||
| + | * Superscalar execution vs OoO execution | ||
| + | * Instruction level parallelism | ||
| + | * Dataflow (at the ISA level) | ||
| + | * Systolic arrays | ||
| + | * Stream processing | ||
| + | * Pool of unmatched tokens | ||
| + | * Token matching area | ||
| + | * Instruction fetch area | ||
| + | * MIT tagged token data flow architecture | ||
| + | * Irregular parallelism | ||
| + | * Data parallelism | ||
| + | * SIMD (Single instruction multiple data) | ||
| + | * Array vs vector processors | ||
| + | * VLIW (Very long instruction word) | ||
| + | * Vector processor | ||
| + | * Vector registers | ||
| + | * Vector length register (VLEN) | ||
| + | * Vector stride register (VSTR) | ||
| + | |||
| + | ===== Lecture 16 (2/25 Mon.) ===== | ||
| + | * Virtual memory | ||
| + | * Virtual page vs physical page (frame) | ||
| + | * Address translation | ||
| + | * Page Table (Virtual addr -> Physical addr) | ||
| + | * Page fault | ||
| + | * Reads data from disk to memory | ||
| + | * Direct Memory Access | ||
| + | * Caches | ||
| + | * Block, set | ||
| + | * Block/line size, associativity | ||
| + | * Hit, miss | ||
| + | * Insertion, eviction | ||
| + | * Write through vs write back | ||
| + | * Locality - temporal and spatial | ||
| + | * Working set | ||
| + | * Address isolation | ||
| + | * Copy on write | ||
| + | * Before virtual memory | ||
| + | * Single-user machine | ||
| + | * Base and bound registers | ||
| + | * Segmented address space | ||
| + | |||
| + | ===== Lecture 17 (2/27 Wed.) ===== | ||
| + | * Virtual memory | ||
| + | * Segmentation | ||
| + | * Segment selectors | ||
| + | * Segment descriptors | ||
| + | * Privilege level (ring) | ||
| + | * Paging | ||
| + | * Physical vs virtual address space | ||
| + | * Physical vs virtual page | ||
| + | * Virtual page number (VPN) | ||
| + | * Physical page number (PPN) | ||
| + | * Virtual page offset == physical page offset | ||
| + | * Address translation - VPN -> PPN | ||
| + | * Page Table | ||
| + | * Multi-level page table | ||
| + | * Translation Lookaside Buffer (TLB) | ||
| + | * Homonym problem | ||
| + | |||
| + | ===== Lecture 18 (3/1 Fri.) ===== | ||
| + | * Translation | ||
| + | * Two-level page table | ||
| + | * Page directory | ||
| + | * Multi-level page table | ||
| + | * Page Directory Base Register | ||
| + | * Translation: Segmentation + Paging | ||
| + | * Protection | ||
| + | * Privilege levels | ||
| + | * Page Directory Entry (PDE) | ||
| + | * Page Table Entry (PTE) | ||
| + | * Read/Write | ||
| + | * User/Supervisor | ||
| + | * Protection: PDE + PTE | ||
| + | * Protection: Segmentation + Paging | ||
| + | * Translation Lookaside Buffer (TLB) | ||
| + | * Context switch | ||
| + | * Flush/invalidate | ||
| + | * TLB miss: HW-managed vs. SW-managed | ||
| + | * Page walk | ||
| + | * TLB replacement | ||
| + | * Page fault | ||
| + | * Page fault handler | ||
| + | * Demand paging | ||
| + | * Swapping | ||
| + | * Thrashing | ||
| + | * Page size | ||
| + | * Internal fragmentation | ||
| + | * Memory Management Unit (MMU) | ||
| + | |||
| + | ===== Lecture 19 (3/18 Mon.) ===== | ||
| + | * Vector registers | ||
| + | * Vector data register | ||
| + | * Vector control registers | ||
| + | * Vector functional units | ||
| + | * Amdahl's Law | ||
| + | * Sequential bottleneck | ||
| + | * Vector memory system | ||
| + | * Memory banking | ||
| + | * Address generator | ||
| + | * Base and stride | ||
| + | * Vectorizable loops | ||
| + | * Vector chaining | ||
| + | * Vector stripmining | ||
| + | * Scatter/gather operations | ||
| + | * Masked operations | ||
| + | * Matrix storage | ||
| + | * Row major | ||
| + | * Column major | ||
| + | * Vector instruction level parallelism | ||
| + | * Automatic code vectorization | ||
| + | * Graphics processing units | ||
| + | * Single instruction multiple threads | ||
| + | * Thread warps | ||
| + | * SIMT memory access | ||
| + | |||
| + | ===== Lecture 20 (3/20 Wed.) ===== | ||
| + | * SIMT (Single Instruction Multiple Thread) | ||
| + | * SPMD (Single Program Multiple Data) | ||
| + | * Thread warps | ||
| + | * Branch divergence in warps | ||
| + | * Branch divergence handling - dynamic predicated execution | ||
| + | * Dynamic warp formation | ||
| + | * Memory access divergence in warp | ||
| + | * NVIDIA GPU terminology | ||
| + | * Streaming multiprocessor (SM) | ||
| + | * Warp | ||
| + | * Thread context | ||
| + | * Critical section | ||
| + | * Heterogeneous processing (Asymmetric) | ||
| + | * Dynamically select latency or throughput | ||
| + | * Very long instruction word (VLIW) | ||
| + | * Lock step execution | ||
| + | * Decoupled access and execute | ||
| + | * Dynamic vs static scheduling | ||
| + | * Loop unrolling | ||
| + | * Systolic arrays | ||
| + | * WARP Computer | ||
| + | |||
| + | ===== Lecture 21 (3/25 Mon.) ===== | ||
| + | * Systolic architectures | ||
| + | * Pipeline parallel execution | ||
| + | * Decoupled Access/Execute | ||
| + | * Static instruction scheduling | ||
| + | * Common subexpression elimination | ||
| + | * Loop unrolling | ||
| + | * Speculative code motion | ||
| + | * Trace scheduling | ||
| + | * Data precedence graph | ||
| + | * List scheduling | ||
| + | * Superblock scheduling | ||
| + | * Hyperblock formation | ||
| + | * Block-structured ISA | ||
| + | * Exception propagation | ||
| + | |||
| + | ===== Lecture 22 (3/27 Wed.) ===== | ||
| + | * Ideal memory | ||
| + | * DRAM - Dynamic Random Access Memory | ||
| + | * SRAM - Static Random Access Memory | ||
| + | * Bit line | ||
| + | * Row enable | ||
| + | |||
| + | * Sense amplifier | ||
| + | * Row/column decoder | ||
| + | * Phase change memory | ||
| + | * DRAM refresh | ||
| + | * Temporal locality | ||
| + | * Spatial locality | ||
| + | * Cache hierarchy | ||
| + | * Manual vs automatically managed memory hierarchy | ||
| + | * Cache block (line) | ||
| + | * Cache hit/miss | ||
| + | * Cache block eviction | ||
| + | * Tag and data store | ||
| + | * Cache associativity | ||
| + | * Direct-mapped cache | ||
| + | * Fully associative cache | ||
| + | * Cache replacement policies | ||
| + | * Least recently used (LRU) | ||
| + | * Victim/next-victim policy | ||
| + | |||
| + | ===== Lecture 23 (3/29 Fri.) ===== | ||
| + | * Cache insertion policy | ||
| + | * Cache replacement policy | ||
| + | * Cache promotion policy | ||
| + | * Non-temporal loads | ||
| + | * Victim/Next-Victim replacement policy | ||
| + | * Hybrid replacement policy - set sampling | ||
| + | * Page replacement | ||
| + | * Tag store entry | ||
| + | * Inclusive/exclusive cache | ||
| + | * Write-back/write-through cache | ||
| + | * Allocate/no-allocate on write miss | ||
| + | * Sectored cache | ||
| + | * Cache dirty bit | ||
| + | * Cache valid bit | ||
| + | * Separate data and instruction caches vs unified caches | ||
| + | * Multi-level caches | ||
| + | * Homonym and synonym problem | ||
| + | * Virtual-physical cache | ||
| + | |||
| + | ===== Lecture 24 (4/1 Mon.) ===== | ||
| + | * Virtual-physical cache | ||
| + | * Virtual memory - DRAM interaction | ||
| + | * Page coloring | ||
| + | * Critical word first fill in | ||
| + | * Cache subblocking | ||
| + | * Cache size and associativity | ||
| + | * Compulsory misses | ||
| + | * Capacity misses | ||
| + | * Conflict misses | ||
| + | * Coherence miss (communication miss) | ||
| + | * Stream prefetcher | ||
| + | * Stride prefetcher | ||
| + | * Cache working set | ||
| + | * Victim cache | ||
| + | * Hashing (pseudo-associativity) | ||
| + | * Skewed associative caches | ||
| + | * Data restructuring via software | ||
| + | * Memory level parallelism (MLP) | ||
| + | * MLP-aware cache replacement | ||
| + | |||
| + | ===== Lecture 25 (4/3 Wed.) ===== | ||
| + | * Memory level parallelism (MLP) | ||
| + | * Miss Status Handling Register (MSHR) | ||
| + | * Cache multi-porting (true and virtual) | ||
| + | * Cache banking (interleaving) | ||
| + | * DRAM organization | ||
| + | * Channel | ||
| + | * DIMM | ||
| + | * Rank | ||
| + | * Chip | ||
| + | * Bank | ||
| + | * Row/column | ||
| + | * Row/column latch | ||
| + | * Row buffer (sense amplifier) | ||
| + | * DRAM commands | ||
| + | * Activate | ||
| + | * Read/write | ||
| + | * Precharge | ||
| + | * Row buffer conflict | ||
| + | * DRAM refresh | ||
| + | * DRAM access latency | ||
| + | * DRAM address mapping - row interleaving or cache block interleaving | ||
| + | * Virtual/physical address mapping | ||
| + | |||
| + | ===== Lecture 26 (4/8 Mon.) ===== | ||
| + | * DRAM refresh | ||
| + | * Distributed refresh | ||
| + | * DRAM refresh overhead (time and energy) | ||
| + | * Bloom filter | ||
| + | * Insert, test, remove all | ||
| + | * Retention-Aware Intelligent DRAM Refresh (RAIDR) | ||
| + | * Profile, Binning, Refresh | ||
| + | * Flash memory | ||
| + | * DRAM Controller | ||
| + | * In chipset vs on CPU chip | ||
| + | * DRAM scheduling policy | ||
| + | * First come first serve (FCFS) | ||
| + | * First ready, first come first serve (FR-FCFS) | ||
| + | * DRAM row management policy - open row vs close row | ||
| + | * DRAM timing constraints | ||
| + | * DRAM power management | ||
| + | * DRAM power state | ||
| + | |||
| + | ===== Lecture 27 (4/10 Wed.) ===== | ||
| + | * DRAM bank operation | ||
| + | * Memory interference | ||
| + | * Quality-of-service unaware memory control | ||
| + | * Stall-time fairness in shared DRAM | ||
| + | * STFM (Stall Time Fair Memory) scheduling algorithm | ||
| + | * Memory bank parallelism of threads | ||
| + | * PAR-BS (Parallelism-Aware Batch Scheduling) | ||
| + | * ATLAS Memory Scheduler | ||
| + | * Throughput vs fairness | ||
| + | * Thread Cluster Memory Scheduling | ||
| + | * Quantum-based operation | ||
| + | * Misses per kilo-instruction (MPKI) | ||
| + | * Lottery scheduling | ||
| + | * Memory channel partitioning | ||
| + | * Throttling of source/core | ||
| + | * Data mapping to banks/channels/ranks | ||
| + | * Request prioritization | ||
| + | * Bottleneck Identification and Scheduling (BIS) | ||
| + | |||
| + | ===== Lecture 28 (4/12 Fri.) ===== | ||
| + | * Inter-thread/application interference | ||
| + | * Utility-based cache partitioning | ||
| + | * Non-uniform memory access (NUMA) | ||
| + | * Smart/dumb resources | ||
| + | * Full-window stall | ||
| + | * Memory latency tolerance | ||
| + | * Caching | ||
| + | * Prefetching | ||
| + | * Multithreading | ||
| + | * Out of order execution | ||
| + | * Runahead execution | ||
| + | * Runahead cache | ||
| + | * Cache working set | ||
| + | * Dependent cache misses | ||
| + | * Address-value delta | ||
| + | * Traversal address load | ||
| + | * Leaf address load | ||
| + | |||
| + | ===== Lecture 29 (4/15 Mon.) ===== | ||
| + | * Prefetching | ||
| + | * Compulsory cache misses | ||
| + | * Prefetch algorithm | ||
| + | * Early/late prefetches | ||
| + | * Prefetch distance | ||
| + | * Prefetch aggressiveness | ||
| + | * Cache pollution | ||
| + | * Prefetch buffer | ||
| + | * Decoupled fetch | ||
| + | * Prefetch destination | ||
| + | * Prefetch coverage | ||
| + | * Prefetch accuracy | ||
| + | * Prefetch timeliness | ||
| + | * Software prefetching / hardware prefetching / execution-based prefetching | ||
| + | * Next-line prefetcher | ||
| + | * Instruction based stride prefetching | ||
| + | * Stream buffer | ||
| + | |||
| + | ===== Lecture 30 (4/22 Mon.) ===== | ||
| + | * Prefetch bandwidth consumption | ||
| + | * Feedback-directed prefetcher throttling | ||
| + | * Prefetch insertion location | ||
| + | * Prefetch irregular address patterns | ||
| + | * Markov prefetching | ||
| + | * Content directed prefetching | ||
| + | * Execution-based prefetching | ||
| + | * Thread-based pre-execution | ||
| + | * Simultaneous multithreading | ||
| + | * ISA extensions for prefetching | ||
| + | * Pre-execution slice | ||
| + | * Slipstream processing | ||
| + | * Parallel computing | ||
| + | * Loosely coupled vs tightly coupled multiprocessor | ||
| + | * Message passing | ||
| + | * Cache coherence | ||
| + | * Ordering of memory operations | ||
| + | * Processor load imbalance | ||
| + | * Processor utilization / redundancy / efficiency | ||
| + | * Amdahl's Law | ||
| + | * Sequential bottleneck | ||
| + | |||
| + | ===== Lecture 31 (4/24 Wed.) ===== | ||
| + | * Bottlenecks in parallel execution | ||
| + | * Ordering of memory operations | ||
| + | * Deterministic execution | ||
| + | * Protection of shared data | ||
| + | * Mutual exclusion | ||
| + | * Sequential consistency | ||
| + | * Total global order requirement | ||
| + | * Cache coherence | ||
| + | * Snooping bus | ||
| + | * Directory-based cache coherence | ||
| + | * Update vs Invalidate | ||
| + | * MESI Protocol (Modified, Exclusive, Shared, Invalid) | ||
| + | * Read-exclusive (write) | ||
| + | * Exclusive bit | ||
| + | * MOESI (add Owned state) | ||
| + | |||
| + | ===== Lecture 32 (4/26 Fri.) ===== | ||
| + | * Snoopy cache vs Directory Coherence | ||
| + | * Set inclusion test | ||
| + | * Contention resolution | ||
| + | * Negative acknowledgement (nack) | ||
| + | * Coherence granularity | ||
| + | * False sharing | ||
| + | * Interconnection networks | ||
| + | * Topology | ||
| + | * Routing (algorithm) | ||
| + | * Buffering and flow control | ||
| + | * Point-to-point | ||
| + | * Crossbar | ||
| + | * Buffered/bufferless networks | ||
| + | * Flow control | ||
| + | * Multistage logarithmic networks | ||
| + | * Circuit vs packet switching | ||
| + | * Delta network | ||
| + | * Ring network | ||
| + | * Unidirectional ring | ||
| + | * Mesh | ||
| + | * Torus | ||
| + | * Trees / Fat trees | ||
| + | * Hypercube | ||
| + | * Bufferless deflection routing | ||
| + | * Dimension-order routing | ||
| + | * Deadlock vs livelock | ||
| + | * Valiant's algorithm | ||
| + | * Adaptive vs oblivious routing | ||
| + | |||
| + | ===== Lecture 33 (4/29 Mon.) ===== | ||
| + | * Serialized code sections | ||
| + | * Critical section | ||
| + | * Barrier | ||
| + | * Limiter stages in pipelined programs | ||
| + | * Trace cache | ||
| + | * Large vs small core | ||
| + | * Asymmetric Chip Multiprocessor (ACMP) | ||
| + | * Accelerating Critical Sections | ||
| + | * False serialization | ||
| + | * Shared vs private data | ||
| + | * Data Marshalling | ||
| + | * Bottleneck Identification and Scheduling (BIS) | ||
| + | * Bottleneck Table | ||
| + | * Acceleration Index Table | ||
| + | |||
| + | ===== Lecture 34 (5/1 Wed.) ===== | ||
| + | * DRAM technology scaling | ||
| + | * Emerging memory technologies | ||
| + | * Phase change memory (PCM) | ||
| + | * Memristors | ||
| + | * Memory capacity | ||
| + | * Memory latency | ||
| + | * Memory endurance | ||
| + | * Memory idle power | ||
| + | * Hybrid memory system | ||
| + | * Replacing DRAM with PCM | ||
| + | * Row-locality Aware Data Placement | ||
| + | * DRAM cache with metadata store | ||
| + | * TIMBER Tag Management | ||
| + | * Security challenges of emerging technologies | ||
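
The program execution time relation listed under Lecture 6 (#instructions * average_CPI * clock_cycle_time) can be illustrated with a short numeric sketch. The function name and all input values below are hypothetical, chosen only to show how CPI and clock cycle time trade off against each other; they are not measurements from any real machine.

```python
def execution_time(instructions, average_cpi, clock_cycle_time_s):
    """Return program execution time in seconds.

    Implements: #instructions * average_CPI * clock_cycle_time
    """
    return instructions * average_cpi * clock_cycle_time_s


# Hypothetical single-cycle design: CPI is 1, but the clock period is long
# because it must cover the slowest instruction.
single_cycle = execution_time(1_000_000, 1.0, 10e-9)

# Hypothetical multi-cycle design: higher CPI, but a much shorter clock period.
multi_cycle = execution_time(1_000_000, 4.2, 2e-9)

print(f"single-cycle: {single_cycle * 1e3:.2f} ms")
print(f"multi-cycle:  {multi_cycle * 1e3:.2f} ms")
```

With these assumed numbers the multi-cycle design finishes sooner despite its higher CPI, which is the "CPI vs frequency" trade-off named in the same lecture.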