This shows you the differences between two versions of the page.
| buzzwords [2013/02/15 13:54] jasonli1 | buzzwords [2013/05/01 14:05] (current) jasonli1 | ||
|---|---|---|---|
| Line 268: | Line 268: | ||
| * Use reorder buffer to handle out-of-order memory operations | * Use reorder buffer to handle out-of-order memory operations | ||
| * Store/write buffer | * Store/write buffer | ||
| + | |||
| + | ===== Lecture 14 (2/18 Mon.) ===== | ||
| + | * Preventing dispatch stalls | ||
| + | * Fine-grained multithreading | ||
| + | * Value prediction | ||
| + | * Stride predictor | ||
| + | * Compile-time instruction reordering | ||
| + | * Out of order execution (Dynamic scheduling) | ||
| + | * Restricted dataflow | ||
| + | * Latency tolerance | ||
| + | * Reservation station | ||
| + | * Reservation station entry | ||
| + | * Register renaming | ||
| + | * Tags associated with register value | ||
| + | * Register alias table (RAT) | ||
| + | * Tomasulo's algorithm | ||
| + | * Instruction window size | ||
| + | * Registers vs memory | ||
| + | * Memory dependence handling | ||
| + | * Memory disambiguation / unknown address problem | ||
| + | * Dependence of memory instructions (loads/stores) | ||
| + | |||
| + | ===== Lecture 15 (2/20 Wed.) ===== | ||
| + | * Out of order execution | ||
| + | * Memory dependence handling | ||
| + | * Content addressable memory | ||
| + | * Memory disambiguation | ||
| + | * Reservation stations | ||
| + | * Branch prediction | ||
| + | * Superscalar execution vs OoO execution | ||
| + | * Instruction level parallelism | ||
| + | * Dataflow (at the ISA level) | ||
| + | * Systolic arrays | ||
| + | * Stream processing | ||
| + | * Pool of unmatched tokens | ||
| + | * Token matching area | ||
| + | * Instruction fetch area | ||
| + | * MIT tagged token data flow architecture | ||
| + | * Irregular parallelism | ||
| + | * Data parallelism | ||
| + | * SIMD (Single instruction multiple data) | ||
| + | * Array vs vector processors | ||
| + | * VLIW (Very long instruction word) | ||
| + | * Vector processor | ||
| + | * Vector registers | ||
| + | * Vector length register (VLEN) | ||
| + | * Vector stride register (VSTR) | ||
| + | |||
| + | ===== Lecture 16 (2/25 Mon.) ===== | ||
| + | * Virtual memory | ||
| + | * Virtual page vs physical page (frame) | ||
| + | * Address translation | ||
| + | * Page Table (Virtual addr -> Physical addr) | ||
| + | * Page fault | ||
| + | * Reads data from disk to memory | ||
| + | * Direct Memory Access | ||
| + | * Caches | ||
| + | * Block, set | ||
| + | * Block/line size, associativity | ||
| + | * Hit, miss | ||
| + | * Insertion, eviction | ||
| + | * Write through vs write back | ||
| + | * Locality - temporal and spatial | ||
| + | * Working set | ||
| + | * Address isolation | ||
| + | * Copy on write | ||
| + | * Before virtual memory | ||
| + | * Single-user machine | ||
| + | * Base and bound registers | ||
| + | * Segmented address space | ||
| + | |||
| + | ===== Lecture 17 (2/27 Wed.) ===== | ||
| + | * Virtual memory | ||
| + | * Segmentation | ||
| + | * Segment selectors | ||
| + | * Segment descriptors | ||
| + | * Privilege level (ring) | ||
| + | * Paging | ||
| + | * Physical vs virtual address space | ||
| + | * Physical vs virtual page | ||
| + | * Virtual page number (VPN) | ||
| + | * Physical page number (PPN) | ||
| + | * Virtual page offset == physical page offset | ||
| + | * Address translation - VPN -> PPN | ||
| + | * Page Table | ||
| + | * Multi-level page table | ||
| + | * Translation Lookaside Buffer (TLB) | ||
| + | * Homonym problem | ||
| + | |||
| + | ===== Lecture 18 (3/1 Fri.) ===== | ||
| + | * Translation | ||
| + | * Two-level page table | ||
| + | * Page directory | ||
| + | * Multi-level page table | ||
| + | * Page Directory Base Register | ||
| + | * Translation: Segmentation + Paging | ||
| + | * Protection | ||
| + | * Privilege levels | ||
| + | * Page Directory Entry (PDE) | ||
| + | * Page Table Entry (PTE) | ||
| + | * Read/Write | ||
| + | * User/Supervisor | ||
| + | * Protection: PDE + PTE | ||
| + | * Protection: Segmentation + Paging | ||
| + | * Translation Lookaside Buffer (TLB) | ||
| + | * Context switch | ||
| + | * Flush/invalidate | ||
| + | * TLB miss: HW-managed vs. SW-managed | ||
| + | * Page walk | ||
| + | * TLB replacement | ||
| + | * Page fault | ||
| + | * Page fault handler | ||
| + | * Demand paging | ||
| + | * Swapping | ||
| + | * Thrashing | ||
| + | * Page size | ||
| + | * Internal fragmentation | ||
| + | * Memory Management Unit (MMU) | ||
| + | |||
| + | ===== Lecture 19 (3/18 Mon.) ===== | ||
| + | * Vector registers | ||
| + | * Vector data register | ||
| + | * Vector control registers | ||
| + | * Vector functional units | ||
| + | * Amdahl's Law | ||
| + | * Sequential bottleneck | ||
| + | * Vector memory system | ||
| + | * Memory banking | ||
| + | * Address generator | ||
| + | * Base and stride | ||
| + | * Vectorizable loops | ||
| + | * Vector chaining | ||
| + | * Vector stripmining | ||
| + | * Scatter/gather operations | ||
| + | * Masked operations | ||
| + | * Matrix storage | ||
| + | * Row major | ||
| + | * Column major | ||
| + | * Vector instruction level parallelism | ||
| + | * Automatic code vectorization | ||
| + | * Graphics processing units | ||
| + | * Single instruction multiple threads | ||
| + | * Thread warps | ||
| + | * SIMT memory access | ||
| + | |||
| + | ===== Lecture 20 (3/20 Wed.) ===== | ||
| + | * SIMT (Single Instruction Multiple Thread) | ||
| + | * SPMD (Single Procedure Multiple Data) | ||
| + | * Thread warps | ||
| + | * Branch divergence in warps | ||
| + | * Branch divergence handling - dynamic predicated execution | ||
| + | * Dynamic warp formation | ||
| + | * Memory access divergence in warp | ||
| + | * NVIDIA GPU terminology | ||
| + | * Streaming multiprocessor (SM) | ||
| + | * Warp | ||
| + | * Thread context | ||
| + | * Critical section | ||
| + | * Heterogeneous processing (Asymmetric) | ||
| + | * Dynamically select latency or throughput | ||
| + | * Very long instruction word (VLIW) | ||
| + | * Lock step execution | ||
| + | * Decoupled access and execute | ||
| + | * Dynamic vs static scheduling | ||
| + | * Loop unrolling | ||
| + | * Systolic arrays | ||
| + | * WARP Computer | ||
| + | |||
| + | ===== Lecture 21 (3/25 Mon.) ===== | ||
| + | * Systolic architectures | ||
| + | * Pipeline parallel execution | ||
| + | * Decoupled Access/Execute | ||
| + | * Static instruction scheduling | ||
| + | * Common subexpression elimination | ||
| + | * Loop unrolling | ||
| + | * Speculative code motion | ||
| + | * Trace scheduling | ||
| + | * Data precedence graph | ||
| + | * List scheduling | ||
| + | * Superblock scheduling | ||
| + | * Hyperblock formation | ||
| + | * Block-structured ISA | ||
| + | * Exception propagation | ||
| + | |||
| + | ===== Lecture 22 (3/27 Wed.) ===== | ||
| + | * Ideal memory | ||
| + | * DRAM - Dynamic Random Access Memory | ||
| + | * SRAM - Static Random Access Memory | ||
| + | * Bit line | ||
| + | * Row enable | ||
| + | * Sense amplifier | ||
| + | * Row/column decoder | ||
| + | * Phase change memory | ||
| + | * DRAM refresh | ||
| + | * Temporal locality | ||
| + | * Spatial locality | ||
| + | * Cache hierarchy | ||
| + | * Manual vs automatically managed memory hierarchy | ||
| + | * Cache block (line) | ||
| + | * Cache hit/miss | ||
| + | * Cache block eviction | ||
| + | * Tag and data store | ||
| + | * Cache associativity | ||
| + | * Direct-mapped cache | ||
| + | * Fully associative cache | ||
| + | * Cache replacement policies | ||
| + | * Least recently used (LRU) | ||
| + | * Victim/next-victim policy | ||
| + | |||
| + | ===== Lecture 23 (3/29 Fri.) ===== | ||
| + | * Cache insertion policy | ||
| + | * Cache replacement policy | ||
| + | * Cache promotion policy | ||
| + | * Non-temporal loads | ||
| + | * Victim/Next-Victim replacement policy | ||
| + | * Hybrid replacement policy - set sampling | ||
| + | * Page replacement | ||
| + | * Tag store entry | ||
| + | * Inclusive/exclusive cache | ||
| + | * Write-back/write-through cache | ||
| + | * Allocate/no-allocate on write miss | ||
| + | * Sectored cache | ||
| + | * Cache dirty bit | ||
| + | * Cache valid bit | ||
| + | * Separate data and instruction caches vs unified caches | ||
| + | * Multi-level caches | ||
| + | * Homonym and synonym problem | ||
| + | * Virtual-physical cache | ||
| + | |||
| + | ===== Lecture 24 (4/1 Mon.) ===== | ||
| + | * Virtual-physical cache | ||
| + | * Virtual memory - DRAM interaction | ||
| + | * Page coloring | ||
| + | * Critical word first fill in | ||
| + | * Cache subblocking | ||
| + | * Cache size and associativity | ||
| + | * Compulsory misses | ||
| + | * Capacity misses | ||
| + | * Conflict misses | ||
| + | * Coherence miss (communication miss) | ||
| + | * Stream prefetcher | ||
| + | * Stride prefetcher | ||
| + | * Cache working set | ||
| + | * Victim cache | ||
| + | * Hashing (pseudo-associativity) | ||
| + | * Skewed associative caches | ||
| + | * Data restructuring via software | ||
| + | * Memory level parallelism (MLP) | ||
| + | * MLP-aware cache replacement | ||
| + | |||
| + | ===== Lecture 25 (4/3 Wed.) ===== | ||
| + | * Memory level parallelism (MLP) | ||
| + | * Miss Status Handling Register (MSHR) | ||
| + | * Cache multi-porting (true and virtual) | ||
| + | * Cache banking (interleaving) | ||
| + | * DRAM organization | ||
| + | * Channel | ||
| + | * DIMM | ||
| + | * Rank | ||
| + | * Chip | ||
| + | * Bank | ||
| + | * Row/column | ||
| + | * Row/column latch | ||
| + | * Row buffer (sense amplifier) | ||
| + | * DRAM commands | ||
| + | * Activate | ||
| + | * Read/write | ||
| + | * Precharge | ||
| + | * Row buffer conflict | ||
| + | * DRAM refresh | ||
| + | * DRAM access latency | ||
| + | * DRAM address mapping - row interleaving or cache block interleaving | ||
| + | * Virtual/physical address mapping | ||
| + | |||
| + | ===== Lecture 26 (4/8 Mon.) ===== | ||
| + | * DRAM refresh | ||
| + | * Distributed refresh | ||
| + | * DRAM refresh overhead (time and energy) | ||
| + | * Bloom filter | ||
| + | * Insert, test, remove all | ||
| + | * Retention-Aware Intelligent DRAM Refresh (RAIDR) | ||
| + | * Profile, Binning, Refresh | ||
| + | * Flash memory | ||
| + | * DRAM Controller | ||
| + | * In chipset vs on CPU chip | ||
| + | * DRAM scheduling policy | ||
| + | * First come first serve (FCFS) | ||
| + | * First ready, first come first serve (FR-FCFS) | ||
| + | * DRAM row management policy - open row vs closed row | ||
| + | * DRAM timing constraints | ||
| + | * DRAM power management | ||
| + | * DRAM power state | ||
| + | |||
| + | ===== Lecture 27 (4/10 Wed.) ===== | ||
| + | * DRAM bank operation | ||
| + | * Memory interference | ||
| + | * Quality-of-service unaware memory control | ||
| + | * Stall-time fairness in shared DRAM | ||
| + | * STFM (Stall Time Fair Memory) scheduling algorithm | ||
| + | * Memory bank parallelism of threads | ||
| + | * PAR-BS (Parallelism-Aware Batch Scheduling) | ||
| + | * ATLAS Memory Scheduler | ||
| + | * Throughput vs fairness | ||
| + | * Thread Cluster Memory Scheduling | ||
| + | * Quantum-based operation | ||
| + | * Misses per kilo-instruction (MPKI) | ||
| + | * Lottery scheduling | ||
| + | * Memory channel partitioning | ||
| + | * Throttling of source/core | ||
| + | * Data mapping to banks/channels/ranks | ||
| + | * Request prioritization | ||
| + | * Bottleneck Identification and Scheduling (BIS) | ||
| + | |||
| + | ===== Lecture 28 (4/12 Fri.) ===== | ||
| + | * Inter-thread/application interference | ||
| + | * Utility-based cache partitioning | ||
| + | * Non-uniform memory access (NUMA) | ||
| + | * Smart/dumb resources | ||
| + | * Full-window stall | ||
| + | * Memory latency tolerance | ||
| + | * Caching | ||
| + | * Prefetching | ||
| + | * Multithreading | ||
| + | * Out of order execution | ||
| + | * Runahead execution | ||
| + | * Runahead cache | ||
| + | * Cache working set | ||
| + | * Dependent cache misses | ||
| + | * Address-value delta | ||
| + | * Traversal address load | ||
| + | * Leaf address load | ||
| + | |||
| + | ===== Lecture 29 (4/15 Mon.) ===== | ||
| + | * Prefetching | ||
| + | * Compulsory cache misses | ||
| + | * Prefetch algorithm | ||
| + | * Early/late prefetches | ||
| + | * Prefetch distance | ||
| + | * Prefetch aggressiveness | ||
| + | * Cache pollution | ||
| + | * Prefetch buffer | ||
| + | * Decoupled fetch | ||
| + | * Prefetch destination | ||
| + | * Prefetch coverage | ||
| + | * Prefetch accuracy | ||
| + | * Prefetch timeliness | ||
| + | * Software prefetching / hardware prefetching / execution-based prefetching | ||
| + | * Next-line prefetcher | ||
| + | * Instruction based stride prefetching | ||
| + | * Stream buffer | ||
| + | |||
| + | ===== Lecture 30 (4/22 Mon.) ===== | ||
| + | * Prefetch bandwidth consumption | ||
| + | * Feedback-directed prefetcher throttling | ||
| + | * Prefetch insertion location | ||
| + | * Prefetch irregular address patterns | ||
| + | * Markov prefetching | ||
| + | * Content directed prefetching | ||
| + | * Execution-based prefetching | ||
| + | * Thread-based pre-execution | ||
| + | * Simultaneous multithreading | ||
| + | * ISA extensions for prefetching | ||
| + | * Pre-execution slice | ||
| + | * Slipstream processing | ||
| + | * Parallel computing | ||
| + | * Loosely coupled vs tightly coupled multiprocessors | ||
| + | * Message passing | ||
| + | * Cache coherence | ||
| + | * Ordering of memory operations | ||
| + | * Processor load imbalance | ||
| + | * Processor utilization / redundancy / efficiency | ||
| + | * Amdahl's Law | ||
| + | * Sequential bottleneck | ||
| + | |||
| + | ===== Lecture 31 (4/24 Wed.) ===== | ||
| + | * Bottlenecks in parallel execution | ||
| + | * Ordering of memory operations | ||
| + | * Deterministic execution | ||
| + | * Protection of shared data | ||
| + | * Mutual exclusion | ||
| + | * Sequential consistency | ||
| + | * Total global order requirement | ||
| + | * Cache coherence | ||
| + | * Snooping bus | ||
| + | * Directory-based cache coherence | ||
| + | * Update vs Invalidate | ||
| + | * MESI Protocol (Modified, Exclusive, Shared, Invalid) | ||
| + | * Read-exclusive (write) | ||
| + | * Exclusive bit | ||
| + | * MOESI (add Owned state) | ||
| + | |||
| + | ===== Lecture 32 (4/26 Fri.) ===== | ||
| + | * Snoopy cache vs Directory Coherence | ||
| + | * Set inclusion test | ||
| + | * Contention resolution | ||
| + | * Negative acknowledgement (nack) | ||
| + | * Coherence granularity | ||
| + | * False sharing | ||
| + | * Interconnection networks | ||
| + | * Topology | ||
| + | * Routing (algorithm) | ||
| + | * Buffering and flow control | ||
| + | * Point-to-point | ||
| + | * Crossbar | ||
| + | * Buffered/bufferless networks | ||
| + | * Flow control | ||
| + | * Multistage logarithmic networks | ||
| + | * Circuit vs packet switching | ||
| + | * Delta network | ||
| + | * Ring network | ||
| + | * Unidirectional ring | ||
| + | * Mesh | ||
| + | * Torus | ||
| + | * Trees / Fat trees | ||
| + | * Hypercube | ||
| + | * Bufferless deflection routing | ||
| + | * Dimension-order routing | ||
| + | * Deadlock vs livelock | ||
| + | * Valiant's algorithm | ||
| + | * Adaptive vs oblivious routing | ||
| + | |||
| + | ===== Lecture 33 (4/29 Mon.) ===== | ||
| + | * Serialized code sections | ||
| + | * Critical section | ||
| + | * Barrier | ||
| + | * Limiter stages in pipelined programs | ||
| + | * Trace cache | ||
| + | * Large vs small core | ||
| + | * Asymmetric Chip Multiprocessor (ACMP) | ||
| + | * Accelerating Critical Sections | ||
| + | * False serialization | ||
| + | * Shared vs private data | ||
| + | * Data Marshalling | ||
| + | * Bottleneck Identification and Scheduling (BIS) | ||
| + | * Bottleneck Table | ||
| + | * Acceleration Index Table | ||
| + | |||
| + | ===== Lecture 34 (5/1 Wed.) ===== | ||
| + | * DRAM technology scaling | ||
| + | * Emerging memory technologies | ||
| + | * Phase change memory (PCM) | ||
| + | * Memristors | ||
| + | * Memory capacity | ||
| + | * Memory latency | ||
| + | * Memory endurance | ||
| + | * Memory idle power | ||
| + | * Hybrid memory system | ||
| + | * Replacing DRAM with PCM | ||
| + | * Row-locality Aware Data Placement | ||
| + | * DRAM cache with metadata store | ||
| + | * TIMBER Tag Management | ||
| + | * Security challenges of emerging technologies | ||