  * Program execution time:  #instructions * average_CPI * clock_cycle_time
  * Branch delay slot

===== Lecture 6 (1/28 Mon.) =====
  * Single-cycle microarchitecture
  * Instruction processing cycle
  * Combinational logic - hardwired control
  * Sequential logic - microprogrammed control
  * Critical path
  * Memory latency
  * Microarchitecture design principles - critical path design, common case design, balanced design
  * Cycles per Instruction (CPI) vs frequency
  * Pipelining
  * Program execution time:  #instructions * average_CPI * clock_cycle_time (worked example after this list)
  * Multi-cycle microarchitecture
  * Instruction processing cycle
  * Microinstruction
  * Microsequencing
  * Control store
  * Microsequencer
  * Condition codes
  * Simple LC-3b control datapath

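
A quick worked example of the execution-time equation above, as a minimal sketch; the instruction count, CPI, and clock frequency are made-up numbers chosen only to illustrate the formula.

<code python>
# Program execution time = #instructions * average_CPI * clock_cycle_time.
# All numbers are hypothetical, chosen only to illustrate the formula.
instructions = 2_000_000      # dynamic instruction count
average_cpi = 1.5             # average cycles per instruction
clock_cycle_time = 1 / 2e9    # 2 GHz clock -> 0.5 ns cycle time

exec_time = instructions * average_cpi * clock_cycle_time
print(f"Execution time: {exec_time * 1e3:.3f} ms")   # 1.500 ms
</code>
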
===== Lecture 7 (1/30 Wed.) =====
  * Multi-cycle microarchitecture
  * Instruction processing cycle
  * Behavior of the entire [multi-cycle microarchitecture] processor is specified by a finite state machine
  * Microinstruction
  * Microsequencing
  * Control store
  * Microsequencer
  * Tri-state buffer
  * Bus gating
  * Difference between gating and loading

===== Lecture 8 (2/4 Mon.) =====
  * Interrupt checking
  * Unaligned memory accesses
  * Memory-mapped I/O
  * Updating/patching microcode in the field
  * Horizontal/vertical microcode
  * Nanocode and millicode
  * Pipelining
  * Ideal pipeline - identical operations, independent operations, uniformly partitionable suboperations
  * Pipeline registers
  * Pipeline control signals - decode once and buffer, or carry instructions and decode locally
  * Pipeline external fragmentation (pipeline stages idle for some instructions)
  * Pipeline internal fragmentation (some pipeline stages faster than others while the clock cycle stays the same)
  * Inter-instruction dependencies need to be detected and handled
  * Issues in pipeline design - number of stages, keeping the pipeline correct and full, handling exceptions and interrupts
  * Causes of pipeline stalls - resource contention, dependencies (control, data)
  * Handling resource contention - duplicate the resource, increase its throughput, detect contention and stall the contending stage
  * Data dependencies - flow dependence (true dependence, read after write), output dependence (write after write), anti dependence (write after read) - see the sketch after this list

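
A minimal sketch of the dependence classification above, assuming each instruction is described only by its destination register and its set of source registers; the representation is hypothetical, for illustration.

<code python>
# Classify the data dependences of a younger instruction on an older one.
# Each instruction is (destination register, set of source registers).
def dependences(older, younger):
    older_dst, older_srcs = older
    younger_dst, younger_srcs = younger
    deps = []
    if older_dst in younger_srcs:
        deps.append("flow (true, read-after-write)")
    if younger_dst in older_srcs:
        deps.append("anti (write-after-read)")
    if younger_dst == older_dst:
        deps.append("output (write-after-write)")
    return deps

# r3 <- r1 + r2, then r1 <- r3 * r4: flow dependence on r3, anti dependence on r1.
print(dependences(("r3", {"r1", "r2"}), ("r1", {"r3", "r4"})))
</code>
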
===== Lecture 9 (2/6 Wed.) =====
  * Anti and output dependence - limited number of architectural registers
  * Control flow graph
  * Compiler profiling
  * Profile input set vs runtime input set
  * Load hoisting
  * Handling flow dependencies - detect and wait, detect and forward, detect and eliminate, predict and verify
  * Fine-grained multithreading
  * Software vs hardware based interlocking
  * Scoreboarding
  * Register renaming
  * Combinational dependence check logic
  * Register data forwarding
  * Control dependence
  * Pipeline stalls
  * Data forwarding distance
  * Data forwarding logic

===== Lecture 10 (2/8 Fri.) =====
  * Register renaming
  * Static vs dynamic scheduling
  * Branch types - conditional, unconditional, call, return, indirect
  * Handling control dependencies
    * Branch prediction
    * Stalling
    * Branch delay slot
    * Predicated execution
    * Fine-grained multithreading
    * Multipath execution
  * Pipeline flushing
  * Branch misprediction penalty
  * Forward vs backward control flow
  * Decrease number of branches - get rid of control flow instructions, convert control dependence to data dependence, predicate combining
  * Wish branches - choose predicated execution or branch prediction
  * Delayed branching with squashing
  * Enhanced branch prediction - need to predict target address, branch direction, whether instruction is a branch
  * Branch Target Buffer (BTB) or Branch Target Address Cache
  * Compile time vs run time branch direction prediction

===== Lecture 11 (2/11 Mon.) =====
  * Static branch prediction
    * Always not-taken
    * Always taken
    * Backward taken, forward not taken
    * Profile-based (compiler)
    * Program-based (program analysis based)
    * Pragmas - programmer conveys hints
  * Dynamic branch prediction
    * Last time predictor
    * Two-bit counter based prediction (saturating counter)
  * Global branch correlation
    * Global History Register (GHR)
    * Pattern History Table (PHT)
    * Intel Pentium Pro Branch Predictor - multiple PHTs
    * Gshare predictor - GHR hashed with Branch PC (see the sketch after this list)
    * Two-level Global History Predictor
  * Local branch correlation
    * Per-branch history register
  * Hybrid branch predictor - multiple algorithms, choose "best" prediction
  * Branch confidence estimation
  * Branch misprediction penalty
  * Alpha 21264 Tournament Predictor
  * SPEC - Standard Performance Evaluation Corporation (CPU benchmark)

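
A minimal gshare sketch to make "GHR hashed with Branch PC" concrete; the table size, initial counter value, and PC shift are assumptions for illustration, not a description of any particular machine.

<code python>
# gshare: a table of 2-bit saturating counters indexed by (GHR XOR branch PC).
class Gshare:
    def __init__(self, index_bits=10):
        self.mask = (1 << index_bits) - 1
        self.pht = [1] * (1 << index_bits)   # 2-bit counters, start weakly not-taken
        self.ghr = 0                         # global history register

    def _index(self, pc):
        return ((pc >> 2) ^ self.ghr) & self.mask   # hash GHR with the branch PC

    def predict(self, pc):
        return self.pht[self._index(pc)] >= 2       # True means "predict taken"

    def update(self, pc, taken):
        i = self._index(pc)
        self.pht[i] = min(self.pht[i] + 1, 3) if taken else max(self.pht[i] - 1, 0)
        self.ghr = ((self.ghr << 1) | int(taken)) & self.mask   # shift in the outcome

p = Gshare()
print(p.predict(0x4000))        # False: counters start weakly not-taken
p.update(0x4000, taken=True)    # train with the resolved branch outcome
</code>
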
===== Lecture 12 (2/13 Wed.) =====
  * Predicated execution - compiler converts control dependence into data dependence
  * Conditional execution in the ARM ISA
  * Hammock branch
  * Wish jump/join
  * Multi-path execution
  * Call and return prediction
    * Direct calls - easy to predict
    * Returns are indirect branches
    * Prediction - Return Address Stack (see the sketch after this list)
  * Indirect branch prediction
    * Last resolved target
    * History based target prediction
  * Superscalar processor
  * Multiple instruction fetch
  * Multi-cycle execution
  * Exceptions vs interrupts
  * Precise exceptions/interrupts
    * Make each operation take the same amount of time
    * Reorder buffer (ROB)
    * History buffer
    * Future register file
    * Checkpointing
  * Instruction retirement (commit)

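
A minimal return address stack sketch; the depth and the overflow handling are arbitrary choices for illustration.

<code python>
# Return address stack (RAS): push the fall-through PC on a predicted call,
# pop it as the target prediction on a predicted return.
class ReturnAddressStack:
    def __init__(self, depth=16):
        self.stack, self.depth = [], depth

    def on_call(self, return_addr):
        if len(self.stack) == self.depth:
            self.stack.pop(0)                 # overflow: drop the oldest entry
        self.stack.append(return_addr)

    def on_return(self):
        return self.stack.pop() if self.stack else None   # None -> fall back to the BTB

ras = ReturnAddressStack()
ras.on_call(0x1004)             # call at 0x1000; next sequential PC is 0x1004
ras.on_call(0x2008)
print(hex(ras.on_return()))     # 0x2008
print(hex(ras.on_return()))     # 0x1004
</code>
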
===== Lecture 13 (2/15 Fri.) =====
  * Reorder buffer (ROB)
  * Accessing the ROB with the register file
    * Use indirection from RF to ROB
  * Register renaming with ROB
    * Architectural register ID -> Physical register ID
    * Eliminates false dependencies
  * In-order execution, out-of-order completion, in-order retirement
  * History buffer (HB)
  * Future file + ROB
  * Checkpointing
  * Maintaining speculative memory states
    * Use reorder buffer to handle out-of-order memory operations
    * Store/write buffer

===== Lecture 14 (2/18 Mon.) =====
  * Preventing dispatch stalls
    * Fine-grained multithreading
    * Value prediction
      * Stride predictor
    * Compile-time instruction reordering
  * Out of order execution (Dynamic scheduling)
    * Restricted dataflow
    * Latency tolerance
  * Reservation station
    * Reservation station entry
  * Register renaming
    * Tags associated with register values
  * Register alias table (RAT) - see the sketch after this list
  * Tomasulo's algorithm
  * Instruction window size
  * Registers vs memory
  * Memory dependence handling
  * Memory disambiguation / unknown address problem
  * Dependence of memory instructions (loads/stores)

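
A minimal register-renaming sketch built around a register alias table; the 8-register file, the tag allocator, and the dictionary representation are assumptions for illustration, not Tomasulo's full algorithm.

<code python>
# Register alias table (RAT): maps each architectural register to the tag of the
# latest in-flight instruction that writes it (None = value is in the register file).
from itertools import count

rat = {f"r{i}": None for i in range(8)}   # architectural registers r0..r7
tags = count()                            # simplified tag / reservation-station allocator

def rename(dst, srcs):
    """Rename one instruction: look up source tags, allocate a tag for the destination."""
    src_tags = {s: rat[s] for s in srcs}  # None -> read the register file directly
    tag = next(tags)
    rat[dst] = tag                        # later readers of dst now wait on this tag
    return tag, src_tags

print(rename("r3", ["r1", "r2"]))   # (0, {'r1': None, 'r2': None})
print(rename("r1", ["r3", "r4"]))   # (1, {'r3': 0, 'r4': None}) - r3 now comes from tag 0
</code>
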
===== Lecture 15 (2/20 Wed.) =====
  * Out of order execution
  * Memory dependence handling
  * Content addressable memory
  * Memory disambiguation
  * Reservation stations
  * Branch prediction
  * Superscalar execution vs OoO execution
  * Instruction level parallelism
  * Dataflow (at the ISA level)
  * Systolic arrays
  * Stream processing
  * Pool of unmatched tokens
  * Token matching area
  * Instruction fetch area
  * MIT tagged token dataflow architecture
  * Irregular parallelism
  * Data parallelism
  * SIMD (Single instruction multiple data)
  * Array vs vector processors
  * VLIW (Very long instruction word)
  * Vector processor
    * Vector registers
    * Vector length register (VLEN)
    * Vector stride register (VSTR)

===== Lecture 16 (2/25 Mon.) =====
  * Virtual memory
    * Virtual page vs physical page (frame)
    * Address translation
    * Page Table (Virtual addr -> Physical addr)
    * Page fault
      * Reads data from disk to memory
      * Direct Memory Access
  * Caches
    * Block, set
    * Block/line size, associativity
    * Hit, miss
    * Insertion, eviction
    * Write through vs write back
  * Locality - temporal and spatial
  * Working set
  * Address isolation
  * Copy on write
  * Before virtual memory
    * Single-user machine
    * Base and bound registers
    * Segmented address space

===== Lecture 17 (2/27 Wed.) =====
  * Virtual memory
  * Segmentation
    * Segment selectors
    * Segment descriptors
    * Privilege level (ring)
  * Paging
    * Physical vs virtual address space
    * Physical vs virtual page
    * Virtual page number (VPN)
    * Physical page number (PPN)
    * Virtual page offset == physical page offset
    * Address translation - VPN -> PPN
    * Page Table
    * Multi-level page table
    * Translation Lookaside Buffer (TLB)
    * Homonym problem

===== Lecture 18 (3/1 Fri.) =====
  * Translation
    * Two-level page table (see the sketch after this list)
    * Page directory
    * Multi-level page table
    * Page Directory Base Register
    * Translation: Segmentation + Paging
  * Protection
    * Privilege levels
    * Page Directory Entry (PDE)
    * Page Table Entry (PTE)
    * Read/Write
    * User/Supervisor
    * Protection: PDE + PTE
    * Protection: Segmentation + Paging
  * Translation Lookaside Buffer (TLB)
    * Context switch
    * Flush/invalidate
    * TLB miss: HW-managed vs. SW-managed
    * Page walk
    * TLB replacement
  * Page fault
    * Page fault handler
    * Demand paging
    * Swapping
    * Thrashing
  * Page size
    * Internal fragmentation
  * Memory Management Unit (MMU)

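
A minimal two-level translation sketch, assuming a 32-bit virtual address, 4 KB pages, and 10-bit directory/table indices; the dictionaries stand in for the page directory and page tables, and present/permission bits are omitted.

<code python>
# Two-level page walk: 10-bit directory index, 10-bit table index, 12-bit offset.
PAGE_SHIFT, LEVEL_BITS = 12, 10
LEVEL_MASK = (1 << LEVEL_BITS) - 1

def translate(vaddr, page_directory):
    pde_idx = (vaddr >> (PAGE_SHIFT + LEVEL_BITS)) & LEVEL_MASK
    pte_idx = (vaddr >> PAGE_SHIFT) & LEVEL_MASK
    offset = vaddr & ((1 << PAGE_SHIFT) - 1)
    page_table = page_directory[pde_idx]   # a real walk would check present bits here
    frame = page_table[pte_idx]            # physical page number (PPN)
    return (frame << PAGE_SHIFT) | offset  # page offset is unchanged by translation

# One toy mapping: virtual page 0x00402 -> physical frame 0x1A3.
pd = {0x001: {0x002: 0x1A3}}
print(hex(translate(0x0040_2ABC, pd)))     # 0x1a3abc
</code>
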
===== Lecture 19 (3/18 Mon.) =====
  * Vector registers
    * Vector data registers
    * Vector control registers
  * Vector functional units
  * Amdahl's Law
    * Sequential bottleneck
  * Vector memory system
    * Memory banking
    * Address generator
    * Base and stride
  * Vectorizable loops
  * Vector chaining
  * Vector stripmining (see the sketch after this list)
  * Scatter/gather operations
  * Masked operations
  * Matrix storage
    * Row major
    * Column major
  * Vector instruction level parallelism
  * Automatic code vectorization
  * Graphics processing units
    * Single instruction multiple threads
    * Thread warps
    * SIMT memory access

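
A minimal stripmining sketch: a loop of arbitrary length is processed in strips of at most the maximum vector length, with the vector length set for each strip. The MVL of 64 and the list-based "vector add" are assumptions for illustration.

<code python>
MVL = 64   # assumed maximum vector length of the machine

def stripmined_add(a, b):
    """Add two equal-length sequences in strips of at most MVL elements."""
    n = len(a)
    out = [0] * n
    i = 0
    while i < n:
        vlen = min(MVL, n - i)     # set VLEN for this strip (last strip may be shorter)
        out[i:i + vlen] = [x + y for x, y in zip(a[i:i + vlen], b[i:i + vlen])]
        i += vlen
    return out

print(stripmined_add(list(range(150)), list(range(150)))[:5])   # [0, 2, 4, 6, 8]
</code>
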
===== Lecture 20 (3/20 Wed.) =====
  * SIMT (Single Instruction Multiple Thread)
  * SPMD (Single Program Multiple Data)
  * Thread warps
  * Branch divergence in warps
  * Branch divergence handling - dynamic predicated execution
  * Dynamic warp formation
  * Memory access divergence in a warp
  * NVIDIA GPU terminology
    * Streaming multiprocessor (SM)
    * Warp
    * Thread context
  * Critical section
  * Heterogeneous processing (Asymmetric)
    * Dynamically select latency or throughput
  * Very long instruction word (VLIW)
    * Lock step execution
  * Decoupled access and execute
    * Dynamic vs static scheduling
  * Loop unrolling
  * Systolic arrays
    * WARP Computer

===== Lecture 21 (3/25 Mon.) =====
  * Systolic architectures
  * Pipeline parallel execution
  * Decoupled Access/Execute
  * Static instruction scheduling
  * Common subexpression elimination
  * Loop unrolling (see the sketch after this list)
  * Speculative code motion
  * Trace scheduling
  * Data precedence graph
  * List scheduling
  * Superblock scheduling
  * Hyperblock formation
  * Block-structured ISA
  * Exception propagation

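
A small loop-unrolling sketch (the unroll factor of 4 is arbitrary): the unrolled version does the same work with one loop-control check per four elements, plus a cleanup loop for the remainder.

<code python>
def sum_rolled(a):
    total = 0
    for i in range(len(a)):
        total += a[i]
    return total

def sum_unrolled_by_4(a):
    total = 0
    i, n = 0, len(a)
    while i + 4 <= n:                 # unrolled body: 4 elements per iteration
        total += a[i] + a[i + 1] + a[i + 2] + a[i + 3]
        i += 4
    while i < n:                      # cleanup loop for the leftover elements
        total += a[i]
        i += 1
    return total

data = list(range(10))
print(sum_rolled(data), sum_unrolled_by_4(data))   # 45 45
</code>
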
===== Lecture 22 (3/27 Wed.) =====
  * Ideal memory
  * DRAM - Dynamic Random Access Memory
  * SRAM - Static Random Access Memory
  * Bit line
  * Row enable
  * Sense amplifier
  * Row/column decoder
  * Phase change memory
  * DRAM refresh
  * Temporal locality
  * Spatial locality
  * Cache hierarchy
  * Manual vs automatically managed memory hierarchy
  * Cache block (line)
  * Cache hit/miss
  * Cache block eviction
  * Tag and data store
  * Cache associativity
  * Direct-mapped cache (see the sketch after this list)
  * Fully associative cache
  * Cache replacement policies
  * Least recently used (LRU)
  * Victim/next-victim policy

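
A minimal direct-mapped tag-store sketch; the 64-byte block and 256-set geometry are arbitrary, and only hit/miss bookkeeping is modeled (no data store, no write policy).

<code python>
# Direct-mapped cache lookup: split the address into tag, index, and block offset;
# each set holds exactly one (valid, tag) entry.
BLOCK_BITS, INDEX_BITS = 6, 8
tag_store = [(False, None)] * (1 << INDEX_BITS)   # (valid, tag) per set

def access(addr):
    index = (addr >> BLOCK_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (BLOCK_BITS + INDEX_BITS)
    valid, stored_tag = tag_store[index]
    hit = valid and stored_tag == tag
    if not hit:
        tag_store[index] = (True, tag)    # fill (and implicitly evict) on a miss
    return hit

print(access(0x1234))   # False: compulsory miss
print(access(0x1234))   # True: hit after the fill
</code>
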
===== Lecture 23 (3/29 Fri.) =====
  * Cache insertion policy
  * Cache replacement policy
  * Cache promotion policy
  * Non-temporal loads
  * Victim/Next-Victim replacement policy
  * Hybrid replacement policy - set sampling
  * Page replacement
  * Tag store entry
  * Inclusive/exclusive cache
  * Write-back/write-through cache
  * Allocate/no-allocate on write miss
  * Sectored cache
  * Cache dirty bit
  * Cache valid bit
  * Separate data and instruction caches vs unified caches
  * Multi-level caches
  * Homonym and synonym problem
  * Virtual-physical cache

===== Lecture 24 (4/1 Mon.) =====
  * Virtual-physical cache
  * Virtual memory - DRAM interaction
  * Page coloring
  * Critical word first fill
  * Cache subblocking
  * Cache size and associativity
  * Compulsory misses
  * Capacity misses
  * Conflict misses
  * Coherence miss (communication miss)
  * Stream prefetcher
  * Stride prefetcher
  * Cache working set
  * Victim cache
  * Hashing (pseudo-associativity)
  * Skewed associative caches
  * Data restructuring via software
  * Memory level parallelism (MLP)
  * MLP-aware cache replacement

===== Lecture 25 (4/3 Wed.) =====
  * Memory level parallelism (MLP)
  * Miss Status Handling Register (MSHR)
  * Cache multi-porting (true and virtual)
  * Cache banking (interleaving)
  * DRAM organization
    * Channel
    * DIMM
    * Rank
    * Chip
    * Bank
    * Row/column
  * Row/column latch
  * Row buffer (sense amplifier)
  * DRAM commands
    * Activate
    * Read/write
    * Precharge
  * Row buffer conflict
  * DRAM refresh
  * DRAM access latency
  * DRAM address mapping - row interleaving or cache block interleaving (see the sketch after this list)
  * Virtual/physical address mapping

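
A sketch contrasting the two address mappings above; the field widths (3 bank bits, 10 column bits, 64-byte cache blocks) are made up, and real memory controllers use many other variants.

<code python>
BANK_BITS, COL_BITS, BLOCK_BITS = 3, 10, 6   # assumed, illustrative field widths

def row_interleaved(addr):
    """Row:Bank:Column - consecutive addresses stay in one row of one bank."""
    col = addr & ((1 << COL_BITS) - 1)
    bank = (addr >> COL_BITS) & ((1 << BANK_BITS) - 1)
    row = addr >> (COL_BITS + BANK_BITS)
    return row, bank, col

def block_interleaved(addr):
    """Row:Column(high):Bank:BlockOffset - consecutive cache blocks hit different banks."""
    block_off = addr & ((1 << BLOCK_BITS) - 1)
    bank = (addr >> BLOCK_BITS) & ((1 << BANK_BITS) - 1)
    col_hi = (addr >> (BLOCK_BITS + BANK_BITS)) & ((1 << (COL_BITS - BLOCK_BITS)) - 1)
    row = addr >> (COL_BITS + BANK_BITS)
    return row, bank, (col_hi << BLOCK_BITS) | block_off

for a in (0x0000, 0x0040, 0x0080):          # three consecutive cache blocks
    print(row_interleaved(a), block_interleaved(a))
# Row interleaving keeps them in the same bank; block interleaving spreads them
# across banks, exposing more bank-level parallelism.
</code>
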
===== Lecture 26 (4/8 Mon.) =====
  * DRAM refresh
  * Distributed refresh
  * DRAM refresh overhead (time and energy)
  * Bloom filter (see the sketch after this list)
    * Insert, test, remove all
  * Retention-Aware Intelligent DRAM Refresh (RAIDR)
    * Profile, Binning, Refresh
  * Flash memory
  * DRAM Controller
    * In the chipset vs on the CPU chip
  * DRAM scheduling policy
    * First come first serve (FCFS)
    * First ready, first come first serve (FR-FCFS)
  * DRAM row management policy - open row vs closed row
  * DRAM timing constraints
  * DRAM power management
  * DRAM power states

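
A minimal Bloom filter sketch illustrating the insert / test / remove-all operations above; the array size, the number of hashes, and the SHA-256-based hashing are arbitrary choices for illustration, not what RAIDR implements in hardware.

<code python>
import hashlib

class BloomFilter:
    def __init__(self, m=1024, k=3):
        self.m, self.k = m, k
        self.bits = [False] * m

    def _positions(self, item):
        # k hash functions derived from SHA-256 with different salts.
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(h, 16) % self.m

    def insert(self, item):
        for p in self._positions(item):
            self.bits[p] = True

    def test(self, item):
        # May return a false positive, never a false negative.
        return all(self.bits[p] for p in self._positions(item))

    def remove_all(self):
        # Individual elements cannot be removed; only the whole filter can be cleared.
        self.bits = [False] * self.m

bf = BloomFilter()
bf.insert("row 0x1f2")
print(bf.test("row 0x1f2"), bf.test("row 0x9aa"))   # True, (almost surely) False
</code>
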
===== Lecture 27 (4/10 Wed.) =====
  * DRAM bank operation
  * Memory interference
  * Quality-of-service unaware memory control
  * Stall-time fairness in shared DRAM
  * STFM (Stall Time Fair Memory) scheduling algorithm
  * Memory bank parallelism of threads
  * PAR-BS (Parallelism-Aware Batch Scheduling)
  * ATLAS Memory Scheduler
  * Throughput vs fairness
  * Thread Cluster Memory Scheduling
    * Quantum-based operation
  * Misses per kilo-instruction (MPKI)
  * Lottery scheduling
  * Memory channel partitioning
  * Throttling of source/core
  * Data mapping to banks/channels/ranks
  * Request prioritization
  * Bottleneck Identification and Scheduling (BIS)

===== Lecture 28 (4/12 Fri.) =====
  * Inter-thread/application interference
  * Utility-based cache partitioning
  * Non-uniform memory access (NUMA)
  * Smart/dumb resources
  * Full-window stall
  * Memory latency tolerance
    * Caching
    * Prefetching
    * Multithreading
    * Out of order execution
  * Runahead execution
  * Runahead cache
  * Cache working set
  * Dependent cache misses
  * Address-value delta
  * Traversal address load
  * Leaf address load

===== Lecture 29 (4/15 Mon.) =====
  * Prefetching
  * Compulsory cache misses
  * Prefetch algorithm
  * Early/late prefetches
  * Prefetch distance
  * Prefetch aggressiveness
  * Cache pollution
  * Prefetch buffer
  * Decoupled fetch
  * Prefetch destination
  * Prefetch coverage
  * Prefetch accuracy
  * Prefetch timeliness
  * Software prefetching / hardware prefetching / execution-based prefetching
  * Next-line prefetcher
  * Instruction based stride prefetching (see the sketch after this list)
  * Stream buffer

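
A minimal PC-indexed stride prefetcher sketch; the table structure, the stride-confirmation rule, and the prefetch distance of 4 are simplifying assumptions.

<code python>
# For each load PC, remember the last address and stride; once the stride repeats,
# issue prefetches `distance` strides ahead of the current access.
class StridePrefetcher:
    def __init__(self, distance=4):
        self.table = {}            # load PC -> (last_addr, last_stride)
        self.distance = distance

    def access(self, pc, addr):
        prefetches = []
        if pc in self.table:
            last_addr, last_stride = self.table[pc]
            stride = addr - last_addr
            if stride != 0 and stride == last_stride:     # stride confirmed
                prefetches = [addr + stride * d
                              for d in range(1, self.distance + 1)]
            self.table[pc] = (addr, stride)
        else:
            self.table[pc] = (addr, 0)
        return prefetches

pf = StridePrefetcher()
for a in (0x100, 0x140, 0x180, 0x1c0):     # same load PC, constant 0x40 stride
    print(pf.access(pc=0x400, addr=a))     # prefetches start on the third access
</code>
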
===== Lecture 30 (4/22 Mon.) =====
  * Prefetch bandwidth consumption
  * Feedback-directed prefetcher throttling
  * Prefetch insertion location
  * Prefetching irregular address patterns
  * Markov prefetching
  * Content directed prefetching
  * Execution-based prefetching
  * Thread-based pre-execution
  * Simultaneous multithreading
  * ISA extensions for prefetching
  * Pre-execution slice
  * Slipstream processing
  * Parallel computing
  * Loosely coupled vs tightly coupled multiprocessors
  * Message passing
  * Cache coherence
  * Ordering of memory operations
  * Processor load imbalance
  * Processor utilization / redundancy / efficiency
  * Amdahl's Law (worked example after this list)
  * Sequential bottleneck

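
A worked Amdahl's Law example; the 90% parallel fraction and the processor counts are arbitrary illustrative numbers.

<code python>
# Amdahl's Law: if a fraction p of the work is parallelizable over n processors,
#     speedup = 1 / ((1 - p) + p / n)
def amdahl_speedup(p, n):
    return 1 / ((1 - p) + p / n)

for n in (2, 8, 64, 1024):
    print(f"n={n:4d}  speedup={amdahl_speedup(0.9, n):5.2f}")
# With a 10% sequential fraction, speedup saturates near 10x no matter how many
# processors are added - the sequential bottleneck.
</code>
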
===== Lecture 31 (4/24 Wed.) =====
  * Bottlenecks in parallel execution
  * Ordering of memory operations
  * Deterministic execution
  * Protection of shared data
  * Mutual exclusion
  * Sequential consistency
  * Total global order requirement
  * Cache coherence
  * Snooping bus
  * Directory-based cache coherence
  * Update vs Invalidate
  * MESI Protocol (Modified, Exclusive, Shared, Invalid)
  * Read-exclusive (write)
  * Exclusive bit
  * MOESI (add Owned state)

===== Lecture 32 (4/26 Fri.) =====
  * Snoopy cache vs Directory Coherence
  * Set inclusion test
  * Contention resolution
  * Negative acknowledgement (nack)
  * Coherence granularity
  * False sharing
  * Interconnection networks
  * Topology
  * Routing (algorithm)
  * Buffering and flow control
  * Point-to-point
  * Crossbar
  * Buffered/bufferless networks
  * Flow control
  * Multistage logarithmic networks
  * Circuit vs packet switching
  * Delta network
  * Ring network
  * Unidirectional ring
  * Mesh
  * Torus
  * Trees / Fat trees
  * Hypercube
  * Bufferless deflection routing
  * Dimension-order routing
  * Deadlock vs livelock
  * Valiant's algorithm
  * Adaptive vs oblivious routing

===== Lecture 33 (4/29 Mon.) =====
  * Serialized code sections
  * Critical section
  * Barrier
  * Limiter stages in pipelined programs
  * Trace cache
  * Large vs small core
  * Asymmetric Chip Multiprocessor (ACMP)
  * Accelerating Critical Sections
  * False serialization
  * Shared vs private data
  * Data Marshalling
  * Bottleneck Identification and Scheduling (BIS)
  * Bottleneck Table
  * Acceleration Index Table

===== Lecture 34 (5/1 Wed.) =====
  * DRAM technology scaling
  * Emerging memory technologies
  * Phase change memory (PCM)
  * Memristors
  * Memory capacity
  * Memory latency
  * Memory endurance
  * Memory idle power
  * Hybrid memory system
  * Replacing DRAM with PCM
  * Row-locality Aware Data Placement
  * DRAM cache with metadata store
  * TIMBER Tag Management
  * Security challenges of emerging technologies