Buzzwords

Buzzwords are terms that are mentioned during lecture which are particularly important to understand thoroughly. This page tracks the buzzwords for each of the lectures and can be used as a reference for finding gaps in your understanding of course material.

Lecture 1 (1/14 Mon.)

  • Levels of transformation: Problem, algorithm, program/language, runtime system, instruction set architecture, microarchitecture, logic, circuits, electrons
  • Hamming distance
  • Error correcting codes
  • Three properties of an algorithm: Effectively computable, definite and precise, terminates
  • Abstraction
  • High-/low-level programming languages
  • Multi-core system
  • Unfairness
  • DRAM (high level overview): banks, rows, columns, row hit/conflict, row buffer locality, memory controller, FR-FCFS
  • High versus low memory intensity
  • Sequential (streaming) versus random memory accesses
  • Predicated execution
  • Static versus dynamic scheduling
  • Compile versus run time
  • Computer architecture
  • Programmable versus fixed-function processor
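The Hamming distance buzzword above has a one-line realization worth knowing cold; this is a minimal sketch (the helper name is my own, not from lecture):

```python
def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    # XOR leaves a 1 exactly where the operands disagree; count those bits.
    return bin(a ^ b).count("1")

# 0b1011 vs 0b1001 differ only in bit position 1
print(hamming_distance(0b1011, 0b1001))  # prints 1
```

A code with minimum Hamming distance d between valid codewords can detect d-1 bit errors, which is what makes this metric central to error correcting codes.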

Lecture 2 (1/16 Wed.)

  • Moore's Law
  • Reliability
  • Memory Wall
  • Programmability Wall
  • Design complexity
  • Power/energy constraints
  • Tradeoffs
  • Three key components of computing: Computation, communication, storage (memory)
  • von Neumann model of computation
  • Stored program
  • Instruction pointer
  • Sequential instruction processing
  • Data flow model of computation
  • Precise state
  • Barrier synchronization
  • Relational and conditional operations
  • Control versus data-driven execution
  • Instruction set architecture
  • Microarchitecture
  • The Connection Machine
  • Bit serial, ripple carry, carry lookahead adder
  • High-level overview of hardware exposure (we'll go into these in more detail in future lectures): pipelining, out-of-order, memory access scheduling, speculative execution, superscalar processing, clock gating, caching, prefetching, voltage/frequency scaling, error correction
  • Very Large Instruction Word (VLIW) processors
  • Design points: Cost, performance, maximum power consumption, energy consumption, availability, reliability and correctness, time to market

Lecture 3 (1/18 Fri.)

  • SIMD
  • Bit steering
  • Instruction sequencing model
  • Instruction processing style
  • Live in, live out
  • Accumulator
  • Semantic gap
  • ISA translation layer
  • Micro-ISA
  • Control signals
  • Data types in ISA
  • RISC versus CISC ISAs
  • Open microcode
  • Memory organization
  • Memory addressability
  • Virtual memory
  • Big versus little endian
  • Programmer visible state
  • Load/store vs memory/memory architectures
  • Addressing mode
  • Orthogonal ISA
  • Privilege modes
  • Exception and interrupt handling
  • Access protection
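Big versus little endian from the list above can be demonstrated with Python's standard struct module; the 32-bit value is an arbitrary example chosen so each byte is distinguishable:

```python
import struct

value = 0x01020304
big = struct.pack(">I", value)     # big endian: most significant byte first
little = struct.pack("<I", value)  # little endian: least significant byte first
print(big.hex())     # 01020304
print(little.hex())  # 04030201
```

The same value occupies memory in opposite byte orders, which is why endianness matters whenever raw memory or network data is interpreted across machines.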

Lecture 4 (1/23 Wed.)

  • Autoincrement addressing mode
  • Complex versus simple instructions
  • Orthogonality
  • ISA-level tradeoffs
  • Semantic gap
  • Interrupts and exceptions
  • CISC versus RISC
  • ISA translation
  • Microoperations
  • Code morphing software
  • Fixed versus variable length instructions
  • Uniform versus non-uniform decode
  • Huffman encoding: Compact encoding for Intel 432 instructions
  • MIPS Instruction Format: R-type, I-type, J-type
  • Tradeoffs regarding number of registers

Lecture 5 (1/25 Fri.)

  • x86 addressing modes
  • MIPS (Microprocessor without Interlocked Pipeline Stages)
  • Hardware interlocks versus software-guaranteed interlocking
  • Cache coherence
  • Virtual memory versus overlay programming
  • Memory load/store alignment
  • Microarchitectural state
  • State and next-state logic
  • Single- versus multi-cycle machine
  • Combinational/sequential logic
  • Critical path
  • Machine versus instruction cycle
  • Instruction processing cycle: Fetch, decode, evaluate address, fetch operands, execute, store result
  • Functional units
  • Datapath versus controlpath
  • Hardwired/combinational versus microcoded/microprogrammed control
  • Cycles per Instruction (CPI)
  • Instruction execution time: CPI * clock_cycle_time
  • Program execution time: #instructions * average_CPI * clock_cycle_time
  • Branch delay slot
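The execution-time formulas above translate directly into code; the numbers below are hypothetical, chosen only to make the arithmetic visible:

```python
def program_execution_time(num_instructions, average_cpi, clock_cycle_time_s):
    # Program execution time = #instructions * average_CPI * clock_cycle_time
    return num_instructions * average_cpi * clock_cycle_time_s

# Assumed example: 10^9 instructions, average CPI of 2, 1 GHz clock (1 ns cycle)
print(program_execution_time(1_000_000_000, 2.0, 1e-9))  # prints 2.0 (seconds)
```

The formula makes the microarchitectural tradeoff explicit: a single-cycle machine minimizes CPI but stretches the clock cycle to the slowest instruction's critical path, while a multi-cycle machine raises CPI to shorten the cycle.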

Lecture 6 (1/28 Mon.)

  • Single-cycle microarchitecture
  • Instruction processing cycle
  • Combinational logic - hardwired control
  • Sequential logic - microprogrammed control
  • Critical path
  • Memory latency
  • Microarchitecture design principles - critical path design, common case design, balanced design
  • Cycles per Instruction (CPI) vs frequency
  • Pipelining
  • Program execution time: #instructions * average_CPI * clock_cycle_time
  • Multi-cycle microarchitecture
  • Instruction processing cycle
  • Microinstruction
  • Microsequencing
  • Control store
  • Microsequencer
  • Condition codes
  • Simple LC-3b control datapath

Lecture 7 (1/30 Wed.)

  • Multi-cycle microarchitecture
  • Instruction processing cycle
  • Behavior of the entire [multi-cycle microarchitecture] processor is specified by a finite state machine
  • Microinstruction
  • Microsequencing
  • Control store
  • Microsequencer
  • Tri-state buffer
  • Bus gating
  • Difference between gating and loading

Lecture 8 (2/4 Mon.)

  • Interrupt checking
  • Unaligned memory accesses
  • Memory-mapped I/O
  • Updating/patching microcode in the field
  • Horizontal/vertical microcode
  • Nanocode and millicode
  • Pipelining
  • Ideal pipeline - identical operations, independent operations, uniformly partitionable suboperations
  • Pipeline registers
  • Pipeline control signals - decode once and buffer, or carry instructions and decode locally
  • Pipeline external fragmentation (pipeline stages idle for some instructions)
  • Pipeline internal fragmentation (some pipeline stages too fast while the clock cycle stays the same)
  • Inter-instruction dependencies need to be detected and handled
  • Issues in pipeline design - number of stages, keeping pipeline correct and full, handling exceptions and interrupts
  • Causes of pipeline stalls - resource contention, dependencies (control, data)
  • Handling resource contention - duplicate resource, increase throughput, detect contention and stall contending stage
  • Data dependencies - flow dependence (true dependence, read after write), output dependence (write after write), anti dependence (write after read)

Lecture 9 (2/6 Wed.)

  • Anti and output dependence - limited number of architectural registers
  • Control flow graph
  • Compiler profiling
  • Profile input set vs runtime input set
  • Load hoisting
  • Handling flow dependencies - detect and wait, detect and forward, detect and eliminate, predict and verify
  • Fine-grained multithreading
  • Software vs hardware based interlocking
  • Scoreboarding
  • Register renaming
  • Combinational dependence check logic
  • Register data forwarding
  • Control dependence
  • Pipeline stalls
  • Data forwarding distance
  • Data forwarding logic

Lecture 10 (2/8 Fri.)

  • Register renaming
  • Static vs dynamic scheduling
  • Branch types - conditional, unconditional, call, return, indirect
  • Handling control dependencies
    • Branch prediction
    • Stalling
    • Branch delay slot
    • Predicated execution
    • Fine-grained multithreading
    • Multipath execution
  • Pipeline flushing
  • Branch misprediction penalty
  • Forward vs backward control flow
  • Decrease number of branches - get rid of control flow instructions, convert control dependence to data dependence, predicate combining
  • Wish branches - choose predicated execution or branch prediction
  • Delayed branching with squashing
  • Enhanced branch prediction - need to predict target address, branch direction, whether instruction is branch
  • Branch Target Buffer (BTB) or Branch Target Address Cache
  • Compile time vs run time branch direction prediction

Lecture 11 (2/11 Mon.)

  • Static branch prediction
    • Always not-taken
    • Always taken
    • Backward taken, forward not taken
    • Profile-based (compiler)
    • Program-based (program analysis based)
    • Pragmas - programmer conveys hints
  • Dynamic branch prediction
    • Last time predictor
    • Two-bit counter based prediction (saturating counter)
  • Global branch correlation
    • Global History Register (GHR)
    • Pattern History Table (PHT)
    • Intel Pentium Pro Branch Predictor - multiple PHTs
    • Gshare predictor - GHR hashed with Branch PC
    • Two-level Global History Predictor
  • Local branch correlation
    • Per-branch history register
  • Hybrid branch predictor - multiple algorithms, choose “best” prediction
  • Branch confidence estimation
  • Branch misprediction penalty
  • Alpha 21264 Tournament Predictor
  • SPEC - Standard Performance Evaluation Corporation (CPU benchmark)
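Several of the dynamic-prediction buzzwords above (two-bit saturating counters, GHR, PHT, gshare) fit together in one toy model. The table size and the weakly-not-taken initial state below are my assumptions for illustration, not values from lecture:

```python
class GsharePredictor:
    """Toy gshare: a PHT of 2-bit saturating counters indexed by (PC XOR GHR)."""

    def __init__(self, index_bits=10):
        self.mask = (1 << index_bits) - 1
        self.pht = [1] * (1 << index_bits)  # 2-bit counters, start weakly not-taken
        self.ghr = 0                        # global history register

    def predict(self, pc):
        # Counter values 2 and 3 mean predict taken; 0 and 1 mean not taken.
        return self.pht[(pc ^ self.ghr) & self.mask] >= 2

    def update(self, pc, taken):
        i = (pc ^ self.ghr) & self.mask
        if taken:
            self.pht[i] = min(3, self.pht[i] + 1)  # saturate at strongly taken
        else:
            self.pht[i] = max(0, self.pht[i] - 1)  # saturate at strongly not-taken
        # Shift the actual outcome into the global history register.
        self.ghr = ((self.ghr << 1) | int(taken)) & self.mask
```

Hashing the GHR with the branch PC lets one table capture both global correlation and per-branch behavior; the two-bit counter adds hysteresis so a single anomalous outcome (e.g. a loop exit) does not flip the prediction.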

Lecture 12 (2/13 Wed.)

  • Predicated execution - compiler converts control dependence into data dependence
  • Conditional execution in ARM ISA
  • Hammock branch
  • Wish jump/join
  • Multi-path execution
  • Call and return prediction
    • Direct calls - easy to predict
    • Returns are indirect branches
    • Prediction - Return Address Stack
  • Indirect branch prediction
    • Last resolved target
    • History based target prediction
  • Superscalar processor
  • Multiple instruction fetch
  • Multi-cycle execution
  • Exceptions vs interrupts
  • Precise exceptions/interrupts
    • Make each operation take the same amount of time
    • Reorder buffer (ROB)
    • History buffer
    • Future register file
    • Checkpointing
  • Instruction retirement (commit)

Lecture 13 (2/15 Fri.)

  • Reorder buffer (ROB)
  • Accessing ROB with register file
    • Use indirection from RF to ROB
  • Register renaming with ROB
    • Architectural register ID → Physical register ID
    • Eliminates false dependencies
  • In-order execution, out-of-order completion, in-order retirement
  • History buffer (HB)
  • Future file + ROB
  • Checkpointing
  • Maintaining speculative memory states
    • Use reorder buffer to handle out-of-order memory operations
    • Store/write buffer

Lecture 14 (2/18 Mon.)

  • Preventing dispatch stalls
    • Fine-grained multithreading
    • Value prediction
      • Stride predictor
    • Compile-time instruction reordering
  • Out of order execution (Dynamic scheduling)
    • Restricted dataflow
    • Latency tolerance
  • Reservation station
    • Reservation station entry
  • Register renaming
    • Tags associated with register value
  • Register alias table (RAT)
  • Tomasulo's algorithm
  • Instruction window size
  • Registers vs memory
  • Memory dependence handling
  • Memory disambiguation / unknown address problem
  • Dependence of memory instructions (loads/stores)

Lecture 15 (2/20 Wed.)

  • Out of order execution
  • Memory dependence handling
  • Content addressable memory
  • Memory disambiguation
  • Reservation stations
  • Branch prediction
  • Superscalar execution vs OoO execution
  • Instruction level parallelism
  • Dataflow (at the ISA level)
  • Systolic arrays
  • Stream processing
  • Pool of unmatched tokens
  • Token matching area
  • Instruction fetch area
  • MIT tagged token data flow architecture
  • Irregular parallelism
  • Data parallelism
  • SIMD (Single instruction multiple data)
  • Array vs vector processors
  • VLIW (Very long instruction word)
  • Vector processor
    • Vector registers
    • Vector length register (VLEN)
    • Vector stride register (VSTR)

Lecture 16 (2/25 Mon.)

  • Virtual memory
    • Virtual page vs physical page (frame)
    • Address translation
    • Page Table (Virtual addr → Physical addr)
    • Page fault
      • Reads data from disk to memory
      • Direct Memory Access
  • Caches
    • Block, set
    • Block/line size, associativity
    • Hit, miss
    • Insertion, eviction
    • Write through vs write back
  • Locality - temporal and spatial
  • Working set
  • Address isolation
  • Copy on write
  • Before virtual memory
    • Single-user machine
    • Base and bound registers
    • Segmented address space

Lecture 17 (2/27 Wed.)

  • Virtual memory
  • Segmentation
    • Segment selectors
    • Segment descriptors
    • Privilege level (ring)
  • Paging
    • Physical vs virtual address space
    • Physical vs virtual page
    • Virtual page number (VPN)
    • Physical page number (PPN)
    • Virtual page offset == physical page offset
    • Address translation - VPN → PPN
    • Page Table
    • Multi-level page table
    • Translation Lookaside Buffer (TLB)
    • Homonym problem
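The VPN → PPN translation entries above amount to a few bit operations. This sketch assumes 4 KiB pages and stands in a hypothetical single-level page table as a dict; real hardware walks a multi-level table and caches results in the TLB:

```python
OFFSET_BITS = 12            # assumed 4 KiB pages
PAGE_SIZE = 1 << OFFSET_BITS

# Hypothetical page table: VPN -> PPN (values are made up for illustration)
page_table = {0x12345: 0x00ABC}

def translate(vaddr):
    vpn = vaddr >> OFFSET_BITS          # virtual page number
    offset = vaddr & (PAGE_SIZE - 1)    # page offset passes through unchanged
    ppn = page_table[vpn]               # an unmapped VPN would be a page fault
    return (ppn << OFFSET_BITS) | offset

print(hex(translate(0x12345678)))  # prints 0xabc678
```

Note how the low 12 bits are identical in the virtual and physical addresses, which is exactly the "virtual page offset == physical page offset" item above.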

Lecture 18 (3/1 Fri.)

  • Translation
    • Two-level page table
    • Page directory
    • Multi-level page table
    • Page Directory Base Register
    • Translation: Segmentation + Paging
  • Protection
    • Privilege levels
    • Page Directory Entry (PDE)
    • Page Table Entry (PTE)
    • Read/Write
    • User/Supervisor
    • Protection: PDE + PTE
    • Protection: Segmentation + Paging
  • Translation Lookaside Buffer (TLB)
    • Context switch
    • Flush/invalidate
    • TLB miss: HW-managed vs. SW-managed
    • Page walk
    • TLB replacement
  • Page fault
    • Page fault handler
    • Demand paging
    • Swapping
    • Thrashing
  • Page size
    • Internal fragmentation
  • Memory Management Unit (MMU)

Lecture 19 (3/18 Mon.)

  • Vector registers
    • Vector data register
    • Vector control registers
  • Vector functional units
  • Amdahl's Law
    • Sequential bottleneck
  • Vector memory system
    • Memory banking
    • Address generator
    • Base and stride
  • Vectorizable loops
  • Vector chaining
  • Vector stripmining
  • Scatter/gather operations
  • Masked operations
  • Matrix storage
    • Row major
    • Column major
  • Vector instruction level parallelism
  • Automatic code vectorization
  • Graphics processing units
    • Single instruction multiple threads
    • Thread warps
    • SIMT memory access
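Amdahl's Law from the list above is a one-line formula; this sketch just evaluates it, with the 90%-parallel / 10-processor numbers as an illustrative assumption:

```python
def amdahl_speedup(parallel_fraction, n_processors):
    """Overall speedup when a fraction p of execution scales across n processors."""
    p = parallel_fraction
    # The (1 - p) sequential part runs at the original speed; only p is divided by n.
    return 1.0 / ((1.0 - p) + p / n_processors)

# 90% parallel code on 10 processors yields ~5.26x, not 10x:
print(round(amdahl_speedup(0.9, 10), 2))  # prints 5.26
```

Even with infinitely many processors, the speedup is capped at 1 / (1 - p), which is the sequential bottleneck that vectorization and parallelization keep running into.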

Lecture 20 (3/20 Wed.)

  • SIMT (Single Instruction Multiple Thread)
  • SPMD (Single Program, Multiple Data)
  • Thread warps
  • Branch divergence in warps
  • Branch divergence handling - dynamic predicated execution
  • Dynamic warp formation
  • Memory access divergence in warp
  • NVIDIA GPU terminology
    • Streaming multiprocessor (SM)
    • Warp
    • Thread context
  • Critical section
  • Heterogeneous processing (Asymmetric)
    • Dynamically select latency or throughput
  • Very long instruction word (VLIW)
    • Lock step execution
  • Decoupled access and execute
    • Dynamic vs static scheduling
  • Loop unrolling
  • Systolic arrays
    • WARP Computer

Lecture 21 (3/25 Mon.)

  • Systolic architectures
  • Pipeline parallel execution
  • Decoupled Access/Execute
  • Static instruction scheduling
  • Common subexpression elimination
  • Loop unrolling
  • Speculative code motion
  • Trace scheduling
  • Data precedence graph
  • List scheduling
  • Superblock scheduling
  • Hyperblock formation
  • Block-structured ISA
  • Exception propagation

Lecture 22 (3/27 Wed.)

  • Ideal memory
  • DRAM - Dynamic Random Access Memory
  • SRAM - Static Random Access Memory
  • Bit line
  • Row enable
  • Sense amplifier
  • Row/column decoder
  • Phase change memory
  • DRAM refresh
  • Temporal locality
  • Spatial locality
  • Cache hierarchy
  • Manual vs automatically managed memory hierarchy
  • Cache block (line)
  • Cache hit/miss
  • Cache block eviction
  • Tag and data store
  • Cache associativity
  • Direct-mapped cache
  • Fully associative cache
  • Cache replacement policies
  • Least recently used (LRU)
  • Victim/next-victim policy
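LRU replacement from the list above can be modeled in a few lines. This is my own simplification, not the lecture's datapath: it is fully associative, tracks only tags (no data), and uses an ordered dict as the recency stack:

```python
from collections import OrderedDict

class LRUCache:
    """Toy fully associative cache with least-recently-used replacement."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()  # ordered oldest (LRU) first, newest (MRU) last

    def access(self, tag):
        """Return True on a hit, False on a miss (the block is then inserted)."""
        if tag in self.blocks:
            self.blocks.move_to_end(tag)      # hit: promote to most recently used
            return True
        if len(self.blocks) >= self.capacity:
            self.blocks.popitem(last=False)   # miss at capacity: evict the LRU block
        self.blocks[tag] = True
        return False
```

Real set-associative caches apply the same policy per set, and since exact LRU ordering gets expensive at high associativity, hardware often approximates it (e.g. the victim/next-victim policy listed above).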

Lecture 23 (3/29 Fri.)

  • Cache insertion policy
  • Cache replacement policy
  • Cache promotion policy
  • Non-temporal loads
  • Victim/Next-Victim replacement policy
  • Hybrid replacement policy - set sampling
  • Page replacement
  • Tag store entry
  • Inclusive/exclusive cache
  • Write-back/write-through cache
  • Allocate/no-allocate on write miss
  • Sectored cache
  • Cache dirty bit
  • Cache valid bit
  • Separate data and instruction caches vs unified caches
  • Multi-level caches
  • Homonym and synonym problem
  • Virtual-physical cache

Lecture 24 (4/1 Mon.)

  • Virtual-physical cache
  • Virtual memory - DRAM interaction
  • Page coloring
  • Critical-word-first cache fill
  • Cache subblocking
  • Cache size and associativity
  • Compulsory misses
  • Capacity misses
  • Conflict misses
  • Coherence miss (communication miss)
  • Stream prefetcher
  • Stride prefetcher
  • Cache working set
  • Victim cache
  • Hashing (pseudo-associativity)
  • Skewed associative caches
  • Data restructuring via software
  • Memory level parallelism (MLP)
  • MLP-aware cache replacement

Lecture 25 (4/3 Wed.)

  • Memory level parallelism (MLP)
  • Miss Status Handling Register (MSHR)
  • Cache multi-porting (true and virtual)
  • Cache banking (interleaving)
  • DRAM organization
    • Channel
    • DIMM
    • Rank
    • Chip
    • Bank
    • Row/column
  • Row/column latch
  • Row buffer (sense amplifier)
  • DRAM commands
    • Activate
    • Read/write
    • Precharge
  • Row buffer conflict
  • DRAM refresh
  • DRAM access latency
  • DRAM address mapping - row interleaving or cache block interleaving
  • Virtual/physical address mapping

Lecture 26 (4/8 Mon.)

  • DRAM refresh
  • Distributed refresh
  • DRAM refresh overhead (time and energy)
  • Bloom filter
    • Insert, test, remove all
  • Retention-Aware Intelligent DRAM Refresh (RAIDR)
    • Profile, Binning, Refresh
  • Flash memory
  • DRAM Controller
    • In chipset vs on CPU chip
  • DRAM scheduling policy
    • First come first serve (FCFS)
    • First ready, first come first serve (FR-FCFS)
  • DRAM row management policy - open row vs close row
  • DRAM timing constraints
  • DRAM power management
  • DRAM power state
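A Bloom filter supporting exactly the operation set listed above (insert, test, remove all) can be sketched as follows; the bit count, hash count, and SHA-256-based hashing are illustrative assumptions, not what RAIDR implements in hardware:

```python
import hashlib

class BloomFilter:
    """Toy Bloom filter: no false negatives, a tunable false-positive rate."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # the bit array, kept as one big integer

    def _positions(self, item):
        # Derive num_hashes independent bit positions from one hash function.
        for i in range(self.num_hashes):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.num_bits

    def insert(self, item):
        for p in self._positions(item):
            self.bits |= 1 << p

    def test(self, item):
        """False means definitely absent; True means possibly present."""
        return all(self.bits & (1 << p) for p in self._positions(item))

    def remove_all(self):
        self.bits = 0  # per-item removal is impossible; only bulk clearing works
```

The one-sided error is what makes the structure safe for refresh binning: a row wrongly reported as weak is merely refreshed more often than needed, while a weak row is never missed.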

Lecture 27 (4/10 Wed.)

  • DRAM bank operation
  • Memory interference
  • Quality-of-service unaware memory control
  • Stall-time fairness in shared DRAM
  • STFM (Stall Time Fair Memory) scheduling algorithm
  • Memory bank parallelism of threads
  • PAR-BS (Parallelism-Aware Batch Scheduling)
  • ATLAS Memory Scheduler
  • Throughput vs fairness
  • Thread Cluster Memory Scheduling
    • Quantum-based operation
  • Misses per kilo-instruction (MPKI)
  • Lottery scheduling
  • Memory channel partitioning
  • Throttling of source/core
  • Data mapping to banks/channels/ranks
  • Request prioritization
  • Bottleneck Identification and Scheduling (BIS)

Lecture 28 (4/12 Fri.)

  • Inter-thread/application interference
  • Utility-based cache partitioning
  • Non-uniform memory access (NUMA)
  • Smart/dumb resources
  • Full-window stall
  • Memory latency tolerance
    • Caching
    • Prefetching
    • Multithreading
    • Out of order execution
  • Runahead execution
  • Runahead cache
  • Cache working set
  • Dependent cache misses
  • Address-value delta
  • Traversal address load
  • Leaf address load

Lecture 29 (4/15 Mon.)

  • Prefetching
  • Compulsory cache misses
  • Prefetch algorithm
  • Early/late prefetches
  • Prefetch distance
  • Prefetch aggressiveness
  • Cache pollution
  • Prefetch buffer
  • Decoupled fetch
  • Prefetch destination
  • Prefetch coverage
  • Prefetch accuracy
  • Prefetch timeliness
  • Software prefetching / hardware prefetching / execution-based prefetching
  • Next-line prefetcher
  • Instruction based stride prefetching
  • Stream buffer

Lecture 30 (4/22 Mon.)

  • Prefetch bandwidth consumption
  • Feedback-directed prefetcher throttling
  • Prefetch insertion location
  • Prefetch irregular address patterns
  • Markov prefetching
  • Content directed prefetching
  • Execution-based prefetching
  • Thread-based pre-execution
  • Simultaneous multithreading
  • ISA extensions for prefetching
  • Pre-execution slice
  • Slipstream processing
  • Parallel computing
  • Loosely coupled vs tightly coupled multiprocessor
  • Message passing
  • Cache coherence
  • Ordering of memory operations
  • Processor load imbalance
  • Processor utilization / redundancy / efficiency
  • Amdahl's Law
  • Sequential bottleneck

Lecture 31 (4/24 Wed.)

  • Bottlenecks in parallel execution
  • Ordering of memory operations
  • Deterministic execution
  • Protection of shared data
  • Mutual exclusion
  • Sequential consistency
  • Total global order requirement
  • Cache coherence
  • Snooping bus
  • Directory-based cache coherence
  • Update vs Invalidate
  • MESI Protocol (Modified, Exclusive, Shared, Invalid)
  • Read-exclusive (write)
  • Exclusive bit
  • MOESI (add Owned state)

Lecture 32 (4/26 Fri.)

  • Snoopy cache vs Directory Coherence
  • Set inclusion test
  • Contention resolution
  • Negative acknowledgement (nack)
  • Coherence granularity
  • False sharing
  • Interconnection networks
  • Topology
  • Routing (algorithm)
  • Buffering and flow control
  • Point-to-point
  • Crossbar
  • Buffered/bufferless networks
  • Flow control
  • Multistage logarithmic networks
  • Circuit vs packet switching
  • Delta network
  • Ring network
  • Unidirectional ring
  • Mesh
  • Torus
  • Trees / Fat trees
  • Hypercube
  • Bufferless deflection routing
  • Dimension-order routing
  • Deadlock vs livelock
  • Valiant's algorithm
  • Adaptive vs oblivious routing

Lecture 33 (4/29 Mon.)

  • Serialized code sections
  • Critical section
  • Barrier
  • Limiter stages in pipelined programs
  • Trace cache
  • Large vs small core
  • Asymmetric Chip Multiprocessor (ACMP)
  • Accelerating Critical Sections
  • False serialization
  • Shared vs private data
  • Data Marshalling
  • Bottleneck Identification and Scheduling (BIS)
  • Bottleneck Table
  • Acceleration Index Table

Lecture 34 (5/1 Wed.)

  • DRAM technology scaling
  • Emerging memory technologies
  • Phase change memory (PCM)
  • Memristors
  • Memory capacity
  • Memory latency
  • Memory endurance
  • Memory idle power
  • Hybrid memory system
  • Replacing DRAM with PCM
  • Row-locality Aware Data Placement
  • DRAM cache with metadata store
  • TIMBER Tag Management
  • Security challenges of emerging technologies
buzzwords.txt · Last modified: 2013/05/01 14:05 by jasonli1