  * Debugging is also difficult (non-deterministic behavior)
  * Dekker's algorithm (see the sketch below, before Lecture 29)
    * Inconsistency -- the two processors did NOT see the same order of operations to memory
  * Sequential consistency
    * Multiple correct global orders
  * Invalidates the block on a write
  * Has an exclusive state
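
The inconsistency noted above under Dekker's algorithm is easiest to see in the two-flag entry handshake the algorithm builds on. Below is a minimal C sketch of just that handshake (the full algorithm also uses a turn variable, omitted here; names are illustrative, not from the slides): on a sequentially consistent machine at most one thread can enter the critical section, but if each processor's load of the other flag is allowed to bypass its own buffered store, both loads can return 0 and both threads enter.

<code c>
/* Minimal sketch of the two-flag handshake underlying Dekker's algorithm.
 * Thread and variable names are illustrative. */
volatile int flag0 = 0, flag1 = 0;

void thread0_enter(void) {
    flag0 = 1;                 /* announce intent to enter */
    if (flag1 == 0) {
        /* critical section: thread 0 believes thread 1 is not competing.
         * Safe only if the store to flag0 becomes visible to thread 1
         * before thread 1's load of flag0 -- i.e., under sequential
         * consistency. With store buffering, both threads may read 0. */
    }
}

void thread1_enter(void) {
    flag1 = 1;
    if (flag0 == 0) {
        /* critical section */
    }
}
</code>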
+ | |||
+ | ===== Lecture 29 (4/10 Fri.) ===== | ||
+ | * MSI coherent protocol | ||
+ | * The problem: unnecessary broadcasts of invalidations | ||
+ | * MESI coherent protocol | ||
+ | * Add the exclusive state: this is the only cache copy and it is a clean state to MSI | ||
+ | * Multiple invalidation tradeoffs | ||
+ | * Problem: memory can be unnecessarily updated | ||
+ | * A possible owner state (MOESI) | ||
+ | * Tradeoffs between snooping and directory based coherence protocols | ||
+ | * Slide 31 has a good summary | ||
+ | * Directory: data structures | ||
+ | * Bit vectors vs. linked lists | ||
+ | * Scalability of directories | ||
+ | * Size? Latency? Thousand of nodes? Best of both snooping and directory? | ||
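
As a concrete illustration of the bit-vector option, here is a rough C sketch of a full-bit-vector directory entry and the invalidations it generates on a write. The structure and function names are made up for this sketch, and the network sends are stubbed out.

<code c>
/* Rough sketch of a full-bit-vector directory entry (names are illustrative). */
#include <stdint.h>
#include <stdbool.h>

#define NUM_NODES 64                /* one presence bit per node per block */

typedef struct {
    uint64_t sharers;               /* bit i set => node i may cache the block */
    bool     dirty;                 /* true => exactly one node holds it modified */
    uint8_t  owner;                 /* valid only when dirty */
} dir_entry_t;

static void send_invalidation(int node) { (void)node; /* network send omitted */ }

/* On a write request, invalidate every other sharer and record the new owner.
 * A bit vector costs NUM_NODES bits per block (the scalability concern);
 * a linked list of sharers saves space but serializes the invalidations. */
void directory_handle_write(dir_entry_t *e, int requester) {
    for (int node = 0; node < NUM_NODES; node++)
        if (node != requester && (e->sharers & (1ULL << node)))
            send_invalidation(node);
    e->sharers = 1ULL << requester;
    e->dirty   = true;
    e->owner   = (uint8_t)requester;
}
</code>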
+ | |||
+ | | ||
+ | ===== Lecture 30 (4/13 Mon.) ===== | ||
+ | * In-memory computing | ||
+ | * Design goals of DRAM | ||
+ | * DRAM structures | ||
+ | * Banks | ||
+ | * Capacitors and sense amplifiers | ||
+ | * Trade-offs b/w number of sense amps and cells | ||
+ | * Width of bank I/O vs. row size | ||
+ | * DRAM operations | ||
+ | * ACTIVATE, READ/WRITE, and PRECHARGE | ||
+ | * Trade-offs | ||
+ | * Latency | ||
+ | * Bandwidth: Chip vs. rank vs. bank | ||
+ | * What's the benefit of having 8 chips? | ||
+ | * Parallelism | ||
+ | * RowClone | ||
+ | * What are the problems? | ||
+ | * Copying b/w two rows that share the same sense amplifier | ||
+ | * System software support | ||
+ | * Bitwise AND/OR | ||
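
To make the command sequence concrete, here is a small C sketch of what a memory controller issues to one bank for a read, depending on whether the requested row is already latched in the sense amplifiers (the row buffer). The function and field names are invented for this sketch, and the real timing constraints (tRCD, tRP, tCAS, etc.) are omitted.

<code c>
/* Sketch of the per-bank ACTIVATE / READ / PRECHARGE sequence for a read.
 * Names are illustrative; a real controller also enforces timing constraints. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

static void issue_PRECHARGE(void)        { puts("PRECHARGE"); }
static void issue_ACTIVATE(uint32_t row) { printf("ACTIVATE row %u\n", row); }
static void issue_READ(uint32_t col)     { printf("READ col %u\n", col); }

typedef struct {
    bool     row_open;   /* a row is currently latched in the sense amplifiers */
    uint32_t open_row;
} bank_state_t;

void bank_read(bank_state_t *bank, uint32_t row, uint32_t col) {
    if (bank->row_open && bank->open_row == row) {
        issue_READ(col);            /* row hit: row buffer already holds the row */
        return;
    }
    if (bank->row_open)
        issue_PRECHARGE();          /* close the open row, restore the cells */
    issue_ACTIVATE(row);            /* sense the requested row into the row buffer */
    issue_READ(col);                /* move the requested column over the bank I/O */
    bank->row_open = true;
    bank->open_row = row;
}

int main(void) {
    bank_state_t bank = { false, 0 };
    bank_read(&bank, 7, 3);         /* row miss: ACTIVATE + READ */
    bank_read(&bank, 7, 4);         /* row hit: READ only */
    bank_read(&bank, 9, 0);         /* row conflict: PRECHARGE + ACTIVATE + READ */
    return 0;
}
</code>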
+ | |||
+ | ===== Lecture 31 (4/15 Wed.) ===== | ||
+ | |||
+ | * Application slowdown | ||
+ | * Interference between different applications | ||
+ | * Applications' performance depends on other applications that they are running with | ||
+ | * Predictable performance | ||
+ | * Why are they important? | ||
+ | * Applications that need predictibility | ||
+ | * How to predict the performance? | ||
+ | * What information are useful? | ||
+ | * What need to be guarantee? | ||
+ | * How to estimate the performance when running with others? | ||
+ | * Easy, just measure the performance while it is running. | ||
+ | * How to estimate the performance when the application is running by itself. | ||
+ | * Hard if there is no profiling. | ||
+ | * The relationship between memory service rate and the performance. | ||
+ | * Key assumption: applications are memory bound | ||
+ | * Behavior of memory-bound applications | ||
+ | * With and without interference | ||
+ | * Memory phase vs. compute phase | ||
+ | * MISE | ||
+ | * Estimating slowdown using request service rate | ||
+ | * Inaccuracy when measuring request service rate alone | ||
+ | * Non-memory-bound applications | ||
+ | * Control slowdown and provide soft guarantee | ||
+ | * Taking into account of the shared cache | ||
+ | * MISE model + cache resource management | ||
+ | * Aug tag store | ||
+ | * Separate tag store for different cores | ||
+ | * Cache access rate alone and shared as the metric to estimate slowdown | ||
+ | * Cache paritiioning | ||
+ | * How to determine partitioning | ||
+ | * Utility based cache partitioning | ||
+ | * Others | ||
+ | * Maximum slowdown and fairness metric | ||
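
For a memory-bound application, the core MISE estimate is just a ratio of request service rates, with the "alone" rate sampled by periodically giving the application highest priority at the memory controller. The tiny C sketch below shows only that ratio; the numbers and names are made up.

<code c>
/* MISE-style slowdown estimate for a memory-bound application:
 * slowdown ~= request service rate alone / request service rate shared.
 * Values below are invented for illustration. */
#include <stdio.h>

static double estimate_slowdown(double rate_alone, double rate_shared) {
    return rate_alone / rate_shared;
}

int main(void) {
    double rate_alone  = 8.0e8;   /* requests/s while prioritized (proxy for running alone) */
    double rate_shared = 5.0e8;   /* requests/s while sharing memory with other applications */
    printf("estimated slowdown = %.2fx\n", estimate_slowdown(rate_alone, rate_shared));
    return 0;
}
</code>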

===== Lecture 32 (4/20 Mon.) =====

  * Heterogeneous systems
    * Asymmetric cores: different types of cores on the chip
      * Each of these cores is optimized for different workloads/requirements/goals
    * Multiple special-purpose processors
    * Flexible and can adapt to workload behavior
    * Disadvantages: complex and high overhead
    * Examples: CPU-GPU systems, heterogeneity in execution models
  * Heterogeneous resources
    * Example: reliable and non-reliable DRAM in the same system
  * Key problems in modern systems
    * Memory system
    * Efficiency
    * Predictability
    * Asymmetric design can help solve these problems
  * Serialized code sections
    * Bottleneck in multicore execution
    * Parallelizable vs. serial portion
    * Accelerating critical sections
    * Cache ping-ponging
    * Synchronization latency
  * Symmetric vs. asymmetric design
    * Large cores + small cores
    * Core asymmetry
  * Amdahl's law with heterogeneous cores (see the sketch below)
  * Parallel bottlenecks
    * Resource contention
      * Depends on which applications are running together
  * Accelerated critical sections
    * Ship critical sections to large cores
    * Small modifications and low overhead
    * False serialization might become the bottleneck
    * Can reduce parallel throughput
    * Effect on private cache misses and shared cache misses
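
For the Amdahl's law point, a rough model (a sketch with made-up parameters, not the exact formulation from the slides): run the serial fraction on the single large core and the parallel fraction across the small cores, with everything normalized to the performance of one small core.

<code c>
/* Amdahl's-law-style speedup model for an asymmetric multicore (illustrative):
 * the serial fraction runs on one large core, the parallel fraction on the
 * small cores. Times are normalized to a single small core running everything. */
#include <stdio.h>

static double asymmetric_speedup(double serial_frac,  /* fraction of work that is serial */
                                 double perf_large,   /* large-core perf relative to a small core */
                                 double num_small)    /* small cores used for the parallel part */
{
    double serial_time   = serial_frac / perf_large;
    double parallel_time = (1.0 - serial_frac) / num_small;
    return 1.0 / (serial_time + parallel_time);
}

int main(void) {
    /* Example: 10% serial, large core 2x a small core, 16 small cores. */
    printf("speedup = %.2fx\n", asymmetric_speedup(0.10, 2.0, 16.0));
    return 0;
}
</code>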

===== Lecture 33 (4/27 Mon.) =====

  * Interconnects
    * Connecting multiple components together
    * Goals: scalability, flexibility, performance, and energy efficiency
    * Metrics: performance, bandwidth, bisection bandwidth, cost, energy efficiency, system performance, contention, latency
    * Saturation point
    * Saturation throughput
  * Topology
    * How to wire components together; affects routing, throughput, latency
    * Bus: all nodes connected to a single shared link
      * Hard to increase frequency and bandwidth, poor scalability, but simple
    * Point-to-point
      * Low contention and potentially low latency. Costly, not scalable, and hard to wire.
    * Crossbar
      * No contention. Requests between different source/destination pairs can be serviced concurrently. Costly.
    * Multistage logarithmic network
      * Indirect network, low contention, multiple requests can be serviced concurrently. More scalable than a crossbar.
      * Circuit switching
      * Omega network, delta network
    * Butterfly network
      * Intermediate switches between sources and destinations
    * Switching vs. topology
    * Ring
      * Each node connected to two other nodes, forming a ring
      * Low overhead, high latency, not as scalable
      * Unidirectional ring and bi-directional ring
    * Hierarchical rings
      * Layers of rings: more scalable, lower latency
      * Bridge routers connect multiple rings together
    * Mesh
      * 4 input and output ports
      * More bisection bandwidth and more scalable
      * Easy to lay out
      * Path diversity
      * Routers are more complex
    * Tree
      * Another hierarchical topology
      * Specialized topology
      * Good for local traffic
      * Fat tree: higher levels have more bandwidth
      * CM-5 fat tree
        * Fat tree with 4x2 switches
    * Hypercube
      * N-dimensional cubes
      * Caltech Cosmic Cube
      * Very complex
  * Routing algorithm
    * How a message gets sent from source to destination
    * Static or adaptive
    * Handling contention
      * Buffering helps handle contention, but adds complexity
    * Three types of routing algorithms
      * Deterministic: always takes the same path (see the sketch below)
      * Oblivious: takes different paths without taking the state of the network into account
        * For example, Valiant's algorithm
      * Adaptive: takes different paths, taking the state of the network into account
        * Non-minimal adaptive routing vs. minimal adaptive routing
    * Minimal path: a path with the minimum number of hops
  * Buffering and flow control
    * How packets are buffered within the network
    * Handling oversubscription
      * Source throttling
    * Bufferless vs. buffered crossbars
      * Buffer overflow
    * Bufferless deflection routing
      * Deflect packets when there is contention
      * Hot-potato routing
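
As an example of the deterministic case, here is a small C sketch of dimension-order (X-then-Y) routing on a 2D mesh, a common deterministic and minimal routing algorithm; the coordinates and port names are illustrative, not taken from the slides.

<code c>
/* Dimension-order (X-then-Y) routing on a 2D mesh: every packet between the
 * same source and destination always takes the same, minimal path. */
#include <stdio.h>

typedef enum { EAST, WEST, NORTH, SOUTH, LOCAL } port_t;

port_t xy_route(int cur_x, int cur_y, int dst_x, int dst_y) {
    if (cur_x != dst_x)               /* first correct the X dimension */
        return (dst_x > cur_x) ? EAST : WEST;
    if (cur_y != dst_y)               /* then correct the Y dimension */
        return (dst_y > cur_y) ? NORTH : SOUTH;
    return LOCAL;                     /* arrived: eject to the local node */
}

int main(void) {
    /* Route hop by hop from (0,0) to (2,1). */
    int x = 0, y = 0;
    while (!(x == 2 && y == 1)) {
        port_t p = xy_route(x, y, 2, 1);
        if (p == EAST)       x++;
        else if (p == WEST)  x--;
        else if (p == NORTH) y++;
        else if (p == SOUTH) y--;
        printf("hop to (%d,%d)\n", x, y);
    }
    return 0;
}
</code>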
+ | |||
+ | |||
+ | |