This class meets TBA at Gates TBD.
Teaching Assistant
Schedule
| Lecture | Topic | Reading | Spatial Assignment |
|---|---|---|---|
| 1 | Introduction, role of hardware accelerators in the post-Dennard and post-Moore era | Is Dark silicon useful?; Hennessy Patterson Chapter 7.1-7.2 | |
| 2 | Classical ML algorithms: Regression, SVMs (What is the building block?) | TABLA | |
| 3 | Linear algebra fundamentals and accelerating linear algebra; BLAS operations; 20th century techniques: Systolic arrays and MIMDs, CGRAs | Why Systolic Architectures?; Anatomy of high performance GEMM | Linear Algebra Accelerators |
| 4 | Evaluating Performance, Energy efficiency, Parallelism, Locality, Memory hierarchy, Roofline model | Dark Memory | |
| 5 | Real-World Architectures: Putting it into practice; Accelerating GEMM: Custom, GPU, TPU1 architectures and their GEMM performance | Google TPU; Codesign Tradeoffs; NVIDIA Tesla V100 | |
| 6 | Neural networks: MLPs and CNNs Inference | IEEE proceeding; Brooks’s book (Selected Chapters) | CNN Inference Accelerators |
| 7 | (2 Lectures) Accelerating Inference for CNNs: Blocking and Parallelism in practice; DianNao, Eyeriss, TPU1 | Systematic Approach to Blocking; Eyeriss; Google TPU (see lecture 5) | |
| 8 | Modeling neural networks with Spatial, Analyzing performance and energy with Spatial | Spatial; One related work | |
| 9 | Training: SGD, back propagation, statistical efficiency, batch size | NIPS workshop last year; Graphcore | Training Accelerators |
| 10 | Resilience of DNNs: Sparsity and Low Precision Networks | Some theory paper; EIE; Flexpoint of Nervana; Boris Ginsburg: paper, presentation; LSTM Block Compression by Baidu? | |
| 11 | Low precision training | HALP; Ternary or binary networks; See Boris Ginsburg's work (lecture 10) | |
| 12 | Training in Distributed and Parallel systems: Hogwild!, asynchrony and hardware efficiency | Deep Gradient compression; Hogwild!; Large Scale Distributed Deep Networks; Obstinate cache? | |
| 13 | FPGAs and CGRAs: Catapult, Brainwave, Plasticine | Catapult; Brainwave; Plasticine | |
| 14 | ML benchmarks: DAWNbench, MLPerf | DAWNBench; MLPerf | |
| 15 | Project presentations | | |
Guest Lectures

