1 Effects of Limiting Numerical Precision on Neural Networks: An Empirical Study on Deep Learning Accelerators
2 Deep Learning Applications. Neural network: training, inference. Compute-intensive embedded projects like drones, autonomous robotic systems, mobile medical imaging, and Intelligent Video Analytics (IVA). OEMs, independent developers,
3 Problem Statement: (New age) neural network architects are having a hard time. Training: time, compute resources, hyperparameter search, … Inference: power budget, accuracy.
5 Lecture: Tools to Explore Accelerators. We saw Minerva in class … Topics: the Minerva tool to explore the design space, prune/quantize, lower voltages. Project prep.
6 Overview: Minerva, Keras, Aladdin, Madonna
7 Motivation
8 Project Proposal. MADONNA: a tool for Measurement & Assistance in Design Of NN to its Architect. Measurement: NVIDIA Jetson TK1. Assistance: FPTuner, Ristretto.
9 Numeric Precision
10 Assistance
11 Measurement
12 Architecting a good Deep Learning Application: low power, user-defined accuracy, best possible within power budget. MAUD: our project, a framework for the NN architect. Principle: Measure As yoU Design.
13 Hardware: Jetson embedded platform, measurement hardware, topology
Jetson embedded platform: NVIDIA Jetson with GPU-accelerated parallel processing, a leading embedded visual computing platform. It features high-performance, low-energy computing for deep learning and computer vision, and is ideal for compute-intensive embedded projects like drones, autonomous robotic systems, mobile medical imaging, and Intelligent Video Analytics (IVA). OEMs, independent developers, makers, and hobbyists can use the NVIDIA Jetson TX1 to explore the future of embedded computing.
Measurement hardware: Yokogawa WT310, from the WT300E series of digital power analyzers. It measures currents as low as 50 microamps, which makes it well suited to stand-by power measurements.
Topology: the head/gateway node is mir.cs.utah.edu, the large tower computer on the floor. mir is connected to the switch on the table, and the switch is connected to the NVIDIA Jetson TK1 boards (mir01, mir02, ..., mir16).
14 NVIDIA Tegra, Jetson TK1 Board
15 GPU-based Accelerator: Tegra K1 processor specifications
GPU: NVIDIA® Kepler™ architecture, 192 NVIDIA CUDA® cores
CPU: NVIDIA 4-Plus-1™ quad-core ARM Cortex-A15 "r3", max clock speed 2.3 GHz
Memory: DDR3L and LPDDR3, max memory size 8 GB (with 40-bit address extension)
Display: LCD 3840x2160; HDMI 4K (UltraHD, 4096x2160)
Package: 23x23 FCBGA, 16x16 S-FCCSP, 15x15 FC PoP
Process: 28 nm
16 Workflow Design
17 Software. Measurement: eServer.exe (backend), eNergy.py (frontend). Assistance: Caffe, FPTuner, Ristretto.
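The slide does not show how the measurement software turns power readings into Joules. A minimal sketch, assuming the backend delivers timestamped power samples in watts, integrates them with the trapezoidal rule. The function name `energy_joules` and the `(time_s, power_w)` sample format are hypothetical, not the actual eNergy.py interface.

```python
def energy_joules(samples):
    """Integrate (time_s, power_w) samples into energy in Joules.

    Uses the trapezoidal rule between consecutive samples.
    The sample format is an assumption made for this sketch.
    """
    total = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        if t1 < t0:
            raise ValueError("samples must be sorted by time")
        total += 0.5 * (p0 + p1) * (t1 - t0)
    return total


# Hypothetical readings: a constant 10 W draw sampled once per second for 4 s.
readings = [(0.0, 10.0), (1.0, 10.0), (2.0, 10.0), (3.0, 10.0), (4.0, 10.0)]
print(energy_joules(readings))
```

With unevenly spaced samples the same function still works, since each interval is weighted by its own duration.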
18 Caffe: make -j 4 all: 10 W, 8690 Joules
19 Caffe: make clean: 5 W, 34 Joules
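If the wattage on the two Caffe build slides is read as average power over the run (an assumption; the slides do not say whether it is average or peak), the run time follows from E = P · t. A small sketch of that arithmetic, with a hypothetical helper name:

```python
def duration_seconds(energy_j, avg_power_w):
    """Run time implied by total energy and average power (t = E / P)."""
    return energy_j / avg_power_w


# (energy in Joules, power in watts) as quoted on the slides;
# treating the power figure as an average is an assumption.
builds = {
    "make -j 4 all": (8690.0, 10.0),
    "make clean": (34.0, 5.0),
}

for command, (energy_j, power_w) in builds.items():
    print(f"{command}: about {duration_seconds(energy_j, power_w):.1f} s")
```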
20 Performance Metrics (No free lunch)
DOUBLE
I :12: caffe.cpp:275] Batch 49, loss =
I :12: caffe.cpp:280] Loss:
I :12: caffe.cpp:292] accuracy =
I :12: caffe.cpp:292] loss = (* 1 = loss)
SINGLE
I :02: caffe.cpp:275] Batch 49, loss =
I :02: caffe.cpp:280] Loss:
I :02: caffe.cpp:292] accuracy =
I :02: caffe.cpp:292] loss = (* 1 = loss)
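To illustrate why double and single precision can report different loss values, here is a minimal sketch that is independent of Caffe. It accumulates the same numbers once in IEEE 754 double precision and once with every intermediate result rounded to single precision. The helper names and the test data are made up for this example; it does not reproduce the runs above.

```python
import struct


def to_float32(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def accumulate(values, single):
    """Sum values, optionally rounding each input and partial sum to float32."""
    total = 0.0
    for v in values:
        if single:
            total = to_float32(total + to_float32(v))
        else:
            total = total + v
    return total


values = [0.1] * 100000  # exact sum would be 10000
double_sum = accumulate(values, single=False)
single_sum = accumulate(values, single=True)
print("double:", double_sum)
print("single:", single_sum)
print("difference:", abs(double_sum - single_sum))
```

The single-precision total drifts further from 10000 because each addition rounds to 24 significant bits rather than 53.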
21 Power Metrics, CIFAR-10: 6 W, 10 W, 34K Joules, 346 Joules; 10 W, 5.5 W, 12K Joules, 244 Joules
22 Power Metrics, LeNet - MNIST: 8 W, 4.5 W, 11K Joules, 62 Joules; 8 W, 4 W, 4K Joules, 45 Joules
23 Next Steps. Measurement: complete measurement studies (CaffeNet, ImageNet); study and report the impact of precision on energy consumption. Assistance: attempt implementing fixed-point support in Ristretto / native Caffe; resume adding FPTuner to the workflow, aiming for automation.
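As a reference point for the fixed-point item, the sketch below shows one generic way to quantize a value to signed fixed point with saturation. It is not Ristretto's or Caffe's implementation. The function name, the 8-bit word with 4 fractional bits, and the sample weights are all assumptions chosen for illustration.

```python
def quantize_fixed_point(x, total_bits=8, frac_bits=4):
    """Round x to a signed fixed-point value and saturate out-of-range inputs.

    total_bits counts the sign bit; frac_bits is the number of fractional bits.
    Ties are rounded with Python's round() (round half to even).
    """
    scale = 1 << frac_bits
    q_min = -(1 << (total_bits - 1))
    q_max = (1 << (total_bits - 1)) - 1
    q = round(x * scale)           # nearest integer code
    q = max(q_min, min(q_max, q))  # saturate to the representable range
    return q / scale


# Hypothetical weights; the last two fall outside the range [-8, 7.9375].
weights = [0.7312, -1.25, 3.999, 9.5, -12.0]
print([quantize_fixed_point(w) for w in weights])
# prints [0.75, -1.25, 4.0, 7.9375, -8.0]
```

Fewer fractional bits widen the representable range but coarsen the step size, which is the trade-off a precision study would measure against accuracy and energy.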