[PAST EVENT] Towards 1,000X model compression in Deep Neural Networks

March 7, 2019
2pm - 3pm
McGlothlin-Street Hall, Room 020
251 Jamestown Rd
Williamsburg, VA 23185

Speaker: Yanzhi Wang, Northeastern University

Title: Towards 1,000X model compression in Deep Neural Networks


Hardware implementation of deep neural networks (DNNs), with an emphasis on performance and energy efficiency, has been the focus of extensive ongoing investigation. When a large DNN is mapped to hardware as an inference engine, the resulting design suffers from expensive computations and frequent accesses to off-chip DRAM memory, which in turn incur significant performance and energy overheads. To overcome this hurdle, we develop ADMM-NN, an algorithm-hardware co-optimization framework that greatly reduces DNN computation and storage requirements by incorporating the Alternating Direction Method of Multipliers (ADMM) and exploiting all sources of redundancy in DNNs. ADMM-NN includes: (i) a framework for joint pruning and quantization of DNN weights, and (ii) a unified framework for utilizing all sources of redundancy in DNN inference engines, enabling joint pruning and quantization of both weights and intermediate results. Our preliminary results show that ADMM-NN achieves the highest degree of model compression on representative DNNs. For example, we achieve 246X, 32X, 34X, and 17X weight reduction on LeNet-5, AlexNet, VGGNet, and ResNet-50, respectively, with (almost) no accuracy loss. Combining weight pruning and weight quantization, we achieve up to 1,910X weight-storage reduction while maintaining accuracy.
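The pruning side of the approach can be illustrated with a minimal sketch. ADMM-based pruning treats sparsity as a constraint set and alternates between (1) ordinary gradient training of the weights with a quadratic penalty pulling them toward a sparse auxiliary copy, (2) a Euclidean projection of the weights onto the sparsity constraint (keep the k largest-magnitude entries), and (3) a dual-variable update. The function names, the toy quadratic loss, and all hyperparameters below are illustrative assumptions, not the paper's actual training setup:

```python
import numpy as np

def project_sparse(W, k):
    """Euclidean projection onto {W : ||W||_0 <= k}:
    keep the k largest-magnitude entries, zero the rest."""
    Z = np.zeros_like(W)
    idx = np.argsort(np.abs(W), axis=None)[-k:]
    Z.flat[idx] = W.flat[idx]
    return Z

def admm_prune(W0, loss_grad, k, rho=1.0, lr=0.1, outer=30, inner=50):
    """Sketch of ADMM weight pruning (names/hyperparameters assumed).

    Alternates gradient descent on loss + (rho/2)||W - Z + U||^2,
    projection Z = Pi(W + U) onto the sparsity constraint, and the
    dual update U += W - Z.
    """
    W = W0.copy()
    Z = project_sparse(W, k)
    U = np.zeros_like(W)
    for _ in range(outer):
        for _ in range(inner):
            # gradient step on the penalized training objective
            W -= lr * (loss_grad(W) + rho * (W - Z + U))
        Z = project_sparse(W + U, k)   # projection step
        U += W - Z                     # dual update
    return project_sparse(W, k)        # hard-prune at the end
```

For example, with a toy quadratic loss whose gradient is `W - T` for some dense target `T`, the returned matrix has exactly `k` nonzeros, concentrated on the largest-magnitude entries of `T`. The same alternating structure extends to quantization by swapping the projection for a projection onto a discrete set of quantization levels.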


Yanzhi Wang is currently an assistant professor in the Department of Electrical and Computer Engineering at Northeastern University. He received his Ph.D. in Computer Engineering from the University of Southern California (USC) in 2014, and his B.S. with Distinction in Electronic Engineering from Tsinghua University in 2009.

Dr. Wang's current research interests are energy-efficient and high-performance implementations of deep learning and artificial intelligence systems, as well as the integration of security protection into deep learning systems. His work has been published in top conferences and journals (e.g., ASPLOS, MICRO, HPCA, ISSCC, AAAI, ICML, CVPR, ICLR, ECCV, ACM MM, CCS, VLDB, FPGA, DAC, ICCAD, DATE, LCTES, INFOCOM, ICDCS, Nature SP, etc.) and has been cited around 4,000 times according to Google Scholar. He has received four Best Paper Awards, seven additional Best Paper Nominations, and two Popular Paper recognitions in IEEE TCAD. His group is sponsored by NSF, DARPA, IARPA, AFRL/AFOSR, and industry sources.


Bin Ren