[PAST EVENT] Wei Niu, Computer Science - Dissertation Proposal

January 27, 2023
10am - 12pm
McGlothlin-Street Hall, Zoom
251 Jamestown Rd
Williamsburg, VA 23185

Deep Learning, especially in the form of Deep Neural Networks (DNNs), has enabled remarkable breakthroughs in many scenarios over the past few years, including autonomous driving, natural language processing, extended reality (XR), medical diagnosis, and view synthesis. With their power-efficient, specialized processors and their suitability for real-time scenarios, mobile (and edge) devices are becoming the primary carriers for these emerging applications. Recently, as a result of AutoML tools (e.g., Neural Architecture Search) and other training-related advancements, DNNs have been designed to be deeper, with increasingly complex structures and larger computation sizes. Given these ever-growing demands for computation, real-time DNN execution (inference) is an ideal but extremely difficult objective for mobile devices because of the limited computing and storage resources within embedded chips. In addition, there is still a significant gap between the peak and actual performance of DNN workloads on mobile devices. This gap stems from several factors, including a limited understanding of how parallel algorithms map onto the hardware, and legacy solutions that handle the newest hardware poorly.

To that end, this dissertation seeks to support real-time DNN execution on mobile devices through a variety of compiler-based innovations, including three novel optimizations: 1) compression and compilation co-design for computation-intensive models; 2) advanced operator fusion for memory-intensive models; and 3) global optimization for emerging power-efficient processors.

For the first optimization, we present PatDNN, a novel compression-compilation co-design framework that enables compressed large-scale (computationally intensive) DNNs to fit within the limited storage and computation resources of mobile devices. We first propose a novel, hardware-friendly pattern-based pruning scheme for compressing DNN model parameters. Then, we propose a set of sophisticated compiler optimizations that are tailored for pattern-based pruning and allow compressed models to be compiled into source code for mobile CPUs and GPUs, achieving significant speedup (up to 44.5x) over existing frameworks. The results demonstrate for the first time that real-time inference of representative large-scale DNNs (e.g., VGG-16, ResNet-50) can be performed on mobile devices.
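
As a rough illustration of what pattern-based pruning can look like, the sketch below prunes a single flattened 3x3 convolution kernel by keeping only the weights covered by one of a few predefined patterns. The pattern set, the choice of four kept weights per kernel, and the selection rule (keep the pattern that retains the most squared magnitude) are all assumptions made for this example; the abstract does not specify PatDNN's actual patterns or selection procedure.

```python
# Illustrative sketch only: the pattern set and the selection rule below are
# assumptions for this example, not the scheme used by PatDNN.

# Hypothetical candidate patterns. Each keeps 4 of the 9 positions of a 3x3
# kernel (row-major indices 0..8; index 4 is the centre).
PATTERNS = [
    (0, 1, 3, 4),
    (1, 2, 4, 5),
    (3, 4, 6, 7),
    (4, 5, 7, 8),
]


def prune_kernel(kernel):
    """Return (pattern_index, pruned_kernel) for a flattened 3x3 kernel."""
    if len(kernel) != 9:
        raise ValueError("expected a flattened 3x3 kernel with 9 weights")

    def kept_energy(pattern):
        # Sum of squared weights that this pattern would retain.
        return sum(kernel[i] ** 2 for i in pattern)

    # Choose the pattern that preserves the most squared magnitude.
    best = max(range(len(PATTERNS)), key=lambda p: kept_energy(PATTERNS[p]))
    keep = set(PATTERNS[best])
    pruned = [w if i in keep else 0.0 for i, w in enumerate(kernel)]
    return best, pruned


kernel = [0.9, -0.8, 0.1,
          0.7,  0.5, 0.0,
          0.1,  0.0, 0.2]
index, pruned = prune_kernel(kernel)
print(index, pruned)
```

Because every pruned kernel then has one of a small number of known shapes, a compiler could in principle generate specialized code per pattern rather than handling arbitrary sparsity, which is the general motivation for calling such a scheme hardware-friendly.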

For extremely deep (memory-intensive) neural networks, we propose DNNFusion, an advanced operator fusion framework that can fuse multiple successive operators within a DNN into a single fused operator, significantly reducing the number of memory accesses and arithmetic operations. Unlike fixed-pattern-matching fusion strategies, DNNFusion is a flexible and extensive operator fusion framework. In addition, to optimize DNN computation, we implement a novel mathematical-property-based graph rewriting framework. DNNFusion has been thoroughly tested on 15 DNN models with varying task types, model sizes, and layer counts. The results show that DNNFusion outperforms other cutting-edge frameworks by up to 9.3x and allows many of the target models to run on mobile devices, even as part of real-time applications.
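
The effect of fusing successive operators can be seen in a minimal sketch. The example below compares three elementwise operators (scale, add bias, ReLU) run one after another with a single fused operator that computes the same result in one pass. The operator chain and the function names `unfused` and `fused` are hypothetical choices for illustration; this is not DNNFusion's algorithm, which the abstract describes only at a high level.

```python
# Illustrative sketch only: a hand-fused chain of three elementwise operators.
# The operators and function names are assumptions chosen for this example.


def unfused(xs, scale, bias):
    """Run three operators separately, materializing two intermediate lists."""
    scaled = [x * scale for x in xs]        # operator 1: writes an intermediate
    shifted = [s + bias for s in scaled]    # operator 2: reads it, writes another
    return [max(v, 0.0) for v in shifted]   # operator 3: ReLU


def fused(xs, scale, bias):
    """Compute the same result in one pass with no intermediate lists."""
    return [max(x * scale + bias, 0.0) for x in xs]


xs = [-1.0, 0.0, 0.5, 2.0]
result_unfused = unfused(xs, 2.0, -1.0)
result_fused = fused(xs, 2.0, -1.0)
print(result_unfused, result_fused)
```

The fused version reads each input element once and writes each output element once, whereas the unfused version also writes and re-reads two intermediate lists. That reduction in memory traffic is the kind of saving operator fusion targets in memory-intensive models.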

For the emerging dedicated accelerators within mobile SoCs, we propose GCD2. GCD2 targets mobile Digital Signal Processors (DSPs), whose SIMD instruction sets are much more complex than those of mainstream processors, with wider vectors and a greater variety of instructions. With our unique compiler optimizations, we fully exploit the special features exposed by mobile DSPs and improve hardware utilization. These enhancements are incorporated into a full compilation system that is extensively tested against other systems using 10 large DNN models.

Continuing my existing research on real-time machine learning systems, I will explore a broader range of embedded platforms and related areas. My overall goal is to push the boundaries of mobile (and parallel) computing to create efficient solutions for emerging architectures and applications. In particular, the ongoing project will focus on accelerating dynamic neural networks on mobile devices.

Wei Niu is a fifth-year Ph.D. candidate in the Department of Computer Science at William & Mary under the supervision of Professor Bin Ren. Wei's research interests lie in real-time machine learning systems, mobile computing, parallel computing, and compilers. In particular, he focuses on achieving real-time DNN execution on mobile platforms with compiler optimizations. His work has appeared at top conferences (e.g., MICRO, PLDI, ASPLOS, RTAS, ICS, DAC, NeurIPS, CVPR, AAAI, ECCV, ICCV) and top journals (e.g., TPAMI, CACM). He is the recipient of the Stephen K. Park Graduate Research Award at William & Mary. He also won first place in the 2020 ISLPED Design Contest, the CACM Contributed Article Award in 2021, and the Best Paper Award at an ICLR workshop in 2021. Previously, he earned his bachelor's degree from Beihang University in 2016.