W&M Featured Events
Qihan Wang, Computer Science - Dissertation Defense
Abstract
With the growing computational capacity of general-purpose Graphics Processing Units (GPUs), using GPUs to accelerate parallel applications has become a critical topic in academia and industry. However, many irregular applications that are computation- or memory-intensive cannot easily achieve high GPU utilization. The main challenges are as follows: first, data dependences lead to coarse-grained kernels; second, heavy GPU memory usage may cause frequent memory evictions and extra I/O overhead; third, certain computation patterns produce memory redundancies; last, workload balance and data reuse jointly determine overall performance, yet there may be a dynamic trade-off between them.
Targeting these challenges, this dissertation proposes multiple optimizations to accelerate parallel irregular applications on GPU architectures. It focuses on two real-world applications as case studies: the calculation of many-body correlation functions in a large-scale scientific system, and a matrix factorization recommendation system based on eALS (element-wise Alternating Least Squares).
To accelerate the calculation of many-body correlation functions, this dissertation presents three frameworks for GPU memory management and multi-GPU scheduling. First, MemHC, an optimized systematic GPU memory management framework, introduces a series of new memory reduction designs in GPU memory allocation, CPU/GPU communication, and GPU memory oversubscription. MemHC employs duplication-aware management and lazy release of GPU memory to the corresponding host-side management for better data reuse. Moreover, MemHC introduces a novel eviction policy, Pre-Protected LRU (Least Recently Used), to reduce evictions and increase memory hits. Second, MICCO, an enhanced multi-GPU scheduling framework, takes both the data dimension (e.g., data reuse and data eviction) and the computation dimension into account. This work first performs a comprehensive study of the interplay between data reuse and load balance and introduces two new concepts, the local reuse pattern and the reuse bound, for reaching the optimal trade-off between them. Based on this study, MICCO provides a heuristic scheduling algorithm and a machine-learning-based regression model that generates the optimal reuse-bound setting. Third, a locality-aware multi-GPU scheduling framework uses pipelined batch generation with a look-ahead strategy. This scheduler builds local dependency graphs from locality analysis and reorganizes input data to reduce memory transfers and improve data reuse, achieving up to 79.92% memory cost reduction and a 1.67x speedup.
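The trade-off that a reuse bound controls can be illustrated with a small greedy scheduler. The Python sketch below is a hypothetical toy, not MICCO's algorithm: each task is modeled as a set of data-block ids, every task costs one unit of load, and nothing is ever evicted. A GPU is eligible for a task only if its load is within `reuse_bound` of the least-loaded GPU; among eligible GPUs, the one already holding the most of the task's blocks wins. The names `Gpu` and `schedule`, and the treatment of `reuse_bound` as a plain integer load slack, are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Gpu:
    # Hypothetical per-GPU state: data blocks resident in its memory and its assigned load.
    cached: set = field(default_factory=set)
    load: int = 0


def schedule(tasks, num_gpus, reuse_bound):
    """Greedily assign each task (a set of block ids) to a GPU.

    Only GPUs whose load is within `reuse_bound` of the least-loaded GPU are
    eligible; among them, prefer the most already-resident blocks, then the
    lower load, then the lower GPU index. Returns (assignment, loads, transfers).
    """
    gpus = [Gpu() for _ in range(num_gpus)]
    assignment = []
    transfers = 0
    for blocks in tasks:
        min_load = min(g.load for g in gpus)
        eligible = [i for i, g in enumerate(gpus) if g.load <= min_load + reuse_bound]
        best = max(
            eligible,
            key=lambda i: (len(blocks & gpus[i].cached), -gpus[i].load, -i),
        )
        transfers += len(blocks - gpus[best].cached)  # blocks that must be copied in
        gpus[best].cached |= blocks                    # no eviction in this toy model
        gpus[best].load += 1                           # unit cost per task
        assignment.append(best)
    return assignment, [g.load for g in gpus], transfers
```

On the toy input `[{1, 2}, {1, 2}, {1, 2}, {3}]` with two GPUs, `reuse_bound=0` gives balanced loads `[2, 2]` but transfers 5 blocks, while `reuse_bound=10` cuts transfers to 3 at the cost of loads `[3, 1]`. A tuned setting, such as one produced by a regression model as the abstract describes, would sit between these extremes.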
To parallelize the eALS-based recommendation system, this dissertation proposes HEALS, an efficient CPU/GPU heterogeneous recommendation system. HEALS employs newly designed architecture-adaptive data formats to achieve load balance and good data locality on both CPU and GPU. To mitigate data dependence, HEALS presents a CPU/GPU collaboration model that exploits both task parallelism and data parallelism. HEALS further optimizes this collaboration model by overlapping kernel execution with communication and by partitioning the workload dynamically. It also applies several kernel-level parallelization techniques for better GPU utilization: loop unrolling, vectorization, and warp reduction.
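Warp reduction, one of the kernel techniques listed above, is a general GPU pattern rather than something specific to HEALS. As a hedged illustration (not HEALS's code), the Python sketch below emulates on the host how a 32-lane CUDA warp sums values with `__shfl_down_sync`: at each step every lane adds the value held `offset` lanes above it, the offset halves each time, and after log2(32) = 5 steps lane 0 holds the warp-wide sum without using shared memory. As with the real intrinsic, a lane whose source lane is out of range receives its own value. The helper names `shfl_down` and `warp_reduce_sum` are hypothetical.

```python
WARP_SIZE = 32


def shfl_down(values, offset):
    """Emulate __shfl_down_sync over a full warp: lane i reads lane i + offset;
    lanes whose source would be out of range keep their own value."""
    return [values[i + offset] if i + offset < WARP_SIZE else values[i]
            for i in range(WARP_SIZE)]


def warp_reduce_sum(values):
    """Tree reduction with halving shuffle offsets; lane 0 ends with the full sum."""
    if len(values) != WARP_SIZE:
        raise ValueError("expected exactly one value per lane")
    vals = list(values)
    offset = WARP_SIZE // 2
    while offset > 0:
        shifted = shfl_down(vals, offset)
        # Every lane adds its neighbour's value; only lanes below `offset`
        # keep partial sums that lane 0 will eventually consume.
        vals = [v + s for v, s in zip(vals, shifted)]
        offset //= 2
    return vals[0]
```

For example, `warp_reduce_sum(list(range(32)))` returns 496, the same as `sum(range(32))`. Upper lanes accumulate meaningless values along the way, which is harmless because only lane 0's result is read.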
In summary, this dissertation efficiently accelerates two typical irregular applications on GPUs by building four frameworks spanning CPU/GPU collaboration, GPU memory management, and multi-GPU scheduling.
Bio:
Qihan Wang is a Ph.D. candidate in the Department of Computer Science at William & Mary. Her Ph.D. advisor is Prof. Bin Ren. Her research interests include high-performance computing, GPU architectures, and machine learning. Her Ph.D. research has been accepted at IPDPS 2022, TACO 2021, HiPC 2021, and Smart Health 2020. She received her Bachelor of Software Engineering from Beihang University in 2017.