[PAST EVENT] Haonan Wang, Computer Science - Oral Exam
Graphics Processing Unit (GPU)-based architectures have become the default accelerator choice for a large number of data-parallel applications because they are able to provide high compute throughput at a competitive power budget. Unlike CPUs which typically have limited multi-threading capability, GPUs execute large numbers of threads concurrently to achieve high thread-level parallelism (TLP). While the computation of each thread requires its corresponding data to be loaded from or stored to the memory, the key to supporting the high TLP of GPUs lies in the high bandwidth provided by the GPU memory system. However, with the continuous scaling of GPUs, the challenges of designing an efficient GPU memory system have become two-fold. On one hand, to keep the growing compute and memory resources highly utilized, co-locating two or more application kernels on the GPU has become an inevitable trend. One of the major roadblocks in achieving the maximum benefits of multi-application execution is the difficulty to design mechanisms that can efficiently and fairly manage the application interference in the shared caches and the main memory. On the other hand, to maintain the continuous scaling of GPU performance, the increasing energy consumption of the memory system has become a major problem because of its limited power budget. This limitation of the GPU memory energy restricts its maximum theoretical bandwidth and in turn limits the overall throughput.
To address the aforementioned challenges, this dissertation proposes three different approaches. First, this dissertation shows that high efficiency and fairness can be achieved in the GPU multi-programming environment with novel TLP management techniques. We propose a new metric, effective bandwidth (EB), to accurately estimate the shared resources in the GPU memory hierarchy. Meanwhile, we propose a pattern-based searching scheme (PBS) that can quickly and accurately achieve efficiency and fairness via managing the TLP of each application. Second, to reduce data movement and improve GPU throughput, this dissertation develops Address-Stride Assisted Approximate Value Predictor (ASAP) for GPUs. We show that by utilizing address stride and value stride correlation present in GPGPU applications, significant data movement reduction and throughput improvement can be achieved at a much lower application quality loss and hardware overhead. ASAP achieves this by predicting load values if it detects strides in their corresponding addresses. Third, this dissertation shows that GPU memory energy consumption can be significantly reduced by utilizing novel memory scheduling techniques. We propose a lazy memory scheduler which significantly improves the row buffer locality of GPU memory by leveraging the latency and error tolerance of GPGPU applications. Finally, our future work targets data movement reduction with flexible data precisions. We plan to develop architectural support and novel data types to dynamically reduce the data size transferred per each memory operation. Altogether, this dissertation develops several innovative techniques to improve the GPU memory system efficiency, which are necessary for enabling the development of next-generation GPUs.
Haonan Wang is a Ph.D. Candidate in the Department of Computer Science at William & Mary. He is advised by Professor Adwait Jog. Haonan’s main research interest lies in the area of GPU Architecture. He is also interested in several architecture-related topics such as memory systems, approximate computing, hardware scheduling, machine learning, and GPU security. His research has been published in major computer architecture conferences (HPCA, DSN, ICS). He received his BS degree in Computer Science at East China University of Science and Technology. He received his MS degree in Computer Science at William & Mary.