[PAST EVENT] Hongyuan Liu, Computer Science - Dissertation Proposal [Zoom]
The big-data era has brought new challenges to computer architectures because of the scale of both computation and data. The problem becomes critical in domains where the computation is also irregular; among these, this dissertation proposal focuses on automata processing. Automata are widely used in applications from different domains such as network intrusion detection, machine learning, and parsing. Large-scale automata processing is challenging for traditional von Neumann architectures, and many accelerator prototypes have been proposed in response. Micron's Automata Processor (AP) is one example. However, as a spatial architecture, it cannot handle large automata programs without repeated reconfiguration and re-execution. We found that a large number of automata states are never enabled during execution but are still configured on the AP chips, which leaves the AP underutilized. To address this issue, we propose a lightweight offline profiling technique that predicts the never-enabled states and keeps them out of the AP. Furthermore, we develop SparseAP, a new execution mode for the AP that handles mispredictions efficiently. Our software and hardware co-optimization achieves a 2.1X speedup over the baseline AP execution across 26 applications.
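As a rough illustration of the offline profiling idea described above, the following Python sketch runs a small homogeneous automaton over a profiling input and reports the states that were never enabled. It is a minimal sketch under stated assumptions, not the implementation from the proposal: the function names, the dictionary-based automaton encoding, and the choice to keep start states enabled on every input symbol are all hypothetical.

```python
from typing import Dict, Iterable, Set


def profile_enabled_states(
    symbols: Dict[int, Set[str]],
    edges: Dict[int, Iterable[int]],
    start_states: Set[int],
    profiling_input: str,
) -> Set[int]:
    """Return every state that becomes enabled while consuming profiling_input.

    Assumed semantics: an enabled state activates when the current input
    symbol is in its symbol set, and an activated state enables its
    successors for the next symbol. Start states are enabled on every symbol.
    """
    enabled = set(start_states)
    ever_enabled = set(enabled)
    for ch in profiling_input:
        # States that are enabled and match the current symbol become active.
        active = {s for s in enabled if ch in symbols[s]}
        # Successors of active states are enabled for the next symbol.
        enabled = set(start_states)
        for s in active:
            enabled.update(edges.get(s, ()))
        ever_enabled |= enabled
    return ever_enabled


def predict_never_enabled(all_states: Set[int], ever_enabled: Set[int]) -> Set[int]:
    """States not seen enabled during profiling are predicted never-enabled."""
    return all_states - ever_enabled


# Hypothetical automaton matching "ab" (states 0 -> 1) and "xyz" (2 -> 3 -> 4).
symbols = {0: {"a"}, 1: {"b"}, 2: {"x"}, 3: {"y"}, 4: {"z"}}
edges = {0: [1], 2: [3], 3: [4]}
start_states = {0, 2}

seen = profile_enabled_states(symbols, edges, start_states, "aabxa")
print(predict_never_enabled(set(symbols), seen))  # {4}
```

The prediction is only as good as the profiling input: state 4 would be enabled by an input containing "xy", which is the kind of misprediction a runtime mechanism has to handle.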
Since the Automata Processor is not publicly available, we aim to reduce the performance gap between the AP and a general-purpose accelerator, the Graphics Processing Unit (GPU). We identify excessive data movement in the GPU memory hierarchy and propose optimization techniques to reduce it. Although these techniques significantly alleviate the memory-related bottlenecks, they have a side effect: work is statically assigned to cores. This leads to poor compute utilization, where GPU cores are wasted on idle automata states. Therefore, we propose a new dynamic scheme that effectively balances compute utilization with reduced memory usage. Our combined optimizations provide a significant improvement over the previous state-of-the-art GPU implementations of automata. Moreover, they enable current GPUs to outperform the Automata Processor on several applications while staying within an order of magnitude of it on the remaining applications.
Future work aims to bridge the gap between general-purpose architectures and domain-specific architectures for more domains with irregular computation. Specifically, we will explore how to overcome the programmability and scalability issues of GPU unified memory for large-scale graph analytics workloads via machine learning-based page management approaches.
Hongyuan Liu is a Ph.D. candidate in the Department of Computer Science at William & Mary, advised by Prof. Adwait Jog. His research interests lie in the broad area of computer architecture, with an emphasis on domain-specific architectures and GPUs. His Ph.D. research was published in MICRO 2018 and ASPLOS 2020. Before joining William & Mary, he received his B.Eng. degree from Shandong University in 2013 and his M.Sc. degree from the University of Hong Kong in 2016. He was a software engineer at Baidu from 2013 to 2014 and worked as a software engineer intern at Intel in the fall of 2019.