[PAST EVENT] Zhen Peng, Computer Science - PhD Defense
Graph processing is at the heart of many modern applications where graphs are used as the basic data structure to represent the entities of interest and the relationships between them. Improving the performance of graph-based applications, especially using parallelism techniques, has drawn significant interest both in academia and industry. On the one hand, modern CPU architectures are able to provide massive computational power by using sophisticated memory hierarchy and multi-level parallelism, including thread-level parallelism, data-level parallelism, etc. On the other hand, graph processing workloads are notoriously challenging for achieving high performance due to their irregular computation pattern and unpredictable control flow. Therefore, how to accelerate the performance of graph-based applications using parallelism is still an open question.
This dissertation focuses on providing high performance for graph-based applications. To take full advantage of multi-level parallelism resources provided by CPUs, this dissertation studies the characteristics of graph-based applications and matches their parallel solutions with the underlying hardware via algorithm and system co-design.
This dissertation divides graph-based applications into three categories: typical graph algorithms, sequential graph-based applications, and applications with graph-based solutions. The first category comprises typical graph algorithms with available parallel solutions. This dissertation presents the insight that the whole graph processing system stack, including data representation, the execution model, and job scheduling, should match the features of the hardware. This dissertation proposes GraphPhi as a new approach to graph processing on emerging Intel Xeon Phi-like architectures. GraphPhi employs hierarchical data representation to enhance the data locality. GraphPhi also has a hybrid vertex-centric and edge-centric execution model and a uniform scheduler integrated with thread-level and data-level parallelism to improve its performance.
The second category includes specialized graph applications without nontrivial parallel solutions. This dissertation studies a state-of-the-art 2-hop labeling approach named Pruned Landmark Labeling (PLL), which is used to solve the shortest path distance problem for large graphs. PLL imposes a control-flow dependency among each graph traversal iteration, which reduces its algorithmic complexity but becomes the major obstacle to enabling parallelism. This dissertation re-designs PLL from a parallel processing perspective and proposes the algorithm named Batched Vertex-Centric PLL (BVC-PLL), which breaks PLL's inherent dependencies and parallelizes it in a scalable way.
The third category includes applications that rely on graph-based solutions. This dissertation studies the sequential search algorithm for the graph-based indexing methods used for the Approximate Nearest Neighbor Search (ANNS) problem. The sequential search algorithm employs best-first traversal along with the underlying graph index to search nearest neighbors for given queries. This dissertation provides an in-depth examination of the challenges in parallelizing the similarity search algorithm. It explores whether the similarity graph search is robust to deviation from maintaining strict order by allowing multiple walkers to advance the search frontier simultaneously. Based on the insights, this dissertation proposes Speed-ANN, a parallel similarity search algorithm that reveals hidden intra-query parallelism to accelerate the search speed while fulfilling the high accuracy requirement.
Moreover, this dissertation further explores the optimization opportunities for computational graph-based deep neural network inference running on tiny devices, specifically microcontrollers (MCUs). This dissertation provides preliminary results of using lightweight quantization, loop unrolling, and off-chip memory. Altogether, this dissertation studies graph-based applications and improves their performance by providing solutions of multi-level parallelism via algorithm and system co-design to match them with the underlying multi-core CPU architectures.
Bio: Zhen Peng is a Ph.D. Candidate in the Department of Computer Science at William & Mary under the supervision of Professor Bin Ren. Zhen's research interests lie in parallel computing with an emphasis on the design and optimization of parallel solutions for graph-based applications. His Ph.D. research has been published in PACT 2018 and ICS 2020. Before joining William & Mary, he received his bachelor's and master's degrees at Huaqiao University, China in 2013 and 2016, respectively.