[PAST EVENT] Probir Roy, Computer Science - Ph.D. dissertation defense
The complexity of computational systems is increasing, and the gap between what users know about an application and how it actually behaves on the underlying hardware architecture is widening. As a result, designing scalable software and hardware is becoming more challenging. One significant obstacle to high performance is the memory overhead of parallel architectures. Since memory is increasingly a bottleneck in compute systems, we need to improve memory and cache utilization, reduce conflicts in shared memory and caches, reduce unnecessary data movement between CPU cores, and reduce unnecessary data accesses. To do so, developers need guidance and deep insight to identify these inefficiencies and optimization opportunities in their applications.
However, pinpointing performance issues and providing insightful guidance in real-world applications is often challenging. While static analysis fails to capture the dynamic behavior of applications, existing runtime analysis tools typically rely on heavyweight memory instrumentation, which limits their applicability to real, long-running programs. Measurement-based techniques, by contrast, impose less overhead. These profilers collect raw performance data from the underlying hardware and pinpoint various data-locality issues and computational inefficiencies in the application. However, deriving complex diagnoses such as cache miss classification, or high-level optimization guidance such as structure splitting, from raw performance data is challenging and remains largely unexplored.
This dissertation addresses these performance challenges by using lightweight hardware performance monitoring units (PMUs) to monitor memory footprint and by applying statistical analysis to provide deep insight. This dissertation makes five contributions. First, it presents a novel tool, StructSlim, that guides code optimization by structure splitting. Structure splitting, a code transformation technique, can significantly improve memory locality. Second, the dissertation presents a lightweight performance tool, named SMTAnalyzer, that identifies contention among simultaneous multi-threading (SMT) threads and guides better scheduling policies for optimization. Third, it describes CCProf, a lightweight measurement-based profiler that identifies conflict cache misses. Additionally, CCProf associates the conflict cache misses with program source code and data structures for further optimization guidance. Fourth, it describes a non-uniform memory access (NUMA)-aware, multi-solver convolutional neural network (CNN) design, named NUMA-Caffe, for accelerating deep neural networks on multi- and many-core CPU architectures. Finally, it studies performance bugs in the Linux kernel caused by poor data structure and algorithm choices. Additionally, it presents Kognizance, a tool that pinpoints program- and system-level inefficiencies in the Linux kernel at low overhead.
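To make the idea of structure splitting concrete, the following is a minimal illustrative sketch, not StructSlim's analysis or output. The record types (Particle, ParticleHot, ParticleCold), the field names, the helper lines_touched, and the 64-byte cache-line size are all assumptions introduced for illustration. It counts how many cache lines a loop would touch when it reads only one frequently used ("hot") field from an array of records, before and after the rarely used ("cold") fields are moved into a separate structure.

```python
import ctypes

CACHE_LINE_BYTES = 64  # assumed cache-line size; common on x86-64, not universal


class Particle(ctypes.Structure):
    """Original (hypothetical) layout: the hot field shares a record with cold fields."""
    _fields_ = [
        ("x", ctypes.c_double),           # hot: read on every loop iteration
        ("label", ctypes.c_char * 48),    # cold: rarely read
        ("created_at", ctypes.c_int64),   # cold: rarely read
    ]


class ParticleHot(ctypes.Structure):
    """After splitting: only the hot field, so records pack densely."""
    _fields_ = [("x", ctypes.c_double)]


class ParticleCold(ctypes.Structure):
    """After splitting: the cold fields live in a parallel array."""
    _fields_ = [
        ("label", ctypes.c_char * 48),
        ("created_at", ctypes.c_int64),
    ]


def lines_touched(record_type, field, count):
    """Count distinct cache lines touched when reading `field` from each of
    `count` contiguous records, assuming the array starts on a line boundary."""
    record_size = ctypes.sizeof(record_type)
    descriptor = getattr(record_type, field)
    touched = set()
    for i in range(count):
        first_byte = i * record_size + descriptor.offset
        last_byte = first_byte + descriptor.size - 1
        touched.update(range(first_byte // CACHE_LINE_BYTES,
                             last_byte // CACHE_LINE_BYTES + 1))
    return len(touched)


if __name__ == "__main__":
    n = 1024
    print("record size before split:", ctypes.sizeof(Particle))
    print("lines touched before split:", lines_touched(Particle, "x", n))
    print("lines touched after split:", lines_touched(ParticleHot, "x", n))
```

Under these assumptions, each unsplit record fills a whole cache line, so a scan over x pulls in one line per record, while the split layout packs eight hot values into each line. Real gains depend on the actual access pattern, which is what a profiler would need to measure.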
Probir Roy is a Ph.D. candidate in the Department of Computer Science at William & Mary. He is advised by Professor Xu Liu. His research interests include program analysis, high-performance computing, and operating systems. A common thread of his research is developing tools and techniques that improve software performance and developer productivity. He received his B.S. in Computer Science and Engineering from Bangladesh University of Engineering and Technology in 2009.