[PAST EVENT] Pengfei Su, Computer Science - Oral Exam [Zoom]
Production software packages have become increasingly complex with millions of lines of code, sophisticated data and control flows, and references to a hierarchy of external libraries. This complexity often introduces performance inefficiencies across the software stack, making it impractical for users to pinpoint them manually.
Performance profiling tools (a.k.a. profilers) abound in the tools community to help software developers understand program behavior. Classical profiling techniques focus on identifying hotspots. Hotspot analysis is indispensable; however, it can hardly diagnose whether a resource is being used productively, that is, in a way that contributes to the overall efficiency of a program. As a result, a significant burden falls on the developer to judge whether there is scope to optimize a hotspot. Derived metrics, such as cycles per instruction (CPI) and cache miss ratio, offer slightly better intuition into hotspots but are still not panaceas. Hence, there is a need for tools that investigate resource wastage instead of resource usage.
To fill this gap, we have developed several fine-grained and coarse-grained profilers that pinpoint a variety of performance inefficiencies and provide optimization guidance for a wide range of software, covering benchmarks, enterprise software, and large-scale parallel applications running on supercomputers and in data centers.
Fine-grained profilers. Fine-grained profilers are indispensable for understanding performance inefficiencies comprehensively. We propose a whole-program profiler called LoadSpy, which works on binary executables to detect and quantify wasteful memory operations in their context and scope. Based on numerous case studies, we observe that wasteful memory operations are often an indicator of various forms of performance inefficiencies, such as suboptimal choices of algorithms or data structures, missed compiler optimizations, and developers’ inattention to performance. Guided by LoadSpy, we are able to optimize a large number of well-known benchmarks, e.g., SPEC CPU2017, and real-world applications, e.g., Apache Avro, yielding significant speedups.
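As a rough illustration of what "detecting wasteful memory operations in their context" can mean, the sketch below replays a recorded list of loads and flags a load as wasteful when it reads the same value from the same address as the previous load of that address. This definition, the trace format, and the name `find_redundant_loads` are assumptions made for the example; it is not LoadSpy's implementation, which works on binary executables.

```python
from collections import defaultdict


def find_redundant_loads(trace):
    """Count loads that re-read an unchanged value (illustrative definition).

    trace: iterable of (context, address, value) tuples, one per load.
    Returns (pairs, total_loads), where pairs maps
    (previous_context, current_context) to the number of redundant loads.
    """
    last_load = {}            # address -> (value, context) of the latest load
    pairs = defaultdict(int)  # (prev_context, cur_context) -> redundant count
    total_loads = 0

    for context, address, value in trace:
        total_loads += 1
        previous = last_load.get(address)
        if previous is not None and previous[0] == value:
            # Same address, same value as the previous load: flag as wasteful
            # and attribute it to the pair of calling contexts involved.
            pairs[(previous[1], context)] += 1
        last_load[address] = (value, context)

    return dict(pairs), total_loads


# Hypothetical trace: contexts are labels standing in for calling contexts.
example_trace = [
    ("init", 0x10, 5),
    ("loop", 0x10, 5),  # redundant with the load in "init"
    ("loop", 0x10, 5),  # redundant with the previous load in "loop"
    ("loop", 0x18, 7),
    ("loop", 0x10, 6),  # value changed, so not redundant
]
pairs, total = find_redundant_loads(example_trace)
redundant_fraction = sum(pairs.values()) / total
```

Attributing each redundant load to a pair of contexts, rather than to a single instruction, is what lets a report point at where the value was first read and where it was needlessly read again.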
Coarse-grained profilers. Despite the deep performance insights offered by fine-grained profilers, their high overhead prevents widespread adoption, particularly in production. By contrast, coarse-grained profilers introduce low overhead but offer limited performance insights. Another research question is therefore how to get the best of both: the low overhead of coarse-grained profiling and the deep insights of fine-grained profiling. To this end, we propose a lightweight profiler called JXPerf. JXPerf avoids heavyweight instrumentation by combining the hardware performance monitoring units and debug registers available in commodity CPUs to detect wasteful memory operations. Compared with LoadSpy, JXPerf reduces runtime overhead from 10x to 7% on average. Its lightweight nature makes it practical for use in production.
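The idea of pairing sampling with debug registers can be sketched as a pure software simulation: every `period`-th load is "sampled", a watchpoint is armed on its address, and the next load of that address "traps" so the two values can be compared. The function name, the trace format, and the default parameters are assumptions for illustration (x86 exposes four debug address registers, hence the default limit); the code does not use real hardware counters and is not JXPerf's implementation.

```python
def sample_and_watch(loads, period=3, max_watchpoints=4):
    """Estimate redundant loads by sampling instead of checking every load.

    loads: iterable of (address, value) tuples, one per load.
    Returns (checked, redundant): how many sampled addresses were accessed
    again, and how many of those accesses read an unchanged value.
    """
    watch = {}  # address -> value recorded when the watchpoint was armed
    checked = 0
    redundant = 0

    for index, (address, value) in enumerate(loads, start=1):
        if address in watch:
            # Simulated watchpoint trap: compare with the sampled value,
            # then release the (simulated) debug register.
            checked += 1
            if watch.pop(address) == value:
                redundant += 1
        if index % period == 0 and len(watch) < max_watchpoints:
            # Simulated sample: arm a watchpoint on this address.
            watch[address] = value

    return checked, redundant


# Hypothetical load stream over two addresses.
example_loads = [
    (0x10, 1), (0x20, 2), (0x10, 1),  # third load is sampled
    (0x10, 1),                        # trap: same value, redundant
    (0x20, 3), (0x10, 1),             # sixth load is sampled
    (0x10, 2),                        # trap: value changed
]
checked, redundant = sample_and_watch(example_loads)
```

Only the sampled accesses and their follow-up traps cost anything, which is the intuition behind the low overhead; the trade-off is that the result is a statistical estimate rather than an exact count.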
Pengfei Su is currently a Ph.D. candidate under the supervision of Prof. Xu Liu in the Department of Computer Science at William & Mary. His research interests lie in program analysis and high-performance/parallel computing, with a focus on providing tool support for analyzing and optimizing software inefficiencies. His papers received the Best Paper Award at PPoPP '19 and the Distinguished Paper Award at ICSE '19. His tools have attracted broad interest from supercomputing centers such as TACC and from companies such as Uber, Google, and Facebook. One of his tools for Golang profiling is under code review for upstreaming to the official Golang repository. He received a B.S. degree from Yunnan University in 2013 and an M.S. degree from the University of Chinese Academy of Sciences in 2016.