A&S Graduate Studies
Jialiang Tan, Computer Science - Dissertation Defense
Abstract: Software inefficiencies are inevitable in computer systems. At the code level, software packages have become increasingly complex, comprising large amounts of source code, sophisticated control and data flow, and growing levels of abstraction. This complexity often introduces inefficiencies across software stacks, leading to performance degradation. At the resource level, the evolution of hardware outpaces the performance optimization of software, leading to resource wastage and energy dissipation on emerging architectures. To better understand program behaviors, software developers rely on performance profiling tools. Existing profiling techniques, whether fine-grained or coarse-grained, focus on identifying hotspots: code regions that consume substantial resources during program execution. Although hotspot analysis is useful, it rarely reveals whether a program is using a resource productively. As a result, developers must make extra effort to decide whether a given hotspot needs to be optimized. Therefore, to guide program optimization more effectively, we need tools that investigate resource wastage rather than resource usage.
In this dissertation, we detect program inefficiencies from different perspectives. First, we study inefficiencies in compiler optimizations. We propose CIDetector, a fine-grained profiler that works on binary executables and detects compiler-introduced and compiler-missed inefficiencies. Based on our analysis, we select 12 representative programs from different domains to form a dataset called CIBench. We perform the first study of compiler-related inefficiencies in fully optimized binary code, which offers several insights valuable to scientific programmers, compiler writers, and tool developers. Moreover, we study interaction inefficiencies in Python applications, that is, inefficiencies arising from the interaction between Python code and native libraries, and we extract two patterns common to these interaction inefficiencies. Based on these patterns, we propose PieProf, a lightweight profiler that pinpoints interaction inefficiencies in Python applications. PieProf measures inefficiencies in native execution and associates them with high-level Python code to provide a holistic view. Guided by PieProf, we optimize 17 real-world applications, yielding application-level speedups of up to 6.3x.
Meanwhile, we noticed that the same program inefficiency patterns occur in students' code. As instructors, we realized that the importance of teaching code performance to students can never be overstated. By exploring pedagogical methods and developing educational tools, we hope to understand and address the challenges students face when programming. We report our experience integrating VS Code, together with comprehensive guidance, into an introductory Python programming course; this approach significantly balances teaching resources and shortens students' learning curves. Additionally, we propose ProTracker, an end-to-end solution that estimates the progress of programming assignments using machine learning techniques. ProTracker employs static analysis to extract features from assignment samples from previous semesters, then applies a two-level cross-validation method to tune and select an appropriate machine learning model. It runs as a VS Code extension and estimates students' programming progress in real time.
Bio: Jialiang Tan is a Ph.D. candidate in the Department of Computer Science at William & Mary. She is advised by Dr. Xu Liu. Her research lies in high-performance computing, program/software analysis, and CS education. Her research has appeared in ICS'20 and FSE'21. Previously, she received her Bachelor of Engineering in Information Security from Sichuan University, China, in 2015, and her Master of Science in Computer Science from Arizona State University in 2016.