[PAST EVENT] Qingsen Wang, Computer Science - Ph.D. dissertation defense
Profilers are widely used by developers and system managers to monitor performance, understand behavior, and exploit optimization opportunities in applications running on various systems. Programs that use hardware transactional memory (HTM) demand sophisticated performance analysis tools when they suffer performance losses. Reuse distance, the number of distinct memory locations accessed between two consecutive accesses to the same location, is the de facto machine-independent metric of data locality, which has a profound impact on program performance. However, existing tools that profile HTM or measure reuse distance cannot achieve high accuracy and low overhead at the same time.
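As an illustration of the reuse distance definition above, the following minimal sketch computes exact reuse distances for a small, made-up access trace by exhaustively scanning the accesses between each pair of consecutive uses. The function names `reuse_distances` and `histogram` and the sample trace are assumptions introduced here for illustration. This is not how the profilers described in this abstract work; it only makes the metric concrete.

```python
from collections import Counter


def reuse_distances(trace):
    """Return one entry per access: the reuse distance, or None for a first access."""
    last_seen = {}  # location -> index of its most recent access
    distances = []
    for i, loc in enumerate(trace):
        if loc in last_seen:
            # Distinct locations touched strictly between the two accesses.
            between = trace[last_seen[loc] + 1 : i]
            distances.append(len(set(between)))
        else:
            # No earlier access to this location, so no reuse distance exists.
            distances.append(None)
        last_seen[loc] = i
    return distances


def histogram(distances):
    """Count how often each reuse distance occurs, skipping first accesses."""
    return Counter(d for d in distances if d is not None)


# Hypothetical trace of memory locations, named by letters for readability.
trace = ["a", "b", "c", "a", "b", "b"]
distances = reuse_distances(trace)
print(distances)
print(dict(histogram(distances)))
```

The exhaustive scan is quadratic in the trace length in the worst case, which hints at why measuring reuse distance with both high accuracy and low overhead is hard on long executions.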
In the first project, we have developed TxSampler, a lightweight profiler for programs that use HTM. TxSampler measures performance via sampling and provides a structured performance analysis, built on a novel decision-tree model, to guide intuitive optimization. TxSampler computes metrics that drive the investigation process in a systematic way. It not only pinpoints hot transactions, quantifying the time spent in transactional and fallback paths, but also identifies causes of transaction aborts such as data contention, capacity overflow, false sharing, and problematic instructions. TxSampler associates metrics with full call paths, even those deeply embedded inside transactions, and maps them to the program's source code. Our evaluation on more than 30 HTM benchmarks and applications shows that TxSampler incurs about 4% runtime overhead and negligible memory overhead for its insightful analyses. Guided by TxSampler, we were able to optimize several HTM programs and obtain nontrivial speedups.
In the second project, we develop RDX, a lightweight profiling tool for characterizing reuse distance in an execution. RDX typically incurs low time (5%) and memory (7%) overheads. RDX performs no instrumentation whatsoever; instead, it uniquely combines hardware performance counter sampling with hardware debug registers, both available in commodity CPUs, to produce reuse-distance histograms. RDX typically achieves more than 90% accuracy compared to the ground truth. With the help of RDX, we are the first to characterize the memory performance of long-running SPEC CPU2017 benchmarks.
Traditional profilers (such as the two aforementioned profilers) collect hardware or software events and usually attribute them to the corresponding call stacks. However, they fail to provide further insight into how behavior varies across different inputs, threads, or environments. In the third project, we develop SemProf, a lightweight framework for semantic profiling of Java applications. SemProf can extract semantic information from a running application according to the user's needs and inject it into profiles. Aided by SemProf, users are able to gain a much deeper understanding of the applications they are developing.
Qingsen Wang is a Ph.D. candidate at William & Mary, supervised by Dr. Xu Liu. He received a bachelor's degree in Information Engineering from the Chinese University of Hong Kong in 2014 before coming to William & Mary. His research focuses on developing metrics and tools to profile applications running on modern CPU architectures and on identifying optimization opportunities for better efficiency.