[PAST EVENT] Qingsen Wang, Computer Science - Oral Proposal
Programs that use hardware transactional memory (HTM) need sophisticated performance analysis tools to diagnose performance losses. Reuse distance, the number of distinct memory locations accessed between two consecutive accesses to the same location, is the de facto machine-independent metric of data locality, which has a profound impact on program performance. However, existing tools that profile HTM or measure reuse distance cannot achieve high accuracy and low overhead at the same time.
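To make the definition of reuse distance concrete, the following minimal Python sketch computes it exactly over a short address trace. It is an illustration of the metric only, not of how any tool described here measures it: the function names `reuse_distances` and `reuse_histogram` and the toy trace are hypothetical, and the brute-force scan takes quadratic time, which would be impractical for real executions.

```python
from collections import Counter


def reuse_distances(trace):
    """Return one reuse distance per access; None marks a first access.

    Illustrative brute-force version (hypothetical helper, quadratic time).
    """
    last_seen = {}  # address -> index of its most recent access
    distances = []
    for i, addr in enumerate(trace):
        if addr in last_seen:
            # Distinct addresses touched strictly between the two accesses.
            between = trace[last_seen[addr] + 1 : i]
            distances.append(len(set(between)))
        else:
            distances.append(None)
        last_seen[addr] = i
    return distances


def reuse_histogram(trace):
    """Count how often each reuse distance occurs, ignoring first accesses."""
    return Counter(d for d in reuse_distances(trace) if d is not None)


if __name__ == "__main__":
    trace = ["a", "b", "c", "a", "b", "b"]  # toy trace of symbolic addresses
    print(reuse_distances(trace))   # [None, None, None, 2, 2, 0]
    print(reuse_histogram(trace))
```

In the toy trace, the second access to `a` has reuse distance 2 because `b` and `c` were touched in between, while the back-to-back accesses to `b` have distance 0.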
In the first project, we have developed TxSampler, a lightweight profiler for programs that use HTM. TxSampler measures performance via sampling and uses a novel decision-tree model to provide a structured performance analysis that guides optimization. It computes metrics that drive the investigation in a systematic way. It not only pinpoints hot transactions and quantifies the time spent in transactional and fallback paths, but also identifies causes of transaction aborts such as data contention, capacity overflow, false sharing, and problematic instructions. TxSampler associates metrics with full call paths, even those deeply embedded inside transactions, and maps them to the program's source code. Our evaluation of more than 30 HTM benchmarks and applications shows that TxSampler incurs ~4% runtime overhead and negligible memory overhead for its analyses. Guided by TxSampler, we optimized several HTM programs and obtained nontrivial speedups.
In the second project, we develop RDX, a lightweight profiling tool for characterizing reuse distance in an execution. RDX typically incurs negligible time (5%) and memory (7%) overhead. RDX performs no instrumentation whatsoever; instead, it uniquely combines hardware performance counter sampling with hardware debug registers, both available in commodity CPUs, to produce reuse-distance histograms. RDX typically achieves more than 90% accuracy compared to the ground truth. With the help of RDX, we are the first to characterize the memory performance of long-running SPEC CPU2017 benchmarks.
Qingsen Wang is a Ph.D. candidate at William & Mary, supervised by Dr. Xu Liu. He received a bachelor's degree in Information Engineering from the Chinese University of Hong Kong in 2014 before joining William & Mary. His research focuses on developing metrics and tools to profile applications running on modern CPU architectures and on identifying optimization opportunities for better efficiency.