[PAST EVENT] Colloquium: Performance Analysis of Program Execution on Modern Parallel Architectures

February 28, 2014
8am - 9am
Location
McGlothlin-Street Hall, Room 020
251 Jamestown Rd
Williamsburg, VA 23185
Xu Liu, Rice University


Abstract

Modern parallel architectures have complex features: deep memory hierarchies, many hardware threads, and heterogeneous cores. Without careful design, programs may perform poorly because of excessive data movement in the memory hierarchy, load imbalance, over-synchronization between threads, or an inefficient workload partition between heterogeneous cores. It is therefore necessary to tune programs to achieve good performance on emerging architectures. However, tuning a program with dozens of source files, hundreds of functions, and millions of lines of code is difficult in practice. To address this problem, we developed HPCToolkit, an open-source performance tool that guides parallel program optimization. My dissertation research extends HPCToolkit with unique capabilities, enabling it to analyze programs that use massive numbers of threads, identify memory layout bottlenecks, and provide guidance for partitioning data to avoid unnecessary long-latency memory accesses in multi-socket node architectures. HPCToolkit analyzes fully optimized binary code without any need for source code recompilation, compiler support, or advance knowledge of the code. All of our measurement and analysis techniques apply to programs running within or across nodes on large-scale clusters. To demonstrate the utility of these enhancements to HPCToolkit, we used it to analyze sophisticated parallel programs and gained insights that enabled us to significantly improve their performance.


Bio

Xu Liu is a Ph.D. candidate in the Dept. of Computer Science at Rice University. His research interests are parallel computing, compilers, performance analysis, and modeling. Xu has been working on HPCToolkit, an open-source performance tool that is used worldwide at universities, at national laboratories, and in industry. He is a member of the OpenMP tools subcommittee, working to define the OMPT API for inclusion in the OpenMP language standard. Xu received HPC fellowships from NAG, Schlumberger, and BP while a Ph.D. student at Rice. He received Samsung's Award of Excellence for work during his 2013 internship at Samsung Austin Research Center.