W&M Featured Events
This calendar presented by
William & Mary
[PAST EVENT] Master's Thesis Defense: Ziyu Guo
July 15, 2011
4:30pm
With the industry rapidly transitioning into the multicore/manycore era, heterogeneous systems will be mainstream for the foreseeable future. Such systems require a highly versatile compilation framework that can generate efficient code for each architecture in the system from a single version of the source code. However, the device-specific programming models of these devices make such translation difficult. A prominent example of this difficulty is the compilation of fine-grained SPMD-threaded code (e.g., GPU CUDA code) for multicore CPUs.
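To illustrate why synchronizations make this translation hard, here is a minimal sketch, assuming a hypothetical one-block kernel in which each thread writes `shared[tid]`, hits a barrier (`__syncthreads()` in CUDA), and then reads its neighbor's slot. It uses the well-known baseline of serializing threads into a CPU loop and splitting ("fissioning") that loop at the barrier; it is a generic illustration, not the scheme proposed in this thesis, and the function names are hypothetical.

```python
# Simulated CUDA kernel body (one thread block of n threads):
#   shared[tid] = inp[tid];
#   __syncthreads();
#   out[tid] = shared[(tid + 1) % n];

def kernel_naive(inp):
    """Serialize threads into one loop and ignore the barrier (incorrect)."""
    n = len(inp)
    shared = [0] * n
    out = [0] * n
    for tid in range(n):  # each "thread" runs to completion before the next starts
        shared[tid] = inp[tid]
        out[tid] = shared[(tid + 1) % n]  # may read a slot not yet written
    return out


def kernel_fissioned(inp):
    """Split the thread loop at the barrier so all writes precede all reads."""
    n = len(inp)
    shared = [0] * n
    out = [0] * n
    for tid in range(n):  # code before the barrier, for every thread
        shared[tid] = inp[tid]
    # The boundary between the two loops plays the role of __syncthreads().
    for tid in range(n):  # code after the barrier, for every thread
        out[tid] = shared[(tid + 1) % n]
    return out
```

With `inp = [1, 2, 3, 4]`, the fissioned version returns `[2, 3, 4, 1]`, while the naive version returns `[0, 0, 0, 1]` because most reads happen before the neighboring "thread" has written its slot.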
In this thesis we propose a reference-level dependence analysis algorithm that reveals how the correctness and performance of the translated program relate to the dependencies introduced by implicit synchronizations. Based on the analysis results, we present several low-overhead extensions to previous GPU-to-CPU compilation schemes that guarantee correctness and improve performance. To exploit instance-level dependence information, we propose the thread-level dependence graph (TLDG), which leads to a method that enables fine-grained treatment of both implicit and explicit synchronizations and reveals redundant computation at the instruction-instance level. We then present an automatic framework that performs this treatment on GPU code.
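A dependence check can tell a translator when a synchronization point actually needs to be preserved. The sketch below is a deliberately simplified, brute-force stand-in for such an analysis, not the reference-level algorithm or the TLDG from this thesis: it assumes each access is described by a hypothetical index function of the thread id and asks whether any thread reads, after the synchronization point, a location written before it by a *different* thread.

```python
from typing import Callable, Dict, Set


def needs_barrier(write_idx: Callable[[int], int],
                  read_idx: Callable[[int], int],
                  n_threads: int) -> bool:
    """Return True if a cross-thread dependence crosses the barrier.

    Simplifying assumptions (hypothetical model): one write before the
    barrier and one read after it, each indexed by a function of tid.
    """
    written_by: Dict[int, Set[int]] = {}
    for t in range(n_threads):  # record which threads write each location
        written_by.setdefault(write_idx(t), set()).add(t)
    for t in range(n_threads):  # does any reader depend on another thread?
        writers = written_by.get(read_idx(t), set())
        if writers - {t}:
            return True
    return False


# Neighbor read: thread t depends on thread t+1, so the barrier must stay.
print(needs_barrier(lambda t: t, lambda t: (t + 1) % 4, 4))
# Each thread reads only its own slot: the barrier can be dropped.
print(needs_barrier(lambda t: t, lambda t: t, 4))
```

When the check returns `False`, the two thread loops produced by splitting at the barrier could be fused back into one, avoiding the extra loop overhead; a real compiler would reason symbolically over index expressions rather than enumerating thread ids.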
Together, the dependence analysis and code generation schemes form, for the first time, a complete solution to the problem of translating GPU synchronizations to CPUs. The methods presented in this thesis can serve as a basis for treating other device-specific intrinsics and are critical for whole-system synergy in heterogeneous systems.