[PAST EVENT] Gurunath Kadam, Computer Science - PhD Dissertation Defense [Zoom]
Abstract:
In recent years, Graphics Processing Units (GPUs) have become the de facto choice for accelerating computations in domains such as machine learning, security, finance, and scientific computing. GPUs leverage the inherent data parallelism in the target applications to provide high throughput with superior energy efficiency. As GPUs are used for a growing number of applications, they face new challenges, especially in the security and reliability domains. On the security side, several microarchitectural attacks targeting GPUs have recently been demonstrated. These attacks leak secret information stored on GPUs, for example, the parameters of a neural network (NN) model and users' private information. On the reliability side, innovations that improve GPU memory systems are also making them more susceptible to errors. My dissertation research focuses on addressing these security and reliability challenges in GPUs while minimizing the overhead of the proposed protection mechanisms.
To improve GPU security, we focus on the previously demonstrated correlation timing attack. This attack exploits the deterministic nature of the coalescing mechanism in GPUs to correlate the execution time with the number of memory accesses. Consequently, an attacker can recover the encryption keys stored on GPUs. To counter the correlation timing attack, we first introduce a randomized coalescing defense scheme, RCoal. RCoal randomizes the coalescing logic so that the attacker can no longer correlate the execution time with the number of memory accesses, which thwarts the attack. Next, we propose a bucketing-based coalescing defense scheme, BCoal, which minimizes the variation in the number of memory accesses by generating only predetermined numbers (called buckets) of memory accesses. With low variation in the number of memory accesses, the attacker cannot correlate the application execution time with the secret information, so the correlation timing attack fails. BCoal generates less memory traffic than RCoal and is therefore more performance-efficient.
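The effect of coalescing on the number of memory accesses, and the intuition behind the two defenses, can be illustrated with a small software model. This is only a sketch under stated assumptions, not the hardware designs proposed in the dissertation: the 128-byte block size, the function names, the group-shuffling form of randomization, and the round-up form of bucketing are all hypothetical choices made for illustration.

```python
import random

LINE_BYTES = 128  # assumed memory-block size; real hardware may differ


def coalesced_accesses(addresses, line_bytes=LINE_BYTES):
    """Deterministic baseline: one access per distinct block the warp touches."""
    return len({addr // line_bytes for addr in addresses})


def randomized_accesses(addresses, num_groups, rng, line_bytes=LINE_BYTES):
    """Hypothetical randomization: shuffle the threads' addresses into groups
    and coalesce only within each group, so the count varies between runs."""
    shuffled = list(addresses)
    rng.shuffle(shuffled)
    groups = [shuffled[i::num_groups] for i in range(num_groups)]
    return sum(coalesced_accesses(g, line_bytes) for g in groups if g)


def bucketed_accesses(addresses, buckets, line_bytes=LINE_BYTES):
    """Hypothetical bucketing: round the access count up to the nearest bucket.
    The largest bucket must be at least the number of threads in the warp."""
    needed = coalesced_accesses(addresses, line_bytes)
    return min(b for b in buckets if b >= needed)


# Byte addresses requested by eight threads (illustrative values only).
warp = [0, 4, 8, 130, 260, 264, 1000, 1004]

baseline = coalesced_accesses(warp)
randomized = randomized_accesses(warp, num_groups=2, rng=random.Random())
bucketed = bucketed_accesses(warp, buckets=(1, 8, 32))
print(baseline, randomized, bucketed)
```

In this model the baseline count depends only on the addresses, which is what lets an attacker relate timing to data-dependent accesses. The randomized count changes from run to run for the same addresses, while the bucketed count takes only a few values regardless of the addresses.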
To improve GPU reliability, we address data memory errors in GPU caches and DRAM. Existing reliability mechanisms based on redundancy and checkpointing fail to scale with the increasing memory and computational demands on GPUs and quickly become impractical. To address this problem, we study a wide range of applications and find that a very small fraction of the data memory is the most vulnerable to errors. This small fraction of the data is not only highly accessed but also highly shared across GPU threads. Consequently, we propose and develop two low-overhead reliability schemes for this most vulnerable data: one that only detects errors and one that both detects and corrects them. Our future work will focus on improving the reliability of machine learning applications.
Bio:
Gurunath Kadam is a PhD candidate in the Department of Computer Science at William & Mary. His PhD advisor is Prof. Adwait Jog. His research focuses on the security and reliability of emerging computing systems and general-purpose accelerators, such as GPUs. His PhD research was published at HPCA 2018 and HPCA 2020, and a new paper has been accepted to appear at DSN 2021. He received his Bachelor of Engineering in Electrical Engineering from the University of Mumbai, India, in 2006 and his Master of Science in Information and Communication Engineering from the Technical University of Darmstadt, Germany, in 2012. He worked as a research intern at Intel Labs, OR, in the fall of 2018.