David Nader, Computer Science - Dissertation Defense
Location:
Integrated Science Center (ISC), Room 3280
540 Landrum Dr, Williamsburg, VA 23185
Abstract:
This final defense addresses achieving causal interpretability in Deep Learning for Software Engineering (DL4SE). Although Neural Code Models (NCMs) demonstrate promising performance in automating software engineering tasks, their lack of transparency about the causal relationships between inputs and outputs hinders a complete understanding of their capabilities and limitations. Traditional associational interpretability, which focuses on identifying correlations, is insufficient for tasks that require interventions and an understanding of the impact of changes.

To overcome this limitation, this presentation introduces doCode, a novel post hoc interpretability method designed specifically for NCMs. doCode leverages causal inference to provide programming language-oriented explanations of model predictions. It comprises a formal four-step pipeline: modeling a causal problem using Structural Causal Models (SCMs), identifying the causal estimand, estimating causal effects using metrics such as the Average Treatment Effect (ATE), and refuting the effect estimates. The theoretical underpinnings of doCode are extensible, and a concrete instantiation is provided that mitigates the impact of spurious correlations by grounding explanations in properties of programming languages. A comprehensive case study on deep code generation, spanning various interpretability scenarios and deep learning architectures, demonstrates the practical benefits of doCode, revealing insights into the sensitivity of NCMs to changes in code syntax and their ability to learn certain programming language concepts with less confounding bias.

Furthermore, this presentation explores the role of associational interpretability as a foundation, examining the causal nature of software information and using information theory with tools such as COMET (a Hierarchical Bayesian Software Retrieval Model) and TraceXplainer to understand software traceability. The defense also emphasizes the importance of identifying code confounders for a more rigorous evaluation of DL4SE models, introducing the Galeras benchmark for causal evaluation in code intelligence. Finally, this presentation offers guidelines for applying causal interpretability to Neural Code Models, contributing a formal framework and practical insights towards building more reliable and trustworthy AI in Software Engineering.
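The four-step pipeline described above (model, identify, estimate, refute) mirrors the standard causal-inference workflow implemented in libraries such as DoWhy. The sketch below illustrates that workflow on synthetic data only; the variable names (a binary syntax intervention, a prediction-quality outcome, and a sequence-length confounder) are hypothetical stand-ins and are not part of doCode itself.

```python
# Minimal sketch of a model -> identify -> estimate -> refute pipeline
# using the DoWhy library. The data and variable names are hypothetical
# illustrations, not the doCode implementation.
import numpy as np
import pandas as pd
from dowhy import CausalModel

rng = np.random.default_rng(0)
n = 1000
seq_len = rng.normal(size=n)                                # confounder
treatment = ((seq_len + rng.normal(size=n)) > 0).astype(int)  # binary intervention
outcome = 2.0 * treatment + seq_len + rng.normal(size=n)    # prediction-quality proxy
df = pd.DataFrame({"treatment": treatment, "outcome": outcome, "seq_len": seq_len})

# 1. Model the causal problem as a graph (a Structural Causal Model).
model = CausalModel(
    data=df,
    treatment="treatment",
    outcome="outcome",
    common_causes=["seq_len"],
)

# 2. Identify the causal estimand (e.g., via back-door adjustment).
estimand = model.identify_effect(proceed_when_unidentifiable=True)

# 3. Estimate the effect, here the Average Treatment Effect (ATE).
estimate = model.estimate_effect(estimand, method_name="backdoor.linear_regression")
print("Estimated ATE:", estimate.value)

# 4. Refute the estimate with a placebo robustness check.
refutation = model.refute_estimate(
    estimand, estimate, method_name="placebo_treatment_refuter"
)
print(refutation)
```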
Bio: David N. Palacio is a Ph.D. candidate in Computer Science at William & Mary, where he is a member of the SEMERU Research Group supervised by Dr. Denys Poshyvanyk. He received his M.Sc. in Computer Engineering from Universidad Nacional de Colombia (UNAL) in 2017. His research concentrates on interpretable methods for deep learning code generators, specifically on using causal inference to explain deep software models. His interests lie in complexity science, neuroevolution, causal inference, and interpretable machine learning for the study and automation of software engineering processes. More information is available at [BIO].
Sponsored by: Computer Science