W&M Featured Events
This calendar presented by
William & Mary
[PAST EVENT] Martin White, Computer Science - Ph.D. Candidate
March 18, 2016
9am - 10:30am
Abstract
Bridging the abstraction gap between artifacts and concepts is the essence of software engineering (SE) research problems. SE researchers regularly use machine learning to bridge this gap, but there are three fundamental issues with traditional applications of machine learning in SE research.
Traditional applications rely too heavily on labeled data, they rely too heavily on human intuition, and they cannot learn expressive yet efficient internal representations. Ultimately, SE research needs approaches that can automatically learn representations of massive, heterogeneous, in situ datasets, apply the learned features to a particular task, and possibly transfer knowledge from task to task.
Improvements in both computational power and the amount of memory in modern computer architectures have enabled new approaches to canonical machine learning tasks. Specifically, these architectural advances have enabled machines that are capable of learning deep, compositional representations of massive data depots. The rise of deep learning has ushered in tremendous advances in several fields, and, given the complexity of software repositories, we presume deep learning has the potential to usher in new analytical frameworks and methodologies for SE research and its practical applications.
The proposal is to examine and enable deep learning algorithms in different SE contexts. Our prior work demonstrated that deep learners significantly outperformed state-of-the-practice software language models at code suggestion on a Java corpus. Further, our deep learners automatically induced semantic representations for lexical elements. Our current work uses these semantic representations to transmute in situ source code into structures for detecting similar code fragments at different levels of granularity without declaring features for how the source code is to be represented. Indeed, our work aims to move SE research from the art of feature engineering to the science of automated discovery.
Bio
Martin White is a Ph.D. candidate in computer science. His research concerns SE and machine learning. Before joining William & Mary, Martin earned a B.S. in Mathematics with a Minor in Physics, an M.S. in Computational and Applied Mathematics, and an M.E. in Modeling and Simulation.
Martin is also a Data Scientist with Booz Allen Hamilton in the firm's Cloud Analytics and Data Science Functional Community, where he supports a number of Department of Defense clients.