W&M Featured Events
[PAST EVENT] Nathan Cooper, Computer Science - Dissertation Proposal
Abstract:
Software has eaten the world: many of the necessities and quality-of-life services people rely on require software. Therefore, tools that improve the software development experience can significantly affect both the speed at which software is written, through automation, and the quality of software, by automatically detecting and flagging potential errors. Deep learning's success over the past decade suggests a method for creating what we have coined Intelligent Software Tools: tools that leverage deep learning to automate or improve aspects of a software developer's workflow.
This dissertation focuses on the creation of Intelligent Software Tools that help with many aspects of software development. To guide the construction of these tools, we performed a systematic literature review to understand the current landscape of research applying deep learning techniques to software engineering tasks and to identify gaps. This involved manually processing 128 papers to extract information such as the software engineering task addressed and the deep learning model used. From this literature review, we found that source-code-related tasks, such as code generation or program synthesis, were the most popular. Therefore, in developing intelligent software developer tooling, we chose to explore other software engineering tasks and artifacts. Specifically, we developed a tool for automatically detecting duplicate mobile bug reports from user-submitted videos. This is done by first training a large deep learning model, the popular Convolutional Neural Network (CNN), to learn important features from a large collection of mobile screenshots. This model is then used to extract features from the frames of a user-submitted video-based bug report. The features of a newly submitted video are then compared with those of a corpus of existing videos from previous bug reports to determine their similarity and produce a ranked list of duplicate candidates for a developer to review. Next, we explored the task of semantic code search, as it has the potential to help developers in many areas, such as feature location or code duplication. However, many previous approaches that perform code search at the method level treat the code in isolation. This is unrealistic, as the majority of code that developers interact with is part of a larger software system.
Therefore, we created Athena, a software tool that leverages knowledge of a software system, through its call graph, when constructing a high-level representation of the methods inside the system. This allows information about the role of each method to be taken into consideration when the representation is used to perform code search via cosine similarity. We used a combination of the Transformer architecture, a state-of-the-art model in natural language processing (NLP), and the Node2Vec architecture, a popular model in graph learning tasks. Lastly, we explored the task of code completion, which has seen heavy interest from industry and academia. Specifically, we explored techniques to improve the efficiency of training these models, since training deep learning models can incur a heavy cost due to the necessity of large models and datasets. In this work, we explored two deep learning architectures, T5 and ALiBi, both of which have shown an interesting ability to be trained on short English natural language sequences yet perform well on much longer sequences during inference. We sought to answer whether these approaches could be applied to code, as this could significantly decrease the cost of training these models.
Through numerous empirical evaluations, which included user studies and specially designed benchmarks, we show the usefulness, practicality, and performance of these works.
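As a rough illustration of the similarity-based ranking idea mentioned in the abstract (comparing a query representation against a corpus and returning a ranked list), here is a minimal Python sketch. It is not the implementation of any tool described here: the function names `cosine_similarity` and `rank_candidates`, the toy vectors, and the item names are all hypothetical, and a real system would obtain its vectors from a trained model rather than by hand.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(query, corpus):
    """Return (name, score) pairs from `corpus`, most similar to `query` first.

    `corpus` maps a hypothetical item name (e.g. a bug report or method id)
    to its feature vector.
    """
    scored = [(name, cosine_similarity(query, vec)) for name, vec in corpus.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy, hand-written vectors standing in for learned representations (assumption).
toy_corpus = {
    "item_a": [1.0, 0.0],
    "item_b": [0.0, 1.0],
    "item_c": [1.0, 1.0],
}
ranking = rank_candidates([1.0, 0.0], toy_corpus)
print([name for name, _ in ranking])
```

In this sketch the top-ranked items are the ones a developer would review first, whether the items are candidate duplicate bug reports or methods returned by a code search.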
Bio:
Nathan is a nerd and a Ph.D. candidate under the supervision of Dr. Denys Poshyvanyk at William & Mary. His research area is at the intersection of Software Engineering and Deep Learning, specifically the creation of intelligent tools for helping software developers. He has published in the top peer-reviewed Software Engineering venues ICSE and MSR. He has also received the ACM SIGSOFT Distinguished Paper Award at ICSE'20. Additionally, he has helped mentor undergraduates and received the S. Laurie Sanderson Award for Excellence in Undergraduate Mentoring. Previously, he received a B.S. degree in Software Engineering from the University of West Florida in 2018. More information is available at https://nathancooper.io/#/