[PAST EVENT] Colloquium: Bringing BIGDATA to the Masses: The Boa Experience (new date!)

March 4, 2014
10am - 11am
Location
McGlothlin-Street Hall, Room 020
251 Jamestown Rd
Williamsburg, VA 23185
Robert Dyer, Iowa State University

Abstract

Impressively large datasets are becoming common for modern researchers across many fields, including biology, physics, genomics, astronomy, meteorology, and software engineering, to name a few. Analyzing these large datasets requires not only large computational resources but also expert knowledge of big data techniques, libraries, and tools. Even for researchers who have such expertise, analyzing these datasets is challenging because of the substantial investment of time and the effort of building research infrastructure. These challenges place such analysis capabilities out of reach for many researchers.

In this talk I describe these challenges in detail for the field of mining software repositories using a specific dataset created from ultra-large-scale software repositories, such as SourceForge and GitHub. These repositories contain a massive amount of source code and related data. Mining these repositories provides researchers the ability to examine open-source development activities, leading to the development and testing of important hypotheses across many diverse areas, e.g. software engineering, programming languages, security, and legal and social issues. However, these mining activities also pose the aforementioned challenges.

I then describe a language and infrastructure called Boa, which aims to make mining these ultra-large-scale software repositories easy and accessible, even to non-experts. Users write their analyses in a high-level, declarative language that abstracts away many of the underlying details of obtaining, transforming, storing, and efficiently querying such a large dataset. The supporting infrastructure then automatically parallelizes the analysis and executes it on a distributed cluster, allowing users to efficiently query millions of source files.

Finally, I describe future research directions that take what we learned from Boa and generalize the approach to other scientific domains.


Bio

Robert Dyer is a postdoctoral researcher in the Laboratory for Software Design at Iowa State University. His research interests are at the intersection of software engineering and programming language design. He is the lead researcher on the Boa project, which aims to allow even non-experts to easily and efficiently mine ultra-large-scale software repositories. He received his Ph.D. from the Department of Computer Science at Iowa State University in December 2013, where he was the 2013 Tom Miller Fellow.

Dr. Dyer has published in leading software engineering venues and journals such as ICSE and TOSEM. He has served on the PC for the OOPSLA'13 artifact evaluation, FOAL'14, and Modularity'14 SRC and has been an external reviewer for conferences and journals such as OOPSLA, TOPLAS, GPCE, TAOSD, and AOSD. He has received several distinguished awards, including second place at the SPLASH 2013 student research competition, the Dr. Robert Stewart Early Research Award in 2009, and a 2007 CRA Outstanding Undergraduate Award Honorable Mention. More information can be found on his personal website: http://www.cs.iastate.edu/~rdyer/