W&M Featured Events
[PAST EVENT] Christopher Vendome, Computer Science - Oral Exam
Location
0248
Abstract:
Software licensing determines, from a legal perspective, how open source systems are reused, distributed, and modified. While licensing facilitates rapid development, the legal language of licenses can make them difficult for developers to understand. Because of misunderstandings, systems can incorporate licensed code in a way that violates the terms of the license. Such incompatibilities between licenses can make it impossible to reuse a particular library without either re-licensing the system or redesigning its architecture. Prior efforts have predominantly focused on license identification or on understanding the underlying phenomena, without reasoning about compatibility on a broad scale.
The work in this dissertation first investigates the rationale of developers and identifies the areas in which developers struggle with licensing. We begin by investigating the diffusion of licenses and the prevalence of license changes in a large-scale empirical study of 16,221 Java systems. We observed a clear lack of traceability and a lack of standardized licensing, which led to difficulties and confusion for developers trying to reuse source code. We further investigated these difficulties by surveying the developers of the systems with license changes to understand why they first adopted a license and then changed it. Additionally, we identified key areas in which developers struggled and needed support.
While developers need support to identify license incompatibilities and to understand both the causes and the implications of those incompatibilities, we observed that state-of-the-art license identification tools did not identify license exceptions. Since these exceptions directly modify the license terms (either the permissions granted by the license or the restrictions imposed by it), we proposed an approach that complements current license identification techniques by classifying license exceptions. The approach relies on supervised machine learners to classify the licensing text, identifying either the particular license exception or the absence of one.
Subsequently, we present our proposed research plan for ensuring the license compliance of a system. Our research incorporates techniques from information retrieval, code search, mining software repositories, and previous work on licensing. Our work focuses on the development of a license compliance engine. This engine will not only identify license incompatibilities but also recommend strategies that developers can use to fix the incompatible components and ensure license compliance. Our research thus aims to address both the analysis of licenses across dependencies and the creation of license traceability for bytecode (i.e., provenance). The latter component will allow us to extend our compliance analysis to binaries.
Bio:
Christopher Vendome is a Ph.D. student at William & Mary. He is a member of the SEMERU Research Group and is advised by Dr. Denys Poshyvanyk. He received a B.S. in Computer Science from Emory University in 2012 and an M.S. in Computer Science from the College of William & Mary in 2014. His main research areas are software maintenance and evolution, mining software repositories, software provenance, and software licensing.