A&S Graduate Studies
[PAST EVENT] Christopher Vendome, Computer Science - Dissertation Defense
Location
Integrated Science Center (ISC), Room 1280
540 Landrum Dr
Williamsburg, VA 23185
Abstract:
Software licensing determines, from a legal perspective, how open source systems may be reused, distributed, and modified. While licensing facilitates rapid development, the legal language of licenses can make them difficult for developers to understand. Because of misunderstandings, systems can incorporate licensed code in a way that violates the terms of the license. Such incompatibilities between licenses can make it impossible to reuse a particular library without either re-licensing the system or redesigning its architecture. Prior efforts have predominantly focused on license identification or on understanding the underlying phenomena, without reasoning about compatibility on a broad scale.
The work in this dissertation begins by investigating the rationale of developers and identifying the areas in which developers struggle with licensing. First, we investigate the diffusion of licenses and the prevalence of license changes in a large-scale empirical study of 16,221 Java systems. We observed a clear lack of traceability and a lack of standardized licensing, which led to difficulties and confusion for developers trying to reuse source code. We further investigated these difficulties by surveying the developers of the systems with license changes to understand why they first adopted a license and then changed licenses. Additionally, we analyzed issue trackers and legal mailing lists to extract licensing bugs. From these works, we identified key areas in which developers struggled and needed support.
While developers need support to identify license incompatibilities and understand both the cause and implications of the incompatibilities, we observed that state-of-the-art license identification tools did not identify license exceptions. Since these exceptions directly modify the license terms (either the permissions granted by the license or the restrictions imposed by the license), we proposed an approach to complement current license identification techniques in order to classify license exceptions. The approach relies on supervised machine learners to classify the licensing text to identify the particular license exceptions or the lack of a license exception.
Subsequently, we built an infrastructure to assist developers with evaluating the license compliance of their system. The infrastructure evaluates compliance across the dependency tree of a system to ensure it is compliant with all licenses. When an incompatibility is present, it notes the specific library or libraries and the conflicting license(s) so that the developers can remove these compliance issues, which would otherwise prevent distribution of their software, from their system. We conduct a study on 121,094 open source projects spanning 6 programming languages, and we demonstrate that the infrastructure is able to identify license incompatibilities between these projects and their dependencies.
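As a rough illustration of what checking compliance across a dependency tree can look like, the following Python sketch walks a project's dependencies and reports each library whose license is not accepted for the project's own license. It is not the infrastructure described in the dissertation. The Package class, the find_incompatibilities function, the package names, and the tiny COMPATIBLE table are all hypothetical and chosen only for illustration; the table is not legal guidance.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical, deliberately tiny table of (dependency license, project license)
# pairs treated as acceptable. Illustrative assumption only, not legal guidance.
COMPATIBLE = {
    ("MIT", "MIT"),
    ("MIT", "Apache-2.0"),
    ("MIT", "GPL-3.0-only"),
    ("Apache-2.0", "Apache-2.0"),
    ("Apache-2.0", "GPL-3.0-only"),
    ("GPL-3.0-only", "GPL-3.0-only"),
}


@dataclass
class Package:
    """A project or library with a declared license and direct dependencies."""
    name: str
    license: str
    dependencies: List["Package"] = field(default_factory=list)


def find_incompatibilities(project: Package) -> List[Tuple[str, str]]:
    """Walk the whole dependency tree (direct and transitive dependencies).

    Returns one (dependency path, conflicting license) pair for every
    dependency whose license is not listed as compatible with the
    project's license.
    """
    conflicts = []
    seen = set()  # visit each package once, even if it is reachable twice
    stack = [(dep, [project.name, dep.name]) for dep in project.dependencies]
    while stack:
        pkg, path = stack.pop()
        if pkg.name in seen:
            continue
        seen.add(pkg.name)
        if (pkg.license, project.license) not in COMPATIBLE:
            conflicts.append((" -> ".join(path), pkg.license))
        for dep in pkg.dependencies:
            stack.append((dep, path + [dep.name]))
    return conflicts


# Hypothetical example: an Apache-2.0 application that pulls in a
# GPL-3.0-only library through one of its direct dependencies.
parser = Package("json-parser", "MIT")
crypto = Package("crypto-lib", "GPL-3.0-only")
http = Package("http-client", "Apache-2.0", [crypto])
app = Package("my-app", "Apache-2.0", [parser, http])

for path, conflicting_license in find_incompatibilities(app):
    print(f"{path}: {conflicting_license} conflicts with {app.license}")
```

Reporting the full path to each conflicting library, rather than only its name, reflects the point made above: developers need to know which specific library introduces the problem in order to remove it.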
Biography:
Christopher Vendome is a Ph.D. student at William & Mary. He is a member of the SEMERU Research Group and is advised by Dr. Denys Poshyvanyk. He received a B.S. in Computer Science from Emory University in 2012 and he received his M.S. in Computer Science from William & Mary in 2014. His main research areas are software maintenance and evolution, mining software repositories, software provenance, and software licensing.