Mathematics Colloquium and EXTREEMS-QED Lecture: Lucas Mentch (Cornell University)

January 28, 2015
2pm - 3pm
Small Hall, Room 235
300 Ukrop Way
Williamsburg, VA 23185
Abstract: Machine learning algorithms are typically seen as prediction-only tools. In this view, the interpretability and intuition provided by a traditional statistical modeling approach are sacrificed in order to achieve superior predictions. In this talk, we argue that this black-box perspective need not always apply. We first contrast the traditional statistical and machine learning approaches to data analysis. We then demonstrate that predictions from tree-based ensemble learners such as bagged trees and random forests, when appropriately structured, can be viewed as extended versions of U-statistics. Within this framework, we derive central limit theorems (CLTs) for predictions, together with a consistent variance estimate that can be computed at no additional cost. This allows formal statistical inference to be carried out in practice. In particular, we produce confidence intervals to accompany predictions and define formal hypothesis tests for both additivity and feature significance. When a large test set is required, we extend our testing procedures and use random projections to accommodate the potential p >> n setting. These tools are illustrated on data provided by Cornell University's Lab of Ornithology.
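The abstract's central idea is to treat an ensemble's prediction at a point as an incomplete U-statistic whose variance can be estimated from the ensemble's own base learners. That idea can be sketched in a few lines. This is a minimal Python sketch under stated assumptions, not the speaker's implementation:

- A 1-nearest-neighbour predictor stands in for a decision tree. Any base learner that ignores the order of its training points fits the U-statistic framing.
- The names `subsampled_ensemble_ci`, `one_nn_predict`, `k`, `n_z` and `n_mc` are hypothetical.
- The variance uses the standard incomplete U-statistic approximation k^2 * zeta_1 / n + zeta_k / m, with a normal-quantile interval.
- The subsample size k is assumed small relative to n.

```python
import math
import random
import statistics


def one_nn_predict(sample, x0):
    """Stand-in base learner (assumption): 1-nearest-neighbour prediction at x0.
    It is symmetric in the order of its training points, which is what the
    U-statistic view requires; a regression tree could be swapped in."""
    nearest = min(sample, key=lambda pt: abs(pt[0] - x0))
    return nearest[1]


def subsampled_ensemble_ci(data, x0, k, n_z, n_mc, z=1.96, seed=0):
    """Hypothetical helper: subsampled-ensemble prediction at x0 plus a
    normal-approximation confidence interval from an internal variance estimate.

    data: list of (x, y) pairs; k: subsample size; n_z: number of anchor points;
    n_mc: subsamples drawn per anchor. Requires n_z >= 2 and k <= len(data).
    """
    rng = random.Random(seed)
    n = len(data)
    anchor_means, all_preds = [], []
    for anchor in rng.sample(range(n), n_z):
        others = [i for i in range(n) if i != anchor]
        preds = []
        for _ in range(n_mc):
            # Every subsample in this group shares the anchor observation.
            idx = [anchor] + rng.sample(others, k - 1)
            preds.append(one_nn_predict([data[i] for i in idx], x0))
        anchor_means.append(statistics.fmean(preds))
        all_preds.extend(preds)

    m = len(all_preds)                           # base learners in the ensemble
    estimate = statistics.fmean(all_preds)       # the ensemble prediction
    zeta_1 = statistics.variance(anchor_means)   # estimates zeta_1
    zeta_k = statistics.variance(all_preds)      # estimates zeta_k
    # Incomplete U-statistic variance: k^2 * zeta_1 / n + zeta_k / m.
    # zeta_1 is not corrected for Monte Carlo noise, so it tends to be overstated.
    var_hat = (k ** 2 / n) * zeta_1 + zeta_k / m
    half_width = z * math.sqrt(var_hat)
    return estimate, (estimate - half_width, estimate + half_width)


# Usage on synthetic data (not the ornithology data from the talk).
rng = random.Random(1)
xs = [rng.uniform(0, 6) for _ in range(500)]
data = [(x, math.sin(x) + rng.gauss(0, 0.3)) for x in xs]
est, (lo, hi) = subsampled_ensemble_ci(data, x0=1.5, k=30, n_z=25, n_mc=200)
print(f"prediction {est:.3f}, 95% CI ({lo:.3f}, {hi:.3f})")
```

In this sketch, each group of subsamples shares an anchor observation. That grouping lets the same base-learner predictions serve both as the ensemble prediction and as the input to the variance estimate, so no extra models are fitted. Because the zeta_1 estimate is inflated by Monte Carlo noise, the resulting interval tends to be conservative unless n_mc is large.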

Sarah Day