A&S Graduate Studies
[PAST EVENT] Jeremy Myers, Computer Science - Oral preliminary exam
Abstract:
Low-rank approximations play an important role in data science analysis and applications. When the model is linear and the data is represented as a matrix, the optimal rank-r approximation is given by the r dominant factors of the singular value decomposition (SVD). When the model is multilinear and the data is represented as a multi-way array, called a tensor, the optimal rank-r approximation may not even exist. Nonetheless, the canonical polyadic decomposition (CP) provides a useful low-rank tensor approximation for interpretability and analysis in a way similar to the matrix SVD but across multiple axes simultaneously. In this dissertation, we focus on two important problems involving low-rank models for data science applications.
The first problem involves maintaining a summary matrix or sketch from a data stream. We follow an established framework for low-rank approximation of streaming data, called incremental SVD (IncSVD), where computing the SVD is the main computational bottleneck. Prior work used SVD methods for dense matrices, but the cost of these methods is computationally intractable given the volume of data streamed in the problems we consider. Using error bounds from prior work, we first present an analysis to support our argument that one must use iterative Krylov subspace methods to address the streaming-sketching problem with minimal error. We investigate the role of randomization techniques for estimating error by sampling, which is necessary since the data is assumed to be never available in its entirety. We consider an application from kernel learning that requires a low-rank SVD approximation of a large, dense matrix.
The second problem involves computing a low-rank CP decomposition model when the input has entries that are non-negative integers, or counts, that are assumed to follow a Poisson distribution. We develop a hybrid method that switches between deterministic and randomized algorithms to leverage desirable computational properties with the overall goal of achieving optimal accuracy-performance trade-off. This investigation begins with a systematic study of the algorithms at hand and concludes with a method for nonlinear, nonconvex global optimization.
Bio:
Jeremy Myers is a Ph.D Candidate under the supervision of Dr. Andreas Stathopoulos in the Computer Science Department at William & Mary. His research focuses on algorithms at the intersection of numerical linear algebra, machine learning, and data science. Additionally, Jeremy has conducted doctoral research under the supervision of Dr. Daniel Dunlavy at Sandia National Laboratories in Albuquerque, NM, and Livermore, CA, since 2019. His research has appeared in IEEE HPEC20 & IEEE HPEC21. He has co- organized minisymposia at SIAM LA21 and SIAM PP22. He received a B.S. and a M.S. in Mathematical Sciences from Virginia Commonwealth University in Richmond, VA, in 2014 and 2017, respectively. He received a B.A. in International Affairs from James Madison University in Harrisonburg, VA, in 2009.