[PAST EVENT] Jeremy Myers, Computer Science - Dissertation Defense
Abstract:
Low-rank approximations play an important role in data analysis and data science applications. When the model is linear and the data is represented as a matrix, the optimal rank-r approximation is given by the r dominant factors of the singular value decomposition (SVD). When the model is multilinear and the data is represented as a multi-way array called a tensor, the optimal rank-r approximation may not even exist. Nonetheless, the canonical polyadic decomposition (CP) provides a useful low-rank tensor approximation for interpretability and analysis in a way similar to the matrix SVD but across multiple axes simultaneously. In this dissertation, we focus on two important problems involving low-rank matrix and tensor models for data science applications.
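A minimal illustration of the matrix case described above (illustrative only, not taken from the dissertation): the following NumPy sketch forms the optimal rank-r approximation from the r dominant SVD factors of an arbitrary random matrix and checks the Eckart-Young error formula.

```python
# Best rank-r approximation of a matrix from its r dominant SVD factors.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 80))            # arbitrary data matrix
r = 5                                         # target rank

U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_r = U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :]   # rank-r truncation

# Frobenius-norm error equals the discarded singular values (Eckart-Young).
err = np.linalg.norm(A - A_r, "fro")
print(err, np.sqrt(np.sum(s[r:] ** 2)))
```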
The first thrust of this dissertation involves maintaining a summary, or sketch, matrix from a data stream. We study the trade-offs between incremental SVD, an established framework for low-rank approximation of streaming data in which a low-rank approximation is updated for every innovation (new increment) of data, and randomized sketching, in which the data stream is summarized and a low-rank approximation is reconstructed later. We analyze the role of iterative methods within incremental SVD and provide numerical evidence to support our analysis. We evaluate the relative accuracy and computational costs of incremental SVD and randomized sketching on datasets that are challenging even for state-of-the-art batch solvers. We develop a sampling-based convergence criterion to terminate computation early with negligible loss of fidelity.
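A minimal sketch of the second approach, assuming the two-sided randomized sketching recipe of Tropp et al. rather than the dissertation's specific method: two random test matrices are applied to the stream as rows arrive, and a low-rank approximation is reconstructed from the sketches only after the stream ends.

```python
# Streaming two-sided randomized sketch with deferred low-rank reconstruction.
import numpy as np

rng = np.random.default_rng(1)
m, n, r = 2000, 300, 10
k, ell = 2 * r + 1, 4 * r + 3          # oversampled sketch sizes

# Ground-truth low-rank matrix, used here only to simulate the stream
# and to measure the reconstruction error at the end.
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))

Omega = rng.standard_normal((n, k))    # right test matrix
Psi = rng.standard_normal((ell, m))    # left test matrix
Y = np.zeros((m, k))                   # range sketch,    Y = A @ Omega
W = np.zeros((ell, n))                 # co-range sketch, W = Psi @ A

# Stream the rows of A; only the sketches Y and W are updated.
for i in range(m):
    a_i = A[i]
    Y[i] = a_i @ Omega
    W += np.outer(Psi[:, i], a_i)

# Reconstruct a low-rank approximation from the sketches after the stream.
Q, _ = np.linalg.qr(Y)                          # orthonormal range basis
X = np.linalg.lstsq(Psi @ Q, W, rcond=None)[0]  # solve (Psi @ Q) X = W
A_hat = Q @ X                                   # A approx Q (Psi Q)^+ (Psi A)
print(np.linalg.norm(A - A_hat) / np.linalg.norm(A))
```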
The second thrust involves computing a low-rank CP decomposition model when the input entries are counts, i.e., non-negative integers assumed to follow a Poisson distribution. Our first contribution is a systematic study of state-of-the-art algorithms. Our second contribution is a hybrid method that switches between a deterministic and a randomized algorithm to leverage the desirable computational properties of both, with the overall goal of achieving an optimal accuracy-performance trade-off. Our third contribution is a heuristic that detects convergence toward undesirable solutions and applies a restarting technique to improve the likelihood that the method converges to the optimal solution.
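A minimal illustration of the statistical model behind Poisson CP, assuming a small dense 3-way count tensor and NumPy (the dissertation's algorithms and data are not reproduced here): nonnegative CP factor matrices define a model tensor, and fitting the decomposition amounts to minimizing the Poisson negative log-likelihood evaluated below.

```python
# Poisson negative log-likelihood of a rank-r CP model for a count tensor.
import numpy as np

rng = np.random.default_rng(2)
I, J, K, r = 20, 30, 40, 4

# Nonnegative CP factor matrices defining the model tensor M.
A = rng.random((I, r))
B = rng.random((J, r))
C = rng.random((K, r))
M = np.einsum("ir,jr,kr->ijk", A, B, C)

# Synthetic count data drawn entrywise from the Poisson model.
X = rng.poisson(M)

# Poisson negative log-likelihood (up to a constant): sum of m - x*log(m).
eps = 1e-12                                  # guard against log(0)
loss = np.sum(M - X * np.log(M + eps))
print(loss)
```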
Bio:
Jeremy Myers is a Ph.D. candidate under the supervision of Dr. Andreas Stathopoulos in the Computer Science Department at William & Mary and Dr. Daniel Dunlavy at Sandia National Laboratories. He received a B.S. and an M.S. in Mathematical Sciences from Virginia Commonwealth University in Richmond, VA, in 2014 and 2017, respectively. He received a B.A. in International Affairs from James Madison University in Harrisonburg, VA, in 2009. His research focuses on algorithms at the intersection of numerical linear algebra, machine learning, and data science. His research has appeared at IEEE HPEC 2020, IEEE HPEC 2021, SIAM OP23, and ICIAM 2023. He has co-organized minisymposia at SIAM LA21 and SIAM PP22.
Sponsored by: Office of Graduate Studies