[PAST EVENT] Qiong Wu, Computer Science - Dissertation Defense
This thesis develops several forecasting models for simultaneously predicting the prices of d assets traded in financial markets, one of the most fundamental problems in the emerging area of "FinTech". The models are optimized to address three critical challenges:

C1. High-dimensional interactions between assets. Assets interact with one another (e.g., Amazon's disclosure of a revenue change in its cloud services may signal that revenues of other cloud providers could change as well). The number of possible interactions is quadratic in d and is often much larger than the number of observations.

C2. Non-linearity of the hypothesis class. Linear models are usually insufficient to characterize the relationship between the labels (responses) and the available information (features).

C3. Data scarcity for each asset. The amount of data associated with an individual asset can be small. For example, a typical daily forecasting model based on technical factors uses three years (approximately 750 trading days) of data; since we collect one data point per day, only 750 observations are available for each asset.
We develop the following works to address these challenges.
Adaptive reduced rank regression (addressing C1). We examine a linear regression model y = Mx + ε that aims to directly capture the interactions between all features from all assets and all the responses, which requires estimating d × Θ(d) entries of M from only O(d) observations. In this setting, existing low-rank regularization techniques such as reduced rank regression or nuclear-norm-based regularization fail to work. We propose Adaptive Reduced Rank Regression (Adaptive-RRR), a new provable algorithm for estimating M under a mild assumption on the spectrum of the covariance matrix of x.
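For intuition, the classical reduced rank regression baseline mentioned above can be sketched in a few lines: fit the unconstrained least-squares coefficient matrix, then project it so the fitted values lie in a rank-r subspace. This is the textbook estimator only, not the thesis's Adaptive-RRR (which adds an adaptive step to handle the covariance spectrum); the function name and interface are illustrative.

```python
import numpy as np

def reduced_rank_regression(X, Y, rank):
    """Classical reduced rank regression (illustrative baseline, not Adaptive-RRR).

    Fits Y ~ X @ M subject to rank(M) <= rank, via the standard two-step
    construction: OLS fit, then projection onto the top right singular
    vectors of the fitted values X @ M_ols.
    """
    # Step 1: unconstrained ordinary least-squares solution.
    M_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)
    # Step 2: SVD of the fitted values; keep the top-`rank` directions.
    _, _, Vt = np.linalg.svd(X @ M_ols, full_matrices=False)
    V_r = Vt[:rank].T
    # Project the OLS coefficients onto the rank-`rank` subspace.
    return M_ols @ V_r @ V_r.T
```

When the true coefficient matrix is low-rank and the sample size is moderate, this projection removes most of the estimation noise in the discarded directions; Adaptive-RRR is designed for the harder regime where the covariance of x is ill-conditioned and the sample size is only O(d).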
On embedding stocks (addressing C1 & C2). We next propose a semi-parametric model, the "additive influence model," that decomposes the inference problem into two orthogonal subroutines. One subroutine learns the high-dimensional interactions between entities, which we solve with the techniques developed for Adaptive-RRR. The other subroutine learns the non-linear signals, which we solve with practical algorithms such as deep learning and ensemble learning.
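The two-subroutine decomposition can be sketched as a two-stage fit: a low-rank linear model first absorbs the cross-entity interactions, and a separate nonlinear learner is then fit on the residuals. This sketch is only meant to show the division of labor; it uses classical reduced rank regression for stage 1 and a hypothetical ridge-on-squared-features learner as a stand-in for the deep/ensemble learners the thesis actually uses.

```python
import numpy as np

def fit_two_stage(X, Y, rank, lam=1e-2):
    """Illustrative two-stage fit mirroring the additive influence decomposition.

    Stage 1: low-rank linear fit for cross-entity interactions (classical RRR).
    Stage 2: a simple nonlinear learner (ridge on [X, X**2]) on the residuals;
    this stands in for the deep/ensemble models used in the thesis.
    Returns a prediction function combining both stages.
    """
    # Stage 1: low-rank interaction matrix via classical reduced rank regression.
    M_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)
    _, _, Vt = np.linalg.svd(X @ M_ols, full_matrices=False)
    V_r = Vt[:rank].T
    M = M_ols @ V_r @ V_r.T
    # Stage 2: fit the nonlinear residual with ridge regression on squared features.
    R = Y - X @ M
    Z = np.hstack([X, X**2])
    W = np.linalg.solve(Z.T @ Z + lam * np.eye(Z.shape[1]), Z.T @ R)

    def predict(X_new):
        Z_new = np.hstack([X_new, X_new**2])
        return X_new @ M + Z_new @ W

    return predict
```

The two stages are "orthogonal" in the sense that each can be swapped out independently: a stronger low-rank estimator (e.g., Adaptive-RRR) slots into stage 1, and any nonlinear learner slots into stage 2.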
Equity2Vec: Interactions beyond return correlations (addressing C2 & C3). We would like a specialized neural net model for each asset (e.g., a model g_i(·) for asset i), but there is insufficient data to properly train g_i using data from asset i alone (because of C3). Our idea is to shrink the g_i(·)'s toward one or more centroids to reduce model (sample) complexity. Specifically, we train a neural net g_i(x, W, W_i), where W is shared across all entities and W_i is entity-specific and learned through embedding, and set g_i(x) = g_i(x, W, W_i). When entities i and j are similar, W_i and W_j are close, and consequently g_i and g_j produce similar predictions.
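A minimal sketch of this weight-sharing idea: a single hidden layer whose weight matrix W is shared across all entities, with an entity-specific embedding vector shifting the hidden activations. The layer layout is an assumption for illustration (the thesis's architecture differs in detail); the point is only that nearby embeddings yield nearby predictors.

```python
import numpy as np

def entity_model(x, W_shared, w_entity):
    """Illustrative g_i(x, W, W_i): a one-hidden-layer net where W_shared is
    common to all entities and w_entity is the entity-specific embedding.
    The embedding shifts the shared hidden layer, so entities with close
    embeddings induce close functions (hypothetical layout, not the exact
    Equity2Vec architecture)."""
    h = np.tanh(x @ W_shared + w_entity)  # shared transform + entity-specific shift
    return h.sum(axis=-1)                 # scalar prediction per input row
```

Because tanh is 1-Lipschitz, a small perturbation of the embedding w_entity can only perturb the output by a proportionally small amount, which is exactly the "shrink toward centroids" effect: similar entities share statistical strength through W while differing only through their embeddings.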
The proposed algorithms/models are verified via extensive experiments based on real-world equity datasets. Our forecasting models can also be applied to a wide range of applications, such as identifying biomarkers, understanding risks associated with various diseases, image recognition, and link prediction.
Qiong Wu is a Ph.D. candidate in the Department of Computer Science at William & Mary, advised by Prof. Zhenming Liu. Her research focuses on developing robust machine learning algorithms under harsh conditions, such as an excessively large number of features and a notoriously low signal-to-noise ratio. Her Ph.D. research has been published in TIST 2021, NeurIPS 2020, ICAIF 2020, AAAI 2019, and ICSC 2019. Her current and past efforts include collaborations with AT&T on customer care analysis, Instacart on conceptual graph construction, and The Alan Turing Institute on high-dimensional regularization methods and forecasting models for financial instruments. Before joining William & Mary, she received her B.Eng. degree from the Dalian University of Technology in 2014 and an M.Sc. degree from the University of Hong Kong in 2016.