[PAST EVENT] Ji Xue, Computer Science - Ph.D. Final Defense
In large-scale distributed systems, such as multi-tier storage systems and cloud data centers, resource sharing among workloads brings multiple benefits but also introduces many performance challenges. The key to effective workload multiplexing is accurate workload prediction. This thesis focuses on capturing the salient characteristics of real-world workloads in order to develop workload prediction methods and to drive scheduling and resource allocation policies that achieve efficient and timely resource isolation among applications.
In a multi-tier storage system, high-priority user work is often multiplexed with low-priority background work. The challenge is to strike a balance between maintaining user performance and maximizing the amount of background work completed. In this thesis, we propose two resource isolation policies based on different workload prediction methods: one based on a Markovian model and the other based on neural networks. Both policies use workload prediction to find opportune times to schedule background work with minimal impact on user performance. Trace-driven simulations verify the efficiency of the two proposed policies. The Markovian model-based policy schedules background work in appropriate periods with only a small impact on user performance. The neural network-based policy adaptively schedules user and background work so that the performance requirements of both are met consistently.
This thesis also proposes PRACTISE, an accurate yet efficient neural network-based prediction method for data center usage series. Unlike traditional neural networks for time series prediction, PRACTISE selects the most informative features from past observations of the time series itself. Tests on a large set of usage series from production data centers demonstrate the accuracy (e.g., prediction error) and efficiency (e.g., time cost) of PRACTISE.
Accurate usage prediction also enables proactive resource management in highly virtualized cloud data centers. In this thesis, we analyze performance tickets in cloud data centers and propose an active sizing algorithm, named ATM, that predicts usage workloads and re-allocates capacity among workloads to avoid VM performance tickets. Moreover, by characterizing resource usage in cloud data centers, we discover a hidden relationship between the mean and the tail of resource usage. Driven by inexpensive prediction of usage tails, we also present TailGuard, which dynamically clones VMs among co-located boxes to efficiently reduce the performance violations of physical boxes in cloud data centers.
Ji Xue has been working on his Ph.D. in Computer Science since Fall 2012. He is working with Dr. Evgenia Smirni, and his research interests include performance evaluation and diagnosis, resource management in large-scale storage systems and cloud data centers, and data mining problems related to time series. Ji completed several internships during his Ph.D., including at Microsoft Research (summer 2014), IBM Research (fall 2015), and Google (summers 2015 and 2016). Before joining W&M, Ji received his bachelor's degree in Computer Science from Beihang University (BUAA) in 2012.