W&M Featured Events
[PAST EVENT] Data Enrichment for Data Science
Speaker: Fatemeh Nargesian, University of Toronto
Title: Data Enrichment for Data Science
Abstract:
Preparing data for advanced analytics is prohibitively time-consuming and
computationally expensive. In this talk, I will discuss my research on the
challenges of data preparation for data science. In particular, I will
talk about the data discovery problem. In data science, it is increasingly
the case that the main challenge is not integrating known data but
discovering the right data to solve a given data science problem. I will
discuss two paradigms of data discovery. In the first paradigm, the query
is a dataset and the data scientist is interested in interactively finding
datasets that can be integrated (e.g., unioned) with the query. I will
introduce a probabilistic framework for searching for top-k unionable
tables and aligning them with a query table and discuss the need for
distribution-aware techniques for data discovery. In the second paradigm,
search does not start with a query; instead, it is data-driven. I will
talk about the data lake organization problem, where the goal is to find a
directory structure -- data lake organization -- that allows a user to
most efficiently navigate data lakes. I will present a probabilistic
navigation model of how users interact with a directory structure and
introduce a scalable local search algorithm for optimizing data lake
organizations.
Bio:
Fatemeh Nargesian is a PhD candidate in the Data Curation Group of the
Department of Computer Science at the University of Toronto. Her primary
research interests are in the data management challenges of end-to-end
data science. A paper she co-authored on data discovery was accorded the
Best Demonstration Award at VLDB 2017. While at the University of Toronto,
Fatemeh was a joint research intern at IBM Research-NY. Prior to the
University of Toronto, she worked on clinical data management at the
Clinical Informatics Research Group at McGill University, and received
M.Sc. degrees in Computer Science from the University of Ottawa and in
Artificial Intelligence from Sharif University of Technology.
Contact
Pieter Peers