Arts & Sciences Events
[PAST EVENT] What's new in "The Big Data Theory"?
February 8, 2016
8am - 9am
Speaker: Grigory Yaroslavtsev, University of Pennsylvania
Time and place: Monday, Feb. 8th at 8AM in McGl 020
Title: What's new in "The Big Data Theory"?
Abstract:
In this talk I will cover some recent advances in theoretical foundations of big data analysis. I will focus on new topics in the analysis of distributed algorithms and communication complexity motivated by advances in systems such as MapReduce and services provided through cloud infrastructure. I will show how the number of interactive supersteps (or rounds) plays a crucial role in determining the overall performance and cost of performing a distributed computation.
I will illustrate this premise through multiple examples including:
-- The tradeoffs between the number of supersteps and the amount of communication required for checking consistency between two large distributed file systems.
-- Algorithms that optimize the number of supersteps for combinatorial problems on multi-dimensional vectors (single-linkage clustering, minimum spanning tree and bichromatic matching).
I will also demonstrate connections with topics such as adaptive data analysis.
Bio:
Grigory Yaroslavtsev (http://www.grigory.us) is a postdoctoral fellow at the Warren Center for Network and Data Sciences at the University of Pennsylvania. He is interested in developing efficient combinatorial algorithms for sparsification, summarization and testing properties of large data, including: approximation, parallel, streaming and online algorithms; learning theory and property testing; communication and information complexity; private data release. Grigory has been continuously supported through various fellowships since he started graduate school in 2010 and has been a research intern and visiting consultant for AT&T Labs, IBM Research, Microsoft Research and Google.