High Performance IO and Storage Engine for Graph Data Management and Analytics
Pradeep Kumar (George Washington University)
The arrival of big data has established a new paradigm for harnessing data. It has opened new research frontiers in data analytics and machine learning that address problems in diverse domains, such as cyber-security, medical science, bio-science, and social networks. However, harnessing data faces challenges on multiple fronts. Of these, developing systems that let data management and analytics use modern storage and compute devices is of special interest to me. Specifically, I have focused on the twin challenges of ingesting data that arrives at an ever-increasing rate and serving the timely but diverse data access needs of different classes of graph analytics. This work has identified several bottlenecks in the data management layer, the kernel IO stack, and the memory sub-system of the data stack. To address them, I have proposed new design conventions and abstractions that improve both the performance and the understanding of these sub-systems. In this talk, I will first present a unified data store for evolving graphs that enables diverse classes of real-time analytics at different granularities of data access, using two new abstractions: graph view and data visibility. This will be followed by a discussion of the first graph analytics system to perform high-performance batch processing on trillion-edge graphs using multiple solid-state drives (SSDs). Then, I will focus on a new kernel IO stack design for multi-SSD volumes that proposes a new convention of per-drive IO processing to serve the IO needs of data analytics efficiently. The new convention replaces the existing per-volume IO processing, which had limited or no parallelism among the member drives of the volume. Finally, I will conclude with the future outlook for analytics on graph data.
Pradeep Kumar is a PhD student at the George Washington University. His work focuses on storage and compute optimizations for big data and graph analytics applications targeting modern storage and compute devices. His research papers have frequently appeared at leading systems conferences, such as SC, USENIX FAST, and USENIX ATC. His team was a finalist at the inaugural IEEE/Amazon/DARPA Graph Challenge in 2017 for showing excellent performance and scalability in a multi-GPU triangle counting solution. Before joining the George Washington University, he spent over four years at NetApp India working on the design and implementation of Flash-based storage products, data mirroring and disaster recovery modules, and storage management tools. He received the “Huawei New Comer” award in 2008 for his outstanding work and initiative in the design and implementation of a compiler and run-time library for the TTCN and ASN programming languages. He completed his Bachelor of Technology degree at the Indian Institute of Technology, Dhanbad, India.