Research Project Title:
Optimizing Multi-GPU Table Joins
Abstract: Commercial GPU-accelerated databases like MapD and Kinetica are increasingly used to interactively analyze large-scale datasets. Since GPUs have thousands of cores, GPU-based databases have the potential to be much faster than CPU-based databases. However, large-scale analytic queries are typically bottlenecked by I/O and memory bandwidth. The goal of my project is to research the table join problem in the context of partitioned databases running on a GPU cluster, and to investigate algorithms that leverage recent advancements in hardware, like NVLink and NVMe. Although extensive previous work has been done to optimize partitioned joins on a cluster of CPUs, optimizing performance for GPU-based database joins is still an open problem.
I am excited about SuperUROP because it lets me apply what I have learned in previous classes, like 6.824 (Distributed Systems), 6.172 (Performance Engineering), and 6.828 (OS Engineering), to a project with real-world load and resource constraints. This project will help me further understand modern CPU and GPU architectures and how to program them efficiently.