The intersection of massively parallel processing (MPP) databases and general-purpose computing on graphics processing units (GPGPU) gives scientists and analysts substantial compute capability. This talk will showcase the marriage of well-established, open source MPP database infrastructure with cutting-edge data-level parallelism using GPGPU. Several examples will be shown in a hosted cluster environment to demonstrate how easy the approach is to implement. Pending disclosure authorization, some real-world use cases will also be discussed.
The goal is to demonstrate how database user-defined functions (UDFs) can be integrated with the single instruction, multiple data (SIMD) compute capabilities that modern GPU devices offer programmers, all using open source software. I will explain the motivation for doing this and briefly demonstrate the required infrastructure and effort. Applications of this technique span a wide range of industries and use cases, and I will highlight a few of them, pending disclosure authorization. I plan to keep the content accessible to an audience familiar with any or all of the following: MPP, SQL UDFs, Python, CUDA, or OpenCL. I will briefly outline why each matters, both to this technology showcase and more broadly. I am aiming for a roughly even split between content delivery and Q&A, since that format tends to be the most engaging for the audience (and for me).
Speakers: Kyle Dunn