Deep Learning on Massively Parallel Processing Databases

FOSDEM 2019

In this session we will discuss the use of massively parallel processing (MPP) databases for deep learning, drawing on experience running deep learning frameworks such as Keras and TensorFlow, with GPU acceleration, on free and open source software such as Greenplum Database and the Apache MADlib machine learning library. Topics will include architecture, common usage patterns, scalability results and promising opportunities for the future.

Deep neural networks are very effective at solving problems in domains such as computer vision, speech recognition and language translation. Once solely the purview of academia and Silicon Valley-type companies, deep learning is now making inroads in the enterprise thanks to new algorithms, better tools, and lower costs for computation, storage and networking.

But enterprise data typically lives in relational and document form in databases, so how can you use this data to build deep learning models? You could move it out to a separate execution engine, but copying huge amounts of data between systems is slow and costly. What about building deep learning models directly in the database, bringing the compute to where the data lives?

It’s possible, and I look forward to discussing this topic at FOSDEM’19!

Speakers: Frank McQuillan