According to Andrej Karpathy, there are four main factors holding back AI: Compute, Data, Algorithms, and Infrastructure. In this talk, we will show how we attack the Data and Infrastructure challenges for Deep Learning. Specifically, we will show how we integrated TensorFlow with the world's most scalable and human-friendly distribution of Hadoop, Hops (www.hops.io). Hops is a new European distribution of Hadoop with a distributed metadata architecture and 16X the performance of HDFS. Hops also includes a human-friendly UI, called Hopsworks, with support for the Apache Zeppelin Notebook. We will show how users can run TensorFlow programs in Apache Zeppelin on huge datasets in Hops Hadoop. Moreover, we will show how Hopsworks makes it easy to discover and download huge datasets through custom peer-to-peer sharing of datasets between Hopsworks clusters. Within minutes, a new user can install Hopsworks, discover important curated datasets, and download them to train deep neural networks using TensorFlow. Hops is the first Hadoop distribution to support TensorFlow.
Hopsworks itself is a self-service UI for Hops Hadoop built around the concepts of projects, users, and datasets. Users collaborate in projects that contain datasets. Data owners can give users access to process data, but not to download it, copy it outside of the project, or cross-link it with data outside the project. Hopsworks thus provides stronger access control guarantees than are available in Hadoop, enabling sensitive data to reside securely on shared Hadoop clusters. Since April 2016, Hopsworks has provided Hadoop/Spark/Flink/Kafka-as-a-service to researchers in Sweden from the SICS Swedish ICT data center at www.hops.site.
Hops and Hopsworks are both Apache v2 licensed projects and have been developed primarily at KTH Royal Institute of Technology and SICS Swedish ICT in Stockholm.
Speakers: Jim Dowling, Gautier Berthou