
H2O: An Open-Source Platform for Machine Learning and Big Data/Big Math


FOSDEM 2016

H2O is an open-source platform for machine learning and Big Data/Big Math.

H2O is clustering: from just your laptop to hundreds of nodes, you get a Single System Image, allowing easy aggregation of all the memory and all the cores, and a simple coding style that scales wide at in-memory speeds. H2O is easily 1000x faster than disk-based clustering solutions, and often 10x faster than best-of-breed alternative in-memory solutions.

H2O is Big Data: we ingest a wide variety of formats, in parallel and distributed across the cluster, and store the data column-compressed, often 2x to 4x better than gzip on disk.

H2O is Big Math: we do scale-out math at memory-bandwidth speeds (on compressed data!), making terabyte-scale munging an interactive experience.

H2O is Machine Learning: on this Big Data, Big Math platform we have best-of-breed implementations of effective and popular machine learning algorithms, e.g. Deep Learning (Neural Nets), GBM, Random Forest, GLM, K-means, PCA, Naive Bayes, and more, with all the features you need to do real data science built in.

Finally, H2O interacts directly with Python, R, Scala, Spark, REST/JSON, and a JS-based web browser, making it the most interconnected machine learning platform out there.

This is a technical talk on the insides of H2O, focusing specifically on the Single-System-Image aspect: how we write single-threaded code and have H2O auto-parallelize it and auto-scale it out to hundreds of nodes and thousands of cores.
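
To give a feel for that coding style, here is a minimal sketch in plain Python. It is not H2O's API: `split_into_chunks`, `map_chunk`, `reduce_partials` and `parallel_mean` are hypothetical names invented for this illustration, and a local thread pool stands in for a cluster. The idea it assumes is that the author writes ordinary single-threaded code for one chunk of a column plus an associative combine step, and a driver decides how to fan that work out.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def split_into_chunks(column, chunk_size):
    """Split one column into contiguous chunks (a stand-in for distributed storage)."""
    return [column[i:i + chunk_size] for i in range(0, len(column), chunk_size)]


def map_chunk(chunk):
    """Plain single-threaded code: compute a partial (sum, count) for one chunk."""
    total = 0.0
    for value in chunk:
        total += value
    return total, len(chunk)


def reduce_partials(left, right):
    """Combine two partial results; associative, so the grouping does not matter."""
    return left[0] + right[0], left[1] + right[1]


def parallel_mean(column, chunk_size=4, workers=4):
    """Hypothetical driver: run map_chunk over all chunks, then fold the partials."""
    chunks = split_into_chunks(column, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    total, count = reduce(reduce_partials, partials, (0.0, 0))
    return total / count if count else float("nan")


if __name__ == "__main__":
    print(parallel_mean(list(range(1, 11))))
```

Only `map_chunk` and `reduce_partials` carry the logic of the computation; neither mentions threads, nodes, or locks. In this sketch the driver uses threads on one machine, whereas a cluster runtime would presumably run the map step on whichever node holds each chunk and combine the partial results across nodes.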

Speakers: Cliff Click