YARN, the Apache Hadoop Platform for Streaming, Realtime and Batch Processing

FOSDEM 2014

As part of Hadoop 2.0, YARN takes the resource management capabilities that were in MapReduce and packages them so they can be used by new engines. This also streamlines MapReduce to do what it does best: process data. With YARN, you can now run multiple applications in Hadoop, all sharing a common resource management layer. Many organizations are already building applications on YARN in order to bring them into Hadoop.

A developer room is also being organized to put the presented technologies into practice.

Apache Hadoop YARN is a sub-project of Hadoop at the Apache Software Foundation, introduced in Hadoop 2.0, that separates the resource management and processing components. YARN was born of a need to enable a broader array of interaction patterns for data stored in HDFS, beyond MapReduce.

The fundamental idea of YARN is to split the two major responsibilities of the JobTracker/TaskTracker, resource management and job scheduling/monitoring, into separate entities:
- A global Resource Manager.
- A per-application Application Master.
- A per-node slave Node Manager.
- One or more Containers per application, each running on a Node Manager.
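To make the division of labour between these entities more concrete, here is a toy Python model of two applications sharing one cluster. It is only a sketch under stated assumptions: all class and method names (NodeManager, ResourceManager, ApplicationMaster, allocate, request, finish) are hypothetical and are not the Hadoop YARN API; resources are reduced to memory in MB, placement is simple first-fit, and the Application Master's own container is omitted. What it does show is the split described above: the global Resource Manager only hands out and reclaims containers, while each per-application master decides how many containers its own work needs.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # compare containers by identity, not by field values
class Container:
    """Toy container: a slice of one node's memory granted to one application."""
    app_id: str
    host: str
    memory_mb: int


@dataclass
class NodeManager:
    """Hypothetical per-node agent tracking free memory on one host."""
    host: str
    free_mb: int
    containers: list = field(default_factory=list)


class ResourceManager:
    """Toy global scheduler: grants containers, knows nothing about app logic."""

    def __init__(self, nodes):
        self.nodes = nodes

    def allocate(self, app_id, memory_mb):
        # First-fit placement (an assumption): use the first node with room.
        for node in self.nodes:
            if node.free_mb >= memory_mb:
                node.free_mb -= memory_mb
                container = Container(app_id, node.host, memory_mb)
                node.containers.append(container)
                return container
        return None  # no capacity left; a real scheduler would queue the request

    def release(self, container):
        # Return the container's memory to the node that hosted it.
        for node in self.nodes:
            if node.host == container.host:
                node.containers.remove(container)
                node.free_mb += container.memory_mb
                return


class ApplicationMaster:
    """Toy per-application master: negotiates containers for its own tasks."""

    def __init__(self, app_id, rm):
        self.app_id = app_id
        self.rm = rm
        self.containers = []

    def request(self, count, memory_mb):
        # Ask the Resource Manager for up to `count` containers; return how many were granted.
        granted = 0
        for _ in range(count):
            container = self.rm.allocate(self.app_id, memory_mb)
            if container is None:
                break
            self.containers.append(container)
            granted += 1
        return granted

    def finish(self):
        # Hand every container back so other applications can use the capacity.
        for container in self.containers:
            self.rm.release(container)
        self.containers.clear()
```

For example, with two 4096 MB nodes, a batch job asking for three 2048 MB containers gets all three, and a streaming job asking for three 1024 MB containers gets only the two that still fit; once the batch job calls `finish()`, its memory becomes available again to any application on the cluster.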

These added capabilities allow enterprises to achieve near real-time processing and a better return on their Hadoop investments. With MapReduce becoming a user-land library, it can evolve independently of the underlying resource management layer, and much more quickly.

Speakers: Eric Charles