Graph pattern matching is one of the most interesting and challenging operations in graph analytics. Query languages like openCypher, implemented in systems like Neo4j, SAP HANA Graph and RedisGraph, allow the intuitive definition of graph patterns including structural and semantic predicates.
So far, graph query languages are most prominent in graph database systems such as Neo4j. However, we think that many other systems can benefit from having such a language in their toolbox. One of these systems is Apache Spark, one of the most popular open-source frameworks for distributed processing of large data volumes in complex analytical workloads. To bring the benefits of Cypher from the graph database realm into the world of Big Data, we at Neo4j started developing Cypher for Apache Spark (CAPS). CAPS is primarily focused on graph-powered data integration and graph analytical query workloads within the Spark ecosystem. In addition, CAPS is our testbed for Cypher language extensions as specified in the openCypher project; for example, multiple graphs, graph transformations and construction, and query composition.
In our talk, we motivate use cases for CAPS and give an overview of its new querying capabilities, which we demonstrate using Apache Spark and Apache Zeppelin. Furthermore, we briefly present the internal architecture, highlighting the main differences between Neo4j and CAPS.
Target audience: Developers and analysts who want to express graph queries on distributed graphs
Speakers: Martin Junghanns, Max Kießling