Benchmarking graph databases with gMark

FOSDEM 2016

Massive graph data sets are pervasive in contemporary application domains. Hence, graph database systems are becoming increasingly important. In the study of these systems, it is vital that the R&D community has shared benchmarking solutions for generating database instances and query workloads with predictable and controllable properties. Like the TPC benchmarks for relational databases, benchmarks for graph databases have been important drivers for the Semantic Web and graph data management communities. Current benchmarks, however, are either limited to fixed graphs or graph schemas, or provide limited or no support for generating tailored query workloads to accompany graph instances.

To move the community forward, a benchmarking approach that overcomes these limitations is crucial. In this talk, we present the design and engineering principles of gMark, a domain- and query-language-independent open-source graph benchmark that addresses these limitations of current solutions. A core contribution of gMark is its ability to target and control the diversity of properties of both the generated graph instances and the query workloads coupled to these instances. A further novelty is its support for recursive regular path queries, a fundamental graph query paradigm. We illustrate the flexibility and practical usability of gMark by showcasing the framework's capabilities in generating high-quality graphs and workloads, and its ability to encode user-defined schemas across a variety of application domains.

This is joint work with Guillaume Bagan, Angela Bonifati, Radu Ciucanu, Aurélien Lemay, and Nicky Advokaat.

Audience: Developers trying to optimize the processing of highly connected data sets

Speakers: George Fletcher