We motivate and present an open-source benchmark suite for graph processing, created and maintained by the Linked Data Benchmark Council (LDBC). We first define common graph workloads and the pitfalls of benchmarking systems that support them, then explain our guiding principles that allow for conducting meaningful benchmarks. We outline our open-source ecosystem that consists of a scalable graph generator (capable of producing property graphs with 100B+ edges) and benchmark drivers with several reference implementations. Finally, we highlight the results of recent audited benchmark runs.
Data processing pipelines frequently involve graph computations: running complex path queries in graph databases, evaluating metrics for network science, training graph neural networks for classification, and so on. While graph technology has received significant attention in academia and industry, the performance of graph processing systems is often lacklustre, which hinders their adoption for large-scale problems.
The Linked Data Benchmark Council (LDBC) was founded in 2012 by vendors and academic researchers with the aim of making graph processing performance measurable and comparable. To this end, LDBC provides open-source benchmark suites with openly available data sets starting at 1 GB and scaling up to 30 TB. Additionally, it allows vendors to submit their benchmark implementations to LDBC-certified auditors who ensure that the benchmark executions are reproducible and comply with the specification.
In this talk, we describe three LDBC benchmarks: (1) the Graphalytics benchmark for offline graph analytics, (2) the Social Network Benchmark's Interactive workload for transactional graph database systems, and (3) the Social Network Benchmark's Business Intelligence workload for analytical graph data systems. For each benchmark, we explain how it ensures meaningful and interpretable results. Then, we summarize the main features of the benchmark drivers and list the current reference implementations (maintained by vendors and community members). Finally, we highlight recent audited benchmark results.
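To give a flavour of what the benchmark drivers do, the following is a minimal conceptual sketch in Python (hypothetical names, not the actual LDBC driver API): replay a stream of parameterized operations against a system under test and record per-operation latencies, which is the raw material for the metrics reported in audited runs.

    import time

    def run_workload(connection, operations):
        """Execute (query_text, parameters) pairs and collect latencies.

        `connection.execute` stands in for whatever client API the
        system under test exposes (an assumption for this sketch).
        """
        latencies = []
        for query, params in operations:
            start = time.perf_counter()
            connection.execute(query, params)  # issue the operation
            latencies.append(time.perf_counter() - start)
        return latencies

The real drivers additionally handle scheduling operations at prescribed start times, validating results, and enforcing the rules checked during an audit.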
Information on the talk:
Speakers: Gabor Szarnyas, David Püroja