Gabor Szarnyas

👥 5 conferences
🎤 6 talks
📅 Years active: 2017 to 2023
📊 Wikidata: Q57166117

Biography

Gabor Szarnyas is a researcher working on graph processing techniques. His core research areas are live graph pattern matching, benchmarking graph queries, and analyzing large-scale networks. His main research project is ingraph, an openCypher-compatible query engine supporting live query evaluation. His research team was the first to publish a formalisation that captures the semantics of a core subset of the openCypher language.

Gabor works at the Budapest University of Technology and Economics, teaching system modelling and database theory. He conducted research visits at the University of York, McGill University and the University of Waterloo. He is a member of the openCypher Implementers Group and the LDBC Social Network Benchmark task force. He received 1st prize in the ACM Student Research Competition at the MODELS 2016 conference. He is also a frequent speaker at industrial conferences (FOSDEM, GraphConnect) and meetups (openCypher meetup NYC, Budapest Neo4j meetup).

Links

University page

GitHub

Twitter

— biography from FOSDEM 2019
https://archive.fosdem.org/2019/schedule/speaker/gabor_szarnyas/

Conferences

5 known conferences

👥 FOSDEM 2023 📅 04 Feb 2023

🎤 The LDBC benchmark suite
05 Feb 2023 show details

We motivate and present an open-source benchmark suite for graph processing, created and maintained by the Linked Data Benchmark Council (LDBC). We first define common graph workloads and the pitfalls of benchmarking systems that support them, then explain our guiding principles that allow for conducting meaningful benchmarks. We outline our open-source ecosystem that consists of a scalable graph generator (capable of producing property graphs with 100B+ edges) and benchmarks drivers with several reference implementations. Finally, we highlight the results of recent audited benchmark runs.

Data processing pipelines frequently involve graph computations: running complex path queries in graph databases, evaluating metrics for network science, training graph neural networks for classification, and so on. While graph technology has received significant attention in academia and industry, the performance of graph processing systems is often lacklustre, which hinders their adoption for large-scale problems.

The Linked Data Benchmark Council (LDBC) was founded in 2012 by vendors and academic researchers with the aim of making graph processing performance measurable and comparable. To this end, LDBC provides open-source benchmark suites with openly available data sets starting at 1 GB and scaling up to 30 TB. Additionally, it allows vendors to submit their benchmark implementations to LDBC-certified auditors who ensure that the benchmark executions are reproducible and comply with the specification.

In this talk, we describe three LDBC benchmarks: (1) the Graphalytics benchmark for offline graph analytics, (2) the Social Network Benchmark's Interactive workload for transactional graph database systems, and (3) the Business Intelligence workload for analytical graph data systems. For each benchmark, we explain how it ensures meaningful and interpretable results. Then, we summarize the main features of the benchmark drivers and list the current reference implementations (maintained by vendors and community members). Finally, we highlight recent audited benchmark results.

Information on the talk:

Expected prior knowledge: specialized prior knowledge is not needed
Intended audience: developers and users of graph processing frameworks

Speakers:

Gábor Szárnyas is a post-doctoral researcher at CWI Amsterdam. He is the lead developer of the LDBC Social Network Benchmark's BI workload and the maintainer of the Graphalytics benchmark. He is a member of the LDBC steering committee.
David Püroja is a research software engineer at CWI Amsterdam and the maintainer of the LDBC Social Network Benchmark's Interactive workload. He is a certified LDBC auditor.

Previous talks:

The LDBC Social Network Benchmark: Business Intelligence workload (15th LDBC Technical User Community meeting, 2022)
The Linked Data Benchmark Council: Fostering competition in the graph processing space (World AI Conference, 2022)

Other people: David Püroja

🎤 The LDBC Social Network Benchmark
04 Feb 2023 show details

In this talk, we describe the LDBC Social Network Benchmark's two workloads: the Interactive workload for transactional graph database systems and the Business Intelligence workload for analytical graph data systems.

We describe the key features of both workloads, including their data sets, queries, and update operations. We explain how they ensure meaningful and interpretable results via careful parameter tuning. Finally, we showcase the workloads' reference implementations (maintained by vendors and community members).

Information on the talk:

Expected prior knowledge: specialized prior knowledge is not needed
Intended audience: developers and users of graph database management systems

Speakers:

Gábor Szárnyas is a post-doctoral researcher at CWI Amsterdam. He is the lead developer of the LDBC Social Network Benchmark's BI workload and the maintainer of the Graphalytics benchmark. He is a member of the LDBC steering committee.
David Püroja is a research software engineer at CWI Amsterdam and the maintainer of the LDBC Social Network Benchmark's Interactive workload. He is a certified LDBC auditor.

Previous talks:

The LDBC Social Network Benchmark: Business Intelligence workload (15th LDBC Technical User Community meeting, 2022)
The Linked Data Benchmark Council: Fostering competition in the graph processing space (World AI Conference, 2022)

Other people: David Püroja

👥 FOSDEM 2020 📅 01 Feb 2020

🎤 GraphBLAS: A linear algebraic approach for high-performance graph algorithms
02 Feb 2020 show details

There is increasing interest to apply graph analytical techniques to a wide array of problems, many operating on large-scale graphs with billions of edges. While graph algorithms and their complexity is textbook material, efficient implementation of such algorithms is still a major challenge due to a number of reasons. First, the irregular and unstructured nature of graphs leads to a massive amount of random data access, which makes it difficult to use typical caching and parallelization techniques. Second, to optimize their code, developers need to be aware of the nuances of the underlying hardware which, at the very least consists of multiple CPU cores but often also incorporates heterogeneous components such as GPUs or even FPGAs. During the last decade, a number of graph programming models (such as Google's Pregel) have been proposed but most of these focused defining high-level abstractions for distributed execution environments and introduced a significant runtime overhead.

A potential approach for defining efficient graph processing algorithms is to exploit the well-known duality of graphs and sparse adjacency matrices, using matrix operations to capture algorithms. Surprisingly, only a few recent research prototypes have used this model with little consensus on the set of necessary building blocks. The GraphBLAS initiative (launched in 2013) aims to define a standard to capture graph algorithms in the language of linear algebra - following the footsteps of the BLAS standard which, starting four decades ago, revolutionized scientific computing by defining constructs on dense matrices.

In this talk, I give an overview of the GraphBLAS standard and its key components. First, I illustrate how matrix operations on various semirings correspond to the steps in graph algorithms. I then use these operations to present fundamental graph algorithms such as breadth-first search, shortest paths, and the clustering coefficient. Finally, I demonstrate the scalability of the GraphBLAS-based algorithms with the LDBC Graphalytics benchmark. The presented implementations are available open-source as part of LAGraph, a library built on top of GraphBLAS to demonstrate how to design efficient algorithms in linear algebra.

Intended audience: Developers interested in implementing high-performance graph algorithms.

Expected prior knowledge: Familiarity with linear algebra helps plus but we only use basic concepts such as matrix-matrix multiplication

👥 FOSDEM 2019 📅 02 Feb 2019

🎤 Multiplex graph analysis with GraphBLAS
02 Feb 2019 show details

Introduction

Graph analysis workloads present resource-intensive computations that require a large amount of memory and CPU time. Consequently, there an abundance of graph processing tools which build on distributed data processing frameworks, including Spark GraphX, Flink Gelly and Giraph (which runs on Hadoop). According to a recent survey, most of these systems build on the vertex-centric programming model, originally introduced in Google’s Pregel paper. This model defines graph analytical algorithms in terms of vertices communicating with their neighbours through message passing, which allows both easy parallelization (for the systems) and intuitive formalization of the computation (for developers). While these systems indeed exhibit horizontal scalability, they introduce numerous inefficiencies requiring a large amount of resources even for moderately sized graphs. Most practical applications only use graphs up to a few hundred million vertices and edges, which can now be stored comfortably on a single machine. For such graphs, it is worth investigating techniques that allow their evaluation without the additional cost and complexity of operating a distributed cluster.

GraphBLAS

The GraphBLAS initiative is an effort to design a set of standard building blocks that allow users to formulate graph computations in the language of linear algebra, using operations on sparse adjacency matrices defined on custom semirings. Since its inception, GraphBLAS has been implemented for multiple languages (e.g. C, C++, and Java). Additionally, GraphBLAS is being designed in collaboration with hardware vendors (such as Intel and Nvidia) to define a standardized set of interfaces, which will allow building specialized hardware components for graph processing in the future.

Multiplex graph metrics

Graph analysis has a significant overlap with network science, a field that aims to uncover the hidden structural properties of graphs and determine the interplay between their vertices. Most works in network science only study homogeneous (monoplex) graphs, and do not distinguish between different types of vertices and edges. We believe this abstraction is wasteful for most real-life networks, which are heterogeneous (multiplex) and emerge by different types of interactions. To illustrate such analyses, we calculated multiplex clustering metrics on the Paradise papers data set to find interesting entities that were engaged in disproportionately high levels of activities with their interconnected neighbours. We found that even on this relatively small data set (2M vertices and 3M edges), naive implementations did not terminate in days. Hence, we adapted techniques from GraphBLAS to optimize the computations to finish in a few minutes.

Outline of the talk

This talk gives a brief overview of how linear algebra can be used to define graph computations on monoplex graphs, and how we applied it to speedup the calculation of multiplex graph metrics. We present the lessons learnt while experimenting with sparse matrix libraries in C, Java, and Julia. Our graph analyzer framework is available as open-source.

Intended audience: users interested in applying multiplex graph analytical techniques for their problems and developers who strive to implement high-performing graph analytical computations.

Speaker biography.

Gabor Szarnyas is a researcher working on graph processing techniques. His core research areas are live graph pattern matching, benchmarking graph queries, and analyzing large-scale multiplex networks. His main research project is ingraph, an openCypher-compatible query engine supporting live query evaluation. His research team was the first to publish a formalisation that captures the semantics of a core subset of the openCypher language.

Gabor works at the Budapest University of Technology and Economics, teaching system modelling and database theory. He conducted research visits at the University of York, McGill University and the University of Waterloo. He is a member of the openCypher Implementers Group and the LDBC Social Network Benchmark task force. He received 1st prize in the MODELS 2016 ACM Student Research Competition conference and 2nd prize at the SIGMOD 2018 competition. He is a frequent speaker at industrial conferences (FOSDEM, GraphConnect) and meetups (openCypher meetup NYC, Budapest Neo4j meetup).

👥 FOSDEM 2018 📅 03 Feb 2018

🎤 Graph-based analysis of JavaScript source code repositories
03 Feb 2018 show details

JavaScript is one of the decade’s most trending languages. It ranked #1 in popularity in Stack Overflow questions and is consistently featured in the top 10 languages of the TIOBE Index. Originally intended for client-side scripting, the language is now widely used to build complex desktop applications, write server-side code and program IoT devices. The latest standards of the language are released yearly under the ECMAScript trademark and contain sophisticated features and syntactical constructs.

Static analysis is a software testing approach that is performed without compiling and executing the program itself. This allows developers to catch programming errors before building, testing and deploying the code. There is a wide range of static analysis tools: linters and code style analysers repeatedly perform checks in IDEs, while more complex analyzers, such as type checkers, run as part of the continuous integration (CI) process.

As JavaScript is a dynamic language, static analysis approaches are particularly useful: they can detect erroneous type usages that would not be revealed by building the code, but only occur during thorough testing or even worse, at production. Thanks to the popularity of the language, there are already numerous approaches for static analysis available, such as Tern, Facebook’s Flow and TAJS. However, none of these fitted all of our requirements:

Utilise both linter-style and complex global analysis rules.
Evaluate rules in real-time, i.e. in sub-seconds upon each “save” operation.
Allow users to easily extend the analysis rules.

As none of the current approaches satisfied these requirements, we built our own solution that uses a property graph query engine to represent the code graphs used for analysis and graph queries to evaluate the analysis rules. Compared to other static analysis frameworks, the novelty of our solution is twofold:

We continuously maintain code graphs based on the latest changes on the source code.
We use the declarative openCypher language to define the static analysis and graph maintenance rules.

Using declarative queries, our tool is able to perform the complex analysis queries quickly, including:

Detecting asynchronous method calls with missing await statements (a common issue in the callback hell).
Detecting unreachable code, i.e. code parts that cannot be reached through the control flow.

The analysis can be easily extended by custom analysis rules defined in the openCypher language. Building the system on openCypher also allows us to use different query engines: both mature databases, such as Neo4j, and also experimental engines, such as our own ingraph engine. The latter is our research prototype that supports live query evaluation for Cypher queries, which allows near instant answers even for complex analysis rules.

In this talk, we give an overview of the steps involved in transforming the source code file to a syntax graph and converting it to a call flow graph. We demonstrate how openCypher queries can be used to capture complex analysis rules in a concise way, and how ingraph allows us to continuously evaluate these queries.

Intended audience: Developers of static analysis tools, users looking for a flexible analysis framework

Speaker biography.

Gabor Szarnyas is a researcher working on graph processing techniques. His core research areas are live graph pattern matching, benchmarking graph queries, and analyzing large-scale networks. His main research project is ingraph, an openCypher-compatible query engine supporting live query evaluation. His research team was the first to publish a formalisation that captures the semantics of a core subset of the openCypher language.

Gabor works at the Budapest University of Technology and Economics, teaching system modelling and database theory. He conducted research visits at the University of York, McGill University and the University of Waterloo. He is a member of the openCypher Implementers Group and the LDBC Social Network Benchmark task force. He received 1st prize in the ACM Student Research Competition at the MODELS 2016 conference. He is also a frequent speaker at industrial conferences (FOSDEM, GraphConnect) and meetups (openCypher meetup NYC, Budapest Neo4j meetup).

👥 FOSDEM 2017 📅 04 Feb 2017

🎤 Incremental Graph Queries with openCypher
04 Feb 2017 show details

How can we evaluate a global query on huge graphs in 0.1 seconds? Given our current technology, that would be magic. The lack of wizarding skills did not stop us, however, from tackling the problem by using smart caching structures, which are witchcrafts on their own.

Why is this challenge important? Several applications evaluate global queries on continuously changing graphs: fraud detection in financial transactions, analysis of source code repositories and validating engineering models. Current approaches employ domain-specific optimizations, which are difficult and error-prone to implement. Meanwhile, the requirements of these (and similar) use cases could be uniformly addressed by incremental graph query evaluation. With this technique, the first execution of the queries takes some time, but once the result are calculated, they can be efficiently maintained for each change in the graph.

To allow incremental queries on property graphs, we implemented the ingraph engine, based on the openCypher language specification. We aim to support the standard subset of openCypher, as most standard constructs can be calculated incrementally. We already mapped some of the standard constructs to relational algebra, defined incremental relational algebraic operators and implemented them in an incremental relational engine using Akka actors.

We start the talk by presenting use cases that evaluate complex global queries on continuously changing graphs and discuss the idea of incremental graph queries. We show the mapping of basic openCypher constructs (e.g. MATCH, WHERE, WITH, RETURN) to relational operators, such as joins, selections and projections. Finally, we show our approach for optimizing incremental graph queries and outline related challenges.

Target audience: Developers, looking for a deeper understanding of openCypher and/or facing complex queries on continuously changing graphs