In this talk, Christopher Rost will present "graph stream zoomed", a graph stream grouping algorithm that resulted from two master's theses. It enables real-time zooming of a property graph stream, from a schema graph stream to more fine-grained summarizations.
Graphs today are not only very large and heterogeneous, but also change over time. Beyond the typical approach of storing a graph in a database and analyzing it afterwards, some use cases require reactivity, so recent work has focused on processing the graph in a stream model. Real-time querying and analysis of high-frequency incoming graph data, such as social network interactions, click streams, bike rentals, or supply chain product updates, provides continuous insights into the most recent portion of the graph.
Graph summarization (or grouping) is a typical analysis that has been well explored for static graphs, but only a few recent approaches operate on a graph stream, especially when the stream follows the property graph model and thus carries labels and properties on the streamed vertices and edges. In this talk, we present a graph stream grouping algorithm and a distributed reference implementation based on Apache Flink. It enables window-based summarization of the property graph stream by grouping vertices and edges with equal characteristics, which are defined by so-called key functions. This allows various zoom granularities, from a schema graph stream to more fine-grained summarizations. In addition, the elements that form a group can be aggregated in flexible ways to provide deeper insights into the respective summarized vertices and edges. A bicycle rental graph stream is used to demonstrate the power of the algorithm and the multitude of analytical questions that can be answered with it.
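
To make the idea of window-based grouping with key functions more concrete, here is a minimal single-process sketch in plain Python. It is not the Flink-based reference implementation: the function names (`tumbling_windows`, `summarize`), the dictionary layout of stream elements, the sample bike rental data, and the choice of a simple count as the only aggregate are all assumptions made for illustration. For simplicity the sketch also buckets an already collected list of events rather than consuming an unbounded stream.

```python
from collections import defaultdict

def tumbling_windows(stream, size):
    """Bucket timestamped edge events into fixed-size, non-overlapping windows."""
    windows = defaultdict(list)
    for element in stream:
        windows[element["ts"] // size].append(element)
    return [windows[k] for k in sorted(windows)]

def summarize(window, vertex_key, edge_key):
    """Summarize one window.

    Vertices are grouped by vertex_key; edges are grouped by the triple
    (source group, edge_key, target group). The aggregate here is a count.
    """
    vertex_groups = defaultdict(set)   # group key -> distinct vertex ids
    edge_groups = defaultdict(int)     # (src group, edge key, dst group) -> edge count
    for e in window:
        src_group = vertex_key(e["src"])
        dst_group = vertex_key(e["dst"])
        vertex_groups[src_group].add(e["src"]["id"])
        vertex_groups[dst_group].add(e["dst"]["id"])
        edge_groups[(src_group, edge_key(e), dst_group)] += 1
    vertex_counts = {k: len(ids) for k, ids in vertex_groups.items()}
    return vertex_counts, dict(edge_groups)

def station(station_id, district):
    # Hypothetical vertex layout: a label plus one property.
    return {"id": station_id, "label": "Station", "district": district}

a, b, c = station("s1", "North"), station("s2", "North"), station("s3", "South")

# Hypothetical bike rental stream: each element is one trip edge with a timestamp in seconds.
stream = [
    {"ts": 5, "label": "trip", "src": a, "dst": b},
    {"ts": 20, "label": "trip", "src": a, "dst": c},
    {"ts": 50, "label": "trip", "src": b, "dst": c},
    {"ts": 70, "label": "trip", "src": c, "dst": a},
]

# Key functions decide the zoom level.
by_label = lambda v: v["label"]                               # coarse: schema graph
by_label_and_district = lambda v: (v["label"], v["district"])  # finer: per district
edge_label = lambda e: e["label"]

windows = tumbling_windows(stream, 60)
schema_view = [summarize(w, by_label, edge_label) for w in windows]
district_view = [summarize(w, by_label_and_district, edge_label) for w in windows]
```

Swapping `by_label` for `by_label_and_district` is the "zoom" step in this sketch: the same windows are summarized again, but stations are now split by district, so the single `Station` to `Station` trip edge of the schema view breaks down into per-district trip counts.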