Many data sources produce events that represent relationships between entities, such as user interactions in a social network, clicks on pages that link to each other, or purchases of products in web stores. These real-time streams can be represented as dynamic graphs, where each event adds or updates an edge in the graph.
Processing dynamic graphs is challenging: it requires sophisticated state management, snapshotting mechanisms, and incremental graph algorithms. Fortunately, several graph computations, such as graph statistics, aggregates, and graph sketches, as well as more complex algorithms like connected components and bipartiteness detection, can be computed in a single pass. Single-pass algorithms process each edge once and do not need to store or access the complete graph state.
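As a rough illustration of the single-pass idea, the sketch below maintains connected components over an insert-only edge stream using a disjoint-set (union-find) summary. It is plain Python rather than Flink code, and the class and method names (`StreamingComponents`, `add_edge`, `components`) are hypothetical names chosen for this example. Each edge is consumed once and then discarded; only a per-vertex parent pointer is kept, not the edges themselves.

```python
class StreamingComponents:
    """Disjoint-set summary of an edge stream (illustrative, not a Flink API)."""

    def __init__(self):
        self.parent = {}  # vertex -> parent in the disjoint-set forest
        self.size = {}    # root -> number of vertices in its component

    def find(self, v):
        # Register vertices seen for the first time as singleton components.
        if v not in self.parent:
            self.parent[v] = v
            self.size[v] = 1
        # Walk up to the root of v's tree.
        root = v
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every vertex on the path directly at the root.
        while self.parent[v] != root:
            nxt = self.parent[v]
            self.parent[v] = root
            v = nxt
        return root

    def add_edge(self, u, v):
        """Process one edge event; the edge itself is not stored."""
        ru, rv = self.find(u), self.find(v)
        if ru == rv:
            return  # both endpoints are already in the same component
        # Union by size: attach the smaller tree under the larger one.
        if self.size[ru] < self.size[rv]:
            ru, rv = rv, ru
        self.parent[rv] = ru
        self.size[ru] += self.size[rv]

    def components(self):
        """Group all vertices seen so far by their current root."""
        groups = {}
        for v in list(self.parent):
            groups.setdefault(self.find(v), set()).add(v)
        return list(groups.values())


cc = StreamingComponents()
for edge in [(1, 2), (3, 4), (2, 3), (5, 6)]:
    cc.add_edge(*edge)
print(sorted(sorted(c) for c in cc.components()))
```

The summary grows with the number of vertices rather than the number of edges, which is what makes it suitable for a stream that is never stored in full. Edge deletions are outside the scope of this sketch.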
In this talk, we will give an overview of the Apache Flink streaming API and show how to use it to build graph streaming applications. We will show how to implement single-pass graph aggregations and single-pass graph algorithms, such as connected components and bipartiteness detection. Finally, we will preview a work-in-progress graph streaming API that we are building on top of Flink.
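To make the bipartiteness case concrete, here is a minimal sketch, assuming an insert-only edge stream and written in plain Python rather than against the Flink API. It uses a disjoint-set structure in which each vertex also stores a parity bit (same side or opposite side relative to its parent); the names `StreamingBipartiteness` and `add_edge` are hypothetical. An edge whose endpoints are already connected with equal parity closes an odd cycle, so the graph stops being bipartite.

```python
class StreamingBipartiteness:
    """Parity-augmented disjoint set (illustrative, not a Flink API)."""

    def __init__(self):
        self.parent = {}   # vertex -> parent in the disjoint-set forest
        self.parity = {}   # vertex -> 0 if on the same side as its parent, else 1
        self.bipartite = True

    def find(self, v):
        """Return (root, parity of v relative to that root)."""
        if v not in self.parent:
            self.parent[v] = v
            self.parity[v] = 0
        # Collect the path from v up to the root.
        path = []
        while self.parent[v] != v:
            path.append(v)
            v = self.parent[v]
        root = v
        # Compress the path, recomputing each parity relative to the root.
        p = 0
        for node in reversed(path):
            p ^= self.parity[node]
            self.parent[node] = root
            self.parity[node] = p
        return root, p

    def add_edge(self, u, v):
        """Process one edge event and return whether the graph is still bipartite."""
        ru, pu = self.find(u)
        rv, pv = self.find(v)
        if ru == rv:
            # Same component: equal parity means this edge closes an odd cycle.
            if pu == pv:
                self.bipartite = False
        else:
            # Merge the components so that u and v land on opposite sides.
            self.parent[ru] = rv
            self.parity[ru] = pu ^ pv ^ 1
        return self.bipartite


b = StreamingBipartiteness()
for edge in [(1, 2), (2, 3), (3, 4), (4, 1)]:
    b.add_edge(*edge)
print(b.bipartite)       # the 4-cycle is 2-colorable
print(b.add_edge(1, 3))  # the chord 1-3 creates a triangle
```

For brevity this sketch relies on path compression alone and omits union by size; once an odd cycle has been observed, the flag stays `False` because edges are never removed.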