Mining frequent itemsets is an established data mining technique that is supported by production-grade data mining solutions. For example, analyzing frequent co-occurrences of products in shopping baskets yields insights into buyers’ behavior. Frequent subgraph mining (FSM), the graph variant of frequent itemset mining, evaluates not only the co-occurrence of entities but also the relationships among them, i.e., structural patterns. However, existing implementations are all research prototypes tailored to textbook problems.
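To make the difference concrete, here is a minimal sketch in plain Python (it does not use Gradoop or Flink). The function names, the toy data, and the restriction to single-edge patterns are assumptions made purely for illustration; real FSM also grows and counts larger subgraphs.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(baskets, min_support):
    """Return product pairs contained in at least min_support baskets."""
    counts = Counter()
    for basket in baskets:
        # sorted(set(...)) counts each pair once per basket, in a canonical order
        counts.update(combinations(sorted(set(basket)), 2))
    return {pair: n for pair, n in counts.items() if n >= min_support}


def frequent_edge_patterns(graphs, min_support):
    """Return (source label, edge label, target label) triples
    contained in at least min_support graphs."""
    counts = Counter()
    for edges in graphs:
        # set(...) counts each pattern once per graph
        counts.update(set(edges))
    return {pattern: n for pattern, n in counts.items() if n >= min_support}


# Hypothetical toy data: three shopping baskets and two small labeled graphs.
baskets = [["bread", "butter", "milk"], ["bread", "butter"], ["milk", "tea"]]
graphs = [
    [("Customer", "bought", "Bread"), ("Customer", "returned", "Milk")],
    [("Customer", "bought", "Bread"), ("Customer", "bought", "Milk")],
]

pairs = frequent_pairs(baskets, 2)
patterns = frequent_edge_patterns(graphs, 2)
```

The itemset view only reports that bread and butter occur together, whereas the graph view also reports how the entities are related (a customer bought bread).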
In our talk, we give an introduction to the FSM problem on distributed collections of graphs and to our implementation in Gradoop, an open source system for scalable graph analytics based on Apache Flink. In contrast to other iterative graph algorithms such as PageRank, in FSM the search space is discarded, while the intermediate results of the iterations are the desired result. The major technical challenge is therefore the appropriate use of Flink’s distributed iterations.
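The iteration pattern can be illustrated with a minimal, single-machine sketch. It is not the Gradoop implementation and does not use Flink's iteration API: as a simplifying assumption, it uses itemsets as a stand-in for subgraphs and a plain Python loop in place of a distributed iteration. The names `support` and `mine_level_wise` are hypothetical.

```python
from itertools import combinations


def support(candidate, transactions):
    """Number of transactions that contain every item of the candidate."""
    return sum(1 for t in transactions if candidate <= t)


def mine_level_wise(transactions, min_support):
    """Return the frequent patterns found in each iteration, one list per level."""
    transactions = [frozenset(t) for t in transactions]
    items = {item for t in transactions for item in t}

    # Iteration 1: frequent patterns of size 1.
    frontier = {
        frozenset([item])
        for item in items
        if support(frozenset([item]), transactions) >= min_support
    }

    results = []
    while frontier:
        # The output of every iteration is part of the final result,
        # not just the state reached after the last iteration.
        results.append(sorted(sorted(p) for p in frontier))

        # Grow: join frequent k-patterns into (k+1)-candidates.
        size = len(next(iter(frontier))) + 1
        candidates = {
            a | b for a, b in combinations(frontier, 2) if len(a | b) == size
        }

        # Infrequent candidates are discarded; only frequent ones are grown further.
        frontier = {
            c for c in candidates if support(c, transactions) >= min_support
        }
    return results


levels = mine_level_wise([["a", "b", "c"], ["a", "b"], ["a", "c"], ["b"]], 2)
```

In an algorithm like PageRank only the state after the final iteration matters. Here, every pass contributes patterns to the result, and the loop ends once no frequent pattern is left to grow.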
We will explain different implementation approaches, discuss implementation details that influence scalability, and show benchmark results.
Intended audience and goal of the talk: Developers and analysts interested in relationship-centric data mining techniques
Desired length of your time slot: 30min
Links to background information on the given talk for the hungry and impatient: http://www.gradoop.com http://flink.apache.org/
Links to your previous talks, code snippets or repositories: http://dbs.uni-leipzig.de/file/GraphMiningforComplexDataAnalytics.pdf GitHub: http://www.gradoop.com/ Graph Data Model: http://dbs.uni-leipzig.de/file/EPGM.pdf Fosdem 2016: https://fosdem.org/2016/schedule/event/graphprocessinggradoopflink_analytics
Speakers: André Petermann