This talk will discuss the basics, the challenges, and the possibilities of graph construction.
Often, the most painful and time-consuming part of graph processing at massive scale is constructing the graph from raw data.
Technologies such as Hadoop and Python make it possible to create ad hoc solutions. Even then, the task requires familiarity with writing and debugging MapReduce code, and graph construction can involve consistency checks and cleanup tasks that prove tricky even for experienced data scientists.
The Intel Graph Analytics operation has been using the GraphBuilder library of tools in multiple projects that analyze large-scale graph data, and an alpha version of the toolkit for Pig was recently released as open source. GraphBuilder occupies the boundary between non-graph and graph processing: raw data comes in from text files, big tables, and similar sources; non-graph transformations are performed in Pig; rules are then specified to construct the graph from the raw data; and finally the graph is stored in a graph database, such as Titan, or in a standard graph format, such as RDF.
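To make the pipeline concrete, here is a minimal sketch of the same four stages in plain Python. It is not GraphBuilder's API and does not use Pig; the field names, the `reportsTo` label, the `http://example.org/` base URI, and every function name are hypothetical and chosen only for illustration. It shows a cleanup pass, rule-driven vertex and edge extraction with two simple consistency checks (duplicate edges and missing endpoints), and serialization to N-Triples-style RDF lines.

```python
from typing import Dict, List, Set, Tuple

Row = Dict[str, str]
Edge = Tuple[str, str, str]  # (source, label, destination)

# Hypothetical raw records, standing in for rows read from text files or a big table.
RAW_ROWS: List[Row] = [
    {"employee": " Alice", "manager": "Carol"},
    {"employee": "bob", "manager": "carol"},
    {"employee": "bob", "manager": "carol"},  # duplicate row
    {"employee": "dave", "manager": ""},      # missing manager
]


def clean(rows: List[Row]) -> List[Row]:
    """Non-graph transformation: trim whitespace and lowercase every value."""
    return [{k: v.strip().lower() for k, v in row.items()} for row in rows]


def build_graph(
    rows: List[Row],
    vertex_fields: List[str],
    edge_rules: List[Tuple[str, str, str]],
) -> Tuple[Set[str], Set[Edge]]:
    """Apply declarative rules: which fields become vertices, which pairs become edges."""
    vertices: Set[str] = set()
    edges: Set[Edge] = set()  # a set drops duplicate edges
    for row in rows:
        for field in vertex_fields:
            if row.get(field):
                vertices.add(row[field])
        for src_field, label, dst_field in edge_rules:
            src, dst = row.get(src_field), row.get(dst_field)
            if src and dst:  # skip edges with a missing endpoint
                edges.add((src, label, dst))
    return vertices, edges


def to_ntriples(edges: Set[Edge], base: str = "http://example.org/") -> List[str]:
    """Serialize edges as N-Triples-style lines, sorted for stable output."""
    return [f"<{base}{s}> <{base}{p}> <{base}{o}> ." for s, p, o in sorted(edges)]


vertices, edges = build_graph(
    clean(RAW_ROWS),
    vertex_fields=["employee", "manager"],
    edge_rules=[("employee", "reportsTo", "manager")],
)
triples = to_ntriples(edges)
for line in triples:
    print(line)
```

Keeping the rules as data (a list of fields and a list of source, label, destination tuples) rather than as hand-written loops is the point of the sketch: the same extraction code can then be reused across datasets by changing only the rule lists.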
Where'd you get that big old graph?
Speakers: Nathan Segerlind