Code is an incredible source of information.
Indeed, mining software repositories can tell us whether code is natural, how to use a new framework, or how to identify similar changes.
In this talk, I will present the state of the art in, and the limitations of, graph-based algorithms for mining software repositories.
In particular, I will cover how program analysis, pattern mining, and pattern matching can help developers identify:
- conventions and idioms, using syntactic information commonly represented as Abstract Syntax Trees;
- framework usages, relying on semantic information such as control and data dependencies represented as Program Dependence Graphs;
- similar modifications made in multiple locations, computed using change loggers or change distillers.
Finally, I will present INTiMALS, an ongoing industry-university collaboration to develop a language-parametric framework for mining this information in legacy systems.