Software development projects, Open Source ones in particular, rely heavily on telematic tools to support, coordinate and promote development activities. Despite its paramount value, project data is scattered across the Internet, making it difficult to retrieve, collect, clean, link and analyze, and hindering insightful analytics for both practitioners and researchers. This talk presents Perceval, a tool able to perform automatic and incremental data gathering from almost any tool related to contributing to Open Source development (e.g., source code management, issue tracking systems, mailing lists, forums). It hides the technical complexities of data acquisition and eases the definition of analytics. Perceval is an industry-strength free software tool that has been widely used at Bitergia, a company devoted to offering software analytics for Open Source software projects.
The rise of the Internet has radically changed how software is developed. Over the years, platforms like GitHub, StackOverflow and Slack have become important tools to support, coordinate and promote the daily activities around software. This is especially true for Open Source projects, which rely heavily on distributed and collaborative development.
Beyond being successfully and increasingly adopted by both end-users and development teams, these telematic tools offer relevant data sources, which can be exploited by practitioners and researchers to describe, predict, and improve specific aspects of software projects.
However, accessing and gathering this data is often a time-consuming and error-prone task that entails many considerations and requires expertise. It may require understanding how to obtain an OAuth token (e.g., StackExchange, GitHub) or preparing storage to download the data (e.g., Git repositories, mailing list archives). When dealing with development support tools that expose their data via APIs, special attention has to be paid to the terms of service (e.g., an excessive number of requests could lead to temporary or permanent bans). Recovery solutions to tackle connection problems when fetching remote data should also be taken into account. Storing the data already received and retrying failed API calls may speed up the overall gathering process and reduce the risk of corrupted data. Nonetheless, even though these problems are well known, many practitioners tend to re-invent the wheel by retrieving the data themselves with ad-hoc scripts.
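To make the last two concerns concrete, the following is a minimal sketch of what an ad-hoc script has to handle on its own: reusing data already received and retrying failed calls with an increasing delay. It is not Perceval's implementation. The names `fetch_with_retry`, `TransientError` and the `fetch_page` callable are hypothetical and only stand in for a real API client.

```python
import time


class TransientError(Exception):
    """Hypothetical error standing in for a timeout or a rate-limit response."""


def fetch_with_retry(fetch_page, page, cache, max_retries=3,
                     base_delay=1.0, sleep=time.sleep):
    """Return one page of data, reusing the cache and retrying transient failures.

    fetch_page is an assumed callable that takes a page identifier and either
    returns the data or raises TransientError.
    """
    if page in cache:
        return cache[page]  # already received: no new request is sent

    for attempt in range(max_retries + 1):
        try:
            data = fetch_page(page)
        except TransientError:
            if attempt == max_retries:
                raise  # give up after the last allowed retry
            # Exponential backoff: base_delay, then 2x, then 4x, ...
            sleep(base_delay * 2 ** attempt)
        else:
            cache[page] = data  # remember the page for later runs
            return data
```

Passing `sleep` as a parameter keeps the sketch testable without real waiting; a production script would also need to honor any retry interval announced by the remote service.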
This talk introduces Perceval, a tool that simplifies the collection of project data by covering more than 20 popular tools and platforms related to contributing to Open Source development, thus enabling the definition of software analytics. Perceval is an industry-strength tool that (i) retrieves data from multiple sources in an easy and consistent way, (ii) offers the results in the flexible JSON format, and (iii) makes it possible to connect the results with analysis and/or visualization tools. Furthermore, it is easy to extend, allows cross-cutting analysis and supports incremental retrieval (useful when analyzing large software projects).
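The idea of incremental retrieval with JSON output can be illustrated with a short, self-contained sketch. It does not use Perceval's API or its actual item schema. The function `fetch_since`, the field names `origin`, `updated_on` and `data`, and the sample items are all assumptions made for illustration.

```python
import json
from datetime import datetime, timezone


def fetch_since(items, from_date):
    """Yield, as JSON strings, only the items updated at or after from_date.

    items is an assumed iterable of dicts carrying an ISO 8601 'updated_on'
    timestamp; a real data source would be queried remotely instead.
    """
    for item in items:
        updated = datetime.fromisoformat(item["updated_on"])
        if updated >= from_date:
            yield json.dumps(item, sort_keys=True)


# Hypothetical items, as if gathered from one data source.
items = [
    {"origin": "https://example.org/repo.git",
     "updated_on": "2017-01-10T00:00:00+00:00", "data": {"id": 1}},
    {"origin": "https://example.org/repo.git",
     "updated_on": "2017-03-05T00:00:00+00:00", "data": {"id": 2}},
]

# Only items changed since the previous run are gathered again.
last_run = datetime(2017, 2, 1, tzinfo=timezone.utc)
new_items = [json.loads(line) for line in fetch_since(items, last_run)]
```

Because every item shares the same envelope regardless of its source, the same downstream analysis or visualization code can consume results from different tools.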