This presentation will show an exploratory data analysis of bicycle-sharing stations in two French cities, Lyon and Bordeaux.
Keywords: Data Science, Prediction, Machine Learning, Python, Open Data, GIS
Thanks to Open Data portals, bicycle-sharing availability data are freely accessible. The main challenge with these data is predicting bicycle availability at each sharing station.
The talk will follow a classic data workflow:
After a short introduction to Luigi, a Python library for building data pipelines, the second part will show how to cluster sharing stations according to their hourly availability profiles. The clustering will be done with KMeans, one of the most popular unsupervised Machine Learning algorithms. Then, some feature engineering methods will be applied to prepare the data for availability prediction. Finally, a short-term (e.g. one-hour-ahead) bicycle availability prediction will be proposed.
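The clustering and feature-engineering steps could look roughly like the following minimal sketch. It is not the speaker's code. It assumes a long-format table with hypothetical columns `station`, `ts`, `available_bikes` and `capacity`, sampled once per hour, and it runs on small synthetic data rather than the Lyon or Bordeaux feeds. Stations are clustered with scikit-learn's `KMeans` on their mean fill ratio per hour of day. Lagged availability plus hour and weekday are used as features, one common choice among many, with availability one hour ahead as the target. The prediction model itself is left out.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans


def synthetic_snapshots(n_days: int = 3) -> pd.DataFrame:
    """Hypothetical hourly snapshots for four stations (not real Lyon/Bordeaux data)."""
    ts = pd.date_range("2017-01-02", periods=24 * n_days, freq=pd.offsets.Hour(1))
    daytime = ((ts.hour >= 8) & (ts.hour < 18)).astype(float)
    frames = []
    # Stations A and B empty out during working hours (residential-like pattern),
    # stations C and D fill up during working hours (business-district-like pattern).
    for station, fills_by_day in [("A", False), ("B", False), ("C", True), ("D", True)]:
        share = daytime if fills_by_day else 1.0 - daytime
        bikes = np.round(2 + 16 * share).astype(int)
        frames.append(pd.DataFrame({
            "station": station,
            "ts": ts,
            "capacity": 20,
            "available_bikes": bikes,
        }))
    return pd.concat(frames, ignore_index=True)


def hourly_profiles(df: pd.DataFrame) -> pd.DataFrame:
    """Mean fill ratio per station and hour of day: one row per station, 24 columns."""
    enriched = df.assign(hour=df["ts"].dt.hour,
                         ratio=df["available_bikes"] / df["capacity"])
    return enriched.pivot_table(index="station", columns="hour",
                                values="ratio", aggfunc="mean")


def cluster_stations(profiles: pd.DataFrame, n_clusters: int = 2,
                     seed: int = 0) -> pd.Series:
    """Assign each station to a KMeans cluster based on its hourly profile."""
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels = model.fit_predict(profiles.fillna(0.0).to_numpy())
    return pd.Series(labels, index=profiles.index, name="cluster")


def add_lag_features(df: pd.DataFrame, lags=(1, 2), horizon: int = 1) -> pd.DataFrame:
    """Past availability and calendar fields as features, future availability as target.

    Assumes one row per station per hour, so shifting by one row equals one hour.
    """
    out = df.sort_values(["station", "ts"]).reset_index(drop=True)
    by_station = out.groupby("station")["available_bikes"]
    for lag in lags:
        out[f"lag_{lag}"] = by_station.shift(lag)  # availability `lag` hours earlier
    out["hour"] = out["ts"].dt.hour
    out["weekday"] = out["ts"].dt.weekday
    out["target"] = by_station.shift(-horizon)  # availability `horizon` hours later
    # Drop rows whose lags or target fall outside each station's history.
    return out.dropna().reset_index(drop=True)


if __name__ == "__main__":
    snapshots = synthetic_snapshots()
    print(cluster_stations(hourly_profiles(snapshots)))
    print(add_lag_features(snapshots).head())
```

Clustering on the fill ratio rather than the raw bike count keeps stations of different capacities comparable. The resulting feature table could then be passed to a regressor, for instance a gradient-boosting model such as xgboost's `XGBRegressor`.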
A word will be said about the Python libraries used in this project: luigi, pandas, seaborn, scikit-learn, folium and xgboost.
Speakers: Raphaël Delhome