This data science project consists of an exploratory data analysis of a football events dataset and other football-related datasets, used to analyze Lionel Messi's career at FC Barcelona through a range of key statistics. The project also includes a logistic regression model that predicts expected goals for players. Finally, it explores future prospects for Lionel Messi.
It has recently become quite clear that Leo is dissatisfied at FCB and has expressed a desire to transfer to another club. With this news in mind, I decided to carry out a data science project that looks at La Pulga's FC Barcelona career through numbers and statistics, analysing a number of available football datasets to gain a deeper insight into it.
The datasets used in this project are described below:
- Football Events dataset (events.csv): The dataset provides a granular view of 9,074 games, totaling 941,009 events, from the five biggest European football leagues: Bundesliga (Germany), La Liga (Spain), Ligue 1 (France), Premier League (England) and Serie A (Italy), from the 2011/2012 season to the 2016/2017 season. This dataset can be found here.
- General Information dataset (ginf.csv): The ginf dataset contains general information and metadata about each game in the events dataset. I have used it alongside the events dataset to add descriptive information about each game.
- Bundesliga Points dataset (bundesliga_points.csv): This dataset contains the points for each team in the Bundesliga from 2004 to 2018.
- La Liga Points dataset (laliga_points.csv): This dataset contains the points for each team in La Liga from 2004 to 2018.
- Ligue 1 Points dataset (ligue1_points.csv): This dataset contains the points for each team in Ligue 1 from 2004 to 2018.
- Premier League Points dataset (premierleague_points.csv): This dataset contains the points for each team in the Premier League from 2004 to 2018.
- Serie A Points dataset (seriea_points.csv): This dataset contains the points for each team in Serie A from 2004 to 2018.
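As a rough illustration of how the events and ginf datasets can be combined, here is a minimal, dependency-free sketch that attaches game metadata to each event. The column names (`id_odsp`, `player`, `is_goal`, `league`, `season`) and the sample rows are assumptions made for this example, not taken from the notebook, and `join_events_with_games` is a hypothetical helper.

```python
import csv
import io

# Tiny in-memory stand-ins for events.csv and ginf.csv.
# Column names and values are assumptions for illustration only.
EVENTS_CSV = """id_odsp,player,is_goal
g1,lionel messi,1
g1,luis suarez,0
g2,lionel messi,0
"""
GINF_CSV = """id_odsp,league,season
g1,SP1,2016
g2,SP1,2017
"""


def join_events_with_games(events_file, ginf_file, key="id_odsp"):
    """Attach each game's metadata to every event from that game (left join)."""
    # Index the game-level rows by the shared game identifier.
    games = {row[key]: row for row in csv.DictReader(ginf_file)}
    joined = []
    for event in csv.DictReader(events_file):
        meta = games.get(event[key], {})  # events without metadata are kept as-is
        joined.append({**meta, **event})
    return joined


rows = join_events_with_games(io.StringIO(EVENTS_CSV), io.StringIO(GINF_CSV))
for row in rows:
    print(row["player"], row["league"], row["season"], row["is_goal"])
```

With the real files, the two `io.StringIO` objects would be replaced by `open("events.csv", newline="")` and `open("ginf.csv", newline="")`. A dataframe library would perform the same join in one call; the standard library is used here only to keep the sketch self-contained.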
For the Exploratory Data Analysis, I have computed various crucial statistics for footballers, especially strikers and forwards, such as average Goals per Game (GPG), penalty efficiency, assists and partnerships. Based on these statistics, we will be able to look at Lionel Messi's career at FC Barcelona in numbers.
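To make two of these statistics concrete, the sketch below computes goals per game and penalty efficiency from a handful of made-up event records. The definitions are assumptions for illustration: a "game" is any game in which the player has at least one recorded event, and penalty efficiency is penalties scored divided by penalties taken. The notebook may define both differently, and the field names and `player_stats` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical event records; field names and values are illustrative only.
events = [
    {"game": "g1", "player": "lionel messi", "is_goal": 1, "is_penalty": 0},
    {"game": "g1", "player": "lionel messi", "is_goal": 1, "is_penalty": 1},
    {"game": "g2", "player": "lionel messi", "is_goal": 0, "is_penalty": 1},
    {"game": "g2", "player": "luis suarez", "is_goal": 1, "is_penalty": 0},
]


def player_stats(events):
    """Return goals per game and penalty efficiency for each player."""
    games = defaultdict(set)        # distinct games with a recorded event
    goals = defaultdict(int)
    pens_taken = defaultdict(int)
    pens_scored = defaultdict(int)
    for e in events:
        p = e["player"]
        games[p].add(e["game"])
        goals[p] += e["is_goal"]
        if e["is_penalty"]:
            pens_taken[p] += 1
            pens_scored[p] += e["is_goal"]
    stats = {}
    for p in games:
        stats[p] = {
            "goals_per_game": goals[p] / len(games[p]),
            # None when the player took no penalties (avoids dividing by zero)
            "penalty_efficiency": (
                pens_scored[p] / pens_taken[p] if pens_taken[p] else None
            ),
        }
    return stats


stats = player_stats(events)
print(stats["lionel messi"])
```

In this toy sample Messi has two goals in two games and one of two penalties converted, so his goals per game is 1.0 and his penalty efficiency is 0.5.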
Further, I have utilised a Logistic Regression model to predict the probability of an attempt at goal (as described in the football events dataset) ending up in the back of the net. Summing these probabilities over all of a player's attempts gives that player's expected goals. Using an expected goals model, we can see whether a player has performed better or worse than expected in terms of goal-scoring.
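The expected goals idea can be sketched in a few lines. The example below fits a small logistic regression by plain batch gradient descent and then sums the predicted probabilities over a player's attempts. It is a hand-rolled stand-in chosen to stay dependency-free, not the notebook's implementation, which may well use a library class such as scikit-learn's `LogisticRegression`. The two features (`inside_box`, `is_header`), the training rows, and the attempt list are all invented for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(w, x):
    """Probability that an attempt with features x is a goal (w[0] is the bias)."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights by batch gradient descent on the mean log-loss."""
    w = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi  # gradient of log-loss w.r.t. the logit
            grad[0] += err
            for j, xij in enumerate(xi):
                grad[j + 1] += err * xij
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


def expected_goals(w, attempts):
    """Sum of per-attempt goal probabilities."""
    return sum(predict_proba(w, a) for a in attempts)


# Invented training data: each row is [inside_box, is_header]; y is 1 for a goal.
X = [[1, 0], [1, 0], [1, 0], [1, 1], [1, 1],
     [0, 0], [0, 0], [0, 0], [0, 1], [0, 1]]
y = [1, 1, 0, 1, 0, 1, 0, 0, 0, 0]
w = fit_logistic(X, y)

# Invented attempts for one player, and the goals actually scored from them.
player_attempts = [[1, 0], [1, 0], [0, 0]]
actual_goals = 2
xg = expected_goals(w, player_attempts)
print(f"expected goals: {xg:.2f}, actual goals: {actual_goals}")
```

A player whose actual goals exceed the summed probabilities has scored more than the model expected from those chances, and one who falls below it has scored fewer.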
Finally, I have given my own verdict on which club would be the best next destination for Lionel Messi should he leave FC Barcelona. To do this, I have analyzed each of Europe's top 5 leagues (Germany, Spain, France, England and Italy) and compared the others with La Liga, where Lionel Messi has played his entire career. By comparing the various stats computed during this project, I have described at which club Leo would fit best and continue his career.
Kaggle Notebook: https://www.kaggle.com/prathamsharma123/lionel-messi-career-analysis
Kaggle Dataset: https://www.kaggle.com/prathamsharma123/comprehensive-football-dataset
Pratham Sharma
Student at Vellore Institute of Technology, Vellore, Tamil Nadu, India
Reach out to me: [email protected]
LinkedIn profile: https://www.linkedin.com/in/prathamSharma25/
Kaggle profile: https://www.kaggle.com/prathamsharma123