Cleaning and Preprocessing FIFA 2021 Player Data: A Data Analyst’s Journey
As a data analyst, one of the most fulfilling parts of my job is taking raw, messy data and transforming it into a clean, organized dataset that is ready for analysis and visualization. In this project, I cleaned a dataset of player information from FIFA 2021, a widely popular soccer video game. The dataset contained many columns, each representing an attribute of the players, ranging from basic information to in-game statistics. The goal of this project was to clean and preprocess this dataset so that it is easier to use in future analysis.
The Dataset
The FIFA 2021 player dataset was rich with potential insights, but before diving into analysis, it was crucial to ensure the data was clean and well-structured. The dataset had a total of 77 columns, each representing a different attribute of the players. Some of the key columns included ‘ID’, ‘Name’, ‘Nationality’, ‘Age’, ‘Club’, ‘Value’, ‘Wage’, and various gameplay statistics such as ‘Attacking’, ‘Defending’, ‘Reactions’, and more.
Data Cleaning Steps
Handling Missing Values
The first step was to identify and handle missing values. I used pandas to read the CSV file into a DataFrame and checked for any columns with missing values. Fortunately, there were no missing values in the dataset, which was a great starting point.
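A minimal sketch of that check, assuming a small in-memory CSV in place of the real file (the sample rows and the deliberately empty 'Club' value are made up for illustration, since the actual dataset had nothing missing):

```python
import io

import pandas as pd

# Tiny stand-in for the real FIFA 2021 file; the rows here are invented.
csv_text = """ID,Name,Age,Club
1,Player A,33,Club X
2,Player B,35,
"""

df = pd.read_csv(io.StringIO(csv_text))

# Count missing values per column, then keep only columns that have any.
missing_per_column = df.isna().sum()
columns_with_missing = missing_per_column[missing_per_column > 0]

print(columns_with_missing)
```

On the real dataset the filtered result would be empty, which is how you confirm there is nothing to impute or drop.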
Dropping Unnecessary Columns
To simplify the dataset and remove irrelevant information, I dropped two columns, namely ‘photoUrl’ and ‘playerUrl’. These columns contained URLs to player photos and profiles, which were not needed for this analysis.
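Dropping the two URL columns could look roughly like this. The sample rows and example.com URLs are placeholders, not values from the dataset:

```python
import pandas as pd

# Placeholder rows; only the column names 'photoUrl' and 'playerUrl'
# come from the article.
df = pd.DataFrame({
    "ID": [1, 2],
    "Name": ["Player A", "Player B"],
    "photoUrl": ["https://example.com/a.png", "https://example.com/b.png"],
    "playerUrl": ["https://example.com/a", "https://example.com/b"],
})

# drop() returns a new DataFrame, so reassign the result.
df = df.drop(columns=["photoUrl", "playerUrl"])

print(list(df.columns))
```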
Renaming Columns
I noticed that one of the columns had an unusual character ‘↓’ in its name, so I renamed it to ‘OVA’ for clarity. This column holds each player's overall rating (OVA) in the game.
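A short sketch of the rename, assuming the raw column was called '↓OVA' (the exact original name is my assumption; the article only says it contained '↓'):

```python
import pandas as pd

# '↓OVA' is an assumed raw column name; the values are made up.
df = pd.DataFrame({"Name": ["Player A", "Player B"], "↓OVA": [93, 92]})

# Map the old name to the new one; other columns are left untouched.
df = df.rename(columns={"↓OVA": "OVA"})

print(list(df.columns))
```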
Handling Units and Conversions
Some columns, such as ‘Height’, ‘Weight’, ‘Value’, and ‘Wage’, were in different units and formats. I wrote functions to convert these values to a consistent format. For example, I converted heights to centimeters, weights to kilograms, and monetary values to euros.
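The conversion functions might look something like the sketch below. The input formats are assumptions on my part for illustration: heights as either "170cm" or feet and inches like 5'7", weights as "72kg" or "159lbs", and money as strings like "€103.5M" or "€560K". The function names are hypothetical as well.

```python
import re

import pandas as pd


def height_to_cm(value: str) -> float:
    """Convert '170cm' or feet/inches like 5'7" to centimeters."""
    value = value.strip()
    if value.endswith("cm"):
        return float(value[:-2])
    match = re.fullmatch(r"(\d+)'(\d+)\"?", value)
    if match is None:
        raise ValueError(f"Unrecognized height format: {value!r}")
    feet, inches = int(match.group(1)), int(match.group(2))
    return round((feet * 12 + inches) * 2.54, 1)


def weight_to_kg(value: str) -> float:
    """Convert '72kg' or '159lbs' to kilograms."""
    value = value.strip()
    if value.endswith("kg"):
        return float(value[:-2])
    if value.endswith("lbs"):
        return round(float(value[:-3]) * 0.45359237, 1)
    raise ValueError(f"Unrecognized weight format: {value!r}")


def money_to_euros(value: str) -> int:
    """Convert '€103.5M', '€560K', or '€0' to a whole number of euros."""
    text = value.strip().replace("€", "")
    multipliers = {"M": 1_000_000, "K": 1_000}
    if text and text[-1] in multipliers:
        return int(round(float(text[:-1]) * multipliers[text[-1]]))
    return int(round(float(text)))


# Made-up sample rows in the assumed raw formats.
df = pd.DataFrame({
    "Height": ["170cm", "5'7\""],
    "Weight": ["72kg", "159lbs"],
    "Value": ["€103.5M", "€560K"],
    "Wage": ["€560K", "€0"],
})

df["Height"] = df["Height"].map(height_to_cm)
df["Weight"] = df["Weight"].map(weight_to_kg)
df["Value"] = df["Value"].map(money_to_euros)
df["Wage"] = df["Wage"].map(money_to_euros)

print(df)
```

Raising on an unrecognized format, rather than returning NaN, makes unexpected values surface immediately instead of silently leaking into the analysis.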
Cleaning Categorical Columns
Columns like ‘W/F’ (Weak Foot), ‘SM’ (Skill Moves), ‘A/W’ (Attacking Work Rate), and ‘IR’ (International Reputation) contained special characters (‘★’) or text values. I removed the special characters, converted the values to numeric format, and renamed the columns for clarity.
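One way this step could be sketched is shown below. Several details are assumptions for illustration: that the star columns hold strings like "4 ★", that 'A/W' holds Low/Medium/High labels, the 1 to 3 encoding of those labels, and the new column names.

```python
import pandas as pd

# Made-up rows in the assumed raw formats.
df = pd.DataFrame({
    "W/F": ["4 ★", "3 ★"],
    "SM": ["5★", "2★"],
    "IR": ["5 ★", "1 ★"],
    "A/W": ["High", "Medium"],
})

# Strip the star symbol and surrounding whitespace, then cast to int.
star_columns = ["W/F", "SM", "IR"]
for col in star_columns:
    df[col] = (
        df[col].str.replace("★", "", regex=False).str.strip().astype(int)
    )

# Assumed ordinal encoding for the text labels.
work_rate_levels = {"Low": 1, "Medium": 2, "High": 3}
df["A/W"] = df["A/W"].map(work_rate_levels)

# Hypothetical descriptive names; the article does not list the real ones.
df = df.rename(columns={
    "W/F": "WeakFoot",
    "SM": "SkillMoves",
    "IR": "InternationalReputation",
    "A/W": "AttackingWorkRate",
})

print(df)
```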
Cleaning and Converting Numeric Columns
I also performed some basic data cleaning on numeric columns to remove any extraneous characters. Additionally, I handled the ‘Hits’ column, which contained numeric values with ‘K’ representing thousands. I converted these values to pure integers for consistency.
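For the ‘Hits’ column, a conversion along these lines would work, assuming values look like "771" or "1.6K" (the helper name and sample values are hypothetical):

```python
import pandas as pd


def hits_to_int(value) -> int:
    """Convert '771' or '1.6K' style strings to a plain integer."""
    text = str(value).strip()
    if text.upper().endswith("K"):
        # round() guards against floating-point noise, e.g. 1.6 * 1000.
        return int(round(float(text[:-1]) * 1000))
    return int(float(text))


# Made-up sample values in the assumed raw format.
df = pd.DataFrame({"Hits": ["771", "1.6K", "12K"]})
df["Hits"] = df["Hits"].map(hits_to_int)

print(df["Hits"].tolist())
```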
Final Touches
After applying all the necessary transformations and cleaning steps, I dropped a few more irrelevant columns that were not needed for the analysis.
Conclusion
In this data analyst project, I successfully cleaned and preprocessed the FIFA 2021 player dataset, transforming it from a raw, messy state into a clean and structured format that is now ready for in-depth analysis and visualization. By handling missing values, converting units, and cleaning categorical and numeric columns, I ensured that the dataset is both accurate and meaningful. This project showcased my skills in data cleaning, manipulation, and transformation using Python and pandas, and I am excited to use this clean dataset for future analyses and visualizations.
Thank you for joining me on this journey of data cleaning and preprocessing in the world of FIFA 2021! If you’re interested in exploring the cleaned dataset or learning more about the specific code used in this project, feel free to check out my GitHub for the full code and resources.
Thanks for Reading!
If you enjoyed this, follow me to never miss another article!
If you have any questions, feel free to ask!