This project focuses on cleaning a dataset of company layoffs to prepare it for further Exploratory Data Analysis (EDA).
The data cleaning was performed entirely in SQL (MySQL).
The raw layoffs.csv dataset contains inconsistencies such as:
- Duplicate rows
- Extra spaces in text fields
- Inconsistent industry and country naming
- Invalid or empty date formats
- Null values in certain columns
Through systematic SQL transformations, these issues were resolved and a clean dataset was produced:
👉 cleaned_data/clean_data_layoffs.csv
- Source: Public dataset of global layoffs (CSV imported into SQL database).
- Columns include:
  - `company` – Name of the company
  - `location` – City or region of the company
  - `industry` – Industry sector
  - `total_laid_off` – Number of employees laid off
  - `percentage_laid_off` – Layoff percentage
  - `date` – Date of layoff
  - `date_added` – Date when record was added
  - `stage` – Company funding stage
  - `country` – Country of the company
  - `funds_raised` – Funds raised by the company
  - `source` – Information source
- Created a backup table `layoffs_duplicate` to preserve the original dataset before manipulation.
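A backup like this is typically created in MySQL with a schema copy plus a bulk insert. A minimal sketch, assuming the raw table is named `layoffs` (the actual table name is not stated in this README):

```sql
-- Create an empty copy with the same schema as the raw table,
-- then copy every row into it as a safety net before cleaning.
CREATE TABLE layoffs_duplicate LIKE layoffs;

INSERT INTO layoffs_duplicate
SELECT * FROM layoffs;
```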
- Identified duplicates using `ROW_NUMBER()` in a CTE partitioned by all key columns.
- Removed duplicates by keeping only the first occurrence.
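Because MySQL does not allow deleting directly from a CTE, a common pattern is to materialise the row numbers into a working table and then delete the repeats. A sketch under assumed table and column names (`layoffs_clean` is hypothetical; the real script partitions by all key columns):

```sql
-- Number the rows in each duplicate group; rn > 1 marks repeat rows.
CREATE TABLE layoffs_clean AS
SELECT *,
       ROW_NUMBER() OVER (
           PARTITION BY company, location, industry,
                        total_laid_off, percentage_laid_off, `date`
       ) AS rn
FROM layoffs_duplicate;

-- Keep only the first occurrence of each group.
DELETE FROM layoffs_clean WHERE rn > 1;
```

The temporary `rn` column this pattern leaves behind is the one dropped in the later cleanup step.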
- Trimmed whitespace from `company` names.
- Standardized country names (e.g., `united arab emirates` → `UAE`).
- Converted date columns (`date`, `date_added`) to SQL `DATE` type.
- Converted numeric columns:
  - `total_laid_off` → `BIGINT`
  - `percentage_laid_off` → `DECIMAL(5,1)`
  - `funds_raised` → `BIGINT`
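These standardization steps map onto a handful of `UPDATE` and `ALTER TABLE` statements. A sketch assuming a working table named `layoffs_clean` and a `m/d/Y` raw date format (both assumptions, not stated in this README):

```sql
-- Strip stray whitespace from company names.
UPDATE layoffs_clean SET company = TRIM(company);

-- Unify country naming.
UPDATE layoffs_clean
SET country = 'UAE'
WHERE country = 'united arab emirates';

-- Parse text dates, then change the column type (format is assumed).
UPDATE layoffs_clean SET `date` = STR_TO_DATE(`date`, '%m/%d/%Y');
ALTER TABLE layoffs_clean MODIFY COLUMN `date` DATE;

-- Tighten the numeric column types.
ALTER TABLE layoffs_clean
    MODIFY COLUMN total_laid_off BIGINT,
    MODIFY COLUMN percentage_laid_off DECIMAL(5,1),
    MODIFY COLUMN funds_raised BIGINT;
```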
- Converted empty strings to `NULL`.
- Populated missing `country` values using the corresponding `location`.
- Checked and handled nulls in the `industry` column.
- Deleted rows where both `total_laid_off` and `percentage_laid_off` were null.
- Dropped temporary columns such as `rn` used for deduplication.
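The null-handling steps above can be sketched as follows, again assuming a working table named `layoffs_clean` (a hypothetical name; the self-join backfill is one standard way to populate values from matching rows):

```sql
-- Treat empty strings as missing values.
UPDATE layoffs_clean SET industry = NULL WHERE industry = '';

-- Backfill a missing country from another row with the same location.
UPDATE layoffs_clean t1
JOIN layoffs_clean t2
  ON t1.location = t2.location
SET t1.country = t2.country
WHERE t1.country IS NULL
  AND t2.country IS NOT NULL;

-- Drop rows that carry no layoff information at all.
DELETE FROM layoffs_clean
WHERE total_laid_off IS NULL
  AND percentage_laid_off IS NULL;

-- Remove the deduplication helper column.
ALTER TABLE layoffs_clean DROP COLUMN rn;
```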
- Exported the cleaned dataset into `cleaned_data/clean_data_layoffs.csv`.
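One server-side way to produce such a CSV from MySQL is `SELECT ... INTO OUTFILE` (it requires the `FILE` privilege and a path permitted by `secure_file_priv`); the project may equally have used a client-side export, so treat this as an illustrative sketch:

```sql
-- Write the cleaned table out as a quoted, comma-separated file.
-- The output path must sit inside the server's secure_file_priv directory.
SELECT * FROM layoffs_clean
INTO OUTFILE '/var/lib/mysql-files/clean_data_layoffs.csv'
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
LINES TERMINATED BY '\n';
```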
SQL-Data-Cleaning-on-Global-Layoffs-Dataset/
│── README.md
│── Data_Cleaning_Project_using_layoffs_data.sql
│
├── data/
│ ├── layoffs.csv # Raw dataset
│ └── cleaned_data/
│ └── clean_data_layoffs.csv # Final cleaned dataset
│
├── images/
│ ├── before_cleaning.png
│ └── after_cleaning.png
| Before Cleaning | After Cleaning |
|---|---|
| ![Before cleaning](images/before_cleaning.png) | ![After cleaning](images/after_cleaning.png) |
- Clone this repository:
  `git clone https://github.com/SAHFEERULWASIHF/SQL-Data-Cleaning-on-Global-Layoffs-Dataset.git`
  `cd SQL-Data-Cleaning-on-Global-Layoffs-Dataset`
- Load the SQL script in the MySQL client:
  `source Data_Cleaning_Project_using_layoffs_data.sql;`
- Access the cleaned data from:
  `data/cleaned_data/clean_data_layoffs.csv`
- SQL (MySQL) – For data cleaning, transformation, and manipulation.
- SQL Queries Used:
  - `CREATE TABLE` / `INSERT INTO` to back up the raw data
  - `ROW_NUMBER()` with `OVER (PARTITION BY ...)` for deduplication
  - `UPDATE` and `TRIM()` for standardization
  - `ALTER TABLE` for modifying data types
  - `DELETE` to remove incomplete records
- Cleaned dataset: `clean_data_layoffs.csv`
- Row count after cleaning: 3,458 rows
- Dataset ready for Exploratory Data Analysis (EDA) and visualization.
- Perform EDA on the cleaned dataset.
- Create visual dashboards (Power BI / Tableau).
- Apply predictive analytics on layoff trends.
SAHFEERUL WASIHF
- 🎓 Graduate (Fresher)
- 📊 Interested in Data Analytics, Software Development, and Data Cleaning
- 🛠️ Skills: SQL, Python, Data Visualization, Data Wrangling
- 🔗 GitHub | LinkedIn | Portfolio
- 📧 Contact: wasihfwork@gmail.com

