🧹 SQL Data Cleaning on Global Layoffs

This project focuses on cleaning a dataset of company layoffs to prepare it for further Exploratory Data Analysis (EDA).
The data cleaning was performed entirely in SQL (MySQL).

📌 Project Overview

The raw layoffs.csv dataset contains inconsistencies such as:

- Duplicate rows
- Extra whitespace in text fields
- Inconsistent industry and country names
- Invalid or empty date values
- Missing (null or blank) values in several columns

These issues were resolved through a series of SQL transformations, producing a clean dataset:
👉 data/cleaned_data/clean_data_layoffs.csv

Dataset

Source: Public dataset of global layoffs, imported from CSV into a MySQL database.
Columns include:
- company – Name of the company
- location – City or region of the company
- industry – Industry sector
- total_laid_off – Number of employees laid off
- percentage_laid_off – Percentage of the workforce laid off
- date – Date of the layoff
- date_added – Date the record was added
- stage – Company funding stage
- country – Country of the company
- funds_raised – Funds raised by the company
- source – Source of the information
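The README does not show the import itself. As a minimal sketch, assuming a hypothetical database named `layoffs_db`, a raw table named `layoffs`, a CSV whose columns follow the order listed above, and `local_infile` enabled on both the MySQL server and client, the raw file could be staged with every column as text so malformed values survive until cleaning:

```sql
-- layoffs_db is a hypothetical database name.
CREATE DATABASE IF NOT EXISTS layoffs_db;
USE layoffs_db;

-- Every column is TEXT so malformed values load without errors;
-- proper types are applied later during cleaning.
CREATE TABLE layoffs (
  company             TEXT,
  location            TEXT,
  industry            TEXT,
  total_laid_off      TEXT,
  percentage_laid_off TEXT,
  `date`              TEXT,
  date_added          TEXT,
  stage               TEXT,
  country             TEXT,
  funds_raised        TEXT,
  `source`            TEXT
);

-- Assumes the CSV columns appear in the order above, with one header row.
-- The path is relative to the directory the mysql client was started from.
-- Use LINES TERMINATED BY '\r\n' if the file has Windows line endings.
LOAD DATA LOCAL INFILE 'data/layoffs.csv'
INTO TABLE layoffs
  FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
  LINES TERMINATED BY '\n'
  IGNORE 1 LINES;
```

Loading everything as TEXT means blank cells arrive as empty strings rather than NULL, which is why the cleaning steps convert them explicitly.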

⚙️ Steps in Data Cleaning

1. Back up the Original Data

Created a backup table layoffs_duplicate to preserve the original dataset before manipulation.
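In MySQL this can be sketched in two statements, assuming the raw data was imported into a table named `layoffs` (an assumption; the README names only the backup table):

```sql
-- Copy the table definition first, then every row, so an untouched
-- copy of the raw data is kept before any cleaning starts.
CREATE TABLE layoffs_duplicate LIKE layoffs;

INSERT INTO layoffs_duplicate
SELECT *
FROM layoffs;
```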

2. Remove Duplicates

Numbered the rows with ROW_NUMBER() in a CTE, partitioned by all key columns, so that every row numbered above 1 is a duplicate.
Removed duplicates by keeping only the first occurrence (row number 1).
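MySQL does not allow deleting rows through a CTE, so a common pattern is to materialize the row numbers into a working table with an extra `rn` column and delete from that. The sketch below assumes the raw table is `layoffs`, uses `layoffs_clean` as a hypothetical name for the working table, and partitions on all eleven columns listed in the Dataset section:

```sql
-- layoffs_clean is a hypothetical working-table name.
-- It has every column of layoffs plus rn, appended as the last column.
CREATE TABLE layoffs_clean LIKE layoffs;
ALTER TABLE layoffs_clean ADD COLUMN rn INT;

-- Rows that agree on every listed column share a partition; the first gets
-- rn = 1 and each extra copy gets 2, 3, ... Without ORDER BY the choice of
-- "first" is arbitrary, which is harmless because the copies are identical.
INSERT INTO layoffs_clean
WITH numbered AS (
  SELECT l.*,
         ROW_NUMBER() OVER (
           PARTITION BY l.company, l.location, l.industry, l.total_laid_off,
                        l.percentage_laid_off, l.`date`, l.date_added, l.stage,
                        l.country, l.funds_raised, l.`source`
         ) AS rn
  FROM layoffs AS l
)
SELECT * FROM numbered;

-- Keep only the first occurrence of each group.
-- MySQL Workbench's safe-update mode may block this statement;
-- SET SQL_SAFE_UPDATES = 0; disables it for the current session.
DELETE FROM layoffs_clean
WHERE rn > 1;
```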

3. Standardize Data

Trimmed whitespace from company names.
Standardized country names (e.g., united arab emirates → UAE).
Converted date columns (date, date_added) to SQL DATE type.
Converted numeric columns:
- total_laid_off → BIGINT
- percentage_laid_off → DECIMAL(5,1)
- funds_raised → BIGINT
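A sketch of these standardization steps, run against a hypothetical working table `layoffs_clean` whose columns are still text. The date format `'%m/%d/%Y'` is an assumption; inspect the raw values and adjust the format string per column before running it:

```sql
-- layoffs_clean is a hypothetical working table; its columns are still TEXT here.

-- 1) Strip leading and trailing spaces from company names.
UPDATE layoffs_clean
SET company = TRIM(company);

-- 2) Collapse one country spelling onto a canonical value.
UPDATE layoffs_clean
SET country = 'UAE'
WHERE LOWER(TRIM(country)) = 'united arab emirates';

-- 3) Parse text dates. NULLIF turns blanks into NULL first, because
--    STR_TO_DATE('') can raise an error under strict SQL mode in an UPDATE.
--    '%m/%d/%Y' is an assumed format; change it if the raw data differs.
UPDATE layoffs_clean
SET `date`     = STR_TO_DATE(NULLIF(TRIM(`date`), ''), '%m/%d/%Y'),
    date_added = STR_TO_DATE(NULLIF(TRIM(date_added), ''), '%m/%d/%Y');

-- 4) Blank strings cannot be cast to numbers under strict mode,
--    so make them NULL before changing the column types.
UPDATE layoffs_clean
SET total_laid_off      = NULLIF(TRIM(total_laid_off), ''),
    percentage_laid_off = NULLIF(TRIM(percentage_laid_off), ''),
    funds_raised        = NULLIF(TRIM(funds_raised), '');

-- 5) Apply the target types. Under strict mode this fails if any value is
--    still not a valid date or number, which surfaces leftover dirty rows.
ALTER TABLE layoffs_clean
  MODIFY COLUMN `date`              DATE,
  MODIFY COLUMN date_added          DATE,
  MODIFY COLUMN total_laid_off      BIGINT,
  MODIFY COLUMN percentage_laid_off DECIMAL(5,1),
  MODIFY COLUMN funds_raised        BIGINT;
```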

4. Handle Null Values

Converted empty strings to NULL.
Populated missing country values based on the corresponding location.
Checked and handled nulls in the industry column.
Deleted rows where both total_laid_off and percentage_laid_off were null.
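The null-handling steps could look like the following sketch, again against the hypothetical working table `layoffs_clean`. The country fill uses a self-join on `location`; the industry fill by `company` is an assumed approach, since the README only says industry nulls were checked and handled:

```sql
-- layoffs_clean is a hypothetical working table.

-- Blank strings in categorical columns become real NULLs.
UPDATE layoffs_clean
SET industry = NULLIF(TRIM(industry), ''),
    country  = NULLIF(TRIM(country), '');

-- Fill a missing country from another row with the same location.
-- If a location maps to several countries, MySQL picks one arbitrarily.
UPDATE layoffs_clean AS t1
JOIN layoffs_clean AS t2
  ON t1.location = t2.location
SET t1.country = t2.country
WHERE t1.country IS NULL
  AND t2.country IS NOT NULL;

-- Assumed approach: fill a missing industry from another row of the same company.
UPDATE layoffs_clean AS t1
JOIN layoffs_clean AS t2
  ON t1.company = t2.company
SET t1.industry = t2.industry
WHERE t1.industry IS NULL
  AND t2.industry IS NOT NULL;

-- Rows with neither layoff figure carry no usable layoff information.
DELETE FROM layoffs_clean
WHERE total_laid_off IS NULL
  AND percentage_laid_off IS NULL;
```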

5. Drop Unnecessary Columns

Dropped temporary helper columns, such as the rn column used for deduplication.
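Assuming the deduplication helper lives in a working table named `layoffs_clean` (a hypothetical name), removing it is a single statement:

```sql
-- rn only flagged duplicates during deduplication; it is not part of the dataset.
ALTER TABLE layoffs_clean
  DROP COLUMN rn;
```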

6. Final Output

Exported the cleaned dataset to data/cleaned_data/clean_data_layoffs.csv.
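The README does not say how the export was done (MySQL Workbench's export wizard is one option). A server-side alternative is `SELECT ... INTO OUTFILE`, sketched here against the hypothetical table `layoffs_clean`; the output path is only an example and must lie inside the directory allowed by the server's `secure_file_priv` setting:

```sql
-- layoffs_clean is a hypothetical table name; the path is an example only.
-- The file is written on the MySQL server host, the directory must be
-- permitted by secure_file_priv, the user needs the FILE privilege,
-- and the file must not already exist.
SELECT company, location, industry, total_laid_off, percentage_laid_off,
       `date`, date_added, stage, country, funds_raised, `source`
FROM layoffs_clean
INTO OUTFILE '/var/lib/mysql-files/clean_data_layoffs.csv'
  FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
  LINES TERMINATED BY '\n';
```

`INTO OUTFILE` writes no header row and encodes NULL as `\N`, so the file may need a header line added before it is copied into data/cleaned_data/.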

📂 Repository Structure

SQL-Data-Cleaning-on-Global-Layoffs-Dataset/
│── README.md
│── Data_Cleaning_Project_using_layoffs_data.sql
│
├── data/
│   ├── layoffs.csv # Raw dataset
│   └── cleaned_data/
│       └── clean_data_layoffs.csv # Final cleaned dataset
│
├── images/
│   ├── before_cleaning.png
│   └── after_cleaning.png

🖼️ Before vs After Cleaning

Before cleaning: images/before_cleaning.png | After cleaning: images/after_cleaning.png

🚀 How to Use

Clone this repository:

git clone https://github.com/SAHFEERULWASIHF/SQL-Data-Cleaning-on-Global-Layoffs-Dataset.git
cd SQL-Data-Cleaning-on-Global-Layoffs-Dataset

Run the SQL script from the MySQL command-line client:

source Data_Cleaning_Project_using_layoffs_data.sql;

Access the cleaned data from:

data/cleaned_data/clean_data_layoffs.csv

🛠️ Tools & Technologies

SQL (MySQL) – For data cleaning, transformation, and manipulation.
SQL statements and functions used:
- CREATE TABLE / INSERT INTO
- ROW_NUMBER() with OVER(PARTITION BY ...) for deduplication
- UPDATE and TRIM() for standardization
- ALTER TABLE for modifying data types
- DELETE to remove incomplete records

Final Outcome

Cleaned dataset: clean_data_layoffs.csv
Row count after cleaning: 3,458 rows
Dataset ready for Exploratory Data Analysis (EDA) and visualization.
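The row count and the absence of duplicates can be re-checked with two queries like these, assuming the final table is named `layoffs_clean` (a hypothetical name):

```sql
-- layoffs_clean is a hypothetical name for the final table.

-- Total rows after cleaning.
SELECT COUNT(*) AS row_count
FROM layoffs_clean;

-- Any group returned here is a remaining duplicate; an empty result means none.
SELECT company, location, `date`, COUNT(*) AS copies
FROM layoffs_clean
GROUP BY company, location, industry, total_laid_off, percentage_laid_off,
         `date`, date_added, stage, country, funds_raised, `source`
HAVING COUNT(*) > 1;
```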

📊 Future Work

Perform EDA on the cleaned dataset.

Create visual dashboards (Power BI / Tableau).

Apply predictive analytics on layoff trends.

✨ Author

F SAHFEERUL WASIHF

🎓 Graduate (Fresher)
📊 Interested in Data Analytics, Software Development, and Data Cleaning
🛠️ Skills: SQL, Python, Data Visualization, Data Wrangling
🔗 GitHub | LinkedIn | Portfolio
📧 Contact: wasihfwork@gmail.com
