
Predicting red wine quality using chemical features and machine learning models with evaluation and visualization.


tdevelope/wine-quality


🍷 Wine Quality Analysis (Red Wine)

Python Jupyter Notebook License: MIT


🔍 Project Overview

This project explores the chemical properties of red wines and their impact on wine quality. Using machine learning models, we aim to predict wine quality based on chemical features, understand feature importance, and evaluate model performance.


📊 Dataset

  • Source: Wine Quality Dataset (Red Wine)
  • Samples: 1599
  • Features: 11 chemical properties plus the target variable, quality
  • Goal: Predict wine quality (score 0–10)
  • Notes: All features are numeric and there are no missing values; some features are skewed and require preprocessing.
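The dataset notes above can be checked quickly with pandas. This is a minimal sketch, not the notebook's code: the helper name dataset_summary is hypothetical, and the file name in the commented load line is an assumption (the UCI copy of the red-wine data, winequality-red.csv, uses ";" as its separator).

```python
import pandas as pd


def dataset_summary(df: pd.DataFrame, target: str = "quality") -> dict:
    """Sanity checks for the dataset notes: size, dtypes, missing values, skew."""
    features = df.drop(columns=[target])
    return {
        "n_samples": len(df),
        "n_features": features.shape[1],
        # True only if every feature column has a numeric dtype
        "all_numeric": all(pd.api.types.is_numeric_dtype(dt) for dt in features.dtypes),
        "n_missing": int(df.isna().sum().sum()),
        # Sample skewness per feature, most right-skewed first;
        # large positive values flag candidates for a log transform
        "skew": features.skew().sort_values(ascending=False),
    }


# Hypothetical path; adjust to wherever the CSV lives in your checkout.
# df = pd.read_csv("winequality-red.csv", sep=";")
# summary = dataset_summary(df)
```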

🧪 Data Exploration & Preprocessing

  • Histograms & Boxplots: Check distributions and detect outliers.
  • Log Transformation: Applied to the skewed features residual sugar, free sulfur dioxide, and total sulfur dioxide.
  • Feature Engineering:
    • sulfur_ratio = free sulfur dioxide / total sulfur dioxide
    • alcohol_acidity = alcohol / fixed acidity
  • Scaling: RobustScaler (median centering, interquartile-range scaling) applied to limit the influence of outliers.
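The preprocessing steps above could look roughly like the sketch below. Several details are assumptions rather than the notebook's confirmed choices: the ratio features are computed on raw values before the log transform, np.log1p is used instead of a plain log, and the scaler is fitted on the training split only. The helper names add_features and scale_train_test are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import RobustScaler

# Columns named in the README as receiving a log transformation
LOG_COLUMNS = ["residual sugar", "free sulfur dioxide", "total sulfur dioxide"]


def add_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add the two engineered ratios, then log-transform the skewed columns."""
    out = df.copy()  # leave the caller's DataFrame untouched
    # Assumption: ratios use the raw (untransformed) values
    out["sulfur_ratio"] = out["free sulfur dioxide"] / out["total sulfur dioxide"]
    out["alcohol_acidity"] = out["alcohol"] / out["fixed acidity"]
    # Assumption: log1p (log(1 + x)) so the transform stays defined at zero
    out[LOG_COLUMNS] = np.log1p(out[LOG_COLUMNS])
    return out


def scale_train_test(X_train, X_test):
    """Fit RobustScaler on the training split only, then apply it to both splits."""
    # Default RobustScaler: subtract the median, divide by the IQR (25th to 75th percentile)
    scaler = RobustScaler()
    return scaler.fit_transform(X_train), scaler.transform(X_test), scaler
```

Fitting the scaler only on training rows keeps test-set statistics out of the preprocessing, so evaluation scores are not optimistically biased.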

🤖 Model Training & Evaluation

Models Used:

| Model | Description |
|---|---|
| Linear Regression | Baseline regression model |
| Random Forest Regressor | Ensemble model that handles non-linear relationships |
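Training and comparing the two models can be sketched as follows. The 80/20 hold-out split, the random seed, and n_estimators=200 are illustrative assumptions, not the notebook's settings; train_and_compare is a hypothetical helper name.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split


def train_and_compare(X, y, random_state=42):
    """Fit both models on one hold-out split and report test RMSE for each."""
    # Assumption: 80/20 split with a fixed seed for reproducibility
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=random_state
    )
    models = {
        "Linear Regression": LinearRegression(),
        # n_estimators=200 is an illustrative value
        "Random Forest": RandomForestRegressor(n_estimators=200, random_state=random_state),
    }
    rmse = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        rmse[name] = float(np.sqrt(mean_squared_error(y_test, model.predict(X_test))))
    return models, rmse
```

The linear model serves as a baseline: if the forest's RMSE is not clearly lower, the extra complexity is not buying much.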

Metrics:

| Metric | Ideal |
|---|---|
| RMSE | Lower is better |
| R² | Higher is better |
| MAE | Lower is better |
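For reference, the three metrics follow their standard definitions: RMSE is the square root of the mean squared residual, MAE is the mean absolute residual, and R² is 1 minus the ratio of the residual sum of squares to the total sum of squares. A dependency-free sketch (regression_metrics is a hypothetical helper; scikit-learn's mean_squared_error, mean_absolute_error and r2_score compute the same quantities):

```python
import math


def regression_metrics(y_true, y_pred):
    """RMSE, MAE and R² computed directly from their definitions."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    ss_res = sum(r * r for r in residuals)               # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total sum of squares
    if ss_tot == 0:
        raise ValueError("R² is undefined when y_true is constant")
    return {
        "RMSE": math.sqrt(ss_res / n),
        "MAE": sum(abs(r) for r in residuals) / n,
        "R2": 1.0 - ss_res / ss_tot,
    }


m = regression_metrics([5, 6, 7, 5], [5.5, 6, 6, 5])
# MAE = 0.375, R2 ≈ 0.545, RMSE ≈ 0.559
```

RMSE penalizes large errors more heavily than MAE, so reporting both shows whether a model's error is dominated by a few badly mispredicted wines.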

Highlights:

  • Random Forest slightly outperformed Linear Regression.
  • Cross-validation confirmed stable generalization.
  • Most important features: alcohol and sulphates.
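The cross-validation check and the feature ranking can be sketched with scikit-learn as below. Five shuffled folds and the helper names cv_rmse and feature_importance are assumptions; the ranking uses the forest's impurity-based feature_importances_ attribute, which may not be exactly how the notebook ranks features.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_score


def cv_rmse(model, X, y, n_splits=5, random_state=42):
    """Per-fold RMSE from shuffled K-fold cross-validation."""
    cv = KFold(n_splits=n_splits, shuffle=True, random_state=random_state)
    # scikit-learn scorers follow "higher is better", so MSE comes back negated
    scores = cross_val_score(model, X, y, cv=cv, scoring="neg_mean_squared_error")
    return np.sqrt(-scores)


def feature_importance(forest, feature_names):
    """Impurity-based importances of a fitted forest, largest first."""
    return pd.Series(forest.feature_importances_, index=feature_names).sort_values(
        ascending=False
    )


# Usage (X: DataFrame of features, y: quality):
# rmse = cv_rmse(RandomForestRegressor(random_state=42), X, y)
# print(rmse.mean(), rmse.std())  # a small std across folds suggests stable generalization
```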

📈 Visualizations

  • Actual vs Predicted: Scatter plots
  • Residuals: Residual plots & histograms
  • Decision Trees: Visualize shallow trees from Random Forest
  • Prediction Variance: Histogram across ensemble
  • Feature Importance: Horizontal bar chart

These visualizations help interpret model behavior and assess performance.
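The prediction-variance histogram needs the spread of individual tree predictions for each sample. One way to get it, sketched below under the assumption that the model is a fitted scikit-learn RandomForestRegressor, is to query every tree in its estimators_ list; the helper names are hypothetical and plotting is left to the commented usage lines.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def per_tree_predictions(forest, X):
    """Predictions of every tree in the forest, shape (n_trees, n_samples)."""
    # The forest's internal trees are fitted on plain arrays, so pass an array
    # (avoids a feature-name warning when X is a DataFrame)
    X = np.asarray(X)
    return np.stack([tree.predict(X) for tree in forest.estimators_])


def prediction_spread(forest, X):
    """Per-sample mean and standard deviation across the ensemble."""
    preds = per_tree_predictions(forest, X)
    # A regression forest's prediction is the average of its trees
    return preds.mean(axis=0), preds.std(axis=0)


# Usage (plotting requires matplotlib):
# import matplotlib.pyplot as plt
# mean, std = prediction_spread(rf, X_test)
# plt.hist(std, bins=30); plt.xlabel("Std. of per-tree predictions"); plt.show()
```

Samples with a wide spread are those the trees disagree on, which is a rough indicator of where the model's predictions are least reliable.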


🔑 Key Insights

  • alcohol positively correlates with wine quality.
  • volatile acidity negatively correlates with quality.
  • Engineered features (sulfur_ratio) improve predictive power.
  • Some features show weak correlation, contributing less predictive value.
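The correlation statements above can be checked with pandas' DataFrame.corr, which computes Pearson correlation by default. A minimal sketch, assuming a DataFrame whose target column is named quality; correlations_with_target is a hypothetical helper.

```python
import pandas as pd


def correlations_with_target(df: pd.DataFrame, target: str = "quality") -> pd.Series:
    """Pearson correlation of each feature with the target, ordered by |r|."""
    corr = df.corr()[target].drop(target)  # drop the trivial self-correlation
    # Order by absolute strength so strong negative correlations rank high too
    return corr.reindex(corr.abs().sort_values(ascending=False).index)
```

Sorting by absolute value keeps a strong negative relationship, such as the one described for volatile acidity, near the top of the list, while weakly correlated features fall to the bottom.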

🚀 Future Work

  • Test advanced models: Gradient Boosting, XGBoost
  • Further hyperparameter tuning
  • Integrate additional datasets for more robust predictions
  • Deploy as an interactive web app for wine quality estimation

⚑ How to Run

  1. Clone the repository:
     git clone https://github.com/USERNAME/REPO_NAME.git
  2. Install dependencies:
     pip install -r requirements.txt
  3. Open the Jupyter Notebook and run all cells.

📝 License

This project is licensed under the MIT License – free to use and modify.
