This project explores the chemical properties of red wines and their impact on wine quality. Using machine learning models, we aim to predict wine quality based on chemical features, understand feature importance, and evaluate model performance.
- Source: Wine Quality Dataset (Red Wine)
- Samples: 1599
- Features: 11 chemical properties + target `quality`
- Goal: Predict the wine quality (score 0–10)
- Notes: All features are numeric, no missing values. Some features are skewed and require preprocessing.
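
The dataset properties listed above can be checked with a few lines of pandas. This is a minimal sketch, not the project's actual loading code: the semicolon separator and the column names follow the public UCI release of the red wine data (an assumption about this repository's file), the helper name `load_and_check` is hypothetical, and the four rows in `SAMPLE_CSV` are illustrative values used only to make the example self-contained.

```python
import io

import pandas as pd

# Illustrative rows in the assumed semicolon-separated layout.
# In practice, pass the path of the real CSV file instead.
SAMPLE_CSV = (
    "fixed acidity;volatile acidity;citric acid;residual sugar;chlorides;"
    "free sulfur dioxide;total sulfur dioxide;density;pH;sulphates;alcohol;quality\n"
    "7.4;0.70;0.00;1.9;0.076;11.0;34.0;0.9978;3.51;0.56;9.4;5\n"
    "7.8;0.88;0.02;2.6;0.098;25.0;67.0;0.9968;3.20;0.68;9.8;5\n"
    "11.2;0.28;0.56;1.9;0.075;17.0;60.0;0.9980;3.16;0.58;9.8;6\n"
    "7.3;0.65;0.04;1.2;0.065;15.0;21.0;0.9946;3.39;0.47;10.0;7\n"
)


def load_and_check(source, sep=";"):
    """Read the CSV and summarise the properties described in the list above."""
    df = pd.read_csv(source, sep=sep)
    summary = {
        "n_samples": len(df),
        "n_features": df.shape[1] - 1,  # every column except `quality`
        "all_numeric": all(pd.api.types.is_numeric_dtype(t) for t in df.dtypes),
        "n_missing": int(df.isna().sum().sum()),
        # Large positive skewness marks candidates for a log transform.
        "skewness": df.drop(columns="quality").skew().sort_values(ascending=False),
    }
    return df, summary


df, summary = load_and_check(io.StringIO(SAMPLE_CSV))
print(summary["n_samples"], summary["n_features"], summary["n_missing"])
```

On the full dataset, `n_samples` should come out as 1599 and `n_missing` as 0 if the file matches the description above.
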
- Histograms & Boxplots: Check distributions and detect outliers.
- Log Transformation: Applied to `residual sugar`, `free sulfur dioxide`, `total sulfur dioxide`.
- Feature Engineering:
  - `sulfur_ratio = free sulfur dioxide / total sulfur dioxide`
  - `alcohol_acidity = alcohol / fixed acidity`
- Scaling: RobustScaler applied to handle outliers.
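
The preprocessing steps above could be sketched roughly as follows. Several details are assumptions rather than the notebook's confirmed choices: `np.log1p` is used as the log transform because it stays defined at zero, the ratio features are computed on the raw values before the log transform, and the helper name `preprocess` and the toy rows are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import RobustScaler

LOG_COLUMNS = ["residual sugar", "free sulfur dioxide", "total sulfur dioxide"]


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Return a scaled feature frame following the steps listed above."""
    features = df.drop(columns="quality", errors="ignore").copy()

    # Engineered ratios, computed on raw values (ordering is an assumption).
    features["sulfur_ratio"] = (
        features["free sulfur dioxide"] / features["total sulfur dioxide"]
    )
    features["alcohol_acidity"] = features["alcohol"] / features["fixed acidity"]

    # log1p(x) = log(1 + x) compresses the long right tail of skewed columns.
    features[LOG_COLUMNS] = np.log1p(features[LOG_COLUMNS])

    # RobustScaler centres each column on its median and divides by its
    # interquartile range, so extreme values have limited influence.
    scaled = RobustScaler().fit_transform(features)
    return pd.DataFrame(scaled, columns=features.columns, index=features.index)


# Illustrative toy rows with only the columns this sketch needs.
toy = pd.DataFrame({
    "fixed acidity": [7.4, 7.8, 11.2, 7.3, 8.1],
    "residual sugar": [1.9, 2.6, 1.9, 1.2, 6.1],
    "free sulfur dioxide": [11.0, 25.0, 17.0, 15.0, 9.0],
    "total sulfur dioxide": [34.0, 67.0, 60.0, 21.0, 18.0],
    "alcohol": [9.4, 9.8, 9.8, 10.0, 11.5],
    "quality": [5, 5, 6, 7, 6],
})
X = preprocess(toy)
print(X.round(2))
```

When a train/test split is involved, the scaler should be fitted on the training portion only and then applied to the test portion, so that test statistics do not leak into training.
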
| Model | Description |
|---|---|
| Linear Regression | Baseline regression model |
| Random Forest Regressor | Ensemble model, handles non-linear relationships |
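
As a rough illustration of how the two models in the table might be trained side by side, the sketch below fits both on synthetic stand-in data. The data, the 80/20 split, `n_estimators=100`, and the random seeds are all assumptions made for the example, not the project's settings.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 11 features and a mildly non-linear target.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 11))
y = 5.5 + 0.8 * X[:, 0] - 0.5 * X[:, 1] ** 2 + rng.normal(scale=0.3, size=300)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

models = {
    "Linear Regression": LinearRegression(),
    "Random Forest Regressor": RandomForestRegressor(
        n_estimators=100, random_state=42
    ),
}

# Fit each model on the training split and keep its test-set predictions.
predictions = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    predictions[name] = model.predict(X_test)
```

Keeping the models in a dictionary makes it straightforward to loop over them again when computing evaluation metrics.
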
| Metric | Ideal |
|---|---|
| RMSE | Lower is better |
| R² | Higher is better |
| MAE | Lower is better |
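
To make the three metrics concrete, here is a small dependency-free sketch of their standard definitions. The function names and the sample scores are illustrative only; scikit-learn's `sklearn.metrics` module offers `mean_absolute_error` and `r2_score` for the same quantities.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error: penalises large errors more heavily."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error: average size of the error in quality points."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Share of the target's variance explained by the predictions."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Illustrative values, not results from this project.
y_true = [5, 6, 7, 5, 6]
y_pred = [5.5, 6.0, 6.5, 5.0, 6.5]

print(f"RMSE={rmse(y_true, y_pred):.3f}")
print(f"MAE={mae(y_true, y_pred):.3f}")
print(f"R2={r2(y_true, y_pred):.3f}")
```

RMSE and MAE are expressed in the same units as the quality score, while R² is unitless and equals 1 for perfect predictions.
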
Highlights:
- Random Forest slightly outperformed Linear Regression.
- Cross-validation confirmed stable generalization.
- Most important features: `alcohol` and `sulphates`.
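
Cross-validation and feature importance, as mentioned in the highlights, could be computed along these lines. The example runs on synthetic data with neutral column names (`f0` to `f3`), where the target is constructed by hand, so the resulting ranking is true by construction and says nothing about the wine data. The 5-fold setup and the RMSE scoring are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

# Synthetic stand-in: f0 drives the target strongly, f1 weakly, f2/f3 not at all.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(300, 4)), columns=["f0", "f1", "f2", "f3"])
y = 2.0 * X["f0"] + 0.5 * X["f1"] + rng.normal(scale=0.2, size=300)

model = RandomForestRegressor(n_estimators=100, random_state=0)

# scikit-learn reports the negated RMSE so that "higher is better" holds
# for every scorer; flip the sign to get the RMSE of each fold.
scores = cross_val_score(
    model, X, y, cv=5, scoring="neg_root_mean_squared_error"
)
rmse_per_fold = -scores
print(f"RMSE per fold: {rmse_per_fold.round(3)}")
print(f"mean={rmse_per_fold.mean():.3f}, std={rmse_per_fold.std():.3f}")

# Impurity-based importances are available after fitting and sum to 1.
model.fit(X, y)
importances = pd.Series(
    model.feature_importances_, index=X.columns
).sort_values(ascending=False)
print(importances)
```

A small standard deviation across folds relative to the mean is one way to read "stable generalization". Impurity-based importances can favour high-cardinality features, so they are best treated as a rough ranking.
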
- Actual vs Predicted: Scatter plots
- Residuals: Residual plots & histograms
- Decision Trees: Visualize shallow trees from Random Forest
- Prediction Variance: Histogram across ensemble
- Feature Importance: Horizontal bar chart
These visualizations help interpret model behavior and assess performance.
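
The quantities behind several of these plots can be derived directly from a fitted forest. The sketch below uses synthetic data and assumed hyperparameters, and computes the arrays only; plotting is left out to keep it short.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (not the wine dataset).
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 5))
y = 5.5 + X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.3, size=300)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=1
)

forest = RandomForestRegressor(n_estimators=50, random_state=1)
forest.fit(X_train, y_train)
y_pred = forest.predict(X_test)

# Residuals: the quantity shown in residual plots and histograms.
residuals = y_test - y_pred

# One row of predictions per tree. The spread down each column shows how
# much the individual trees disagree about a given sample.
per_tree = np.stack([tree.predict(X_test) for tree in forest.estimators_])
prediction_std = per_tree.std(axis=0)

print(f"mean residual: {residuals.mean():.3f}")
print(f"mean per-sample std across trees: {prediction_std.mean():.3f}")
```

With matplotlib, `plt.scatter(y_test, y_pred)` gives the actual vs predicted view, and `plt.hist(residuals)` and `plt.hist(prediction_std)` give the two histograms. For the tree diagrams, `sklearn.tree.plot_tree` accepts a single estimator such as `forest.estimators_[0]` and a `max_depth` argument to limit what is drawn.
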
- `alcohol` positively correlates with wine quality.
- `volatile acidity` negatively correlates with quality.
- Engineered features (`sulfur_ratio`) improve predictive power.
- Some features show weak correlation, contributing less predictive value.
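
Correlations of this kind are typically read off a Pearson correlation matrix. The sketch below shows the mechanics on synthetic data: the columns borrow the dataset's names, but the values are generated with the signs built in by hand, so the output illustrates the method and is not evidence for the findings above. The `unrelated` column is a hypothetical noise feature.

```python
import numpy as np
import pandas as pd

# Synthetic data with a positive alcohol effect and a negative
# volatile-acidity effect built in by construction.
rng = np.random.default_rng(2)
n = 200
alcohol = rng.normal(10.4, 1.0, n)
volatile_acidity = rng.normal(0.5, 0.15, n)
unrelated = rng.normal(0.0, 1.0, n)
quality = 0.4 * alcohol - 2.0 * volatile_acidity + rng.normal(0.0, 0.5, n)

df = pd.DataFrame({
    "alcohol": alcohol,
    "volatile acidity": volatile_acidity,
    "unrelated": unrelated,
    "quality": quality,
})

# Pearson correlation of every feature with the target, most negative first.
correlations = df.corr()["quality"].drop("quality").sort_values()
print(correlations.round(2))
```

Pearson correlation only captures linear association, so a feature with a weak coefficient can still be useful to a non-linear model such as a random forest.
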
- Test advanced models: Gradient Boosting, XGBoost
- Further hyperparameter tuning
- Integrate additional datasets for more robust predictions
- Deploy as interactive web app for wine quality estimation
1. Clone the repository: `git clone https://github.com/USERNAME/REPO_NAME.git`
2. Install dependencies: `pip install -r requirements.txt`
3. Open the Jupyter Notebook and run all cells.
This project is licensed under the MIT License: free to use and modify.