---
title: The Confusion Matrix
sidebar_label: Confusion Matrix
description: "The foundation of classification evaluation: True Positives, False Positives, True Negatives, and False Negatives."
tags: [machine-learning, model-evaluation, metrics, classification, confusion-matrix]
---

A **Confusion Matrix** is a table used to describe the performance of a classification model. While "Accuracy" tells you how often the model is correct, the Confusion Matrix tells you exactly **how** it is failing and which classes are being swapped.

## 1. The 2x2 Layout

For a binary classification task (Yes/No, Spam/Ham), the matrix consists of four quadrants:

| | Predicted: **Negative** | Predicted: **Positive** |
| :--- | :--- | :--- |
| **Actual: Negative** | **True Negative (TN)** | **False Positive (FP)** |
| **Actual: Positive** | **False Negative (FN)** | **True Positive (TP)** |

### Breaking Down the Quadrants

* **True Positive (TP):** You predicted positive, and it was true. (e.g., you predicted a patient has cancer, and they do.)
* **True Negative (TN):** You predicted negative, and it was true. (e.g., you predicted a patient is healthy, and they are.)
* **False Positive (FP):** You predicted positive, but the actual class was negative. (Also known as a **Type I Error** or a "False Alarm.")
* **False Negative (FN):** You predicted negative, but the actual class was positive. (Also known as a **Type II Error** or a "Miss.")
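
These four counts can be tallied by hand. Below is a minimal sketch, assuming binary labels encoded as 0 = Negative and 1 = Positive (the label lists are the same ones used in the scikit-learn example later on):

```python
# Actual values and model predictions (0 = Negative, 1 = Positive)
y_true = [0, 1, 0, 1, 0, 1, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0, 1, 0]

# Count each quadrant by comparing actual vs. predicted labels
tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # predicted 1, actually 1
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)  # predicted 0, actually 0
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # predicted 1, actually 0 (Type I)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # predicted 0, actually 1 (Type II)

print(f"TP={tp}, TN={tn}, FP={fp}, FN={fn}")  # TP=3, TN=3, FP=1, FN=1
```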

## 2. Type I vs. Type II Errors

The "cost" of these errors depends entirely on your specific problem.

```mermaid
graph TB
    TITLE["$$\text{Type I vs. Type II Errors}$$"]

    %% Ground Truth
    TITLE --> TRUTH["$$\text{Actual Condition}$$"]
    TRUTH --> POS["$$\text{Positive (Condition Present)}$$"]
    TRUTH --> NEG["$$\text{Negative (Condition Absent)}$$"]

    %% Model Decisions
    POS --> TP["$$\text{True Positive}$$"]
    POS --> FN["$$\text{Type II Error}$$<br/>$$\text{False Negative}$$"]

    NEG --> TN["$$\text{True Negative}$$"]
    NEG --> FP["$$\text{Type I Error}$$<br/>$$\text{False Positive}$$"]

    %% Costs
    FP --> COST1["$$\text{Cost Depends on Context}$$"]
    FN --> COST2["$$\text{Cost Depends on Context}$$"]

    %% Examples
    COST1 --> EX1["$$\text{Example: Spam Filter}$$<br/>$$\text{Important Email Blocked}$$"]
    COST2 --> EX2["$$\text{Example: Medical Test}$$<br/>$$\text{Disease Missed}$$"]

    %% Emphasis
    EX1 -.->|"$$\text{Type I Cost High}$$"| FP
    EX2 -.->|"$$\text{Type II Cost High}$$"| FN
```

* **In Cancer Detection:** A **Type II Error (FN)** is much worse because a sick patient goes untreated.
* **In Spam Filtering:** A **Type I Error (FP)** is worse because an important work email is hidden in the trash.
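
One common way to make this trade-off explicit is to weight each error type by a domain-specific cost and compare models on total cost rather than on accuracy alone. A minimal sketch follows; the cost values here are illustrative assumptions, not prescriptions:

```python
# Illustrative costs: tune these to your own domain. Here a miss (FN) is
# assumed to be 50x more expensive than a false alarm (FP).
COST_FP = 1.0   # Type I error ("false alarm")
COST_FN = 50.0  # Type II error ("miss")

def total_error_cost(fp: int, fn: int) -> float:
    """Weighted cost of a model's mistakes; lower is better."""
    return fp * COST_FP + fn * COST_FN

# Model A: very few false alarms, but misses more positives.
# Model B: noisier, but rarely misses a positive.
print(total_error_cost(fp=2, fn=5))   # 252.0
print(total_error_cost(fp=10, fn=1))  # 60.0 -> Model B wins under these costs
```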

## 3. Implementation with Scikit-Learn

```python
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay
import matplotlib.pyplot as plt

# Actual values and model predictions
y_true = [0, 1, 0, 1, 0, 1, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0, 1, 0]

# 1. Generate the matrix
cm = confusion_matrix(y_true, y_pred)

# 2. Visualize it
disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=['Negative', 'Positive'])
disp.plot(cmap=plt.cm.Blues)
plt.show()
```
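
In scikit-learn's output, rows correspond to the actual class and columns to the predicted class, ordered by the sorted labels. For binary 0/1 labels this matches the table above, so the four counts can be unpacked directly with `ravel()`:

```python
from sklearn.metrics import confusion_matrix

y_true = [0, 1, 0, 1, 0, 1, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0, 1, 0]

# Rows = actual class, columns = predicted class (label order: 0, 1),
# so flattening the 2x2 matrix yields (tn, fp, fn, tp).
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TN={tn}, FP={fp}, FN={fn}, TP={tp}")  # TN=3, FP=1, FN=1, TP=3
```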

## 4. Multi-Class Confusion Matrices

The matrix isn't just for binary problems. If you are classifying "Cat," "Dog," and "Bird," your matrix will be 3x3. The diagonal line from top-left to bottom-right represents correct predictions. Any numbers off that diagonal show you which animals the model is confusing.

```mermaid
graph TB
    TITLE["$$\text{Multi-Class Confusion Matrix (3×3)}$$"]

    %% Actual classes
    TITLE --> CAT_A["$$\text{Actual: Cat}$$"]
    TITLE --> DOG_A["$$\text{Actual: Dog}$$"]
    TITLE --> BIRD_A["$$\text{Actual: Bird}$$"]

    %% Diagonal cells (correct predictions)
    CAT_A --> CC["$$\text{Cat → Cat}$$<br/>$$\text{Correct}$$"]
    DOG_A --> DD["$$\text{Dog → Dog}$$<br/>$$\text{Correct}$$"]
    BIRD_A --> BB["$$\text{Bird → Bird}$$<br/>$$\text{Correct}$$"]

    %% Off-diagonal cells (confusions)
    CAT_A --> CD["$$\text{Cat → Dog}$$<br/>$$\text{Confusion}$$"]
    CAT_A --> CB["$$\text{Cat → Bird}$$<br/>$$\text{Confusion}$$"]
    DOG_A --> DC["$$\text{Dog → Cat}$$<br/>$$\text{Confusion}$$"]
    DOG_A --> DB["$$\text{Dog → Bird}$$<br/>$$\text{Confusion}$$"]
    BIRD_A --> BC["$$\text{Bird → Cat}$$<br/>$$\text{Confusion}$$"]
    BIRD_A --> BD["$$\text{Bird → Dog}$$<br/>$$\text{Confusion}$$"]

    %% Emphasis
    CC -.->|"$$\text{Diagonal}$$"| GOOD["$$\text{Correct Predictions}$$"]
    DD -.->|"$$\text{Diagonal}$$"| GOOD
    BB -.->|"$$\text{Diagonal}$$"| GOOD

    CD -.->|"$$\text{Off-Diagonal}$$"| BAD["$$\text{Model Confusion}$$"]
    DC -.->|"$$\text{Off-Diagonal}$$"| BAD
```
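
`confusion_matrix` handles the multi-class case automatically; passing the class names via the `labels` parameter fixes the row/column order. A small sketch with made-up predictions:

```python
from sklearn.metrics import confusion_matrix

# Hypothetical 3-class example
y_true = ["Cat", "Cat", "Cat", "Dog", "Dog", "Dog", "Bird", "Bird"]
y_pred = ["Cat", "Dog", "Cat", "Dog", "Dog", "Bird", "Bird", "Cat"]

cm = confusion_matrix(y_true, y_pred, labels=["Cat", "Dog", "Bird"])
print(cm)
# [[2 1 0]   <- actual Cat:  2 correct, 1 mistaken for Dog
#  [0 2 1]   <- actual Dog:  2 correct, 1 mistaken for Bird
#  [1 0 1]]  <- actual Bird: 1 mistaken for Cat, 1 correct
```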

## 5. Summary: What can we calculate from here?

The Confusion Matrix is the "mother" of all classification metrics. From these four numbers, we derive:

* **Accuracy:** (TP + TN) / (TP + TN + FP + FN), the overall share of correct predictions.
* **Precision:** TP / (TP + FP), i.e. of everything predicted positive, how much really is positive.
* **Recall:** TP / (TP + FN), i.e. of everything actually positive, how much the model caught.
* **F1-Score:** The balance (harmonic mean) between Precision and Recall.
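
As a quick numeric check, here is a minimal sketch computing all four metrics from the counts produced by the binary example above (TP=3, TN=3, FP=1, FN=1):

```python
# Counts taken from the binary example above
tp, tn, fp, fn = 3, 3, 1, 1

accuracy = (tp + tn) / (tp + tn + fp + fn)           # 0.75
precision = tp / (tp + fp)                           # 0.75
recall = tp / (tp + fn)                              # 0.75
f1 = 2 * precision * recall / (precision + recall)   # 0.75

print(accuracy, precision, recall, f1)
```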

## References

* **StatQuest:** [Confusion Matrices Explained](https://www.youtube.com/watch?v=Kdsp6soqA7o)
* **Scikit-Learn:** [Confusion Matrix API](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.confusion_matrix.html)

---

**Now that you can see where the model is making mistakes, let's learn how to turn those mistakes into a single score.**