
Conversation


@aymuos15 aymuos15 commented Jan 7, 2026

Fixes #5702

Description

This PR adds three features requested in issue #5702 (see the usage sketch below the list):

  1. 3D Support: PanopticQualityMetric now accepts both 4D tensors (B2HW for 2D data) and 5D tensors (B2HWD for 3D data). Previously, only 2D inputs were supported.

  2. Confusion Matrix Output: Added return_confusion_matrix parameter to PanopticQualityMetric. When set to True, the aggregate() method returns raw confusion matrix values (tp, fp, fn, iou_sum) instead of computed metrics, enabling custom metric calculations.

  3. Helper Function: Added compute_mean_iou() function to compute mean IoU from confusion matrix values.

Note: While panoptica exists as a standalone library, I feel this would still be a nice addition to MONAI.
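
A hedged usage sketch of the three additions follows; the shapes and values are toy data, and the signatures follow the PR description rather than any final merged API (the `from monai.metrics import compute_mean_iou` path matches the reviewer's suggestion further down).

import torch
from monai.metrics import PanopticQualityMetric, compute_mean_iou

# Toy 3D input: (B, 2, H, W, D); channel 0 holds instance ids, channel 1 class ids.
y_pred = torch.randint(0, 3, (1, 2, 8, 8, 8))
y = torch.randint(0, 3, (1, 2, 8, 8, 8))

metric = PanopticQualityMetric(num_classes=3, return_confusion_matrix=True)
metric(y_pred, y)
cm = metric.aggregate()          # raw (tp, fp, fn, iou_sum) per class instead of PQ
mean_iou = compute_mean_iou(cm)  # helper added by this PR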

Types of changes

  • Non-breaking change (fix or new feature that would not break existing functionality).
  • Breaking change (fix or new feature that would cause existing functionality to change).
  • New tests added to cover the changes.
  • Integration tests passed locally by running ./runtests.sh -f -u --net --coverage.
  • Quick tests passed locally by running ./runtests.sh --quick --unittests --disttests.
  • In-line docstrings updated.
  • Documentation updated, tested make html command in the docs/ folder.


coderabbitai bot commented Jan 7, 2026

📝 Walkthrough

Extends PanopticQualityMetric to accept both 2D (B, 2, H, W) and 3D volumetric (B, 2, H, W, D) inputs, and adds a constructor flag return_confusion_matrix: bool to optionally return raw confusion-matrix outputs (tp, fp, fn, iou_sum). Adds compute_mean_iou(confusion_matrix, smooth_numerator=1e-6) to compute mean IoU from confusion matrices. Updates dimension validation and docstrings accordingly. Tests are expanded with 3D cases covering input acceptance, confusion-matrix returns, mean-IoU computation, metric-name filtering, and invalid-shape error handling.
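
For reference, a minimal sketch of the dimension validation this walkthrough describes; the names here are illustrative, not the PR's actual code.

import torch

def check_input_dims(y_pred: torch.Tensor) -> None:
    # Accept 4D (B, 2, H, W) for 2D images and 5D (B, 2, H, W, D) for 3D volumes.
    if y_pred.ndim not in (4, 5) or y_pred.shape[1] != 2:
        raise ValueError(
            f"expected a (B, 2, H, W) or (B, 2, H, W, D) tensor, got shape {tuple(y_pred.shape)}."
        )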

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed

❌ Failed checks (1 warning)
  • Docstring Coverage (⚠️ Warning): docstring coverage is 76.92%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)
  • Title check (✅ Passed): title accurately and concisely describes the main changes: 3D support and confusion matrix output.
  • Description check (✅ Passed): description covers all key features, links to issue #5702, and checks the appropriate boxes for non-breaking changes, new tests, and docstring updates.
  • Linked Issues check (✅ Passed): the PR fully addresses the #5702 requirements: it adds 3D support for 5D tensors (B2HWD), implements the return_confusion_matrix parameter, and adds the compute_mean_iou helper function.
  • Out of Scope Changes check (✅ Passed): all changes directly address the #5702 objectives: 3D tensor support, confusion matrix output, and mean IoU computation. No extraneous modifications detected.




@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
monai/metrics/panoptic_quality.py (1)

24-24: Consider sorting __all__ alphabetically.

Static analysis suggests alphabetical ordering for consistency.

♻️ Proposed fix
-__all__ = ["PanopticQualityMetric", "compute_panoptic_quality", "compute_mean_iou"]
+__all__ = ["PanopticQualityMetric", "compute_mean_iou", "compute_panoptic_quality"]
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between 57fdd59 and 23d33cf.

📒 Files selected for processing (2)
  • monai/metrics/panoptic_quality.py
  • tests/metrics/test_compute_panoptic_quality.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

⚙️ CodeRabbit configuration file

Review the Python code for quality and correctness. Ensure variable names adhere to PEP8 style guides and are sensible and informative with regard to their function, though simple names are permitted for loop and comprehension variables. Ensure routine names are meaningful with regard to their function and use verbs, adjectives, and nouns in a semantically appropriate way. Docstrings should be present for all definitions and describe each variable, return value, and raised exception in the appropriate section of Google-style docstrings. Examine code for logical errors or inconsistencies, and suggest what may be changed to address these. Suggest any enhancements that improve efficiency, maintainability, comprehensibility, and correctness. Ensure new or modified definitions will be covered by existing or new unit tests.

Files:

  • monai/metrics/panoptic_quality.py
  • tests/metrics/test_compute_panoptic_quality.py
🧬 Code graph analysis (2)
monai/metrics/panoptic_quality.py (1)
monai/metrics/confusion_matrix.py (1)
  • _compute_tensor (80-99)
tests/metrics/test_compute_panoptic_quality.py (1)
monai/metrics/panoptic_quality.py (3)
  • PanopticQualityMetric (27-168)
  • aggregate (132-168)
  • compute_mean_iou (315-333)
🪛 Ruff (0.14.10)
monai/metrics/panoptic_quality.py

24-24: __all__ is not sorted

Apply an isort-style sorting to __all__

(RUF022)


107-109: Avoid specifying long messages outside the exception class

(TRY003)


148-148: Prefer TypeError exception for invalid type

(TRY004)


148-148: Avoid specifying long messages outside the exception class

(TRY003)


328-331: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (19)
  • GitHub Check: flake8-py3 (mypy)
  • GitHub Check: packaging
  • GitHub Check: quick-py3 (macOS-latest)
  • GitHub Check: min-dep-py3 (3.9)
  • GitHub Check: flake8-py3 (codeformat)
  • GitHub Check: flake8-py3 (pytype)
  • GitHub Check: quick-py3 (windows-latest)
  • GitHub Check: quick-py3 (ubuntu-latest)
  • GitHub Check: build-docs
  • GitHub Check: min-dep-pytorch (2.8.0)
  • GitHub Check: min-dep-pytorch (2.7.1)
  • GitHub Check: min-dep-pytorch (2.5.1)
  • GitHub Check: min-dep-py3 (3.11)
  • GitHub Check: min-dep-pytorch (2.6.0)
  • GitHub Check: min-dep-os (ubuntu-latest)
  • GitHub Check: min-dep-py3 (3.10)
  • GitHub Check: min-dep-os (windows-latest)
  • GitHub Check: min-dep-os (macOS-latest)
  • GitHub Check: min-dep-py3 (3.12)
🔇 Additional comments (12)
monai/metrics/panoptic_quality.py (5)

58-60: LGTM!

The return_confusion_matrix parameter is well-documented and maintains backward compatibility with its default value.

Also applies to: 70-70, 78-78


83-94: LGTM!

Docstrings clearly document 2D and 3D input formats.


106-109: LGTM!

Validation correctly accepts both 4D (2D images) and 5D (3D volumes) tensors. Error message is clear and helpful.


141-156: LGTM!

Early return pattern cleanly handles confusion matrix output. Logic and documentation are correct.


315-333: LGTM!

Function correctly computes mean IoU from confusion matrix. Formula matches Segmentation Quality calculation (line 164), which is appropriate. Docstring is complete and validation is robust.
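
For context, a minimal sketch of a helper consistent with this description, using the (..., 4) = [tp, fp, fn, iou_sum] layout and the iou_sum / (tp + smooth_numerator) formula cited in this review; it is illustrative, not the PR's exact code.

import torch

def compute_mean_iou(confusion_matrix: torch.Tensor, smooth_numerator: float = 1e-6) -> torch.Tensor:
    # Last dimension holds [tp, fp, fn, iou_sum]; mean IoU averages matched IoU over the TP count.
    if confusion_matrix.shape[-1] != 4:
        raise ValueError(f"the last dimension of confusion_matrix should be 4, got {confusion_matrix.shape[-1]}.")
    tp, iou_sum = confusion_matrix[..., 0], confusion_matrix[..., 3]
    return iou_sum / (tp + smooth_numerator)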

tests/metrics/test_compute_panoptic_quality.py (7)

92-120: LGTM!

3D test data is properly shaped (B=1, C=2, H=2, W=2, D=2) and test cases are well-defined.


142-152: LGTM!

Test validates 3D input acceptance and correct output shape. Good coverage of the new feature.


154-170: LGTM!

Comprehensive test of confusion matrix return. Validates both shape and value constraints (non-negativity).


172-184: LGTM!

Test validates compute_mean_iou helper with appropriate shape and value checks.


186-203: LGTM!

Test confirms metric filtering works correctly and different metrics produce distinct outputs.


205-218: LGTM!

Test validates proper rejection of invalid tensor dimensions (3D and 6D). Good edge case coverage.


220-232: LGTM!

Test validates proper error handling for invalid confusion matrix shapes. Good coverage of error cases.

@aymuos15 aymuos15 force-pushed the 5702-panoptic-quality-3d-and-confusion-matrix branch from 23d33cf to a3cca00 on January 12, 2026 at 15:09

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (4)
tests/metrics/test_compute_panoptic_quality.py (2)

92-100: Inline comments placement is confusing.

The # instance channel and # class channel comments appear after nested sublists, making it unclear which dimension they refer to. Consider restructuring or clarifying.

Suggested clarification
 sample_3d_pred = torch.as_tensor(
-    [[[[[2, 0], [1, 1]], [[0, 1], [2, 1]]], [[[0, 1], [3, 0]], [[1, 0], [1, 1]]]]],  # instance channel  # class channel
+    [
+        [
+            [[[2, 0], [1, 1]], [[0, 1], [2, 1]]],  # channel 0: instance
+            [[[0, 1], [3, 0]], [[1, 0], [1, 1]]],  # channel 1: class
+        ]
+    ],
     device=_device,
 )

132-142: Consider adding value assertions for 3D output.

Test validates shape but not correctness of computed values. Adding expected metric values (like in 2D tests) would strengthen coverage.
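
One hypothetical way to strengthen the test, where expected_3d_pq stands in for precomputed ground-truth values (a placeholder, not part of the PR):

# `expected_3d_pq` is a hypothetical tensor of precomputed metric values.
torch.testing.assert_close(result, expected_3d_pq, rtol=1e-4, atol=1e-4)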

monai/metrics/panoptic_quality.py (2)

24-24: Optional: Sort __all__ alphabetically.

Per Ruff RUF022, isort-style sorting is preferred.

Suggested fix
-__all__ = ["PanopticQualityMetric", "compute_panoptic_quality", "compute_mean_iou"]
+__all__ = ["PanopticQualityMetric", "compute_mean_iou", "compute_panoptic_quality"]

315-333: Missing Raises section in docstring.

Per coding guidelines, docstrings should document raised exceptions.

Suggested fix
     Args:
         confusion_matrix: tensor with shape (..., 4) where the last dimension contains
             [tp, fp, fn, iou_sum] as returned by `compute_panoptic_quality` with `output_confusion_matrix=True`.
         smooth_numerator: a small constant added to the numerator to avoid zero.

     Returns:
         Mean IoU computed as iou_sum / (tp + smooth_numerator).

+    Raises:
+        ValueError: when `confusion_matrix` last dimension is not 4.
+
     """
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between 23d33cf and a3cca00.

📒 Files selected for processing (2)
  • monai/metrics/panoptic_quality.py
  • tests/metrics/test_compute_panoptic_quality.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

⚙️ CodeRabbit configuration file

Review the Python code for quality and correctness. Ensure variable names adhere to PEP8 style guides and are sensible and informative with regard to their function, though simple names are permitted for loop and comprehension variables. Ensure routine names are meaningful with regard to their function and use verbs, adjectives, and nouns in a semantically appropriate way. Docstrings should be present for all definitions and describe each variable, return value, and raised exception in the appropriate section of Google-style docstrings. Examine code for logical errors or inconsistencies, and suggest what may be changed to address these. Suggest any enhancements that improve efficiency, maintainability, comprehensibility, and correctness. Ensure new or modified definitions will be covered by existing or new unit tests.

Files:

  • monai/metrics/panoptic_quality.py
  • tests/metrics/test_compute_panoptic_quality.py
🧬 Code graph analysis (2)
monai/metrics/panoptic_quality.py (3)
monai/utils/misc.py (1)
  • ensure_tuple (166-178)
monai/metrics/confusion_matrix.py (1)
  • _compute_tensor (80-99)
monai/metrics/utils.py (1)
  • do_metric_reduction (71-130)
tests/metrics/test_compute_panoptic_quality.py (1)
monai/metrics/panoptic_quality.py (3)
  • PanopticQualityMetric (27-168)
  • aggregate (132-168)
  • compute_mean_iou (315-333)
🪛 Ruff (0.14.10)
monai/metrics/panoptic_quality.py

24-24: __all__ is not sorted

Apply an isort-style sorting to __all__

(RUF022)


107-109: Avoid specifying long messages outside the exception class

(TRY003)


148-148: Prefer TypeError exception for invalid type

(TRY004)


148-148: Avoid specifying long messages outside the exception class

(TRY003)


328-331: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (19)
  • GitHub Check: flake8-py3 (mypy)
  • GitHub Check: quick-py3 (macOS-latest)
  • GitHub Check: packaging
  • GitHub Check: quick-py3 (ubuntu-latest)
  • GitHub Check: quick-py3 (windows-latest)
  • GitHub Check: flake8-py3 (codeformat)
  • GitHub Check: flake8-py3 (pytype)
  • GitHub Check: build-docs
  • GitHub Check: min-dep-os (windows-latest)
  • GitHub Check: min-dep-py3 (3.12)
  • GitHub Check: min-dep-os (ubuntu-latest)
  • GitHub Check: min-dep-pytorch (2.7.1)
  • GitHub Check: min-dep-py3 (3.10)
  • GitHub Check: min-dep-py3 (3.9)
  • GitHub Check: min-dep-pytorch (2.8.0)
  • GitHub Check: min-dep-os (macOS-latest)
  • GitHub Check: min-dep-py3 (3.11)
  • GitHub Check: min-dep-pytorch (2.5.1)
  • GitHub Check: min-dep-pytorch (2.6.0)
🔇 Additional comments (10)
tests/metrics/test_compute_panoptic_quality.py (5)

144-160: LGTM. Confusion matrix validation is appropriate.

Minor: Comment on line 150 mentions shape (batch_size, num_classes, 4) but after reduction (default MEAN_BATCH), actual shape is (num_classes, 4). Consider updating the comment.


162-174: LGTM. Tests shape and non-negativity constraints appropriately.


176-193: LGTM. Good validation that different metric names produce different results.


195-208: LGTM. Properly tests boundary cases for invalid tensor dimensions.


210-222: LGTM. Validates error handling for malformed confusion matrices.

monai/metrics/panoptic_quality.py (5)

58-60: LGTM. New parameter is well-documented with backward-compatible default.

Also applies to: 70-78


80-95: LGTM. Docstrings properly updated for 3D input support.


106-109: LGTM. Dimension validation correctly accepts 4D (2D input) and 5D (3D input).


141-156: LGTM. Clean implementation of optional confusion matrix return path.


188-189: LGTM. Docstring correctly documents 2D/3D input forms.

…roject-MONAI#5702)

- Add support for 5D tensors (B2HWD) in addition to existing 4D (B2HW)
- Add `return_confusion_matrix` parameter to return raw tp, fp, fn, iou_sum values
- Add `compute_mean_iou` helper function for computing mean IoU from confusion matrix
- Update docstrings to reflect 2D/3D support
- Add comprehensive tests for new functionality

Signed-off-by: Soumya Snigdha Kundu <soumya_snigdha.kundu@kcl.ac.uk>
@aymuos15 aymuos15 force-pushed the 5702-panoptic-quality-3d-and-confusion-matrix branch from a3cca00 to c98d1b7 on January 12, 2026 at 15:16

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (3)
monai/metrics/panoptic_quality.py (2)

24-24: Consider sorting __all__ alphabetically.

Static analysis flags unsorted __all__. Optional cleanup.

Suggested fix
-__all__ = ["PanopticQualityMetric", "compute_panoptic_quality", "compute_mean_iou"]
+__all__ = ["PanopticQualityMetric", "compute_mean_iou", "compute_panoptic_quality"]

315-332: Missing Raises section in docstring.

Per coding guidelines, docstrings should document raised exceptions.

Suggested fix
     Args:
         confusion_matrix: tensor with shape (..., 4) where the last dimension contains
             [tp, fp, fn, iou_sum] as returned by `compute_panoptic_quality` with `output_confusion_matrix=True`.
         smooth_numerator: a small constant added to the numerator to avoid zero.

     Returns:
         Mean IoU computed as iou_sum / (tp + smooth_numerator).

+    Raises:
+        ValueError: when `confusion_matrix` does not have shape (..., 4).
+
     """
tests/metrics/test_compute_panoptic_quality.py (1)

162-174: Consider importing compute_mean_iou at module level.

Since compute_mean_iou is now in __all__, import it with the other symbols at line 20.

Suggested fix

At line 20:

-from monai.metrics import PanopticQualityMetric, compute_panoptic_quality
+from monai.metrics import PanopticQualityMetric, compute_mean_iou, compute_panoptic_quality

Then remove the local import at line 164.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between a3cca00 and c98d1b7.

📒 Files selected for processing (2)
  • monai/metrics/panoptic_quality.py
  • tests/metrics/test_compute_panoptic_quality.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

⚙️ CodeRabbit configuration file

Review the Python code for quality and correctness. Ensure variable names adhere to PEP8 style guides and are sensible and informative with regard to their function, though simple names are permitted for loop and comprehension variables. Ensure routine names are meaningful with regard to their function and use verbs, adjectives, and nouns in a semantically appropriate way. Docstrings should be present for all definitions and describe each variable, return value, and raised exception in the appropriate section of Google-style docstrings. Examine code for logical errors or inconsistencies, and suggest what may be changed to address these. Suggest any enhancements that improve efficiency, maintainability, comprehensibility, and correctness. Ensure new or modified definitions will be covered by existing or new unit tests.

Files:

  • monai/metrics/panoptic_quality.py
  • tests/metrics/test_compute_panoptic_quality.py
🧬 Code graph analysis (1)
monai/metrics/panoptic_quality.py (1)
monai/metrics/confusion_matrix.py (1)
  • _compute_tensor (80-99)
🪛 Ruff (0.14.10)
monai/metrics/panoptic_quality.py

24-24: __all__ is not sorted

Apply an isort-style sorting to __all__

(RUF022)


107-109: Avoid specifying long messages outside the exception class

(TRY003)


148-148: Prefer TypeError exception for invalid type

(TRY004)


148-148: Avoid specifying long messages outside the exception class

(TRY003)


328-331: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (10)
monai/metrics/panoptic_quality.py (4)

58-78: LGTM!

Constructor extension with return_confusion_matrix parameter is clean and well-documented.


80-130: LGTM!

3D support implemented correctly. Validation logic and docstrings properly updated.


141-155: LGTM!

Clean conditional return of raw confusion matrix values.


188-188: LGTM!

Docstring correctly reflects 2D/3D input support.

tests/metrics/test_compute_panoptic_quality.py (6)

91-110: LGTM!

3D test fixtures and confusion matrix test case properly defined.


132-142: LGTM!

Valid acceptance test for 3D input. Consider adding expected value assertions in future iterations.


144-160: LGTM!

Good coverage of confusion matrix return functionality.


176-193: LGTM!

Good validation that different metric names produce different results.


195-208: LGTM!

Good negative test coverage for invalid tensor dimensions.


210-222: LGTM!

Good negative test coverage for compute_mean_iou input validation.


Development

Successfully merging this pull request may close these issues.

[Feature Request] 3D support for PanopticQualityMetric + provide option to return confusion matrix
