🌱 Remove ephemeral aspects of network errors #2507
openshift-merge-bot[bot] merged 1 commit into operator-framework:main
Conversation
Pull request overview
This PR introduces a network-error sanitization utility to remove ephemeral details (notably changing source ports) from error messages so status condition messages remain stable across retries, reducing unnecessary downstream reconciliations.
Changes:
- Added SanitizeNetworkError helper for producing stable, sanitized network error messages (a rough sketch of such a helper follows this list).
- Updated ClusterExtension and ClusterCatalog Progressing condition messages to use the sanitizer.
- Added unit tests for sanitization behavior and controller condition-message expectations.
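For context, here is a minimal sketch of what a sanitizer along these lines could look like. This is not the PR's actual implementation; the function name comes from the change list above, but the field handling and message formats are assumptions:

```go
package errorutil

import (
	"errors"
	"fmt"
	"net"
)

// SanitizeNetworkError rebuilds DNS and socket errors from their stable
// fields so that ephemeral details (such as the local source port chosen
// for each retry) do not leak into status condition messages.
// Sketch only: the real helper in the PR may differ.
func SanitizeNetworkError(err error) error {
	if err == nil {
		return nil
	}

	var dnsErr *net.DNSError
	if errors.As(err, &dnsErr) {
		// Keep only the operation and the name being resolved; drop the
		// wrapped socket detail, which carries the changing source port.
		return fmt.Errorf("lookup %s: %s", dnsErr.Name, dnsReason(dnsErr))
	}

	var opErr *net.OpError
	if errors.As(err, &opErr) {
		// Report the remote address only; opErr.Source holds the ephemeral
		// local address and is intentionally omitted.
		if opErr.Addr != nil {
			return fmt.Errorf("%s %s %s: %v", opErr.Op, opErr.Net, opErr.Addr, opErr.Err)
		}
		return fmt.Errorf("%s %s: %v", opErr.Op, opErr.Net, opErr.Err)
	}

	return err
}

// dnsReason maps a DNSError onto a stable, human-readable cause.
func dnsReason(err *net.DNSError) string {
	switch {
	case err.IsNotFound:
		return "no such host"
	case err.IsTimeout:
		return "i/o timeout"
	default:
		return "temporary failure"
	}
}
```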
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| internal/shared/util/error/sanitize.go | New helper to sanitize net.DNSError / net.OpError messages for retry stability. |
| internal/shared/util/error/sanitize_test.go | New unit tests covering sanitizer behavior and consistency across varying ephemeral ports. |
| internal/operator-controller/controllers/common_controller.go | Uses sanitized messages for ClusterExtension Progressing condition (after unwrapping terminal errors). |
| internal/operator-controller/controllers/common_controller_test.go | Adds cases asserting sanitized network/DNS error messages in Progressing condition. |
| internal/catalogd/controllers/core/clustercatalog_controller.go | Uses sanitized messages for ClusterCatalog Progressing condition. |
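The two controller rows above amount to swapping the raw error message for the sanitized one when the Progressing condition is written. A hypothetical sketch of that wiring (the helper name, reason strings, and import path are illustrative assumptions, not the PR's actual code):

```go
package controllers

import (
	"k8s.io/apimachinery/pkg/api/meta"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

	errorutil "example.com/placeholder/internal/shared/util/error" // import path assumed
)

// setProgressingCondition is an illustrative helper, not the PR's code.
func setProgressingCondition(conditions *[]metav1.Condition, generation int64, err error) {
	cond := metav1.Condition{
		Type:               "Progressing",
		Status:             metav1.ConditionTrue,
		Reason:             "Succeeded",
		Message:            "desired state reached",
		ObservedGeneration: generation,
	}
	if err != nil {
		cond.Status = metav1.ConditionFalse
		cond.Reason = "Retrying"
		// Sanitizing keeps the message byte-for-byte identical across
		// retries, so the condition only changes when something real changed.
		cond.Message = errorutil.SanitizeNetworkError(err).Error()
	}
	meta.SetStatusCondition(conditions, cond)
}
```

With a stable message, the resulting condition is identical between retries, the subsequent status update becomes a no-op, and watchers of the resource are not re-triggered.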
Codecov Report: ✅ All modified and coverable lines are covered by tests.
@@ Coverage Diff @@
## main #2507 +/- ##
==========================================
- Coverage 73.23% 73.20% -0.04%
==========================================
Files 102 103 +1
Lines 8505 8509 +4
==========================================
Hits 6229 6229
+ Misses 1801 1800 -1
- Partials 475 480 +5
Force-pushed 2890347 to b7dbac8
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
internal/operator-controller/controllers/common_controller_test.go (outdated, resolved)
internal/operator-controller/controllers/common_controller_test.go (outdated, resolved)
Force-pushed b7dbac8 to ea4e5b4
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated no new comments.
I would suggest that we do not put ClusterExtensions that have reached ProgressDeadlineTimeout back on the reconcile queue when there is a ClusterCatalog change.
I think this is a separate issue. The main issue here is that we are updating the cluster catalog even though there's no real change in state, and this is triggering unnecessary reconciliation on all CEs (independently of their underlying state). On the ProgressionDeadlineTimeout side, we need to think about the different options for getting the CE unstuck. I was thinking over ShiftWeek that we may want to move the progression deadline check to the CE controller, since there are other errors that we might want to stop retrying after a while that aren't to do with revisions, e.g. bundle download/unpack issues, resolution issues, etc. In some cases it may make sense to let an exhausted CE reconcile on catalog changes - maybe something like a typo in the bundle image URL that gets fixed in a subsequent catalog update. Although, I'd say that if that doesn't solve the problem, you'd still be in the ProgressionDeadlineTimeout state.
I would say this PR is 🌱 not a bug fix - we are dropping unnecessary reconciliations, not correcting system behavior - so it is a nice improvement.
Signed-off-by: Per Goncalves da Silva <pegoncal@redhat.com>
Force-pushed ea4e5b4 to 914cdce
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: camilamacedo86.
/label qe-approved
Merged 1fd37fa into operator-framework:main
Description
When there's a network error, e.g. the registry is not found, successive error messages can change because the source port changes between retries. This causes unnecessary updates to the resource and triggers reconciliation of downstream resources.
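To make the failure mode concrete, here is a small, self-contained illustration; the hostname, DNS server address, and port numbers are made up, and only the 10.244.0.8 pod IP is taken from the observation below:

```go
package main

import (
	"fmt"
	"net"
)

func main() {
	// Simulate two consecutive retries of the same failing lookup. Only the
	// ephemeral local port inside the wrapped socket detail differs.
	mkErr := func(localPort int) error {
		return &net.DNSError{
			Name:      "registry.example.com",
			IsTimeout: true,
			Err: fmt.Sprintf("read udp 10.244.0.8:%d->10.96.0.10:53: i/o timeout",
				localPort),
		}
	}
	first, second := mkErr(47321), mkErr(52904)

	// The raw messages differ, so copying them into the Progressing condition
	// forces a status update on every retry even though nothing meaningful
	// changed.
	fmt.Println(first.Error() == second.Error()) // false

	// A sanitizer that keeps only the stable fields (operation + name) yields
	// the same message both times, so the condition is written only once.
}
```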
I noticed this error when observing that the ClusterExtension controller was being triggered every 10 seconds even though it had reached a ProgressionDeadlineTimeout terminal state. The issue was that one of the ClusterCatalogs was pointing to a registry that did not exist and the periodic change to the error message triggered reconciliation on the ClusterExtensions.
I would see a Progressing message on the catalog, and every 10 seconds I would see the same message again but with 10.244.0.8 using a different port.
Reviewer Checklist