abandon evaluation of any new catalogsource image which is pathological by grokspawn · Pull Request #3766 · operator-framework/operator-lifecycle-manager

grokspawn · 2026-02-11T22:43:20Z

Description of the change:

This PR adds a pathological-status check that triggers when a container status reports CrashLoopBackOff.
When the check matches, OLM abandons catalog evaluation for that image, deletes the pod, and pulls a new image the next time the pollInterval elapses.

The catalog operator now emits a message of the form

time="2026-02-11T22:30:17Z" level=info msg="catalog polling result: update pod poison-pill-9qvmp failed to start" UpdatePod=poison-pill-9qvmp

and the offending pod will be deleted.
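The check described above could be sketched roughly as follows. This is a minimal, self-contained illustration and not the code merged in this PR: the struct types are local stand-ins for the corev1 types of the same names (reduced to the fields the sketch reads), and the helper name hasCrashLoopingContainer and the container name "service" are hypothetical. Only the constant name and value come from the diff.

```go
package main

import "fmt"

// Local stand-ins for the corev1 types of the same names (k8s.io/api/core/v1),
// reduced to the fields this sketch reads. They are assumptions for illustration.
type ContainerStateWaiting struct {
	Reason string
}

type ContainerState struct {
	Waiting *ContainerStateWaiting
}

type ContainerStatus struct {
	Name  string
	State ContainerState
}

type PodStatus struct {
	InitContainerStatuses []ContainerStatus
	ContainerStatuses     []ContainerStatus
}

// Same name and value as the constant added in this PR.
const containerReasonCrashLoopBackOff = "CrashLoopBackOff"

// hasCrashLoopingContainer (hypothetical name) reports whether any init or
// regular container is waiting with reason CrashLoopBackOff.
// Ephemeral containers are deliberately not inspected.
func hasCrashLoopingContainer(status PodStatus) bool {
	groups := [][]ContainerStatus{status.InitContainerStatuses, status.ContainerStatuses}
	for _, statuses := range groups {
		for _, cs := range statuses {
			if cs.State.Waiting != nil && cs.State.Waiting.Reason == containerReasonCrashLoopBackOff {
				return true
			}
		}
	}
	return false
}

func main() {
	status := PodStatus{
		ContainerStatuses: []ContainerStatus{{
			Name:  "service", // hypothetical container name
			State: ContainerState{Waiting: &ContainerStateWaiting{Reason: containerReasonCrashLoopBackOff}},
		}},
	}
	fmt.Println(hasCrashLoopingContainer(status)) // prints "true"
}
```

Checking the Waiting reason rather than a restart count keeps the sketch tied to the state the kubelet reports while it is backing off, which is the condition named in the description.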

Motivation for the change:

When a catalogsource defines .spec.grpcPodConfig.extractContent, OLMv0 can get trapped in an evaluation loop if the catalogsource is not compatible with the on-cluster catalogsource service.

This is because the on-cluster catalog services that use extractContent define two initContainers and a service container. Once those initContainers succeed, the pod status progresses to RUNNING whether or not the service container succeeds.

If the service container fails, it halts, and the kubelet begins restarting it once it fails its readiness/liveness probes. The pod remains in RUNNING status, so OLM requeues its evaluation indefinitely.
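To make the loop concrete, here is a simplified model of the polling decision. It is a sketch under stated assumptions, not OLM's reconcile logic: the pod and containerStatus types, the decision values, and the function names naiveDecision and guardedDecision are all hypothetical. It shows how a pod that is RUNNING but never ready is requeued on every poll unless the container state is inspected.

```go
package main

import "fmt"

// Simplified stand-ins for illustration; these are not the real corev1 types.
type containerStatus struct {
	name          string
	waitingReason string // empty when the container is not in a Waiting state
	ready         bool
}

type pod struct {
	phase      string // for example "Pending" or "Running"
	containers []containerStatus
}

type decision string

const (
	requeue decision = "requeue" // look again after the requeue period
	promote decision = "promote" // the new catalog pod is usable
	abandon decision = "abandon" // delete the pod and wait for the next poll
)

// allReady reports whether the pod has at least one container and all are ready.
func allReady(p pod) bool {
	if len(p.containers) == 0 {
		return false
	}
	for _, c := range p.containers {
		if !c.ready {
			return false
		}
	}
	return true
}

// naiveDecision looks only at phase and readiness. A crash-looping pod stays
// "Running" and never becomes ready, so this returns requeue every time.
func naiveDecision(p pod) decision {
	if p.phase == "Running" && allReady(p) {
		return promote
	}
	return requeue
}

// guardedDecision first checks for a CrashLoopBackOff waiting reason and
// gives up on the pod instead of requeueing it indefinitely.
func guardedDecision(p pod) decision {
	for _, c := range p.containers {
		if c.waitingReason == "CrashLoopBackOff" {
			return abandon
		}
	}
	return naiveDecision(p)
}

func main() {
	wedged := pod{
		phase:      "Running",
		containers: []containerStatus{{name: "service", waitingReason: "CrashLoopBackOff"}},
	}
	fmt.Println(naiveDecision(wedged), guardedDecision(wedged)) // prints "requeue abandon"
}
```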

Architectural changes:

Testing remarks:

Reviewer Checklist

joelanford

/approve

Just the one (nit) question about the crashloopbackoff constant.

joelanford · 2026-02-19T00:52:05Z

pkg/controller/registry/reconciler/grpc.go

 	ServiceHashLabelKey         = "olm.service-spec-hash"
 	CatalogPollingRequeuePeriod = 30 * time.Second
+	// containerReasonCrashLoopBackOff is the kubelet Waiting reason when a container is backing off after repeated crashes.
+	containerReasonCrashLoopBackOff = "CrashLoopBackOff"


Just checking there isn't already a constant defined for this in corev1 of k8s.io/api?

joelanford · 2026-02-19T01:19:22Z

pkg/controller/registry/reconciler/grpc.go

+			return true
+		}
+	}
+	// TODO: currently no ephemeral containers in a catalogsource, should we add checks anyway?


Ephemeral containers should never be part of the equation I don't think.

See: https://kubernetes.io/docs/concepts/workloads/pods/ephemeral-containers/

openshift-ci · 2026-02-19T01:20:46Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: joelanford

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [joelanford]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

abandon evaluation of any new catalogsource image which pathologicall…

9c2c100

…y restrts Signed-off-by: grokspawn <jordan@nimblewidget.com>

openshift-ci bot requested review from kevinrizza and perdasilva February 11, 2026 22:43

joelanford reviewed Feb 12, 2026

joelanford reviewed Feb 12, 2026

add CLBO container considerations to detection

1098551

Signed-off-by: grokspawn <jordan@nimblewidget.com>

grokspawn force-pushed the fix-wedged-newcat-eval branch from 9985929 to 1098551 Compare February 12, 2026 15:56

grokspawn changed the title ~~abandon evaluation of any new catalogsource image which pathologically restarts~~ abandon evaluation of any new catalogsource image which is pathological Feb 13, 2026

grokspawn requested a review from joelanford February 18, 2026 20:02

joelanford approved these changes Feb 19, 2026

openshift-ci bot assigned joelanford Feb 19, 2026

openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Feb 19, 2026

openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 19, 2026

openshift-merge-bot bot merged commit feecd01 into operator-framework:master Feb 19, 2026
14 checks passed

openshift-merge-bot[bot] merged 2 commits into operator-framework:master from grokspawn:fix-wedged-newcat-eval
