Commit 9effb34

Ensure OLM finalizer runs to prevent px-operator namespace from being stuck terminating (#2059)
Summary: Ensure OLM finalizer runs to prevent px-operator namespace from being stuck terminating

The helm install process followed by a helm uninstall does not fully clean up all pixie resources in the v0.1.7 operator release. The OLM project [added](operator-framework/operator-lifecycle-manager@f94a5ed) a csv-cleanup finalizer in [v0.27.0](https://github.com/operator-framework/operator-lifecycle-manager/releases/tag/v0.27.0) that causes the `px-operator` namespace to get stuck in a terminating state if the `olm` and `px-operator` namespaces are deleted at the same time.

To address this, a new Job is introduced within the olm namespace that triggers the deletion of the olm operator namespace (px-operator) from a `pre-delete` hook.

This bug is not present when OLM is installed outside of helm, since the finalizer has time to run. Therefore this job only needs to run if `deployOLM` is set (helm is managing OLM).

The other alternative I considered was writing another one-off utility similar to the `vizier_deleter` Job. This would have the benefit of having a small surface area and wouldn't rely on third-party images. Let me know if you have opinions/thoughts on that option or any other alternatives.

Relevant Issues: #1917

Type of change: /kind bug

Test Plan: Verified that the operator dev helm chart from this branch uninstalls properly

```
$ helm install pixie pixie-dev-operator/pixie-operator-chart --version 0.1.7-pre-ddelnano-fix-helm-uninstall-olm-finalizer.0 --set cloudAddr=<cloud_addr> --set deployKey=<deploy_key> --set clusterName='helm-uninstall-test' --namespace pl --create-namespace
NAME: pixie
LAST DEPLOYED: Wed Dec 11 03:13:42 2024
NAMESPACE: pl
STATUS: deployed
REVISION: 1
TEST SUITE: None

$ helm -n pl uninstall pixie
release "pixie" uninstalled

$ kubectl get namespaces | grep 'px-operator\|olm\|pl'
pl Active 6m31s

$ kubectl -n pl get all
No resources found in pl namespace.
```

- [x] Verified deployOLM controls if Job is present with `helm template`

```
$ helm template --set deployOLM=true k8s/operator/helm/ | grep -A 5 'Job'
kind: Job
metadata:
  name: csv-deleter
  namespace: olm
  annotations:
    "helm.sh/hook": pre-delete
--
kind: Job
metadata:
  name: vizier-deleter
  annotations:
    "helm.sh/hook": pre-delete
    "helm.sh/hook-delete-policy": hook-succeeded

$ helm template --set deployOLM=false k8s/operator/helm/ | grep -A 5 'Job'
kind: Job
metadata:
  name: vizier-deleter
  annotations:
    "helm.sh/hook": pre-delete
    "helm.sh/hook-delete-policy": hook-succeeded
```

Changelog Message: Fix bug with the v0.1.7 operator helm chart that would cause a stuck `px-operator` namespace on uninstall

---------

Signed-off-by: Dom Del Nano <ddelnano@gmail.com>
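The `helm template | grep` check in the test plan could also be expressed as a pass/fail assertion, which is easier to reuse in a script. The following is a minimal sketch, not part of this commit: `has_job` is a hypothetical helper that reads a rendered manifest on stdin and only pattern-matches text (it does not parse YAML), assuming `kind:` appears before `metadata.name` in each document as in the output above.

```shell
#!/bin/sh
# has_job NAME (hypothetical helper, not part of the chart)
# Succeeds if the manifest on stdin contains a document with `kind: Job`
# whose first following `name:` field equals NAME.
has_job() {
  awk -v want="$1" '
    /^---/   { is_job = 0 }                 # a new YAML document starts
    /^kind:/ { is_job = ($2 == "Job") }     # remember whether this document is a Job
    is_job && $1 == "name:" {               # first name: after kind: Job
      if ($2 == want) found = 1
      is_job = 0
    }
    END { exit !found }
  '
}
```

With the chart from this commit, `helm template --set deployOLM=true k8s/operator/helm/ | has_job csv-deleter` would be expected to succeed, and the same pipeline with `deployOLM=false` would be expected to fail.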
1 parent e2a6737 commit 9effb34

1 file changed: +51 -0 lines changed

k8s/operator/helm/templates/00_olm.yaml

Lines changed: 51 additions & 0 deletions
@@ -228,4 +228,55 @@ metadata:
 spec:
   targetNamespaces:
   - {{ .Values.olmNamespace }}
+---
+apiVersion: batch/v1
+kind: Job
+metadata:
+  name: csv-deleter
+  namespace: {{ .Values.olmNamespace }}
+  annotations:
+    "helm.sh/hook": pre-delete
+    "helm.sh/hook-delete-policy": hook-succeeded,hook-failed
+spec:
+  template:
+    spec:
+      restartPolicy: Never
+      serviceAccountName: olm-operator-serviceaccount
+      containers:
+      - name: trigger-csv-finalizer
+        image: ghcr.io/pixie-io/pixie-oss-pixie-dev-public-curl:multiarch-7.87.0@sha256:f7f265d5c64eb4463a43a99b6bf773f9e61a50aaa7cefaf564f43e42549a01dd
+        command:
+        - /bin/sh
+        - -c
+        - |
+          NAMESPACE="{{ .Values.olmOperatorNamespace }}"
+          API_SERVER="https://$KUBERNETES_SERVICE_HOST:$KUBERNETES_SERVICE_PORT"
+          CA_CERT=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt
+          TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)
+
+          DELETE_STATUS=$(curl --cacert $CA_CERT \
+            -H "Authorization: Bearer $TOKEN" \
+            -X DELETE -s \
+            -o /dev/null -w "%{http_code}" \
+            $API_SERVER/api/v1/namespaces/$NAMESPACE)
+
+          if [ "$DELETE_STATUS" -ne 200 ] && [ "$DELETE_STATUS" -ne 202 ]; then
+            echo "Failed to initiate deletion for namespace $NAMESPACE. HTTP status code: $DELETE_STATUS"
+            exit 1
+          fi
+
+          echo "Waiting for finalizer in $NAMESPACE to complete..."
+          while true; do
+            STATUS=$(curl --cacert $CA_CERT \
+              -H "Authorization: Bearer $TOKEN" \
+              -o /dev/null -w "%{http_code}" -s \
+              $API_SERVER/api/v1/namespaces/$NAMESPACE)
+            if [ "$STATUS" = "404" ]; then
+              echo "Namespace $NAMESPACE finalizer completed."
+              break
+            else
+              echo "Finalizer still running in $NAMESPACE. Retrying in 5 seconds..."
+              sleep 5
+            fi
+          done
 {{- end}}
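The wait loop in the Job polls until the namespace returns 404 and treats every other response as "still running". A possible variation, shown here only as a sketch and not as what this commit does, is to bound the number of attempts and stop early on an authorization error. `wait_until_gone` and `probe_namespace` are hypothetical names; the probe reuses the `curl` invocation and the `CA_CERT`, `TOKEN`, `API_SERVER`, and `NAMESPACE` variables from the script above.

```shell
#!/bin/sh
# probe_namespace (hypothetical): prints the HTTP status code for the namespace,
# using the same curl flags as the Job script. Defined here but not called.
probe_namespace() {
  curl --cacert "$CA_CERT" \
    -H "Authorization: Bearer $TOKEN" \
    -o /dev/null -w "%{http_code}" -s \
    "$API_SERVER/api/v1/namespaces/$NAMESPACE"
}

# wait_until_gone PROBE MAX_ATTEMPTS INTERVAL (hypothetical)
# PROBE is the name of a command or function that prints an HTTP status code.
# Returns 0 once PROBE prints 404, 2 on 401/403, 1 if attempts run out.
# POSIX sh has no local variables, so these names are global.
wait_until_gone() {
  probe=$1
  max_attempts=$2
  interval=$3
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    status=$("$probe")
    case "$status" in
      404) return 0 ;;
      401|403)
        echo "Probe rejected with HTTP status $status" >&2
        return 2 ;;
    esac
    attempt=$((attempt + 1))
    sleep "$interval"
  done
  return 1
}
```

Inside the Job this could be called as `wait_until_gone probe_namespace 60 5`, which would give up after roughly five minutes instead of looping until the hook is terminated externally. The attempt count and interval are illustrative values, not taken from the commit.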
