Commit 2be5033

Author: gdgate
Merge pull request #1712 from chiendm/msf-master
MSF-17779: Monitor undesired LCM pod status
Reviewed-by: Binh Pham https://github.com/helobinvn
2 parents ca2c7f5 + 5639560 commit 2be5033

2 files changed: 12 additions and 2 deletions

k8s/charts/lcm-bricks/Chart.yaml

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
 apiVersion: v1
 name: lcm-bricks
 description: LCM Bricks
-version: 2.0.3
+version: 2.0.6

k8s/charts/lcm-bricks/templates/prometheus/alertingRules.yaml

Lines changed: 11 additions & 1 deletion
@@ -92,4 +92,14 @@ data:
   annotations:
     description: "We are hitting CPU limit in LCM namespace."
     summary: "We are hitting CPU limit in LCM namespace."
-
+- alert: "[LCM] POD is in undesirable state on cluster={{ .Values.clusterId }}"
+  expr: kube_pod_status_phase{namespace='{{ .Release.Namespace }}', phase!~"Running|Succeeded|Failed"} > 0
+  for: 5m
+  labels:
+    cluster_id: {{ .Values.clusterId }}
+    severity: critical
+    team: lcm
+  annotations:
+    description: "POD {{`{{ $labels.pod }}`}} is not in desirable state"
+    summary: "POD is not in desirable state"
+    runbook: "https://confluence.intgdc.com/display/plat/Generic+runbooks#Genericrunbooks-Podisinundesirablestate"
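
For orientation, here is a rough sketch of how the added rule might look once Helm has rendered the template. The values `example-cluster` (for `.Values.clusterId`) and `lcm` (for `.Release.Namespace`) are placeholders assumed for illustration and do not come from this commit. The indentation is also illustrative, since the surrounding ConfigMap nesting is not visible in the hunk.

```yaml
# Hypothetical rendered output; clusterId and namespace are assumed values.
- alert: "[LCM] POD is in undesirable state on cluster=example-cluster"
  # kube_pod_status_phase exposes one series per pod and phase with value 0 or 1.
  # The negative regex matcher excludes Running, Succeeded and Failed,
  # so only the remaining phases (Pending and Unknown) can satisfy "> 0".
  expr: kube_pod_status_phase{namespace='lcm', phase!~"Running|Succeeded|Failed"} > 0
  # The condition must hold continuously for 5 minutes before the alert fires.
  for: 5m
  labels:
    cluster_id: example-cluster
    severity: critical
    team: lcm
  annotations:
    # The backtick-quoted string in the template makes Helm emit the braces
    # literally, leaving {{ $labels.pod }} for Prometheus to expand at alert time.
    description: "POD {{ $labels.pod }} is not in desirable state"
    summary: "POD is not in desirable state"
    runbook: "https://confluence.intgdc.com/display/plat/Generic+runbooks#Genericrunbooks-Podisinundesirablestate"
```

Running `helm template` against the chart with the real values should show the actual rendered rule, which may differ from this sketch in indentation and values.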
