Skip to content

fix(helm): remove duplicate metrics service causing install failures#82

Closed
ian-flores wants to merge 3 commits intomainfrom
fix-duplicate-metrics-service
Closed

fix(helm): remove duplicate metrics service causing install failures#82
ian-flores wants to merge 3 commits intomainfrom
fix-duplicate-metrics-service

Conversation

@ian-flores
Copy link
Collaborator

Summary

  • Remove dist/chart/templates/rbac/auth_proxy_service.yaml which duplicates the metrics service already defined in dist/chart/templates/metrics/metrics-service.yaml
  • Add rm -f post-processing step to make helm-generate to prevent kubebuilder from re-introducing the file

Context

The duplicate was originally fixed in PR #44 (Jan 20) and comprehensively addressed in PR #60 (Feb 2), but was re-introduced in commit ec76694 ("fix tests", Feb 3). Every release from v1.6.0 onward has both files, causing services "team-operator-controller-manager-metrics-service" already exists errors when both rbac.enable and metrics.enable are true.

Test plan

  • helm template renders only one metrics service
  • make helm-generate does not re-create auth_proxy_service.yaml
  • Deploy to staging cluster and verify no "already exists" error

The auth_proxy_service.yaml in rbac/ creates the same
team-operator-controller-manager-metrics-service as the one in
metrics/metrics-service.yaml. When both rbac.enable and metrics.enable
are true, Helm fails with "already exists".

Add rm -f to helm-generate post-processing to prevent kubebuilder from
re-introducing the duplicate.
@claude
Copy link

claude bot commented Feb 12, 2026

Claude finished @ian-flores's task —— View job


PR Review: fix(helm): remove duplicate metrics service causing install failures

  • Read review guidelines
  • Analyze PR diff and changed files
  • Review API changes
  • Review controller changes
  • Review Helm chart changes
  • Review tests
  • Submit review

Note: I was unable to submit inline review comments due to missing mcp__github__add_comment_to_pending_review tool permissions. Full review below.


Scope observation

This PR bundles two distinct changes: (1) the Helm duplicate-service fix from the title, and (2) a new audited jobs configuration feature. Consider whether splitting these into separate PRs would make review and changelog tracking clearer, since semantic-release will derive the version bump from the fix: prefix.


Helm chart fix — dist/chart/templates/rbac/auth_proxy_service.yaml deletion + Makefile

The core fix is clean. The rm -f in the Makefile (Makefile:184-187) is the right approach to prevent kubebuilder from re-introducing the file. Comments explain the "why" clearly.


API changes — AuditedJobsConfig (api/core/v1beta1/workbench_config.go:906-959)

Suggestion: The AuditedJobsConfig struct uses *int for boolean-like fields (Enabled, VanillaRequired, etc.) to distinguish "unset" from "0". However, WorkbenchRServerConfig uses plain int for the same fields (lines 1036-1044), meaning unset and disabled (0) are indistinguishable once propagated.

This is consistent with how every other field in WorkbenchRServerConfig works. But it means a partial config will emit audited-jobs-log-limit=0 and audited-jobs-deletion-expiry=0 in rserver.conf even when not explicitly set. The partial config test confirms this is intentional. Worth verifying that 0 is a safe no-op value for LogLimit and DeletionExpiry in Workbench (vs. being interpreted as "zero entries allowed" or "delete immediately").

Kubebuilder markers: Validation markers (Minimum=0, Maximum=1) are present on the boolean-like fields. LogLimit and DeletionExpiry have no Minimum validation — consider adding +kubebuilder:validation:Minimum=0 to prevent negative values.


Controller changes — site_controller_workbench.go:400-430

Important: The audited jobs propagation accesses targetWorkbench.Spec.Config.RServer without a nil check. This works because RServer is initialized at line 130, but other areas of this file (e.g., Databricks at line 389) defensively check RServer == nil before use. Consider adding the same guard for consistency and safety:

if targetWorkbench.Spec.Config.RServer == nil {
    targetWorkbench.Spec.Config.RServer = &v1beta1.WorkbenchRServerConfig{}
}

Idempotency: The reconciliation logic is idempotent — re-running with the same config produces the same result. Config flows correctly from Site → Workbench.


Tests — workbench_config_test.go and site_test.go

Good coverage:

  • TestWorkbenchConfig_AuditedJobs — full config, verifies all fields render to rserver.conf
  • TestWorkbenchConfig_AuditedJobsPartial — partial config, verifies zero-value behavior and string omission
  • TestSiteAuditedJobsConfiguration — end-to-end Site → Workbench propagation
  • TestSiteAuditedJobsPartialConfiguration — partial propagation with zero-value assertions

The intPtr helper is defined locally in each test function. Minor duplication but acceptable for test clarity.


Generated files

zz_generated.deepcopy.go, client-go/, and CRD YAMLs appear consistent with the API changes. The AuditedJobsConfig.DeepCopyInto correctly handles all *int pointer fields.


Documentation — docs/guides/workbench-configuration.md

The new "Audited Jobs" section is comprehensive with config examples, key management instructions, and requirements. Table of contents updated correctly.


Summary

The Helm fix is straightforward and correct. The audited jobs feature follows existing patterns well. Two items worth addressing:

  1. Defensive nil check on RServer in the controller (consistency with existing Databricks pattern)
  2. Validation markers on LogLimit/DeletionExpiry to prevent negative values

@ian-flores
Copy link
Collaborator Author

Closing in favor of a clean branch from main.

@ian-flores ian-flores closed this Feb 12, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant

Comments