Add ServerMaintenanceSet type and controller implementation
#272
base: main
Conversation
Nuckal777
left a comment
Thanks for looking into this.
* Add helm charts
* Add node selector and tolerations to charts
* Add charts documentation
* Add make helm and check it in a workflow
* Add hostNetwork value
* Restrict helm workflow to read only

Bumps [github.com/onsi/gomega](https://github.com/onsi/gomega) from 1.36.2 to 1.36.3.
- [Release notes](https://github.com/onsi/gomega/releases)
- [Changelog](https://github.com/onsi/gomega/blob/master/CHANGELOG.md)
- [Commits](onsi/gomega@v1.36.2...v1.36.3)
---
updated-dependencies:
- dependency-name: github.com/onsi/gomega
  dependency-type: direct:production
  update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump golangci/golangci-lint-action from 6 to 7
Bumps [golangci/golangci-lint-action](https://github.com/golangci/golangci-lint-action) from 6 to 7.
- [Release notes](https://github.com/golangci/golangci-lint-action/releases)
- [Commits](golangci/golangci-lint-action@v6...v7)
---
updated-dependencies:
- dependency-name: golangci/golangci-lint-action
  dependency-type: direct:production
  update-type: version-update:semver-major
...
Signed-off-by: dependabot[bot] <support@github.com>
* Bump golangci-lint to v2.0 and fix linting issues
---------
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Andreas Fritzler <andreas.fritzler@sap.com>

* implements maintenance controller
* fixes lint errors
* adds missing license info to maintenance controller
* minor refactoring
* uses server/serverclaim watches
* fixes review comments
* adds rbac roles; fixes test
* enhances test with multiple maintenance objects
* fixes enqueueMaintenanceByServerRefs
* fixes lint errors
* updates server reconciliation logic to handle ServerClaimRef state transitions
* rebase main; fixes linting errors
* renames createServerBootConfiguration func
---------
Co-authored-by: Stefan Hipfel <stefan@Mac.fritz.box>
ServerMaintenanceSet type and controller implementation
Can we also add some docs under https://github.com/ironcore-dev/metal-operator/tree/main/docs/concepts and reference it in the overall arch chapter https://github.com/ironcore-dev/metal-operator/blob/main/docs/architecture.md (no need to expand the graph here as this will be too big)?
Nuckal777
left a comment
Thanks for taking another look at this. 👍
```go
for _, maintenance := range maintenancelist.Items {
	log.V(1).Info("Checking for orphaned maintenance", "maintenance", maintenance.Name)
	// Check if the maintenance is part of the server maintenance set
	if !slices.Contains(serverNames, maintenance.Spec.ServerRef.Name) {
		log.V(1).Info("Maintenance is orphaned, deleting", "maintenance", maintenance.Name)
		if err := r.Delete(ctx, &maintenance); err != nil {
			errs = errors.Join(errs, fmt.Errorf("failed to delete orphaned maintenance %s: %w", maintenance.Name, err))
		} else {
			log.V(1).Info("Deleted orphaned maintenance", "maintenance", maintenance.Name)
		}
		continue
	}
}
```
Shouldn't we simply check that the Server targeted by the managed ServerMaintenances still exists in `serverList` (similar to what is done here)?
Checking if the server is being deleted happens in line 188.
… remove unnecessary whitespace in helper.go
…intenance set documentation
…server references
… enhance tests for server deletion and status updates
Walkthrough
This PR introduces a new ServerMaintenanceSet CRD to manage multiple ServerMaintenance resources. It refactors ServerMaintenance to use an embedded ServerMaintenanceTemplate for configuration fields, adds a dedicated controller for ServerMaintenanceSet reconciliation, and provides supporting infrastructure including RBAC rules, documentation, and tests.
Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant K8s API
    participant SMSReconciler as ServerMaintenanceSet<br/>Reconciler
    participant Server as Server<br/>Resource
    participant SM as ServerMaintenance<br/>Resource
    User->>K8s API: Create ServerMaintenanceSet<br/>(with serverSelector)
    K8s API->>SMSReconciler: Reconcile triggered
    SMSReconciler->>SMSReconciler: Ensure finalizer attached
    SMSReconciler->>K8s API: Query servers matching<br/>serverSelector labels
    K8s API-->>SMSReconciler: List of matching Servers
    loop For each matching Server
        SMSReconciler->>SMSReconciler: Check if ServerMaintenance<br/>already exists
        alt ServerMaintenance not owned
            SMSReconciler->>K8s API: Create ServerMaintenance<br/>(from template + serverRef)
            K8s API-->>SM: ServerMaintenance created
            SMSReconciler->>SM: Set ControllerReference<br/>to ServerMaintenanceSet
        end
    end
    SMSReconciler->>K8s API: Query all owned<br/>ServerMaintenance resources
    K8s API-->>SMSReconciler: List of owned maintenances
    loop For each owned ServerMaintenance
        SMSReconciler->>SM: Get current state<br/>(Pending/InMaintenance/Failed/Completed)
        SM-->>SMSReconciler: State + condition info
        SMSReconciler->>SMSReconciler: Aggregate status counts
    end
    alt ServerMaintenance references<br/>deleted server
        SMSReconciler->>K8s API: Delete orphaned<br/>ServerMaintenance
    end
    SMSReconciler->>K8s API: Patch ServerMaintenanceSet<br/>status with aggregated counts
    K8s API-->>User: Status updated<br/>(pending, inMaintenance,<br/>failed, completed, total)
```
Estimated code review effort
🎯 4 (Complex) | ⏱️ ~60 minutes
Pre-merge checks and finishing touches
❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
✨ Finishing touches
Actionable comments posted: 6
🧹 Nitpick comments (4)
internal/controller/helper.go (1)
156-162: Consider edge cases for `truncateString`. Two potential issues:

- Negative `maxLength`: if `maxLength < 0`, the slice `s[:maxLength]` will panic. Consider adding a guard.
- Multi-byte UTF-8 characters: slicing by byte index can split a multi-byte character, producing invalid UTF-8. If this is used for Kubernetes object names (which should be ASCII anyway), this may be acceptable, but worth noting.
Proposed defensive fix
```diff
 func truncateString(s string, maxLength int) string {
+	if maxLength <= 0 {
+		return ""
+	}
 	if len(s) <= maxLength {
 		return s
 	}
 	return s[:maxLength]
 }
```

api/v1alpha1/servermaintenanceset_types.go (1)
11-16: Consider adding kubebuilder validation for required fields. The `ServerSelector` field appears to be required for the controller to function correctly. Consider adding a `+kubebuilder:validation:Required` marker or ensuring the field has validation to prevent creating a `ServerMaintenanceSet` without a selector.

🔎 Suggested enhancement
```diff
 // ServerMaintenanceSetSpec defines the desired state of ServerMaintenanceSet.
 type ServerMaintenanceSetSpec struct {
 	// ServerLabelSelector specifies a label selector to identify the servers that are to be maintained.
+	// +required
 	ServerSelector metav1.LabelSelector `json:"serverSelector"`
 	// Template specifies the template for the server maintenance.
 	ServerMaintenanceTemplate ServerMaintenanceTemplate `json:"serverMaintenanceTemplate"`
 }
```

internal/controller/servermaintenance_controller.go (1)
235-255: Redundant error check and potential nil dereference. In `handleCompletedState`:

- Lines 239-243: the `apierrors.IsNotFound(err)` check is redundant because `getServerRef` already returns `(nil, nil)` for not found errors (lines 356-358).
- Line 245: the condition checks `server != nil && server.Spec.ServerMaintenanceRef == nil`, but line 250 calls `r.cleanup(ctx, log, server)` without rechecking if `server` is nil after the early return at line 248. While logically correct due to control flow, line 253 accesses `server.Name`, which could panic if `server` were nil (though current logic prevents this).

🔎 Suggested cleanup
```diff
 func (r *ServerMaintenanceReconciler) handleCompletedState(ctx context.Context, log logr.Logger, serverMaintenance *metalv1alpha1.ServerMaintenance) (ctrl.Result, error) {
 	log.V(1).Info("Handling completed state for server maintenance", "ServerMaintenance", serverMaintenance.Name)
 	server, err := r.getServerRef(ctx, serverMaintenance)
 	if err != nil {
-		if apierrors.IsNotFound(err) {
-			log.V(1).Info("Server not found, nothing to cleanup", "ServerMaintenance", serverMaintenance.Name)
-			return ctrl.Result{}, nil
-		}
 		return ctrl.Result{}, err
 	}
-	if server != nil && server.Spec.ServerMaintenanceRef == nil {
+	if server == nil {
+		log.V(1).Info("Server not found, nothing to cleanup", "ServerMaintenance", serverMaintenance.Name)
+		return ctrl.Result{}, nil
+	}
+	if server.Spec.ServerMaintenanceRef == nil {
 		log.V(1).Info("Server is not in maintenance, nothing to cleanup", "ServerMaintenance", serverMaintenance.Name)
 		return ctrl.Result{}, nil
 	}
```

internal/controller/servermaintenanceset_controller.go (1)
275-314: Consider optimizing enqueue logic for large clusters. The `enqueueMaintenanceSetByServers` handler lists all `ServerMaintenanceSet` resources for every server event, which could be expensive in clusters with many sets. Consider:
- Adding an index on servers by label keys used in selectors
- Caching selector matches
- Using informer caches more efficiently
For now, this is acceptable for typical deployments but worth noting for future optimization.
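For orientation, such a handler typically looks roughly like the sketch below. This is an illustrative reconstruction rather than the PR's code; it assumes the reconciler embeds `client.Client` and that the selector lives in `Spec.ServerSelector`, as in the API snippet shown earlier.

```go
package controller

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/labels"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"

	metalv1alpha1 "github.com/ironcore-dev/metal-operator/api/v1alpha1"
)

// enqueueMaintenanceSetByServers maps a Server event to reconcile requests for
// every ServerMaintenanceSet whose label selector matches the Server's labels.
// Listing all sets per event is the O(number of sets) cost noted above.
func (r *ServerMaintenanceSetReconciler) enqueueMaintenanceSetByServers(ctx context.Context, obj client.Object) []ctrl.Request {
	server, ok := obj.(*metalv1alpha1.Server)
	if !ok {
		return nil
	}
	setList := &metalv1alpha1.ServerMaintenanceSetList{}
	if err := r.List(ctx, setList); err != nil {
		return nil
	}
	var requests []ctrl.Request
	for _, set := range setList.Items {
		selector, err := metav1.LabelSelectorAsSelector(&set.Spec.ServerSelector)
		if err != nil {
			continue
		}
		if selector.Matches(labels.Set(server.GetLabels())) {
			requests = append(requests, ctrl.Request{NamespacedName: client.ObjectKeyFromObject(&set)})
		}
	}
	return requests
}
```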
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (5)
- `dist/chart/templates/crd/metal.ironcore.dev_servermaintenancesets.yaml` is excluded by `!**/dist/**`
- `dist/chart/templates/rbac/role.yaml` is excluded by `!**/dist/**`
- `dist/chart/templates/rbac/servermaintenanceset_admin_role.yaml` is excluded by `!**/dist/**`
- `dist/chart/templates/rbac/servermaintenanceset_editor_role.yaml` is excluded by `!**/dist/**`
- `dist/chart/templates/rbac/servermaintenanceset_viewer_role.yaml` is excluded by `!**/dist/**`
📒 Files selected for processing (25)
- `PROJECT`
- `api/v1alpha1/servermaintenance_types.go`
- `api/v1alpha1/servermaintenanceset_types.go`
- `api/v1alpha1/zz_generated.deepcopy.go`
- `config/crd/bases/metal.ironcore.dev_servermaintenancesets.yaml`
- `config/crd/kustomization.yaml`
- `config/rbac/role.yaml`
- `config/rbac/servermaintenanceset_admin_role.yaml`
- `config/rbac/servermaintenanceset_editor_role.yaml`
- `config/rbac/servermaintenanceset_viewer_role.yaml`
- `config/samples/kustomization.yaml`
- `config/samples/metal_v1alpha1_servermaintenance.yaml`
- `config/samples/metal_v1alpha1_servermaintenanceset.yaml`
- `docs/api-reference/api.md`
- `docs/architecture.md`
- `docs/concepts/servermaintenanceset.md`
- `internal/controller/helper.go`
- `internal/controller/index_fields.go`
- `internal/controller/server_controller_test.go`
- `internal/controller/servermaintenance_controller.go`
- `internal/controller/servermaintenance_controller_test.go`
- `internal/controller/servermaintenanceset_controller.go`
- `internal/controller/servermaintenanceset_controller_test.go`
- `internal/controller/suite_test.go`
- `internal/controller/test_helper.go`
🧰 Additional context used
🧬 Code graph analysis (9)
- internal/controller/suite_test.go (1)
  - internal/controller/servermaintenanceset_controller.go (1)
    - `ServerMaintenanceSetReconciler` (34-37)
- internal/controller/servermaintenanceset_controller_test.go (4)
  - api/v1alpha1/servermaintenanceset_types.go (2)
    - `ServerMaintenanceSet` (36-42)
    - `ServerMaintenanceSetSpec` (11-16)
  - api/v1alpha1/bmcsecret_types.go (3)
    - `BMCSecret` (24-57)
    - `BMCSecretUsernameKeyName` (13-13)
    - `BMCSecretPasswordKeyName` (15-15)
  - api/v1alpha1/bmc_types.go (2)
    - `Protocol` (118-129)
    - `ProtocolRedfishLocal` (18-18)
  - api/v1alpha1/servermaintenance_types.go (4)
    - `ServerMaintenanceReasonAnnotationKey` (15-15)
    - `ServerMaintenanceStateInMaintenance` (77-77)
    - `ServerMaintenanceStateCompleted` (81-81)
    - `ServerMaintenancePolicyOwnerApproval` (59-59)
- internal/controller/index_fields.go (1)
  - api/v1alpha1/servermaintenance_types.go (1)
    - `ServerMaintenance` (94-100)
- internal/controller/server_controller_test.go (1)
  - api/v1alpha1/servermaintenance_types.go (2)
    - `ServerMaintenanceTemplate` (32-43)
    - `ServerMaintenancePolicyEnforced` (61-61)
- internal/controller/servermaintenance_controller_test.go (3)
  - api/v1alpha1/servermaintenance_types.go (4)
    - `ServerMaintenanceTemplate` (32-43)
    - `ServerMaintenancePolicyEnforced` (61-61)
    - `ServerBootConfigurationTemplate` (21-29)
    - `ServerMaintenancePolicyOwnerApproval` (59-59)
  - api/v1alpha1/server_types.go (1)
    - `ServerStateDiscovery` (150-150)
  - api/v1alpha1/serverbootconfiguration_types.go (1)
    - `ServerBootConfigurationSpec` (12-26)
- internal/controller/servermaintenance_controller.go (2)
  - api/v1alpha1/servermaintenance_types.go (4)
    - `ServerMaintenanceStateFailed` (79-79)
    - `ServerMaintenanceStateCompleted` (81-81)
    - `ServerMaintenance` (94-100)
    - `ServerMaintenanceList` (105-109)
  - api/v1alpha1/server_types.go (1)
    - `Server` (428-434)
- internal/controller/test_helper.go (1)
  - api/v1alpha1/servermaintenance_types.go (1)
    - `ServerMaintenanceList` (105-109)
- api/v1alpha1/zz_generated.deepcopy.go (2)
  - api/v1alpha1/servermaintenanceset_types.go (4)
    - `ServerMaintenanceSet` (36-42)
    - `ServerMaintenanceSetList` (47-51)
    - `ServerMaintenanceSetSpec` (11-16)
    - `ServerMaintenanceSetStatus` (19-30)
  - api/v1alpha1/servermaintenance_types.go (3)
    - `ServerMaintenanceTemplate` (32-43)
    - `ServerMaintenanceSpec` (46-52)
    - `ServerBootConfigurationTemplate` (21-29)
- internal/controller/servermaintenanceset_controller.go (3)
  - api/v1alpha1/servermaintenanceset_types.go (3)
    - `ServerMaintenanceSet` (36-42)
    - `ServerMaintenanceSetStatus` (19-30)
    - `ServerMaintenanceSetList` (47-51)
  - api/v1alpha1/servermaintenance_types.go (9)
    - `ServerMaintenanceList` (105-109)
    - `ServerMaintenance` (94-100)
    - `ServerMaintenanceReasonAnnotationKey` (15-15)
    - `ServerMaintenanceSpec` (46-52)
    - `ServerMaintenanceTemplate` (32-43)
    - `ServerMaintenanceStatePending` (75-75)
    - `ServerMaintenanceStateInMaintenance` (77-77)
    - `ServerMaintenanceStateFailed` (79-79)
    - `ServerMaintenanceStateCompleted` (81-81)
  - api/v1alpha1/server_types.go (1)
    - `ServerList` (439-443)
🪛 Checkov (3.2.334)
config/rbac/servermaintenanceset_admin_role.yaml
[medium] 8-27: Minimize wildcard use in Roles and ClusterRoles
(CKV_K8S_49)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: Run linter
- GitHub Check: Run tests
- GitHub Check: Run test e2e
- GitHub Check: Run test chart
🔇 Additional comments (37)
config/rbac/servermaintenanceset_viewer_role.yaml (1)
1-29: LGTM! The viewer role is well-structured with appropriate read-only permissions and clear documentation. The role follows standard Kubernetes RBAC patterns.
config/samples/metal_v1alpha1_servermaintenance.yaml (1)
10-16: Sample looks good; minor trailing whitespace. The sample correctly demonstrates the updated `ServerMaintenance` spec structure with `policy`, `serverPower`, and `serverRef` fields. There appear to be trailing blank lines at the end of the file (lines 15-16) which could be cleaned up.

PROJECT (1)

77-85: LGTM! The new `ServerMaintenanceSet` entry follows the established pattern and correctly configures the CRD as namespaced with controller scaffolding.

config/crd/bases/metal.ironcore.dev_servermaintenancesets.yaml (2)

82-99: Verify `serverRef` semantics in the template. The `serverBootConfigurationTemplate.spec.serverRef` is marked as required (lines 97-98). In a `ServerMaintenanceSet` context where multiple servers are selected via `serverSelector`, having a static `serverRef` in the template seems unusual; typically the controller would populate this dynamically per server. Please verify this is the intended behavior, or whether `serverRef` should be optional in the template and populated by the controller during `ServerMaintenance` creation.
1-194: CRD structure looks good overall. The CRD is well-defined with:

- Proper metadata and controller-gen annotations
- Comprehensive spec schema with `serverMaintenanceTemplate` and `serverSelector`
- Status counters for tracking maintenance lifecycle states
- Appropriate use of `x-kubernetes-map-type: atomic` for label selectors

config/crd/kustomization.yaml (1)
12-12: LGTM! The new CRD base is correctly added to the kustomization resources in the appropriate position.
config/samples/kustomization.yaml (1)
10-10: LGTM! The new sample resource is correctly added to the kustomization.
docs/architecture.md (1)
86-86: LGTM! The `ServerMaintenanceSet` CRD entry is appropriately documented and linked to its concept page. This addresses the reviewer's request to add documentation references to the architecture chapter.

internal/controller/test_helper.go (1)

46-48: LGTM! The `ServerMaintenanceList` cleanup validation follows the established pattern and is correctly positioned to ensure all maintenance resources are cleaned up after tests.

internal/controller/index_fields.go (1)

59-70: LGTM! The new indexer for `ServerMaintenance` follows the established pattern and correctly handles nil checks. Reusing the `serverRefField` constant is appropriate since both `BIOSSettings` and `ServerMaintenance` index by the same field path.

internal/controller/suite_test.go (1)

215-218: LGTM! The `ServerMaintenanceSetReconciler` setup follows the established pattern of other reconcilers in the test suite.

config/samples/metal_v1alpha1_servermaintenanceset.yaml (1)

1-16: LGTM! The sample is now complete with a meaningful example demonstrating `serverSelector` with label matching and `serverMaintenanceTemplate` configuration. This addresses the previous reviewer's request to fill in the example.

config/rbac/servermaintenanceset_editor_role.yaml (1)
1-33: LGTM! The editor role follows least-privilege principles with explicit verb permissions and read-only access to the status subresource. The header comments appropriately explain the role's intended use.
config/rbac/role.yaml (1)
57-57: LGTM! The RBAC permissions for `servermaintenancesets` are correctly added across all three rule groups (main resource, finalizers, and status), following the established pattern and maintaining alphabetical ordering.

Also applies to: 83-83, 103-103
config/rbac/servermaintenanceset_admin_role.yaml (1)
15-27: Wildcard verb usage is appropriate and consistent with all admin roles in the project. This ClusterRole follows the standard pattern used across all admin roles in the project (9 total files), and includes clear documentation explaining that it grants full permissions for users authorized to manage roles and bindings. The wildcard `'*'` verb on the main resource (line 21) is intentional and properly scoped to the admin role context with appropriate status subresource restrictions.

internal/controller/server_controller_test.go (1)
716-721: LGTM! The test correctly uses the new `ServerMaintenanceTemplate` structure, placing `Policy` inside the template while keeping `ServerRef` at the top level of `ServerMaintenanceSpec`. This aligns with the API refactoring introduced in this PR.

api/v1alpha1/servermaintenanceset_types.go (2)
32-42: LGTM! The `ServerMaintenanceSet` type definition follows Kubernetes CRD conventions correctly with proper kubebuilder markers for the status subresource. The type is well-structured.

53-55: LGTM! The type registration in `init()` follows the standard pattern for registering Kubernetes API types with the scheme builder.

internal/controller/servermaintenanceset_controller_test.go (2)

96-117: LGTM! The test correctly initializes a `ServerMaintenanceSet` with the proper spec structure including `ServerSelector` and `ServerMaintenanceTemplate`. The annotation for the maintenance reason is also correctly propagated.

330-378: Good coverage for server label change scenarios. This test verifies that the controller correctly reacts when server labels change and no longer match the selector, properly updating the maintenance count. This addresses the past review comment requesting tests for label change scenarios.
internal/controller/servermaintenance_controller.go (4)
353-362: LGTM! The `getServerRef` helper cleanly encapsulates the server lookup logic, returning `(nil, nil)` for not found cases. This allows callers to distinguish between "not found" and actual errors.

294-297: Good defensive check for empty ServerPower. This addition prevents unnecessary patches when no power state is specified in the maintenance template, improving efficiency and avoiding unintended state changes.

488-493: LGTM! The optimized list query using `client.MatchingFields` with the `serverRefField` index improves performance by filtering through the registered field index rather than listing everything and filtering client-side.
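As an illustration of that pattern, a field-indexed lookup reads roughly as below. The literal index key `"spec.serverRef.name"` and the standalone helper are assumptions for this sketch; the PR uses its own `serverRefField` constant and the index registered in `index_fields.go`.

```go
package controller

import (
	"context"

	"sigs.k8s.io/controller-runtime/pkg/client"

	metalv1alpha1 "github.com/ironcore-dev/metal-operator/api/v1alpha1"
)

// listMaintenancesForServer returns only the ServerMaintenance objects whose
// spec.serverRef.name matches the given server name, letting the field index
// do the filtering instead of iterating the full list in a loop.
func listMaintenancesForServer(ctx context.Context, c client.Client, serverName string) (*metalv1alpha1.ServerMaintenanceList, error) {
	list := &metalv1alpha1.ServerMaintenanceList{}
	if err := c.List(ctx, list, client.MatchingFields{"spec.serverRef.name": serverName}); err != nil {
		return nil, err
	}
	return list, nil
}
```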
523-526: Good nil check for ServerRef on claims. This guard prevents nil pointer dereferences when processing claims without a `ServerRef`, which could occur during claim creation or in edge cases.

docs/api-reference/api.md (3)
1336-1390: LGTM! The API documentation for `ServerMaintenanceSet`, `ServerMaintenanceSetSpec`, and `ServerMaintenanceSetStatus` is complete and follows the established documentation format. All fields are properly documented with descriptions.

1446-1463: LGTM! The `ServerMaintenanceTemplate` documentation is complete with proper field descriptions and "Appears in" references to `ServerMaintenanceSetSpec` and `ServerMaintenanceSpec`.

1427-1427: LGTM! The new `Completed` state is properly documented in the `ServerMaintenanceState` enum table.

api/v1alpha1/servermaintenance_types.go (3)
31-43: LGTM! The `ServerMaintenanceTemplate` type is well-designed, encapsulating the reusable maintenance configuration fields (`Policy`, `ServerPower`, `ServerBootConfigurationTemplate`). This enables clean sharing between `ServerMaintenance` and `ServerMaintenanceSet` resources.

45-52: LGTM! The refactored `ServerMaintenanceSpec` correctly uses Go's inline struct embedding for `ServerMaintenanceTemplate` while keeping `ServerRef` as a top-level required field. This design cleanly separates instance-specific data (`ServerRef`) from template data that can be shared via `ServerMaintenanceSet`.
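For readers without the diff at hand, the shape being described is roughly the sketch below. The concrete field types (`ServerMaintenancePolicy`, `Power`, `corev1.LocalObjectReference`) are assumptions for illustration and may not match the PR byte for byte.

```go
package v1alpha1

import (
	corev1 "k8s.io/api/core/v1"
)

// ServerMaintenanceTemplate holds the reusable maintenance configuration that
// both ServerMaintenance and ServerMaintenanceSet carry.
type ServerMaintenanceTemplate struct {
	Policy                          ServerMaintenancePolicy          `json:"policy,omitempty"`
	ServerPower                     Power                            `json:"serverPower,omitempty"`
	ServerBootConfigurationTemplate *ServerBootConfigurationTemplate `json:"serverBootConfigurationTemplate,omitempty"`
}

// ServerMaintenanceSpec keeps the instance-specific ServerRef at the top level
// and inlines the shared template, so the YAML representation stays flat.
type ServerMaintenanceSpec struct {
	ServerRef                 *corev1.LocalObjectReference `json:"serverRef"`
	ServerMaintenanceTemplate `json:",inline"`
}
```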
80-81: LGTM! The new `ServerMaintenanceStateCompleted` state enables the maintenance lifecycle to be completed without deleting the resource, providing more flexibility in maintenance workflows.
119-172: LGTM!The
createMaintenancesfunction correctly:
- Uses a map for O(1) lookup of existing maintenances (addresses past review comment)
- Uses
GenerateNamewith truncation to avoid DNS name length issues- Propagates annotations from the set to individual maintenances
- Sets controller references for proper GC
238-263: LGTM! The `calculateStatus` function correctly aggregates maintenance states into the set's status counters. All four states (Pending, InMaintenance, Failed, Completed) are properly counted.
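A sketch of that aggregation; the status field names (`Pending`, `InMaintenance`, `Failed`, `Completed`, `Total`) follow the counters described in the CRD and sequence diagram above and are assumptions rather than a copy of the PR.

```go
package controller

import (
	metalv1alpha1 "github.com/ironcore-dev/metal-operator/api/v1alpha1"
)

// aggregateMaintenanceStates counts owned ServerMaintenance resources per state
// so the ServerMaintenanceSet status can report per-state totals.
func aggregateMaintenanceStates(items []metalv1alpha1.ServerMaintenance) metalv1alpha1.ServerMaintenanceSetStatus {
	status := metalv1alpha1.ServerMaintenanceSetStatus{Total: int32(len(items))}
	for _, m := range items {
		switch m.Status.State {
		case metalv1alpha1.ServerMaintenanceStatePending:
			status.Pending++
		case metalv1alpha1.ServerMaintenanceStateInMaintenance:
			status.InMaintenance++
		case metalv1alpha1.ServerMaintenanceStateFailed:
			status.Failed++
		case metalv1alpha1.ServerMaintenanceStateCompleted:
			status.Completed++
		}
	}
	return status
}
```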
69-77: LGTM! The delete handler correctly removes only the finalizer, allowing Kubernetes garbage collection to clean up owned `ServerMaintenance` resources via the controller references. This aligns with the discussion in past review comments.
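Roughly, the endorsed pattern looks like the sketch below; the finalizer constant and the function shape are illustrative, and only the controller-runtime helpers are real APIs.

```go
package controller

import (
	"context"

	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"

	metalv1alpha1 "github.com/ironcore-dev/metal-operator/api/v1alpha1"
)

// handleDeletion removes only the finalizer; the owned ServerMaintenance
// objects are cleaned up by Kubernetes garbage collection through their
// controller references, so no explicit child deletion is needed here.
func handleDeletion(ctx context.Context, c client.Client, set *metalv1alpha1.ServerMaintenanceSet) (ctrl.Result, error) {
	const finalizer = "metal.ironcore.dev/servermaintenanceset" // illustrative name
	if controllerutil.ContainsFinalizer(set, finalizer) {
		base := set.DeepCopy()
		controllerutil.RemoveFinalizer(set, finalizer)
		if err := c.Patch(ctx, set, client.MergeFrom(base)); err != nil {
			return ctrl.Result{}, err
		}
	}
	return ctrl.Result{}, nil
}
```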
76-89: LGTM!The test correctly uses the new
ServerMaintenanceSpecstructure withServerRefat the top level andServerMaintenanceTemplatecontainingPolicy,ServerPower, andServerBootConfigurationTemplate. This aligns with the API refactoring.
358-372: Good test coverage for the new Completed state.This test verifies that setting a maintenance to
Completedstate properly triggers cleanup of the server's maintenance references (ServerMaintenanceRefandMaintenanceBootConfigurationRef), validating the newhandleCompletedStatelogic in the controller.
110-112: LGTM on cleanup additions.The explicit cleanup of
serverandbmcSecretat the end of each test ensures a clean state for subsequent tests, preventing resource leaks in the test environment.api/v1alpha1/zz_generated.deepcopy.go (1)
1668-1813: Auto-generated DeepCopy methods look correct.The generated DeepCopy implementations for the new ServerMaintenanceSet types and ServerMaintenanceTemplate follow the correct patterns and are consistent with:
- Existing generated code in this file (e.g., BIOSSettingsSet pattern)
- Type definitions from the relevant code snippets
- The refactored structure where ServerMaintenanceSpec now embeds ServerMaintenanceTemplate (line 1762)
All pointer fields are properly nil-checked, nested types correctly delegate to their own DeepCopyInto methods, and the ServerMaintenanceSetStatus correctly uses simple assignment for its primitive int32 fields.
- `ServerMaintenanceSet` is namespaced and may represent various maintenance operations.
- Selection: Supports selecting target servers via label selectors.
- `ServerMaintenanceSet` shows the number of server in maintenance, pending or failed.
Minor grammar fix needed.
"server in maintenance" should be plural to match the context.
🔎 Proposed fix
```diff
-- `ServerMaintenanceSet` shows the number of server in maintenance, pending or failed.
+- `ServerMaintenanceSet` shows the number of servers in maintenance, pending, or failed.
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```markdown
- `ServerMaintenanceSet` shows the number of servers in maintenance, pending, or failed.
```
🤖 Prompt for AI Agents
In docs/concepts/servermaintenanceset.md around line 10, the sentence uses
"server in maintenance" singular; change it to the plural "servers in
maintenance" so it reads: "`ServerMaintenanceSet` shows the number of servers in
maintenance, pending or failed." Ensure pluralization matches the rest of the
context.
```yaml
template:
  policy: OwnerApproval
  serverPower: On # or Off
  serverBootConfigurationTemplate:
    name: bios-update-config
    spec:
      image: "bios-update-image"
      ignitionSecretRef:
        name: bios-update-ignition
```
🧩 Analysis chain
🏁 Script executed:
```bash
#!/bin/bash
# Check the API type definition for ServerMaintenanceSet
rg -n "serverMaintenanceTemplate|template:" api/v1alpha1/servermaintenanceset_types.go -A 2 -B 2
```

Repository: ironcore-dev/metal-operator
Length of output: 301
🏁 Script executed:
```bash
#!/bin/bash
# Also check the sample YAML file to see what field name is used
cat config/samples/metal_v1alpha1_servermaintenanceset.yaml 2>/dev/null || echo "Sample file not found, searching for it..."
fd metal_v1alpha1_servermaintenanceset.yaml
```

Repository: ironcore-dev/metal-operator
Length of output: 583
🏁 Script executed:
```bash
#!/bin/bash
# Search for ServerMaintenanceSet struct definition
ast-grep --pattern 'type ServerMaintenanceSet struct {
  $$$
}'
```

Repository: ironcore-dev/metal-operator
Length of output: 633
🏁 Script executed:
```bash
#!/bin/bash
# Verify the documentation file contains the incorrect field name
sed -n '30,42p' docs/concepts/servermaintenanceset.md
```

Repository: ironcore-dev/metal-operator
Length of output: 376
Change `template:` to `serverMaintenanceTemplate:` to match the API schema.
The documentation uses `template:` but the actual API field is named `serverMaintenanceTemplate:` (line 15 of api/v1alpha1/servermaintenanceset_types.go). The sample YAML file already demonstrates the correct field name.
🤖 Prompt for AI Agents
In docs/concepts/servermaintenanceset.md around lines 32 to 40, the example uses
the incorrect top-level key "template:" whereas the API expects
"serverMaintenanceTemplate:"; update the sample YAML to replace "template:" with
"serverMaintenanceTemplate:" so it matches
api/v1alpha1/servermaintenanceset_types.go and ensure indentation and nested
fields remain unchanged.
| By("Checking the Annotations and OwnerReferences of the created maintenances") | ||
| for _, m := range maintenanceList.Items { | ||
| Expect(m.OwnerReferences).To(ContainElement(metav1.OwnerReference{ | ||
| APIVersion: "metal.ironcore.dev/v1alpha1", | ||
| Kind: "ServerMaintenanceSet", | ||
| Name: servermaintenanceset.Name, | ||
| UID: servermaintenanceset.UID, | ||
| Controller: ptr.To(true), | ||
| BlockOwnerDeletion: ptr.To(true), | ||
| })) | ||
| Expect(m.Annotations[metalv1alpha1.ServerMaintenanceReasonAnnotationKey]).To(Equal("foo")) | ||
| } |
Stale list reference may cause flaky tests.
The maintenanceList used for checking OwnerReferences and Annotations was fetched in the previous Eventually block (lines 140-156). By the time this assertion runs, the list reference may be stale. Consider wrapping these assertions in an Eventually block or re-fetching the list.
🔎 Suggested fix
By("Checking the Annotations and OwnerReferences of the created maintenances")
+ Eventually(func(g Gomega) {
+ err := k8sClient.List(ctx, maintenanceList, &client.ListOptions{Namespace: ns.Name})
+ g.Expect(err).NotTo(HaveOccurred())
for _, m := range maintenanceList.Items {
- Expect(m.OwnerReferences).To(ContainElement(metav1.OwnerReference{
+ g.Expect(m.OwnerReferences).To(ContainElement(metav1.OwnerReference{
APIVersion: "metal.ironcore.dev/v1alpha1",
Kind: "ServerMaintenanceSet",
Name: servermaintenanceset.Name,
UID: servermaintenanceset.UID,
Controller: ptr.To(true),
BlockOwnerDeletion: ptr.To(true),
}))
- Expect(m.Annotations[metalv1alpha1.ServerMaintenanceReasonAnnotationKey]).To(Equal("foo"))
+ g.Expect(m.Annotations[metalv1alpha1.ServerMaintenanceReasonAnnotationKey]).To(Equal("foo"))
}
+ }).Should(Succeed())📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| By("Checking the Annotations and OwnerReferences of the created maintenances") | |
| for _, m := range maintenanceList.Items { | |
| Expect(m.OwnerReferences).To(ContainElement(metav1.OwnerReference{ | |
| APIVersion: "metal.ironcore.dev/v1alpha1", | |
| Kind: "ServerMaintenanceSet", | |
| Name: servermaintenanceset.Name, | |
| UID: servermaintenanceset.UID, | |
| Controller: ptr.To(true), | |
| BlockOwnerDeletion: ptr.To(true), | |
| })) | |
| Expect(m.Annotations[metalv1alpha1.ServerMaintenanceReasonAnnotationKey]).To(Equal("foo")) | |
| } | |
| By("Checking the Annotations and OwnerReferences of the created maintenances") | |
| Eventually(func(g Gomega) { | |
| err := k8sClient.List(ctx, maintenanceList, &client.ListOptions{Namespace: ns.Name}) | |
| g.Expect(err).NotTo(HaveOccurred()) | |
| for _, m := range maintenanceList.Items { | |
| g.Expect(m.OwnerReferences).To(ContainElement(metav1.OwnerReference{ | |
| APIVersion: "metal.ironcore.dev/v1alpha1", | |
| Kind: "ServerMaintenanceSet", | |
| Name: servermaintenanceset.Name, | |
| UID: servermaintenanceset.UID, | |
| Controller: ptr.To(true), | |
| BlockOwnerDeletion: ptr.To(true), | |
| })) | |
| g.Expect(m.Annotations[metalv1alpha1.ServerMaintenanceReasonAnnotationKey]).To(Equal("foo")) | |
| } | |
| }).Should(Succeed()) |
🤖 Prompt for AI Agents
In internal/controller/servermaintenanceset_controller_test.go around lines 158
to 169, the test iterates over maintenanceList fetched earlier and asserts
OwnerReferences and Annotations which can be stale and cause flakes; re-fetch
the maintenance list immediately before making these assertions or wrap the loop
in an Eventually that re-queries the API (using the same list lookup used
earlier) so the assertions run against the latest state, and keep the Expect
calls inside that retry block until they pass.
| By("Setting the maintenances to completed") | ||
| Eventually(UpdateStatus(&maintenanceList.Items[0], func() { | ||
| maintenanceList.Items[0].Status.State = metalv1alpha1.ServerMaintenanceStateCompleted | ||
| })).Should(Succeed()) | ||
| Eventually(UpdateStatus(&maintenanceList.Items[1], func() { | ||
| maintenanceList.Items[1].Status.State = metalv1alpha1.ServerMaintenanceStateCompleted | ||
| })).Should(Succeed()) | ||
| Eventually(UpdateStatus(&maintenanceList.Items[2], func() { | ||
| maintenanceList.Items[2].Status.State = metalv1alpha1.ServerMaintenanceStateCompleted | ||
| })).Should(Succeed()) |
Potential race condition when updating maintenance states.
The code updates maintenanceList.Items[0], [1], and [2] status to Completed, but these indices reference items from a list fetched earlier. If the list order changes or items are recreated (e.g., after deletion at line 180), the indices may not correspond to the expected maintenances. Consider re-fetching the list before these updates.
🤖 Prompt for AI Agents
In internal/controller/servermaintenanceset_controller_test.go around lines 270
to 279, the test updates maintenanceList.Items[0..2] statuses but reuses an
earlier-fetched list which can be reordered or changed (race) after prior
delete/recreate operations; re-fetch the current maintenance list (or retrieve
each maintenance by a stable identifier like name/UID) immediately before
performing each UpdateStatus call so you update the intended objects, then
proceed to set Status.State to Completed and assert success.
```go
for _, maintenance := range maintenancelist.Items {
	log.V(1).Info("Checking for orphaned maintenance", "maintenance", maintenance.Name)
	// Check if the maintenance is part of the server maintenance set
	if _, exists := activeServers[maintenance.Spec.ServerRef.Name]; !exists {
		log.V(1).Info("Maintenance is orphaned, deleting", "maintenance", maintenance.Name)
		if err := r.Delete(ctx, &maintenance); err != nil {
			errs = errors.Join(errs, fmt.Errorf("failed to delete orphaned maintenance %s: %w", maintenance.Name, err))
		} else {
			log.V(1).Info("Deleted orphaned maintenance", "maintenance", maintenance.Name)
		}
		continue
	}
```
Potential nil pointer dereference in orphan deletion.
Line 191 accesses maintenance.Spec.ServerRef.Name without checking if ServerRef is nil. While createMaintenances always sets ServerRef, orphaned maintenances from external sources or previous versions could have nil ServerRef.
🔎 Suggested fix
```diff
 for _, maintenance := range maintenancelist.Items {
 	log.V(1).Info("Checking for orphaned maintenance", "maintenance", maintenance.Name)
+	if maintenance.Spec.ServerRef == nil {
+		log.V(1).Info("Maintenance has no ServerRef, deleting", "maintenance", maintenance.Name)
+		if err := r.Delete(ctx, &maintenance); err != nil {
+			errs = errors.Join(errs, fmt.Errorf("failed to delete maintenance without ServerRef %s: %w", maintenance.Name, err))
+		}
+		continue
+	}
 	// Check if the maintenance is part of the server maintenance set
 	if _, exists := activeServers[maintenance.Spec.ServerRef.Name]; !exists {
```

🤖 Prompt for AI Agents
In internal/controller/servermaintenanceset_controller.go around lines 188 to
199, the code dereferences maintenance.Spec.ServerRef without checking for nil
which can panic for legacy/external objects; change the logic to first test if
maintenance.Spec.ServerRef == nil and treat that as an orphan (log a clear
message that ServerRef is nil), then proceed to delete the maintenance (joining
errors as currently done) or skip deletion per policy — in any case avoid
dereferencing when ServerRef is nil and only access ServerRef.Name after the nil
check, keeping existing logging/error aggregation behavior.
```diff
 logf.SetLogger(zap.New(zap.WriteTo(GinkgoWriter), zap.UseDevMode(false)))

 By("bootstrapping test environment")
-logf.SetLogger(zap.New(zap.WriteTo(GinkgoWriter), zap.UseDevMode(true)))
+logf.SetLogger(zap.New(zap.WriteTo(GinkgoWriter), zap.UseDevMode(false)))
```
Duplicate logger setup.
The logger is configured twice with identical settings. The second call on line 69 is redundant and can be removed.
🔎 Proposed fix
```diff
 var _ = BeforeSuite(func() {
 	logf.SetLogger(zap.New(zap.WriteTo(GinkgoWriter), zap.UseDevMode(false)))

 	By("bootstrapping test environment")
-	logf.SetLogger(zap.New(zap.WriteTo(GinkgoWriter), zap.UseDevMode(false)))
 	testEnv = &envtest.Environment{
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```go
logf.SetLogger(zap.New(zap.WriteTo(GinkgoWriter), zap.UseDevMode(false)))

By("bootstrapping test environment")
```
🤖 Prompt for AI Agents
In internal/controller/suite_test.go around lines 66 to 69, the logger is
configured twice with identical settings; remove the redundant second call (the
duplicate zap.New(zap.WriteTo(GinkgoWriter), zap.UseDevMode(false)) invocation
on line 69) so the logger is only set once, leaving a single logf.SetLogger(...)
call.
Proposed Changes
- `Complete` state to Maintenance CRD: `Completed`

Summary by CodeRabbit
Release Notes
New Features
Updates
Documentation