fix(chart): repair immutable Deployment selector on skyhook->nodewright upgrade#286
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds a shared controller-manager selector-label helper, updates the Deployment selector to use it, and introduces a default-enabled pre-upgrade Job with dedicated ServiceAccount, Role, and RoleBinding that compares live selector labels before deleting the Deployment when they differ. The chart also switches cleanup jobs to pinned alpine/kubectl images, updates values and docs, and extends Helm template tests. Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Suggested reviewers
🚥 Pre-merge checks | ✅ 4✅ Passed checks (4 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/release-process.md`:
- Around line 306-308: The follow-up instruction is stale and references the
wrong image updates. In the release-process guidance near the maintenance-job
`alpine/kubectl` note, update the next sentence to describe the correct action
for `webhook.removalTag` and keep it aligned with the supported Kubernetes
range, rather than telling readers to change the `kube-rbac-proxy`, operator,
and agent digests. Use the existing `webhook.removalTag` and
`docs/kubernetes-support.md` references to keep the wording tied to the right
config.
In `@k8s-tests/chainsaw/helm/helm-template-test/chainsaw-test.yaml`:
- Around line 175-186: The helm template test only checks that the rendered hook
image comes from alpine/kubectl, so it can still pass if the image is unpinned.
Tighten the assertions in the chainsaw test script by updating the check near
the helm template command to also verify the digest pin, either by matching the
full rendered image or by asserting the output contains `@sha256`:, while keeping
the existing repository-switch checks intact.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Enterprise
Run ID: 900b7716-ea05-4381-817d-b09c5bc028fe
📒 Files selected for processing (13)
ROADMAP.mdchart/RELEASE_NOTES.mdchart/templates/_helpers.tplchart/templates/cleanup-skyhooks-job.yamlchart/templates/cleanup-webhook-job.yamlchart/templates/deployment.yamlchart/templates/selector-migration-job.yamlchart/templates/selector-migration-role.yamlchart/templates/selector-migration-rolebinding.yamlchart/templates/selector-migration-serviceaccount.yamlchart/values.yamldocs/release-process.mdk8s-tests/chainsaw/helm/helm-template-test/chainsaw-test.yaml
Coverage Report for CI Build 28252625797Coverage decreased (-0.02%) to 81.683%Details
Uncovered ChangesNo uncovered changes found. Coverage Regressions2 previously-covered lines in 1 file lost coverage.
Coverage Stats
💛 - Coveralls |
e188e55 to
ea75b77
Compare
|
Addressed both CodeRabbit findings (force-pushed):
|
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@chart/templates/selector-migration-job.yaml`:
- Around line 83-94: The selector migration job’s kubectl execution is running
with a read-only root filesystem, so it needs an explicit writable cache
location. Update the selector migration hook/template by adding a writable
emptyDir volume and mounting it in the same pod spec used by kubectl, then
configure kubectl to use that path via KUBECACHEDIR or the --cache-dir option so
discovery caching works without permission errors.
In `@k8s-tests/chainsaw/helm/helm-template-test/chainsaw-test.yaml`:
- Around line 145-174: The selector-migration-job-renders test only checks the
Job shape, so it can miss regressions in the selector payload. Update the
chainsaw assertion for the selector-migration job to verify the rendered command
in the job container uses the full managerSelectorLabels set, not just
control-plane=controller-manager. Focus on the selector-migration-job template
and its migration container command so the rendered output includes
control-plane, app.kubernetes.io/name, and app.kubernetes.io/instance.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Enterprise
Run ID: 9c696aed-a479-4678-b2b5-af4df15a0fd5
📒 Files selected for processing (13)
ROADMAP.mdchart/RELEASE_NOTES.mdchart/templates/_helpers.tplchart/templates/cleanup-skyhooks-job.yamlchart/templates/cleanup-webhook-job.yamlchart/templates/deployment.yamlchart/templates/selector-migration-job.yamlchart/templates/selector-migration-role.yamlchart/templates/selector-migration-rolebinding.yamlchart/templates/selector-migration-serviceaccount.yamlchart/values.yamldocs/release-process.mdk8s-tests/chainsaw/helm/helm-template-test/chainsaw-test.yaml
ea75b77 to
1836257
Compare
|
Addressed both findings from the latest review (force-pushed):
Re-validated: |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@chart/templates/deployment.yaml`:
- Around line 18-24: The Deployment template is still duplicating selector
labels in the pod template instead of reusing the shared selector helper. Update
the pod template labels in chart/templates/deployment.yaml to use
chart.managerSelectorLabels the same way spec.selector already does, so the
selector and template stay in sync if the helper changes. Use the existing
Deployment.spec.selector and pod template labels section as the places to align.
In `@chart/templates/selector-migration-role.yaml`:
- Around line 13-21: The hook Role currently grants namespace-wide Deployment
access in the selector migration template. Update the rules in the
selector-migration Role manifest to scope the controller-manager access by using
resourceNames for the target Deployment in the get and delete permissions, and
remove the list verb entirely. Keep the change localized to the Role rules so
the hook token only has access to the controller-manager Deployment.
In `@chart/values.yaml`:
- Around line 159-165: The maintenance kubectl image pinned in the values for
the removal jobs is outside the supported Kubernetes skew for the chart’s
documented minimum version. Update the removalImage/removalTag/removalDigest
entry so the kubectl version matches the supported cluster range, or raise the
chart’s minimum Kubernetes version in the same release; keep the fix aligned
with the maintenance job settings referenced by removalImage and removalTag.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Enterprise
Run ID: 89bbaea0-85a4-4ced-9422-046f7a366f08
📒 Files selected for processing (13)
ROADMAP.mdchart/RELEASE_NOTES.mdchart/templates/_helpers.tplchart/templates/cleanup-skyhooks-job.yamlchart/templates/cleanup-webhook-job.yamlchart/templates/deployment.yamlchart/templates/selector-migration-job.yamlchart/templates/selector-migration-role.yamlchart/templates/selector-migration-rolebinding.yamlchart/templates/selector-migration-serviceaccount.yamlchart/values.yamldocs/release-process.mdk8s-tests/chainsaw/helm/helm-template-test/chainsaw-test.yaml
1836257 to
ffd9375
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/release-process.md`:
- Line 286: The release-process snippet uses the wrong Docker command name, so
update the prerequisite and the example to use docker buildx instead of
docker-buildx. Make the change in the kubectl image inspection command so the
documented command matches the standard Docker CLI and is consistent with the
rest of the release-process instructions.
In `@k8s-tests/chainsaw/helm/helm-template-test/chainsaw-test.yaml`:
- Around line 228-247: The inline least-privilege comment is outdated because
the rendered Role in the helm template test only grants get and delete, not
list. Update the comment near the helm template assertion in chainsaw-test.yaml
to match the actual permissions enforced by the selector-migration Role, using
the same nodewright-selector-migration and selector-migration-role.yaml context
to keep it accurate.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Enterprise
Run ID: cfbacf62-9df7-4ea2-8cd7-35d35860da3a
📒 Files selected for processing (13)
ROADMAP.mdchart/RELEASE_NOTES.mdchart/templates/_helpers.tplchart/templates/cleanup-skyhooks-job.yamlchart/templates/cleanup-webhook-job.yamlchart/templates/deployment.yamlchart/templates/selector-migration-job.yamlchart/templates/selector-migration-role.yamlchart/templates/selector-migration-rolebinding.yamlchart/templates/selector-migration-serviceaccount.yamlchart/values.yamldocs/release-process.mdk8s-tests/chainsaw/helm/helm-template-test/chainsaw-test.yaml
ffd9375 to
bb5cb29
Compare
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/release-process.md`:
- Line 299: The kubectl image reference in the release-process guidance is
pinned outside the supported client/server skew for the documented minimum
Kubernetes version. Update the kubectl tag used for the short-lived maintenance
jobs to a version within one minor of the oldest supported cluster, or revise
the documented minimum Kubernetes version so it matches the pinned kubectl
version; keep the change aligned with the existing kubectl guidance in the
release-process document.
In `@k8s-tests/chainsaw/helm/helm-template-test/chainsaw-test.yaml`:
- Around line 145-196: The selector-migration job template test is missing
coverage for the writable kubectl cache setup added to support a read-only root
filesystem. Update the assertions in the selector-migration-job render test to
verify the Job spec from selector-migration-job.yaml includes the
KUBECACHEDIR=/tmp/kubecache environment setting and the writable emptyDir volume
mount for the kubectl cache, alongside the existing image and selector checks.
Focus on the rendered Job manifest and its container spec so regressions in the
cache contract are caught.
- Around line 119-196: The current helm template tests only cover default chart
values, so they miss the rename/override upgrade path that breaks immutable
selectors. Update the chainsaw cases in the deployment and selector-migration
job tests to render with a nameOverride/fullnameOverride scenario that preserves
the controller-manager identity, and assert the Deployment selector and
selector-migration hook still match there. Use the existing test steps around
deployment-selector-pinned-labels and selector-migration-job-renders to add the
override-based render and corresponding expectations.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Enterprise
Run ID: 4e8ab392-94b4-416b-929b-40e25d85c329
📒 Files selected for processing (13)
ROADMAP.mdchart/RELEASE_NOTES.mdchart/templates/_helpers.tplchart/templates/cleanup-skyhooks-job.yamlchart/templates/cleanup-webhook-job.yamlchart/templates/deployment.yamlchart/templates/selector-migration-job.yamlchart/templates/selector-migration-role.yamlchart/templates/selector-migration-rolebinding.yamlchart/templates/selector-migration-serviceaccount.yamlchart/values.yamldocs/release-process.mdk8s-tests/chainsaw/helm/helm-template-test/chainsaw-test.yaml
bb5cb29 to
9bdcec0
Compare
|
Note on the |
…ht upgrade helm upgrade fails with "spec.selector: field is immutable" when a release first installed from the pre-rename skyhook-operator chart preserves the controller- manager Deployment name across the rename (via fullnameOverride/nameOverride, as AICR does). The selector's app.kubernetes.io/name is sourced from the chart name, which changed skyhook-operator -> nodewright, and spec.selector is immutable. - source the controller-manager selector from a shared chart.managerSelectorLabels helper used by both the Deployment and the new migration hook - add a smart pre-upgrade hook (Job + least-privilege ServiceAccount/Role/ RoleBinding, gated by selectorMigration.enabled, default on) that deletes the Deployment only when its live selector does not match the desired one, so helm recreates it; verified no-op (no downtime) on normal upgrades - migrate the maintenance-job image off the now-paywalled bitnami/kubectl to alpine/kubectl:1.36.2 pinned by multi-arch digest (closes #207) - add helm-template chainsaw asserts for the selector pin and each hook resource - docs: add chart RELEASE_NOTES, update release-process image example and ROADMAP Fixes #285 Closes #207 Signed-off-by: Brian Lockwood <lockwobr@gmail.com>
9bdcec0 to
2dc48bd
Compare
…ht upgrade (#286) (#289) helm upgrade fails with "spec.selector: field is immutable" when a release first installed from the pre-rename skyhook-operator chart preserves the controller- manager Deployment name across the rename (via fullnameOverride/nameOverride, as AICR does). The selector's app.kubernetes.io/name is sourced from the chart name, which changed skyhook-operator -> nodewright, and spec.selector is immutable. - source the controller-manager selector from a shared chart.managerSelectorLabels helper used by both the Deployment and the new migration hook - add a smart pre-upgrade hook (Job + least-privilege ServiceAccount/Role/ RoleBinding, gated by selectorMigration.enabled, default on) that deletes the Deployment only when its live selector does not match the desired one, so helm recreates it; verified no-op (no downtime) on normal upgrades - migrate the maintenance-job image off the now-paywalled bitnami/kubectl to alpine/kubectl:1.36.2 pinned by multi-arch digest (closes #207) - add helm-template chainsaw asserts for the selector pin and each hook resource - docs: add chart RELEASE_NOTES, update release-process image example and ROADMAP Fixes #285 Closes #207 Signed-off-by: Brian Lockwood <lockwobr@gmail.com>
What
Fixes the
helm upgradefailure where Kubernetes rejects an immutablespec.selectorchange introduced by the skyhook-operator -> nodewright chartrename, and migrates the maintenance-job image off the now-paywalled
bitnami/kubectl.Fixes #285. Closes #207. (Reported downstream in NVIDIA/aicr#1382.)
Why it fails (and when)
chart/templates/deployment.yamlsourced the controller-manager selector'sapp.kubernetes.io/namefrom the chart name, which changed at v0.16.0(
skyhook-operator->nodewright).Deployment.spec.selectoris immutable.The failure only triggers when the controller-manager Deployment name is
preserved across the rename, forcing Helm to patch in place. That happens when
a release pins the name via
fullnameOverride/nameOverride(e.g. AICR setsfullnameOverride: skyhook-operator). Without a pinned name, Helm renders adifferently named Deployment and simply create+deletes it, so the upgrade
succeeds (silently replacing the operator) rather than erroring.
Changes
chart.managerSelectorLabelsin_helpers.tpl,used by both the Deployment selector and the migration hook so they reason
about the identical label set. This does not change the selector's rendered
value; it keeps the chart's existing three-label selector.
selector-migration-{job,serviceaccount,role, rolebinding}.yaml(gated byselectorMigration.enabled, defaulttrue). TheJob reads the live Deployment selector and deletes it only when it does not
match the desired selector (per-label
jsonpathcompare), letting Helmrecreate it. It is a no-op (no downtime) on normal upgrades. It runs under its
own least-privilege ServiceAccount/Role (get/list/delete on deployments),
because during
pre-upgradethe release's manager RBAC is not applied yet.bitnami/kubectlcleanup-job image to a maintained, versioned alternative #207).webhook.removalImage/Tag/Digestand the threemaintenance Jobs move from
bitnami/kubectltoalpine/kubectl:1.36.2pinned by multi-arch digest.helm-template-testgains structural chainsaw asserts for theselector pin and each hook resource (single-doc templates so
assertcan matchby kind), plus a disabled-gate check.
chart/RELEASE_NOTES.md(upgrade notes), updateddocs/release-process.mdimage example, andROADMAP.md.New pattern (flagged per CONTRIBUTING)
This introduces the chart's first
pre-upgradehook with its own dedicatedhook RBAC (the existing hooks are
pre-deleteand reuse the controller-managerSA). A dedicated SA is required because the new manager RBAC is not yet applied
during
pre-upgrade, and the manager SA lacks deployment-delete permission. Thehook is modeled on the existing
cleanup-*-job.yamlhooks otherwise.chart/ vs operator/config
This pins the chart's existing three-label selector; it does not re-align to
operator/config/manager/manager.yaml, which intentionally stays narrower(
control-planeonly). No kustomize mirror is needed (this is a Helm-onlyinstall-time concern). Other selector-bearing resources were audited: the PDB
keys on
chart.fullname(moves with its own name, not affected); Service /NetworkPolicy selectors are mutable; CRD
matchLabelsare schema fields. TheDeployment selector was the only immutable-field instance.
Testing
skyhook-public/skyhook-operatorv0.15.1 ->this chart, with
fullnameOverride: skyhook-operator):spec.selector ... field is immutablenodewright, Deploymentrecreated
helm lint: clean