Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Scope ValidatingAdmissionPolicy in DRA e2e tests more narrowly #129514

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

nojnhuh
Copy link
Contributor

@nojnhuh nojnhuh commented Jan 8, 2025

What type of PR is this?

/kind bug
/kind flake

What this PR does / why we need it:

This PR fixes a flake in the DRA e2e tests where a ValidatingAdmissionPolicy used for one test is interfering with the creation of a ResourceClaimTemplate in an unrelated test when both happen to run in parallel at about the same time.

Which issue(s) this PR fixes:

Fixes #128493

Special notes for your reviewer:

Before this PR, I got the test to fail reliably with the following changes:

diff --git a/test/e2e/dra/dra.go b/test/e2e/dra/dra.go
index c90eac646d7..9862840fafe 100644
--- a/test/e2e/dra/dra.go
+++ b/test/e2e/dra/dra.go
@@ -917,6 +917,8 @@ var _ = framework.SIGDescribe("node")("DRA", feature.DynamicResourceAllocation,
 				}),
 			)))
 
+			time.Sleep(10 * time.Second)
+
 			// Attempt to create claim and claim template with admin access. Must fail eventually.
 			claim := b.externalClaim()
 			claim.Spec.Devices.Requests[0].AdminAccess = ptr.To(true)
@@ -1019,6 +1021,7 @@ var _ = framework.SIGDescribe("node")("DRA", feature.DynamicResourceAllocation,
 		})
 
 		f.It("DaemonSet with admin access", feature.DRAAdminAccess, framework.WithFeatureGate(features.DRAAdminAccess), framework.WithFeatureGate(features.DynamicResourceAllocation), func(ctx context.Context) {
+			time.Sleep(5 * time.Second)
 			pod, template := b.podInline()
 			template.Spec.Spec.Devices.Requests[0].AdminAccess = ptr.To(true)
 			// Limit the daemon set to the one node where we have the driver.

and running the e2e tests with the following command, where the only interesting part is -ginkgo.focus='admin access':

rm -rf _artifacts && make WHAT="github.com/onsi/ginkgo/v2/ginkgo k8s.io/kubernetes/test/e2e/e2e.test" && kind create cluster --retain --config <(cat test/e2e/dra/kind.yaml; echo "  DRAAdminAccess: true") --image dra/node:latest && KUBERNETES_PROVIDER=local KUBECONFIG=${HOME}/.kube/config GINKGO_PARALLEL_NODES=8 E2E_REPORT_DIR=$HOME/github/kubernetes/_artifacts GINKGO_TIMEOUT=1h hack/ginkgo-e2e.sh -ginkgo.focus='admin access'; kind export logs $HOME/github/kubernetes/_artifacts/kind; kind delete cluster

Does this PR introduce a user-facing change?

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/bug Categorizes issue or PR as related to a bug. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. kind/flake Categorizes issue or PR as related to a flaky test. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jan 8, 2025
@k8s-ci-robot
Copy link
Contributor

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the needs-priority Indicates a PR lacks a `priority/foo` label and requires one. label Jan 8, 2025
@k8s-ci-robot k8s-ci-robot requested review from klueska and pohly January 8, 2025 03:12
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: nojnhuh
Once this PR has been reviewed and has the lgtm label, please assign klueska for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added area/test sig/node Categorizes an issue or PR as relevant to SIG Node. sig/testing Categorizes an issue or PR as relevant to SIG Testing. wg/device-management Categorizes an issue or PR as relevant to WG Device Management. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jan 8, 2025
Comment on lines +25 to +28
objectSelector:
matchExpressions:
- key: vap-enforced
operator: Exists
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found that using a namespaceSelector here like @pohly tried in #128493 doesn't work because that selector operates only on labels, not other fields like the name. Open to suggestions if there's a better way to apply this policy less widely.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the idea is that this VAP only applies to claims and claim templates which opt into using it. I can see how this works for E2E testing.

But it's not something that admins should do for real, so this would need more comments along the lines of "don't do this when setting up admin access in your cluster".

Given that we probably will have a different enforcement mechanism for admin access (kubernetes/enhancements#5018), let's perhaps put this PR on hold for now?

validations:
- expression: '! object.spec.devices.requests.exists(e, has(e.adminAccess) && e.adminAccess)'
reason: Forbidden
messageExpression: '"admin access to devices not enabled"' # in namespace " + object.metadata.namespace' - need to use __namespace__, but somehow that also doesn't work.
messageExpression: '"admin access to devices not enabled in namespace " + namespaceObject.metadata.name'
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't related to the flake fix, but this seemed like a TODO that I happened to figure out along the way.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be worth a separate PR, but I'm also okay with a drive-by change here.

Have you figured out why object.metadata.namespace or (because of the reserved name?) object.metadata.__namespace__ didn't work?

I didn't know that namespaceObject exists. It's not described in https://kubernetes.io/docs/reference/using-api/cel/. Where did you learn about it? Should it be in the reference docs?

/cc @jpbetz

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found namespaceObject in the reference docs for this particular field, though the website renders the list unfortunately: https://kubernetes.io/docs/reference/kubernetes-api/policy-resources/validating-admission-policy-v1/#:~:text=%2D%20%27namespaceObject%27%20%2D%20The%20namespace%20object%20that%20the%20incoming%20object%20belongs%20to.%20The%20value%20is%20null%20for%20cluster%2Dscoped%20resources. It's easier to read the code comment itself:

// - 'namespaceObject' - The namespace object that the incoming object belongs to. The value is null for cluster-scoped resources.

Elsewhere in that comment it explains why object.metadata.namespace doesn't work I think:

// The `apiVersion`, `kind`, `metadata.name` and `metadata.generateName` are always accessible from the root of the
// object. No other metadata properties are accessible.

And the way I'm reading it, object.__namespace__ is a hypothetical example not related to an object's metadata.namespace. Still not exactly sure what a "property" means in this context though:

// - Expression accessing a property named "namespace": {"Expression": "object.__namespace__ > 0"}

I'll split this out into a separate PR since we might be holding out on the rest of these changes for a bit while we iterate on the admin access KEP.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I opened #129530 for these changes and removed them from this PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So it is part of the VAP API. I had not looked there before.

// The `apiVersion`, `kind`, `metadata.name` and `metadata.generateName` are always accessible from the root of the
// object. No other metadata properties are accessible.

looks like the right explanation: there is an explicit allow list for fields in ObjectMeta, and namespace is not included - for whatever reason.

@k8s-ci-robot k8s-ci-robot requested a review from jpbetz January 8, 2025 08:31
@AnishShah
Copy link
Contributor

/cc @bart0sh

@k8s-ci-robot k8s-ci-robot requested a review from bart0sh January 8, 2025 18:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/test cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. kind/flake Categorizes issue or PR as related to a flaky test. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. release-note-none Denotes a PR that doesn't merit a release note. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. wg/device-management Categorizes an issue or PR as relevant to WG Device Management.
Projects
Status: PRs - Needs Reviewer
Status: 🏗 In progress
4 participants