Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow ContainerRegistryProvider to skip Match ContainerRegistryUrls when query the token #793

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

cy-google
Copy link

We will have the credential provider unconditionally attach the the access token to the registry in the image request, and then rely on the Kubelet config to restrict which registries the access token will be sent to

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Dec 20, 2024
@k8s-ci-robot
Copy link
Contributor

This issue is currently awaiting triage.

If the repository mantainers determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot
Copy link
Contributor

Hi @cy-google. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Dec 20, 2024
@liggitt
Copy link
Member

liggitt commented Dec 20, 2024

one nit on the variable name, lgtm otherwise

@liggitt
Copy link
Member

liggitt commented Dec 20, 2024

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Dec 20, 2024
@liggitt
Copy link
Member

liggitt commented Dec 20, 2024

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 20, 2024
@cy-google
Copy link
Author

/retest

@cy-google
Copy link
Author

/assign sergeykanzhelev

pkg/gcpcredential/gcpcredential.go Show resolved Hide resolved
if g.UseRegistryFromImage {
if registry, _, found := strings.Cut(image, "/"); found {
cfg[registry] = entry
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if url cutting by / failed - we will have no indication of an issue anywhere. Perhaps some warning log can be added or even better - error response returned.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. No error response can be returned but just klog.Errorf

}
return cfg
}

// Add our entry for each of the supported container registry URLs
for _, k := range containerRegistryUrls {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't remember the logic too well. But here is the question. Old logic added all wildcarded urls. My thinking here - if image url will redirect from one gcr.io blob to another, we will still use the auth token. With the new approach, no redirects will have the auth token.

Is it a real issue that may happen? Who would be the best person to review it from security perspective?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if image url will redirect from one gcr.io blob to another, we will still use the auth token. With the new approach, no redirects will have the auth token

I think the change for the new logic is that we don't have allowed endpoints for registry anymore(line 55), instead, we will get the token no matter what the registry is in the image url. I don't fully sure what the re-direct is in this case.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the redirect happens within the container runtime pull attempt, it doesn't have a chance to go get another credential for another registry... I don't actually know what happens if the container runtime is asked to pull an image from registry foo using a credential config like {"foo":{...}} and the registry redirects to registry bar ... does the container runtime look for an entry in the credential matching bar or reuse the entry for foo?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@liggitt you are right, it will be on container runtime side. Auth will only be reused for the same domain

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a way to return a wildcard to indicate whatever registry the request gets redirected to can use the credential? Is that a reasonable thing to do?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does the container runtime look for an entry in the credential matching bar or reuse the entry for foo

My understanding is that only the registry in the key of the credential config can be pulled(otherwise, it will try to pull the image with no credential from bar)

is there a way to return a wildcard to indicate whatever registry the request gets redirected to can use the credential? Is that a reasonable thing to do?

I don't know if there is a way to do so without changing the logic in kubelet. Not sure who can tell if it's reasonable. @SergeyKanzhelev may know.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This thread got marked as outdated, continued it at #793 (comment)

@k8s-ci-robot
Copy link
Contributor

New changes are detected. LGTM label has been removed.

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 20, 2024
if registry, _, found := strings.Cut(image, "/"); found {
cfg[registry] = entry
} else {
klog.Errorf("Invalid image registry URL: %s. The URL must contain a '/' character to separate the registry domain from the image path. Please check the URL and ensure it's correctly formatted.", image)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this goes to stderr, right? that only gets surfaced / logged in the kubelet if the plugin invocation errors

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, I did this because other errors are also be printed in stderr by using klog.Errorf. I wonder if there is a reason for us to change the Provide function to return this specific error instead?

}
} else {
// Add our entry for each of the supported container registry URLs
for _, k := range containerRegistryUrls {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so the two potential downsides of this change comparing to the existing logic:

  1. caching and reading of credentials will be done for each sub-domain separately instead of reading once and using it from the cache by the wildecard. How expensive is the read of credentials? Was this slow down considered and validated?

  2. each subdomain will keep it's own entry now in the kubelet. I don't think it may lead to memory leak. At least I am not aware of any "dynamic" subdomains that may grow uncontrollably. So this risk is minimal.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the cache level for this was already per image. There's no slow-down or leak increase if that is the case.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As Jordan said, I think every time we pull an image by kueblet, we will use the domain of the registry, e.g. {"kind":"CredentialProviderResponse","apiVersion":"credentialprovider.kubelet.k8s.io/v1","cacheKeyType":"Image","cacheDuration":"0s","auth":{"gcr.io":{"username":"_token","password":"xxx"}}}, so AFAIK, we don't haveusing it from the cache by the wildcard for now, instead, we query the credential config everytime.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think kubelet has this function: https://github.com/kubernetes/kubernetes/blob/4114a9b4e45a4df96f0383d87b2649640a6ffbf1/pkg/credentialprovider/keyring.go#L205C6-L205C15 and old logic was returning the cache entry that included globs. So token were reused.

I am not implying it is a huge overhead to call this thing multiple times. I just want to make sure we are making this change very intentionally and understand the cost of it

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is at least my recollection on how it suppose to work. I didn't find a good test for it from the fast glance over

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think kubelet has this function: https://github.com/kubernetes/kubernetes/blob/4114a9b4e45a4df96f0383d87b2649640a6ffbf1/pkg/credentialprovider/keyring.go#L205C6-L205C15 and old logic was returning the cache entry that included globs. So token were reused.

That is used to decide if a given configured plugin should even be called to provide a credential for an image here:

// isImageAllowed returns true if the image matches against the list of allowed matches by the plugin.
func (p *pluginProvider) isImageAllowed(image string) bool {
	for _, matchImage := range p.matchImages {
		if matched, _ := credentialprovider.URLsMatchStr(matchImage, image); matched {
			return true
		}
	}

	return false
}

It is not used to decide the cache key.

For credentials returned from the plugin with a cache key level of "image", which this defaults to:

... the credentials are only cached under the exact image:

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As long as this plugin is being used with image-level credential caching (which is the default) or registry-level caching, there is no difference in the number of times the plugin would be invoked to get credentials by the kubelet for images from a given registry

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, sorry for the noise

// Currently, this is only used by auth-provider-gcp.
if g.UseRegistryFromImage {
if registry, _, found := strings.Cut(image, "/"); found {
cfg[registry] = entry
Copy link
Member

@liggitt liggitt Jan 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

continuing the thread from #793 (comment)

By using only the exact image from the registry (instead of partially wildcarded registries), if the registry forwards the image pull request to a different domain or a subdomain, the container runtime will not use the credential when re-attempting the pull against the redirected location

Previously, this returned credentials usable against "container.cloud.google.com", "gcr.io", "*.gcr.io", "*.pkg.dev", so any container registry forwards among those domains could use the credentials.

I don't really know what forwards to expect, so I have a hard time knowing whether this is an issue or not.

Options I see:

  1. Expect no forwards, return exactly the registry requested. If forwards to a different domain happen, no credentials will be used and the pull will fail if authentication is required.
  2. Add the registry for the requested image alongside the existing wildcard domains. This ensures the credential returned is usable for the exact domain of the image, plus all existing possible forwarded domains, but not any other forwarded domains outside the hard-coded set.
  3. Return the credential with a global wildcard (e.g. "*") that allows using it against any forwarded domain. I'm not 100% sure "*" works that way in the container registry or not for credential matching, and I'm not 100% sure we want credentials to be usable against any forwarded domain

1 is most strict / secure, but risks breaking existing forwarding.

2 ensures we won't regress anything that is currently working, but may not work for forwarding to new domains not in the hard-coded set.

3 (if "*" works as a global wildcard) is most permissive, most likely to work with any forwarding, but could allow credentials to be sent to unintended registries if the original registry forwarded out to them.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc: @samuelkarp

Return the credential with a global wildcard (e.g. "") that allows using it against any forwarded domain. I'm not 100% sure "" works that way in the container registry or not for credential matching, and I'm not 100% sure we want credentials to be usable against any forwarded domain

this is not ideal and may lead to information disclosure. Similar to this CVE: GHSA-742w-89gc-8m9c. Yes, it is less critical since the redirect will be done by the reasonably trusted domains. But still concerning.

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: cy-google
Once this PR has been reviewed and has the lgtm label, please ask for approval from sergeykanzhelev. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@liggitt
Copy link
Member

liggitt commented Jan 8, 2025

change looks good to me, please squash to a single commit and we can merge

…f when query the token

Update the new field name from SkipContainerRegistryUrlsMatching to UseRegistryFromImage

Update the logic in Provide function to handle wrong format image and use else for old logic

Adopt option 2 mentioned in https://github.com/kubernetes/cloud-provider-gcp/pull/793/files#r1904560074

This will only append the registry of the requested image if it's not hardcoded in containerRegistryUrls

Stop logging error when we can't cut image by "/"

Also not removing the check for container.cloud.google.com
@cy-google
Copy link
Author

change looks good to me, please squash to a single commit and we can merge

Done. Thanks @SergeyKanzhelev can you please take a look too please?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants