
[FG:InPlacePodVerticalScaling] Incomplete prerequisites for “Resize CPU and Memory Resources assigned to Containers” #41365

Open
THMAIL opened this issue May 29, 2023 · 19 comments · May be fixed by #49335
Assignees
Labels
kind/bug Categorizes issue or PR as related to a bug. language/en Issues or PRs related to English language lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. priority/backlog Higher priority than priority/awaiting-more-evidence. sig/docs Categorizes an issue or PR as relevant to SIG Docs. sig/node Categorizes an issue or PR as relevant to SIG Node. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

THMAIL commented May 29, 2023

My k8s version: 1.27.2

kubectl get nodes
NAME STATUS ROLES AGE VERSION
172.30.94.14 Ready 7d v1.27.2
172.30.94.201 Ready 7d v1.27.2
ecs6w3fxmxy5c.novalocal Ready control-plane 7d v1.27.2

Problem

I want to try an in-place update, and I did as the document describes. But when I execute the command kubectl -n qos-example patch pod qos-demo-5 --patch '{"spec":{"containers":[{"name":"qos-demo-ctr-5", "resources":{"requests":{"cpu":"800m"}, "limits":{"cpu":"800m"}}}]}}', it throws this error:

The Pod "qos-demo-5" is invalid: spec: Forbidden: pod updates may not change fields other than `spec.containers[*].image`,`spec.initContainers[*].image`,`spec.activeDeadlineSeconds`,`spec.tolerations` (only additions to existing tolerations),`spec.terminationGracePeriodSeconds` (allow it to be set to 1 if it was previously negative)
  core.PodSpec{
        Volumes:        {{Name: "kube-api-access-p29n4", VolumeSource: {Projected: &{Sources: {{ServiceAccountToken: &{ExpirationSeconds: 3607, Path: "token"}}, {ConfigMap: &{LocalObjectReference: {Name: "kube-root-ca.crt"}, Items: {{Key: "ca.crt", Path: "ca.crt"}}}}, {DownwardAPI: &{Items: {{Path: "namespace", FieldRef: &{APIVersion: "v1", FieldPath: "metadata.namespace"}}}}}}, DefaultMode: &420}}}},
        InitContainers: nil,
        Containers: []core.Container{
                {
                        ... // 6 identical fields
                        EnvFrom: nil,
                        Env:     nil,
                        Resources: core.ResourceRequirements{
                                Limits: core.ResourceList{
-                                       s"cpu":    {i: resource.int64Amount{value: 700, scale: -3}, s: "700m", Format: "DecimalSI"},
+                                       s"cpu":    {i: resource.int64Amount{value: 800, scale: -3}, s: "800m", Format: "DecimalSI"},
                                        s"memory": {i: {...}, Format: "BinarySI"},
                                },
                                Requests: core.ResourceList{
-                                       s"cpu":    {i: resource.int64Amount{value: 700, scale: -3}, s: "700m", Format: "DecimalSI"},
+                                       s"cpu":    {i: resource.int64Amount{value: 800, scale: -3}, s: "800m", Format: "DecimalSI"},
                                        s"memory": {i: {...}, Format: "BinarySI"},
                                },
                                Claims: nil,
                        },
                        ResizePolicy: nil,
                        VolumeMounts: {{Name: "kube-api-access-p29n4", ReadOnly: true, MountPath: "/var/run/secrets/kubernetes.io/serviceaccount"}},
                        ... // 12 identical fields
                },
        },
        EphemeralContainers: nil,
        RestartPolicy:       "Always",
        ... // 28 identical fields
  }

@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label May 29, 2023
@k8s-ci-robot k8s-ci-robot added language/en Issues or PRs related to English language sig/docs Categorizes an issue or PR as relevant to SIG Docs. labels May 29, 2023

niranjandarshann (Contributor) commented May 29, 2023

/kind support

@k8s-ci-robot k8s-ci-robot added the kind/support Categorizes issue or PR as a support question. label May 29, 2023
sftim (Contributor) commented May 29, 2023

/retitle Incomplete prerequisites for “Resize CPU and Memory Resources assigned to Containers”

/remove-kind support
/kind bug

The prerequisites section of https://kubernetes.io/docs/tasks/configure-pod-container/resize-container-resources/ should state that your cluster must have InPlacePodVerticalScaling enabled on the control plane and on nodes; however, it does not.
/triage accepted
/priority backlog
/sig node

Thank you for reporting this @THMAIL
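
In practical terms, on a kubeadm cluster the missing prerequisite boils down to setting the gate on the control-plane components and on each kubelet. A minimal sketch, assuming kubeadm's default paths (/etc/kubernetes/manifests for static pods, /var/lib/kubelet/config.yaml for the kubelet); the cmdline value below is a made-up example used only to illustrate merging the flag:

```shell
# Control plane: add to /etc/kubernetes/manifests/kube-apiserver.yaml
# (and the controller-manager/scheduler manifests as needed):
#     - --feature-gates=InPlacePodVerticalScaling=true
# Nodes: in /var/lib/kubelet/config.yaml set
#     featureGates:
#       InPlacePodVerticalScaling: true
# then restart the kubelet (systemctl restart kubelet).

# Illustration of merging the gate into an existing component command line:
cmdline='kube-apiserver --allow-privileged=true'
gate='InPlacePodVerticalScaling=true'
if printf '%s' "$cmdline" | grep -q -- '--feature-gates='; then
  # Flag already present: prepend this gate to the existing list.
  cmdline=$(printf '%s' "$cmdline" | sed "s/--feature-gates=/--feature-gates=${gate},/")
else
  # Flag absent: append it.
  cmdline="${cmdline} --feature-gates=${gate}"
fi
echo "$cmdline"
```

Note that static pods restart automatically when their manifest changes, so editing the file under /etc/kubernetes/manifests is enough for the API server.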

@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label May 29, 2023
@k8s-ci-robot k8s-ci-robot changed the title Resize CPU and Memory Resources assigned to Containers Incomplete prerequisites for “Resize CPU and Memory Resources assigned to Containers” May 29, 2023
@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. priority/backlog Higher priority than priority/awaiting-more-evidence. sig/node Categorizes an issue or PR as relevant to SIG Node. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. kind/support Categorizes issue or PR as a support question. labels May 29, 2023
THMAIL (Author) commented May 29, 2023

/retitle Incomplete prerequisites for “Resize CPU and Memory Resources assigned to Containers”

/remove-kind support /kind bug

The prerequisites section of https://kubernetes.io/docs/tasks/configure-pod-container/resize-container-resources/ should state that your cluster must have InPlacePodVerticalScaling enabled on the control plane and on nodes; however, it does not. /triage accepted /priority backlog /sig node

Thank you for reporting this @THMAIL

Thank you for your reply. I have modified the file /etc/kubernetes/manifests/kube-apiserver.yaml, adding InPlacePodVerticalScaling=true to the feature gates.

But there's another problem:

  1. I execute the command kubectl -n qos-example patch pod qos-demo-5 --patch '{"spec":{"containers":[{"name":"qos-demo-ctr-5", "resources":{"requests":{"cpu":"800m"}, "limits":{"cpu":"800m"}}}]}}'
  2. The pod can't start. Running kubectl get pod qos-demo-5 --namespace=qos-example -o wide shows:
NAME         READY   STATUS             RESTARTS        AGE   IP                NODE            NOMINATED NODE   READINESS GATES
qos-demo-5   0/1     CrashLoopBackOff   8 (5m11s ago)   17m   192.168.239.146   172.30.94.201   <none>           <none>
  3. Running kubectl describe pod qos-demo-5 --namespace=qos-example shows this event log:
Events:
  Type     Reason     Age                    From               Message
  ----     ------     ----                   ----               -------
  Normal   Scheduled  4m32s                  default-scheduler  Successfully assigned qos-example/qos-demo-5 to 172.30.94.201
  Normal   Pulled     4m28s                  kubelet            Successfully pulled image "nginx" in 2.299559292s (2.299579947s including waiting)
  Normal   Started    4m28s                  kubelet            Started container qos-demo-ctr-5
  Normal   Killing    3m13s                  kubelet            Container qos-demo-ctr-5 definition changed, will be restarted
  Normal   Pulled     3m10s                  kubelet            Successfully pulled image "nginx" in 2.311044787s (2.311062277s including waiting)
  Normal   Pulled     3m7s                   kubelet            Successfully pulled image "nginx" in 2.167481718s (2.167497407s including waiting)
  Normal   Pulled     2m50s                  kubelet            Successfully pulled image "nginx" in 2.217118706s (2.217147034s including waiting)
  Normal   Pulling    2m21s (x5 over 4m30s)  kubelet            Pulling image "nginx"
  Normal   Created    2m19s (x5 over 4m28s)  kubelet            Created container qos-demo-ctr-5
  Warning  Failed     2m19s (x4 over 3m10s)  kubelet            Error: failed to start container "qos-demo-ctr-5": Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error setting cgroup config for procHooks process: failed to write "80000": write /sys/fs/cgroup/cpu,cpuacct/kubepods.slice/kubepods-podd6170de3_c124_47d6_a641_6b10f5b690cb.slice/qos-demo-ctr-5/cpu.cfs_quota_us: invalid argument: unknown
  Normal   Pulled     2m19s                  kubelet            Successfully pulled image "nginx" in 2.09857171s (2.098667953s including waiting)
  Warning  BackOff    110s (x4 over 2m34s)   kubelet            Back-off restarting failed container qos-demo-ctr-5 in pod qos-demo-5_qos-example(d6170de3-c124-47d6-a641-6b10f5b690cb)
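
As an aside on the number in that last Warning: the "80000" the kubelet tried to write is the CFS quota derived from the new 800m CPU limit. A quick sketch of the arithmetic, assuming the kernel-default 100ms CFS period (an assumption; this cluster's period is not shown in the output above):

```shell
# CPU limit in millicores -> CFS quota in microseconds,
# assuming the default 100ms (100000us) CFS period:
# quota_us = millicores * period_us / 1000
limit_millicores=800
period_us=100000
quota_us=$(( limit_millicores * period_us / 1000 ))
echo "$quota_us"   # the value the kubelet tried to write to cpu.cfs_quota_us
```

So the value itself is well-formed; the "invalid argument" points at the write target rather than the number.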

My Docker version is the latest:

docker version
Client: Docker Engine - Community
 Version:           24.0.1
 API version:       1.43
 Go version:        go1.20.4
 Git commit:        6802122
 Built:             Fri May 19 18:06:42 2023
 OS/Arch:           linux/amd64
 Context:           default

Server: Docker Engine - Community
 Engine:
  Version:          24.0.1
  API version:      1.43 (minimum version 1.12)
  Go version:       go1.20.4
  Git commit:       463850e
  Built:            Fri May 19 18:05:43 2023
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.6.21
  GitCommit:        3dce8eb055cbb6872793272b4f20ed16117344f8
 runc:
  Version:          1.1.7
  GitCommit:        v1.1.7-0-g860f061
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

THMAIL (Author) commented May 29, 2023

Linux 172.30.94.201 3.10.0-862.el7.x86_64 #1 SMP Fri Apr 20 16:44:24 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

THMAIL (Author) commented May 29, 2023

The path /sys/fs/cgroup/cpu,cpuacct/kubepods.slice/kubepods-podd6170de3_c124_47d6_a641_6b10f5b690cb.slice/qos-demo-ctr-5/cpu.cfs_quota_us didn't exist!

Running docker ps -a | grep qos:

0e73e98eb193   nginx                       "/docker-entrypoint.…"   2 minutes ago    Up 2 minutes                          k8s_qos-demo-ctr-5_qos-demo-5_qos-example_a7eca3d5-bd01-4d8f-ab96-9196a79c1629_0
c122a7cd4d1b   registry.k8s.io/pause:3.6   "/pause"                 2 minutes ago    Up 2 minutes                          k8s_POD_qos-demo-5_qos-example_a7eca3d5-bd01-4d8f-ab96-9196a79c1629_0

Running ls /sys/fs/cgroup/cpu,cpuacct/kubepods.slice/kubepods-podd6170de3_c124_47d6_a641_6b10f5b690cb.slice/:

0e73e98eb193503a997d1e3c7073713f514c81a35b8a88fa49f5492c7860eb0d  cgroup.event_control  cpuacct.usage         cpu.cfs_quota_us   cpu.shares         tasks
c122a7cd4d1b1125fccebd6e6b24c886943213d772285f1b32a065f0a924b48d  cgroup.procs          cpuacct.usage_percpu  cpu.rt_period_us   cpu.stat
cgroup.clone_children                                             cpuacct.stat          cpu.cfs_period_us     cpu.rt_runtime_us  notify_on_release

So is the path wrong? Is this a problem with my boot parameters, or a bug?
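
One observation on the failing path (my reading, not a confirmed diagnosis): the directories listed above are keyed by container ID (0e73e98eb193…), while the path in the error ends in the container name qos-demo-ctr-5, which would explain why it does not exist. The pod-UID part of the path, at least, follows the convention visible in the ls output, swapping hyphens for underscores:

```shell
# Pod UID -> systemd slice directory name, per the convention visible in the
# ls output above (hyphens in the UID become underscores in the slice name).
pod_uid='d6170de3-c124-47d6-a641-6b10f5b690cb'
slice="kubepods-pod$(printf '%s' "$pod_uid" | tr '-' '_').slice"
echo "$slice"   # kubepods-podd6170de3_c124_47d6_a641_6b10f5b690cb.slice
```

The final path element would then be the 64-character container ID, not the container name.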

sftim (Contributor) commented May 29, 2023

If you do want help with Kubernetes @THMAIL, please ask elsewhere. This issue tracker is the right place to tell us about shortcomings in the docs, and the wrong place to get advice on using features (alpha or otherwise).

If / when you can point out a new problem, you are welcome to file an issue so that we can cover that. SIG Node can then look at improving the docs for the beta.

dshebib (Contributor) commented Jun 22, 2023

/assign

@sftim Quick question about feature gates: is there a way to specify the feature gate that must be enabled for alpha/beta features within the feature-state tag, so that we don't have to manually edit the docs every time a feature graduates or the Kubernetes version updates?

criscola commented:

Can someone please write down precisely how to enable this feature? I tried passing the flag --feature-gates=InPlacePodVerticalScaling=true to kube-scheduler, but Kubernetes still forbids patches to pod resources.

sftim (Contributor) commented Jun 28, 2023

Hi @criscola

This issue is still waiting for a volunteer / contributor to pick it up and work on a fix.

wenzhaojie commented:

Can someone please write down precisely how to enable this feature? I tried passing the flag --feature-gates=InPlacePodVerticalScaling=true to kube-scheduler, but Kubernetes still forbids patches to pod resources.

This is my config; can anyone help me check it?

cat > config.yaml << EOF
apiVersion: kubeadm.k8s.io/v1beta3
bootstrapTokens:
- groups:
  - system:bootstrappers:kubeadm:default-node-token
  token: abcdef.0123456789abcdef
  ttl: 24h0m0s
  usages:
  - signing
  - authentication
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 192.168.122.41
  bindPort: 6443
nodeRegistration:
  criSocket: unix:///var/run/containerd/containerd.sock
  imagePullPolicy: IfNotPresent
  name: kubernetes-01
  taints: null
  kubeletExtraArgs:
    feature-gates: InPlacePodVerticalScaling=true
---
apiServer:
  timeoutForControlPlane: 4m0s
  extraArgs:
    feature-gates: InPlacePodVerticalScaling=true
apiVersion: kubeadm.k8s.io/v1beta3
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controllerManager:
  extraArgs:
    feature-gates: InPlacePodVerticalScaling=true
dns: {}
etcd:
  local:
    dataDir: /var/lib/etcd
imageRepository: registry.aliyuncs.com/google_containers
kind: ClusterConfiguration
kubernetesVersion: 1.27.2
networking:
  dnsDomain: cluster.local
  serviceSubnet: 10.96.0.0/12
  podSubnet: "192.168.0.0/16"
scheduler:
  extraArgs:
    feature-gates: InPlacePodVerticalScaling=true
EOF

criscola commented Jul 3, 2023

I confirm @wenzhaojie's config is correct. To summarize, the feature needs the corresponding feature gate InPlacePodVerticalScaling=true passed to the following components:

  • API server
  • Controller manager
  • Scheduler
  • Kubelet on worker
  • Kubelet on control plane

That should do the trick. It would be great to spend a paragraph somewhere mentioning this; maybe we can edit this blog post with a short note? https://kubernetes.io/blog/2023/05/12/in-place-pod-resize-alpha/
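
One way to double-check that a component actually picked the gate up is to grep its /metrics endpoint for the kubernetes_feature_enabled metric (an assumption on my part that this metric is exposed in the release in use; recent releases report feature gates this way). Against a live cluster that would be kubectl get --raw /metrics | grep InPlacePodVerticalScaling; parsing a sample of the line you would look for:

```shell
# Sample /metrics line for the gate; a trailing 1 means it is enabled.
# (The exact label set is an assumption; only the final value matters here.)
sample='kubernetes_feature_enabled{name="InPlacePodVerticalScaling",stage="ALPHA"} 1'
enabled=$(printf '%s' "$sample" | awk '{print $NF}')
echo "$enabled"   # 1
```

The kubelet exposes the same metric on its own metrics endpoint, so the check can be repeated per node.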

tengqm (Contributor) commented Jul 16, 2023

Regarding this issue, it might be obvious that in-place vertical scaling has to involve the kubelet. There are some technical implementation details as well: the scheduler has to reconsider the resource requests and limits, the ResourceQuota controller has to adjust its behavior, and so on.

This leads me to rethink a related topic. Maybe we were right when we avoided documenting the feature gates on a per-component basis. Today the feature gate list is "shared" by all components, and the implementation of some features like this one (in-place scaling) may involve several components. Feature FOO might concern only the API server and the scheduler today, but the developers may soon realize that the controller-manager has to do something as well to cover a corner case.

k8s-triage-robot commented:

This issue has not been updated in over 1 year, and should be re-triaged.

You can:

  • Confirm that this issue is still relevant with /triage accepted (org members only)
  • Close this issue with /close

For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/

/remove-triage accepted

@k8s-ci-robot k8s-ci-robot added needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. and removed triage/accepted Indicates an issue or PR is ready to be actively worked on. labels Jul 15, 2024
@kannon92 kannon92 moved this to Triaged in SIG Node Bugs Jul 22, 2024
k8s-triage-robot commented:

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 13, 2024
@haircommander haircommander moved this from Triaged to Triage in SIG Node Bugs Oct 23, 2024
haircommander (Contributor) commented:

cc @tallclair @AnishShah @esotsal

From a quick glance this looks like a documentation limitation, though the code changes for beta may also affect this situation.

haircommander (Contributor) commented:

/triage accepted

@k8s-ci-robot k8s-ci-robot added the triage/accepted Indicates an issue or PR is ready to be actively worked on. label Oct 23, 2024
@haircommander haircommander moved this from Triage to Triaged in SIG Node Bugs Oct 23, 2024
@k8s-ci-robot k8s-ci-robot removed the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Oct 23, 2024
esotsal commented Oct 24, 2024

/retitle [FG:InPlacePodVerticalScaling] Incomplete prerequisites for “Resize CPU and Memory Resources assigned to Containers”

@k8s-ci-robot k8s-ci-robot changed the title Incomplete prerequisites for “Resize CPU and Memory Resources assigned to Containers” [FG:InPlacePodVerticalScaling] Incomplete prerequisites for “Resize CPU and Memory Resources assigned to Containers” Oct 24, 2024
esotsal commented Oct 24, 2024

Hi,

I see that Docker Engine was used. Is cri-dockerd used? If yes, this looks like the same situation described in the first item of the InPlacePodVerticalScaling known issues, and also discussed here

If cri-dockerd was used, then I recommend repeating the tests using a CRI-O or containerd container runtime version that satisfies the InPlacePodVerticalScaling CRI API requirements.

CRI APIs requirements for InPlacePodVerticalScaling can be found at

I added the [FG:InPlacePodVerticalScaling] prefix to the title, to use this input to improve the documentation, especially with the forthcoming graduation of InPlacePodVerticalScaling to beta. Feel free to reach the InPlacePodVerticalScaling community at the sig-node-inplace-pod-resize Slack channel.
