Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Kernel params set by the config daemon should be verified #486

Merged
merged 2 commits into from
Oct 13, 2023

Conversation

wizhaoredhat
Copy link
Contributor

@wizhaoredhat wizhaoredhat commented Aug 3, 2023

In the case of vfio-pci, the config daemon might not notice it has to reboot to add kernel parameters. This could happen if the vfio driver state was updated but the config daemon was interrupted (e.g policy was removed and readded quickly).

This could lead to a pending change to the kernel parameters via grubby or os-tree but without a reboot, these changes wouldn't be applied.

This change introduces a map to hold the desired kernel parameters that the config daemon wishes to apply. This map is then validated against "/proc/cmdline" to see if the desired kernel parameters are actually applied. If not then there will be a subsequent attempt to apply these parameters.

We need to make sure that the kernel parameters are set, thus we should pass the errors up to the config daemon instead of silently consuming the errors when setting kernel parameters.

@github-actions
Copy link

github-actions bot commented Aug 3, 2023

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

@wizhaoredhat
Copy link
Contributor Author

/cc @SchSeba
/cc @zeeke
/cc @e0ne
/cc @adrianchiris
/cc @zshi-redhat

@github-actions github-actions bot requested a review from zshi-redhat August 3, 2023 22:41
@wizhaoredhat
Copy link
Contributor Author

/cc @lmilleri

@coveralls
Copy link

coveralls commented Aug 3, 2023

Pull Request Test Coverage Report for Build 6502105920

  • 24 of 89 (26.97%) changed or added relevant lines in 2 files are covered.
  • 12 unchanged lines in 4 files lost coverage.
  • Overall coverage decreased (-0.2%) to 25.039%

Changes Missing Coverage Covered Lines Changed/Added Lines %
pkg/utils/utils.go 0 17 0.0%
pkg/plugins/generic/generic_plugin.go 24 72 33.33%
Files with Coverage Reduction New Missed Lines %
controllers/sriovnetwork_controller.go 2 71.32%
api/v1/helper.go 3 42.04%
pkg/plugins/generic/generic_plugin.go 3 42.62%
controllers/sriovibnetwork_controller.go 4 68.22%
Totals Coverage Status
Change from base Build 6501142942: -0.2%
Covered Lines: 2238
Relevant Lines: 8938

💛 - Coveralls

@wizhaoredhat
Copy link
Contributor Author

/hold

@github-actions github-actions bot added the hold label Aug 4, 2023
@wizhaoredhat
Copy link
Contributor Author

@bn222

@bn222
Copy link
Collaborator

bn222 commented Aug 7, 2023

I don't think this is fixing the issue. We aren't reconciling the state of the system to a desired state whenever the current state is different. What we want is to trigger setting the desired kargs at any point when they are set to something else then we expect.

Reference for this: OCPBUGS-16909

It's a step in the good direction though, by performing the necessary check when we are making a change. We just need to have this check be executed more frequently.

pkg/utils/utils.go Outdated Show resolved Hide resolved
@bn222
Copy link
Collaborator

bn222 commented Aug 8, 2023

We have one remaining situation to fix due to how enable-kargs.sh works. Let's say we've run enable-kargs.sh, and we returned non-zero, i.e. we made a change and we need to reboot. Now, before we actually get to the point of the reboot, the daemon is restarted. Now, assume again execution gets to the point of running enable-kargs.sh but this time, since rpm-ostree kargs contains the pending kargs, it will return 0 since from the perspective of rpm-ostree we are good and we won't reboot. We need to fix this on line 10 in enable-kargs.sh

pkg/utils/utils.go Outdated Show resolved Hide resolved
@wizhaoredhat
Copy link
Contributor Author

wizhaoredhat commented Aug 8, 2023

We have one remaining situation to fix due to how enable-kargs.sh works. Let's say we've run enable-kargs.sh, and we returned non-zero, i.e. we made a change and we need to reboot. Now, before we actually get to the point of the reboot, the daemon is restarted. Now, assume again execution gets to the point of running enable-kargs.sh but this time, since rpm-ostree kargs contains the pending kargs, it will return 0 since from the perspective of rpm-ostree we are good and we won't reboot. We need to fix this on line 10 in enable-kargs.sh

@bn222
I thought the logic works in your case. But maybe I misunderstood.

        if [[ $args != *${t}* ]];then  // This would be true in your case because cmdLine does not have the kernel parameter (we made a change and we didn't reboot). This will always be true because we didn't reboot.
            ...
            let ret++   // We increment this value and return it.
        fi

If ret > 0 then we will be marked for reboot.

@wizhaoredhat
Copy link
Contributor Author

I don't think this is fixing the issue. We aren't reconciling the state of the system to a desired state whenever the current state is different. What we want is to trigger setting the desired kargs at any point when they are set to something else then we expect.

Reference for this: OCPBUGS-16909

It's a step in the good direction though, by performing the necessary check when we are making a change. We just need to have this check be executed more frequently.

Yes, I agree this is an issue. I was thinking of calling it earlier in nodeStateSyncHandler.

@bn222
Copy link
Collaborator

bn222 commented Aug 9, 2023

We have one remaining situation to fix due to how enable-kargs.sh works. Let's say we've run enable-kargs.sh, and we returned non-zero, i.e. we made a change and we need to reboot. Now, before we actually get to the point of the reboot, the daemon is restarted. Now, assume again execution gets to the point of running enable-kargs.sh but this time, since rpm-ostree kargs contains the pending kargs, it will return 0 since from the perspective of rpm-ostree we are good and we won't reboot. We need to fix this on line 10 in enable-kargs.sh

@bn222 I thought the logic works in your case. But maybe I misunderstood.

        if [[ $args != *${t}* ]];then                                                                             // This would be true in your case because cmdLine does not have the kernel parameter (we made a change and we didn't reboot). This will always be true because we didn't reboot.
            ...
            let ret++                                                                                                        // We increment this value and return it.
        fi

If ret > 0 then we will be marked for reboot.

Yes, that happens in normal execution. What happens when we kill the pod immediately after we are marked for reboot? I think we will run through the whole logic from start again, but this time we will not mark for reboot, and we will just wait. iow, I believe there is an edge case due to non-idempotency.

@SchSeba
Copy link
Collaborator

SchSeba commented Aug 9, 2023

Hi folks,

let me add some ideas here.

we can improve this one with a simple fix I think.
instead of checking the rpm-ostree command return to see if we change it or not we just need to look on what is really in the system by doing cat /proc/cmdline

and I don't think we need to revalidate this every few seconds it's a waster of resource as this variable is loaded or unloaded after reboot you can't do it when the system is already running.

meaning if the user disable it but didn't reboot all good we still have it.
and after reboot the operator will see that the variable is not there and will apply it and reboot.

so again I think the only change needed is to check if we need to reboot by looking on the source of true that is /proc/cmdline and not the output of the rpm-ostree or grubby

@bn222
Copy link
Collaborator

bn222 commented Aug 9, 2023

that is correct and also what I'm proposing. No need to recheck that every time. A reboot implies a start of execution from the beginning, but we do need to check /proc/cmdline instead.

While at it, please move the .sh to .go (maybe as a separate patch). I don't see any need to use .sh here.

@bn222
Copy link
Collaborator

bn222 commented Aug 9, 2023

/lgtm

@bn222
Copy link
Collaborator

bn222 commented Aug 9, 2023

discussed with William offline. This is good to go from my side.

@github-actions github-actions bot added the lgtm label Aug 9, 2023
@wizhaoredhat
Copy link
Contributor Author

/unhold

@github-actions
Copy link

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

@wizhaoredhat
Copy link
Contributor Author

/hold cancel

@github-actions github-actions bot removed the hold label Aug 15, 2023
@bn222
Copy link
Collaborator

bn222 commented Aug 21, 2023

LGTM from me on this. We would want to have a separate go function that does the configuring and another that checks if the config needs to be done), but we will address that in a PR that cleans up the .sh file (by moving that functionality into .go) and separates out the two tasks (check & set).

@wizhaoredhat
Copy link
Contributor Author

@adrianchiris @e0ne Could you PTAL, Thanks!

@github-actions
Copy link

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

@github-actions
Copy link

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

@wizhaoredhat
Copy link
Contributor Author

/test-all

@github-actions
Copy link

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

pkg/utils/utils.go Outdated Show resolved Hide resolved
@github-actions
Copy link

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

@bn222
Copy link
Collaborator

bn222 commented Oct 2, 2023

LGTM

@github-actions
Copy link

github-actions bot commented Oct 2, 2023

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

@wizhaoredhat
Copy link
Contributor Author

@adrianchiris Please take a look, much appreciated!

@Eoghan1232
Copy link
Collaborator

Eoghan1232 commented Oct 12, 2023

@wizhaoredhat I see the CI failed, could you take a look into that? I will give this PR a quick review tomorrow once I am online

@github-actions
Copy link

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

In the case of vfio-pci, the config daemon might not notice it has to
reboot to add kernel arguments. This could happen if the vfio driver
state was updated but the config daemon was interrupted (e.g policy was
removed and readded quickly).

This could lead to a pending change to the kernel arguments via grubby
or os-tree but without a reboot, these changes wouldn't be applied.

This change introduces a map to hold the desired kernel arguments that
the config daemon wishes to apply. This map is then validated against
"/proc/cmdline" to see if the desired kernel arguments are actually
applied. If not then there will be a subsequent attempt to apply these
arguments.

We need to make sure that the kernel arguments are set, thus we should
pass the errors up to the config daemon instead of silently consuming
the errors when setting kernel arguments.

Signed-off-by: William Zhao <[email protected]>
@github-actions
Copy link

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

@wizhaoredhat
Copy link
Contributor Author

/test-all

When creating VFs via sysfs sriov_numvfs, there is a chance of getting
the error syscall.ENOMEM "cannot allocate memory". This could occur when
the BIOS is not providing enough MMIO space for VFs. A solution is to
reallocate the MMIO space via "pci=realloc".

Signed-off-by: William Zhao <[email protected]>
@github-actions
Copy link

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

@@ -285,33 +337,39 @@ func (p *GenericPlugin) needDrainNode(desired sriovnetworkv1.Interfaces, current
return
}

func needRebootIfVfio(state *sriovnetworkv1.SriovNetworkNodeState, driverMap DriverStateMapType) (needReboot bool) {
driverState := driverMap[Vfio]
func (p *GenericPlugin) addVfioDesiredKernelArg(state *sriovnetworkv1.SriovNetworkNodeState) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: addVfioDesiredKernelArg -> addVfioDesiredKernelArgs to keep it the same as other methods ? WDYT.
Although, we are only adding a single arg here....

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer singular. It's only adding 1 arg.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also prefer it being singular. Let's keep it as is. Thanks for reviewing!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

its adding two, what am i missing ?

L343, L344

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I grouped IOMMU as singular. There is infact Intel IOMMU and the general IOMMU. However this is a small NIT on naming.

Copy link
Collaborator

@Eoghan1232 Eoghan1232 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

overall, PR lgtm.
Can you confirm CI failure is due to CI itself and not this PR?

@zeeke
Copy link
Member

zeeke commented Oct 13, 2023

overall, PR lgtm. Can you confirm CI failure is due to CI itself and not this PR?

k8s and ocp jobs also fail in #518. Not related to this job

@bn222 bn222 merged commit 49fff50 into k8snetworkplumbingwg:master Oct 13, 2023
10 of 11 checks passed
set := utils.IsKernelArgsSet(kargs, desiredKarg)
if !set {
if attempted {
glog.V(2).Infof("generic-plugin syncDesiredKernelArgs(): previously attempted to set kernel arg %s", desiredKarg)
Copy link
Collaborator

@adrianchiris adrianchiris Oct 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wizhaoredhat if you previously attempted to set the arg, it means setKernlelArg succeeded.
then why do we need to set it again ? shouldnt we "continue" here ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This captures the case where we hit this case, but either the kernel arg wasn't set properly or if the reboot did not occur for some reason. This results in IsKernelArgsSet returning false even though it was set before. So we should try again.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants