-
Notifications
You must be signed in to change notification settings - Fork 114
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
virtual: get real PCI address for each device found #516
Conversation
Thanks for your PR,
To skip the vendors CIs use one of:
|
2 similar comments
Thanks for your PR,
To skip the vendors CIs use one of:
|
Thanks for your PR,
To skip the vendors CIs use one of:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aye, this was the approach I'd be advocating for. Looks pretty reasonable to me, though I do have two comments inline.
Thanks for your PR,
To skip the vendors CIs use one of:
|
1 similar comment
Thanks for your PR,
To skip the vendors CIs use one of:
|
Pull Request Test Coverage Report for Build 6576446197
💛 - Coveralls |
Thanks for your PR,
To skip the vendors CIs use one of:
|
Downstream's CI has seen no regression: openshift/sriov-network-operator#837 However, I'll confirm with the initial bug reporter that this works in newer version of qemu. |
/lgtm |
Thanks for your PR,
To skip the vendors CIs use one of:
|
Hi @EmilienM I don't think this work as expected in all cases. can you try this
I think it this case it will not work as expected because we will not be able to find the device on the host as it's inside the container. or even if we switch the nic driver to be vfio for example no? |
Thanks for your PR,
To skip the vendors CIs use one of:
|
we tried together today and it worked fine. |
Thanks for your PR,
To skip the vendors CIs use one of:
|
1 similar comment
Thanks for your PR,
To skip the vendors CIs use one of:
|
This was well tested on OpenStack: openshift/sriov-network-operator#837 |
Thanks for your PR,
To skip the vendors CIs use one of:
|
just to be clear for reviewers: Sebastian and I spent time together to test a bunch of things and it's all good, so don't take this comment in consideration while reviewing this PR. Thanks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@adrianchiris @Eoghan1232 please take a look at this when you have time
/hold cancel |
Thanks for your PR,
To skip the vendors CIs use one of:
|
1 similar comment
Thanks for your PR,
To skip the vendors CIs use one of:
|
Thanks for your PR,
To skip the vendors CIs use one of:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i believe current approach is also problematic in some cases. (see my comments)
the way i see it there are two options:
- some different solution/approach
- make this a "best-effort" change so that we are no worse than where we are today
pkg/utils/utils_virtual.go
Outdated
for i, device := range metaData.Devices { | ||
realPCIAddr, err := getPCIAddressFromMACAddress(device.Mac) | ||
if err != nil { | ||
return metaData, networkData, fmt.Errorf("GetOpenstackData(): error getting PCI address for device %s: %w", device.Mac, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if checkpoint file is deleted (sno-initial-node-state.json) for some reason AND config-daemon is restarted while there are pods that consume these devices, config daemon will keep failing as no device will match the mac address (since its not visible in the host network namespace)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@SchSeba can correct me if I'm wrong but I think this was already the case before this PR.
I mean, if the checkpoint file is deleted and config-daemon restarts the devices will not be adopted by SNO & a reboot of the worker is required.
Suggestions are welcome otherwise...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agree with @adrianchiris in case the SNO file is deleted and we restart the daemon AND there is a workload running using the device we will not find it so we should not return and error in case it was not found (let's put a warning)
A better solution we can implement in case:
- SNO doesn't exist
- daemon is restarted
- we have sriov allocated devices
is to drain the node so will remove the devices but it's a pretty big change in the code today so it can be implemented in a separate PR because it's not related to this fix, it's a bug that exist even before this.
here we just need to add a warning if the function getPCIAddressFromMACAddress
returns "device not found base on mac" and not return error that will kill the pod
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good for me Seb, I'm adding a comment as well about this plan.
pkg/utils/utils_virtual.go
Outdated
@@ -205,6 +227,33 @@ func getOpenstackDataFromMetadataService() (metaData *OSPMetaData, networkData * | |||
return metaData, networkData, nil | |||
} | |||
|
|||
// getPCIAddressFromMACAddress returns the PCI address of a device given its MAC address | |||
func getPCIAddressFromMACAddress(macAddress string) (string, error) { | |||
nics, err := ghw.Network() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this will keep reading all NICs information from the filesystem/ethtool for every MAC address
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
based on my comment above, I think this is fine as we don't support hotplug and a reboot would be required for any hardware change anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i dont follow, this is plain inefficient from software perspective.
while query from the system the same information over and over again while expecting to get the same result?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- How do you know upfront it's a performance issue in the system as a whole. I understand that it feels inefficient, and it probably is. However, does it mean we need to fix it now before it has been identified as a real issue?
- A solution is a caching layer that memoizes calls to this functions (i.e. store a map of macAddress to PCI address in memory). While this is not complex in and of itself, it does mean more code that we need to maintain.
Maybe @adrianchiris has a different idea than 2 in mind? 2 will only help if we actually call this function multiple times frequently enough for it to matter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GetOpenstackData is called once via writer when daemon starts. so im not aiming at caching it.
what id like to see is essentially moving this line to the caller so that listing the nic happens only once, then the list is passed as parameter to this function.
when there are multiple interfaces passed to the VM (multiple NICs) this becomes quatratic
(instead of listing e.g 16 nics once you would list 16 nics 16 times)
each time triggering OS operations (ethtool / sysfs).
so my idea is essentially a 2 line change for which we spent multiple lines discussing it :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice proposal @adrianchiris - thanks. I'll update the pr!
Thanks for your PR,
To skip the vendors CIs use one of:
|
3 similar comments
Thanks for your PR,
To skip the vendors CIs use one of:
|
Thanks for your PR,
To skip the vendors CIs use one of:
|
Thanks for your PR,
To skip the vendors CIs use one of:
|
/test-e2e-all |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove lgtm until all comments will be addressed
Thanks for your PR,
To skip the vendors CIs use one of:
|
2 similar comments
Thanks for your PR,
To skip the vendors CIs use one of:
|
Thanks for your PR,
To skip the vendors CIs use one of:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thx for addressing my comments ! LGTM
some minor nit but would not block on it.
Thanks for your PR,
To skip the vendors CIs use one of:
|
the k8s and ocp failures can be ignore I believe, none of them test SNO on OpenStack. They were passing a few days ago, I wonder if something broke? |
HI @EmilienM please add the ID here https://github.com/k8snetworkplumbingwg/sriov-network-operator/blob/master/pkg/daemon/daemon.go#L1141 if not we will have again the issue after reboot on non shiftstack deployments |
We can't rely on the PCI address from the metadata so we will lookup the real PCI address for the NIC that matches the MAC address. Libvirt/QEMU cannot guarantee that the address specified in the XML will match the address seen by the guest. This is a well known limitation: https://libvirt.org/pci-addresses.html When using the q35 machine type, it highlights this issue due to the change from using PCI to PCI-E bus for virtual devices. With that said, the PCI value in Nova Metadata is a best effort hint due to the limitations mentioned above. Therefore we will lookup the real PCI address for the NIC that matches the MAC address.
Co-Authored-By: Andrea Panattoni <[email protected]> Co-Authored-By: Emilien Macchi <[email protected]>
https://devicehunt.com/view/type/pci/vendor/1AF4/device/1041 https://qemu.readthedocs.io/en/master/specs/pci-ids.html Modern QEMU brings up virtio 1.0 network device, we need to support that for DPDK.
Thanks for your PR,
To skip the vendors CIs use one of:
|
We can't rely on the PCI address from the metadata so we will lookup the real PCI address
for the NIC that matches the MAC address.
Libvirt/QEMU cannot guarantee that the address specified in the XML will match the address seen by the guest.
This is a well known limitation: https://libvirt.org/pci-addresses.html
When using the q35 machine type, it highlights this issue due to the change from using PCI to PCI-E bus for virtual devices.
With that said, the PCI value in Nova Metadata is a best effort hint due to the limitations mentioned above. Therefore
we will lookup the real PCI address for the NIC that matches the MAC address.
This PR also adds Virtio 1.0 network device as supported
https://devicehunt.com/view/type/pci/vendor/1AF4/device/1041
https://qemu.readthedocs.io/en/master/specs/pci-ids.html
Modern QEMU brings up virtio 1.0 network device, we need to support that
for DPDK.