Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OCPBUGS-19908: virtual: get real PCI address for each device found #837

Closed
wants to merge 2 commits into from

Conversation

EmilienM
Copy link
Member

@EmilienM EmilienM commented Oct 2, 2023

We can't rely on the PCI address from the metadata so we will lookup the real PCI address
for the NIC that matches the MAC address.

Libvirt/QEMU cannot guarantee that the address specified in the XML will match the address seen by the guest.
This is a well known limitation: https://libvirt.org/pci-addresses.html
When using the q35 machine type, it highlights this issue due to the change from using PCI to PCI-E bus for virtual devices.

With that said, the PCI value in Nova Metadata is a best effort hint due to the limitations mentioned above. Therefore
we will lookup the real PCI address for the NIC that matches the MAC address.

This PR also adds Virtio 1.0 network device as supported

https://devicehunt.com/view/type/pci/vendor/1AF4/device/1041
https://qemu.readthedocs.io/en/master/specs/pci-ids.html

Modern QEMU brings up virtio 1.0 network device, we need to support that
for DPDK.

@openshift-ci-robot openshift-ci-robot added jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Oct 2, 2023
@openshift-ci-robot
Copy link
Contributor

@EmilienM: This pull request references Jira Issue OCPBUGS-19908, which is invalid:

  • expected the bug to target the "4.15.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

In response to this:

We can't rely on the PCI address from the metadata so we will lookup the real PCI address
for the NIC that matches the MAC address.

Libvirt/QEMU cannot guarantee that the address specified in the XML will match the address seen by the guest.
This is a well known limitation: https://libvirt.org/pci-addresses.html
When using the q35 machine type, it highlights this issue due to the change from using PCI to PCI-E bus for virtual devices.

With that said, the PCI value in Nova Metadata is a best effort hint due to the limitations mentioned above. Therefore
we will lookup the real PCI address for the NIC that matches the MAC address.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@EmilienM
Copy link
Member Author

EmilienM commented Oct 2, 2023

/jira refresh

@openshift-ci-robot openshift-ci-robot added jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. and removed jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Oct 2, 2023
@openshift-ci-robot
Copy link
Contributor

@EmilienM: This pull request references Jira Issue OCPBUGS-19908, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.15.0) matches configured target version for branch (4.15.0)
  • bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @zhaozhanqi

In response to this:

/jira refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot
Copy link
Contributor

@EmilienM: This pull request references Jira Issue OCPBUGS-19908, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.15.0) matches configured target version for branch (4.15.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @zhaozhanqi

In response to this:

We can't rely on the PCI address from the metadata so we will lookup the real PCI address
for the NIC that matches the MAC address.

Libvirt/QEMU cannot guarantee that the address specified in the XML will match the address seen by the guest.
This is a well known limitation: https://libvirt.org/pci-addresses.html
When using the q35 machine type, it highlights this issue due to the change from using PCI to PCI-E bus for virtual devices.

With that said, the PCI value in Nova Metadata is a best effort hint due to the limitations mentioned above. Therefore
we will lookup the real PCI address for the NIC that matches the MAC address.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@EmilienM
Copy link
Member Author

EmilienM commented Oct 2, 2023

/hold

under testing

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 2, 2023
@MaysaMacedo
Copy link
Contributor

Nice catch!
Should this fix also be made available upstream?
https://github.com/k8snetworkplumbingwg/sriov-network-operator/blob/master/pkg/utils/utils_virtual.go#L116-L122

@EmilienM
Copy link
Member Author

EmilienM commented Oct 2, 2023

Nice catch! Should this fix also be made available upstream? https://github.com/k8snetworkplumbingwg/sriov-network-operator/blob/master/pkg/utils/utils_virtual.go#L116-L122

k8snetworkplumbingwg/sriov-network-operator#516

This PR is just for internal testing for now.

@EmilienM
Copy link
Member Author

EmilienM commented Oct 2, 2023

I'm holding it until we get:

  • Upstream review & approval
  • Manual testing of that PR on OSP 17.1 (in progress now)

@bn222
Copy link
Contributor

bn222 commented Oct 3, 2023

/lgtm
/approve

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Oct 3, 2023
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Oct 3, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bn222, EmilienM

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 3, 2023
Copy link
Member

@mandre mandre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes! We'll need to backport that too.

pkg/utils/utils_virtual.go Outdated Show resolved Hide resolved
pkg/utils/utils_virtual.go Outdated Show resolved Hide resolved
pkg/utils/utils_virtual.go Outdated Show resolved Hide resolved
@EmilienM
Copy link
Member Author

EmilienM commented Oct 4, 2023

@mandre thanks for your review, I'll address your comments shortly. I didn't have the time today due to testing this PR in real environment.

We'll need to backport that too.

👍

@EmilienM
Copy link
Member Author

EmilienM commented Oct 4, 2023

/retest e2e-openstack-nfv
It will probably fail because we need openshift/release#43986 so the right PCI device is used but it depends if we're lucky the job will pass. I'll test here.

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Oct 4, 2023

@EmilienM: The /retest command does not accept any targets.
The following commands are available to trigger required jobs:

  • /test api
  • /test ci-index
  • /test controllers
  • /test gofmt
  • /test images
  • /test operator-e2e
  • /test pkg

The following commands are available to trigger optional jobs:

  • /test e2e-openstack-nfv
  • /test e2e-openstack-nfv-hwoffload

Use /test all to run all jobs.

In response to this:

/retest e2e-openstack-nfv
It will probably fail because we need openshift/release#43986 so the right PCI device is used but it depends if we're lucky the job will pass. I'll test here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@EmilienM
Copy link
Member Author

EmilienM commented Oct 4, 2023

/test e2e-openstack-nfv

@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Oct 5, 2023
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Oct 5, 2023

New changes are detected. LGTM label has been removed.

@mandre
Copy link
Member

mandre commented Oct 11, 2023

/cherry-pick release-4.14

@openshift-cherrypick-robot

@mandre: once the present PR merges, I will cherry-pick it on top of release-4.14 in a new PR and assign it to you.

In response to this:

/cherry-pick release-4.14

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

We can't rely on the PCI address from the metadata so we will lookup the real PCI address
for the NIC that matches the MAC address.

Libvirt/QEMU cannot guarantee that the address specified in the XML will match the address seen by the guest.
This is a well known limitation: https://libvirt.org/pci-addresses.html
When using the q35 machine type, it highlights this issue due to the change from using PCI to PCI-E bus for virtual devices.

With that said, the PCI value in Nova Metadata is a best effort hint due to the limitations mentioned above. Therefore
we will lookup the real PCI address for the NIC that matches the MAC address.
@EmilienM
Copy link
Member Author

/hold cancel

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 16, 2023
@EmilienM
Copy link
Member Author

/hold
the upstream PR now has some unit tests and I was told that the downstream fork should only have syncs from upstream using cherry-picks, so this PR will probably not make it.

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 17, 2023
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Oct 18, 2023

@EmilienM: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@EmilienM
Copy link
Member Author

taken care here: #842

@EmilienM EmilienM closed this Oct 23, 2023
@openshift-ci-robot
Copy link
Contributor

@EmilienM: This pull request references Jira Issue OCPBUGS-19908. The bug has been updated to no longer refer to the pull request using the external bug tracker.

In response to this:

We can't rely on the PCI address from the metadata so we will lookup the real PCI address
for the NIC that matches the MAC address.

Libvirt/QEMU cannot guarantee that the address specified in the XML will match the address seen by the guest.
This is a well known limitation: https://libvirt.org/pci-addresses.html
When using the q35 machine type, it highlights this issue due to the change from using PCI to PCI-E bus for virtual devices.

With that said, the PCI value in Nova Metadata is a best effort hint due to the limitations mentioned above. Therefore
we will lookup the real PCI address for the NIC that matches the MAC address.

This PR also adds Virtio 1.0 network device as supported

https://devicehunt.com/view/type/pci/vendor/1AF4/device/1041
https://qemu.readthedocs.io/en/master/specs/pci-ids.html

Modern QEMU brings up virtio 1.0 network device, we need to support that
for DPDK.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@EmilienM EmilienM deleted the OCPBUGS-19908 branch October 23, 2023 11:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants