
e2e: mustgather: add missing deployer initialization #1115

Closed
wants to merge 1 commit

Conversation

shajmakh (Member):

Without the deployer initialization, deployment.deployer is nil and causes the test to panic. Fix that by calling deploy.NewForPlatform().
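The failure mode the PR fixes can be illustrated with a minimal Go sketch. The types and names below (`nroDeployment`, `platformDeployer`, the `"OpenShift"` platform string) are hypothetical simplifications standing in for the suite's real `deployment` state and `deploy.NewForPlatform`:

```go
package main

import "fmt"

// Deployer mirrors, in simplified form, the interface the e2e suite
// calls through; the real one lives in the project's deploy package.
type Deployer interface {
	Deploy() string
}

type platformDeployer struct{ plat string }

func (d platformDeployer) Deploy() string { return "deploying on " + d.plat }

// NewForPlatform stands in for deploy.NewForPlatform.
func NewForPlatform(plat string) Deployer { return platformDeployer{plat: plat} }

type nroDeployment struct {
	Deployer Deployer // nil until explicitly initialized
}

func main() {
	deployment := nroDeployment{}

	// Calling through the nil interface field would panic:
	//   deployment.Deployer.Deploy() // panic: nil pointer dereference

	// The fix this PR applies: initialize the deployer before use.
	deployment.Deployer = NewForPlatform("OpenShift")
	fmt.Println(deployment.Deployer.Deploy()) // prints "deploying on OpenShift"
}
```

A nil interface field is Go's zero value for the struct literal, so any method call through it panics at runtime, which is exactly why the test blew up until the field was assigned.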

@openshift-ci openshift-ci bot requested review from mrniranjan and Tal-or December 13, 2024 10:38
@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 13, 2024
@shajmakh (Member Author):

/cherry-pick release-4.18

@openshift-cherrypick-robot

@shajmakh: once the present PR merges, I will cherry-pick it on top of release-4.18 in a new PR and assign it to you.

In response to this:

/cherry-pick release-4.18

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@@ -71,6 +71,7 @@ var _ = ginkgo.BeforeSuite(func() {
return
}
ginkgo.By("Setting up the cluster")
deployment.Deployer = deploy.NewForPlatform(configuration.Plat)
Member:
thanks, this is a nice catch. I think I'd prefer a fix like:

$ git diff
diff --git a/test/e2e/must-gather/must_gather_suite_test.go b/test/e2e/must-gather/must_gather_suite_test.go
index 838cdc42..2da25052 100644
--- a/test/e2e/must-gather/must_gather_suite_test.go
+++ b/test/e2e/must-gather/must_gather_suite_test.go
@@ -62,12 +62,12 @@ var _ = ginkgo.BeforeSuite(func() {
        mustGatherTag = getStringValueFromEnv(envVarMustGatherTag, defaultMustGatherTag)
        ginkgo.By(fmt.Sprintf("Using must-gather image %q tag %q", mustGatherImage, mustGatherTag))
 
-       if _, ok := os.LookupEnv("E2E_NROP_INFRA_SETUP_SKIP"); ok {
-               ginkgo.By("Fetching up cluster data")
+       var err error
+       deployment, err = deploy.GetDeploymentWithSched(context.TODO())
+       gomega.Expect(err).ToNot(gomega.HaveOccurred())
 
-               var err error
-               deployment, err = deploy.GetDeploymentWithSched(context.TODO())
-               gomega.Expect(err).ToNot(gomega.HaveOccurred())
+       if _, ok := os.LookupEnv("E2E_NROP_INFRA_SETUP_SKIP"); ok {
+               gomega.Expect(deployment.NroSchedObj).ToNot(gomega.BeNil(), "infra setup skip but Scheduler instance not found")
                return
        }
        ginkgo.By("Setting up the cluster")

Do you think the above could work?

Member Author:
nice idea, thanks for raising! I updated GetDeploymentWithSched() a bit to initialize the deployer, so even if setup is skipped it won't hurt. Let me know if you meant something else, please.

Without the deployer initialization, deployment.deployer is nil and
causes the test to panic. Fix that by calling deploy.NewForPlatform().

Signed-off-by: Shereen Haj <[email protected]>
openshift-ci bot (Contributor) commented Dec 13, 2024:

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: shajmakh

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ffromani (Member) left a comment:

your approach can work! let's polish it up

@@ -71,7 +71,10 @@ func NewForPlatform(plat platform.Platform) Deployer {
 }
 
 func GetDeploymentWithSched(ctx context.Context) (NroDeploymentWithSched, error) {
-	sd := NroDeploymentWithSched{}
+	sd := NroDeploymentWithSched{
+		Deployer: NewForPlatform(configuration.Plat),
+	}
Member:
this seems correct

Member Author:
we need to set it first thing, in case the cluster doesn't have the scheduler and fetching it fails
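The point made here (initialize `Deployer` before any step that can fail) can be sketched in Go. The types below are hypothetical simplifications of the suite's real `NroDeploymentWithSched` and `deploy.NewForPlatform`; a boolean parameter stands in for the actual cluster lookup:

```go
package main

import (
	"errors"
	"fmt"
)

// Simplified, hypothetical stand-ins for the suite's real types.
type Deployer interface{ Name() string }

type platformDeployer struct{ plat string }

func (d platformDeployer) Name() string { return d.plat }

func NewForPlatform(plat string) Deployer { return platformDeployer{plat: plat} }

type NroDeploymentWithSched struct {
	Deployer    Deployer
	NroSchedObj *string // stands in for the scheduler object
}

// GetDeploymentWithSched sets Deployer before anything can fail, so the
// returned value is usable even when the scheduler fetch errors out.
func GetDeploymentWithSched(schedulerPresent bool) (NroDeploymentWithSched, error) {
	sd := NroDeploymentWithSched{
		Deployer: NewForPlatform("OpenShift"),
	}
	if !schedulerPresent {
		return sd, errors.New("scheduler instance not found")
	}
	name := "numaresources-scheduler"
	sd.NroSchedObj = &name
	return sd, nil
}

func main() {
	sd, err := GetDeploymentWithSched(false)
	fmt.Println(err != nil, sd.Deployer != nil) // prints "true true"
}
```

Because the field is populated in the struct literal, callers that tolerate the fetch error still get a non-nil deployer, which is what the comment above is arguing for.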

@@ -62,12 +62,13 @@ var _ = ginkgo.BeforeSuite(func() {
mustGatherTag = getStringValueFromEnv(envVarMustGatherTag, defaultMustGatherTag)
ginkgo.By(fmt.Sprintf("Using must-gather image %q tag %q", mustGatherImage, mustGatherTag))

ginkgo.By("Fetching up cluster data")
Member:

this seems wrong

Member Author:

indeed. I'll make it "collect available data"; does that sound better?

ginkgo.By("Fetching up cluster data")
var err error
// the error might be non-nil; we'll decide whether that's fine depending on E2E_NROP_INFRA_SETUP_SKIP
deployment, err = deploy.GetDeploymentWithSched(context.TODO())
Member:
this should never fail anyway so let's check err not occurred

@shajmakh (Member Author), Dec 14, 2024:

it can fail in case the cluster doesn't have the scheduler and we try to fetch it:
https://github.com/openshift-kni/numaresources-operator/blob/main/test/utils/deploy/deploy.go#L79

Collaborator:

I think the question is why the sudden change: why are we always running the deployment fetch, even if E2E_NROP_INFRA_SETUP_SKIP exists?

Member:

my take is that we always want to have an instance of the current scheduler object. If the scheduler already exists (skip setup), then we should just fetch the existing instance and move on; otherwise we need to set it up and then fetch the newly created instance. This is probably not too evident from the code, and we likely need a somewhat larger refactoring.

@ffromani (Member):

/retest

 var err error
 deployment, err = deploy.GetDeploymentWithSched(context.TODO())
-gomega.Expect(err).ToNot(gomega.HaveOccurred())
+gomega.Expect(err).ToNot(gomega.HaveOccurred(), "infra setup skip but Scheduler instance not found")
Member:

It doesn't make sense to conditionally check the error: if we need to call GetDeploymentWithSched, it must not fail.

@ffromani (Member):

/close

we merged #1120

but kudos for finding and fixing the issue here. #1120 takes a step further and was merged only because of the handoff of PRs.

@ffromani ffromani closed this Dec 17, 2024
@shajmakh (Member Author):


Thanks for handling this, much appreciated!
