
fix: properly handle skip on assessments #473

Open · wants to merge 1 commit into main

Conversation


@phisco phisco commented Jan 3, 2025

What type of PR is this?

/kind bug

What this PR does / why we need it:

t.Skip with --fail-fast set was resulting in the parent test being marked as failed; this patch fixes that. However, I'm not entirely sure about the semantics of --fail-fast, so I'd appreciate it if someone could clarify them.

Certainly, independently of --fail-fast, a call to t.Skip* in an Assessment should only skip the current Assessment. But for calls to t.Fail or t.FailNow, what should the behaviour be in each case?
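
For reference, a minimal reproducer sketch of the bug (assuming the usual e2e-framework TestMain wiring; the package and test names here are only illustrative):

    // Run with: go test . -args --fail-fast
    package skiprepro

    import (
        "context"
        "os"
        "testing"

        "sigs.k8s.io/e2e-framework/pkg/env"
        "sigs.k8s.io/e2e-framework/pkg/envconf"
        "sigs.k8s.io/e2e-framework/pkg/features"
    )

    var testEnv env.Environment

    func TestMain(m *testing.M) {
        cfg, err := envconf.NewFromFlags() // parses framework flags, including --fail-fast
        if err != nil {
            panic(err)
        }
        testEnv = env.NewWithConfig(cfg)
        os.Exit(testEnv.Run(m))
    }

    func TestSkipUnderFailFast(t *testing.T) {
        f := features.New("skip under fail-fast").
            Assess("skipped assessment", func(ctx context.Context, t *testing.T, _ *envconf.Config) context.Context {
                // Before this patch, with --fail-fast set, this skip
                // caused the parent test to be marked as failed.
                t.Skip("not applicable in this environment")
                return ctx
            }).
            Feature()
        testEnv.Test(t, f)
    }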

Which issue(s) this PR fixes:

Fixes #472

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Fix a bug where skipped tests were marked as failed when `--fail-fast` is set.

Additional documentation e.g., Usage docs, etc.:


Signed-off-by: Philippe Scorsolini <[email protected]>
@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. sig/testing Categorizes an issue or PR as relevant to SIG Testing. labels Jan 3, 2025
@k8s-ci-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: phisco
Once this PR has been reviewed and has the lgtm label, please assign cpanato for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot

Hi @phisco. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jan 3, 2025
@Fricounet Fricounet left a comment

Thanks for catching and fixing the bug.

However, I'm not entirely sure about the semantics of --fail-fast, so I'd appreciate it if someone could clarify them.

I don't have the full context, but from my understanding --fail-fast is an attempt to adapt the go test -failfast flag to the e2e-framework, i.e. a way to stop assessment and feature executions early.

Certainly, independently of --fail-fast, a call to t.Skip* in an Assessment should only skip the current Assessment. But for calls to t.Fail or t.FailNow, what should the behaviour be in each case?

In my understanding of the framework, Assessments are loosely linked together as part of the same 'Test', while Features behave as distinct 'Tests'.
Based on the semantics of Fail ("marks the function as having failed but continues execution") and FailNow ("marks the function as having failed and stops its execution by calling runtime.Goexit. Execution will continue at the next test or benchmark."), my interpretation is that:

  • a call to Fail should not attempt to skip any remaining assessment or feature
  • a call to FailNow should skip all the remaining assessments but not the following features

I guess the behavior for FailNow could be argued for or against skipping the next features too. But this is more a question of whether Features should be considered independent or not, which, as far as I know, is not really clarified anywhere.
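
A quick illustration of the stdlib semantics this interpretation rests on (plain testing package, no framework involved):

    func TestFailSemantics(t *testing.T) {
        t.Run("fail", func(t *testing.T) {
            t.Fail()                          // marks this subtest failed, execution continues
            t.Log("still reached after Fail")
        })
        t.Run("failnow", func(t *testing.T) {
            t.FailNow()                       // marks failed, stops via runtime.Goexit
            t.Log("never reached")
        })
        t.Run("sibling", func(t *testing.T) {
            // Still runs: FailNow only stops the subtest that called it.
        })
    }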

// invoked to make sure we leave the traces of the failed test behind to enable better debugging for the
// test developers
if e.cfg.FailFast() && failed {
newT.FailNow()

Will the failure be propagated properly to the parent test without this?

// The test passed, nothing to do
return
}
if !finished && e.cfg.FailFast() {

Suggested change
- if !finished && e.cfg.FailFast() {
+ if !finished || e.cfg.FailFast() {

Shouldn't this be an `||` instead, based on the comment below?

Comment on lines +523 to +524
// The test didn't finish, this means t.FailNow was called
// or we are in fail fast mode, we should skip the remaining assessments

Suggested change
- // The test didn't finish, this means t.FailNow was called
- // or we are in fail fast mode, we should skip the remaining assessments
+ // The test failed and either:
+ // didn't finish, this means t.FailNow was called
+ // OR we are in fail fast mode.
+ // We should skip the remaining assessments

Maybe a bit more explicit?

// test developers
if e.cfg.FailFast() && failed {
newT.FailNow()
if failFast {

I'm not sure if we should also skip teardowns when t.FailNow() is called. In particular, I'm not sure it is very logical to skip teardowns for t.FailNow() but not for t.Fail().

In that case, scoping this to failed tests with FailFast configured would be better, I think. What do you think?

@phisco

phisco commented Jan 8, 2025

  • a call to Fail should not attempt to skip any remaining assessment or feature
  • a call to FailNow should skip all the remaining assessments but not the following features

Makes sense, @Fricounet, but what about --fail-fast? Should we skip the remaining assessments independently of which one was called, Fail or FailNow, when --fail-fast is set? I'd say so, based on this passage from the README:

There are times in your test infra that you want the rest of your feature assessments to fail in
case if one of the current executing assessments fail.
This can aid in getting a better turn around time especially in cases where your assessments are
inter-related and a failed assessment in step 1 can mean that running the rest of the assessment
is guaranteed to fail. This can be done with the help of a `--fail-fast` flag provided at the
framework level.
This works similar to how the `-failfast` mode of the `go test` works but provides the same
feature at the context of the `e2e-framework`.

@Fricounet

Makes sense, @Fricounet, but what about --fail-fast? Should we skip the remaining assessments independently of which one was called, Fail or FailNow, when --fail-fast is set? I'd say so, based on this passage from the README:

I'd say so too, @phisco. Though I don't agree with everything written in the README. In particular:

This can aid in getting a better turn around time especially in cases where your assessments are
inter-related and a failed assessment in step 1 can mean that running the rest of the assessment
is guaranteed to fail. This can be done with the help of a --fail-fast flag provided at the
framework level.

To me, this is the job of t.FailNow().

Basically:

  • as the test writer, I know when it makes sense for my test to keep going (use t.Fail()) or stop early because there is no point going further (use t.FailNow())
  • as the test executor, I know if I want my whole test suite to exit early at the first sign of failure (use --fail-fast). Hence --fail-fast should skip the remaining assessments regardless of Fail/FailNow
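
A sketch of that split in practice, assuming the framework's assessment signature (the two check helpers are hypothetical):

    func assessWorkload(ctx context.Context, t *testing.T, cfg *envconf.Config) context.Context {
        // Test-writer decisions:
        if err := optionalCheck(); err != nil { // hypothetical helper
            t.Fail() // record the failure, the remaining checks are still useful
        }
        if err := requiredPrereq(); err != nil { // hypothetical helper
            t.FailNow() // no point continuing this assessment
        }
        // Test-executor decision: running with `go test . -args --fail-fast`
        // should skip the remaining assessments regardless of Fail vs FailNow.
        return ctx
    }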

Hope that makes sense 😄
