{chem,devel}[foss/2023a,gompi/2023a] Boost.MPI v1.79.0, waLBerla v6.1 #19252

Closed


Neves-P
Contributor

@Neves-P Neves-P commented Nov 17, 2023

(created using eb --new-pr)

@Neves-P
Contributor Author

Neves-P commented Nov 17, 2023

Test report by @Neves-P
SUCCESS
Build succeeded for 2 out of 2 (2 easyconfigs in total)
WKS058930 - Linux Ubuntu 22.04, x86_64, 12th Gen Intel(R) Core(TM) i7-1260P, Python 3.10.12
See https://gist.github.com/Neves-P/dfab491eaac2789f069d659f34d81dcd for a full test report.

@Neves-P
Contributor Author

Neves-P commented Nov 17, 2023

Test report by @Neves-P
SUCCESS
Build succeeded for 2 out of 2 (2 easyconfigs in total)
WKS058930 - Linux Ubuntu 22.04, x86_64, 12th Gen Intel(R) Core(TM) i7-1260P, Python 3.10.12
See https://gist.github.com/Neves-P/0dedd925b7565bdb478a6b1b901bbb79 for a full test report.

@bedroge
Contributor

bedroge commented Nov 17, 2023

@boegelbot please test @ jsc-zen2

@boegelbot
Collaborator

@bedroge: Request for testing this PR well received on jsczen2l1.int.jsc-zen2.easybuild-test.cluster

PR test command 'EB_PR=19252 EB_ARGS= EB_REPO=easybuild-easyconfigs /opt/software/slurm/bin/sbatch --mem-per-cpu=4000M --job-name test_PR_19252 --ntasks=8 ~/boegelbot/eb_from_pr_upload_jsc-zen2.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 3748

Test results coming soon (I hope)...

- notification for comment with ID 1816382125 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@bedroge
Contributor

bedroge commented Nov 17, 2023

@boegelbot please test @ generoso

@boegelbot
Collaborator

@bedroge: Request for testing this PR well received on login1

PR test command 'EB_PR=19252 EB_ARGS= EB_CONTAINER= EB_REPO=easybuild-easyconfigs /opt/software/slurm/bin/sbatch --job-name test_PR_19252 --ntasks=4 ~/boegelbot/eb_from_pr_upload_generoso.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 12183

Test results coming soon (I hope)...

- notification for comment with ID 1817019432 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@boegel
Member

boegel commented Nov 21, 2023

The test installation on generoso is hanging on the test import:

boegelb+  596670  0.0  0.0 224868  5628 ?        S    Nov17   0:00  \_ /bin/bash -l /var/spool/slurmd/job12183/slurm_script
boegelb+  596713  0.0  0.5 564336 82556 ?        Sl   Nov17   3:23      \_ python3 -m easybuild.main --from-pr 19252 --debug --rebuild --robot --upload-test-report --download-timeout=1000
boegelb+  635734  0.0  0.2 451036 45420 ?        Sl   Nov17   0:00          \_ python -c import waLBerla
boegelb+  635735  0.0  0.1 171988 18928 ?        Ssl  Nov17   0:03              \_ orted --hnp --set-sid --report-uri 8 --singleton-died-pipe 9 -mca state_novm_select 1 -mca ess hnp -mca pmix ^s1,s2,cray,isolated

@boegel
Member

boegel commented Nov 21, 2023

I've stopped job 12183 on generoso, since there's no way it would finish. The import hang needs to be resolved somehow.
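
A hang like this can be turned into an explicit failure, instead of a job that never finishes, by running the import in a child process with a timeout. The following is a minimal sketch and not part of this PR or of EasyBuild: the helper name `check_import` and the 60-second default are assumptions chosen for illustration.

```python
import os
import subprocess
import sys


def check_import(module, timeout=60, extra_env=None):
    """Run `python -c 'import <module>'` in a child process.

    Returns "ok", "failed" (non-zero exit code) or "timeout".
    `check_import` is a hypothetical helper, not an EasyBuild function.
    """
    # Start from the current environment and layer optional overrides on top.
    env = dict(os.environ)
    env.update(extra_env or {})
    try:
        result = subprocess.run(
            [sys.executable, "-c", f"import {module}"],
            env=env,
            timeout=timeout,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the direct child before raising this.
        return "timeout"
    return "ok" if result.returncode == 0 else "failed"
```

On a system showing the hang, `check_import("waLBerla", timeout=120)` would be expected to return `"timeout"` rather than block forever. Helper processes started by the child (such as the `orted` process in the listing above) may outlive it and might still need to be cleaned up separately.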

@Neves-P
Contributor Author

Neves-P commented Nov 21, 2023

Thanks @bedroge. I modified the sanity check; it should hopefully work now.

@bedroge
Contributor

bedroge commented Nov 21, 2023

@boegelbot please test @ generoso

@boegelbot
Collaborator

@bedroge: Request for testing this PR well received on login1

PR test command 'EB_PR=19252 EB_ARGS= EB_CONTAINER= EB_REPO=easybuild-easyconfigs /opt/software/slurm/bin/sbatch --job-name test_PR_19252 --ntasks=4 ~/boegelbot/eb_from_pr_upload_generoso.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 12205

Test results coming soon (I hope)...

- notification for comment with ID 1821001878 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@boegelbot
Collaborator

Test report by @boegelbot
FAILED
Build succeeded for 1 out of 2 (2 easyconfigs in total)
cns1 - Linux Rocky Linux 8.5, x86_64, Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz (haswell), Python 3.6.8
See https://gist.github.com/boegelbot/96c92573d31ced99f9160d266d9edde9 for a full test report.

@Neves-P
Contributor Author

Neves-P commented Nov 21, 2023

Test report by @Neves-P
SUCCESS
Build succeeded for 2 out of 2 (2 easyconfigs in total)
WKS058930 - Linux Ubuntu 22.04, x86_64, 12th Gen Intel(R) Core(TM) i7-1260P, Python 3.10.12
See https://gist.github.com/Neves-P/872b91b9ef6fff057e63b33920d6758c for a full test report.

@Neves-P
Contributor Author

Neves-P commented Nov 21, 2023

Test report by @boegelbot FAILED Build succeeded for 1 out of 2 (2 easyconfigs in total) cns1 - Linux Rocky Linux 8.5, x86_64, Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz (haswell), Python 3.6.8 See https://gist.github.com/boegelbot/96c92573d31ced99f9160d266d9edde9 for a full test report.

The failure seems to be due to a lingering lock file:

== 2023-11-21 14:30:45,878 build_log.py:171 ERROR EasyBuild crashed with an error (at easybuild/easybuild-framework/easybuild/base/exceptions.py:126 in __init__): Lock /project/boegelbot/Rocky8/haswell/software/.locks/_project_boegelbot_Rocky8_haswell_software_waLBerla_6.1-foss-2023a.lock already exists, aborting! (at easybuild/easybuild-framework/easybuild/tools/filetools.py:2002 in check_lock)
== 2023-11-21 14:30:45,879 easyblock.py:4277 WARNING build failed (first 300 chars): Lock /project/boegelbot/Rocky8/haswell/software/.locks/_project_boegelbot_Rocky8_haswell_software_waLBerla_6.1-foss-2023a.lock already exists, aborting!

From: https://gist.github.com/boegelbot/f7d391b376af559329d08083c2025e42#file-walberla-6-1-foss-2023a_partial-log-L98
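
The lock name in the log looks like the installation directory with every `/` replaced by `_`, plus a `.lock` suffix. A minimal sketch for checking whether such a lock is still present is given below. The naming rule is inferred from the log line above, the installation directory used in the usage note is an assumption, and the function names are hypothetical (they are not EasyBuild API).

```python
import os


def lock_path_for(installdir, locks_dir):
    """Derive the lock path from an installation directory.

    Naming rule inferred from the log: '/' becomes '_', '.lock' is appended.
    """
    lock_name = installdir.replace("/", "_") + ".lock"
    return f"{locks_dir}/{lock_name}"


def find_stale_lock(installdir, locks_dir):
    """Return the lock path if it exists on disk, otherwise None."""
    path = lock_path_for(installdir, locks_dir)
    return path if os.path.exists(path) else None
```

For example, `lock_path_for("/project/boegelbot/Rocky8/haswell/software/waLBerla/6.1-foss-2023a", "/project/boegelbot/Rocky8/haswell/software/.locks")` reproduces the lock path from the error message. A lock found this way should only be removed once it is certain that no other build of the same software is still running.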

@bedroge
Contributor

bedroge commented Nov 21, 2023

@boegelbot please test @ generoso

@boegelbot
Collaborator

@bedroge: Request for testing this PR well received on login1

PR test command 'EB_PR=19252 EB_ARGS= EB_CONTAINER= EB_REPO=easybuild-easyconfigs /opt/software/slurm/bin/sbatch --job-name test_PR_19252 --ntasks=4 ~/boegelbot/eb_from_pr_upload_generoso.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 12207

Test results coming soon (I hope)...

- notification for comment with ID 1821516335 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@Neves-P
Contributor Author

Neves-P commented Nov 27, 2023

I tried using foss/2022b as a workaround in #19324, but that did not solve the problem: the same issue occurs. The problem seems similar to the one in #19314 (comment), but the fix used there does not work here.

@Neves-P
Contributor Author

Neves-P commented Nov 28, 2023

Test report by @Neves-P
FAILED
Build succeeded for 1 out of 2 (2 easyconfigs in total)
WKS058930 - Linux Ubuntu 22.04, x86_64, 12th Gen Intel(R) Core(TM) i7-1260P, Python 3.10.12
See https://gist.github.com/Neves-P/60cb210719093ba6829ea0fefd6eb953 for a full test report.

@Neves-P
Contributor Author

Neves-P commented Nov 29, 2023

Test report by @Neves-P
SUCCESS
Build succeeded for 2 out of 2 (2 easyconfigs in total)
WKS058930 - Linux Ubuntu 22.04, x86_64, 12th Gen Intel(R) Core(TM) i7-1260P, Python 3.10.12
See https://gist.github.com/Neves-P/8b1e3f52d368436eac87f97c45123e8a for a full test report.

@bedroge
Contributor

bedroge commented Dec 12, 2023

The issue with the hanging process seems specific to the EB test clusters, as this is working fine on several other machines. We also tested in the EESSI context on three node types, and they all succeeded, see: EESSI/software-layer#421 (comment).

After quite some debugging on generoso I found some workarounds, and the easiest one is setting UCX_LOG_LEVEL=info, which surprisingly makes the issue disappear... I've opened an issue on the UCX repo (openucx/ucx#9532), and one of the developers suspects that it's an issue between glibc and UCX.

So I propose we add UCX_LOG_LEVEL=info to the sanity check command, which should allow us to successfully build this on the EB test clusters as well.
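
As a rough illustration of this proposal, the sanity check in the easyconfig (which uses Python syntax) could look like the fragment below. This is a sketch under the assumption that the import check is the only sanity check command; the exact command used in the PR may differ.

```python
# Easyconfig fragment (sketch). Assumption: the import check is the command
# that hangs on the EB test clusters.
sanity_check_commands = [
    # UCX_LOG_LEVEL=info is set for this one command only; it is the
    # workaround described above, not a fix for the underlying issue.
    "UCX_LOG_LEVEL=info python -c 'import waLBerla'",
]
```

Setting the variable inline keeps the workaround confined to the sanity check, so the installed module itself does not change the UCX log level for users.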

@bedroge
Contributor

bedroge commented Dec 12, 2023

@boegelbot please test @ generoso

@boegelbot
Collaborator

@bedroge: Request for testing this PR well received on login1

PR test command 'EB_PR=19252 EB_ARGS= EB_CONTAINER= EB_REPO=easybuild-easyconfigs /opt/software/slurm/bin/sbatch --job-name test_PR_19252 --ntasks=4 ~/boegelbot/eb_from_pr_upload_generoso.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 12388

Test results coming soon (I hope)...

- notification for comment with ID 1853019804 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@boegelbot
Collaborator

Test report by @boegelbot
SUCCESS
Build succeeded for 2 out of 2 (2 easyconfigs in total)
cns2 - Linux Rocky Linux 8.5, x86_64, Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz (haswell), Python 3.6.8
See https://gist.github.com/boegelbot/36fe7e9d66d0d4082ff3692fd032b04f for a full test report.

@bedroge
Contributor

bedroge commented Dec 13, 2023

@boegelbot please test @ jsc-zen2

@boegelbot
Collaborator

@bedroge: Request for testing this PR well received on jsczen2l1.int.jsc-zen2.easybuild-test.cluster

PR test command 'EB_PR=19252 EB_ARGS= EB_REPO=easybuild-easyconfigs /opt/software/slurm/bin/sbatch --mem-per-cpu=4000M --job-name test_PR_19252 --ntasks=8 ~/boegelbot/eb_from_pr_upload_jsc-zen2.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 3889

Test results coming soon (I hope)...

- notification for comment with ID 1853482720 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@boegelbot
Collaborator

Test report by @boegelbot
SUCCESS
Build succeeded for 2 out of 2 (2 easyconfigs in total)
jsczen2c1.int.jsc-zen2.easybuild-test.cluster - Linux Rocky Linux 8.5, x86_64, AMD EPYC 7742 64-Core Processor (zen2), Python 3.6.8
See https://gist.github.com/boegelbot/21ae719e1e1b550007240b6162fe524b for a full test report.

@bedroge
Contributor

bedroge commented Dec 14, 2023

Test report by @bedroge
SUCCESS
Build succeeded for 2 out of 2 (2 easyconfigs in total)
aarch64-neoverse-v1-node1.int.aws-rocky88-202310.eessi.io - Linux Debian GNU/Linux 11 (bullseye), AArch64, ARM UNKNOWN, Python 3.11.4
See https://gist.github.com/bedroge/8991dcbb7cae60116d5f3edf685b3e08 for a full test report.

Neves-P and others added 2 commits December 15, 2023 14:08
It was added trying to solve build hanging on EasyBuild generoso and jsc-zen2 clusters

Co-authored-by: Bob Dröge
@bedroge
Contributor

bedroge commented Dec 15, 2023

@boegelbot please test @ generoso

@boegelbot
Collaborator

@bedroge: Request for testing this PR well received on login1

PR test command 'EB_PR=19252 EB_ARGS= EB_CONTAINER= EB_REPO=easybuild-easyconfigs /opt/software/slurm/bin/sbatch --job-name test_PR_19252 --ntasks=4 ~/boegelbot/eb_from_pr_upload_generoso.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 12418

Test results coming soon (I hope)...

- notification for comment with ID 1857862561 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@bedroge
Contributor

bedroge commented Dec 15, 2023

Since @verdurin is seeing similar issues with the python -c "import waLBerla" command in #19324 (comment) (same version of waLBerla, different toolchain), this issue may pop up on more systems anyway, so we should not merge this PR yet.

@boegelbot
Collaborator

Test report by @boegelbot
SUCCESS
Build succeeded for 2 out of 2 (2 easyconfigs in total)
cns1 - Linux Rocky Linux 8.5, x86_64, Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz (haswell), Python 3.6.8
See https://gist.github.com/boegelbot/dced1a7b324231c1720a435880c733f9 for a full test report.

@Neves-P
Contributor Author

Neves-P commented Feb 12, 2024

The waLBerla Python bindings would not be particularly useful, since they are mostly a convenience for post-processing. Without them, the installation would essentially amount to compiling the examples and benchmarks with a CMakeMakeCp easyconfig and copying the header files into the installation directory, so there would be little benefit to installing waLBerla as a module.

This PR was motivated by the inclusion of waLBerla in EESSI (EESSI/software-layer#424), which has also been closed for the same reason.

Because of this, I am closing this PR.

@Neves-P Neves-P closed this Feb 12, 2024
@Neves-P Neves-P reopened this Feb 16, 2024
@Neves-P
Contributor Author

Neves-P commented Feb 19, 2024

Reworked the easyconfig file to use CMakeMakeCp and applied @boegel's suggested fixes to sanity checks in #19324.
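
For readers unfamiliar with this approach, a CMakeMakeCp-based easyconfig builds with CMake and make, then copies selected files into the installation directory instead of running an install step. The fragment below is only a sketch of that shape: the entries in `files_to_copy` and `sanity_check_paths` are hypothetical placeholders, not the values used in this PR or the fixes suggested in #19324.

```python
# Easyconfig fragment (sketch); easyconfigs use Python syntax.
easyblock = 'CMakeMakeCp'

name = 'waLBerla'
version = '6.1'

toolchain = {'name': 'foss', 'version': '2023a'}

# Hypothetical selection: (list of sources, target subdirectory) pairs.
# Which build outputs and headers to copy is an assumption for illustration.
files_to_copy = [
    (['apps/benchmarks', 'apps/tutorials'], 'bin'),
    (['src'], 'include'),
]

# Hypothetical sanity check: only verify that the copied directories exist.
sanity_check_paths = {
    'files': [],
    'dirs': ['bin', 'include'],
}
```

Checking paths rather than running `python -c 'import waLBerla'` would sidestep the import hang discussed earlier in this thread, at the cost of not exercising the Python bindings at all.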

@Neves-P
Contributor Author

Neves-P commented Feb 19, 2024

Test report by @Neves-P
SUCCESS
Build succeeded for 2 out of 2 (2 easyconfigs in total)
WKS058930 - Linux Ubuntu 22.04.3 LTS (Jammy Jellyfish), x86_64, 12th Gen Intel(R) Core(TM) i7-1260P, Python 3.10.12
See https://gist.github.com/Neves-P/4f785073afa19667866f1ad8452ec81a for a full test report.

@Neves-P
Contributor Author

Neves-P commented Oct 9, 2024

Closing, as I am about to open a new, updated PR for this.

@Neves-P Neves-P closed this Oct 9, 2024