{mpi,toolchain}[intel-compilers/2024.0.0,system/system] impi v2021.11.0, iimpi v2023.11 #19314

Merged

Conversation

bartoldeman
Contributor

(created using eb --new-pr)

@bartoldeman
Contributor Author

bartoldeman commented Nov 24, 2023

@bartoldeman
Contributor Author

@boegelbot please test @ generoso
EB_ARGS="--include-easyblocks-from-pr 3039"
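To reproduce roughly the same test outside the bot, the PR and the easyblocks PR can be passed directly to `eb`. This is a minimal sketch, assuming `eb` is on the `PATH` and GitHub access is configured; the `DRY_RUN` switch and the variable names are illustrative only, and the bot's own wrapper script may pass further options that are not shown in this thread.

```shell
#!/bin/sh
# Sketch: build the easyconfigs from this PR with the easyblocks from PR 3039.
PR=19314
EASYBLOCKS_PR=3039

# --from-pr and --include-easyblocks-from-pr are standard EasyBuild options;
# --robot resolves and builds missing dependencies.
EB_CMD="eb --from-pr $PR --include-easyblocks-from-pr $EASYBLOCKS_PR --robot"

# Illustrative safety switch: only print the command unless DRY_RUN=0 is set.
if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$EB_CMD"
else
    $EB_CMD
fi
```

Run it once as-is to see the command, then again with `DRY_RUN=0` to start the build.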

@boegelbot
Collaborator

@bartoldeman: Request for testing this PR well received on login1

PR test command 'EB_PR=19314 EB_ARGS="--include-easyblocks-from-pr 3039" EB_CONTAINER= EB_REPO=easybuild-easyconfigs /opt/software/slurm/bin/sbatch --job-name test_PR_19314 --ntasks=4 ~/boegelbot/eb_from_pr_upload_generoso.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 12241

Test results coming soon (I hope)...

- notification for comment with ID 1825650626 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@boegelbot
Collaborator

Test report by @boegelbot
Using easyblocks from PR(s) easybuilders/easybuild-easyblocks#3039
FAILED
Build succeeded for 0 out of 2 (2 easyconfigs in total)
cns3 - Linux Rocky Linux 8.5, x86_64, Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz (haswell), Python 3.6.8
See https://gist.github.com/boegelbot/b68205b6e14d93cf068e93c37395fcc6 for a full test report.

@bartoldeman
Contributor Author

Generoso fails only at the testing stage, with this error:

ERROR EasyBuild crashed with an error (at easybuild/easybuild-framework/easybuild/base/exceptions.py:126 in __init__): Sanity check failed: sanity check command mpirun -n 4 /tmp/boegelbot/impi/2021.11.0/intel-compilers-2024.0.0/mpi_test exited with code 143 (output: Abort(1614479) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init: Unknown error class, error stack:
MPIR_Init_thread(189)............: 
MPID_Init(1561)..................: 
MPIDI_OFI_mpi_init_hook(1674)....: 
insert_addr_table_roots_only(472): OFI get address vector map failed
) (at easybuild/easybuild-framework/easybuild/framework/easyblock.py:3655 in _sanity_check_step)

This can probably be worked around using an environment variable. Will have a look.
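A workaround of this kind usually means setting a libfabric or Intel MPI variable before the failing `mpirun` call. The sketch below tries the two candidates that appear in this thread, `FI_PROVIDER=tcp` and `I_MPI_FABRICS=shm`. The `run_with_env` helper and the `DRY_RUN` switch are hypothetical, and the test binary path is simply copied from the error message, so treat this as an illustration rather than what the bot actually runs.

```shell
#!/bin/sh
# Hypothetical helper: run (or, by default, only print) a command with one
# extra VAR=value assignment in its environment.
run_with_env() {
    assignment="$1"; shift
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "env $assignment $*"
    else
        env "$assignment" "$@"
    fi
}

# Path taken from the error message above; adjust it for your own build.
MPI_TEST="${MPI_TEST:-/tmp/boegelbot/impi/2021.11.0/intel-compilers-2024.0.0/mpi_test}"

# Candidate 1: force the TCP libfabric provider.
# Candidate 2: restrict Intel MPI to shared memory (single node only).
for candidate in FI_PROVIDER=tcp I_MPI_FABRICS=shm; do
    run_with_env "$candidate" mpirun -n 4 "$MPI_TEST"
done
```

With `DRY_RUN=0` the commands are executed instead of printed, which shows quickly whether either setting gets past `PMPI_Init` on the affected node.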

@bartoldeman
Contributor Author

@boegelbot please test @ generoso
EB_ARGS="--include-easyblocks-from-pr 3039" FI_PROVIDER=tcp

@boegelbot
Collaborator

@bartoldeman: Request for testing this PR well received on login1

PR test command 'EB_PR=19314 EB_ARGS="--include-easyblocks-from-pr 3039" EB_CONTAINER= EB_REPO=easybuild-easyconfigs /opt/software/slurm/bin/sbatch --job-name test_PR_19314 --ntasks=4 ~/boegelbot/eb_from_pr_upload_generoso.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 12244

Test results coming soon (I hope)...

- notification for comment with ID 1825862659 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@boegelbot
Collaborator

Test report by @boegelbot
Using easyblocks from PR(s) easybuilders/easybuild-easyblocks#3039
FAILED
Build succeeded for 0 out of 2 (2 easyconfigs in total)
cns2 - Linux Rocky Linux 8.5, x86_64, Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz (haswell), Python 3.6.8
See https://gist.github.com/boegelbot/46418675393746fe5597984849a7f5db for a full test report.

@bartoldeman
Contributor Author

@boegelbot please test @ jsc-zen2
EB_ARGS="--include-easyblocks-from-pr 3039"

@boegelbot
Collaborator

@bartoldeman: Request for testing this PR well received on jsczen2l1.int.jsc-zen2.easybuild-test.cluster

PR test command 'EB_PR=19314 EB_ARGS="--include-easyblocks-from-pr 3039" EB_REPO=easybuild-easyconfigs /opt/software/slurm/bin/sbatch --mem-per-cpu=4000M --job-name test_PR_19314 --ntasks=8 ~/boegelbot/eb_from_pr_upload_jsc-zen2.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 3792

Test results coming soon (I hope)...

- notification for comment with ID 1825876685 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@boegelbot
Collaborator

Test report by @boegelbot
Using easyblocks from PR(s) easybuilders/easybuild-easyblocks#3039
SUCCESS
Build succeeded for 3 out of 3 (2 easyconfigs in total)
jsczen2c1.int.jsc-zen2.easybuild-test.cluster - Linux Rocky Linux 8.5, x86_64, AMD EPYC 7742 64-Core Processor (zen2), Python 3.6.8
See https://gist.github.com/boegelbot/49615d563758761433840d068bd4cb75 for a full test report.

@bartoldeman
Contributor Author

@boegelbot please test @ generoso
CORE_CNT=8 EB_ARGS="--include-easyblocks-from-pr 3039"

@boegelbot
Collaborator

@bartoldeman: Request for testing this PR well received on login1

PR test command 'EB_PR=19314 EB_ARGS="--include-easyblocks-from-pr 3039" EB_CONTAINER= EB_REPO=easybuild-easyconfigs /opt/software/slurm/bin/sbatch --job-name test_PR_19314 --ntasks="8" ~/boegelbot/eb_from_pr_upload_generoso.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 12246

Test results coming soon (I hope)...

- notification for comment with ID 1825892230 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@boegelbot
Collaborator

Test report by @boegelbot
Using easyblocks from PR(s) easybuilders/easybuild-easyblocks#3039
FAILED
Build succeeded for 0 out of 2 (2 easyconfigs in total)
cnx1 - Linux Rocky Linux 8.5, x86_64, Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz (haswell), Python 3.6.8
See https://gist.github.com/boegelbot/7992779066e7da14b7eca290cab500ee for a full test report.

@jfgrimm

This comment was marked as outdated.

@jfgrimm
Member

jfgrimm commented Nov 28, 2023

Test report by @jfgrimm
Using easyblocks from PR(s) easybuilders/easybuild-easyblocks#3037, easybuilders/easybuild-easyblocks#3039
SUCCESS
Build succeeded for 3 out of 3 (2 easyconfigs in total)
node041.viking2.yor.alces.network - Linux Rocky Linux 8.8, x86_64, AMD EPYC 7643 48-Core Processor, Python 3.6.8
See https://gist.github.com/jfgrimm/e80fd091c8cfd7314a1f221b91bf9744 for a full test report.

bartoldeman added a commit to bartoldeman/boegelbot that referenced this pull request Nov 28, 2023
see
easybuilders/easybuild-easyconfigs#19314 (comment)
this is similar to `PSM3_DEVICES='self,shm'` but affects Intel MPI at a higher level; the included libfabric there somehow seems to choose the `mlx` provider (not in upstream libfabric) which fails on Generoso.
@jfgrimm jfgrimm added this to the 4.x milestone Nov 29, 2023
@bartoldeman
Contributor Author

@boegelbot please test @ generoso
EB_ARGS="--include-easyblocks-from-pr 3039"

@boegelbot
Collaborator

@bartoldeman: Request for testing this PR well received on login1

PR test command 'EB_PR=19314 EB_ARGS="--include-easyblocks-from-pr 3039" EB_CONTAINER= EB_REPO=easybuild-easyconfigs /opt/software/slurm/bin/sbatch --job-name test_PR_19314 --ntasks=4 ~/boegelbot/eb_from_pr_upload_generoso.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 12329

Test results coming soon (I hope)...

- notification for comment with ID 1843242648 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@boegelbot
Collaborator

Test report by @boegelbot
Using easyblocks from PR(s) easybuilders/easybuild-easyblocks#3039
SUCCESS
Build succeeded for 2 out of 2 (2 easyconfigs in total)
cns1 - Linux Rocky Linux 8.5, x86_64, Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz (haswell), Python 3.6.8
See https://gist.github.com/boegelbot/5358906825ff901f55f9018a2d8aff8f for a full test report.

@bartoldeman
Contributor Author

bartoldeman commented Dec 6, 2023

ok, with I_MPI_FABRICS=shm on generoso @boegelbot seems to be a happy camper
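For anyone hitting the same `OFI get address vector map failed` error on a single-node test machine, the setting can be exported in the job script before EasyBuild runs. This is a hypothetical excerpt, not the actual boegelbot script; the `PSM3_DEVICES` value is the one quoted in the commit message above, and the placement of the exports is an assumption.

```shell
#!/bin/sh
# Hypothetical excerpt of a single-node test job script.

# Restrict the PSM3 libfabric provider to loopback and shared memory.
export PSM3_DEVICES='self,shm'

# Restrict Intel MPI itself to shared memory, so it does not pick a
# network provider such as mlx. Only suitable for single-node runs.
export I_MPI_FABRICS=shm

# Print the effective settings so they show up in the job log.
echo "PSM3_DEVICES=$PSM3_DEVICES I_MPI_FABRICS=$I_MPI_FABRICS"
```

Because `I_MPI_FABRICS=shm` disables inter-node communication, it fits a sanity check like `mpirun -n 4` on one node but should not be carried over to multi-node production jobs.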

Member

@jfgrimm jfgrimm left a comment

LGTM

@jfgrimm jfgrimm merged commit c25b577 into easybuilders:develop Dec 6, 2023
9 checks passed
@boegel boegel removed this from the 4.x milestone Dec 30, 2023
@boegel boegel added this to the next release (4.9.0) milestone Dec 30, 2023
4 participants