Open MPI with UCX breaks in user namespaces #4224

Open
adrianreber opened this issue Sep 25, 2019 · 48 comments

@adrianreber

Trying to run a UCX-based Open MPI with each process in a user namespace (container) seems to break UCX completely:

 mm_posix.c:445  UCX  ERROR Error returned from open in attach. Permission denied. File name is: /proc/24149/fd/16    
    mm_ep.c:75   UCX  ERROR failed to connect to remote peer with mm. remote mm_id: 103719165231238
  pml_ucx.c:383  Error: ucp_ep_create(proc=6) failed: Shared memory error
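
For context, here is a minimal sketch (not UCX code; the PID and fd number are just the ones from the error above) of what the posix shared-memory transport is effectively doing when it attaches to a peer segment: it opens the peer's file descriptor through procfs, and it is this open() that gets denied when the peer lives in a different user namespace.

/* sketch.c - illustrative only, mirrors the failing open() reported by mm_posix.c */
#include <fcntl.h>
#include <stdio.h>

int attach_peer_segment(int peer_pid, int peer_fd)
{
    char path[64];

    /* e.g. "/proc/24149/fd/16", as in the error message above */
    snprintf(path, sizeof(path), "/proc/%d/fd/%d", peer_pid, peer_fd);

    int fd = open(path, O_RDWR);
    if (fd < 0) {
        /* across user namespaces this fails with "Permission denied" */
        perror("open peer segment");
    }
    return fd;
}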

I fixed a similar thing recently in Open MPI vader: open-mpi/ompi#6844

Autodetect that each process is running in a different user namespace and do not use ptrace() based copy mechanisms.
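
As a rough illustration of that suggestion, the check could look something like the following (a hypothetical helper, not actual Open MPI/UCX code; it assumes both PIDs are visible in the same /proc): two processes share a user namespace exactly when /proc/<pid>/ns/user resolves to the same device/inode pair for both of them.

/* ns_check.c - hypothetical sketch of user-namespace autodetection */
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* returns 1 if both PIDs are in the same user namespace, 0 if not, -1 on error */
static int same_user_ns(pid_t a, pid_t b)
{
    char path_a[64], path_b[64];
    struct stat st_a, st_b;

    snprintf(path_a, sizeof(path_a), "/proc/%d/ns/user", (int)a);
    snprintf(path_b, sizeof(path_b), "/proc/%d/ns/user", (int)b);

    if (stat(path_a, &st_a) != 0 || stat(path_b, &st_b) != 0) {
        return -1; /* cannot tell: the caller should fall back to a safe transport */
    }

    /* the namespace is the same iff the (device, inode) pair matches */
    return st_a.st_dev == st_b.st_dev && st_a.st_ino == st_b.st_ino;
}

int main(int argc, char **argv)
{
    pid_t peer = (argc > 1) ? (pid_t)atoi(argv[1]) : getpid();

    printf("same user namespace: %d\n", same_user_ns(getpid(), peer));
    return 0;
}

When the result is 0 (or -1), a process-memory copy mechanism such as ptrace()/CMA would be skipped and a different transport used instead.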

This can be easily reproduced on Fedora 31 with:

[root@fedora01 ~]# rpm -q ucx openmpi
ucx-1.6.0-1.fc31.x86_64
openmpi-4.0.2-0.2.rc1.fc31.x86_64
[root@fedora01 ~]# mpirun --allow-run-as-root -np 4 unshare --map-root-user --user /home/mpi/ring
[fedora01:00765] mca_base_component_repository_open: unable to open mca_btl_uct: /usr/lib64/openmpi/lib/openmpi/mca_btl_uct.so: undefined symbol: uct_ep_create_connected (ignored)
[fedora01:00767] mca_base_component_repository_open: unable to open mca_btl_uct: /usr/lib64/openmpi/lib/openmpi/mca_btl_uct.so: undefined symbol: uct_ep_create_connected (ignored)
[fedora01:00764] mca_base_component_repository_open: unable to open mca_btl_uct: /usr/lib64/openmpi/lib/openmpi/mca_btl_uct.so: undefined symbol: uct_ep_create_connected (ignored)
[fedora01:00766] mca_base_component_repository_open: unable to open mca_btl_uct: /usr/lib64/openmpi/lib/openmpi/mca_btl_uct.so: undefined symbol: uct_ep_create_connected (ignored)
[1569392914.129581] [fedora01:766  :0]       mm_posix.c:445  UCX  ERROR Error returned from open in attach. Permission denied. File name is: /proc/767/fd/16
[1569392914.129594] [fedora01:766  :0]          mm_ep.c:75   UCX  ERROR failed to connect to remote peer with mm. remote mm_id: 3294239916166
[fedora01:00766] pml_ucx.c:383  Error: ucp_ep_create(proc=3) failed: Shared memory error
[1569392914.129813] [fedora01:764  :0]       mm_posix.c:445  UCX  ERROR Error returned from open in attach. Permission denied. File name is: /proc/765/fd/16
[1569392914.129829] [fedora01:764  :0]          mm_ep.c:75   UCX  ERROR failed to connect to remote peer with mm. remote mm_id: 3285649981574
[fedora01:00764] pml_ucx.c:383  Error: ucp_ep_create(proc=1) failed: Shared memory error
[1569392914.130027] [fedora01:767  :0]       mm_posix.c:445  UCX  ERROR Error returned from open in attach. Permission denied. File name is: /proc/764/fd/16
[1569392914.130070] [fedora01:767  :0]          mm_ep.c:75   UCX  ERROR failed to connect to remote peer with mm. remote mm_id: 3281355014278
[fedora01:00767] pml_ucx.c:383  Error: ucp_ep_create(proc=0) failed: Shared memory error
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  PML add procs failed
  --> Returned "Error" (-1) instead of "Success" (0)
--------------------------------------------------------------------------
[1569392914.130773] [fedora01:765  :0]       mm_posix.c:445  UCX  ERROR Error returned from open in attach. Permission denied. File name is: /proc/766/fd/16
[1569392914.130818] [fedora01:765  :0]          mm_ep.c:75   UCX  ERROR failed to connect to remote peer with mm. remote mm_id: 3289944948870
[fedora01:00765] pml_ucx.c:383  Error: ucp_ep_create(proc=2) failed: Shared memory error
[fedora01:00764] *** An error occurred in MPI_Init
[fedora01:00764] *** reported by process [336265217,0]
[fedora01:00764] *** on a NULL communicator
[fedora01:00764] *** Unknown error
[fedora01:00764] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[fedora01:00764] ***    and potentially your MPI job)
[fedora01:00759] 3 more processes have sent help message help-mpi-runtime.txt / mpi_init:startup:internal-failure
[fedora01:00759] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[fedora01:00759] 3 more processes have sent help message help-mpi-errors.txt / mpi_errors_are_fatal unknown handle
@yosefe
Contributor

yosefe commented Sep 25, 2019

@adrianreber this is a known limitation and something that will be fixed in the next UCX release.
In the meantime, can you try mpirun ... -x UCX_POSIX_USE_PROC_LINK=n ... as a workaround?

@adrianreber
Author

@yosefe Thanks, that works for me.

@yosefe yosefe reopened this Sep 25, 2019
@yosefe
Contributor

yosefe commented Sep 25, 2019

I'd like to keep this open so that it works out-of-the-box.

@yosefe
Contributor

yosefe commented Sep 25, 2019

Adding @hoopoepg; this will be handled as part of the Docker support feature.

@adrianreber
Author

I'd like to keep this open so that it works out-of-the-box.

Sure, it just sounded like it was already tracked somewhere.

Adding @hoopoepg; this will be handled as part of the Docker support feature.

I was actually using Open MPI with Podman when it failed. I am using the following command on Fedora 31:

[mpi@fedora01 ~]$ mpirun -x UCX_POSIX_USE_PROC_LINK=n --mca orte_tmpdir_base /tmp/podman-mpirun podman run --env-host -v /tmp/podman-mpirun:/tmp/podman-mpirun --userns=keep-id  --net=host --pid=host --ipc=host quay.io/adrianreber/mpi-test:31 /home/ring
Rank 1 has cleared MPI_Init
Rank 2 has cleared MPI_Init
Rank 3 has cleared MPI_Init
Rank 0 has cleared MPI_Init
Rank 1 has completed ring
Rank 0 has completed ring
Rank 3 has completed ring
Rank 1 has completed MPI_Barrier
Rank 2 has completed ring
Rank 3 has completed MPI_Barrier
Rank 0 has completed MPI_Barrier
Rank 2 has completed MPI_Barrier

@hoopoepg
Contributor

@yosefe as a short-term plan we could block CMA between endpoints in different namespaces (we would have to add the namespace ID to the system GUID generation).
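
For illustration, a hedged sketch of what folding the namespace ID into the system GUID could look like (hypothetical helper and constant, not the actual UCX implementation): if the inode of the peer's IPC/user namespace is mixed into the GUID, peers in different namespaces end up with different system GUIDs, so shared-memory/CMA transports are not selected between them.

/* guid_ns.c - hypothetical sketch only */
#include <stdint.h>
#include <sys/stat.h>

uint64_t mix_ns_into_guid(uint64_t base_guid, const char *ns_path)
{
    struct stat st;

    if (stat(ns_path, &st) != 0) {
        return base_guid;   /* namespace not readable: keep the old behavior */
    }

    /* simple multiplicative mix; a real implementation may hash differently */
    return base_guid ^ ((uint64_t)st.st_ino * 0x9e3779b97f4a7c15ULL);
}

/* usage (hypothetical): guid = mix_ns_into_guid(guid, "/proc/self/ns/ipc"); */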

@hoopoepg
Contributor

Hi @adrianreber,
Thank you for the bug report & the link to the OMPI fix.

Could you try this PR: #4225?
Unfortunately, right now we have no environment to test this functionality.

Thank you again.

@adrianreber
Author

Could you try this PR: #4225?
Unfortunately, right now we have no environment to test this functionality.

@hoopoepg Can you provide a patch against 1.6.1? Then I could patch the Fedora RPM and try it out.

@hoopoepg
Contributor

git-diff.txt
Here is the git diff patch.

@adrianreber
Author

@hoopoepg Thanks for the 1.6.1-based patch. It works.

I added the patch to the Fedora 31 RPM https://koji.fedoraproject.org/koji/taskinfo?taskID=37856715

I rebuilt my test container with it (quay.io/adrianreber/mpi-test:31) and now I can run podman with UCX based Open MPI without errors:

[mpi@host-08 ~]$ mpirun --hostfile hostfile  --mca orte_tmpdir_base /tmp/podman-mpirun podman run --env-host -v /tmp/podman-mpirun:/tmp/podman-mpirun --userns=keep-id  --net=host --pid=host --ipc=host quay.io/adrianreber/mpi-test:31 /home/ring
Rank 0 has cleared MPI_Init
Rank 1 has cleared MPI_Init
Rank 2 has cleared MPI_Init
Rank 3 has cleared MPI_Init
Rank 0 has completed ring
Rank 1 has completed ring
Rank 2 has completed ring
Rank 0 has completed MPI_Barrier
Rank 3 has completed ring
Rank 2 has completed MPI_Barrier
Rank 1 has completed MPI_Barrier
Rank 3 has completed MPI_Barrier

Thanks for the quick fix!

@yosefe I see your name in the Fedora UCX spec file. Would it be okay with you if I update ucx in Fedora Rawhide and Fedora 31 to include this patch? Currently it is only a scratch build; no changes have been made to Fedora's dist-git yet.

@yosefe
Contributor

yosefe commented Sep 25, 2019

@adrianreber this fix appears to block shared memory between containers completely; I'm not sure that's desired. Can we hold off on this patch for now?

@adrianreber
Author

@adrianreber this fix appears to block shared memory between containers completely; I'm not sure that's desired. Can we hold off on this patch for now?

Sure. In my setup I am sharing the IPC namespace between all containers, so shared memory should work. Running podman with --ipc=host mounts /dev/shm from the host.

@hoopoepg
Contributor

Hi @adrianreber,

We pushed a few changes into the UCX master branch for container support. For now, only the IPC namespace needs to be shared across containers to allow SHM devices to be used. If you have time, it would be great if you could try it in your environment.

Thank you

@shamisp
Contributor

shamisp commented Dec 13, 2019

@hoopoepg Thanks for the 1.6.1-based patch. It works.

I added the patch to the Fedora 31 RPM https://koji.fedoraproject.org/koji/taskinfo?taskID=37856715

I rebuilt my test container with it (quay.io/adrianreber/mpi-test:31) and now I can run podman with UCX based Open MPI without errors:

[mpi@host-08 ~]$ mpirun --hostfile hostfile  --mca orte_tmpdir_base /tmp/podman-mpirun podman run --env-host -v /tmp/podman-mpirun:/tmp/podman-mpirun --userns=keep-id  --net=host --pid=host --ipc=host quay.io/adrianreber/mpi-test:31 /home/ring
Rank 0 has cleared MPI_Init
Rank 1 has cleared MPI_Init
Rank 2 has cleared MPI_Init
Rank 3 has cleared MPI_Init
Rank 0 has completed ring
Rank 1 has completed ring
Rank 2 has completed ring
Rank 0 has completed MPI_Barrier
Rank 3 has completed ring
Rank 2 has completed MPI_Barrier
Rank 1 has completed MPI_Barrier
Rank 3 has completed MPI_Barrier

Thanks for the quick fix!

@yosefe I see your name in the Fedora UCX spec file. Would it be okay with you if I update ucx in Fedora Rawhide and Fedora 31 to include this patch? Currently it is only a scratch build; no changes have been made to Fedora's dist-git yet.

@hoopoepg @yosefe - please create a PR for the 1.6.x branch with the patch. Who knows, maybe at some point we will be asked to do a 1.6.2.

@FaDee1

FaDee1 commented Jan 5, 2020

@adrianreber This limitation is well known and it is something that will be fixed in the next UCX release.
In the meantime, can you try mpirun ... -x UCX_POSIX_USE_PROC_LINK=n ... as a workaround?

@FaDee1

FaDee1 commented Jan 5, 2020

Trying to run a UCX-based Open MPI with each process in a user namespace (container) seems to break UCX completely:

 mm_posix.c:445  UCX  ERROR Error returned from open in attach. Permission denied. File name is: /proc/24149/fd/16    
    mm_ep.c:75   UCX  ERROR failed to connect to remote peer with mm. remote mm_id: 103719165231238
  pml_ucx.c:383  Error: ucp_ep_create(proc=6) failed: Shared memory error

I fixed a similar thing recently in Open MPI vader: open-mpi/ompi#6844

Autodetect that each process is running in a different user namespace and do not use ptrace() based copy mechanisms.

This can be easily reproduced on Fedora 31 with:

[root@fedora01 ~]# rpm -q ucx openmpi
ucx-1.6.0-1.fc31.x86_64
openmpi-4.0.2-0.2.rc1.fc31.x86_64
[root@fedora01 ~]# mpirun --allow-run-as-root -np 4 unshare --map-root-user --user /home/mpi/ring
[fedora01:00765] mca_base_component_repository_open: unable to open mca_btl_uct: /usr/lib64/openmpi/lib/openmpi/mca_btl_uct.so: undefined symbol: uct_ep_create_connected (ignored)
[fedora01:00767] mca_base_component_repository_open: unable to open mca_btl_uct: /usr/lib64/openmpi/lib/openmpi/mca_btl_uct.so: undefined symbol: uct_ep_create_connected (ignored)
[fedora01:00764] mca_base_component_repository_open: unable to open mca_btl_uct: /usr/lib64/openmpi/lib/openmpi/mca_btl_uct.so: undefined symbol: uct_ep_create_connected (ignored)
[fedora01:00766] mca_base_component_repository_open: unable to open mca_btl_uct: /usr/lib64/openmpi/lib/openmpi/mca_btl_uct.so: undefined symbol: uct_ep_create_connected (ignored)
[1569392914.129581] [fedora01:766  :0]       mm_posix.c:445  UCX  ERROR Error returned from open in attach. Permission denied. File name is: /proc/767/fd/16
[1569392914.129594] [fedora01:766  :0]          mm_ep.c:75   UCX  ERROR failed to connect to remote peer with mm. remote mm_id: 3294239916166
[fedora01:00766] pml_ucx.c:383  Error: ucp_ep_create(proc=3) failed: Shared memory error
[1569392914.129813] [fedora01:764  :0]       mm_posix.c:445  UCX  ERROR Error returned from open in attach. Permission denied. File name is: /proc/765/fd/16
[1569392914.129829] [fedora01:764  :0]          mm_ep.c:75   UCX  ERROR failed to connect to remote peer with mm. remote mm_id: 3285649981574
[fedora01:00764] pml_ucx.c:383  Error: ucp_ep_create(proc=1) failed: Shared memory error
[1569392914.130027] [fedora01:767  :0]       mm_posix.c:445  UCX  ERROR Error returned from open in attach. Permission denied. File name is: /proc/764/fd/16
[1569392914.130070] [fedora01:767  :0]          mm_ep.c:75   UCX  ERROR failed to connect to remote peer with mm. remote mm_id: 3281355014278
[fedora01:00767] pml_ucx.c:383  Error: ucp_ep_create(proc=0) failed: Shared memory error
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  PML add procs failed
  --> Returned "Error" (-1) instead of "Success" (0)
--------------------------------------------------------------------------
[1569392914.130773] [fedora01:765  :0]       mm_posix.c:445  UCX  ERROR Error returned from open in attach. Permission denied. File name is: /proc/766/fd/16
[1569392914.130818] [fedora01:765  :0]          mm_ep.c:75   UCX  ERROR failed to connect to remote peer with mm. remote mm_id: 3289944948870
[fedora01:00765] pml_ucx.c:383  Error: ucp_ep_create(proc=2) failed: Shared memory error
[fedora01:00764] *** An error occurred in MPI_Init
[fedora01:00764] *** reported by process [336265217,0]
[fedora01:00764] *** on a NULL communicator
[fedora01:00764] *** Unknown error
[fedora01:00764] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[fedora01:00764] ***    and potentially your MPI job)
[fedora01:00759] 3 more processes have sent help message help-mpi-runtime.txt / mpi_init:startup:internal-failure
[fedora01:00759] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[fedora01:00759] 3 more processes have sent help message help-mpi-errors.txt / mpi_errors_are_fatal unknown handle

@adrianreber
Author

We pushed a few changes into the UCX master branch for container support. For now, only the IPC namespace needs to be shared across containers to allow SHM devices to be used. If you have time, it would be great if you could try it in your environment.

Last time I tried to test the master branch it required a lot of rebuilds, as I was just adding patches to the distribution packages. I have not created an environment where I can install all the necessary libraries and packages based on the latest version of UCX. If there are patches against 1.6.x (without SO name changes) it would be easier for me to test.

@hoopoepg
Contributor

hoopoepg commented Jan 9, 2020

Hi,
Unfortunately this fix is based on another set of fixes which are hard to backport into the 1.6 branch.

@vanzod

vanzod commented Feb 17, 2022

A very similar issue also happens outside a containerized environment.
Moreover, it seems to be transient since not all MPI runs end in an error as shown below for two consecutive MPI launches on the same system.

  • OpenMPI 4.1.1
  • UCX 1.11.2
  • OSU Micro Benchmarks 5.7.1
[admin@ndv2-1 ~]$ mpirun -np 40 osu_scatter

# OSU MPI Scatter Latency Test v5.7.1
# Size       Avg Latency(us)
1                       1.69
2                       1.62
4                       1.64
8                       1.82
16                      2.15
32                      2.31
64                      2.68
128                     3.22
256                     4.00
512                    11.77
1024                   15.50
2048                   19.41
4096                   27.65
8192                   36.38
16384                 175.19
32768                 208.84
65536                 253.05
131072                600.50
262144               1862.54
524288               4966.82
1048576             10852.59

[admin@ndv2-1 ~]$ mpirun -np 40 osu_scatter
[ndv2-1:25388] [[64574,1],8] selected pml cm, but peer [[64574,1],0] on ndv2-1 selected pml ucx
[ndv2-1:25382] [[64574,1],3] selected pml cm, but peer [[64574,1],0] on ndv2-1 selected pml ucx
--------------------------------------------------------------------------
MPI_INIT has failed because at least one MPI process is unreachable
from another.  This *usually* means that an underlying communication
plugin -- such as a BTL or an MTL -- has either not loaded or not
allowed itself to be used.  Your MPI job will now abort.

You may wish to try to narrow down the problem;

 * Check the output of ompi_info to see which BTL/MTL plugins are
   available.
 * Run your application with MPI_THREAD_SINGLE.
 * Set the MCA parameter btl_base_verbose to 100 (or mtl_base_verbose,
   if using MTL-based communications) to see exactly which
   communication plugins were considered and/or discarded.
--------------------------------------------------------------------------
[1645056146.646175] [ndv2-1:25386:0]        mm_posix.c:206  UCX  ERROR   open(file_name=/proc/25388/fd/29 flags=0x0) failed: No such file or directory
[1645056146.646224] [ndv2-1:25386:0]           mm_ep.c:158  UCX  ERROR   mm ep failed to connect to remote FIFO id 0xc00000074000632c: Shared memory error
[ndv2-1:25386] pml_ucx.c:419  Error: ucp_ep_create(proc=8) failed: Shared memory error
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  PML add procs failed
  --> Returned "Error" (-1) instead of "Success" (0)
--------------------------------------------------------------------------
[ndv2-1:25388] *** An error occurred in MPI_Init
[ndv2-1:25388] *** reported by process [4231921665,8]
[ndv2-1:25388] *** on a NULL communicator
[ndv2-1:25388] *** Unknown error
[ndv2-1:25388] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[ndv2-1:25388] ***    and potentially your MPI job)
[1645056146.666594] [ndv2-1:25384:0]        mm_posix.c:206  UCX  ERROR   open(file_name=/proc/25386/fd/29 flags=0x0) failed: Permission denied
[1645056146.666630] [ndv2-1:25384:0]           mm_ep.c:158  UCX  ERROR   mm ep failed to connect to remote FIFO id 0xc00000074000632a: Shared memory error
[1645056146.654733] [ndv2-1:25387:0]        mm_posix.c:206  UCX  ERROR   open(file_name=/proc/25386/fd/29 flags=0x0) failed: Permission denied
[1645056146.654783] [ndv2-1:25387:0]           mm_ep.c:158  UCX  ERROR   mm ep failed to connect to remote FIFO id 0xc00000074000632a: Shared memory error
[1645056146.658680] [ndv2-1:25385:0]        mm_posix.c:206  UCX  ERROR   open(file_name=/proc/25386/fd/29 flags=0x0) failed: Permission denied
[1645056146.658770] [ndv2-1:25385:0]           mm_ep.c:158  UCX  ERROR   mm ep failed to connect to remote FIFO id 0xc00000074000632a: Shared memory error
[1645056146.773674] [ndv2-1:25391:0]        mm_posix.c:206  UCX  ERROR   open(file_name=/proc/25382/fd/29 flags=0x0) failed: No such file or directory
[1645056146.773704] [ndv2-1:25391:0]           mm_ep.c:158  UCX  ERROR   mm ep failed to connect to remote FIFO id 0xc000000740006326: Shared memory error
[1645056146.768402] [ndv2-1:25389:0]        mm_posix.c:206  UCX  ERROR   open(file_name=/proc/25382/fd/29 flags=0x0) failed: No such file or directory
[1645056146.768436] [ndv2-1:25389:0]           mm_ep.c:158  UCX  ERROR   mm ep failed to connect to remote FIFO id 0xc000000740006326: Shared memory error
[...]
[ndv2-1:25384] pml_ucx.c:419  Error: ucp_ep_create(proc=7) failed: Shared memory error
[ndv2-1:25387] pml_ucx.c:419  Error: ucp_ep_create(proc=7) failed: Shared memory error
[ndv2-1:25385] pml_ucx.c:419  Error: ucp_ep_create(proc=7) failed: Shared memory error
[ndv2-1:25391] pml_ucx.c:419  Error: ucp_ep_create(proc=3) failed: Shared memory error
[ndv2-1:25389] pml_ucx.c:419  Error: ucp_ep_create(proc=3) failed: Shared memory error
[ndv2-1:25390] pml_ucx.c:419  Error: ucp_ep_create(proc=3) failed: Shared memory error
[...]
[ndv2-1:25372] 1 more process has sent help message help-mpi-runtime.txt / mpi_init:startup:pml-add-procs-fail
[ndv2-1:25372] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[ndv2-1:25372] 37 more processes have sent help message help-mpi-runtime.txt / mpi_init:startup:internal-failure
[ndv2-1:25372] 2 more processes have sent help message help-mpi-errors.txt / mpi_errors_are_fatal unknown handle
[...]

@hoopoepg
Contributor

Hi,
Does setting UCX_POSIX_USE_PROC_LINK=n help?

@brminich
Contributor

@vanzod, can you please check whether setting the UCX_POSIX_USE_PROC_LINK=n environment variable helps?

@vanzod

vanzod commented Mar 16, 2022

@hoopoepg @brminich Unfortunately, even with that environment variable the error still occurs sometimes.
One thing that I noticed is that this issue presents itself only on AMD EPYC Milan processors (7V12, 7V13).
I have a working test environment, so I'm happy to run more tests if needed.

@hoopoepg
Contributor

Hi @vanzod,
Sorry for the late response - we are in the middle of a release process.

Is it possible to build UCX with debug info (add --enable-debug to the configure arguments) and run the failing test with the debug log level (env UCX_LOG_LEVEL=debug)?

Thank you

@vanzod

vanzod commented Apr 20, 2022

@hoopoepg No problem. Here are the debug logs you requested.
Note that UCX_POSIX_USE_PROC_LINK=n is defined in the environment.

Successful osu_scatter run:
https://gist.github.com/vanzod/3a8d04f14614d8a0914b0bbfb1ecafca

Failed osu_scatter run:
https://gist.github.com/vanzod/4e4d51d76c0acc1081256174a482bd4a

@hoopoepg
Contributor

Hmmm, as far as I can see the POSIX SHM infrastructure is inaccessible from the process for some reason.
Let's try to force the SysV shm transport: can you add the variable UCX_TLS=sysv,cma,ib to the test run?

Thank you

@vanzod

vanzod commented Apr 22, 2022

@hoopoepg Here is the log you asked for:

https://gist.github.com/vanzod/cddbda25b9674a38de5b6e886db255da

@hoopoepg
Contributor

Does it work as expected?

@hoopoepg
Contributor

I don't see any critical errors there.

@vanzod

vanzod commented Apr 22, 2022

No, now it fails consistently.
For some reason the previous gist does not provide the full file view. Please find the full log at:

https://gist.github.com/vanzod/ce6cfc5b823bfe5d71f4e1c8097a1e43

@hoopoepg
Contributor

I see from the logs that the endpoint is created and UCX is able to allocate shared memory; I still don't see any issues. Is the log file incomplete?
Could you zip the log file and send it to [email protected]?
Thank you

@hoopoepg
Contributor

Thank you for the logs.
As far as I can see, UCX was able to start up, but the process exits with an error.

Could you run the ucx_perftest application (installed with the UCX package) to check whether UCX is able to run on your system? Run these commands on the compute nodes:

UCX_TLS=cma,sysv ~/local/ucx/bin/ucx_perftest -t tag_lat &
UCX_TLS=cma,sysv ~/local/ucx/bin/ucx_perftest -t tag_lat localhost

and in case it fails, set UCX_LOG_LEVEL=debug and send the logs to me.

Thank you

@vanzod

vanzod commented Apr 22, 2022

@hoopoepg ucx_perftest completed successfully. Here is the output:

$ UCX_TLS=cma,sysv ucx_perftest -t tag_lat & UCX_TLS=cma,sysv ucx_perftest -t tag_lat localhost
[1] 63262
[1650649643.029021] [ndv4:63262:0]        perftest.c:1580 UCX  WARN  CPU affinity is not set (bound to 96 cpus). Performance may be impacted.
[1650649643.029023] [ndv4:63263:0]        perftest.c:1580 UCX  WARN  CPU affinity is not set (bound to 96 cpus). Performance may be impacted.
Waiting for connection...
+------------------------------------------------------------------------------------------+
+--------------+--------------+-----------------------------+---------------------+-----------------------+
| API:          protocol layer                                                             |
|              |              |       latency (usec)        |   bandwidth (MB/s)  |  message rate (msg/s) |
| Test:         tag match latency                                                          |
+--------------+--------------+---------+---------+---------+----------+----------+-----------+-----------+
| Data layout:  (automatic)                                                                |
|    Stage     | # iterations | typical | average | overall |  average |  overall |  average  |  overall  |
| Send memory:  host                                                                       |
+--------------+--------------+---------+---------+---------+----------+----------+-----------+-----------+
| Recv memory:  host                                                                       |
| Message size: 8                                                                          |
+------------------------------------------------------------------------------------------+
Final:               1000000     0.000     0.099     0.099       76.75      76.75    10059549    10059549

@hoopoepg
Contributor

It looks like OMPI is trying to use POSIX SHM and fails to initialize it.

@hoopoepg
Contributor

@vanzod could you run the OMPI application with the parameter --mca opal_common_ucx_verbose 9 to enable UCX PML debug output? Maybe it will help us find the source of the issue.

Thank you

@jamesongithub

@hoopoepg @vanzod I created #8511 specifically for the shared memory "failed to connect to remote FIFO" error.

@kcgthb

kcgthb commented Oct 14, 2022

Just wanted to add another use case that produces those errors.

Using OpenMPI+UCX 1.10 in Singularity/Apptainer containers in non-setuid mode (the new default) produces the same kind of error:

[1665612809.408366] [sh03-01n71:20358:0]       mm_posix.c:194  UCX  ERROR open(file_name=/proc/20353/fd/18 flags=0x0) failed: No such file or directory
[1665612809.408388] [sh03-01n71:20358:0]          mm_ep.c:154  UCX  ERROR mm ep failed to connect to remote FIFO id 0xc000000480004f81: Shared memory error
[sh03-01n71.int:20358] pml_ucx.c:419  Error: ucp_ep_create(proc=0) failed: Shared memory error
[1665612809.408436] [sh03-01n71:20353:0]       mm_posix.c:194  UCX  ERROR open(file_name=/proc/20358/fd/18 flags=0x0) failed: No such file or directory
[1665612809.408460] [sh03-01n71:20353:0]          mm_ep.c:154  UCX  ERROR mm ep failed to connect to remote FIFO id 0xc000000480004f86: Shared memory error
[sh03-01n71.int:20353] pml_ucx.c:419  Error: ucp_ep_create(proc=1) failed: Shared memory error

Using UCX_POSIX_USE_PROC_LINK=n does solve the problem and allows the MPI program to work properly in the container in non-setuid mode.

The issue is being discussed in apptainer/apptainer#769, but if anyone here could shed some light on the problem, that would be much appreciated.

Thanks!

@panda1100
Contributor

panda1100 commented Jul 6, 2023

@hoopoepg UCX_TLS=sysv,cma,ib works in our environment.
UCX_POSIX_USE_PROC_LINK=n also works. (UCX_POSIX_USE_PROC_LINK=n + UCX_TLS=posix,cma,ib works too.)
Now I have a working test environment (OS: Rocky Linux 8).

I found that #4511 has already been merged to master.

But I'm still facing this issue; I tested against OMPI 4.1.5 + UCX v1.10.1 (both workarounds work, though).

@rodrigo-ceccato

Is there any workaround for MPICH? I am facing the same error with Apptainer version 1.1.9-1.el8 and MPICH 4.1 + UCX 1.14.

@panda1100
Contributor

panda1100 commented Sep 14, 2023

@rodrigo-ceccato This is a temporary workaround (a permanent solution will be released in v1.3.0), but the Apptainer instance workaround should work for MPICH as well. Please jump to "The Apptainer instance workaround for intra-node communication issue with MPI applications and Apptainer without setuid" in the following article. I explained a bit about why it works in the article.
https://ciq.com/blog/workaround-for-communication-issue-with-mpi-apps-apptainer-without-setuid/

If it doesn't work because of an ssh restriction, please see the following discussion (this is not really a clean solution but at least it works; please consider it a "temporary" workaround).
#8958

@DavidCdeB

@adrianreber this is a known limitation and something that will be fixed in the next UCX release. In the meantime, can you try mpirun ... -x UCX_POSIX_USE_PROC_LINK=n ... as a workaround?

@yosefe Thanks for this suggestion. I've tried:

mpirun -n 600 -ppn 8 -x UCX_POSIX_USE_PROC_LINK=n  executable.x ${input}.inp > ${input}.out

But I get this error:

[mpiexec@g-08-c0549] match_arg (../../../../../src/pm/i_hydra/libhydra/arg/hydra_arg.c:91): unrecognized argument x           
[mpiexec@g-08-c0549] Similar arguments:
[mpiexec@g-08-c0549]     demux
[mpiexec@g-08-c0549]     s   
[mpiexec@g-08-c0549]     n   
[mpiexec@g-08-c0549]     enable-x
[mpiexec@g-08-c0549]     f   
[mpiexec@g-08-c0549] HYD_arg_parse_array (../../../../../src/pm/i_hydra/libhydra/arg/hydra_arg.c:128): argument matching returned error
[mpiexec@g-08-c0549] mpiexec_get_parameters (../../../../../src/pm/i_hydra/mpiexec/mpiexec_params.c:1359): error parsing input array
[mpiexec@g-08-c0549] main (../../../../../src/pm/i_hydra/mpiexec/mpiexec.c:1787): error parsing parameters

Shall I use this differently?
Many thanks

@panda1100
Contributor

@DavidCdeB How about mpirun -env UCX_POSIX_USE_PROC_LINK=n instead?

@DavidCdeB

@DavidCdeB How about mpirun -env UCX_POSIX_USE_PROC_LINK=n instead?

@panda1100 Thanks. I added that, but I'm still receiving:

[1697387421.097742] [g-02-c0107:2826 :0]         select.c:438  UCX  ERROR no active messages transport to <no debug data>: posix/memory - Destination is unreachable, sysv/memory - Destination is unreachable, self/memory - Destination is unreachable, sockcm/sockaddr - no am bcopy, ud/mlx5_0:1 - Destination is unreachable, ud_mlx5/mlx5_0:1 - Destination is unreachable, rdmac
[1697387421.097715] [g-02-c0107:2827 :0]         select.c:438  UCX  ERROR no active messages transport to <no debug data>: posix/memory - Destination is unreachable, sysv/memory - Destination is unreachable, self/memory - Destination is unreachable, sockcm/sockaddr - no am bcopy, ud/mlx5_0:1 - Destination is unreachable, ud_mlx5/mlx5_0:1 - Destination is unreachable, rdmac
[1697387421.097757] [g-02-c0107:2828 :0]         select.c:438  UCX  ERROR no active messages transport to <no debug data>: posix/memory - Destination is unreachable, sysv/memory - Destination is unreachable, self/memory - Destination is unreachable, sockcm/sockaddr - no am bcopy, ud/mlx5_0:1 - Destination is unreachable, ud_mlx5/mlx5_0:1 - Destination is unreachable, rdmac
[1697387421.097778] [g-02-c0107:2825 :0]         select.c:438  UCX  ERROR no active messages transport to <no debug data>: posix/memory - Destination is unreachable, sysv/memory - Destination is unreachable, self/memory - Destination is unreachable, sockcm/sockaddr - no am bcopy, ud/mlx5_0:1 - Destination is unreachable, ud_mlx5/mlx5_0:1 - Destination is unreachable, rdmac

@panda1100
Contributor

panda1100 commented Oct 15, 2023

@DavidCdeB What container solution do you use? Apptainer, Podman, etc.

@DavidCdeB

@DavidCdeB What container solution do you use? Apptainer, Podman, etc.

@panda1100 I'm sorry, can you please clarify which command I should execute to obtain this information? Many thanks again.

@panda1100
Contributor

@DavidCdeB How did you build your executable?

@DavidCdeB

@DavidCdeB How did you build your executable?

Thanks, could you please specify exactly which information is required? The output of ldd or similar commands on the executable file? Thanks.

@panda1100
Contributor

Hi @hoopoepg-san,
We are finally implementing a solution on our side: apptainer/apptainer#1760
We are planning to merge this into the Apptainer v1.3.0 release (probably the next release).
