Open MPI with UCX breaks in user namespaces #4224
Comments
@adrianreber this is a known limitation and it will be fixed in the next UCX release.
@yosefe Thanks, that works for me.
I'd like to keep this open to make it work out of the box.
Adding @hoopoepg so this is handled as part of the docker support feature.
Sure, it just sounded like it was already tracked somewhere.
I was actually using Open MPI with Podman when it failed. I am using the following command on Fedora 31:
@yosefe as a short-term plan we could block CMA between endpoints in different namespaces (we would have to add the namespace ID to the system GUID generation).
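For illustration, the user namespace a process lives in can be read from /proc; comparing these IDs is one way (a sketch, under the assumption that a plain inode comparison is sufficient) to tell whether CMA/ptrace-style copies between two processes are safe. The peer PID below is a placeholder.

```sh
# Each process exposes its user namespace as a symlink under /proc/<pid>/ns/user.
# If the inode numbers differ, the two processes are in different user namespaces
# (e.g. separate podman containers) and cross-memory copies should be avoided.
readlink /proc/$$/ns/user            # e.g. user:[4026531837]
readlink /proc/<other-pid>/ns/user   # <other-pid> is a hypothetical peer PID
```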
Hi @adrianreber, could you try this PR: #4225? Thank you again.
git-diff.txt
@hoopoepg Thanks for the 1.6.1-based patch. It works. I added the patch to the Fedora 31 RPM (https://koji.fedoraproject.org/koji/taskinfo?taskID=37856715), rebuilt my test container with it (quay.io/adrianreber/mpi-test:31), and now I can run podman with UCX-based Open MPI without errors:

[mpi@host-08 ~]$ mpirun --hostfile hostfile --mca orte_tmpdir_base /tmp/podman-mpirun podman run --env-host -v /tmp/podman-mpirun:/tmp/podman-mpirun --userns=keep-id --net=host --pid=host --ipc=host quay.io/adrianreber/mpi-test:31 /home/ring
Rank 0 has cleared MPI_Init
Rank 1 has cleared MPI_Init
Rank 2 has cleared MPI_Init
Rank 3 has cleared MPI_Init
Rank 0 has completed ring
Rank 1 has completed ring
Rank 2 has completed ring
Rank 0 has completed MPI_Barrier
Rank 3 has completed ring
Rank 2 has completed MPI_Barrier
Rank 1 has completed MPI_Barrier
Rank 3 has completed MPI_Barrier

Thanks for the quick fix! @yosefe I see your name in the Fedora UCX spec file. Would it be okay with you if I update ucx on Fedora rawhide and Fedora 31 to include this patch? Currently it is only a scratch build; no changes have been made to Fedora's dist-git yet.
@adrianreber this fix appears to block shared memory between containers completely; I'm not sure that's desired. Can we wait with this patch for now?
Sure. In my setup I am sharing the IPC namespace between all containers, so shared memory should work. Running podman with --ipc=host mounts /dev/shm from the host.
Hi @adrianreber, we pushed a few changes into the UCX master branch for container support. For now, only the IPC namespace has to be shared across containers to allow SHM devices to be used. If you have time, it would be great if you could try it in your environment. Thank you.
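As an illustration of that requirement (these are standard podman options, not commands taken from this thread; the container name is hypothetical), the IPC namespace can be shared either with the host or between containers:

```sh
# Share the host IPC namespace with the container (the approach used later in this thread):
podman run --ipc=host quay.io/adrianreber/mpi-test:31 /home/ring
# Or join another container's IPC namespace instead ("rank0" is a hypothetical container name):
podman run --ipc=container:rank0 quay.io/adrianreber/mpi-test:31 /home/ring
```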
@hoopoepg @yosefe - please create a PR for the 1.6.x branch with the patch. Who knows, maybe at some point we will be asked to do 1.6.2.
Last time I tried to test the master branch, it required a lot of rebuilds because I was just adding patches to the distribution packages. I have not created an environment where I can install all the necessary libraries and packages based on the latest version of UCX. If there were patches against 1.6.x (without SO name changes), it would be easier for me to test.
Hi, a very similar issue also happens outside a containerized environment.
Hi @vanzod, can you please check whether setting the UCX_POSIX_USE_PROC_LINK=n environment variable helps?
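For anyone else hitting this, a minimal way to try that workaround with Open MPI (assuming mpirun's standard -x option for exporting environment variables; the rank count and binary are placeholders) would be:

```sh
# Export the UCX workaround variable to all ranks; osu_scatter stands in for any MPI binary.
mpirun -np 4 -x UCX_POSIX_USE_PROC_LINK=n ./osu_scatter
```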
Hi @vanzod, is it possible to build UCX with debug info? Thank you.
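A minimal sketch of such a build, assuming the standard autotools flow and UCX's --enable-debug configure option (the exact flag suggested in the comment above did not survive extraction); the install prefix is an example:

```sh
# Build UCX with debug information into a local prefix (example path).
./autogen.sh
./configure --prefix=$HOME/ucx-debug --enable-debug
make -j"$(nproc)"
make install
```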
@hoopoepg No problem. Here are the debug logs you requested.
Successful osu_scatter run:
Failed osu_scatter run:
Hmmm, as far as I can see, the POSIX SHM infrastructure is inaccessible from the process for some reason. Thank you.
@hoopoepg Here is the log you asked for: https://gist.github.com/vanzod/cddbda25b9674a38de5b6e886db255da
Does it work as expected?
I don't see any critical errors there.
No, now it fails consistently. https://gist.github.com/vanzod/ce6cfc5b823bfe5d71f4e1c8097a1e43
I see from the logs that the endpoint is created and UCX is able to allocate shared memory; I still don't see any issues. Is the log file incomplete?
Thank you for the logs. Could you run the ucx_perftest application (installed with the UCX package) to check whether UCX is able to run on your system? Run these commands on the compute nodes:
If it fails, set UCX_LOG_LEVEL=debug and send the logs to me. Thank you.
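For reference, a typical ucx_perftest invocation looks like the sketch below; these are illustrative commands, not necessarily the exact ones omitted above, and the server hostname is a placeholder.

```sh
# On the first compute node, start the server side (no destination argument):
ucx_perftest -t tag_lat
# On the second compute node, connect to the first and run a tag-matching latency test:
ucx_perftest <server-hostname> -t tag_lat
# If it fails, rerun with debug logging enabled:
UCX_LOG_LEVEL=debug ucx_perftest <server-hostname> -t tag_lat
```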
@hoopoepg ucx_perftest completed successfully. Here is the output:
Looks like OMPI is trying to use POSIX SHM and fails to initialize it.
@vanzod, could you run the OMPI application with that parameter? Thank you.
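The exact MCA parameter requested above did not survive in this thread; as an illustration only, assuming Open MPI's usual per-framework verbosity parameters, one way to get more detail out of the shared-memory layers is:

```sh
# Hypothetical debugging run: raise verbosity of the shmem and btl frameworks
# (parameter names assume Open MPI's <framework>_base_verbose convention).
mpirun -np 2 --mca shmem_base_verbose 100 --mca btl_base_verbose 100 ./osu_scatter
```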
Just wanted to add another use case that produces those errors. Using OpenMPI+UCX 1.10 in Singularity/Apptainer containers in non-setuid mode (the new default) produces the same kind of error:
The issue is being discussed in apptainer/apptainer#769, but if anyone here could shed some light on the problem, that would be much appreciated. Thanks!
@hoopoepg I found #4511 already merged to master, but I'm still facing this issue. I tested against OMPI 4.1.5 + UCX v1.10.1 (both workarounds work, though).
Is there any workaround for MPICH? Facing the same error with apptainer version 1.1.9-1.el8 and mpich 4.1 + UCX 1.14.
@rodrigo-ceccato This is a temporary workaround (a permanent solution will be released as v1.3.0), but if it doesn't work because of ...
@yosefe Thanks for this suggestion. I've tried:
But I get this error:
Shall I use this differently?
@DavidCdeB
@panda1100 Thanks. I added that, but I'm still receiving:
@DavidCdeB What container solution do you use? Apptainer, Podman, etc.
@panda1100 I'm sorry, can you please clarify which command I should execute to obtain this information? Many thanks again.
@DavidCdeB How did you build your executable?
Thanks, could you please specify more precisely which information is required?
Hi @hoopoepg -san, |
Trying to run a UCX-based Open MPI with each process in a user namespace (container) breaks UCX completely, it seems:
I fixed a similar thing recently in Open MPI vader: open-mpi/ompi#6844
Autodetect that each process is running in a different user namespace and do not use ptrace()-based copy mechanisms. This can be easily reproduced on Fedora 31 with:
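For context, a reproduction along the lines of the podman command shown earlier in this thread (the image name, hostfile, and paths are taken from that comment and are not guaranteed to match the original reproducer exactly):

```sh
mpirun --hostfile hostfile --mca orte_tmpdir_base /tmp/podman-mpirun \
    podman run --env-host -v /tmp/podman-mpirun:/tmp/podman-mpirun \
    --userns=keep-id --net=host --pid=host --ipc=host \
    quay.io/adrianreber/mpi-test:31 /home/ring
```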