Libfabric with ucx breaks with user namespaces #5571

Open
jappa opened this issue Aug 13, 2020 · 3 comments

jappa commented Aug 13, 2020

The libfabric mlx provider, used together with UCX 1.8.1, fails with the following error when the communicating processes are in different user namespaces (using Singularity). A similar issue was addressed in #4511 for OpenMPI, and I wonder whether the libfabric interface calls a different function that needs the same fix to disable CMA when user namespaces differ.

Setting either UCX_POSIX_USE_PROC_LINK=n or UCX_TLS=tcp,self works around the problem.
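
The two settings are alternatives, so only one of them needs to be present in the environment that launches the MPI ranks. A minimal sketch follows. The commented launch line (launcher, image name, binary) is a hypothetical placeholder, and the comment on the first variable is an inference from the `open(/proc/<pid>/fd/<fd>)` failure in the log below, not a statement of UCX internals.

```shell
# Workaround 1: keep shared memory, but (as the log suggests) stop the POSIX
# transport from opening the peer's /proc/<pid>/fd/<fd> link.
export UCX_POSIX_USE_PROC_LINK=n

# Workaround 2 (alternative): restrict UCX to the tcp and self transports,
# which avoids the shared-memory transports entirely.
# export UCX_TLS=tcp,self

# Hypothetical launch line; substitute your own launcher, image and binary.
# mpirun -n 2 singularity exec image.sif ./app
```

The first option is the narrower one, since it leaves the other transports selectable, while the second trades intra-node shared-memory performance for TCP.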

OS: RHEL 8.1

[0] MPI startup(): libfabric version: 1.10.0a1-impi
[0] MPI startup(): libfabric provider: mlx
[0] MPI startup(): detected mlx provider, set device name to "mlx"
[0] MPI startup(): max_ch4_vcis: 1, max_reg_eps 1, enable_sep 0, enable_shared_ctxs 0, do_av_insert 1
[0] MPI startup(): addrname_len: 512, addrname_firstlen: 512
[1597347557.520494] [r2i7n3:64774:0] mm_posix.c:195 UCX ERROR open(file_name=/proc/64773/fd/21 flags=0x0) failed: Permission denied
[1597347557.520503] [r2i7n3:64774:0] mm_ep.c:149 UCX ERROR mm ep failed to connect to remote FIFO id 0xc00000054000fd05: Shared memory error

ucx_info -v
# UCT version=1.8.1 revision 6b29558
# configured with: --disable-logging --disable-debug --disable-assertions --disable-params-check --prefix=/lustre/sw/ucx/1.8.1/gcc_820 --enable-shared --enable-static --enable-numa

Contributor

yosefe commented Aug 21, 2020

@jappa the PROC_LINK method should be disabled for a non-default PID namespace, but currently we don't check the user namespace.
This is a missing feature in UCX shared memory for containers: "support different user namespaces".
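
To see from the outside whether two ranks actually sit in different namespaces, the `/proc/<pid>/ns/<type>` symlinks on Linux can be compared. The sketch below is only a diagnostic under that assumption and is not how UCX performs its check. `same_ns` is a hypothetical helper name, and reading another process's `ns` links can itself be denied across user namespaces, which the function reports as "unknown" (exit status 2).

```shell
# same_ns TYPE PID_A PID_B
# TYPE is a namespace kind as listed under /proc/<pid>/ns (e.g. user, pid, ipc).
# Exit status: 0 = same namespace, 1 = different, 2 = could not be determined.
same_ns() {
    ns_a=$(readlink "/proc/$2/ns/$1") || return 2
    ns_b=$(readlink "/proc/$3/ns/$1") || return 2
    # The link targets look like "user:[4026531837]"; equal text means
    # both processes are in the same namespace of that type.
    [ "$ns_a" = "$ns_b" ]
}

# Example use (PEER_PID is a placeholder for the other rank's PID):
# same_ns user "$$" "$PEER_PID" || export UCX_POSIX_USE_PROC_LINK=n
```

Running the same comparison with `pid` instead of `user` shows whether the existing PID-namespace check would already apply.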

@hoopoepg
Contributor

Hi @jappa,
do you share the IPC namespace with the host system?

haampie commented Jul 5, 2022

@yosefe I can confirm this: I ran into the same issue when using bwrap, and it went away either by passing --unshare-pid to bwrap or by setting UCX_POSIX_USE_PROC_LINK=n.

Is this fixable?
