Running container using nvidia-docker2 #84
Comments
You'll all be happy to hear that I was able to solve this issue. What seems to be happening is that the OpenGL libraries inside the container were not compatible with what is running on my system. I tried both Nvidia driver 390 and 440, but no luck. I'm not sure what the actual issue is; it may also have something to do with how the X server is configured on the host.

What resolved the issue was installing libglvnd (the GL Vendor-Neutral Dispatch library), which is designed as a compatibility layer between the graphics libraries. It supports GLX, which is used by LabelFusion. I derived a new image based on the labelfusion image and installed those libraries the same way they are installed in the official Nvidia cudagl images. All credit to them. See the end of the message for the exact Dockerfile I used.

It seems the current setup is quite reliant on how the host machine is set up. I would create a formal pull request updating the image, but unfortunately I was unable to build the original image. Quite a few libraries seem to have been updated, and some of the dependencies no longer build.

It could be worth looking into updating e.g. Director to use a newer official version, to reduce the risk of this software being left behind permanently. Would anyone more familiar with these projects be able to estimate how big an undertaking that would be? What would be the main issues?

Here is the Dockerfile I used to build my image.
|
Thanks, @kekeblom , this resolved the issue for me! 👍🏻 Note that this expects COPY --from=nvidia/opengl:1.0-glvnd-runtime-ubuntu16.04 \
/usr/local/share/glvnd/egl_vendor.d/10_nvidia.json \
/usr/local/share/glvnd/egl_vendor.d/10_nvidia.json
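For anyone who wants to confirm that the glvnd pieces actually ended up in their derived image, here is a minimal Python sketch that could be run inside the container. It is not part of LabelFusion or of the Nvidia images. The vendor file path is the one from the COPY line above; the list of library names and the `ICD` / `library_path` layout of the JSON file are assumptions about a typical libglvnd install, and the function names are made up for illustration.

```python
import ctypes.util
import json
from pathlib import Path

# Path taken from the COPY line above.
EGL_VENDOR_FILE = Path("/usr/local/share/glvnd/egl_vendor.d/10_nvidia.json")

# Assumption: library names a typical libglvnd install provides.
GLVND_LIBS = ("GLdispatch", "GLX", "OpenGL", "EGL")


def find_glvnd_libs(names=GLVND_LIBS):
    """Map each library name to the file name the system resolves it to, or None."""
    return {name: ctypes.util.find_library(name) for name in names}


def read_vendor_library(path=EGL_VENDOR_FILE):
    """Return the ICD library named in a glvnd vendor JSON file, or None."""
    path = Path(path)
    if not path.is_file():
        return None
    try:
        data = json.loads(path.read_text())
    except (OSError, ValueError):
        # Unreadable file or invalid JSON.
        return None
    # Assumption: the file has the shape {"ICD": {"library_path": "..."}}.
    icd = data.get("ICD") if isinstance(data, dict) else None
    return icd.get("library_path") if isinstance(icd, dict) else None


if __name__ == "__main__":
    for name, resolved in find_glvnd_libs().items():
        print(f"lib{name}: {resolved or 'not found'}")
    vendor = read_vendor_library()
    print(f"{EGL_VENDOR_FILE}: {vendor or 'missing or unreadable'}")
```

If any of the libraries show up as not found, or the vendor file is missing, the image probably did not pick up the glvnd install step.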
|
I seem to have the same issue as in #74. That is, when I run run_alignment_tool in the mounted data directory (using the provided sample data), I get:
libGL error: No matching fbConfigs or visuals found
The Director GUI pops up, but it's unable to open an OpenGL context. I get the exact same error message when I run the glxgears test program from the mesa-utils package.

My understanding of the issue is that the OpenGL libraries inside the container either do not match those running on the host computer or are unable to load.

I'm running nvidia-docker2, and my Docker version is 19.03.8, build afacb8b7f0. I'm running Nvidia driver version 440.64.00 on the host machine.

It seems Nvidia does not officially support GLX on Nvidia Docker. However, they do have cudagl images available here: https://hub.docker.com/r/nvidia/cudagl. I'm not exactly sure which part of that image's Dockerfile is key, but on that image I am able to run glxgears, i.e. OpenGL runs fine.

I could rebuild the container using that image. I tried that, but the container no longer builds. There is an error message related to VTK not being the right version. I can get around this by changing/adding -DUSE_SYSTEM_VTK:BOOL=OFF and -DUSE_PRECOMPILED_VTK=ON in the compile_all.sh script. However, the install then fails for some other reason which I didn't fully investigate. Other issues might come up as well, since the CUDA version would get bumped up to 9 and some system packages might get updated.

Probably there is just some minor glitch on my system, which is why I'm opening this issue. A comment on the original issue I referenced says "you should pull the nvidia-docker2 image, not the nvidia-docker1." and this seems to have resolved that person's issue. However, I don't exactly know what that means. I'm running nvidia-docker2 and I'm pulling the latest image from Docker Hub using nvidia-docker pull robotlocomotion/labelfusion.

Does anyone know what might be wrong here?
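To make the glxgears check repeatable across images, a small Python probe along these lines could be run inside the container. This is only a sketch: it assumes glxinfo (also shipped in mesa-utils) is installed and that DISPLAY is forwarded into the container, and it uses glxinfo rather than glxgears because glxinfo exits on its own. The function names and the returned status strings are made up for illustration.

```python
import shutil
import subprocess

# Error string quoted in the report above.
FBCONFIG_ERROR = "No matching fbConfigs or visuals found"


def has_fbconfig_error(output):
    """True if the captured output contains the libGL error from the report."""
    return FBCONFIG_ERROR in output


def probe_glx(command=("glxinfo",), timeout=10):
    """Run a GLX client and return 'ok', 'fbconfig-error', 'failed', or 'unavailable'."""
    if shutil.which(command[0]) is None:
        return "unavailable"
    try:
        result = subprocess.run(
            list(command),
            capture_output=True,
            text=True,
            errors="replace",  # tolerate odd bytes in driver messages
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return "failed"
    if has_fbconfig_error(result.stdout + result.stderr):
        return "fbconfig-error"
    return "ok" if result.returncode == 0 else "failed"


if __name__ == "__main__":
    print(probe_glx())
```

A result of fbconfig-error would match the symptom described here, whereas unavailable only means the probe binary is not installed in that image.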