Not finding CUDA when building docker image #287

Open
Labulitiolle opened this issue Apr 28, 2021 · 0 comments

Labulitiolle commented Apr 28, 2021

RUN git clone https://github.com/NVIDIA/apex.git && cd apex && python setup.py install --cuda_ext --cpp_ext

When running the container/build_and_push.sh script, I get the following error:

> [13/19] RUN git clone https://github.com/NVIDIA/apex.git && cd apex && python setup.py install --cuda_ext --cpp_ext:
#16 0.730 Cloning into 'apex'...
#16 6.454 No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
#16 6.454 /opt/conda/lib/python3.7/site-packages/torch/cuda/__init__.py:52: UserWarning: CUDA initialization: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx (Triggered internally at  /pytorch/c10/cuda/CUDAFunctions.cpp:100.)
#16 6.454   return torch._C._cuda_getDeviceCount() > 0
#16 6.456 
#16 6.456 Warning: Torch did not find available GPUs on this system.
#16 6.456  If your intention is to cross-compile, this is not an error.
#16 6.456 By default, Apex will cross-compile for Pascal (compute capabilities 6.0, 6.1, 6.2),
#16 6.456 Volta (compute capability 7.0), Turing (compute capability 7.5),
#16 6.456 and, if the CUDA version is >= 11.0, Ampere (compute capability 8.0).
#16 6.456 If you wish to cross-compile for a single specific architecture,
#16 6.456 export TORCH_CUDA_ARCH_LIST="compute capability" before running setup.py.
#16 6.456 
#16 6.466 
#16 6.466 
#16 6.466 torch.__version__  = 1.7.1

... (skipping links)...

#12 1.206 Collecting torch
#12 1.207   Created temporary directory: /tmp/pip-unpack-3pmclieq
#12 1.209   Looking up "https://files.pythonhosted.org/packages/56/74/6fc9dee50f7c93d6b7d9644554bdc9692f3023fa5d1de779666e6bf8ae76/torch-1.8.1-cp37-cp37m-manylinux1_x86_64.whl" in the cache
#12 1.210   No cache entry available
#12 1.211   Starting new HTTPS connection (1): files.pythonhosted.org:443
#12 1.368   https://files.pythonhosted.org:443 "GET /packages/56/74/6fc9dee50f7c93d6b7d9644554bdc9692f3023fa5d1de779666e6bf8ae76/torch-1.8.1-cp37-cp37m-manylinux1_x86_64.whl HTTP/1.1" 200 804097215
#12 1.371   Downloading torch-1.8.1-cp37-cp37m-manylinux1_x86_64.whl (804.1 MB)
#12 74.25   Ignoring unknown cache-control directive: immutable
#12 74.25   Updating cache with response from "https://files.pythonhosted.org/packages/56/74/6fc9dee50f7c93d6b7d9644554bdc9692f3023fa5d1de779666e6bf8ae76/torch-1.8.1-cp37-cp37m-manylinux1_x86_64.whl"
#12 74.25   Caching due to etag
#12 80.17 Killed
------
executor failed running [/bin/sh -c pip install --trusted-host pypi.python.org -v --log /tmp/pip.log torch torchvision]: exit code: 137

Is there a versioning mismatch between torch and CUDA, or should the cache directory be defined?
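
One thing that may help narrow this down: the final failure reports `exit code: 137` right after pip prints `Killed`. By shell convention, an exit status above 128 means the process was terminated by a signal numbered `status - 128`, so 137 corresponds to signal 9 (SIGKILL). A SIGKILL in the middle of a large download is often, though not always, the kernel's out-of-memory killer, which would point at the memory available to the build rather than a torch/CUDA version mismatch. The sketch below only decodes the status. The helper name `describe_exit_code` is made up for illustration and is not part of Docker, pip, or Apex.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Explain a shell-style exit status such as the 137 in the log above.

    Hypothetical helper for illustration only.
    """
    if code > 128:
        # Shell convention: 128 + N means "terminated by signal N".
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            name = "unknown signal"
        return f"terminated by signal {signum} ({name})"
    if code == 0:
        return "exited successfully"
    return f"exited with error status {code}"


print(describe_exit_code(137))  # on Linux: terminated by signal 9 (SIGKILL)
```

If memory does turn out to be the cause, two things that could be tried (untested here) are giving the Docker build more memory, or passing pip's `--no-cache-dir` option so that the roughly 800 MB wheel is not also written through the cache.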

Labulitiolle changed the title from "Not finding CUDA when" to "Not finding CUDA when building docker image" on Apr 28, 2021