After running `SCVI.setup_anndata` and creating the model, training the model with `devices != 1` (e.g. `devices=2`) fails with the following error:
RuntimeError: CUDA error: initialization error
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Versions:
- scvi-tools 1.1.2
- torch 2.2.1+cu118
- OS: Ubuntu 22.04
I have also tested `ddp_notebook_find_unused_parameters_true`, and it does not work either. In fact, providing a `strategy` parameter causes it to fail.
Training on 1 GPU works fine. At some point last year I was able to train on two GPUs, but I don't remember exactly when.
Hi, sorry you're running into this issue. Did you happen to try running multi-GPU training outside of the notebook, for example as a Python script? Does it work then, or do you get the same error?
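For reference, a minimal sketch of what running outside the notebook could look like, assuming the data lives in an `.h5ad` file. The script layout, the helper `train_kwargs`, and the command-line argument are hypothetical. No `strategy` is passed: as far as I know, with `devices > 1` Lightning picks its subprocess-based `ddp` strategy in a plain script, whereas in an interactive session it falls back to a fork-based launcher.

```python
"""Hypothetical standalone script for multi-GPU scVI training.

Usage: python train_scvi.py path/to/data.h5ad
"""
import sys


def train_kwargs(n_devices: int) -> dict:
    """Hypothetical helper: keyword arguments passed to SCVI.train."""
    if n_devices < 1:
        raise ValueError("n_devices must be at least 1")
    # No explicit `strategy`: outside a notebook, Lightning is expected to pick
    # its subprocess-based "ddp" strategy when more than one device is requested.
    return {"accelerator": "gpu", "devices": n_devices}


def main(h5ad_path: str, n_devices: int = 2) -> None:
    # Heavy imports stay inside main() so the module itself imports cheaply.
    import anndata
    import scvi

    adata = anndata.read_h5ad(h5ad_path)
    scvi.model.SCVI.setup_anndata(adata)
    model = scvi.model.SCVI(adata)
    model.train(**train_kwargs(n_devices))


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print(__doc__)  # no data path given: just print usage
    else:
        main(sys.argv[1])
```

Running `python train_scvi.py path/to/data.h5ad` on the same machine would show whether the error is specific to the notebook.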
Apologies for the delay. Do you get a different error when passing a `strategy` parameter? Could you try passing a `DDPStrategy` with the `spawn` start method?
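A minimal sketch of that suggestion, assuming the `lightning` 2.x package that scvi-tools 1.1 builds on, where `DDPStrategy` is importable from `lightning.pytorch.strategies` and accepts `start_method`, and extra keyword arguments such as `find_unused_parameters` are forwarded to PyTorch's `DistributedDataParallel`. The helper `ddp_spawn_settings` and the script layout are hypothetical. With `spawn`, each worker starts in a fresh interpreter instead of being forked from the parent, so no worker inherits an already-initialized CUDA context, a common trigger of `CUDA error: initialization error` with fork-based launchers like `ddp_notebook`. As far as I know, Lightning rejects non-fork strategies inside an interactive session, so this is written as a script.

```python
"""Hypothetical script: scVI training with an explicit spawn-based DDPStrategy.

Usage: python train_scvi_spawn.py path/to/data.h5ad
"""
import sys


def ddp_spawn_settings(find_unused_parameters: bool = True) -> dict:
    """Hypothetical helper: arguments for lightning's DDPStrategy."""
    # "spawn" starts every worker in a new interpreter, so none inherits a CUDA
    # context from the parent process. find_unused_parameters=True mirrors the
    # "_find_unused_parameters_true" variant tried in the original report.
    return {"start_method": "spawn", "find_unused_parameters": find_unused_parameters}


def main(h5ad_path: str, n_devices: int = 2) -> None:
    import anndata
    import scvi
    from lightning.pytorch.strategies import DDPStrategy

    adata = anndata.read_h5ad(h5ad_path)
    scvi.model.SCVI.setup_anndata(adata)
    model = scvi.model.SCVI(adata)
    # Extra keyword arguments to SCVI.train are forwarded to the Lightning Trainer.
    model.train(
        accelerator="gpu",
        devices=n_devices,
        strategy=DDPStrategy(**ddp_spawn_settings()),
    )


if __name__ == "__main__":
    # Spawned workers re-import this module, so training must stay behind the guard.
    if len(sys.argv) < 2:
        print(__doc__)
    else:
        main(sys.argv[1])
```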