RuntimeError when training on multiple GPUs #1

Open
Walid-Ked opened this issue Feb 26, 2023 · 1 comment

Comments


Walid-Ked commented Feb 26, 2023

I'm trying to train the model from scratch on a custom subset of ImageNet. Training works fine on a single GPU, but when running on multiple GPUs I get the following error:

Expected tensor for argument #1 'input' to have the same device as tensor for argument #2 'weight'; but device 3 does not equal 0 (while checking arguments for cudnn_batch_norm)

My configuration file looks like this:

```yaml
name: train_colorformer
model_type: LABGANRGBModel
scale: 1
num_gpu: 4
manual_seed: 0
queue_size: 64
```

I'm using CUDA_VISIBLE_DEVICES to specify the GPUs to be used.
I tried looking for any inputs that are not moved to CUDA, but without success.
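For anyone debugging the same thing: this particular cudnn_batch_norm message usually means a BatchNorm layer's weights stayed on cuda:0 while the input replica was scattered to another GPU, typically because a submodule is kept in a plain Python list (invisible to `.cuda()` and to DataParallel replication) or a tensor is created with `.cuda()` instead of following the input's device. A minimal, hypothetical sketch (not ColorFormer's actual code) of the failure and the fix:

```python
import torch
import torch.nn as nn


class Buggy(nn.Module):
    """Reproduces the reported error pattern: BN weights pinned to cuda:0."""

    def __init__(self):
        super().__init__()
        # BUG: a plain Python list hides these layers from .to()/.cuda() and from
        # DataParallel's replication, so their weights stay on cuda:0 while input
        # replicas run on cuda:1..3.
        self.blocks = [nn.BatchNorm2d(16).cuda() for _ in range(2)]

    def forward(self, x):
        for blk in self.blocks:
            x = blk(x)  # -> "device 3 does not equal 0" on non-zero replicas
        return x


class Fixed(nn.Module):
    def __init__(self):
        super().__init__()
        # nn.ModuleList registers the children, so they are replicated to every
        # GPU together with the rest of the model.
        self.blocks = nn.ModuleList([nn.BatchNorm2d(16) for _ in range(2)])

    def forward(self, x):
        # Tensors created inside forward should follow the input's device.
        offset = torch.zeros(1, device=x.device, dtype=x.dtype)
        for blk in self.blocks:
            x = blk(x)
        return x + offset


if __name__ == "__main__":
    model = nn.DataParallel(Fixed().cuda())
    out = model(torch.randn(8, 16, 32, 32).cuda())
    print(out.shape)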

@taotaoshuai0428

Could you please tell me what 'meta_info_file' is in train_colorformer.yaml and how to configure it? Thanks.
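The config fields above look like BasicSR's layout, so 'meta_info_file' most likely points to a plain-text file listing the training images, one entry per line; the exact per-line format depends on the dataset class, so treat this as a sketch rather than the repository's documented format. A hypothetical helper that writes a one-relative-path-per-line meta info file:

```python
# Hypothetical helper (names and format are assumptions): write a BasicSR-style
# meta_info text file with one image path per line, relative to the dataset root.
from pathlib import Path


def write_meta_info(image_root: str, out_file: str,
                    exts=(".png", ".jpg", ".jpeg")) -> None:
    root = Path(image_root)
    lines = sorted(p.relative_to(root).as_posix()
                   for p in root.rglob("*") if p.suffix.lower() in exts)
    out = Path(out_file)
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text("\n".join(lines) + "\n")


if __name__ == "__main__":
    # Example paths are placeholders; point dataroot_gt and meta_info_file in
    # train_colorformer.yaml at the corresponding locations.
    write_meta_info("datasets/imagenet_subset/train",
                    "datasets/meta_info/train_meta.txt")
```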
