RuntimeError when training on multiple GPUs #1

Open
Walid-Ked opened this issue Feb 26, 2023 · 1 comment

Comments


Walid-Ked commented Feb 26, 2023

I'm trying to train the model from scratch on a custom subset of ImageNet. Training works fine on a single GPU, but when running on multiple GPUs I get the following error:

Expected tensor for argument #1 'input' to have the same device as tensor for argument #2 'weight'; but device 3 does not equal 0 (while checking arguments for cudnn_batch_norm)

My configuration file looks like this:

```yaml
name: train_colorformer
model_type: LABGANRGBModel
scale: 1
num_gpu: 4
manual_seed: 0
queue_size: 64
```

I'm using CUDA_VISIBLE_DEVICES to specify the GPUs to be used.
I tried looking for any inputs that are not moved to CUDA, but without success.
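For anyone debugging the same thing: this particular cudnn_batch_norm message usually means a BatchNorm layer's weights stayed on cuda:0 while the input replica was scattered to another GPU, typically because a submodule is kept in a plain Python list (invisible to `.cuda()` and to DataParallel replication) or a tensor is created with `.cuda()` instead of following the input's device. A minimal, hypothetical sketch (not ColorFormer's actual code) of the failure and the fix:

```python
import torch
import torch.nn as nn


class Buggy(nn.Module):
    """Reproduces the reported error pattern: BN weights pinned to cuda:0."""

    def __init__(self):
        super().__init__()
        # BUG: a plain Python list hides these layers from .to()/.cuda() and from
        # DataParallel's replication, so their weights stay on cuda:0 while input
        # replicas run on cuda:1..3.
        self.blocks = [nn.BatchNorm2d(16).cuda() for _ in range(2)]

    def forward(self, x):
        for blk in self.blocks:
            x = blk(x)  # -> "device 3 does not equal 0" on non-zero replicas
        return x


class Fixed(nn.Module):
    def __init__(self):
        super().__init__()
        # nn.ModuleList registers the children, so they are replicated to every
        # GPU together with the rest of the model.
        self.blocks = nn.ModuleList([nn.BatchNorm2d(16) for _ in range(2)])

    def forward(self, x):
        # Tensors created inside forward should follow the input's device.
        offset = torch.zeros(1, device=x.device, dtype=x.dtype)
        for blk in self.blocks:
            x = blk(x)
        return x + offset


if __name__ == "__main__":
    model = nn.DataParallel(Fixed().cuda())
    out = model(torch.randn(8, 16, 32, 32).cuda())
    print(out.shape)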

@taotaoshuai0428

Could you please tell me what 'meta_info_file' is in train_colorformer.yaml and how to configure it? Thanks.
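The config fields above look like BasicSR's layout, so 'meta_info_file' most likely points to a plain-text file listing the training images, one entry per line; the exact per-line format depends on the dataset class, so treat this as a sketch rather than the repository's documented format. A hypothetical helper that writes a one-relative-path-per-line meta info file:

```python
# Hypothetical helper (names and format are assumptions): write a BasicSR-style
# meta_info text file with one image path per line, relative to the dataset root.
from pathlib import Path


def write_meta_info(image_root: str, out_file: str,
                    exts=(".png", ".jpg", ".jpeg")) -> None:
    root = Path(image_root)
    lines = sorted(p.relative_to(root).as_posix()
                   for p in root.rglob("*") if p.suffix.lower() in exts)
    out = Path(out_file)
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text("\n".join(lines) + "\n")


if __name__ == "__main__":
    # Example paths are placeholders; point dataroot_gt and meta_info_file in
    # train_colorformer.yaml at the corresponding locations.
    write_meta_info("datasets/imagenet_subset/train",
                    "datasets/meta_info/train_meta.txt")
```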
