cudaErrorInvalidDeviceFunction: invalid device function #48

Open
illuosion opened this issue Aug 18, 2023 · 1 comment

Comments

@illuosion

Hello, I ran the experiment by following the install guide, and I ran into a CUDA problem:

  • using 8 Tesla K80s [8,9,10,11,12,13,14,15,16]
    `[08/18 09:54:47 main-logger]: #Model parameters: 32311715
    [08/18 09:54:47 main-logger]: class_weight: tensor([ 3.1557, 8.7029, 7.8281, 6.1354, 6.3161, 7.9937, 8.9704, 10.1922,
    1.6155, 4.2187, 1.9385, 5.5455, 2.0198, 2.6261, 1.3212, 5.1102,
    2.5492, 5.8585, 7.3929], device='cuda:0')
    [08/18 09:54:47 main-logger]: loss_name: ce_loss
    [08/18 09:54:47 main-logger]: train_data samples: '19130'
    [08/18 09:54:47 main-logger]: val_data samples: '4071'
    [08/18 09:54:47 main-logger]: scheduler: Poly. scheduler_update: step
    [08/18 09:54:47 main-logger]: lr: [0.006, 0.0006000000000000001]
    [Exception|implicit_gemm_pair]indices=torch.Size([78654, 4]),bs=1,ss=[1977, 1756, 128],algo=ConvAlgo.MaskImplicitGemm,ksize=[3, 3, 3],stride=[1, 1, 1],padding=[1, 1, 1],dilation=[1, 1, 1],subm=True,transpose=False
    SPCONV_DEBUG_SAVE_PATH not found, you can specify SPCONV_DEBUG_SAVE_PATH as debug data save path to save debug data which can be attached in a issue.
    [Exception|implicit_gemm_pair]indices=torch.Size([80831, 4]),bs=1,ss=[1624, 2049, 128],algo=ConvAlgo.MaskImplicitGemm,ksize=[3, 3, 3],stride=[1, 1, 1],padding=[1, 1, 1],dilation=[1, 1, 1],subm=True,transpose=False
    SPCONV_DEBUG_SAVE_PATH not found, you can specify SPCONV_DEBUG_SAVE_PATH as debug data save path to save debug data which can be attached in a issue.
    [Exception|implicit_gemm_pair]indices=torch.Size([92342, 4]),bs=1,ss=[2048, 1388, 128],algo=ConvAlgo.MaskImplicitGemm,ksize=[3, 3, 3],stride=[1, 1, 1],padding=[1, 1, 1],dilation=[1, 1, 1],subm=True,transpose=False
    SPCONV_DEBUG_SAVE_PATH not found, you can specify SPCONV_DEBUG_SAVE_PATH as debug data save path to save debug data which can be attached in a issue.
    [Exception|implicit_gemm_pair]indices=torch.Size([82930, 4]),bs=1,ss=[2049, 2049, 129],algo=ConvAlgo.MaskImplicitGemm,ksize=[3, 3, 3],stride=[1, 1, 1],padding=[1, 1, 1],dilation=[1, 1, 1],subm=True,transpose=False
    SPCONV_DEBUG_SAVE_PATH not found, you can specify SPCONV_DEBUG_SAVE_PATH as debug data save path to save debug data which can be attached in a issue.
    [Exception|implicit_gemm_pair]indices=torch.Size([73867, 4]),bs=1,ss=[2049, 1523, 128],algo=ConvAlgo.MaskImplicitGemm,ksize=[3, 3, 3],stride=[1, 1, 1],padding=[1, 1, 1],dilation=[1, 1, 1],subm=True,transpose=False
    SPCONV_DEBUG_SAVE_PATH not found, you can specify SPCONV_DEBUG_SAVE_PATH as debug data save path to save debug data which can be attached in a issue.
    [Exception|implicit_gemm_pair]indices=torch.Size([93205, 4]),bs=1,ss=[1718, 2048, 128],algo=ConvAlgo.MaskImplicitGemm,ksize=[3, 3, 3],stride=[1, 1, 1],padding=[1, 1, 1],dilation=[1, 1, 1],subm=True,transpose=False
    SPCONV_DEBUG_SAVE_PATH not found, you can specify SPCONV_DEBUG_SAVE_PATH as debug data save path to save debug data which can be attached in a issue.
    [Exception|implicit_gemm_pair]indices=torch.Size([88331, 4]),bs=1,ss=[2049, 2049, 128],algo=ConvAlgo.MaskImplicitGemm,ksize=[3, 3, 3],stride=[1, 1, 1],padding=[1, 1, 1],dilation=[1, 1, 1],subm=True,transpose=False
    SPCONV_DEBUG_SAVE_PATH not found, you can specify SPCONV_DEBUG_SAVE_PATH as debug data save path to save debug data which can be attached in a issue.
    [Exception|implicit_gemm_pair]indices=torch.Size([84985, 4]),bs=1,ss=[1975, 1676, 128],algo=ConvAlgo.MaskImplicitGemm,ksize=[3, 3, 3],stride=[1, 1, 1],padding=[1, 1, 1],dilation=[1, 1, 1],subm=True,transpose=False
    SPCONV_DEBUG_SAVE_PATH not found, you can specify SPCONV_DEBUG_SAVE_PATH as debug data save path to save debug data which can be attached in a issue.
    Traceback (most recent call last):
    File "train.py", line 902, in <module>
    main()
    File "train.py", line 90, in main
    mp.spawn(main_worker, nprocs=args.ngpus_per_node, args=(args.ngpus_per_node, args))
    File "/home/fyy/anaconda3/envs/sphereformer/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 230, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
    File "/home/fyy/anaconda3/envs/sphereformer/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 188, in start_processes
    while not context.join():
    File "/home/fyy/anaconda3/envs/sphereformer/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 150, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
    torch.multiprocessing.spawn.ProcessRaisedException:

-- Process 1 terminated with the following error:
Traceback (most recent call last):
File "/home/fyy/anaconda3/envs/sphereformer/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
fn(i, *args)
File "/home/fyy/SphereFormer-master/train.py", line 410, in main_worker
loss_train, mIoU_train, mAcc_train, allAcc_train = train(train_loader, model, criterion, optimizer, epoch, scaler, scheduler, gpu)
File "/home/fyy/SphereFormer-master/train.py", line 498, in train
output = model(sinput, xyz, batch)
File "/home/fyy/anaconda3/envs/sphereformer/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/fyy/anaconda3/envs/sphereformer/lib/python3.7/site-packages/torch/nn/parallel/distributed.py", line 705, in forward
output = self.module(*inputs[0], **kwargs[0])
File "/home/fyy/anaconda3/envs/sphereformer/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/fyy/SphereFormer-master/model/unet_spherical_transformer.py", line 284, in forward
output = self.input_conv(input)
File "/home/fyy/anaconda3/envs/sphereformer/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/fyy/anaconda3/envs/sphereformer/lib/python3.7/site-packages/spconv/pytorch/modules.py", line 137, in forward
input = module(input)
File "/home/fyy/anaconda3/envs/sphereformer/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/fyy/anaconda3/envs/sphereformer/lib/python3.7/site-packages/spconv/pytorch/conv.py", line 404, in forward
raise e
File "/home/fyy/anaconda3/envs/sphereformer/lib/python3.7/site-packages/spconv/pytorch/conv.py", line 395, in forward
timer=input._timer)
File "/home/fyy/anaconda3/envs/sphereformer/lib/python3.7/site-packages/spconv/pytorch/ops.py", line 359, in get_indice_pairs_implicit_gemm
mask_argsort_tv[j], stream)
RuntimeError: radix_sort: failed on 1st step: cudaErrorInvalidDeviceFunction: invalid device function`
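
One possible lead, offered as a guess rather than a diagnosis: CUDA reports `cudaErrorInvalidDeviceFunction` when a kernel does not exist or was not compiled for the GPU's architecture, and the Tesla K80 is a compute capability 3.7 device. The sketch below lists each visible GPU and checks whether the installed PyTorch build contains native kernels for it. The helper names `arch_tag`, `has_native_kernels`, and `report_devices` are made up for illustration. It inspects PyTorch only; the failing `radix_sort` call is inside spconv, which ships its own compiled kernels, so a pass here does not rule out an architecture mismatch in the spconv wheel.

```python
from typing import List, Tuple


def arch_tag(capability: Tuple[int, int]) -> str:
    """Turn a (major, minor) compute capability into an 'sm_XY' tag."""
    major, minor = capability
    return f"sm_{major}{minor}"


def has_native_kernels(capability: Tuple[int, int], arch_list: List[str]) -> bool:
    """True if the build's arch list contains an exact 'sm_XY' match.

    PTX entries ('compute_XY') that could be JIT-compiled are ignored here,
    so this is a conservative check, not a complete one.
    """
    return arch_tag(capability) in arch_list


def report_devices() -> None:
    """Print each visible GPU and whether PyTorch was built for its architecture."""
    import torch  # imported lazily so the helpers above work without PyTorch

    if not torch.cuda.is_available():
        print("CUDA is not available to PyTorch")
        return

    # Architectures this PyTorch binary was compiled for, e.g. ['sm_37', 'sm_50', ...]
    arch_list = torch.cuda.get_arch_list()
    print("PyTorch", torch.__version__, "built for:", arch_list)

    for idx in range(torch.cuda.device_count()):
        cap = torch.cuda.get_device_capability(idx)
        name = torch.cuda.get_device_name(idx)
        status = "ok" if has_native_kernels(cap, arch_list) else "NOT in arch list"
        print(f"cuda:{idx} {name} ({arch_tag(cap)}): {status}")
```

Calling `report_devices()` inside the `sphereformer` environment would show whether `sm_37` appears in the PyTorch build. If it does, the next thing to check would be which architectures the installed spconv package was compiled for.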

@illuosion
Author

I searched for a fix, and the suggestion I found was to change the PyTorch version to 1.9.x, but that led to a new problem.
