Hello, I ran the experiment by following the install guide and came across a CUDA problem.
using 8 Tesla K80s [8,9,10,11,12,13,14,15,16]
`[08/18 09:54:47 main-logger]: #Model parameters: 32311715
[08/18 09:54:47 main-logger]: class_weight: tensor([ 3.1557, 8.7029, 7.8281, 6.1354, 6.3161, 7.9937, 8.9704, 10.1922,
1.6155, 4.2187, 1.9385, 5.5455, 2.0198, 2.6261, 1.3212, 5.1102,
2.5492, 5.8585, 7.3929], device='cuda:0')
[08/18 09:54:47 main-logger]: loss_name: ce_loss
[08/18 09:54:47 main-logger]: train_data samples: '19130'
[08/18 09:54:47 main-logger]: val_data samples: '4071'
[08/18 09:54:47 main-logger]: scheduler: Poly. scheduler_update: step
[08/18 09:54:47 main-logger]: lr: [0.006, 0.0006000000000000001]
[Exception|implicit_gemm_pair]indices=torch.Size([78654, 4]),bs=1,ss=[1977, 1756, 128],algo=ConvAlgo.MaskImplicitGemm,ksize=[3, 3, 3],stride=[1, 1, 1],padding=[1, 1, 1],dilation=[1, 1, 1],subm=True,transpose=False
SPCONV_DEBUG_SAVE_PATH not found, you can specify SPCONV_DEBUG_SAVE_PATH as debug data save path to save debug data which can be attached in a issue.
[Exception|implicit_gemm_pair]indices=torch.Size([80831, 4]),bs=1,ss=[1624, 2049, 128],algo=ConvAlgo.MaskImplicitGemm,ksize=[3, 3, 3],stride=[1, 1, 1],padding=[1, 1, 1],dilation=[1, 1, 1],subm=True,transpose=False
SPCONV_DEBUG_SAVE_PATH not found, you can specify SPCONV_DEBUG_SAVE_PATH as debug data save path to save debug data which can be attached in a issue.
[Exception|implicit_gemm_pair]indices=torch.Size([92342, 4]),bs=1,ss=[2048, 1388, 128],algo=ConvAlgo.MaskImplicitGemm,ksize=[3, 3, 3],stride=[1, 1, 1],padding=[1, 1, 1],dilation=[1, 1, 1],subm=True,transpose=False
SPCONV_DEBUG_SAVE_PATH not found, you can specify SPCONV_DEBUG_SAVE_PATH as debug data save path to save debug data which can be attached in a issue.
[Exception|implicit_gemm_pair]indices=torch.Size([82930, 4]),bs=1,ss=[2049, 2049, 129],algo=ConvAlgo.MaskImplicitGemm,ksize=[3, 3, 3],stride=[1, 1, 1],padding=[1, 1, 1],dilation=[1, 1, 1],subm=True,transpose=False
SPCONV_DEBUG_SAVE_PATH not found, you can specify SPCONV_DEBUG_SAVE_PATH as debug data save path to save debug data which can be attached in a issue.
[Exception|implicit_gemm_pair]indices=torch.Size([73867, 4]),bs=1,ss=[2049, 1523, 128],algo=ConvAlgo.MaskImplicitGemm,ksize=[3, 3, 3],stride=[1, 1, 1],padding=[1, 1, 1],dilation=[1, 1, 1],subm=True,transpose=False
SPCONV_DEBUG_SAVE_PATH not found, you can specify SPCONV_DEBUG_SAVE_PATH as debug data save path to save debug data which can be attached in a issue.
[Exception|implicit_gemm_pair]indices=torch.Size([93205, 4]),bs=1,ss=[1718, 2048, 128],algo=ConvAlgo.MaskImplicitGemm,ksize=[3, 3, 3],stride=[1, 1, 1],padding=[1, 1, 1],dilation=[1, 1, 1],subm=True,transpose=False
SPCONV_DEBUG_SAVE_PATH not found, you can specify SPCONV_DEBUG_SAVE_PATH as debug data save path to save debug data which can be attached in a issue.
[Exception|implicit_gemm_pair]indices=torch.Size([88331, 4]),bs=1,ss=[2049, 2049, 128],algo=ConvAlgo.MaskImplicitGemm,ksize=[3, 3, 3],stride=[1, 1, 1],padding=[1, 1, 1],dilation=[1, 1, 1],subm=True,transpose=False
SPCONV_DEBUG_SAVE_PATH not found, you can specify SPCONV_DEBUG_SAVE_PATH as debug data save path to save debug data which can be attached in a issue.
[Exception|implicit_gemm_pair]indices=torch.Size([84985, 4]),bs=1,ss=[1975, 1676, 128],algo=ConvAlgo.MaskImplicitGemm,ksize=[3, 3, 3],stride=[1, 1, 1],padding=[1, 1, 1],dilation=[1, 1, 1],subm=True,transpose=False
SPCONV_DEBUG_SAVE_PATH not found, you can specify SPCONV_DEBUG_SAVE_PATH as debug data save path to save debug data which can be attached in a issue.
Traceback (most recent call last):
File "train.py", line 902, in <module>
main()
File "train.py", line 90, in main
mp.spawn(main_worker, nprocs=args.ngpus_per_node, args=(args.ngpus_per_node, args))
File "/home/fyy/anaconda3/envs/sphereformer/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 230, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
File "/home/fyy/anaconda3/envs/sphereformer/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 188, in start_processes
while not context.join():
File "/home/fyy/anaconda3/envs/sphereformer/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 150, in join
raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:
-- Process 1 terminated with the following error:
Traceback (most recent call last):
File "/home/fyy/anaconda3/envs/sphereformer/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
fn(i, *args)
File "/home/fyy/SphereFormer-master/train.py", line 410, in main_worker
loss_train, mIoU_train, mAcc_train, allAcc_train = train(train_loader, model, criterion, optimizer, epoch, scaler, scheduler, gpu)
File "/home/fyy/SphereFormer-master/train.py", line 498, in train
output = model(sinput, xyz, batch)
File "/home/fyy/anaconda3/envs/sphereformer/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/fyy/anaconda3/envs/sphereformer/lib/python3.7/site-packages/torch/nn/parallel/distributed.py", line 705, in forward
output = self.module(*inputs[0], **kwargs[0])
File "/home/fyy/anaconda3/envs/sphereformer/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/fyy/SphereFormer-master/model/unet_spherical_transformer.py", line 284, in forward
output = self.input_conv(input)
File "/home/fyy/anaconda3/envs/sphereformer/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/fyy/anaconda3/envs/sphereformer/lib/python3.7/site-packages/spconv/pytorch/modules.py", line 137, in forward
input = module(input)
File "/home/fyy/anaconda3/envs/sphereformer/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/fyy/anaconda3/envs/sphereformer/lib/python3.7/site-packages/spconv/pytorch/conv.py", line 404, in forward
raise e
File "/home/fyy/anaconda3/envs/sphereformer/lib/python3.7/site-packages/spconv/pytorch/conv.py", line 395, in forward
timer=input._timer)
File "/home/fyy/anaconda3/envs/sphereformer/lib/python3.7/site-packages/spconv/pytorch/ops.py", line 359, in get_indice_pairs_implicit_gemm
mask_argsort_tv[j], stream)
RuntimeError: radix_sort: failed on 1st step: cudaErrorInvalidDeviceFunction: invalid device function`
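
For anyone hitting the same message: `cudaErrorInvalidDeviceFunction` generally means the kernel being launched was not compiled for the GPU's architecture. The Tesla K80 has compute capability 3.7 (`sm_37`), so one plausible cause here is that the installed spconv binary does not include `sm_37` kernels. That is an assumption and not something the log confirms. Below is a minimal diagnostic sketch using `torch.cuda.get_device_capability` and `torch.cuda.get_arch_list`. The helper names `arch_tag`, `is_arch_compiled`, and `report` are made up for illustration. Note that it only inspects the PyTorch build; which architectures the spconv wheel was built for would have to be checked against spconv's own documentation.

```python
from typing import Iterable, Tuple


def arch_tag(capability: Tuple[int, int]) -> str:
    """Turn a compute capability such as (3, 7) into a tag such as 'sm_37'."""
    major, minor = capability
    return f"sm_{major}{minor}"


def is_arch_compiled(capability: Tuple[int, int], compiled_archs: Iterable[str]) -> bool:
    """Return True if the device's exact sm_XY tag is in the compiled list.

    Simplification: this ignores PTX entries ('compute_XY') that the driver
    could JIT-compile for a newer device, so a False result is a hint to
    investigate further, not a proof.
    """
    return arch_tag(capability) in set(compiled_archs)


def report() -> None:
    """Print, for each visible GPU, whether this PyTorch build targets it."""
    import torch  # imported here so the helpers above work without torch

    if not torch.cuda.is_available():
        print("CUDA is not available in this environment")
        return

    compiled = torch.cuda.get_arch_list()  # e.g. ['sm_37', 'sm_50', ...]
    print("PyTorch was built for:", compiled)
    for index in range(torch.cuda.device_count()):
        capability = torch.cuda.get_device_capability(index)
        name = torch.cuda.get_device_name(index)
        status = "ok" if is_arch_compiled(capability, compiled) else "NOT in build list"
        print(f"GPU {index}: {name} ({arch_tag(capability)}): {status}")
```

Calling `report()` inside the `sphereformer` environment should show whether `sm_37` is covered by the PyTorch build. If it is, the mismatch is more likely in the spconv binary, and building spconv from source for `sm_37` or running on a newer GPU would be the things to try.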