Slow training speed and low GPU utilization #147
Hi, it seems reasonable that CUDA is mostly used only during the roughly 10 minutes between epochs. By default, an epoch runs for approximately 5000 steps with a step length of 0.1 seconds, so a single epoch takes at least 8.3 minutes. That makes sense to me.
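The epoch timing above can be verified with a quick back-of-the-envelope calculation (the step count and step length are the defaults mentioned in the reply):

```python
# Rough lower bound on epoch wall-clock time for the simulated environment.
# Assumes the defaults stated above: ~5000 steps per epoch, 0.1 s per step.
steps_per_epoch = 5000
step_length_s = 0.1

min_epoch_minutes = steps_per_epoch * step_length_s / 60
print(f"Minimum epoch duration: {min_epoch_minutes:.1f} minutes")  # ~8.3
```

Since each step must advance the simulation by 0.1 s of wall-clock time, the simulator, not the GPU, sets the pace of training.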
Thank you very much for your response. I have two more queries that I'm hoping to clarify:
Thank you very much for your answer, I will work harder!
Hi, thank you for your work, it's amazing! I'm a student who just started DRL. I set up the simulation environment according to the tutorial and used your original program to train (by executing `python3 train_velodyne_td3.py`). In RViz, I can see that the robot is running normally (just like the GIF image in the example). But the GPU usage is very low (power: 48W/170W, memory usage: 3074MiB/12050MiB), and the time between each epoch is also very long (about 10 minutes).

My computer's CPU is an AMD 5800X, the GPU is an RTX 3060, the NVIDIA driver is 470.256.02, and cudatoolkit 11.3.1 is installed in the Anaconda environment. Executing `torch.cuda.is_available()` in the Python environment outputs True. Are this training speed and GPU usage normal? Thank you very much for your answer!
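Beyond `torch.cuda.is_available()`, a quick way to confirm PyTorch is actually exercising the GPU is to run a small workload on it. This is a minimal sketch assuming a standard PyTorch install; it falls back to CPU if no CUDA device is visible:

```python
# Sanity check that PyTorch can see and use the GPU.
# Falls back to CPU if CUDA is not available.
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
if device == "cuda":
    print(torch.cuda.get_device_name(0))  # e.g. "NVIDIA GeForce RTX 3060"

# Run a small matrix multiply on the chosen device to confirm compute works.
x = torch.randn(1024, 1024, device=device)
y = x @ x
if device == "cuda":
    torch.cuda.synchronize()  # wait for the GPU kernel to finish
print(device, tuple(y.shape))
```

Note that low average utilization during training is expected here: the network update is small relative to the 0.1 s simulation step, so the GPU spends most of each epoch waiting on the simulator.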