How to see the training plots? #6
We inherited the TensorBoard code from the original tensorpack implementation of BA3C but never used it; making it work correctly will require some work. We never used TensorBoard for viewing these plots. We have our own in-house tool for this (https://neptune.ml/). We handle the distributed setup by aggregating all the datapoints from all the workers in a single worker using sockets. The implementation can be found here.

About the mean score plot: in the original TP implementation there was only one worker, which periodically stopped the training to perform evaluation. Obviously, this doesn't scale well on distributed setups, so we changed it to have a single worker that performs evaluation for every model checkpoint saved and shut down evaluations in all other workers to make them work faster. The TensorBoard files you're seeing are leftovers of these changes. Additionally, we also had 'online score' plots that showed the scores achieved in the training games (which are played a bit differently from the evaluation games, so the scores aren't exactly the same).

I guess you could try to modify this file so that it sends the results to TensorBoard instead of Neptune; then you'd have all the results from all workers in a single TensorBoard file.
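For anyone attempting that change, here is a minimal sketch of writing such datapoints as TensorBoard scalar summaries with the TF 1.x API. The tag names and log directory are made up for illustration; the aggregation code referenced above has its own naming.

```python
import tensorflow as tf  # TF 1.x API

# One writer per log directory; TensorBoard will pick up the events file it creates.
writer = tf.summary.FileWriter("tensorboard_logs")

def log_scalar(tag, value, step):
    # Write a single scalar datapoint, e.g. log_scalar("online_score", 320.0, global_step).
    summary = tf.Summary(value=[tf.Summary.Value(tag=tag, simple_value=float(value))])
    writer.add_summary(summary, global_step=step)
    writer.flush()
```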
I see, thank you.
Why are they played a bit differently? What are the differences? Isn't that an issue: to train on game A, but evaluate on a slightly different game A', without matching scores?
The policy network assigns a probability to each action that the agent can take. During evaluation you aim to achieve the highest possible score, so you just pick the action with the highest probability. In training you want exploration, so instead of choosing the max you sample from this distribution, in order to take a different action from time to time and thus explore a different trajectory. In most games this yields worse scores (there are exceptions to this rule though) but speeds up the training.
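In code, the difference boils down to argmax versus sampling over the policy output. A minimal illustration (not the repository's actual implementation):

```python
import numpy as np

def choose_action(policy_probs, training):
    """policy_probs: 1-D array of per-action probabilities from the policy network."""
    if training:
        # Sample, so lower-probability actions are tried occasionally (exploration).
        return int(np.random.choice(len(policy_probs), p=policy_probs))
    # Evaluation: act greedily for the best expected score.
    return int(np.argmax(policy_probs))
```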
That sounds good, thanks.
I tried to save stats to TensorBoard event files, but adding that requires more effort. So, I switched to the idea of using Neptune UI.
Ok, the code is running a ZeroMQ server that aggregates all the data from the workers. Then it is logging
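For reference, a bare-bones sketch of such an aggregator, assuming (hypothetically) that workers PUSH JSON-encoded datapoints to a known port; the real implementation linked above differs in the details:

```python
import json
import zmq

context = zmq.Context()
socket = context.socket(zmq.PULL)      # workers connect with PUSH sockets
socket.bind("tcp://*:5555")            # port chosen for illustration only

while True:
    point = json.loads(socket.recv())  # e.g. {"channel": "online_score", "x": 1800, "y": 320}
    # Forward the datapoint to Neptune, TensorBoard, or a CSV file here.
    print(point["channel"], point["x"], point["y"])
```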
This project was completed using neptune version 1.4, which is no longer supported. I guess you could port it all to neptune 2.0 (which should not be that hard since the API hasn't changed substantially). Otherwise you could just plot the CSV files using some other tool like matplotlib.
I added saving
With this, I tried to reproduce the results on
The plots show only max scores, right? In further comparisons I will assume so (because near Table 3 it's written "Best stable score and time (in hours) to achieve it are given").
Breakout, after 6 hours and ~50k global steps: after 22 minutes I got 320 max_score. This matches your plot.
Seaquest, also after 6 hours and ~50k global steps: after 43 minutes I got 1940 and later 1840. You achieved 1840 in 20 minutes. Maybe my training run was unlucky.
All evaluation scores in the paper are mean scores from 50 consecutive games. Please DO NOT compare online scores with evaluation scores since they are achieved using a different algorithm and will be different. If you want to reproduce the results from the paper you'll have to get the evaluation score by using the evaluation node and saving the data from it. Moreover, this setup requires extensive tuning to run efficiently. Especially important hyperparameters:
Also please make sure you're using a TensorFlow version that supports SIMD instructions on CPU, especially AVX2. We were using the optimized version provided by Intel.
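As a quick sanity check (Linux-only, and just an illustration), the CPU flag can be read from /proc/cpuinfo; whether the installed TensorFlow binary was actually built with AVX2 is reported in TensorFlow's startup log messages.

```python
# Linux-only illustration: check whether the CPU advertises AVX2 at all.
with open("/proc/cpuinfo") as f:
    print("CPU supports AVX2:", "avx2" in f.read())
```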
One more thing -- there's a possibility that due to some race conditions not all of the workers started the computations correctly. Unfortunately the setup is not as stable as I'd like it to be. There's a data channel called 'active_workers' somewhere; it says how many workers are currently participating in the computations. After a few minutes of training all workers should be working correctly; if not, you can try restarting the training from scratch. Should you like to debug the situation to make the training more stable, you can count on my help :)
This is the
Sure, I understand the difference, I didn't compare those.
This is
Oh, in my previous post I forgot to include the full command I used:
So the hyperparams are exactly the same; I took them from the README.
Ah, I'm sorry, I missed this completely. With this it will probably be much faster - I will try it. Thanks!
I think I saw this once. One line was repeated over and over in the log and the workers didn't start. After restarting, it was fine.
So far it has happened to me once in ~30 runs, so it's not biting me hard enough; I prefer to spend time on other things. I checked
If you or anybody else would like to debug it, I can offer some guidance here. A good first step would be: create a small script starting about 50 run_job.py runs with 4-10 workers and 1 ps, pipe the outputs to {1..50} log files,
If I see this error again, I will paste it.
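A hypothetical version of such a launcher, reusing the flags from the command quoted elsewhere in this thread (the exact flags and paths would need adjusting for a real stress test):

```python
import subprocess

# Start ~50 short training runs, one after another, keeping a separate log file per run,
# so the logs can later be grepped for the symptom of stuck workers.
for i in range(1, 51):
    with open("run_{:02d}.log".format(i), "w") as log:
        subprocess.call(
            ["python", "run_job.py", "-n", "5", "-g", "60", "-c", "12",
             "--simulator_procs", "10", "--use_sync", "--name", "breakout", "--short"],
            stdout=log, stderr=subprocess.STDOUT)
```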
With Intel's TensorFlow, same command as before, after 6 hours both runs reached ~150k global steps (3x more than before).
Breakout: mean_score 292 after 28 minutes, quite close to ~330 from the paper, but with a huge drop after.
Seaquest: mean_score 1690 after 23 minutes, also close to the paper's 1840, and the plot shape is similar.
Perhaps the differences are due to variance between runs, i.e. if I ran the Breakout training more times, maybe one of the plots would be very close to the one from the paper.
The Seaquest results look good to me. The setup for Breakout might be somewhat unstable, in the sense that 'catastrophic forgetting' is quite common for this game and learning rate. Also, the training should be a bit faster: our mean time (averaged over 10 experiments) to a mean score of 300 was 21 minutes. I suggest running more experiments and seeing if these issues persist. Please bear in mind that our experiments from the paper did not use any learning rate scheduling. If you're really after getting very high scores for these games, or very stable learning without catastrophes, I suggest you play with the 'schedule_hyper' parameter. The schedule for annealing the learning rate and exploration factor can be found here; by using it you could start with a high learning rate of 0.001 at the beginning and then drop it after ~15 minutes of training to avoid catastrophes and get much higher scores in some games (especially Breakout).
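As a rough illustration of that kind of schedule: the 0.001 starting rate and the ~15-minute threshold come from the comment above, while the post-drop values below are placeholders, not the repository's actual schedule.

```python
def scheduled_hyperparams(minutes_elapsed):
    # High learning rate at the start, then drop it to avoid 'catastrophic forgetting'.
    if minutes_elapsed < 15:
        return {"learning_rate": 1e-3, "exploration_factor": 1.0}
    return {"learning_rate": 1e-4, "exploration_factor": 0.5}  # placeholder post-drop values
```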
Now the log for
python run_job.py -n 5 -g 60 -c 12 --simulator_procs 10 --use_sync --name breakout --short
looks better. But I don't know how to view the plots.I noticed that in the
experiments/breakout_1517329773.87/storage/atari_trainlog
there are 4 directories corresponding to workers and each of them has a TensorBoard events file, e.g.events.out.tfevents.1517329790.p0112
.I downloaded with
scp
the wholeexperiments
directory and runtensorboard --logdir=experiments
to view it in the browser:After 40 minutes of training on 4 workers I would expect to see the full
mean_score
plot, not only 1 data point with value 1.8 for step 100. Whereas one can see from the log that each worker made about 1800 steps.How to see the training plots?