
How to see the training plots? #6

Open
AdamStelmaszczyk opened this issue Jan 30, 2018 · 13 comments

@AdamStelmaszczyk
Contributor

AdamStelmaszczyk commented Jan 30, 2018

Now the log for python run_job.py -n 5 -g 60 -c 12 --simulator_procs 10 --use_sync --name breakout --short looks better. But I don't know how to view the plots.

I noticed that in the experiments/breakout_1517329773.87/storage/atari_trainlog there are 4 directories corresponding to workers and each of them has a TensorBoard events file, e.g. events.out.tfevents.1517329790.p0112.

I downloaded the whole experiments directory with scp and ran tensorboard --logdir=experiments to view it in the browser:

[screenshot: TensorBoard view of the downloaded events files]

After 40 minutes of training on 4 workers I would expect to see the full mean_score plot, not just one data point with value 1.8 at step 100, whereas the log shows that each worker made about 1800 steps.

How to see the training plots?

@tgrel
Contributor

tgrel commented Jan 31, 2018

We inherited the tensorboard code from the original tensorpack implementation of BA3C but never used it. Making it work correctly will require some work.

We never used tensorboard for viewing these plots. We have our own in-house tool for this (https://neptune.ml/). We handle the distributed setup by aggregating all the datapoints from all the workers in a single worker using sockets. The implementation can be found here.

About the mean score plot: in the original TP implementation there was only one worker, which periodically stopped the training to perform evaluation. Obviously, this doesn't scale well in distributed setups, so we changed it: a single worker performs an evaluation for every saved model checkpoint, and evaluations are shut down in all other workers to make them faster. The tensorboard files you're seeing are leftovers of these changes.

Additionally, we had 'online score' plots that showed the scores achieved in the training games (which are played a bit differently from the evaluation games, so the scores aren't exactly the same).

I guess you could try to modify this file so that it sends the results to tensorboard instead of neptune and then you'd have all the results from all workers in a single tensorboard file.
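A minimal sketch of what that could look like, assuming TensorFlow 1.x (the helper name and the log directory below are illustrative, not taken from the repo):

```python
import tensorflow as tf

# One shared writer on the aggregating worker; datapoints from all workers end up here.
writer = tf.summary.FileWriter("experiments/aggregated_trainlog")

def send_to_tensorboard(channel_name, x, y):
    # channel_name mirrors the Neptune channel (e.g. 'online_score'),
    # x is the step (must be cast to an integer), y is the value.
    summary = tf.Summary(value=[tf.Summary.Value(tag=channel_name, simple_value=float(y))])
    writer.add_summary(summary, global_step=int(x))
    writer.flush()
```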

@AdamStelmaszczyk
Contributor Author

I see, thank you.

Additionally, we had 'online score' plots that showed the scores achieved in the training games (which are played a bit differently from the evaluation games, so the scores aren't exactly the same).

Why are they played a bit differently? What are the differences? Isn't that an issue: to train on game A, but evaluate on a slightly different game A', without matching scores?

@tgrel
Contributor

tgrel commented Jan 31, 2018

The policy network assigns a probability to each action that the agent can take. During evaluation you aim to achieve the highest possible score so you just pick the action with the highest probability. In training you want to have exploration, so instead of choosing max, you sample from this distribution in order to take a different action from time to time and thus explore a different trajectory. In most games this yields worse scores (there are exceptions to this rule though) but speeds up the training.
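A minimal sketch of the two modes, assuming the policy network outputs a probability vector over the discrete actions (the function name is illustrative, not from the repo):

```python
import numpy as np

def choose_action(policy_probs, training):
    """Pick an action from the policy output (one probability per action)."""
    policy_probs = np.asarray(policy_probs, dtype=np.float64)
    if training:
        # Training: sample from the distribution so the agent keeps exploring
        # different trajectories from time to time.
        return int(np.random.choice(len(policy_probs), p=policy_probs))
    # Evaluation: act greedily, i.e. take the most probable action.
    return int(np.argmax(policy_probs))
```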

@AdamStelmaszczyk
Contributor Author

That sounds good, thanks.

@AdamStelmaszczyk
Contributor Author

I tried to save stats to TensorBoard event files, but adding that requires more effort, so I switched to the idea of using the Neptune UI.

We never used tensorboard for viewing these plots. We have our own in-house tool for this (https://neptune.ml/). We handle the distributed setup by aggregating all the datapoints from all the workers in a single worker using sockets. The implementation can be found here.

Ok, the code is running a ZeroMQ server that aggregates all the data from the workers. It then logs ##### Sending to neptune: online_score : 0.241437337531 , 1.4 ##### and saves stats to 16 CSV files. How can I view the plots in the Neptune UI? I have a Neptune account.
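For reference, a minimal sketch of that aggregation pattern (the port, message format, and file layout are assumptions, not the repo's actual ones):

```python
import csv
import zmq

context = zmq.Context()
receiver = context.socket(zmq.PULL)
receiver.bind("tcp://*:5555")  # workers connect with PUSH sockets and send datapoints

while True:
    channel, x, y = receiver.recv_pyobj()  # e.g. ('online_score', 0.241437337531, 1.4)
    print("##### Sending to neptune: %s : %s , %s #####" % (channel, x, y))
    with open("%s.csv" % channel, "a") as f:
        csv.writer(f).writerow([x, y])
```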

@tgrel
Contributor

tgrel commented Feb 5, 2018

This project was completed using neptune version 1.4, which is no longer supported. I guess you could port it all to neptune 2.0 (which should not be that hard since the API hasn't changed substantially).

Otherwise you could just plot the CSV files using some other tool like matplotlib.
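For example, a minimal sketch for one of the CSV files (assuming two columns per row, x and y, without a header):

```python
import csv
import matplotlib.pyplot as plt

xs, ys = [], []
with open("mean_score.csv") as f:
    for x, y in csv.reader(f):
        xs.append(float(x))
        ys.append(float(y))

plt.plot(xs, ys)
plt.xlabel("time")
plt.ylabel("mean_score")
plt.show()
```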

@AdamStelmaszczyk
Contributor Author

I added saving the online score to TensorBoard events files: #7.

With this, I tried to reproduce the results on Breakout-v0 and Seaquest-v0 from your research paper:

[figure: baseline plots from the paper]

The plots show only max scores, right? In further comparisons I will assume so. (Because near Table 3 it's written "Best stable score and time (in hours) to achieve it are given".)

Breakout, after 6 hours and ~50k global steps:

[figure: Breakout training plot]

After 22 minutes I got 320 max_score. This matches your plot.

Seaquest, also after 6 hours and ~50k global steps:

[figure: Seaquest training plot]

After 43 minutes I got 1940 and later 1840. You achieved 1840 in 20 minutes. Maybe my training run was unlucky.

@tgrel
Contributor

tgrel commented Feb 6, 2018

All evaluation scores in the paper are mean scores from 50 consecutive games. Please DO NOT compare online scores with evaluation scores since they are achieved using a different algorithm and will be different. If you want to reproduce the results from the paper you'll have to get the evaluation score by using the evaluation node and saving the data from it.

Moreover, this setup requires extensive tuning to run efficiently. Especially important hyperparameters:

  • learning rate: 0.001
  • number of workers: 64
  • number of parameter servers: 4
  • synchronous training
  • epsilon for Adam optimizer: 1e-8
  • local batch size: 32

@tgrel
Contributor

tgrel commented Feb 6, 2018

Also please make sure you're using a TensorFlow version that supports SIMD instructions on the CPU, especially AVX2. We were using the optimized version provided by Intel.
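A quick, Linux-only sketch to check whether the CPU itself advertises AVX2 (whether your TensorFlow binary was actually built to use it is a separate question):

```python
# Assumption: Linux host with /proc/cpuinfo available.
with open("/proc/cpuinfo") as f:
    cpuinfo = f.read()
print("CPU supports AVX2:", "avx2" in cpuinfo)
```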

@tgrel
Contributor

tgrel commented Feb 6, 2018

One more thing -- there's a possibility that due to some race conditions not all of the workers started the computations correctly. Unfortunately the setup is not as stable as I'd like it to be.

There's a data channel called 'active_workers' somewhere; it says how many workers are currently participating in the computations. After a few minutes of training all workers should be working correctly; if not, you can try restarting the training from scratch.

If you'd like to debug the situation to make the training more stable, you can count on my help :)

@AdamStelmaszczyk
Contributor Author

AdamStelmaszczyk commented Feb 6, 2018

All evaluation scores in the paper are mean scores from 50 consecutive games.

This is the mean_score (saved to mean_score.csv and now mean_score in tensorboard). Please correct me if I'm wrong.

Please DO NOT compare online scores with evaluation scores since they are achieved using a different algorithm and will be different.

Sure, I understand the difference, I didn't compare those.

If you want to reproduce the results from the paper you'll have to get the evaluation score by using the evaluation node and saving the data from it.

This is mean_score, as this line is executed. args.eval_node is False; we can't pass --eval_node, otherwise an InvalidArgumentError is raised, see #2.

Especially important hyperparameters:

Oh, in my previous post I forgot to include the full command I used:

python run_job.py -n 71 -g 60 -c 12 -o adam --use_sync -l 0.001 -b 32 --fc_neurons 128 --simulator_procs 10 --ps 4 --fc_init uniform --conv_init normal --fc_splits 4 --epsilon 1e-8 --beta1 0.8 --beta2 0.75 --save_every 1000 -e Seaquest-v0 --name seaquest

So the hyperparams are exactly the same; I took them from the README.

Also please make sure you're using a TensorFlow version that supports SIMD instructions for CPU, especially AVX-2. We were using the optimized version provided by Intel.

Ah, I'm sorry, I missed this completely. With this it will probably be much faster; I'll try it. Thanks!

One more thing -- there's a possibility that due to some race conditions not all of the workers started the computations correctly.

I think I saw this once. One line was repeated over and over in the log and the workers didn't start. After restarting, it was fine.

Should you like to debug the situation to make the training more stable you can count on my help :)

So far it has happened to me once in ~30 runs, so it's not biting me hard enough; I'd rather spend time on other things. I checked active_workers.csv, and it was always 66 or 65.

If you or anybody else would like to debug it, I can offer some guidance here. A good first step would be:

Create a small script that starts about 50 run_job.py instances with 4-10 workers and 1 ps each, pipe their outputs to {1..50} log files, and tail -f all of them to find the faulty one. Paste that full output log here.
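A minimal sketch of such a launcher (the flag values below are illustrative; adjust them to whatever reproduces the hang most often):

```python
import subprocess

procs = []
for i in range(1, 51):
    log = open("run_%02d.log" % i, "w")
    cmd = ["python", "run_job.py", "-n", "4", "--ps", "1",
           "--simulator_procs", "10", "--name", "debug_%02d" % i]
    # Each run writes to its own log; `tail -f run_*.log` then shows which one stalls.
    procs.append(subprocess.Popen(cmd, stdout=log, stderr=subprocess.STDOUT))

for p in procs:
    p.wait()
```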

If I see this error again, I will paste it.

@AdamStelmaszczyk
Contributor Author

With Intel's TensorFlow, same command as before, after 6 hours both runs reached ~150k global steps (3x more than before):

Breakout: mean_score 292 after 28 minutes, quite close to ~330 from the paper, but with a huge drop afterwards:

[figure: Breakout training plot with Intel's TensorFlow]

Seaquest: mean_score 1690 after 23 minutes, also close to the paper's 1840, and the plot shape is similar:

[figure: Seaquest training plot with Intel's TensorFlow]

The differences could perhaps be due to variance between runs, i.e. if I ran the Breakout training more times, maybe one of the plots would be very close to the one from the paper.

@tgrel
Contributor

tgrel commented Feb 8, 2018

The Seaquest results look good to me. The setup for Breakout might be somewhat unstable in the sense that 'catastrophic forgetting' is quite common for this game and learning rate. Also, the training should be a bit faster (our mean time to a mean score of 300, averaged over 10 experiments, was 21 minutes). I suggest running more experiments and seeing if these issues persist.

Please bear in mind that our experiments from the paper did not use any learning rate scheduling. If you're really after getting very high scores for these games, or very stable learning without catastrophes, I suggest you play with the 'schedule_hyper' parameter. The schedule for annealing the learning rate and exploration factor can be found here; by using it you could start with a high learning rate of 0.001 and then drop it after ~15 minutes of training to avoid catastrophes and get much higher scores in some games (especially Breakout).
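As an illustration only (the repo's actual annealing code behind 'schedule_hyper' may look different, and the dropped learning rate value below is an assumption), a wall-clock-based drop could be as simple as:

```python
import time

TRAINING_START = time.time()

def scheduled_learning_rate(high_lr=0.001, low_lr=0.0001, drop_after_minutes=15):
    # Start with a high learning rate, then drop it after ~15 minutes of training
    # to reduce the risk of catastrophic forgetting.
    elapsed_minutes = (time.time() - TRAINING_START) / 60.0
    return high_lr if elapsed_minutes < drop_after_minutes else low_lr
```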
