
The rule of collision penalty #65

Open
tongtybj opened this issue May 1, 2022 · 7 comments


@tongtybj
Contributor

tongtybj commented May 1, 2022

@yun-long

Hi, yunlong, thank you for developing this amazing repository.
There is one thing that needs your double-check: the rule of the collision penalty:
https://github.com/uzh-rpg/flightmare/blob/092ff357139b2e98fc92bcdee50f38f85b55246d/flightlib/src/envs/vision_env/vision_env.cpp#L296-L300

I have confirmed that, with this code, the result of relative_dist will always be either 1 or max_detection_range_.

I am not sure whether you wrote it like this intentionally; otherwise, maybe the following would be better?

Scalar relative_dist =
    (relative_pos_norm_[sort_idx] > 0) && (relative_pos_norm_[sort_idx] < max_detection_range_)
        ? relative_pos_norm_[sort_idx]
        : max_detection_range_;
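
A minimal sketch of the difference, assuming the linked lines put relative_pos_norm_[sort_idx] before the ? (the names here just stand in for the members in vision_env.cpp):

using Scalar = double;

// Sketch only: relative_pos_norm / max_detection_range stand in for
// relative_pos_norm_[sort_idx] / max_detection_range_ in vision_env.cpp.
Scalar clipped_dist(Scalar relative_pos_norm, Scalar max_detection_range) {
  // Misordered ternary (assumed form of the bug): the range check sits in the
  // "true" branch, so the expression yields the bool 1 or max_detection_range.
  // Scalar bad = relative_pos_norm
  //                  ? (relative_pos_norm > 0) && (relative_pos_norm < max_detection_range)
  //                  : max_detection_range;

  // Corrected usage, condition ? case1 : case2: keep the actual distance when
  // it lies inside the detection range, otherwise clip to the range.
  return (relative_pos_norm > 0) && (relative_pos_norm < max_detection_range)
             ? relative_pos_norm
             : max_detection_range;
}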
@yun-long
Contributor

yun-long commented May 2, 2022

hi,

The relative dist is a measurement of the Euclidean distance between the drone center and the obstacle.

If the actual relative dist is larger than the detection range, the relative dist will be clipped to the maximum detection range; otherwise, it is the same as the actual dist.
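
In other words, the intended behavior is equivalent to a simple clip (just a sketch, not the exact code in the repo):

#include <algorithm>

// Intended behaviour: report the actual distance, but never more than the
// maximum detection range.
Scalar relative_dist = std::min(relative_pos_norm_[sort_idx], max_detection_range_);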

@tongtybj
Contributor Author

tongtybj commented May 2, 2022

The relative dist is a measurement of the Euclidean distance between the drone center and the obstacle.

OK. Then you should double-check the syntax of https://github.com/uzh-rpg/flightmare/blob/092ff357139b2e98fc92bcdee50f38f85b55246d/flightlib/src/envs/vision_env/vision_env.cpp#L296-L300. The ternary operator ? : should be used as condition ? case1 : case2.

@yun-long
Contributor

yun-long commented May 2, 2022

Ohhh, you are absolutely right. Sorry.

@yun-long
Contributor

yun-long commented May 2, 2022

thanks a lot @tongtybj

@tongtybj
Contributor Author

tongtybj commented May 2, 2022

@yun-long

You are welcome.

Actually, I also trained with the correct std::exp(-1.0 * relative_dist) penalty, but got a worse result. That is why I wondered whether you had written it this way intentionally.
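
That is, the penalty term I tested looked roughly like this (sketch; collision_coeff_ is a placeholder name, not necessarily the one used in the repo):

#include <cmath>

// Exponential collision penalty: largest when the drone is right next to the
// obstacle and decaying toward zero as relative_dist grows.
// collision_coeff_ is a placeholder, not necessarily the repo's variable name.
Scalar collision_penalty = collision_coeff_ * std::exp(-1.0 * relative_dist);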

@yun-long
Contributor

yun-long commented May 2, 2022

I didn't tune the reward. I am not surprised that the result is not good.

Some general suggestions:

  • Tune the reward, and check the learning curve not only for the total reward but also for each individual reward component. Each component is logged; you can visualize the learning curves with cd ./saved and tensorboard --logdir=./.
  • Use a different policy representation. Currently the policy is represented by a multilayer perceptron, which is not a good representation for dynamic environments. Consider a memory-based network, such as an RNN/LSTM/GRU/TCN.

@tongtybj
Contributor Author

tongtybj commented May 2, 2022

Thanks a lot for your important advice!
