
The rule of collision penalty #65

Open
tongtybj opened this issue May 1, 2022 · 7 comments


@tongtybj
Contributor

tongtybj commented May 1, 2022

@yun-long

Hi, yunlong, thank you for developing this amazing repository.
There is one thing that needs your double-check: the rule of the collision penalty:
https://github.com/uzh-rpg/flightmare/blob/092ff357139b2e98fc92bcdee50f38f85b55246d/flightlib/src/envs/vision_env/vision_env.cpp#L296-L300

I have confirmed that, with this code, the result of relative_dist will always be either 1 or max_detection_range_.

I am not sure whether you wrote it like this intentionally; otherwise, maybe the following would be better?

Scalar relative_dist =
    (relative_pos_norm_[sort_idx] > 0) && (relative_pos_norm_[sort_idx] < max_detection_range_)
        ? relative_pos_norm_[sort_idx]
        : max_detection_range_;
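
A minimal sketch of the difference, assuming the linked lines put relative_pos_norm_[sort_idx] before the ? (the names here just stand in for the members in vision_env.cpp):

using Scalar = double;

// Sketch only: relative_pos_norm / max_detection_range stand in for
// relative_pos_norm_[sort_idx] / max_detection_range_ in vision_env.cpp.
Scalar clipped_dist(Scalar relative_pos_norm, Scalar max_detection_range) {
  // Misordered ternary (assumed form of the bug): the range check sits in the
  // "true" branch, so the expression yields the bool 1 or max_detection_range.
  // Scalar bad = relative_pos_norm
  //                  ? (relative_pos_norm > 0) && (relative_pos_norm < max_detection_range)
  //                  : max_detection_range;

  // Corrected usage, condition ? case1 : case2: keep the actual distance when
  // it lies inside the detection range, otherwise clip to the range.
  return (relative_pos_norm > 0) && (relative_pos_norm < max_detection_range)
             ? relative_pos_norm
             : max_detection_range;
}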
@yun-long
Contributor

yun-long commented May 2, 2022

hi,

The relative dist is a measurement of the Euclidean distance between the drone center and the obstacle.

If the actual relative dist is larger than the detection range, the relative dist will be clipped to the maximum detection range; otherwise, it is the same as the actual dist.
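
In other words, the intended behavior is equivalent to a simple clip (just a sketch, not the exact code in the repo):

#include <algorithm>

// Intended behaviour: report the actual distance, but never more than the
// maximum detection range.
Scalar relative_dist = std::min(relative_pos_norm_[sort_idx], max_detection_range_);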

@tongtybj
Contributor Author

tongtybj commented May 2, 2022

The relative dist is a measurement of the Euclidean distance between the drone center and the obstacle.

OK. Then you should double-check the syntax of https://github.com/uzh-rpg/flightmare/blob/092ff357139b2e98fc92bcdee50f38f85b55246d/flightlib/src/envs/vision_env/vision_env.cpp#L296-L300. The ternary operator ? : should be used as condition ? case1 : case2.

@yun-long
Contributor

yun-long commented May 2, 2022

Ohhh, you are absolutely right. Sorry.

@yun-long
Contributor

yun-long commented May 2, 2022

thanks a lot @tongtybj

@tongtybj
Contributor Author

tongtybj commented May 2, 2022

@yun-long

You are welcome.

Actually, I also trained with the correct std::exp(-1.0 * relative_dist) penalty, but got a worse result. That is why I wondered whether you had written it this way intentionally.
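
That is, the penalty term I tested looked roughly like this (sketch; collision_coeff_ is a placeholder name, not necessarily the one used in the repo):

#include <cmath>

// Exponential collision penalty: largest when the drone is right next to the
// obstacle and decaying toward zero as relative_dist grows.
// collision_coeff_ is a placeholder, not necessarily the repo's variable name.
Scalar collision_penalty = collision_coeff_ * std::exp(-1.0 * relative_dist);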

@yun-long
Contributor

yun-long commented May 2, 2022

I didn't tune the reward. I am not surprised that the result is not good.

Some general suggestions:

  • Tune the reward, and check the learning curve not only for the total reward but also for each individual reward component. Each component is logged; you can visualize the learning curves with cd ./saved and tensorboard --logdir=./.
  • Use a different policy representation. Currently the policy is represented by a multilayer perceptron, which is not a good representation for dynamic environments. Consider a memory-based network, such as an RNN/LSTM/GRU/TCN.

@tongtybj
Contributor Author

tongtybj commented May 2, 2022

Thanks a lot for your important advice!
