environment? #13

Open
xss1006 opened this issue Dec 26, 2024 · 10 comments

@xss1006 commented Dec 26, 2024

Hello, thank you for your work!
I would like to ask: when normalizing the state, what is the basis for choosing these scaling values? For example:
delta_yaw_t = np.array(self.state_info['delta_yaw_t']).reshape((1, )) / 2.0
dyaw_dt_t = np.array(self.state_info['dyaw_dt_t']).reshape((1, )) / 5.0
lateral_dist_t = self.state_info['lateral_dist_t'].reshape((1, )) * 10.0
action_last = self.state_info['action_t_1'] * 10.0
future_angles = self.state_info['angles_t'] / 2.0

What do the values you divide or multiply by represent? Your answer would be of great help to me. I look forward to your response.
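In other words, my reading is that each constant is a hand-picked per-feature scale: every raw feature is divided by a rough estimate of its typical magnitude so that all features reach the network at a comparable scale. A minimal sketch of that reading, where the assumed ranges in the comments are my guesses rather than values from the code:

import numpy as np

# Hypothetical per-feature scales. Each entry is a rough guess at the
# feature's typical magnitude; dividing by it brings the feature to
# roughly unit scale. The ranges in the comments are assumptions for
# illustration, not values confirmed by the repository.
SCALES = {
    'delta_yaw_t':    2.0,   # heading error, assumed to span about [-2, 2]
    'dyaw_dt_t':      5.0,   # yaw rate, assumed to span about [-5, 5]
    'lateral_dist_t': 0.1,   # lateral offset, assumed small; * 10.0 == / 0.1
    'action_t_1':     0.1,   # previous action, assumed small; * 10.0 == / 0.1
    'angles_t':       2.0,   # future waypoint angles, same scale as delta_yaw
}

def normalize_state(state_info):
    """Divide each raw feature by its assumed typical magnitude."""
    return {key: np.asarray(state_info[key]) / scale
            for key, scale in SCALES.items()}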

@ShuaibinLi (Owner)

There is no specific physical meaning; the purpose is just to normalize the state.

@ShuaibinLi (Owner)

It is similar to action normalization in continuous environments.
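That is, in continuous-control setups the policy typically outputs actions in [-1, 1] and a wrapper rescales them to the environment's true bounds; the state scaling above plays the analogous role for observations. A generic sketch of that standard pattern (not code from this repository; the bounds are illustrative):

import numpy as np

def rescale_action(action, low, high):
    # Map a policy action in [-1, 1] to the env's true bounds [low, high].
    action = np.clip(action, -1.0, 1.0)
    return low + 0.5 * (action + 1.0) * (high - low)

# Illustrative steering/throttle bounds, not taken from this repository.
env_low = np.array([-0.8, 0.0])
env_high = np.array([0.8, 1.0])
print(rescale_action(np.array([0.0, 1.0]), env_low, env_high))  # [0. 1.]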

@xss1006 (Author) commented Jan 13, 2025

> It is similar to action normalization in continuous environments.

Thank you for your reply. If I set these to different values, will it have a significant impact?

@ShuaibinLi (Owner) commented Jan 13, 2025

> Thank you for your reply. If I set these to different values, will it have a significant impact?

Maybe; it is worth trying! I hope it brings better results.

@ShuaibinLi (Owner)

https://github.com/ikostrikov/pytorch-a2c-ppo-acktr-gail/blob/master/a2c_ppo_acktr/envs.py some hints?

@xss1006 (Author) commented Jan 13, 2025

> https://github.com/ikostrikov/pytorch-a2c-ppo-acktr-gail/blob/master/a2c_ppo_acktr/envs.py some hints?

Thank you for your suggestion. I'll take a look at it later. I noticed that in your code the states are not normalized to a range of [0, 1] or [-1, 1]; instead, each one is divided by a constant so that it falls within a different range, right?
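For reference, the linked envs.py takes a different approach: it normalizes observations with running statistics gathered during training (a VecNormalize-style wrapper) rather than with fixed per-feature constants. A simplified sketch of that running mean/variance idea, not the linked code itself:

import numpy as np

class RunningNorm:
    # Track running mean/variance and whiten observations with them,
    # the idea behind VecNormalize-style wrappers.
    def __init__(self, shape, eps=1e-8, clip=10.0):
        self.mean = np.zeros(shape)
        self.var = np.ones(shape)
        self.count = eps
        self.eps, self.clip = eps, clip

    def update(self, x):
        # Combine batch moments with the running moments (parallel variance).
        batch_mean, batch_var, n = x.mean(axis=0), x.var(axis=0), x.shape[0]
        delta = batch_mean - self.mean
        total = self.count + n
        self.mean = self.mean + delta * n / total
        m_a = self.var * self.count
        m_b = batch_var * n
        self.var = (m_a + m_b + delta ** 2 * self.count * n / total) / total
        self.count = total

    def __call__(self, x):
        # Whiten, then clip to keep extreme observations bounded.
        return np.clip((x - self.mean) / np.sqrt(self.var + self.eps),
                       -self.clip, self.clip)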

@ShuaibinLi (Owner)

> I noticed that in your code the states are not normalized to a range of [0, 1] or [-1, 1]; instead, each one is divided by a constant so that it falls within a different range, right?

Yes, because the features have different meanings. Actually, I did not run many experiments here.

@xss1006 (Author) commented Jan 13, 2025

> Yes, because the features have different meanings. Actually, I did not run many experiments here.

Sure. May I ask: if the states are simply converted from one range to another, what is the significance of normalizing the state?

@ShuaibinLi (Owner)

> May I ask: if the states are simply converted from one range to another, what is the significance of normalizing the state?

Great question! Normalizing states and actions improves the efficiency and stability of learning, especially in highly dynamic environments (e.g. MuJoCo). While mapping to [0, 1] or [-1, 1] may be a good choice for actions, I think it is best not to normalize the state too aggressively, for better generalization (this needs experiments).
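To make the trade-off concrete: a strict map to [-1, 1] (e.g. via clipping) saturates on out-of-range inputs, while a loose divide-by-scale keeps them distinguishable. A small illustration with made-up values:

import numpy as np

x = np.array([1.0, 2.0, 4.0, 8.0])     # raw feature values, e.g. heading error

loose = x / 2.0                         # loose scaling: unbounded but comparable
strict = np.clip(x / 2.0, -1.0, 1.0)    # strict [-1, 1]: saturates past the scale

print(loose)   # [0.5 1.  2.  4. ]  -> out-of-range states stay distinguishable
print(strict)  # [0.5 1.  1.  1. ]  -> everything past the scale collapses to 1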

@xss1006 (Author) commented Jan 13, 2025

> Great question! Normalizing states and actions improves the efficiency and stability of learning, especially in highly dynamic environments (e.g. MuJoCo). While mapping to [0, 1] or [-1, 1] may be a good choice for actions, I think it is best not to normalize the state too aggressively, for better generalization (this needs experiments).

Thank you very much for your reply; it has been very helpful. When you normalize the state, is it based on experience, as in delta_yaw_t = np.array(self.state_info['delta_yaw_t']).reshape((1, )) / 2.0? I would like to know more about why you divide by these specific values when processing different states. What is the reasoning behind choosing these particular numbers?
