environment? #13

Open
xss1006 opened this issue Dec 26, 2024 · 10 comments

@xss1006 commented Dec 26, 2024

Hello, thank you for your work!
I would like to ask: when normalizing the state, what is the basis for choosing these scaling values? For example:
delta_yaw_t = np.array(self.state_info['delta_yaw_t']).reshape((1, )) / 2.0
dyaw_dt_t = np.array(self.state_info['dyaw_dt_t']).reshape((1, )) / 5.0
lateral_dist_t = self.state_info['lateral_dist_t'].reshape((1, )) * 10.0
action_last = self.state_info['action_t_1'] * 10.0
future_angles = self.state_info['angles_t'] / 2.0

What do the values you divide or multiply by represent? Your answer would be of great help to me. I look forward to your response.
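In other words, my reading is that each constant is a hand-picked per-feature scale: every raw feature is divided by a rough estimate of its typical magnitude so that all features reach the network at a comparable scale. A minimal sketch of that reading, where the assumed ranges in the comments are my guesses rather than values from the code:

import numpy as np

# Hypothetical per-feature scales. Each entry is a rough guess at the
# feature's typical magnitude; dividing by it brings the feature to
# roughly unit scale. The ranges in the comments are assumptions for
# illustration, not values confirmed by the repository.
SCALES = {
    'delta_yaw_t':    2.0,   # heading error, assumed to span about [-2, 2]
    'dyaw_dt_t':      5.0,   # yaw rate, assumed to span about [-5, 5]
    'lateral_dist_t': 0.1,   # lateral offset, assumed small; * 10.0 == / 0.1
    'action_t_1':     0.1,   # previous action, assumed small; * 10.0 == / 0.1
    'angles_t':       2.0,   # future waypoint angles, same scale as delta_yaw
}

def normalize_state(state_info):
    """Divide each raw feature by its assumed typical magnitude."""
    return {key: np.asarray(state_info[key]) / scale
            for key, scale in SCALES.items()}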

@ShuaibinLi (Owner)

There is no specific physical meaning; the purpose is just to normalize the state.

@ShuaibinLi (Owner)

It is similar to action normalization in continuous environments.
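That is, in continuous-control setups the policy typically outputs actions in [-1, 1] and a wrapper rescales them to the environment's true bounds; the state scaling above plays the analogous role for observations. A generic sketch of that standard pattern (not code from this repository; the bounds are illustrative):

import numpy as np

def rescale_action(action, low, high):
    # Map a policy action in [-1, 1] to the env's true bounds [low, high].
    action = np.clip(action, -1.0, 1.0)
    return low + 0.5 * (action + 1.0) * (high - low)

# Illustrative steering/throttle bounds, not taken from this repository.
env_low = np.array([-0.8, 0.0])
env_high = np.array([0.8, 1.0])
print(rescale_action(np.array([0.0, 1.0]), env_low, env_high))  # [0. 1.]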

@xss1006 (Author) commented Jan 13, 2025

> It is similar to action normalization in continuous environments.

Thank you for your reply. If I set these to different values, will it have a significant impact?

@ShuaibinLi (Owner) commented Jan 13, 2025

> Thank you for your reply. If I set these to different values, will it have a significant impact?

Maybe; it is worth trying! I hope it brings better results.

@ShuaibinLi (Owner)

https://github.com/ikostrikov/pytorch-a2c-ppo-acktr-gail/blob/master/a2c_ppo_acktr/envs.py some hints?

@xss1006 (Author) commented Jan 13, 2025

> https://github.com/ikostrikov/pytorch-a2c-ppo-acktr-gail/blob/master/a2c_ppo_acktr/envs.py some hints?

Thank you for your suggestion. I'll take a look at it later. I noticed that in your code the states are not normalized to a range of [0, 1] or [-1, 1]; instead, each one is divided by a constant so that it falls within a different range, right?
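For reference, the linked envs.py takes a different approach: it normalizes observations with running statistics gathered during training (a VecNormalize-style wrapper) rather than with fixed per-feature constants. A simplified sketch of that running mean/variance idea, not the linked code itself:

import numpy as np

class RunningNorm:
    # Track running mean/variance and whiten observations with them,
    # the idea behind VecNormalize-style wrappers.
    def __init__(self, shape, eps=1e-8, clip=10.0):
        self.mean = np.zeros(shape)
        self.var = np.ones(shape)
        self.count = eps
        self.eps, self.clip = eps, clip

    def update(self, x):
        # Combine batch moments with the running moments (parallel variance).
        batch_mean, batch_var, n = x.mean(axis=0), x.var(axis=0), x.shape[0]
        delta = batch_mean - self.mean
        total = self.count + n
        self.mean = self.mean + delta * n / total
        m_a = self.var * self.count
        m_b = batch_var * n
        self.var = (m_a + m_b + delta ** 2 * self.count * n / total) / total
        self.count = total

    def __call__(self, x):
        # Whiten, then clip to keep extreme observations bounded.
        return np.clip((x - self.mean) / np.sqrt(self.var + self.eps),
                       -self.clip, self.clip)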

@ShuaibinLi (Owner)

> I noticed that in your code the states are not normalized to a range of [0, 1] or [-1, 1]; instead, each one is divided by a constant so that it falls within a different range, right?

Yes, because the features have different meanings. Actually, I did not run many experiments here.

@xss1006 (Author) commented Jan 13, 2025

> Yes, because the features have different meanings. Actually, I did not run many experiments here.

Sure. May I ask: if the states are simply converted from one range to another, what is the significance of normalizing the state?

@ShuaibinLi (Owner)

> May I ask: if the states are simply converted from one range to another, what is the significance of normalizing the state?

Great question! Normalizing states and actions improves the efficiency and stability of learning, especially in highly dynamic environments (e.g. MuJoCo). While mapping to [0, 1] or [-1, 1] may be a good choice for actions, I think it is best not to normalize the state too aggressively, for better generalization (this needs experiments).
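To make the trade-off concrete: a strict map to [-1, 1] (e.g. via clipping) saturates on out-of-range inputs, while a loose divide-by-scale keeps them distinguishable. A small illustration with made-up values:

import numpy as np

x = np.array([1.0, 2.0, 4.0, 8.0])     # raw feature values, e.g. heading error

loose = x / 2.0                         # loose scaling: unbounded but comparable
strict = np.clip(x / 2.0, -1.0, 1.0)    # strict [-1, 1]: saturates past the scale

print(loose)   # [0.5 1.  2.  4. ]  -> out-of-range states stay distinguishable
print(strict)  # [0.5 1.  1.  1. ]  -> everything past the scale collapses to 1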

@xss1006 (Author) commented Jan 13, 2025

> Great question! Normalizing states and actions improves the efficiency and stability of learning, especially in highly dynamic environments (e.g. MuJoCo). While mapping to [0, 1] or [-1, 1] may be a good choice for actions, I think it is best not to normalize the state too aggressively, for better generalization (this needs experiments).

Thank you very much for your reply; it has been very helpful. When you normalize the state, is it based on experience, as in delta_yaw_t = np.array(self.state_info['delta_yaw_t']).reshape((1, )) / 2.0? I would like to know more about why you divide by these specific values when processing different states. What is the reasoning behind choosing these particular numbers?
