if the trajectory stays in the terminal state (for a limited number of times) #6

Open
ArezooAalipanah opened this issue Feb 5, 2024 · 3 comments

Comments

@ArezooAalipanah

Hi, thank you so much for this amazing repo.
I have been trying to build my own environment, but I ran into some issues.
Suppose we have something like this: going from s0 to s1 to s2, and then staying in s3 forever.
(I changed the value iteration so that all my trajectories are now 50 steps long.) So my SVF (state visitation frequency) is something like (1, 1, 1, 47, 0, ..., 0).
However, I am facing some difficulties:
my zs and za grow very large and then become NaN, which makes my omega NaN as well.
I was wondering if you have any idea what the problem is and how I can fix it.
I am reading Dr. Ziebart's thesis but still have no clue how to tackle this problem (since z_terminal is 1, I am thinking that may be what causes it).
If you have any ideas, I would be very grateful if you could share your thoughts.
Thanks again.
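
One common way to keep a backward pass like this from overflowing is to carry log zs and log za instead of zs and za. The following is only a minimal sketch under stated assumptions, not the repo's implementation: the name `log_backward_pass`, the `p_transition[s, s_next, a]` layout, the state-only reward vector, and the toy chain at the bottom are all hypothetical, and the recursion assumed is za = exp(r) * sum over s' of P(s'|s, a) * zs[s'] with z_terminal initialised to 1.

```python
import numpy as np


def log_backward_pass(p_transition, reward, terminal, n_iter):
    """Local action probabilities pi[s, a], computed in log space.

    Assumed (hypothetical) layout: p_transition[s, s_next, a], reward[s].
    """
    n_states = p_transition.shape[0]

    # log(0) = -inf is intended here: impossible transitions contribute nothing.
    with np.errstate(divide="ignore"):
        log_p = np.log(p_transition)

    # zs = 0 everywhere except zs[terminal] = 1, i.e. log zs = -inf / 0.
    log_zs = np.full(n_states, -np.inf)
    log_zs[terminal] = 0.0

    for _ in range(n_iter):
        # log za[s, a] = r[s] + log sum_{s'} P(s'|s, a) * zs[s']
        log_za = reward[:, None] + np.logaddexp.reduce(
            log_p + log_zs[None, :, None], axis=1
        )
        # log zs[s] = log sum_a za[s, a]
        log_zs = np.logaddexp.reduce(log_za, axis=1)

    # Only the ratio za / zs is needed, and it stays in [0, 1].
    return np.exp(log_za - log_zs[:, None])


# Toy chain s0 -> s1 -> s2 -> s3 with s3 absorbing (made up for illustration).
# Action 0 stays in place, action 1 moves one state to the right.
n_states, n_actions = 4, 2
p = np.zeros((n_states, n_states, n_actions))
for s in range(n_states):
    p[s, s, 0] = 1.0
    p[s, min(s + 1, n_states - 1), 1] = 1.0

# Deliberately large rewards: exp(50) multiplied 50 times would overflow.
reward = np.full(n_states, 50.0)
pi = log_backward_pass(p, reward, terminal=3, n_iter=50)
```

Working in log space only prevents overflow in the intermediate values. Whether the learned weights themselves converge is a separate question, and states that can never reach the terminal state would still give NaN in the final ratio.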

@ArezooAalipanah
Author

Here is a bit more info:
my trajectory (I made it of length 40 this time):
image
The first iteration, with an initialization of 1.
The next array is my parameters after the first iteration.
However, after the second iteration they all end up being NaN:
image
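
For reference, the visitation counts of such a trajectory can be checked with a few lines. This is a small sketch assuming states are indexed 0, 1, 2, 3 for s0 to s3 and that there are 6 states in total; both the helper name `state_visitation_counts` and the state count are hypothetical.

```python
import numpy as np


def state_visitation_counts(trajectory, n_states):
    """Count how often each state index appears in one trajectory."""
    return np.bincount(np.asarray(trajectory), minlength=n_states)


# s0 -> s1 -> s2, then s3 for the remaining 37 of 40 steps.
trajectory = [0, 1, 2] + [3] * 37
counts = state_visitation_counts(trajectory, n_states=6)
print(counts.tolist())  # [1, 1, 1, 37, 0, 0]
```

With a fixed horizon, almost all of the visitation mass sits on the absorbing state, so any feature tied to that state dominates the empirical feature expectation.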

@qzed
Owner

qzed commented Apr 16, 2024

Hi, I'm sorry for the long silence. I will likely need some time to look into this, and as I was a bit busy with work-related things lately I never got around to it. Things should be less stressful now, so I will try to look into it this weekend.

@ArezooAalipanah
Author

Thank you so much. I made some modifications, like normalizing the rewards and weights at each iteration to keep them from going to infinity, but I still have to limit the number of iterations, since it never converges. I am looking forward to your insight as well.
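
A workaround of this kind could look roughly like the sketch below. It is one possible reading of "normalizing the weights and limiting the iterations", not the code actually used in this thread: the weight vector is rescaled back onto an L2 ball after every step, and the loop stops either when the gradient is small or when a hard iteration cap is hit. The names `projected_gradient_ascent`, `grad_fn`, `max_norm`, and `tol` are hypothetical, and `grad_fn` stands in for whatever computes the gradient (for example, empirical minus expected feature counts).

```python
import numpy as np


def projected_gradient_ascent(grad_fn, omega_init, lr=0.1, max_norm=1.0,
                              max_iter=200, tol=1e-4):
    """Gradient ascent that keeps ||omega||_2 <= max_norm.

    Returns the final weights and the number of steps taken.
    """
    omega = np.array(omega_init, dtype=float)
    for step in range(max_iter):
        grad = grad_fn(omega)
        if np.linalg.norm(grad) < tol:
            return omega, step  # gradient is small enough: stop early
        omega = omega + lr * grad
        norm = np.linalg.norm(omega)
        if norm > max_norm:
            # Rescale back onto the ball so the weights cannot blow up.
            omega = omega * (max_norm / norm)
    return omega, max_iter  # hit the iteration cap without converging


# Toy check with a made-up concave objective whose maximum is at `target`.
target = np.array([0.3, -0.2])
omega, steps = projected_gradient_ascent(lambda w: target - w, np.zeros(2))
```

Bounding the weights keeps exp(reward) in a safe range, but it also changes the optimisation problem, so the result should be read as a regularised solution rather than the unconstrained optimum.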
