Training slows down #25

Closed · GoingMyWay opened this issue Jun 30, 2022 · 5 comments

Comments

@GoingMyWay commented Jun 30, 2022

Hi, I am using epymarl to train on Melting Pot, and I replaced the RNN+MLP network with an RNN+CNN. With both IPPO and IA2C, the time cost for learning each batch increases over time, as shown below. I tried hard to debug it but could not find the reason. I even called th.cuda.empty_cache() and th.cuda.synchronize(device=th.device("cuda")), but it did not help. The following figure shows the average time cost of the past 10 updates.

[figure: average time cost of the past 10 updates]
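
For reference, here is a minimal sketch of the kind of per-update timing described above; the `timed_update` wrapper and the `learner.train(batch)` call are hypothetical placeholders (not epymarl's actual API), and the sketch synchronizes CUDA before reading the clock so queued GPU work is included in the measurement:

```python
import time
from collections import deque

import torch as th

# Hypothetical sketch: time each learner update with CUDA synchronization and
# report the average over the last 10 updates, as described above.
recent_costs = deque(maxlen=10)

def timed_update(learner, batch):
    if th.cuda.is_available():
        th.cuda.synchronize()          # flush queued GPU work before starting the clock
    start = time.time()

    learner.train(batch)               # placeholder for the actual update call

    if th.cuda.is_available():
        th.cuda.synchronize()          # include this update's GPU work in the timing
    recent_costs.append(time.time() - start)
    return sum(recent_costs) / len(recent_costs)  # avg cost of the past <=10 updates
```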

Did you also encounter such an issue?

@papoudakis (Member)

Do the batch dimensions change over time? For example, what is the time dimension of the batch at the first update, and what is it after 30M steps?

@GoingMyWay (Author) commented Jun 30, 2022

> Do the batch dimensions change over time? For example, what is the time dimension of the batch at the first update, and what is it after 30M steps?

Hi, the batch dimensions do not change over time. The time dimension in my case is 51 and the batch size is 60, i.e. 60 trajectories, each of length 51.
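
As a hedged illustration of the shape check being discussed (the tensor layout and sizes below are made up to match the numbers above, not epymarl's actual batch structure), one could assert at every update that the time dimension stays constant:

```python
import torch as th

# Hypothetical batch layout matching the numbers above:
# [batch_size, time, ...] = [60 trajectories, 51 timesteps, ...]
batch = th.zeros(60, 51, 3, 64)   # e.g. 3 agents, 64-dim observations (made-up sizes)

EXPECTED_TIME_DIM = 51

def check_batch(batch):
    # If the slowdown came from episodes getting longer, this assertion would fail.
    assert batch.shape[1] == EXPECTED_TIME_DIM, (
        f"time dimension changed: {batch.shape[1]} != {EXPECTED_TIME_DIM}"
    )

check_batch(batch)
```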

@GoingMyWay (Author)

I suspect the extra components I introduced caused the issue. With vanilla epymarl, the time cost per update on Foraging-2s-8x8-2p-2f-coop-v2 trained with IPPO did not increase over time, as shown below. I am closing this issue.

[image: time cost per update with vanilla epymarl + IPPO on Foraging-2s-8x8-2p-2f-coop-v2]

GoingMyWay reopened this Jul 4, 2022
@GoingMyWay (Author) commented Jul 4, 2022

> I suspect the extra components I introduced caused the issue. With vanilla epymarl, the time cost per update on Foraging-2s-8x8-2p-2f-coop-v2 trained with IPPO did not increase over time, as shown below. I am closing this issue.
>
> [image: time cost per update with vanilla epymarl + IPPO on Foraging-2s-8x8-2p-2f-coop-v2]

@papoudakis Hi, I tried to debug the code by training Melting Pot with MLP+IPPO (previously I used a CNN for Melting Pot), and I found that the time cost per update increases, as shown below (3 seeds). Normally the time cost per update is about 0.9, but some updates take longer than that, and this pattern occurs frequently. As you can see, the pattern in my earlier figure is not normal. I wonder whether there are any potential issues in the IPPO learner code? For what it's worth, I did not change the IPPO learner at all; I only changed the agent's network for training on Melting Pot.

[image: time cost per update with MLP+IPPO on Melting Pot, 3 seeds]

The following shows the time cost before the 30M-step mark of the run above.

[image: time cost per update before 30M steps]

Do you have any clue?
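
Not from the thread, but as a hedged debugging sketch for this kind of question: timing each stage of a PPO-style update separately can show whether the growth comes from the forward pass, the backward pass, the optimizer step, or something outside the learner (e.g. logging). The stage names and the commented-out calls below are hypothetical placeholders, not epymarl's actual IPPO learner API:

```python
import time
from collections import defaultdict

import torch as th

stage_times = defaultdict(list)

def timed(name, fn, *args, **kwargs):
    """Run fn and record its wall-clock time (CUDA-synchronized) under `name`."""
    if th.cuda.is_available():
        th.cuda.synchronize()
    start = time.time()
    out = fn(*args, **kwargs)
    if th.cuda.is_available():
        th.cuda.synchronize()
    stage_times[name].append(time.time() - start)
    return out

# Hypothetical usage inside one update; the function/attribute names are placeholders.
# loss = timed("forward", compute_ippo_loss, batch)
# timed("backward", loss.backward)
# timed("optimizer", optimiser.step)
# timed("logging", log_training_stats, stats)
```

Plotting `stage_times` per stage over training would make the growing component stand out.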

@GoingMyWay (Author)

Hi, this turned out to be an issue with sacred: IDSIA/sacred#877
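
As an illustrative aside (not the fix from the linked sacred issue), one way to confirm that experiment tracking rather than the learner dominates the growing update time is to time the metric-logging call in isolation each update. Sacred's `Run.log_scalar` is a real API; the surrounding function and argument names here are hypothetical:

```python
import time

def log_metrics_timed(_run, metrics, step):
    """Log each metric via sacred's Run.log_scalar and return the time spent logging.

    If this number grows steadily across updates, the slowdown lives in the
    experiment-tracking layer rather than in the IPPO learner itself.
    """
    start = time.time()
    for name, value in metrics.items():
        _run.log_scalar(name, float(value), step)
    return time.time() - start
```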
