Training slows down #25
Do the batch dimensions change through time?
Hi, the batch dimensions do not change over time. The time dimension in my case is 51 and the batch size is 60, meaning 60 trajectories, each of length 51.
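For illustration, the layout described above would look like the sketch below; the tensor name and the feature size are placeholders, not epymarl code.

```python
import torch as th

# Placeholder illustration of the batch layout described above:
# 60 trajectories, each of length 51, with some per-step feature vector.
batch_size, seq_len, feat_dim = 60, 51, 8   # feat_dim is an arbitrary placeholder
obs = th.zeros(batch_size, seq_len, feat_dim)
print(obs.shape)   # torch.Size([60, 51, 8])
```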
@papoudakis Hi, I tried to debug the code by training Melting Pot with MLP+IPPO (for Melting Pot itself I used a CNN), and I found that the time cost per update increases, as shown below (3 seeds). Normally the time cost is about 0.9, but some updates take longer than that, and this pattern occurs frequently. As you can see in my previous figure, such a pattern is not normal. I wonder whether there are any potential issues in the code of the IPPO learner. By the way, I did not change the IPPO learner itself; I only changed the agent's network for training Melting Pot. The following is the time cost before the 30M steps of the above figure. Do you have any clue?
Hi, this is an issue with sacred: IDSIA/sacred#877
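In case it helps others hitting the same slowdown: a minimal sketch of timing the gradient update separately from the logging/bookkeeping, to check whether the growth comes from the learner or from the experiment framework (e.g. sacred). The names `learner`, `batch`, `logger`, and `log_stat` below are hypothetical placeholders, not epymarl's actual API.

```python
import time
import torch as th

def timed_update(learner, batch, logger, t_env):
    """Time the learner update and the logging step separately (hypothetical names)."""
    if th.cuda.is_available():
        th.cuda.synchronize()        # flush pending GPU work before timing
    t0 = time.perf_counter()
    learner.train(batch, t_env)      # the actual IPPO/IA2C gradient update
    if th.cuda.is_available():
        th.cuda.synchronize()        # wait for the update's kernels to finish
    t1 = time.perf_counter()
    logger.log_stat("update_time", t1 - t0, t_env)
    t2 = time.perf_counter()
    logger.log_stat("logging_time", t2 - t1, t_env)
    return t1 - t0, t2 - t1
```

If `update_time` stays flat while the total per-update wall-clock time keeps growing, the slowdown is outside the learner, which is consistent with the sacred issue linked above.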
Hi, I am using epymarl to train Melting Pot, and I replaced the RNN+MLP network with RNN+CNN. I found in IPPO and IA2C that the time cost for learning each batch increases over time, as shown below. I tried my best to debug it but found it very hard to pin down the reason. I even used `th.cuda.empty_cache()` and `th.cuda.synchronize(device=th.device("cuda"))`, but it did not help. The following figure shows the average time cost of the past 10 updates. Did you also encounter such an issue?
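For reference, a minimal sketch (not epymarl code; `update_fn` is a placeholder for one gradient update, and a CUDA device is assumed) of comparing wall-clock time with GPU-only time using CUDA events. If the wall-clock time keeps growing while the event time stays flat, the slowdown is on the host side rather than in the CNN/RNN forward-backward pass.

```python
import time
import torch as th

def profile_update(update_fn):
    """Return (wall_clock_seconds, gpu_seconds) for one update (assumes CUDA)."""
    start_evt = th.cuda.Event(enable_timing=True)
    end_evt = th.cuda.Event(enable_timing=True)

    t0 = time.perf_counter()
    start_evt.record()
    update_fn()                      # one IPPO/IA2C gradient update
    end_evt.record()
    th.cuda.synchronize()            # required before reading event timings
    wall = time.perf_counter() - t0
    gpu = start_evt.elapsed_time(end_evt) / 1000.0   # elapsed_time returns ms
    return wall, gpu
```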