Unable to reproduce the PendulumSwingup results #3

Open
dadadadawjb opened this issue Jun 29, 2024 · 3 comments

@dadadadawjb

Hi team,

Thanks for sharing the great work! I have tried reproducing the PendulumSwingup experiments, both continuous and discontinuous, using the provided scripts and code without any modification. However, the results do not match the performance shown in Figure 3(c) of the paper "CCIL: Continuity-based Data Augmentation for Corrective Imitation Learning". Do any hyperparameters need tuning, or is there anything else I need to change to get those results?

[Attached image: pendulum_results]

Thanks a lot!

@abhaybd
Member

abhaybd commented Jul 2, 2024

Thanks for your interest in our work! The exact numbers vary depending on the seeds tested, but CCIL should almost always outperform BC on the pendulum task. Please ensure you're running with the hyperparameters specified in the corresponding .yml files.

Running the following command on my machine:

./scripts/train_ccil.sh "pendulum_cont pendulum_disc" "40 41 42 43 44 45 46 47 48 49" 0.0001

yields the following results:

+-------------------------------------------------------+-----------+---------------+-----------+
| Task                                                  |     Score |   Score (std) |   # seeds |
+=======================================================+===========+===============+===========+
| PendulumSwingupCont-v0_naive                          | -3335.941 |       132.394 |        10 |
+-------------------------------------------------------+-----------+---------------+-----------+
| PendulumSwingupCont-v0_noisy_action_soft_samplingL2.0 | -2527.913 |       363.101 |        10 |
+-------------------------------------------------------+-----------+---------------+-----------+
| PendulumSwingupDisc-v0_noisy_action_slackL2.0         | -2794.500 |       400.131 |        10 |
+-------------------------------------------------------+-----------+---------------+-----------+
| PendulumSwingupDisc-v0_naive                          | -3001.869 |       231.012 |        10 |
+-------------------------------------------------------+-----------+---------------+-----------+

As you can see, the exact numbers shift from run to run because of seed-to-seed variance, but CCIL still outperforms standard BC on both pendulum variants.
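For readers who want to gauge how large these gaps are relative to the seed-to-seed spread, here is a minimal sketch that computes Welch's t-statistic from the summary statistics in the table above. It assumes the reported standard deviations are sample standard deviations over the 10 seeds and that the runs are independent; neither is confirmed in this thread. The helper name `welch_t` is made up for illustration and is not part of the CCIL codebase.

```python
import math

def welch_t(mean_a, std_a, n_a, mean_b, std_b, n_b):
    """Welch's t-statistic and approximate degrees of freedom from summary stats."""
    # Squared standard error of each mean.
    var_a = std_a ** 2 / n_a
    var_b = std_b ** 2 / n_b
    t = (mean_a - mean_b) / math.sqrt(var_a + var_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (var_a + var_b) ** 2 / (var_a ** 2 / (n_a - 1) + var_b ** 2 / (n_b - 1))
    return t, df

# Values copied from the table above (CCIL variant first, naive BC second).
cont_t, cont_df = welch_t(-2527.913, 363.101, 10, -3335.941, 132.394, 10)
disc_t, disc_df = welch_t(-2794.500, 400.131, 10, -3001.869, 231.012, 10)
print(f"continuous:    t = {cont_t:.2f}, df = {cont_df:.1f}")
print(f"discontinuous: t = {disc_t:.2f}, df = {disc_df:.1f}")
```

Under these assumptions, the continuous gap comes out several times larger in standard-error units than the discontinuous one, so the discontinuous comparison is the one more likely to flip between machines.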

@dadadadawjb
Author

Thanks for your prompt reply! I understand, but in my runs, especially on the discontinuous Pendulum, the performance of CCIL (-2912.906) and naive BC (-2978.408) is hard to distinguish, even with the same 10 random seeds ("40 41 42 43 44 45 46 47 48 49"). Do you have any suggestions?
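One way to judge a small gap like this is to compare the two methods seed by seed rather than through aggregate means. The sketch below is an illustration under assumptions, not part of the CCIL codebase: it presumes one score per seed can be extracted for each method (the repository's output format is not shown in this thread), and `paired_gap` is a hypothetical helper.

```python
import math
import statistics

def paired_gap(ccil_scores, bc_scores):
    """Mean per-seed difference (CCIL minus BC) and its standard error.

    Both lists must be ordered by the same seeds.
    """
    if len(ccil_scores) != len(bc_scores):
        raise ValueError("need one score per seed for each method")
    if len(ccil_scores) < 2:
        raise ValueError("need at least two seeds")
    # Per-seed differences remove variation shared by both methods.
    diffs = [c - b for c, b in zip(ccil_scores, bc_scores)]
    mean_diff = statistics.mean(diffs)
    std_err = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return mean_diff, std_err
```

A mean difference within roughly two standard errors of zero suggests the methods are hard to tell apart on these seeds. Pairing only tightens the estimate if the same seed induces shared randomness in both runs, for example identical evaluation start states.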

@Kelym
Member

Kelym commented Aug 6, 2024

Thanks for bringing this to our attention. It seems there is more variance than we initially realized on PendulumDiscontinuous. (We validated our config on 10 random seeds and 2 machines.) If we can reproduce the runs that show no performance gap, we can sweep parameters from there and may be able to update the config.

In the meantime, have you had a chance to verify the performance on the other task suites? We want to double-check whether this is only a stochasticity problem in PendulumDiscontinuous or whether there is more to it.
