
Unable to reproduce results on Cityscapes #7

Open
benearnthof opened this issue Sep 12, 2024 · 4 comments

@benearnthof commented Sep 12, 2024

After running the preprocessing and training scripts as provided in this repository, I was unable to replicate the results of EAGLE on the Cityscapes dataset. I trained with the configs presented here and adjusted the training hyperparameters to match those stored in the Cityscapes checkpoint the authors provide via Google Drive. Even after 25,000 training steps on Cityscapes, cluster test accuracy only reaches 67%. I have attached an output plot to this issue. Could you provide insight into how to replicate the results? I've noticed that the SOTA checkpoint contains additional clustering parameters that are not used in the train config files here. Do you perform an additional post-processing step?
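
For anyone comparing their own hyperparameters against the released weights, here is a quick way to inspect what a downloaded checkpoint was trained with, assuming it was saved by PyTorch Lightning as in STEGO-style codebases (the filename below is a placeholder):

```python
import torch

# Load the downloaded checkpoint on CPU; the path is a placeholder.
ckpt = torch.load("cityscapes_checkpoint.ckpt", map_location="cpu")

# Lightning checkpoints store the hyperparameters they were trained with
# under the "hyper_parameters" key; fall back to an empty dict otherwise.
for key, value in ckpt.get("hyper_parameters", {}).items():
    print(f"{key}: {value}")
```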

This is what my results look like after 25k training steps on Cityscapes.

@kochanha (Collaborator)

Hi, thanks for your interest in our work.
For the Cityscapes dataset, we trained on a single GPU and used a weight from around 2.9K steps as our final weight.
Also, the picture you attached shows the result before applying the CRF. You can get the post-CRF results via eval_segmentation.py; note that there is a significant performance difference between what you see in wandb (before the CRF) and the numbers after the CRF.
Here are my wandb results for reference.
[two wandb screenshots attached]
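
For readers unfamiliar with this step, here is a minimal sketch of the kind of dense-CRF refinement applied at evaluation time, following the common pydensecrf recipe used in STEGO-derived codebases; the kernel parameters are illustrative assumptions, not necessarily the values in eval_segmentation.py:

```python
import numpy as np
import pydensecrf.densecrf as dcrf
from pydensecrf.utils import unary_from_softmax

def crf_refine(rgb, probs, n_iters=10):
    """Refine soft per-pixel class probabilities with a fully connected CRF.

    rgb:   (H, W, 3) uint8 image
    probs: (C, H, W) float32 softmax / cluster probabilities
    """
    C, H, W = probs.shape
    d = dcrf.DenseCRF2D(W, H, C)
    # Unary term: negative log of the predicted probabilities.
    d.setUnaryEnergy(np.ascontiguousarray(unary_from_softmax(probs)))
    # Pairwise terms: a smoothness kernel (location only) and an appearance
    # kernel (location + color). Kernel widths here are illustrative.
    d.addPairwiseGaussian(sxy=3, compat=3)
    d.addPairwiseBilateral(sxy=80, srgb=13,
                           rgbim=np.ascontiguousarray(rgb), compat=10)
    q = np.array(d.inference(n_iters)).reshape(C, H, W)
    return q.argmax(axis=0)  # (H, W) refined label map
```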

@benearnthof (Author)

That must be what I've been missing; I had assumed the CRF would not make that much of a difference. I'll report back after I run the eval script. Thank you very much for the swift reply!

@benearnthof (Author)

After running the evaluation on my own Cityscapes checkpoint, I obtain the following metrics:

{
'final/linear/mIoU': 32.06715285778046, 
'final/linear/Accuracy': 90.96953272819519, 
'assignments': [9, 8, 4, 6, 14, 5, 7, 11, 3, 18, 16, 26, 20, 12, 22, 23, 0, 1, 24, 10, 19, 25, 13, 15, 21, 2, 17], 
'final/cluster/mIoU': 15.385963022708893,
'final/cluster/Accuracy': 73.27690720558167
}

Seems like the CRF post-processing does a lot of heavy lifting. I'll rerun training with 3,000 steps as you recommended and report back. Thanks a lot for the help!
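
For context, the `assignments` list above is the cluster-to-class mapping found by Hungarian matching on the confusion matrix; here is a minimal sketch of that step with SciPy (the repo's own implementation may differ in details, and this assumes the number of clusters equals the number of classes):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_clusters(preds, labels, n_classes):
    """Map unsupervised cluster IDs to ground-truth classes.

    preds, labels: flat integer arrays of equal length.
    Returns one ground-truth class per cluster ID, maximizing total overlap.
    """
    # Confusion matrix: rows = predicted clusters, cols = true classes.
    conf = np.zeros((n_classes, n_classes), dtype=np.int64)
    np.add.at(conf, (preds, labels), 1)
    # Hungarian matching, maximizing the matched pixel count.
    row, col = linear_sum_assignment(conf, maximize=True)
    return col[np.argsort(row)]  # assignments[cluster_id] -> class_id
```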

@benearnthof (Author) commented Sep 12, 2024

I've evaluated five more checkpoints, each trained for 5,000 steps; for each run I picked the highest-performing checkpoint and ran it through the evaluation script. (Each training run saved checkpoints every 10 steps, as was suggested to me in another issue.) The mean cluster accuracy on Cityscapes is 70.4, and the maximum accuracy I obtained after CRF evaluation was 74.1. That's a lot better than my previous results, but still quite far from the performance reported in the paper. Did you do any additional hyperparameter tuning?
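
For anyone reproducing this sweep, saving a checkpoint every 10 steps can be configured as below, assuming the training loop is PyTorch Lightning as in STEGO-style repos (the directory path and trainer arguments are illustrative):

```python
from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import ModelCheckpoint

# Keep every checkpoint (save_top_k=-1) at a fixed step interval, so any
# step near the ~2.9K sweet spot can be evaluated afterwards with the CRF.
checkpoint_cb = ModelCheckpoint(
    dirpath="checkpoints/cityscapes",  # illustrative path
    every_n_train_steps=10,
    save_top_k=-1,
)

trainer = Trainer(max_steps=5000, callbacks=[checkpoint_cb])
# trainer.fit(model, datamodule)  # model/datamodule come from the repo's train script
```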
