
Unable to reproduce results on Cityscapes #7

Open
benearnthof opened this issue Sep 12, 2024 · 4 comments

@benearnthof commented Sep 12, 2024

After running the preprocessing and training scripts as provided in this repository, I was unable to replicate the results of EAGLE on the Cityscapes dataset. I trained with the configs presented here and adjusted the training hyperparameters to match those stored in the Cityscapes checkpoint the authors provide via Google Drive. Even after 25,000 training steps on Cityscapes, cluster test accuracy only reaches 67%. I have attached an output plot to this issue. Could you provide insight into how to replicate the results? I've noticed that the SOTA checkpoint contains additional clustering parameters that are not used in the train config files here. Do you perform an additional post-processing step?
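
For anyone comparing their own hyperparameters against the released weights, here is a quick way to inspect what a downloaded checkpoint was trained with, assuming it was saved by PyTorch Lightning as in STEGO-style codebases (the filename below is a placeholder):

```python
import torch

# Load the downloaded checkpoint on CPU; the path is a placeholder.
ckpt = torch.load("cityscapes_checkpoint.ckpt", map_location="cpu")

# Lightning checkpoints store the hyperparameters they were trained with
# under the "hyper_parameters" key; fall back to an empty dict otherwise.
for key, value in ckpt.get("hyper_parameters", {}).items():
    print(f"{key}: {value}")
```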

This is what my results look like after 25k training steps on Cityscapes.

@kochanha (Collaborator)

Hi, thanks for your interest in our work.
For the Cityscapes dataset, we trained on a single GPU and used a weight from around 2.9K steps as our final weight.
Also, the picture you attached shows the result before applying the CRF. You can get the post-CRF results via eval_segmentation.py; note that there is a significant performance difference between what you see in wandb (before the CRF) and the numbers after the CRF.
Here are my wandb results for reference.
[two wandb screenshots attached]
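
For readers unfamiliar with this step, here is a minimal sketch of the kind of dense-CRF refinement applied at evaluation time, following the common pydensecrf recipe used in STEGO-derived codebases; the kernel parameters are illustrative assumptions, not necessarily the values in eval_segmentation.py:

```python
import numpy as np
import pydensecrf.densecrf as dcrf
from pydensecrf.utils import unary_from_softmax

def crf_refine(rgb, probs, n_iters=10):
    """Refine soft per-pixel class probabilities with a fully connected CRF.

    rgb:   (H, W, 3) uint8 image
    probs: (C, H, W) float32 softmax / cluster probabilities
    """
    C, H, W = probs.shape
    d = dcrf.DenseCRF2D(W, H, C)
    # Unary term: negative log of the predicted probabilities.
    d.setUnaryEnergy(np.ascontiguousarray(unary_from_softmax(probs)))
    # Pairwise terms: a smoothness kernel (location only) and an appearance
    # kernel (location + color). Kernel widths here are illustrative.
    d.addPairwiseGaussian(sxy=3, compat=3)
    d.addPairwiseBilateral(sxy=80, srgb=13,
                           rgbim=np.ascontiguousarray(rgb), compat=10)
    q = np.array(d.inference(n_iters)).reshape(C, H, W)
    return q.argmax(axis=0)  # (H, W) refined label map
```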

@benearnthof (Author)

That must be what I've been missing; I had assumed the CRF would not make that much of a difference. I'll report back after I run the eval script. Thank you very much for the swift reply!

@benearnthof (Author)

After running the evaluation on my own Cityscapes checkpoint, I obtain the following metrics:

{
'final/linear/mIoU': 32.06715285778046, 
'final/linear/Accuracy': 90.96953272819519, 
'assignments': [9, 8, 4, 6, 14, 5, 7, 11, 3, 18, 16, 26, 20, 12, 22, 23, 0, 1, 24, 10, 19, 25, 13, 15, 21, 2, 17], 
'final/cluster/mIoU': 15.385963022708893,
'final/cluster/Accuracy': 73.27690720558167
}

Seems like the CRF post-processing does a lot of heavy lifting. I'll rerun training with 3,000 steps as you recommended and report back. Thanks a lot for the help!
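
For context, the `assignments` list above is the cluster-to-class mapping found by Hungarian matching on the confusion matrix; here is a minimal sketch of that step with SciPy (the repo's own implementation may differ in details, and this assumes the number of clusters equals the number of classes):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_clusters(preds, labels, n_classes):
    """Map unsupervised cluster IDs to ground-truth classes.

    preds, labels: flat integer arrays of equal length.
    Returns one ground-truth class per cluster ID, maximizing total overlap.
    """
    # Confusion matrix: rows = predicted clusters, cols = true classes.
    conf = np.zeros((n_classes, n_classes), dtype=np.int64)
    np.add.at(conf, (preds, labels), 1)
    # Hungarian matching, maximizing the matched pixel count.
    row, col = linear_sum_assignment(conf, maximize=True)
    return col[np.argsort(row)]  # assignments[cluster_id] -> class_id
```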

@benearnthof (Author) commented Sep 12, 2024

I've evaluated five more checkpoints, each trained for 5,000 steps; for each run I picked the highest-performing checkpoint and ran it through the evaluation script. (Each training run saved checkpoints every 10 steps, as was suggested to me in another issue.) The mean cluster accuracy on Cityscapes is 70.4, and the maximum accuracy I obtained after CRF evaluation was 74.1. That's a lot better than my previous results, but still quite far from the performance reported in the paper. Did you do any additional hyperparameter tuning?
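
For anyone reproducing this sweep, saving a checkpoint every 10 steps can be configured as below, assuming the training loop is PyTorch Lightning as in STEGO-style repos (the directory path and trainer arguments are illustrative):

```python
from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import ModelCheckpoint

# Keep every checkpoint (save_top_k=-1) at a fixed step interval, so any
# step near the ~2.9K sweet spot can be evaluated afterwards with the CRF.
checkpoint_cb = ModelCheckpoint(
    dirpath="checkpoints/cityscapes",  # illustrative path
    every_n_train_steps=10,
    save_top_k=-1,
)

trainer = Trainer(max_steps=5000, callbacks=[checkpoint_cb])
# trainer.fit(model, datamodule)  # model/datamodule come from the repo's train script
```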
