Issues about evaluation performance #18

Open

Ethereal0725 opened this issue Nov 26, 2024 · 3 comments

@Ethereal0725

Hello, we used the models you provided to test on the test data you provided and obtained the following results. The base model is SD 1.5.
| Model | Dataset | FID | CLIP-Score | Paper |
| --- | --- | --- | --- | --- |
| ControlNet | COCOSeg | 23.72 | 31.52 | |
| ControlNet | MultiGen-20M (Canny Edge) | 17.76 | 31.93 | |
| ControlNet | MultiGen-20M (Hed Edge) | 19.17 | 32.01 | |
| ControlNet | MultiGen-20M (LineArt Edge) | 16.92 | 32.15 | |
| ControlNet | MultiGen-20M (Depth Map) | 21.84 | 32.04 | |
| ControlNet++ | COCOSeg | 20.97 | 31.98 | |
| ControlNet++ | MultiGen-20M (Canny Edge) | 23.16 | 31.53 | |
| ControlNet++ | MultiGen-20M (Hed Edge) | 74.77 | 27.50 | |
| ControlNet++ | MultiGen-20M (LineArt Edge) | 14.16 | 31.91 | |
| ControlNet++ | MultiGen-20M (Depth Map) | 17.95 | 32.02 | |
We observed a significant gap between the results obtained from our tests and those reported in your paper. Are there any specific steps or considerations we should be aware of to accurately reproduce your results?
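For context, CLIP-Score is typically the (scaled) cosine similarity between CLIP image and text embeddings, averaged over the test set. Below is a minimal sketch, assuming the Hugging Face `transformers` CLIP implementation; the checkpoint and 100x scaling are common conventions, not necessarily what the paper's evaluation script uses.

```python
# Illustrative CLIP-Score sketch; checkpoint and scaling are assumptions,
# not necessarily what the paper's evaluation used.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-large-patch14").eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-large-patch14")

@torch.no_grad()
def clip_score(image: Image.Image, prompt: str) -> float:
    inputs = processor(text=[prompt], images=image,
                       return_tensors="pt", padding=True, truncation=True)
    img = model.get_image_features(pixel_values=inputs["pixel_values"])
    txt = model.get_text_features(input_ids=inputs["input_ids"],
                                  attention_mask=inputs["attention_mask"])
    img = img / img.norm(dim=-1, keepdim=True)   # unit-normalize embeddings
    txt = txt / txt.norm(dim=-1, keepdim=True)
    return float(100.0 * (img * txt).sum())      # 100 * cosine similarity
```

Mismatched checkpoints or scaling conventions between two evaluation scripts can shift the reported score, so it is worth confirming both sides use the same ones.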

@liming-ai
Owner

Thanks for the question. I will check the results and reply to you ASAP.

@liming-ai
Owner

liming-ai commented Nov 28, 2024

Hi @Ethereal0725,
So far, I have re-tested our models on the CLIP-Score metric:

# Hed: 32.0142
# LineArt: 31.9036
# Depth: 32.0261
# ADE20K: 31.379
# COCOStuff: 31.9913

which aligns with the results in our paper, except for COCOStuff:

# Hed: 32.33
# LineArt: 32.46
# Depth: 32.45
# ADE20K: 31.96
# COCOStuff: 13.13

The maximum error is around 0.6. Taking into account the randomness introduced by different environments and machines, these errors are within the normal range, and there is no significant drop of the kind you mentioned.
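One common source of such cross-machine variance is the sampler seed. A minimal sketch of pinning it in `diffusers`, assuming a standard ControlNet pipeline; the checkpoint names and condition image are placeholders, not necessarily what this repo uses:

```python
# Illustrative only: a fixed generator seed removes one source of run-to-run
# variance; hardware and library versions can still cause small drift.
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

controlnet = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-canny")
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet
)
condition = Image.open("canny_edge.png")           # placeholder condition image
generator = torch.Generator("cpu").manual_seed(0)  # fixed seed for sampling
image = pipe("a photo of a cat", image=condition, generator=generator).images[0]
image.save("sample.png")
```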

As for the erroneous COCOStuff result, it may be because our previous tests used .jpg-format images. We will update the arXiv paper and the instructions. Thank you for your question.
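The image format matters here because JPEG compression perturbs the pixel statistics that FID is computed from, so lossy reference or generated images can noticeably shift the score. A quick way to check, assuming the `clean-fid` package (the FID implementation used by the authors is not stated; directory paths are placeholders):

```python
# Illustrative: compare FID with lossless vs. JPEG-compressed references.
# clean-fid is an assumption about the implementation, not the repo's script.
from cleanfid import fid

fid_png = fid.compute_fid("generated/", "reference_png/")
fid_jpg = fid.compute_fid("generated/", "reference_jpg/")  # same refs, JPEG-compressed
print(f"FID vs PNG refs: {fid_png:.2f}, vs JPEG refs: {fid_jpg:.2f}")
```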

@Ethereal0725
Author

Thanks for your answer.
