Issues about evaluation performance #18

Open

Ethereal0725 opened this issue Nov 26, 2024 · 3 comments

@Ethereal0725

Hello, we used the models you provided to test on the test data you provided and obtained the following results. The base model is SD 1.5.
| Model | Dataset | FID | CLIP-Score | Paper |
| --- | --- | --- | --- | --- |
| ControlNet | COCOSeg | 23.72 | 31.52 | |
| ControlNet | MultiGen-20M (Canny Edge) | 17.76 | 31.93 | |
| ControlNet | MultiGen-20M (Hed Edge) | 19.17 | 32.01 | |
| ControlNet | MultiGen-20M (LineArt Edge) | 16.92 | 32.15 | |
| ControlNet | MultiGen-20M (Depth Map) | 21.84 | 32.04 | |
| ControlNet++ | COCOSeg | 20.97 | 31.98 | |
| ControlNet++ | MultiGen-20M (Canny Edge) | 23.16 | 31.53 | |
| ControlNet++ | MultiGen-20M (Hed Edge) | 74.77 | 27.50 | |
| ControlNet++ | MultiGen-20M (LineArt Edge) | 14.16 | 31.91 | |
| ControlNet++ | MultiGen-20M (Depth Map) | 17.95 | 32.02 | |
We observed a significant gap between the results obtained from our tests and those reported in your paper. Are there any specific steps or considerations we should be aware of to accurately reproduce your results?
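For context, CLIP-Score is typically the (scaled) cosine similarity between CLIP image and text embeddings, averaged over the test set. Below is a minimal sketch, assuming the Hugging Face `transformers` CLIP implementation; the checkpoint and 100x scaling are common conventions, not necessarily what the paper's evaluation script uses.

```python
# Illustrative CLIP-Score sketch; checkpoint and scaling are assumptions,
# not necessarily what the paper's evaluation used.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-large-patch14").eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-large-patch14")

@torch.no_grad()
def clip_score(image: Image.Image, prompt: str) -> float:
    inputs = processor(text=[prompt], images=image,
                       return_tensors="pt", padding=True, truncation=True)
    img = model.get_image_features(pixel_values=inputs["pixel_values"])
    txt = model.get_text_features(input_ids=inputs["input_ids"],
                                  attention_mask=inputs["attention_mask"])
    img = img / img.norm(dim=-1, keepdim=True)   # unit-normalize embeddings
    txt = txt / txt.norm(dim=-1, keepdim=True)
    return float(100.0 * (img * txt).sum())      # 100 * cosine similarity
```

Mismatched checkpoints or scaling conventions between two evaluation scripts can shift the reported score, so it is worth confirming both sides use the same ones.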

@liming-ai
Owner

Thanks for the question. I will check the results and reply to you ASAP.

@liming-ai
Owner

liming-ai commented Nov 28, 2024

Hi @Ethereal0725,
So far, I have re-tested our models on the CLIP-Score metric:

# Hed: 32.0142
# LineArt: 31.9036
# Depth: 32.0261
# ADE20K: 31.379
# COCOStuff: 31.9913

which aligns with the results in our paper, except for COCOStuff:

# Hed: 32.33
# LineArt: 32.46
# Depth: 32.45
# ADE20K: 31.96
# COCOStuff: 13.13

The maximum error is around 0.6. Taking into account the randomness introduced by different environments and machines, these errors are within the normal range, and there is no significant drop of the kind you mentioned.
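One common source of such cross-machine variance is the sampler seed. A minimal sketch of pinning it in `diffusers`, assuming a standard ControlNet pipeline; the checkpoint names and condition image are placeholders, not necessarily what this repo uses:

```python
# Illustrative only: a fixed generator seed removes one source of run-to-run
# variance; hardware and library versions can still cause small drift.
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

controlnet = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-canny")
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet
)
condition = Image.open("canny_edge.png")           # placeholder condition image
generator = torch.Generator("cpu").manual_seed(0)  # fixed seed for sampling
image = pipe("a photo of a cat", image=condition, generator=generator).images[0]
image.save("sample.png")
```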

As for the erroneous COCOStuff result, it may be because our previous tests used .jpg-format images. We will update the arXiv paper and the instructions. Thank you for your question.
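The image format matters here because JPEG compression perturbs the pixel statistics that FID is computed from, so lossy reference or generated images can noticeably shift the score. A quick way to check, assuming the `clean-fid` package (the FID implementation used by the authors is not stated; directory paths are placeholders):

```python
# Illustrative: compare FID with lossless vs. JPEG-compressed references.
# clean-fid is an assumption about the implementation, not the repo's script.
from cleanfid import fid

fid_png = fid.compute_fid("generated/", "reference_png/")
fid_jpg = fid.compute_fid("generated/", "reference_jpg/")  # same refs, JPEG-compressed
print(f"FID vs PNG refs: {fid_png:.2f}, vs JPEG refs: {fid_jpg:.2f}")
```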

@Ethereal0725
Author

Thanks for your answer.
