Can you provide the metrics evaluation scripts for the WORDS and FLARE datasets? #14
Comments
Dear @OeslleLucena, Thank you for reaching out. We primarily calculated the Dice Similarity Coefficient (DSC) for each class. You should be able to find relevant code for this quite easily. For instance, the official FLARE repository contains scripts for computing the DSC. We recommend checking there as a starting point. Best regards, Ziyan Huang
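For readers who want a starting point, here is a minimal sketch of a per-class DSC computation with NumPy. It is not the authors' evaluation script or the FLARE repository's code; the function name `dice_per_class` and the choice to report NaN for a class absent from both volumes are assumptions made for illustration.

```python
import numpy as np

def dice_per_class(pred, gt, class_ids):
    """Return {class_id: DSC} for two integer label maps of identical shape.

    Hypothetical helper for illustration; not taken from the STU-Net code.
    """
    scores = {}
    for c in class_ids:
        p = pred == c          # voxels predicted as class c
        g = gt == c            # voxels labeled as class c in the ground truth
        denom = p.sum() + g.sum()
        if denom == 0:
            # Class absent from both volumes: DSC is undefined, reported as NaN here (an assumption).
            scores[c] = float("nan")
        else:
            scores[c] = 2.0 * np.logical_and(p, g).sum() / denom
    return scores

# Tiny toy example with two foreground classes.
pred = np.array([[1, 1, 0], [2, 0, 0]])
gt = np.array([[1, 0, 0], [2, 2, 0]])
scores = dice_per_class(pred, gt, class_ids=[1, 2, 3])
```

In practice the two arrays would be loaded from the predicted and ground-truth segmentation volumes, and the per-class scores averaged over cases.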
Dear @Ziyan-Huang, Thank you for your response. Apologies on my side, because I think I did not make myself clear enough. What I meant is that STU-NET outputs the segmentation for all labels from the TotalSegmentator dataset, and I would like to know how these labels were selected and merged for comparison with the ground truth of the WORD and FLARE datasets. For example, the WORD dataset has 16 labels; some of them, such as liver, are easy to match, but the rest differ somewhat from the TotalSegmentator ones. Hope that is clear enough. Many thanks in advance,
Hi @OeslleLucena, For WORD, we selected the 13 of its 16 classes that overlap with TotalSegmentator for inference and metric calculation; for FLARE22, all 13 categories were evaluated. You can refer to the appendix of our arXiv paper for details. To clarify the details and make it easier to run experiments and reproduce the results, we will release the code for direct inference soon. Here is a simple
Hope my answer can help you.
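Until the direct-inference code is released, the selection step described above (keeping only the target-dataset classes that overlap with TotalSegmentator) could be sketched as a label remap. The IDs in `SOURCE_TO_TARGET` below are placeholders invented for illustration, not the real TotalSegmentator or WORD label IDs; the actual correspondence is the one given in the paper's appendix. The helper name `remap_labels` is also hypothetical.

```python
import numpy as np

# Placeholder mapping: source (TotalSegmentator-style) ID -> target (WORD-style) ID.
# These numbers are illustrative assumptions only; take the real pairs from the paper's appendix.
SOURCE_TO_TARGET = {5: 1, 1: 2, 2: 3, 3: 3}  # two source labels (2 and 3) merge into target 3

def remap_labels(pred, mapping):
    """Map a source label volume to the target label space.

    Source labels missing from `mapping` become background (0), so classes
    without a counterpart are dropped before metric calculation.
    """
    out = np.zeros_like(pred)
    for src, dst in mapping.items():
        # Masks are computed on the original `pred`, so remaps never chain into each other.
        out[pred == src] = dst
    return out

pred = np.array([0, 5, 1, 2, 7, 3])
remapped = remap_labels(pred, SOURCE_TO_TARGET)
```

The remapped volume can then be compared with the ground truth class by class, restricted to the overlapping classes.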
Hi @blueyo0, Thank you loads for the details. Looking forward to the code for direct inference. Best!
I am trying to reproduce the STU-NET results and would like to use the same evaluation scripts you used to compute the Dice score for the WORD and FLARE datasets. I would appreciate it if the authors could make these scripts available.
Best