Can you provide the metrics evaluation scripts for the WORDS and FLARE datasets? #14

OeslleLucena · 2023-10-13T14:30:16Z

I am trying to reproduce the STU-Net results and would like to use the same evaluation scripts you used to compute the Dice score for the WORD and FLARE datasets. I would appreciate it if the authors could make these scripts available.
Best

Ziyan-Huang · 2023-10-25T04:08:25Z

Dear @OeslleLucena,

Thank you for reaching out. We primarily calculated the Dice Similarity Coefficient (DSC) for each class. You should be able to find relevant code for this quite easily. For instance, the official FLARE repository contains scripts for computing the DSC. We recommend checking there as a starting point.
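
For readers who want a starting point, a per-class DSC can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' evaluation script and not the official FLARE code. The function name `dice_per_class` is hypothetical, and returning NaN when a class is absent from both prediction and ground truth is an assumed convention (other scripts may score that case as 1.0 or skip it).

```python
import numpy as np

def dice_per_class(pred, gt, labels):
    """Return {label: DSC} for two integer label maps of identical shape."""
    pred = np.asarray(pred)
    gt = np.asarray(gt)
    if pred.shape != gt.shape:
        raise ValueError("prediction and ground truth must have the same shape")
    scores = {}
    for label in labels:
        p = pred == label  # binary mask of this class in the prediction
        g = gt == label    # binary mask of this class in the ground truth
        denom = int(p.sum()) + int(g.sum())
        if denom == 0:
            # ASSUMPTION: class absent from both volumes -> undefined (NaN).
            scores[label] = float("nan")
        else:
            scores[label] = 2.0 * int(np.logical_and(p, g).sum()) / denom
    return scores
```

Averaging over cases would then need a NaN-aware mean such as `np.nanmean`, so that cases where a class is absent do not distort the result.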

Best regards,

Ziyan Huang

OeslleLucena · 2023-10-25T10:17:19Z

Dear @Ziyan-Huang

Thank you for your response. Apologies, I don't think I made myself clear. STU-Net outputs segmentations for all labels in the TotalSegmentator dataset, and I would like to know how these labels were selected and merged for comparison with the ground truth of the WORD and FLARE datasets. For example, the WORD dataset has 16 labels; some of them, such as liver, are easy to match, but the rest differ somewhat from the TotalSegmentator ones. I hope that is clear enough. Many thanks in advance,

blueyo0 · 2023-10-25T12:59:33Z

Hi, @OeslleLucena

For WORD, we selected the 13 of its 16 classes that overlap with TotalSegmentator for inference and metric calculation; for FLARE22, all 13 categories were evaluated. You can refer to the appendix of our arXiv paper for details. To clarify the details and make it easier to run experiments and reproduce the results, we will release the code for direct inference soon.

Here is a simple Python dict showing which categories are selected; more details will follow soon 😉.

# WORD: label ID -> class name.
# The three commented-out classes are excluded from evaluation.
Task560_WORD_sys = {
    "1": "liver",
    "2": "spleen",
    "3": "kidney_left",
    "4": "kidney_right",
    "5": "stomach",
    "6": "gallbladder",
    "7": "esophagus",
    "8": "pancreas",
    "9": "duodenum",
    "10": "colon",
    # "11": "intestine",
    # "12": "adrenal",
    # "13": "rectum",
    "14": "urinary_bladder",
    "15": "femur_left",
    "16": "femur_right",
}
# FLARE22: label ID -> class name; all 13 classes are evaluated.
FLARE22_sys = {
    "1": "liver",
    "2": "right kidney",
    "3": "spleen",
    "4": "pancreas",
    "5": "aorta",
    "6": "IVC",  # inferior vena cava
    "7": "RAG",  # right adrenal gland
    "8": "LAG",  # left adrenal gland
    "9": "gallbladder",
    "10": "esophagus",
    "11": "stomach",
    "12": "duodenum",
    "13": "left kidney",
}
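
One way to apply such a mapping, sketched under stated assumptions, is to remap the model's output IDs onto the target dataset's IDs with a lookup table before computing metrics. This is not the authors' released code: `remap_labels` is a hypothetical helper, and the source IDs in `MODEL_NAME_TO_ID` are placeholders, not the real TotalSegmentator indices. Only three WORD classes are shown to keep the example short.

```python
import numpy as np

# Subset of the WORD mapping above (target label ID -> class name).
WORD_SELECTED = {"1": "liver", "2": "spleen", "3": "kidney_left"}

# HYPOTHETICAL model output IDs; placeholders, NOT the real TotalSegmentator indices.
MODEL_NAME_TO_ID = {"spleen": 7, "liver": 9, "kidney_left": 4}

def remap_labels(pred, model_name_to_id, target_id_to_name):
    """Map model label IDs to target dataset IDs; unselected classes become 0."""
    pred = np.asarray(pred)  # expects non-negative integer labels
    # Lookup table indexed by model ID; the default 0 is background.
    lut = np.zeros(int(pred.max()) + 1, dtype=np.int64)
    for target_id, name in target_id_to_name.items():
        src = model_name_to_id[name]
        if src < lut.size:  # skip classes that cannot occur in this prediction
            lut[src] = int(target_id)
    return lut[pred]

remapped = remap_labels(np.array([[9, 7, 4], [0, 12, 9]]),
                        MODEL_NAME_TO_ID, WORD_SELECTED)
```

Every model class that is not listed in the target mapping falls back to background, which matches the idea of evaluating only the selected classes.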

Hope my answer can help you.

OeslleLucena · 2023-10-26T10:01:12Z

Hi @blueyo0, thank you very much for the details. Looking forward to the code for direct inference. Best!
