Replies: 12 comments 21 replies
-
This is an outstanding idea! Here are two thoughts:
|
-
Hello everyone, The problem with existing open-source datasets like TCIA is that the DICOM data in these repositories is very "civilized" and curated, so it does not need any special treatment. This is not the case with routine clinical data acquired in different hospitals, so there was a proposal to create a "real-world clinical DICOM dataset". To create this dataset, we can ask the participating institutions to donate some of their data. This can be patient data after scrubbing the PHI, or phantom data, which has no PHI. Once such a dataset exists and is public, we can write cookbooks around this data, as @ericspod mentioned. The goal is for the cookbook to work on all the contributed datasets and to be organized as a tutorial or as an ongoing code repo with lots and lots of documentation. I would point to the Hugging Face documentation as a model for organizing knowledge into chapters. Our overarching goal would be to write enough documentation and code samples that we can fine-tune a GPT-type model for searching through the documentation. If we are successful, people will be able to ask specific questions and be directed to specific code snippets. I feel that the diversity and unstructured nature of DICOM make it unwieldy to write a structured code base. Efforts like pydicom, highdicom, and ITK have tried to do that, but we still come across numerous issues. So I believe that if we document everything properly and provide code snippets, at some point we can train or fine-tune a language model to give the right answers on demand. |
-
Based on my experience, I do not think this is true. There is a lot of variety in those datasets, and many challenges for the users. Many of those challenges are quite representative of what users encounter in datasets outside of TCIA.
I would be very surprised, and in fact pleased, if you managed to find any significant number of clinical sites that will volunteer to donate data to public datasets AND perform de-identification (which means accepting the liability for a possible PHI breach if they missed anything). Publicly available de-identified medical imaging data is not easy to find. If you think what you find in TCIA and Imaging Data Commons is not enough, I am not sure where you can find more, or what resources you would need to support collection of data that is more expansive and more representative. |
-
I think there is significant, near-term value in starting on the cookbook, perhaps in parallel with attempting to gather more data. The cookbook should cover common / "easy" use cases (DICOM SEG to nrrd/nifti) as well as the challenging ones. I've found that often the "easy" ones are the ones that people get wrong (e.g., forgetting to set image directions or forgetting to map recorded values to HU values - I've seen MANY code examples on Medium/LinkedIn get it wrong!). Also, the "easy" ones often have multiple solution options, and some are slower or require more memory than others. Regarding gathering more data, I agree with @fedorov. It could be argued that time spent sharing our knowledge and encouraging comparison/debate might be more impactful than spending that time focusing on corner cases. Admittedly, those corner cases are critical to many commercial products, but we've got to get the basics right first. |
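To make the "map recorded values to HU" point concrete, here is a minimal sketch of the linear rescale that CT images need. It assumes the slope and intercept have already been read from the DICOM attributes RescaleSlope (0028,1053) and RescaleIntercept (0028,1052); the function name `stored_to_hu` and the sample numbers are made up for illustration.

```python
def stored_to_hu(stored, slope=1.0, intercept=0.0):
    """Map stored CT pixel values to Hounsfield units.

    HU = stored * RescaleSlope + RescaleIntercept.
    `stored` can be a plain number or a NumPy array; `slope` and
    `intercept` may be numbers or the decimal strings found in a header.
    """
    return stored * float(slope) + float(intercept)


# Illustrative values only (not taken from a real file).
water = stored_to_hu(1024, slope=1, intercept=-1024)
air = stored_to_hu(24, slope="1", intercept="-1024")
print(water, air)
```

With pydicom, the inputs would typically come from `ds.pixel_array`, `ds.RescaleSlope` and `ds.RescaleIntercept`. Skipping this step leaves the values offset by the intercept, which is exactly the kind of "easy" mistake described above.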
-
from our offline discussion, it seems one viable option is to create a new public repository named |
-
We had discussed further at the Friday meeting what form this should take. My idea is to have a repository for markdown documents listing DICOM resources like libraries and available public datasets, documents discussing concepts and issues with DICOM, documents with good practice information and "hacks", and notebooks illustrating basic operations and demonstrating good practices (this aligns with what @vikashg mentioned). To expand this idea further I was wondering if we wanted to have a more general repository of this sort and include information on other areas of deep learning. One inspiration is this page https://github.com/soumith/ganhacks listing ideas about training GANs and good practice advice. We could have some similar things for data preprocessing, defining transform pipelines, training segmentation networks, training generative models, training other networks, and deployment. This would be broken up into subdirectories for different subjects with documents on best practices, tips, "hacks", cookbook code people can just copy-paste and use, and other sorts of collected wisdom we should be writing down and sharing with the community. These would be all living documents anyone can contribute to. Any thoughts? Getting too ambitious? |
-
Hi all, I just noticed that there's a Deep Learning Lab class happening at RSNA that you might find useful in relation to this discussion. It's focused on keeping everything in DICOM format while building a generic CT segmentation tool based on the Mask RCNN network as implemented in Facebook's detectron2 library: https://github.com/RSNA/AI-Deep-Learning-Lab-2023/blob/main/sessions/dicom-seg/RSNA_2021_DICOM_IN_DICOM_OUT_Segmentation.ipynb. |
-
Really inspires me to start a working group on visualization...
…On Wed, Nov 22, 2023 at 1:50 PM Justin Kirby ***@***.***> wrote:
Go figure, right after I posted this they updated the repo with a new
notebook (using TotalSegmentator now) which breaks that URL above. Here are
better URLs:
- This year's TotalSegmentator notebook:
https://github.com/RSNA/AI-Deep-Learning-Lab-2023/tree/main/sessions/dicom-seg
- Last year's detectron2 notebook:
https://github.com/RSNA/AI-Deep-Learning-Lab-2022/tree/main/sessions/dicom-seg
--
Stephen R. Aylward, Ph.D.
Chair, MONAI Advisory Board
Senior Director, Strategic Initiatives, Kitware
|
-
I appreciate the feedback Andrey! I am humbled (and slightly embarrassed!)
that someone of your stature is doing a code review on my notebook! In my
inexcusable defense, I'm a decent radiologist but a pretty poor (and
therefore dangerous) programmer. My approach to this hands-on session is
not necessarily a best-practices example but more of an introduction of
what is possible, and perhaps more importantly begging the radiology
community to embrace DICOM and DICOM SEG fully!
I will work on addressing the issues you point out - I can certainly use
the correct `segmented_property_category` and `_type` based on the totalseg
label. I'm not sure I have the brain power or time available to understand
nifti coordinate mapping to DICOM coordinates beyond the `rot90` and
`transpose(2,0,1)` hacks I stumbled across.
I appreciate your highlighting of MHub too - I hadn't come across that
before!
Thanks again for the feedback!
Tom
…On Wed, Nov 22, 2023 at 11:30 AM Andrey Fedorov ***@***.***> wrote:
Also, that rot90 + transpose(2,0,1) applied to the mask is not something
that should be promoted or used - it is recipe for a future disaster.
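One alternative to axis-shuffling hacks is to build the index-to-patient mapping from the geometry attributes in the DICOM headers. The sketch below follows the image plane equation in the DICOM standard, extended to a stack of slices, and assumes a regularly spaced, untilted series. The function names are hypothetical, and the inputs are assumed to be already read from ImageOrientationPatient, PixelSpacing and the ImagePositionPatient of the first and last slice.

```python
def dicom_affine(iop, pixel_spacing, ipp_first, ipp_last, n_slices):
    """4x4 matrix mapping (column, row, slice) indices to patient mm (LPS).

    iop           -- ImageOrientationPatient, 6 direction cosines
    pixel_spacing -- PixelSpacing: [spacing between rows, spacing between columns]
    ipp_first     -- ImagePositionPatient of the first slice
    ipp_last      -- ImagePositionPatient of the last slice
    """
    row_cos, col_cos = iop[:3], iop[3:]
    d_row, d_col = pixel_spacing
    if n_slices > 1:
        # Per-slice step vector, taken from the actual slice positions.
        step = [(l - f) / (n_slices - 1) for f, l in zip(ipp_first, ipp_last)]
    else:
        # A single slice has no through-plane direction in this sketch.
        step = [0.0, 0.0, 0.0]
    affine = [
        [row_cos[k] * d_col, col_cos[k] * d_row, step[k], ipp_first[k]]
        for k in range(3)
    ]
    affine.append([0.0, 0.0, 0.0, 1.0])
    return affine


def index_to_patient(affine, col, row, slc):
    """Apply the affine to one voxel index."""
    v = (col, row, slc, 1.0)
    return [sum(a * b for a, b in zip(r, v)) for r in affine[:3]]


# Illustrative axial series: 11 slices, 2 mm apart (made-up numbers).
A = dicom_affine([1, 0, 0, 0, 1, 0], [0.75, 0.5],
                 [-100.0, -100.0, 0.0], [-100.0, -100.0, 20.0], 11)
print(index_to_patient(A, col=10, row=20, slc=5))
```

Note that DICOM patient coordinates are LPS while NIfTI conventionally uses RAS, so a conversion also has to negate the first two axes. Writing that step down explicitly is safer than discovering it through `rot90`.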
|
-
Hi all, I have started the cookbook repo with an outline of initial sections. I don't have much content yet, but it's something to start on. It's currently private until we have something fuller. I can add anyone who's interested in contributing now. |
-
The compilation of best practices is a wonderful idea! One thing that I wanted to add here is that there is a major revision to the DICOM standard coming up soon (the first report is due on March 5th 2024). Sounds like it would be a great time to integrate such best practices into MONAI documentation given the upcoming DICOM revision. |
-
Just to add another source of knowledge: Chris Rorden and his collaborators on dcm2niix have a lot of experience in dealing with peculiarities of DICOM from various manufacturers, focused on conversion to NIfTI. You can find useful explanations in either the .md docs or the issues about this, e.g. pros and cons of different methods of figuring out slice spacing and gantry tilt. |
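As one illustration of the slice spacing and gantry tilt question, here is a minimal sketch that derives both from the slice positions instead of trusting SliceThickness or SpacingBetweenSlices. This is not claimed to be how dcm2niix does it. The function names are hypothetical, and the inputs are assumed to be the ImageOrientationPatient of the series and the ImagePositionPatient of each slice.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def slice_normal(iop):
    """Cross product of the row and column direction cosines."""
    r, c = iop[:3], iop[3:]
    return [r[1] * c[2] - r[2] * c[1],
            r[2] * c[0] - r[0] * c[2],
            r[0] * c[1] - r[1] * c[0]]


def slice_geometry(iop, positions):
    """Return (spacings along the slice normal, tilt angle in degrees).

    The tilt is the angle between the slice normal and the direction
    in which the slice origins actually advance.
    """
    n = slice_normal(iop)
    ordered = sorted(positions, key=lambda p: _dot(p, n))
    spacings = [_dot(b, n) - _dot(a, n) for a, b in zip(ordered, ordered[1:])]
    travel = [l - f for f, l in zip(ordered[0], ordered[-1])]
    length = math.sqrt(_dot(travel, travel))
    if length == 0:
        raise ValueError("need at least two distinct slice positions")
    cos_angle = max(-1.0, min(1.0, _dot(travel, n) / length))
    return spacings, math.degrees(math.acos(cos_angle))


# Made-up tilted series: slices advance 5 mm along z, planes tilted about x.
spacings, tilt = slice_geometry([1, 0, 0, 0, 0.8, -0.6],
                                [[0, 0, 10], [0, 0, 0], [0, 0, 5]])
print(spacings, tilt)
```

In this example the table advances 5 mm per slice, but the distance measured perpendicular to the slices is about 4 mm and the tilt is roughly 37 degrees. Unequal entries in `spacings` are a hint that slices are missing or that the series mixes acquisitions.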
-
Hello everyone,
DICOM is obviously really important to a lot of what we do, but there are often questions about which tools to use to accomplish a task, best practices for solving a problem, and how to deal with complex data. Some of these tasks are converting to and from NIfTI or other formats, dealing with RTSTRUCT, dealing with DICOM SEG, reliably assembling 3D/4D volumes from DICOM files, detecting which files to include or omit when doing this, and detecting a broken/unusable series or study.
I had thought to consolidate what we know into a library in the MONAI organisation, but what might be more useful instead is to list all the best practices, cookbook ideas, examples, and known DICOM libraries in one place, so that we have a common reference for how to do a range of tasks. One list of DICOM libraries is here https://github.com/open-dicom/awesome-dicom#python but I'm sure there are many that people use that aren't mentioned there. There will be more complex tasks that need to be tackled with one or more such libraries, such as picking out the members of a volume from a random collection of DICOM files containing reports, screenshots, indices, and other non-image data.
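As a rough sketch of that last task, the snippet below groups already-parsed headers into candidate volumes. It assumes each header is a plain dict keyed by DICOM attribute keyword (a hypothetical intermediate form; with pydicom the values could be collected via `dcmread(path, stop_before_pixels=True)`). The heuristic, which is an assumption and not an established recipe, is that volume members carry both ImageOrientationPatient and ImagePositionPatient and share a series and an orientation.

```python
from collections import defaultdict


def candidate_volumes(headers, min_slices=2):
    """Group headers into candidate volumes keyed by (series UID, orientation)."""
    groups = defaultdict(list)
    for h in headers:
        iop = h.get("ImageOrientationPatient")
        ipp = h.get("ImagePositionPatient")
        if iop is None or ipp is None:
            # Reports, presentation states and many screenshots carry
            # no image-plane geometry, so they are skipped here.
            continue
        key = (h.get("SeriesInstanceUID"), tuple(round(float(v), 4) for v in iop))
        groups[key].append(h)
    # Lone images (for example a localizer) do not make a volume.
    return {k: v for k, v in groups.items() if len(v) >= min_slices}


# Made-up headers: three axial CT slices, one sagittal localizer, one report.
axial = [1, 0, 0, 0, 1, 0]
headers = [
    {"SeriesInstanceUID": "1.2.3", "ImageOrientationPatient": axial,
     "ImagePositionPatient": [0, 0, z]} for z in (0, 2, 4)
] + [
    {"SeriesInstanceUID": "1.2.3", "ImageOrientationPatient": [0, 1, 0, 0, 0, -1],
     "ImagePositionPatient": [0, 0, 0]},
    {"SeriesInstanceUID": "1.2.9", "Modality": "SR"},
]
volumes = candidate_volumes(headers)
print({key: len(members) for key, members in volumes.items()})
```

A real pipeline would likely add further checks, such as matching Rows and Columns within a group and verifying uniform slice spacing, before treating a group as a usable volume.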
So my question to the community is: do we want to put this kind of best-practices document(s) together as part of the MONAI documentation? What do we want to include in it: cookbook code samples, library usage guidance, etc.? Do we have small, publicly available datasets that can be used for illustrating certain concepts or problems?