Consolidating DICOM Best Practices, Techniques, and Libraries #6551

ericspod · 2023-05-24T15:15:51Z

ericspod
May 24, 2023
Maintainer

Hello everyone,
DICOM is obviously really important to a lot of what we do, but there are often questions about what tools to use to accomplish a task, best practices for solving a problem, and how to deal with complex data. Some of these tasks are converting to and from NIfTI or other formats, dealing with RTSTRUCT, dealing with DICOM SEG, putting together 3D/4D volumes from DICOMs reliably, detecting which files to include or omit when doing this, and detecting a broken/unusable series or study.

I had thought to consolidate what we know into a library in the MONAI organisation, but what might be more useful instead is to list all the best practices, cookbook ideas, examples, and known DICOM libraries in one place so that we have a common reference for how to do a range of tasks. One list of DICOM libraries is here https://github.com/open-dicom/awesome-dicom#python but I'm sure there are many libraries people use that aren't mentioned there. There will be more complex tasks that need to be tackled with one or more such libraries, such as filtering out the members of a volume from a random collection of DICOM files containing reports, screenshots, indices, and other non-image data.
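
To make the last task concrete, here is a minimal sketch of one way to pick likely volume members out of a mixed collection. It assumes the headers have already been read into plain dicts keyed by DICOM keywords (for example with a library such as pydicom); the function name, the modality set, and the `ImageType` heuristics are illustrative assumptions, not an agreed best practice.

```python
from collections import defaultdict

# Modalities treated as non-image content in this sketch (an assumption;
# real pipelines may prefer to check SOP Class UIDs instead).
NON_IMAGE_MODALITIES = {"SR", "PR", "KO", "DOC", "SEG", "RTSTRUCT"}


def group_image_series(headers):
    """Group header dicts by SeriesInstanceUID, keeping likely volume slices."""
    series = defaultdict(list)
    for hdr in headers:
        if hdr.get("Modality") in NON_IMAGE_MODALITIES:
            continue  # reports, presentation states, key object selections, ...
        image_type = hdr.get("ImageType", [])
        if "SECONDARY" in image_type or "LOCALIZER" in image_type:
            continue  # heuristic: screenshots (secondary captures) and scouts
        if "ImagePositionPatient" not in hdr:
            continue  # no spatial information, cannot be placed in a volume
        series[hdr["SeriesInstanceUID"]].append(hdr)
    return dict(series)
```

The heuristics will reject some legitimate data (for example derived reconstructions flagged as SECONDARY), which is exactly the kind of nuance a cookbook entry would need to discuss.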

So my question to the community is: do we want to put this kind of best practices document(s) together as part of the MONAI documentation? What do we want to include in it: cookbook code samples, library usage guidance, etc.? Do we have small, publicly available datasets that can be used for illustrating certain concepts or problems?

aylward · 2023-05-25T16:12:18Z

aylward
May 25, 2023
Maintainer

This is an outstanding idea! Here are two thoughts:

1. We should involve other open-source medical image analysis communities in this effort. For example, 3D Slicer, highdicom, IDC and TCIA. The TCIA recently published a paper and hosted a workshop on DICOM anonymization. We should include that info, and they might have some suggestions / lessons learned for us regarding documentation and dissemination. I've shared this link with a few of them already :)
2. When choosing a forum for hosting this info, perhaps emphasize two aspects:
   (a) easy to update, as tools develop new capabilities and as new lessons are learned
   (b) supports a discussion format. Example code is extremely helpful, but rarely is there one "right" answer for any general DICOM use case. Most often there are nuances / corner-cases that might require special handling, and a discussion forum will help people understand why and not just how.

1 reply

kirbyju Oct 10, 2023
Collaborator

I love the idea of combining cookbook/notebooks with data from TCIA to help demonstrate things. Please let me know if you need help identifying specific types of data that we host. I've written many notebooks already that show the basics of how to get data out of TCIA using our APIs, so you could just add the MONAI steps on to the end of them as you see fit: https://github.com/kirbyju/TCIA_Notebooks.

vikashg · 2023-06-02T13:39:18Z

vikashg
Jun 2, 2023
Collaborator

Hello everyone,
So initially the idea started with building a monai-dicom repo which brings some uniformity in terms of API calls with the existing MONAI repos. The goal of this repo was to build tools for handling everything DICOM. In this week's discussion in the monai-deploy working group, there was a concern that if we build such a repo then we also have to "guarantee" that it works in all cases. But because DICOM is really diverse and flexible, it is very difficult to give such guarantees, and the repo also becomes difficult to justify when there are existing DICOM-specific libraries, as @ericspod mentioned in the previous post.

The problem with the existing open-source datasets like TCIA is that the DICOM data in these repositories is very "civilized" and curated, and does not need any special treatment. However, this is not the case with regular clinical data acquired in different hospitals. Thus, there was a proposal to create a "real world clinical DICOM dataset". In order to create this dataset, we can ask the participating institutions to donate some of their data. This can be patient data after scrubbing the PHI or any phantom data which has no PHI.

Once we have such a dataset and it is public, we can write some cookbooks around this data, as @ericspod mentioned. The goal of this cookbook is to ensure that it works on all the contributed datasets and that it is organized as a tutorial or as an ongoing code repo with lots and lots of documentation. I will refer to the Hugging Face documentation as a model for knowledge organization in chapters.

Our overarching goal would be to write enough documentation and code samples that we can fine-tune a GPT-type model for searching through the documentation. If we are successful, people will be able to ask specific questions and be directed to specific code snippets. I feel that because of the diversity and unstructured nature of DICOM it becomes unwieldy to write a structured code base. Efforts like pydicom, highdicom, and ITK have tried to do that but we still come across numerous issues. So I believe that if we document everything properly and provide code snippets, at some point we can train or fine-tune a language model to give the right answers on demand.

0 replies

fedorov · 2023-06-02T13:50:46Z

fedorov
Jun 2, 2023

> The problem with the existing open-source datasets like TCIA is that the dicom data in these repositories is very "civilized" and curated. and they do not need any special treatment.

Based on my experience, I do not think this is true. There is a lot of variety in those datasets, and many challenges for the users. Many of those challenges are quite representative of what users encounter in datasets outside of TCIA.

> However, this is not the case with regular clinical data acquired in different hospitals. Thus, there was a proposal to create a "real world clinical dicom dataset". In order to create this dataset, we can ask the participating institutions to donate some of their data. This can be patient data after scrubbing the PHI or any phantom data which has no PHI.

I would be very surprised, and in fact pleased, if you manage to find any significant number of clinical sites who will volunteer to donate data into public datasets AND perform de-identification (which means they also accept the liability for a possible PHI breach if they missed anything).

Publicly available de-identified medical imaging data is not easy to find. If you think what you find in TCIA and Imaging Data Commons is not enough, I am not sure where you can find more, and what resources you would need to support collection of data that is more expansive and more representative.

7 replies

ericspod Oct 11, 2023
Maintainer Author

Hi @kirbyju thanks for the feedback which is welcome any time. We have to get going with this cookbook idea so we'll definitely keep in mind these ideas and look to your notebooks for others.

kirbyju Oct 17, 2023
Collaborator

Here's a specific example that might be interesting to tackle. The "Lung nodule ct detection" model in https://monai.io/model-zoo.html was trained using the LIDC-IDRI data in TCIA. However, the model contributors indicate that the scans in the dataset have various voxel sizes so the first step is to resample them to the same voxel size. They resampled them into 0.703125 x 0.703125 x 1.25 mm and then encouraged others to use their preprocessed NIfTI data instead of working with the original DICOM. There's a related notebook at https://github.com/Project-MONAI/tutorials/tree/main/detection.

Would it be a useful DICOM tutorial to make a notebook that walks through how to download some sample lung CT data from TCIA, do the resampling and run inference against it with this model from the zoo? We have many lung CT datasets that also contain expert manually generated DICOM SEG or RTSTRUCT nodule labels which could also be visualized alongside the model's labels for comparison.
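
As a small illustration of the resampling step mentioned above, the sketch below only computes the grid size a scan would have at the target spacing; it does not interpolate any voxels. The function name and the example input scan are hypothetical, and in a MONAI pipeline the interpolation itself would normally be left to a spacing transform rather than hand-written.

```python
def resampled_shape(shape, spacing, new_spacing):
    """Grid size covering the same physical extent at a new voxel spacing."""
    return tuple(
        max(1, round(n * s / t)) for n, s, t in zip(shape, spacing, new_spacing)
    )


TARGET_SPACING = (0.703125, 0.703125, 1.25)  # mm, the voxel size quoted above

# Hypothetical input: 512 x 512 x 120 voxels at 0.625 x 0.625 x 2.5 mm
print(resampled_shape((512, 512, 120), (0.625, 0.625, 2.5), TARGET_SPACING))
# (455, 455, 240)
```

The spacing itself has to come from the DICOM headers (in-plane spacing and the distance between slice positions), which is why a tutorial starting from the original DICOM is more instructive than one starting from preprocessed NIfTI.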

ericspod Oct 18, 2023
Maintainer Author

A tutorial on this topic to go here with other TCIA tutorials would be a good idea. Anything like this that's large enough to be a full tutorial should be in this repo; for smaller snippets of ideas or patterns the cookbook makes more sense.

kirbyju Oct 18, 2023
Collaborator

Oh, I see. In that case, it sounds like I am exactly the type of user you're trying to assist with the cookbook and I'd be happy to try putting together the full tutorial once these snippets are ready :)

pwrightkcl Jan 31, 2024

I have used the CQ500 dataset, which is an open set of CT head images acquired from multiple hospitals in India. They come as DICOM, but may not include all the essential attributes. Edit: these are just the scans, no segmentations included. I'm not sure if this discussion is specifically about DICOM SEG.

aylward · 2023-06-02T15:25:46Z

aylward
Jun 2, 2023
Maintainer

I think there is significant, near-term value from starting on the cookbook, perhaps in parallel with attempting to gather more data. The cookbook should cover common / "easy" use cases (DICOM SEG to nrrd/nifti) as well as the challenging ones. I've found that often the "easy" ones are the ones that people get wrong (e.g., forgetting to set image directions or forgetting to map recorded values to HU values - I've seen MANY code examples on Medium/LinkedIn get it wrong!). Also, the "easy" ones often have multiple solution options, and some are slower or require more memory than others. Etc etc.
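
For the HU point, a minimal sketch of the mapping that is often forgotten: stored pixel values are converted with the `RescaleSlope` and `RescaleIntercept` attributes from the header. The function name is hypothetical and the values are illustrative; a real implementation would read both attributes per slice and operate on arrays.

```python
def stored_to_hu(stored_values, rescale_slope=1.0, rescale_intercept=0.0):
    """Apply the DICOM rescale: output = stored * RescaleSlope + RescaleIntercept."""
    return [v * rescale_slope + rescale_intercept for v in stored_values]


# Illustrative CT header values: slope 1, intercept -1024
print(stored_to_hu([0, 1024, 2024], 1.0, -1024.0))  # [-1024.0, 0.0, 1000.0]
```

Skipping this step leaves air at 0 instead of roughly -1000 HU when the intercept is -1024, which silently breaks any intensity windowing applied later.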

Regarding gathering more data, I agree with @fedorov. It could be argued that time spent sharing our knowledge and encouraging comparison/debate might be more impactful than spending that time focusing on corner cases. Admittedly, those corner cases are critical to many commercial products, but we've got to get the basics first.

4 replies

vikashg Jun 2, 2023
Collaborator

Hi @aylward and @fedorov
I think I used the term "edge cases" more broadly, in the sense of cases where your existing code/pipeline cannot load the patient data for whatever you need to do.

How about this? @ericspod
We start with a cookbook, as discussed today, with a basic pipeline to process some DICOM data:

- Load the DICOM.
- Load DICOM SEG/RT.
- Check if the images and segmentations match.
- Save them as NIfTI for further processing in MONAI.
- How to extract the appropriate series from a study?
- How to fill in missing slices in DICOM SEG?
- How to arrange the DICOM files based on the coordinate information from the slices?

People can post multiple solutions to the same problem.
Let's start by writing some code and processing pipelines for these problems. Our hope would be that they are robust enough to read "all or a sample data" in Imaging Data Commons or TCIA. If it fails for a particular subject, we will look into that and write some more tools for handling it. In our everyday practice, whenever we encounter issues which deviate from the existing pipeline, we just document them along with code snippets to solve the issue. Let's keep it free flow at the moment, and we can think of knowledge organization later down the road.
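
As one possible starting point for arranging slices by coordinate information and spotting missing slices, here is a minimal sketch. It assumes each slice header is a plain dict with `ImageOrientationPatient` and `ImagePositionPatient` already parsed into numbers; the function names and the tolerance are hypothetical, and it assumes all slices in the series share one orientation.

```python
def slice_normal(iop):
    """Cross product of the row and column direction cosines."""
    rx, ry, rz, cx, cy, cz = iop
    return (ry * cz - rz * cy, rz * cx - rx * cz, rx * cy - ry * cx)


def position_along_normal(hdr, normal):
    """Project ImagePositionPatient onto the slice normal (in mm)."""
    return sum(n * p for n, p in zip(normal, hdr["ImagePositionPatient"]))


def sort_slices(headers):
    """Return the headers ordered along the slice normal of the first header."""
    normal = slice_normal(headers[0]["ImageOrientationPatient"])
    return sorted(headers, key=lambda h: position_along_normal(h, normal))


def find_gaps(sorted_headers, tolerance_mm=0.01):
    """Indices i where the step from slice i to i+1 differs from the typical step."""
    normal = slice_normal(sorted_headers[0]["ImageOrientationPatient"])
    pos = [position_along_normal(h, normal) for h in sorted_headers]
    steps = [b - a for a, b in zip(pos, pos[1:])]
    if not steps:
        return []
    # Median step (upper middle for even counts) as the expected spacing
    median = sorted(steps)[len(steps) // 2]
    return [i for i, s in enumerate(steps) if abs(s - median) > tolerance_mm]
```

Sorting on the projected position rather than on `InstanceNumber` or the file name is a deliberate choice here, since neither of those is guaranteed to follow the spatial order.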

kirbyju Oct 10, 2023
Collaborator

Here's a notebook showing how to use the TCIA API to identify DICOM SEG and RTSTRUCT series, attempt to locate the corresponding series that was segmented, and download them: https://github.com/kirbyju/TCIA_Notebooks/blob/main/TCIA_Segmentations.ipynb. I'd love to make a similar one that uses MONAI to accomplish the additional steps you've laid out.

kirbyju May 10, 2024
Collaborator

I recently discovered a radiomics package called MIRP that natively handles DICOM (CT/MR/PT images and SEG and RTSTRUCT for segmentations). In addition to calculating the IBSI-defined standard radiomic features there is also a function for prepping data for deep learning: https://oncoray.github.io/mirp/deep_learning.html. Does the output this generates play well with MONAI?

fedorov May 10, 2024

MIRP is shared under a GPL-equivalent license, see https://github.com/oncoray/mirp. In general, as a practical rule, it is undesirable to depend on GPL libraries.

wyli · 2023-06-06T09:13:08Z

wyli
Jun 6, 2023
Collaborator

From our offline discussion, it seems one viable option is to create a new public repository named dicom-cookbook under project-monai. Please comment with further ideas or concerns, thanks!

0 replies

ericspod · 2023-06-18T21:16:41Z

ericspod
Jun 18, 2023
Maintainer Author

We had discussed further at the Friday meeting what form this should take. My idea is to have a repository for markdown documents listing DICOM resources like libraries and available public datasets, documents discussing concepts and issues with DICOM, documents with good practice information and "hacks", and notebooks illustrating basic operations and demonstrating good practices (this aligns with what @vikashg mentioned).

To expand this idea further, I was wondering if we wanted to have a more general repository of this sort and include information on other areas of deep learning. One inspiration is this page https://github.com/soumith/ganhacks listing ideas about training GANs and good practice advice. We could have some similar things for data preprocessing, defining transform pipelines, training segmentation networks, training generative models, training other networks, and deployment. This would be broken up into subdirectories for different subjects with documents on best practices, tips, "hacks", cookbook code people can just copy-paste and use, and other sorts of collected wisdom we should be writing down and sharing with the community. These would all be living documents anyone can contribute to.

Any thoughts? Getting too ambitious?

4 replies

vikashg Jun 22, 2023
Collaborator

Hi all,
Sorry I couldn't attend the meeting last week because of conferences. @ericspod I like the idea and I propose we start with that; maybe in a month or two we can get some feedback and restructure as needed.

This week I was participating in the SIIM Hackathon for a DICOM challenge on Series Renaming and it was an interesting experience, as we found that the same code was not working on data from all the different hospitals (something we already knew). But we investigated the points of failure and tried to handle them "nicely", or at least give meaningful information about them.

I think what we should do with this is write a base code for, let's say, reading a DICOM SEG file. We will test the code on a set of sample images from a number of different public data hubs. Let's say it works on TCIA: we can add a label to the code that verifies that it works on TCIA, and the code will get a verified badge for that dataset along with the date of verification. This is a conditional badge as we can only test so much.

If at a later date we find it is not working on a Mayo or Stanford dataset, we will try to resolve it, if possible. If not, we will write a separate piece of code for that dataset, and so the code evolves by people using it.

In our experience, we found that we cannot foresee all the problems but we can give the framework and evolve things from there.
I can explain more tomorrow.

kirbyju Oct 10, 2023
Collaborator

@vikashg I am extremely interested in discussing the standardized series naming thing with you when you have time. We've been considering adding a new database field in TCIA for this for ages but have been unsure how to generate the standardized names.

vikashg Oct 16, 2023
Collaborator

Hi @kirbyju. What's your email? I will send you a meeting invite.

Thanks

kirbyju Oct 16, 2023
Collaborator

[email protected]

kirbyju · 2023-11-22T17:40:01Z

kirbyju
Nov 22, 2023
Collaborator

Hi all, I just noticed that there's a Deep Learning Lab class happening at RSNA that you might find useful in relation to this discussion. It's focused on keeping everything in DICOM format while building a generic CT segmentation tool based on the Mask RCNN network as implemented in Facebook's detectron2 library: https://github.com/RSNA/AI-Deep-Learning-Lab-2023/blob/main/sessions/dicom-seg/RSNA_2021_DICOM_IN_DICOM_OUT_Segmentation.ipynb.

3 replies

kirbyju Nov 22, 2023
Collaborator

Go figure, right after I posted this they updated the repo with a new notebook (using TotalSegmentator now) which breaks that URL above. Here are better URLs:

- This year's TotalSegmentator notebook: https://github.com/RSNA/AI-Deep-Learning-Lab-2023/tree/main/sessions/dicom-seg
- Last year's detectron2 notebook: https://github.com/RSNA/AI-Deep-Learning-Lab-2022/tree/main/sessions/dicom-seg

fedorov Nov 22, 2023

Unfortunately, I have to say that the notebook does not provide an example of how DICOM SEG encoding should be done properly (unless I missed something from the quick look).

Specifically, instead of assigning an organ-specific term, the notebook assigns generic "Organ" segmentation type to all organs, see these lines:

This defeats one of the key advantages of using DICOM SEG - structured and coded description of the content. And the painstaking part - mapping of the TotalSegmentator labels has already been done, see wasserth/TotalSegmentator#218 and https://github.com/wasserth/TotalSegmentator/blob/master/resources/totalsegmentator_snomed_mapping.csv.
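
To illustrate the difference between a generic "Organ" type and an organ-specific one, here is a minimal sketch of a label lookup. The `Code` tuple, the two-entry table, and the function name are hypothetical stand-ins; the SNOMED CT values shown are given as I understand them and should be checked against the mapping CSV linked above, which is the actual source to use.

```python
from typing import NamedTuple


class Code(NamedTuple):
    value: str
    scheme: str
    meaning: str


# Segmented property category shared by all organ segments in this sketch
ANATOMICAL_STRUCTURE = Code("123037004", "SCT", "Anatomical Structure")

# Hypothetical excerpt of a label-to-code table; verify values before use
LABEL_TO_TYPE = {
    "liver": Code("10200004", "SCT", "Liver"),
    "spleen": Code("78961009", "SCT", "Spleen"),
}


def segment_codes(label):
    """Return (category, type) codes for a label; fail loudly if unmapped."""
    if label not in LABEL_TO_TYPE:
        raise KeyError(f"No coded concept for label {label!r}; extend the mapping")
    return ANATOMICAL_STRUCTURE, LABEL_TO_TYPE[label]
```

Raising on an unmapped label, rather than falling back to a generic code, keeps the structured description trustworthy for whoever queries the SEG later.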

@twloehfelm I know the time is short to fix this for RSNA, but would be great if you could consider fixing this sometime. I am happy to help after RSNA!

Also, something to keep in mind is that TotalSegmentator is available in MHub (see https://mhub.ai/models/totalsegmentator), which allows running TotalSegmentator directly on DICOM, and produces DICOM SEG automatically, taking care of all the DICOM conversion details behind the scenes with just one command:

docker run --rm -t --gpus all -v $in:/app/data/input_data -v $out:/app/data/output_data mhubai/totalsegmentator

The DICOM conversion components used in MHub are general purpose, and can be reused outside of MHub model packages (in particular, DICOM SEG conversion is done with dcmqi https://github.com/QIICR/dcmqi, which provides a command-line converter that you can use to do the conversion without writing any code), or applied to new models. The notebook above uses highdicom https://github.com/ImagingDataCommons/highdicom, which is a very flexible and powerful package, but I just wanted to mention dcmqi since the conversion process can indeed be simpler - in case the code looks intimidating to some.

Definitely many options out there!

fedorov Nov 22, 2023

Also, that rot90 + transpose(2,0,1) applied to the mask is not something that should be promoted or used - it is a recipe for a future disaster.
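
One way to see why a fixed rotation is fragile: the relationship between pixel indices and patient coordinates is defined per image by `ImagePositionPatient`, `ImageOrientationPatient`, and `PixelSpacing`, so it changes with the acquisition. The sketch below follows the image plane equation from the DICOM standard for a single slice; the function name is hypothetical and the inputs are assumed to be already parsed into numbers.

```python
def voxel_to_patient(col, row, ipp, iop, pixel_spacing):
    """Map an in-plane pixel index to patient coordinates (mm) for one slice."""
    row_cos, col_cos = iop[:3], iop[3:]
    # PixelSpacing is [spacing between rows, spacing between columns]
    row_spacing, col_spacing = pixel_spacing
    return tuple(
        ipp[k] + row_cos[k] * col_spacing * col + col_cos[k] * row_spacing * row
        for k in range(3)
    )


# Axial slice: columns advance along +x, rows along +y
print(voxel_to_patient(10, 20, (-100.0, -100.0, 50.0), (1, 0, 0, 0, 1, 0), (0.5, 0.8)))

# Sagittal slice: columns advance along +y, rows along -z
print(voxel_to_patient(10, 20, (0.0, -100.0, 50.0), (0, 1, 0, 0, 0, -1), (0.5, 0.8)))
```

The same pixel index lands on different patient axes in the two calls, which a hard-coded rot90/transpose cannot account for.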

aylward · 2023-11-22T18:57:52Z

aylward
Nov 22, 2023
Maintainer

Really inspires me to start a working group on visualization...


-- Stephen R. Aylward, Ph.D. Chair, MONAI Advisory Board Senior Director, Strategic Initiatives, Kitware

0 replies

twloehfelm · 2023-11-22T21:58:20Z

twloehfelm
Nov 22, 2023

I appreciate the feedback Andrey! I am humbled (and slightly embarrassed!) that someone of your stature is doing a code review on my notebook! In my inexcusable defense, I'm a decent radiologist but a pretty poor (and therefore dangerous) programmer. My approach to this hands-on session is not necessarily a best-practices example but more of an introduction of what is possible, and perhaps more importantly begging the radiology community to embrace DICOM and DICOM SEG fully! I will work on addressing the issues you point out - I can certainly use the correct `segmented_property_category` and `_type` based on the totalseg label. I'm not sure I have the brain power or time available to understand nifti coordinate mapping to DICOM coordinates beyond the `rot90` and `transpose(2,0,1)` hacks I stumbled across. I appreciate your highlighting of MHub too - I hadn't come across that before! Thanks again for the feedback! Tom


1 reply

fedorov Nov 24, 2023

Tom, I should have reached out earlier, and it's me who should be embarrassed. I noticed your session at the first edition of RSNA DLL, but failed to step through the notebook early in time and connect with you. Hopefully we can meet this coming week! I am very impressed that a radiologist is promoting concepts that many programmers reject as too complex! I wish I knew as much about radiology as you do about programming and DICOM!

ericspod · 2023-12-12T17:18:29Z

ericspod
Dec 12, 2023
Maintainer Author

Hi all, I have started the cookbook repo with an outline of initial sections. I haven't much content yet but it's something to start on. It's currently private until we have something fuller, but I can add anyone who's interested in contributing now.

0 replies

victormurcia · 2024-01-04T22:44:38Z

victormurcia
Jan 4, 2024

The compilation of best practices is a wonderful idea! One thing that I wanted to add here is that there is a major revision to the DICOM standard coming up soon (the first report is due on March 5th 2024). Sounds like it would be a great time to integrate such best practices into MONAI documentation given the upcoming DICOM revision.

1 reply

dclunie Jan 5, 2024

Don't get too excited - the report from the strategic planning group (WG 10) of DICOM will be on consideration of the need for, and feasibility of, a major revision - there is no commitment to such a revision actually happening. But the more information gathered describing problems with and solutions for the existing standard, as soon as possible, the better.

pwrightkcl · 2024-01-31T13:39:18Z

pwrightkcl
Jan 31, 2024

Just to add another source of knowledge: Chris Rorden and his collaborators on dcm2niix have a lot of experience in dealing with peculiarities of DICOM from various manufacturers, focused on conversion to NIfTI. You can find useful explanations in either the .md docs or the issues about this, e.g. pros and cons of different methods of figuring out slice spacing and gantry tilt.
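
As a rough illustration of the gantry tilt issue, the sketch below compares the slice normal from `ImageOrientationPatient` with the direction in which the slice positions actually stack. This is not dcm2niix's implementation, just one simple check; the function name is hypothetical and the inputs are assumed to be parsed numeric values from the first and last slice of a sorted series.

```python
import math


def gantry_tilt_degrees(first_ipp, last_ipp, iop):
    """Angle between the slice normal and the direction in which slices stack."""
    rx, ry, rz, cx, cy, cz = iop
    normal = (ry * cz - rz * cy, rz * cx - rx * cz, rx * cy - ry * cx)
    stack = [b - a for a, b in zip(first_ipp, last_ipp)]
    norm_n = math.sqrt(sum(n * n for n in normal))
    norm_s = math.sqrt(sum(s * s for s in stack))
    if norm_n == 0 or norm_s == 0:
        raise ValueError("degenerate orientation or identical slice positions")
    cos_angle = abs(sum(n * s for n, s in zip(normal, stack))) / (norm_n * norm_s)
    # Clamp to guard against rounding slightly above 1.0
    return math.degrees(math.acos(min(1.0, cos_angle)))
```

A result near zero means the slices stack along their own normal; a clearly non-zero angle suggests the volume is sheared and that the spacing between positions differs from the distance measured along the normal.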

0 replies
