Memory efficient way to read random subvolumes of large 3D datasets on-the-fly #3820
-
There's no ready-to-use solution in MONAI at the moment; the closest approach would be the coordinate-based patch dataset for digital pathology: https://github.com/Project-MONAI/MONAI/blob/dev/monai/apps/pathology/data/datasets.py#L80
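The same coordinate-based idea can be adapted to 3D volumes once they live in a chunked format. Below is a minimal sketch, not part of MONAI, assuming each volume has already been written to a zarr store; the class name `CoordPatchDataset` and the record layout are hypothetical.

```python
# Minimal sketch of a coordinate-based patch dataset for 3D volumes.
# Assumes each volume was converted once to a chunked zarr array;
# CoordPatchDataset and the record format are illustrative, not MONAI API.
import numpy as np
import torch
import zarr
from torch.utils.data import Dataset


class CoordPatchDataset(Dataset):
    """Yields patches read directly from disk at precomputed coordinates."""

    def __init__(self, records, patch_size=(32, 128, 128)):
        # records: list of dicts like {"path": "vol0.zarr", "coord": (z, y, x)}
        self.records = records
        self.patch_size = patch_size

    def __len__(self):
        return len(self.records)

    def __getitem__(self, idx):
        rec = self.records[idx]
        arr = zarr.open(rec["path"], mode="r")  # lazy handle, no full-volume read
        z, y, x = rec["coord"]
        dz, dy, dx = self.patch_size
        patch = arr[z:z + dz, y:y + dy, x:x + dx]  # only the touched chunks are read
        return torch.as_tensor(np.asarray(patch))[None]  # add channel dim
```

The coordinate records can be regenerated each epoch to change the sampling, mirroring what `RandSpatialCropSamples` does but without ever reading a whole volume.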
-
This would be very interesting, but I think there is no way of achieving it, since there are no readers that can read just a subvolume of a DICOM or nii.gz. By the way,
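One partial exception, assuming the volumes can be kept as uncompressed .nii rather than .nii.gz: nibabel's array proxy reads only the requested slab from disk when sliced, so a crop-sized read is possible without loading the full array (for .nii.gz the stream still has to be decompressed, so the saving there is much smaller). A minimal sketch with a placeholder filename:

```python
# Partial read of an uncompressed .nii file via nibabel's array proxy.
# "ct_volume.nii" is a placeholder path.
import nibabel as nib

img = nib.load("ct_volume.nii")                   # header only, data not loaded yet
subvol = img.dataobj[100:132, 500:628, 500:628]   # reads just this region from disk
print(subvol.shape)                               # (32, 128, 128)
```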
-
@perlmutter any luck on this?
-
My training data consists of a few dozen large CT volumes, each ~500x2000x2000 voxels. I'd like to train a 3D FCN segmentation model on patches of size ~32x128x128 randomly distributed across all the volumes. One solution would be to simply patchify all the volumes ahead of time, but I'd prefer to randomly sample patches on the fly if possible, e.g. via `RandSpatialCropSamples`. However, 1) the entire dataset is too large to fit into memory, and 2) it would be very inefficient to read an entire ~500x2000x2000 volume every iteration only to immediately crop it to 32x128x128 with `RandSpatialCropSamples`.

Is there any way for the dataloader to load only random subvolumes of the full image? The data can be stored in zarr or some other chunked data format. If not, is there another method you would suggest?
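To make the request concrete, here is a sketch of the kind of loader being asked about, assuming each volume is converted once to a chunked zarr store and random crop coordinates are drawn in `__getitem__`. The names `convert_to_zarr`, `RandomZarrPatches`, and `vol_000.zarr` are illustrative, not an existing MONAI or zarr API.

```python
# On-the-fly random patch sampling from chunked zarr stores (illustrative sketch).
import numpy as np
import torch
import zarr
from torch.utils.data import Dataset


def convert_to_zarr(volume: np.ndarray, out_path: str, chunks=(32, 128, 128)):
    # One-time conversion: chunk size matched to the training patch size,
    # so each random crop touches only a handful of chunks on disk.
    zarr.save_array(out_path, volume, chunks=chunks)


class RandomZarrPatches(Dataset):
    def __init__(self, paths, patch_size=(32, 128, 128), samples_per_epoch=1000):
        self.arrays = [zarr.open(p, mode="r") for p in paths]  # lazy handles
        self.patch_size = patch_size
        self.samples_per_epoch = samples_per_epoch

    def __len__(self):
        return self.samples_per_epoch

    def __getitem__(self, _):
        arr = self.arrays[np.random.randint(len(self.arrays))]
        starts = [np.random.randint(0, s - p + 1) for s, p in zip(arr.shape, self.patch_size)]
        slices = tuple(slice(s, s + p) for s, p in zip(starts, self.patch_size))
        return torch.as_tensor(arr[slices].astype(np.float32))[None]  # channel first
```

With chunks equal to the patch size, a random 32x128x128 crop reads at most eight chunks, independent of the full volume size.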