Multiprocessing of RandCropByPosNegLabel #7964

StefanFischer · 2024-07-30T10:38:41Z

StefanFischer
Jul 30, 2024

Hello MONAI-Team and Users,

i would like to know if there is an option to run RandCropByPosNegLabel Transforms directly on the GPU with multiprocessing.
To ensure fast dataloading i use MONAI's CacheDataset to do determistic transforms before training start. During training the RandCropByPosNegLabel transform is applied on the GPU. I try to train with a very high batch size (>1000 samples per batch) and then the dataloading gets quite slow (5s per iteration) and minimal patch size (16x16x16, bs=1028). While the number of voxels per training input tensor is equal to the maximal patch size (128x128x128, bs=2), the training with the small patch size is significantly larger than with the maximal patch size. Besides the RandCropByPosNegLabel transformation i do NOT apply any other random transform.

Is there an option to do it more efficiently than currently implemented in the MONAI library?
The code of the RandCropByPosNegLabel shows a for loop over the cropping operation.

MONAI/monai/transforms/croppad/array.py

Lines 1047 to 1223 in 59a7211

    
           class RandCropByPosNegLabel(Randomizable, TraceableTransform, LazyTransform, MultiSampleTrait): 
        
               """ 
        
               Crop random fixed sized regions with the center being a foreground or background voxel 
        
               based on the Pos Neg Ratio. 
        
               And will return a list of arrays for all the cropped images. 
        
               For example, crop two (3 x 3) arrays from (5 x 5) array with pos/neg=1:: 
        
                   [[[0, 0, 0, 0, 0], 
        
                     [0, 1, 2, 1, 0],            [[0, 1, 2],     [[2, 1, 0], 
        
                     [0, 1, 3, 0, 0],     -->     [0, 1, 3],      [3, 0, 0], 
        
                     [0, 0, 0, 0, 0],             [0, 0, 0]]      [0, 0, 0]] 
        
                     [0, 0, 0, 0, 0]]] 
        
               If a dimension of the expected spatial size is larger than the input image size, 
        
               will not crop that dimension. So the cropped result may be smaller than expected size, and the cropped 
        
               results of several images may not have exactly same shape. 
        
               And if the crop ROI is partly out of the image, will automatically adjust the crop center to ensure the 
        
               valid crop ROI. 
        
               This transform is capable of lazy execution. See the :ref:`Lazy Resampling topic<lazy_resampling>` 
        
               for more information. 
        
               Args: 
        
                   spatial_size: the spatial size of the crop region e.g. [224, 224, 128]. 
        
                       if a dimension of ROI size is larger than image size, will not crop that dimension of the image. 
        
                       if its components have non-positive values, the corresponding size of `label` will be used. 
        
                       for example: if the spatial size of input data is [40, 40, 40] and `spatial_size=[32, 64, -1]`, 
        
                       the spatial size of output data will be [32, 40, 40]. 
        
                   label: the label image that is used for finding foreground/background, if None, must set at 
        
                       `self.__call__`.  Non-zero indicates foreground, zero indicates background. 
        
                   pos: used with `neg` together to calculate the ratio ``pos / (pos + neg)`` for the probability 
        
                       to pick a foreground voxel as a center rather than a background voxel. 
        
                   neg: used with `pos` together to calculate the ratio ``pos / (pos + neg)`` for the probability 
        
                       to pick a foreground voxel as a center rather than a background voxel. 
        
                   num_samples: number of samples (crop regions) to take in each list. 
        
                   image: optional image data to help select valid area, can be same as `img` or another image array. 
        
                       if not None, use ``label == 0 & image > image_threshold`` to select the negative 
        
                       sample (background) center. So the crop center will only come from the valid image areas. 
        
                   image_threshold: if enabled `image`, use ``image > image_threshold`` to determine 
        
                       the valid image content areas. 
        
                   fg_indices: if provided pre-computed foreground indices of `label`, will ignore above `image` and 
        
                       `image_threshold`, and randomly select crop centers based on them, need to provide `fg_indices` 
        
                       and `bg_indices` together, expect to be 1 dim array of spatial indices after flattening. 
        
                       a typical usage is to call `FgBgToIndices` transform first and cache the results. 
        
                   bg_indices: if provided pre-computed background indices of `label`, will ignore above `image` and 
        
                       `image_threshold`, and randomly select crop centers based on them, need to provide `fg_indices` 
        
                       and `bg_indices` together, expect to be 1 dim array of spatial indices after flattening. 
        
                       a typical usage is to call `FgBgToIndices` transform first and cache the results. 
        
                   allow_smaller: if `False`, an exception will be raised if the image is smaller than 
        
                       the requested ROI in any dimension. If `True`, any smaller dimensions will be set to 
        
                       match the cropped size (i.e., no cropping in that dimension). 
        
                   lazy: a flag to indicate whether this transform should execute lazily or not. Defaults to False. 
        
               Raises: 
        
                   ValueError: When ``pos`` or ``neg`` are negative. 
        
                   ValueError: When ``pos=0`` and ``neg=0``. Incompatible values. 
        
               """ 
        
               backend = SpatialCrop.backend 
        
               def __init__( 
        
                   self, 
        
                   spatial_size: Sequence[int] | int, 
        
                   label: torch.Tensor | None = None, 
        
                   pos: float = 1.0, 
        
                   neg: float = 1.0, 
        
                   num_samples: int = 1, 
        
                   image: torch.Tensor | None = None, 
        
                   image_threshold: float = 0.0, 
        
                   fg_indices: NdarrayOrTensor | None = None, 
        
                   bg_indices: NdarrayOrTensor | None = None, 
        
                   allow_smaller: bool = False, 
        
                   lazy: bool = False, 
        
               ) -> None: 
        
                   LazyTransform.__init__(self, lazy) 
        
                   self.spatial_size = spatial_size 
        
                   self.label = label 
        
                   if pos < 0 or neg < 0: 
        
                       raise ValueError(f"pos and neg must be nonnegative, got pos={pos} neg={neg}.") 
        
                   if pos + neg == 0: 
        
                       raise ValueError("Incompatible values: pos=0 and neg=0.") 
        
                   self.pos_ratio = pos / (pos + neg) 
        
                   self.num_samples = num_samples 
        
                   self.image = image 
        
                   self.image_threshold = image_threshold 
        
                   self.centers: tuple[tuple] | None = None 
        
                   self.fg_indices = fg_indices 
        
                   self.bg_indices = bg_indices 
        
                   self.allow_smaller = allow_smaller 
        
               def randomize( 
        
                   self, 
        
                   label: torch.Tensor | None = None, 
        
                   fg_indices: NdarrayOrTensor | None = None, 
        
                   bg_indices: NdarrayOrTensor | None = None, 
        
                   image: torch.Tensor | None = None, 
        
               ) -> None: 
        
                   fg_indices_ = self.fg_indices if fg_indices is None else fg_indices 
        
                   bg_indices_ = self.bg_indices if bg_indices is None else bg_indices 
        
                   if fg_indices_ is None or bg_indices_ is None: 
        
                       if label is None: 
        
                           raise ValueError("label must be provided.") 
        
                       fg_indices_, bg_indices_ = map_binary_to_indices(label, image, self.image_threshold) 
        
                   _shape = None 
        
                   if label is not None: 
        
                       _shape = label.peek_pending_shape() if isinstance(label, MetaTensor) else label.shape[1:] 
        
                   elif image is not None: 
        
                       _shape = image.peek_pending_shape() if isinstance(image, MetaTensor) else image.shape[1:] 
        
                   if _shape is None: 
        
                       raise ValueError("label or image must be provided to get the spatial shape.") 
        
                   self.centers = generate_pos_neg_label_crop_centers( 
        
                       self.spatial_size, 
        
                       self.num_samples, 
        
                       self.pos_ratio, 
        
                       _shape, 
        
                       fg_indices_, 
        
                       bg_indices_, 
        
                       self.R, 
        
                       self.allow_smaller, 
        
                   ) 
        
               @LazyTransform.lazy.setter  # type: ignore 
        
               def lazy(self, _val: bool): 
        
                   self._lazy = _val 
        
               @property 
        
               def requires_current_data(self): 
        
                   return False 
        
               def __call__( 
        
                   self, 
        
                   img: torch.Tensor, 
        
                   label: torch.Tensor | None = None, 
        
                   image: torch.Tensor | None = None, 
        
                   fg_indices: NdarrayOrTensor | None = None, 
        
                   bg_indices: NdarrayOrTensor | None = None, 
        
                   randomize: bool = True, 
        
                   lazy: bool | None = None, 
        
               ) -> list[torch.Tensor]: 
        
                   """ 
        
                   Args: 
        
                       img: input data to crop samples from based on the pos/neg ratio of `label` and `image`. 
        
                           Assumes `img` is a channel-first array. 
        
                       label: the label image that is used for finding foreground/background, if None, use `self.label`. 
        
                       image: optional image data to help select valid area, can be same as `img` or another image array. 
        
                           use ``label == 0 & image > image_threshold`` to select the negative sample(background) center. 
        
                           so the crop center will only exist on valid image area. if None, use `self.image`. 
        
                       fg_indices: foreground indices to randomly select crop centers, 
        
                           need to provide `fg_indices` and `bg_indices` together. 
        
                       bg_indices: background indices to randomly select crop centers, 
        
                           need to provide `fg_indices` and `bg_indices` together. 
        
                       randomize: whether to execute the random operations, default to `True`. 
        
                       lazy: a flag to override the lazy behaviour for this call, if set. Defaults to None. 
        
                   """ 
        
                   if image is None: 
        
                       image = self.image 
        
                   if randomize: 
        
                       if label is None: 
        
                           label = self.label 
        
                       self.randomize(label, fg_indices, bg_indices, image) 
        
                   results: list[torch.Tensor] = [] 
        
                   if self.centers is not None: 
        
                       img_shape = img.peek_pending_shape() if isinstance(img, MetaTensor) else img.shape[1:] 
        
                       roi_size = fall_back_tuple(self.spatial_size, default=img_shape) 
        
                       lazy_ = self.lazy if lazy is None else lazy 
        
                       for i, center in enumerate(self.centers): 
        
                           cropper = SpatialCrop(roi_center=center, roi_size=roi_size, lazy=lazy_) 
        
                           cropped = cropper(img) 
        
                           if get_track_meta(): 
        
                               ret_: MetaTensor = cropped  # type: ignore 
        
                               ret_.meta[Key.PATCH_INDEX] = i 
        
                               ret_.meta["crop_center"] = center 
        
                               self.push_transform(ret_, replace=True, lazy=lazy_) 
        
                           results.append(cropped) 
        
                   return results

. I think that it is possible to parralize the following code:

for i, center in enumerate(self.centers): cropper = SpatialCrop(roi_center=center, roi_size=roi_size, lazy=lazy_) cropped = cropper(img)

I tried to use multiprocessing (torch.multiprocessing), but i always get CUDA related issues. I am also an absolute noob in multiprocessing. Does anyone has experience in CUDA multiprocessing?
Would be happy to hear from you guys :)

KumoLiu · 2024-07-30T15:47:18Z

KumoLiu
Jul 30, 2024
Maintainer

Hi @StefanFischer, to put transforms on the GPU, you can try ThreadDataLoader .

For more details, you can refer to this tutorial: https://github.com/Project-MONAI/tutorials/blob/main/acceleration/fast_model_training_guide.md

Hope it helps, thanks.

2 replies

StefanFischer Aug 1, 2024
Author

Hi KumoLiu,

thanks for the fast reply!
I already use the ThreadDataLoader, but i thought that the whole process of the RandCropByPosNegLabel transform can be implemented faster using multiprocessing.
Instead of iterating through the crop creation in a serial fashion (for-loop) as in the current MONAI implementation (class RandCropByPosNegLabel)
for i, center in enumerate(self.centers): cropper = SpatialCrop(roi_center=center, roi_size=roi_size, lazy=lazy_) cropped = cropper(img)
would it be also possible to do crop creation using multiprocessing (in parallel)?
I only got CUDA multiprocessing errors when I tried to parallelize it. I guess there are issues with Memory Managment on GPU tensors.

Thank you for the help!

Best,
Stefan

KumoLiu Aug 1, 2024
Maintainer

Hi @StefanFischer,

Yes, using multiprocessing for image cropping tasks can significantly improve performance, especially when dealing with large datasets or computationally intensive operations. However, as you mentioned, working with GPU tensors and CUDA in a multiprocessing context can be tricky due to memory management issues. To avoid these issues, one approach is to perform the cropping operations on the CPU using multiprocessing and then transfer the cropped tensors to the GPU if needed. But this way it may increase the time of the IO operations.

You can also refer to the discussion here: #2794
We didn't enable multi-threads for ThreadDataloader because most of MONAI random transforms are not thread-safe and definitely need to enhance our logic.

Thanks.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Multiprocessing of RandCropByPosNegLabel #7964

{{title}}

Replies: 1 comment 2 replies

{{title}}

{{title}}

{{title}}

Select a reply

Multiprocessing of RandCropByPosNegLabel #7964

StefanFischer Jul 30, 2024

Replies: 1 comment · 2 replies

KumoLiu Jul 30, 2024 Maintainer

StefanFischer Aug 1, 2024 Author

KumoLiu Aug 1, 2024 Maintainer

StefanFischer
Jul 30, 2024

Replies: 1 comment 2 replies

KumoLiu
Jul 30, 2024
Maintainer

StefanFischer Aug 1, 2024
Author

KumoLiu Aug 1, 2024
Maintainer