This repository contains our image segmentation models, applicable to both 2D and 3D segmentation. Our most recent architecture, HNOSeg-XS (eXtremely Small Hartley Neural Operator for Segmentation), has an intrinsic zero-shot super-resolution property and is also computationally efficient and extremely parameter efficient. When tested on the BraTS'23, KiTS'23, and MVSeg'23 datasets with a Tesla V100 GPU, HNOSeg-XS showed superior resolution robustness with fewer than 34.7k model parameters. It also achieved the best overall inference time (< 0.24 s) and memory efficiency (< 1.8 GiB) among the tested CNN and transformer models. This repository also contains the model architectures from our previous publications, including HartleyMHA, FNOSeg3D, and V-Net-DS. Please see the technical details for installation and usage.
Fig. 1: Computational requirements of tested models on a single image, with average values from three datasets.
Fig. 2: Comparisons of robustness to training image resolutions. Each point represents the average value of different regions of the testing data. Regardless of the training image sizes, the largest image sizes were used in testing.
The code was developed with Python 3.10.12 and PyTorch 2.5.1. If you are only interested in the architectures, the nets module is all you need, though the hyperparameters are stored under experiments/config_files as they are dataset specific. Experimental setups such as data splits and training procedure are in the experiments module. The nets module is dataset independent, while some functions in the experiments module (e.g., dataset partitioning) are dataset specific. We only provide the code for the BraTS'23 dataset as it is more standardized.
In experiments, arguments are provided through a config file parsed with the ConfigParser class from Python's configparser module. The config file is saved to the output directory for future reference. Examples of the config files used in our experiments are provided under experiments/config_files for reproducibility.
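The read-then-archive pattern described above can be sketched with the standard library alone. This is a minimal illustration, not the repository's own code: the function names, the `config_used.ini` filename, and the keys in the demo config are assumptions made for the example.

```python
import configparser
import tempfile
from pathlib import Path


def load_config(config_path):
    """Parse an INI-style config file with ConfigParser."""
    config = configparser.ConfigParser()
    # read() returns the list of files it managed to parse.
    if not config.read(config_path):
        raise FileNotFoundError(f"Cannot read config file: {config_path}")
    return config


def save_config_copy(config, output_dir, filename="config_used.ini"):
    """Write the parsed config into output_dir (the filename is hypothetical)."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    target = output_dir / filename
    with open(target, "w") as f:
        config.write(f)
    return target


# Demo with a throwaway config; the section and key placement are assumptions.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "config_demo.ini"
    src.write_text("[main]\nis_train = True\nis_continue = False\n")
    cfg = load_config(src)
    copy_path = save_config_copy(cfg, Path(tmp) / "outputs")
    print(cfg.getboolean("main", "is_train"), copy_path.name)
```

Keeping a copy of the parsed config next to the results means an experiment can be rerun later from exactly the settings that produced it.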
Note:
Although the code is built for both 2D and 3D segmentation, it is not fully tested for 2D segmentation.
The PyTorch implementation of the HartleyMHA architecture is not fully tested.
The previous TensorFlow implementation is kept under tensorflow and was last updated in April 2024. It does not provide the new features such as the self-normalizing capability, and its segmentation accuracy and efficiency are suboptimal.
There are multiple Python packages required to run the code. You can install them by the following steps:
- Create a virtual environment (https://docs.python.org/3/library/venv.html):
      python -m venv /path/to/new/virtual/environment
  You can check the Python version with python -V.
- Upgrade pip in the activated virtual environment:
      pip install --upgrade pip
  This is important as the installed pip version can be outdated and the subsequent steps may fail.
- Clone and install the repository:
      git clone https://github.com/IBM/multimodal-3d-image-segmentation.git
      pip install multimodal-3d-image-segmentation/
  Note: The software package graphviz (https://graphviz.org/) is required to plot the model architecture. If you encounter the corresponding runtime error, you can either install graphviz (e.g., by sudo apt install graphviz in Linux), or set is_plot_model = False in the training config file to skip plot_model.
- Use the following code to verify the installation:
      from multimodal_3d_image_segmentation import nets
      model = nets.HNOSegXS(4, 4, 24, [3, 3, 3, 3, 3, 3, 3, 3], (10, 14, 14), device='cuda')
      print(sum(p.numel() for p in model.parameters()))
  You should see the number of model parameters as 28248. You can change device='cuda' to device='cpu' if no GPU is available.
The experiments/brats23_data_preparation folder contains the script and config file for partitioning the BraTS'23 dataset. The script goes through the dataset folders to extract the patient IDs and groups them into training, validation, and testing sets. The resulting lists of file paths are saved as txt files. To run the script, first modify the config_partitioning.ini config file, then use the command:
python partitioning.py /path/to/config_partitioning.ini
The split examples used in our experiments are provided under split_examples.
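For readers who want to see the shape of such a partitioning step, here is a minimal sketch. It is not the repository's partitioning.py: the function names, the one-folder-per-patient layout, the split fractions, and the output filenames are all assumptions for illustration, and it writes patient IDs rather than full file paths.

```python
import random
from pathlib import Path


def list_patient_ids(dataset_dir):
    """Treat each sub-folder name as a patient ID (an assumed layout)."""
    return sorted(p.name for p in Path(dataset_dir).iterdir() if p.is_dir())


def partition_ids(patient_ids, train_frac=0.7, valid_frac=0.1, seed=0):
    """Shuffle IDs reproducibly; the testing set takes whatever remains."""
    ids = sorted(set(patient_ids))  # sort first so the seed fully fixes the order
    random.Random(seed).shuffle(ids)
    n_train = round(len(ids) * train_frac)
    n_valid = round(len(ids) * valid_frac)
    return {
        "train": ids[:n_train],
        "valid": ids[n_train:n_train + n_valid],
        "test": ids[n_train + n_valid:],
    }


def save_split(split, output_dir):
    """Write one txt file per subset, one entry per line."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    for name, ids in split.items():
        (output_dir / f"{name}.txt").write_text("".join(f"{i}\n" for i in ids))


# Hypothetical IDs; real BraTS'23 folder names will differ.
demo_ids = [f"case_{i:03d}" for i in range(10)]
demo_split = partition_ids(demo_ids)
print({name: len(ids) for name, ids in demo_split.items()})
```

Splitting by patient ID rather than by individual file keeps all images of one patient in the same subset, which avoids leakage between training and testing.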
To perform training, first modify the config_<arch>.ini file under experiments/config_files, then run:
python experiments/run.py /path/to/config_<arch>.ini
where <arch> stands for an architecture (e.g., fnoseg). The config files of different architectures differ only in the [model] section and output_dir. If the program stops before training is completed, add is_continue = True under [main] in the config file so that training can be restarted from the last checkpoint. The default setting saves a checkpoint every 10 epochs, or whenever a new best model is available.
Although the above command runs both training and testing, the testing is performed on the same image size as training. To test on a different image size, modify the following in the config file:
- In [main], change is_train to False.
- In [input_lists], change data_dir.
- In [test], change output_folder.
- In [statistics], change use_surface_dice and use_hd95 as needed.
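If you prefer to derive the test-only config programmatically rather than editing it by hand, the changes listed above can be applied with configparser. The sketch below is an assumption-laden illustration: the section and key names come from the list above, but the function name, the placeholder paths, and the demo config contents are hypothetical.

```python
import configparser


def make_test_config(config, data_dir, output_folder,
                     use_surface_dice=None, use_hd95=None):
    """Return a modified copy of a training config for test-only runs."""
    test_config = configparser.ConfigParser()
    test_config.read_dict(config)  # copy state from the training config
    updates = {
        "main": {"is_train": "False"},
        "input_lists": {"data_dir": str(data_dir)},
        "test": {"output_folder": str(output_folder)},
        "statistics": {},
    }
    # Only touch the statistics flags when the caller asks for a change.
    if use_surface_dice is not None:
        updates["statistics"]["use_surface_dice"] = str(use_surface_dice)
    if use_hd95 is not None:
        updates["statistics"]["use_hd95"] = str(use_hd95)
    test_config.read_dict(updates)  # later values override the copied ones
    return test_config


# Hypothetical training config; the real files hold many more keys.
train_cfg = configparser.ConfigParser()
train_cfg.read_string("""
[main]
is_train = True
[input_lists]
data_dir = /path/to/training/size/images
[test]
output_folder = test_same_size
[statistics]
use_surface_dice = False
use_hd95 = False
""")

test_cfg = make_test_config(train_cfg, "/path/to/other/size/images",
                            "test_other_size", use_hd95=True)
print(test_cfg["main"]["is_train"], test_cfg["input_lists"]["data_dir"])
```

The result can be written to a new file with test_cfg.write(...) and passed to experiments/run.py in place of the training config, leaving the original untouched.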
- HNOSeg-XS (Please cite this paper if you use our models)
  Ken C. L. Wong, Hongzhi Wang, and Tanveer Syeda-Mahmood, “HNOSeg-XS: extremely small Hartley neural operator for efficient and resolution-robust 3D image segmentation,” IEEE Transactions on Medical Imaging, 2025 (in press). [pdf]
  @article{Journal:Wong:TMI2025:hnoseg-xs,
    title = {{HNOSeg-XS}: extremely small {Hartley} neural operator for efficient and resolution-robust {3D} image segmentation},
    author = {Wong, Ken C. L. and Wang, Hongzhi and Syeda-Mahmood, Tanveer},
    journal = {IEEE Transactions on Medical Imaging},
    year = {2025},
  }
- HartleyMHA
  Ken C. L. Wong, Hongzhi Wang, and Tanveer Syeda-Mahmood, “HartleyMHA: self-attention in frequency domain for resolution-robust and parameter-efficient 3D image segmentation,” in International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2023, pp. 364–373. [pdf]
  @inproceedings{Conference:Wong:MICCAI2023:hartleymha,
    title = {{HartleyMHA}: self-attention in frequency domain for resolution-robust and parameter-efficient {3D} image segmentation},
    author = {Wong, Ken C. L. and Wang, Hongzhi and Syeda-Mahmood, Tanveer},
    booktitle = {International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI)},
    pages = {364--373},
    year = {2023},
  }
- FNOSeg3D
  Ken C. L. Wong, Hongzhi Wang, and Tanveer Syeda-Mahmood, “FNOSeg3D: resolution-robust 3D image segmentation with Fourier neural operator,” in IEEE International Symposium on Biomedical Imaging (ISBI), 2023, pp. 1–5. [pdf]
  @inproceedings{Conference:Wong:ISBI2023:fnoseg3d,
    title = {{FNOSeg3D}: resolution-robust {3D} image segmentation with {Fourier} neural operator},
    author = {Wong, Ken C. L. and Wang, Hongzhi and Syeda-Mahmood, Tanveer},
    booktitle = {IEEE International Symposium on Biomedical Imaging (ISBI)},
    pages = {1--5},
    year = {2023},
  }
- V-Net-DS (V-Net with deep supervision)
  Ken C. L. Wong, Mehdi Moradi, Hui Tang, and Tanveer Syeda-Mahmood, “3D segmentation with exponential logarithmic loss for highly unbalanced object sizes,” in International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2018, pp. 612–619. [pdf]
  @inproceedings{Conference:Wong:MICCAI2018:3d,
    title = {{3D} segmentation with exponential logarithmic loss for highly unbalanced object sizes},
    author = {Wong, Ken C. L. and Moradi, Mehdi and Tang, Hui and Syeda-Mahmood, Tanveer},
    booktitle = {International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI)},
    pages = {612--619},
    year = {2018},
  }
- Pearson’s Correlation Coefficient (PCC) loss
  Ken C. L. Wong and Mehdi Moradi, “3D segmentation with fully trainable Gabor kernels and Pearson’s correlation coefficient,” in Machine Learning in Medical Imaging, 2022, pp. 53–61. [pdf]
  @inproceedings{Workshop:Wong:MLMI2022:3d,
    title = {{3D} segmentation with fully trainable {Gabor} kernels and {Pearson's} correlation coefficient},
    author = {Wong, Ken C. L. and Moradi, Mehdi},
    booktitle = {Machine Learning in Medical Imaging},
    pages = {53--61},
    year = {2022},
  }
- The new PyTorch implementation replaces the TensorFlow implementation. This PyTorch implementation includes the new and improved models in our IEEE TMI 2025 paper.
- Updated the code for the most recent version of TensorFlow (2.16.1).
- The datagenerator.py module is replaced by the dataset.py module that uses PyDataset in Keras 3. As PyDataset is new in Keras 3 and thus TensorFlow 2.16.1, class InputData is not backward compatible.
Ken C. L. Wong ([email protected])