Annotations for the PI-CAI Challenge: Public Training and Development Dataset
To download the associated imaging data, visit: https://zenodo.org/record/6624726. Note, the Public Training and Development Dataset of the PI-CAI challenge includes 328 cases from the ProstateX challenge. Thus, we strongly recommend using this dataset exclusively, and not in addition to the ProstateX dataset.
Patient cases used for the training datasets of the PI-CAI challenge were annotated with the same reference standard as used for the ProstateX challenge, i.e. histologically-confirmed (ISUP ≥ 2) positives, and histologically- (ISUP ≤ 1) or MRI- (PI-RADS ≤ 2) confirmed negatives, without follow-up. Note that this means certain patients (e.g. 11054) can have a prior study that was found to be negative (1001074 in 2018), but a subsequent study that was found to be positive (1001075 in 2020). In such cases, each study was annotated with respect to its associated histopathology or radiology findings only. Based on our institutional findings at RUMC (Venderink et al., 2019), such scenarios typically arise in fewer than 1% of negative cases.
For all cases, csPCa lesions were delineated and/or csPCa outcomes were recorded, by one of 10 trained investigators or 1 radiology resident, under supervision of one of 3 expert radiologists, at RUMC or UMCG. Lesion delineations were created using ITK-SNAP v3.80.
Out of the 1500 cases shared in the Public Training and Development Dataset, 1075 cases have benign tissue or indolent PCa (i.e. their labels should be empty or full of 0s) and 425 cases have csPCa (i.e. their labels should have lesion blobs of value 2, 3, 4 or 5). Out of these 425 positive cases, only 220 cases carry an annotation derived by a human expert. The remaining 205 positive cases have not been annotated. In other words, only 17% (220/1295) of the annotation files provided in picai_labels/csPCa_lesion_delineations/human_expert should contain csPCa lesion delineations, while the remaining 83% (1075/1295) should be empty.
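The case counts above can be cross-checked with a few lines of arithmetic. This is only a sanity-check sketch; the variable names are illustrative and the numbers are copied from the paragraph above.

```python
# Counts as stated in this README.
total_cases = 1500
negative_cases = 1075          # benign tissue or indolent PCa
positive_cases = 425           # csPCa (ISUP >= 2)
annotated_positive_cases = 220 # positives with a human-expert delineation

# Positives without a human-expert delineation.
unannotated_positive_cases = positive_cases - annotated_positive_cases

# Cases covered by the human_expert folder: all negatives plus annotated positives.
human_expert_cases = negative_cases + annotated_positive_cases

print(unannotated_positive_cases)                                   # 205
print(human_expert_cases)                                           # 1295
print(f"{100 * annotated_positive_cases / human_expert_cases:.0f}%")  # 17%
print(f"{100 * negative_cases / human_expert_cases:.0f}%")            # 83%
print(f"{100 * human_expert_cases / total_cases:.0f}%")               # 86%
```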
Automated AI-derived delineations of the prostate whole-gland (see algorithm used for this task) and csPCa lesions (Bosma et al., 2022) have also been made available.
Location | Description |
---|---|
csPCa_lesion_delineations/human_expert/original/ | Original csPCa annotations, as made by one of the trained investigators or the radiology resident. Depending on the annotator/center and their preference, some of these annotations were mapped or created at the spatial resolution of the T2W image, while others were created at the resolution of the ADC or DWI/HBV images. Either way, every lesion delineation in this folder clearly maps to observations in DWI/ADC imaging. Available for 1295/1500 (86%) cases. |
csPCa_lesion_delineations/human_expert/resampled/ | Original csPCa annotations resampled to the spatial resolution of the associated axial T2-weighted scan. Available for 1295/1500 (86%) cases. |
csPCa_lesion_delineations/AI/Bosma22a | Automated AI-derived delineations of csPCa lesions (Bosma et al., 2022a). |
anatomical_delineations/whole_gland/AI/Bosma22b | Automated AI-derived delineations of the prostate whole-gland (see algorithm used for this task). Note that AI-derived annotations can be susceptible to errors or faulty segmentations (e.g. whole-gland segmentation for case 11050_1001070). |
clinical_information/marksheet.csv | Clinical information (patient age, PSA, PSA density, prostate volume) and overview of each study (e.g. anonymized study date, MRI vendor and scanner used for acquisition, GS per lesion, if prostatectomy or biopsies were performed) in this dataset. |
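Case identifiers such as 11050_1001070 appear to combine a patient ID and a study ID. The sketch below assumes that `<patient_id>_<study_id>` pattern (inferred from the examples in this README, not from a documented specification) and groups studies per patient, which can be useful because some patients have more than one study. The helper names are hypothetical, and the two identifiers for patient 11054 are assembled from the patient and study numbers mentioned earlier under the same assumption.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def split_case_id(case_id: str) -> Tuple[str, str]:
    """Split an assumed '<patient_id>_<study_id>' identifier into its two parts."""
    patient_id, study_id = case_id.split("_")
    return patient_id, study_id


def group_studies_by_patient(case_ids: Iterable[str]) -> Dict[str, List[str]]:
    """Collect all study IDs that belong to the same patient ID."""
    studies = defaultdict(list)
    for case_id in case_ids:
        patient_id, study_id = split_case_id(case_id)
        studies[patient_id].append(study_id)
    return dict(studies)


example_cases = ["11050_1001070", "11054_1001074", "11054_1001075"]
grouped = group_studies_by_patient(example_cases)
print(grouped)  # {'11050': ['1001070'], '11054': ['1001074', '1001075']}
```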
Label Mapping of csPCa Annotations
All expert-derived csPCa annotations carry granular or multi-class labels (ISUP ≤ 1, 2, 3, 4, 5), while all automated AI-derived annotations carry binary labels (ISUP ≤ 1 or ≥ 2).
Label | Expert-Derived Annotations | AI-Derived Annotations |
---|---|---|
0 | ISUP ≤ 1 | ISUP ≤ 1 |
1 | N/A | ISUP ≥ 2 |
2 | ISUP 2 | N/A |
3 | ISUP 3 | N/A |
4 | ISUP 4 | N/A |
5 | ISUP 5 | N/A |
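To compare expert-derived and AI-derived annotations on the same scale, the multi-class expert labels can be collapsed to the binary convention in the table above. The following is a minimal sketch operating on a flat sequence of voxel values; `binarize_expert_label` is a hypothetical helper, not part of any PI-CAI tooling. With NumPy arrays, the equivalent would be a vectorized comparison such as `(mask >= 2).astype(mask.dtype)`.

```python
from typing import Iterable, List

# Values that expert-derived annotations are expected to contain (see table above).
EXPERT_LABEL_VALUES = {0, 2, 3, 4, 5}


def binarize_expert_label(voxels: Iterable[int]) -> List[int]:
    """Map expert labels (0, 2, 3, 4, 5) to the binary AI convention (0 or 1)."""
    binary = []
    for value in voxels:
        if value not in EXPERT_LABEL_VALUES:
            raise ValueError(f"unexpected expert label value: {value}")
        # ISUP >= 2 (labels 2..5) becomes 1; ISUP <= 1 (label 0) stays 0.
        binary.append(1 if value >= 2 else 0)
    return binary


print(binarize_expert_label([0, 0, 2, 5, 3, 0]))  # [0, 0, 1, 1, 1, 0]
```

Rejecting the value 1 is a deliberate choice here: it is listed as N/A for expert-derived annotations, so encountering it likely means an AI-derived file was passed in by mistake.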
List of Clinical Information Descriptors
Descriptor | Meaning |
---|---|
patient_id | Anonymized patient ID. |
study_id | Anonymized study ID. Multiple study IDs can be assigned to the same patient ID. |
mri_date | Anonymized date at the time of the MRI study. |
patient_age | Patient age at the time of the MRI study. |
psa | Prostate-specific antigen level (PSA) (unit: ng/mL), as stated in the radiology report associated with the MRI study. If this value is missing, then it was not reported for the given study. |
prostate_volume | Prostate volume (unit: mL), as stated in the radiology report associated with the MRI study. In clinical practice, this value is typically approximated using the conventional prolate ellipsoid model. If this value is missing, then it was not reported for the given study. |
psad | Prostate-specific antigen density (PSAd) (unit: ng/mL²), as stated in the radiology report associated with the MRI study. Note that this value may not necessarily be the same as the PSA divided by the prostate volume, due to approximations and rounding errors during clinical reporting. If this value is missing, then it was not reported for the given study. |
histopath_type | Procedure used to sample lesion tissue specimens for microscopic or histopathologic analysis. Its value can be SysBx for systematic biopsies, MRBx for MR-guided biopsies, SysBx+MRBx for systematic and MR-guided biopsies, or RP for radical prostatectomy. If its value is missing, then no tissue sampling procedure was performed, indicating a negative MRI study. |
lesion_GS | Gleason score (GS) assigned to each lesion after histopathologic analysis, where scores for different lesions are separated by commas (,). If its value is missing, then no tissue sampling procedure was performed, indicating a negative MRI study. If its value is N/A only for specific lesion(s), then those lesion(s) (as observed in radiology) were not biopsied or graded in histopathology (typically the case for PI-RADS 1-2 lesions). |
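As an illustration of how these descriptors might be consumed, the sketch below parses a `lesion_GS` field into ISUP grade groups and recomputes PSA density from `psa` and `prostate_volume`. Several things here are assumptions rather than documented facts: that Gleason scores are written as `primary+secondary` (e.g. `3+4`), that a benign result may appear as `0+0`, and the two sample rows, which are made up for demonstration and do not come from marksheet.csv. The Gleason-to-ISUP mapping follows the standard grade group definitions.

```python
import csv
import io
from typing import List, Optional


def gleason_to_isup(score: str) -> Optional[int]:
    """Convert one 'primary+secondary' Gleason score to an ISUP grade group.

    Returns None for 'N/A' (lesion not biopsied or graded) and 0 for the
    assumed benign encoding '0+0'.
    """
    score = score.strip()
    if score in ("", "N/A"):
        return None
    primary, secondary = (int(part) for part in score.split("+"))
    total = primary + secondary
    if total == 0:
        return 0
    if total <= 6:
        return 1
    if total == 7:
        return 2 if primary == 3 else 3
    if total == 8:
        return 4
    return 5


def parse_lesion_gs(field: str) -> List[Optional[int]]:
    """Parse a comma-separated lesion_GS field; an empty field yields no lesions."""
    if not field.strip():
        return []
    return [gleason_to_isup(score) for score in field.split(",")]


def has_cspca(field: str) -> bool:
    """True if any graded lesion is ISUP >= 2."""
    return any(g is not None and g >= 2 for g in parse_lesion_gs(field))


def recompute_psad(psa: str, prostate_volume: str) -> Optional[float]:
    """PSA divided by prostate volume, or None if either value is missing."""
    if not psa.strip() or not prostate_volume.strip():
        return None
    return float(psa) / float(prostate_volume)


# Made-up rows for demonstration only (not real dataset entries).
SAMPLE = '''patient_id,study_id,psa,prostate_volume,psad,lesion_GS
90001,9000001,7.7,55.0,0.14,"3+4,N/A"
90002,9000002,6.0,,,
'''

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
for row in rows:
    print(row["study_id"], parse_lesion_gs(row["lesion_GS"]),
          has_cspca(row["lesion_GS"]),
          recompute_psad(row["psa"], row["prostate_volume"]))
```

Because the reported `psad` may differ from the recomputed ratio, it is probably safer to treat the recomputed value as a fallback for missing entries than as a replacement for the reported one.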
Characteristic | Frequency |
---|---|
Number of sites | 11 |
Number of MRI scanners | 5 Siemens, 2 Philips |
Number of patients | 1476 |
Number of cases | 1500 |
— Benign or indolent PCa | 1075 |
— csPCa (ISUP ≥ 2) | 425 |
Median age (years) | 66 (IQR: 61–70) |
Median PSA (ng/mL) | 8.5 (IQR: 6–13) |
Median prostate volume (mL) | 57 (IQR: 40–80) |
Number of positive MRI lesions | 1087 |
— PI-RADS 3 | 246 (23%) |
— PI-RADS 4 | 438 (40%) |
— PI-RADS 5 | 403 (37%) |
Number of ISUP-based lesions | 776 |
— ISUP 1 | 311 (40%) |
— ISUP 2 | 260 (34%) |
— ISUP 3 | 109 (14%) |
— ISUP 4 | 41 (5%) |
— ISUP 5 | 55 (7%) |
We encourage open-source contributions! For instance, you can contribute expert-derived delineations of the prostate whole-gland and zonal anatomy at the spatial resolution of axial T2-weighted images. If you're interested, feel free to propose PRs for inclusion to this repo. Pending quality control, substantial contributions will be merged in and credited accordingly.
If you are using this dataset or some part of it, please cite the following article:
BibTeX:
@ARTICLE{PICAI_BIAS,
author = {Anindo Saha and Jasper J. Twilt and Joeran S. Bosma and Bram van Ginneken and Derya Yakar and Mattijs Elschot and Jeroen Veltman and Jurgen Fütterer and Maarten de Rooij and Henkjan Huisman},
title = {{Artificial Intelligence and Radiologists at Prostate Cancer Detection in MRI: The PI-CAI Challenge (Study Protocol)}},
year = {2022},
doi = {10.5281/zenodo.6667655}
}
Diagnostic Image Analysis Group, Radboud University Medical Center, Nijmegen, The Netherlands
- Anindo Saha: [email protected]
- Jasper Twilt: [email protected]
- Joeran Bosma: [email protected]
- Henkjan Huisman: [email protected]