Welcome to the Pediatric Sepsis Data Challenge! This challenge focuses on predicting in-hospital mortality for pediatric sepsis cases using a synthetic dataset derived from real-world data. The ultimate goal is to improve early detection models for better resource allocation and clinical outcomes in low-resource healthcare settings.
Develop an open-source algorithm to predict in-hospital mortality among children with sepsis. This algorithm should be trained solely on the provided dataset, using any or all variables available within.
- Data and Code Requirements
- Submission Guidelines and Limits
- Submission Instructions for Your Code
- Testing and Evaluation Criteria
- Final Instructions
- Provided Dataset: Synthetic data derived from real hospital data from Uganda. Use `SyntheticData_Training.csv` as the training data and `SyntheticData_DataDictionary_V1.docx` as the data dictionary.
- Feature Constraints: Your algorithm should exclusively use the provided dataset variables for predictions.
- Code and Model: Submit both:
- Training Code: All scripts and code required for training the model.
- Trained Model: The model file generated from your code.
- Language: Submissions must be in Python; R and MATLAB submissions are no longer accepted. Using Python also facilitates comparisons against the baseline.
- Environment: Code will run in a containerized setup.
- Execution Time: Maximum 24 hours for training, with 8 hours allocated for validation and testing.
- Autonomous Execution: Ensure your code can execute end-to-end without manual intervention.
- Dependencies: List all dependencies in `requirements.txt` or a compatible environment configuration file.
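As a starting point under these requirements, the sketch below shows a minimal training pipeline. The column names (`age_months`, `lactate`, `inhospital_mortality`) are hypothetical placeholders, not guaranteed fields of `SyntheticData_Training.csv`; consult the data dictionary for the real variable names.

```python
# Minimal baseline training sketch. NOTE: column names are hypothetical
# placeholders -- check SyntheticData_DataDictionary_V1.docx for the real
# variable names in SyntheticData_Training.csv.
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def train(df: pd.DataFrame, label_col: str = "inhospital_mortality"):
    """Fit a simple baseline classifier on all numeric features."""
    X = df.drop(columns=[label_col]).select_dtypes("number")
    y = df[label_col]
    model = make_pipeline(
        SimpleImputer(strategy="median"),  # clinical data often has gaps
        StandardScaler(),
        LogisticRegression(max_iter=1000),
    )
    model.fit(X, y)
    return model


if __name__ == "__main__":
    # Stand-in frame; in practice: pd.read_csv("SyntheticData_Training.csv")
    df = pd.DataFrame({
        "age_months": [6, 18, 30, 4, 60, 12],
        "lactate": [2.1, 5.4, 1.2, 7.8, 0.9, 3.3],
        "inhospital_mortality": [0, 1, 0, 1, 0, 1],
    })
    model = train(df)
    print(model.predict_proba(df.drop(columns=["inhospital_mortality"]))[:, 1])
```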
- Each team may submit code up to 3 times throughout the challenge.
- Each submission will be assessed on a hidden evaluation set to ensure unbiased scoring.
- Only the final model from each training phase will be evaluated for the official score.
- Teams are expected to maintain their code in private repositories during the challenge to ensure fairness.
Upon completion, all final solutions must be shared publicly (e.g., GitHub) to promote reproducibility and transparency.
Public Release Requirements:
- Complete source code and trained models.
- Detailed README file with instructions for replication.
- An open-source license (e.g., MIT, BSD) specifying usage and redistribution rights.
Use the provided Python example code as a starting point. Clone or download this repository, replace the example code with your implementation, and push or upload the updated files to your repository. Share your repository with the aditya1000 and PediatricSepsisDataChallenge2024 users. Submit your entry using this submission form.
- Update the `Dockerfile` to specify the version of Python you are using locally.
- Add any additional packages required for your code.
- Important: Do not rename or relocate the `Dockerfile`. Its structure must remain intact, especially the three lines marked as "DO NOT EDIT." These lines are critical for our submission system.
- Add all Python packages required by your code.
- Specify the exact versions of these packages to match your local environment.
- Remove any unnecessary packages that your code does not depend on.
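For example, a pinned `requirements.txt` might look like the following (the packages and versions here are purely illustrative; pin whatever your local environment actually uses):

```text
numpy==1.26.4
pandas==2.2.2
scikit-learn==1.4.2
joblib==1.4.0
```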
- Update the following files as needed:
  - `AUTHORS.txt`: Include the names of all contributors.
  - `LICENSE.txt`: Specify your license terms.
  - `README.md`: Provide relevant information about your code.
- Note: Our submission system does not use the README file to determine how to execute your code.
- `team_code.py`: Modify this script to load and run your trained model(s).
- `train_model.py`: Do not modify this script. It calls functions in `team_code.py` to train your model using the training data.
- `helper_code.py`: Do not modify this script. It provides helper functions for your code. Feel free to use these functions, but note that any changes made to this file will not be included when we run your code.
- `run_model.py`: Do not modify this script. It calls functions in `team_code.py` to load and run your trained models on the test data. Any changes to this file will not be reflected in our execution environment.
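The exact function signatures that `train_model.py` and `run_model.py` call are defined in the official example repository; as a hypothetical illustration of the train/persist/reload pattern `team_code.py` typically implements, consider:

```python
# Hypothetical sketch of the team_code.py pattern: train a model, persist
# it, reload it, and predict. Function names here are illustrative only --
# match the signatures in the official example repository's team_code.py.
import os

import joblib
from sklearn.linear_model import LogisticRegression


def train_challenge_model(X, y, model_folder):
    """Fit a model and persist it where the run script can find it."""
    os.makedirs(model_folder, exist_ok=True)
    model = LogisticRegression(max_iter=1000).fit(X, y)
    joblib.dump(model, os.path.join(model_folder, "model.sav"))


def load_challenge_model(model_folder):
    """Reload the persisted model for inference on held-out data."""
    return joblib.load(os.path.join(model_folder, "model.sav"))


def run_challenge_model(model, X):
    """Return a mortality-risk probability for each row of X."""
    return model.predict_proba(X)[:, 1]
```

Keeping the save and load paths symmetric like this is what allows the training and inference scripts to run as separate containerized steps.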
- You can develop and test your code without using Docker. However, before submission, ensure that you can:
  - Build a Docker image from your `Dockerfile`.
  - Successfully run your code within a Docker container.
- Push or upload your updated code to the root directory of the `master` branch in your repository.
- Ensure the repository contains all necessary files and updates as described above.
Once submitted, we will:
- Download your repository.
- Build a Docker image using your `Dockerfile`.
- Execute your code in our local or cloud environment.
Your model will be evaluated on the following metrics:
- Area Under the ROC Curve (AUC-ROC): A secondary metric to measure general performance across thresholds.
- Area Under the Precision-Recall Curve (AUPRC): Focuses on precision and recall, which is especially useful for imbalanced datasets.
- Net Benefit: Balances true positives and false positives to measure decision-making utility.
- Estimated Calibration Error (ECE): Assesses how well predicted probabilities align with actual outcomes.
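A sketch of how these four metrics can be computed locally is shown below. Net benefit and ECE are implemented by hand using common textbook definitions, with an assumed decision threshold of 0.5 and 10 calibration bins; the official `evaluate_2024.py` may use different conventions, so treat this only as a sanity check.

```python
# Local sanity-check for the four evaluation metrics. The threshold and
# bin count are assumptions; defer to evaluate_2024.py for official scores.
import numpy as np
from sklearn.metrics import average_precision_score, roc_auc_score


def net_benefit(y_true, y_prob, threshold=0.5):
    """Net benefit at a threshold: TP rate minus odds-weighted FP rate."""
    y_true = np.asarray(y_true)
    pred = np.asarray(y_prob) >= threshold
    n = len(y_true)
    tp = np.sum(pred & (y_true == 1))
    fp = np.sum(pred & (y_true == 0))
    w = threshold / (1 - threshold)  # odds weighting of false positives
    return tp / n - w * fp / n


def expected_calibration_error(y_true, y_prob, n_bins=10):
    """ECE: bin-weighted gap between mean confidence and observed rate."""
    y_true, y_prob = np.asarray(y_true), np.asarray(y_prob)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        # Last bin is closed on the right so probabilities of 1.0 count.
        mask = (y_prob >= lo) & (y_prob < hi if hi < 1.0 else y_prob <= hi)
        if mask.any():
            ece += mask.mean() * abs(y_prob[mask].mean() - y_true[mask].mean())
    return ece


y = [0, 0, 1, 1, 0, 1]
p = [0.1, 0.4, 0.8, 0.7, 0.2, 0.9]
print("AUC-ROC:    ", roc_auc_score(y, p))
print("AUPRC:      ", average_precision_score(y, p))
print("Net benefit:", net_benefit(y, p))
print("ECE:        ", expected_calibration_error(y, p))
```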
To obtain your leaderboard score on the test data, run the `evaluate_2024.py` script after reading the corresponding `README.md` in the `evaluation-2024` folder of this repository.
- Ensure all components of your submission run autonomously from start to finish in a cloud-based container.
- Scores will be updated on the leaderboard based on the best score achieved.
- Ensure that your final submission is properly documented and made available publicly after the completion of the competition.
We are excited to see your innovative solutions aimed at improving pediatric sepsis outcomes in resource-constrained settings!