Python example code for the George B. Moody PhysioNet Challenge 2025

What's in this repository?

This repository contains a simple example that illustrates how to format a Python entry for the George B. Moody PhysioNet Challenge 2025. If you are participating in the 2025 Challenge, then we recommend using this repository as a template for your entry. You can remove some of the code, reuse other code, and add new code to create your entry. You do not need to use the models, features, and/or libraries in this example for your entry. We encourage a diversity of approaches to the Challenges.

For this example, we implemented a random forest model with several simple features. (This simple example is not designed to perform well, so you should not use it as a baseline for your approach's performance.) You can try it by running the following commands on the Challenge training set. If you are using a relatively recent personal computer, then you should be able to run these commands from start to finish on a small subset (1000 records) of the training data in a few minutes or less.

How do I run these scripts?

First, you can download and create data for these scripts by following the instructions in the following section.

Second, you can install the dependencies for these scripts by creating a Docker image (see below) or virtual environment and running

pip install -r requirements.txt

You can train your model by running

python train_model.py -d training_data -m model

where

training_data (input; required) is a folder with the training data files, which must include the labels; and
model (output; required) is a folder for saving your model.

You can run your trained model by running

python run_model.py -d holdout_data -m model -o holdout_outputs

where

holdout_data (input; required) is a folder with the holdout data files, which will not necessarily include the labels;
model (input; required) is a folder for loading your model; and
holdout_outputs (output; required) is a folder for saving your model outputs.

The Challenge website provides a training database with a description of the contents and structure of the data files.

You can evaluate your model by pulling or downloading the evaluation code and running

python evaluate_model.py -d holdout_data -o holdout_outputs -s scores.csv

where

holdout_data(input; required) is a folder with labels for the holdout data files, which must include the labels;
holdout_outputs (input; required) is a folder containing files with your model's outputs for the data; and
scores.csv (output; optional) is file with a collection of scores for your model.

You can use the provided training set for the training_data and holdout_data files, but we will use different datasets for the validation and test sets, and we will not provide the labels to your code.

How do I create data for these scripts?

You can use the scripts in this repository to convert the CODE-15% dataset, the SaMi-Trop dataset, and the PTB-XL dataset to WFDB format.

Please see the data section of the website for more information about the Challenge data.

CODE-15% dataset

These instructions use code15_input as the path for the input data files and code15_output for the output data files, but you can replace them with the absolute or relative paths for the files on your machine.

Download and unzip one or more of the exam_part files and the exams.csv file in the CODE-15% dataset.
Download and unzip the Chagas labels, i.e., the code15_chagas_labels.csv file.

Convert the CODE-15% dataset to WFDB format, with the available demographics information and Chagas labels in the WFDB header file, by running

 python prepare_code15_data.py \
     -i code15_input/exams_part0.hdf5 code15_input/exams_part1.hdf5 \
     -d code15_input/exams.csv \
     -l code15_input/code15_chagas_labels.csv \
     -o code15_output/exams_part0 code15_output/exams_part1

Each exam_part file in the CODE-15% dataset contains approximately 20,000 ECG recordings. You can include more or fewer of these files to increase or decrease the number of ECG recordings, respectively. You may want to start with fewer ECG recordings to debug your code.

SaMi-Trop dataset

These instructions use samitrop_input as the path for the input data files and samitrop_output for the output data files, but you can replace them with the absolute or relative paths for the files on your machine.

Download and unzip exams.zip file and the exams.csv file in the SaMi-Trop dataset.
Convert the SaMi-Trop dataset to WFDB format, with the available demographics information and Chagas labels in the WFDB header file, by running
```
 python prepare_samitrop_data.py \
     -i samitrop_input/exams.hdf5 \
     -d samitrop_input/exams.csv \
     -o samitrop_output
```

PTB-XL dataset

These instructions use ptbxl_input as the path for the input data files and ptbxl_output for the output data files, but you can replace them with the absolute or relative paths for the files on your machine. We are using the records500 folder, which has a 500Hz sampling frequency, but you can also try the records100 folder, which has a 100Hz sampling frequency.

Download and, if necessary, unzip the PTB-XL dataset.

Update the WFDB files with the available demographics information and Chagas labels by running

 python prepare_ptbxl_data.py \
     -i ptbxl_input/records500/ \
     -d ptbxl_input/ptbxl_database.csv \
     -o ptbxl_output

Which scripts I can edit?

Please edit the following script to add your code:

team_code.py is a script with functions for training and running your trained model.

Please do not edit the following scripts. We will use the unedited versions of these scripts when running your code:

train_model.py is a script for training your model.
run_model.py is a script for running your trained model.
helper_code.py is a script with helper functions that we used for our code. You are welcome to use them in your code.

These scripts must remain in the root path of your repository, but you can put other scripts and other files elsewhere in your repository.

How do I train, save, load, and run my model?

To train and save your model, please edit the train_model function in the team_code.py script. Please do not edit the input or output arguments of this function.

To load and run your trained model, please edit the load_model and run_model functions in the team_code.py script. Please do not edit the input or output arguments of these functions.

How do I run these scripts in Docker?

Docker and similar platforms allow you to containerize and package your code with specific dependencies so that your code can be reliably run in other computational environments.

To increase the likelihood that we can run your code, please install Docker, build a Docker image from your code, and run it on the training data. To quickly check your code for bugs, you may want to run it on a small subset of the training data, such as 1000 records.

If you have trouble running your code, then please try the follow steps to run the example code.

Create a folder example in your home directory with several subfolders.

 user@computer:~$ cd ~/
 user@computer:~$ mkdir example
 user@computer:~$ cd example
 user@computer:~/example$ mkdir training_data holdout_data model holdout_outputs

Download the training data from the Challenge website. Put some of the training data in training_data and holdout_data. You can use some of the training data to check your code (and you should perform cross-validation on the training data to evaluate your algorithm).

Download or clone this repository in your terminal.

 user@computer:~/example$ git clone https://github.com/physionetchallenges/python-example-2025.git

Build a Docker image and run the example code in your terminal.

 user@computer:~/example$ ls
 holdout_data  holdout_outputs  model  python-example-2025  training_data

 user@computer:~/example$ cd python-example-2025/

 user@computer:~/example/python-example-2025$ docker build -t image .

 Sending build context to Docker daemon  [...]kB
 [...]
 Successfully tagged image:latest

 user@computer:~/example/python-example-2025$ docker run -it -v ~/example/model:/challenge/model -v ~/example/holdout_data:/challenge/holdout_data -v ~/example/holdout_outputs:/challenge/holdout_outputs -v ~/example/training_data:/challenge/training_data image bash

 root@[...]:/challenge# ls
     Dockerfile             holdout_outputs        run_model.py
     evaluate_model.py      LICENSE                training_data
     helper_code.py         README.md      
     holdout_data           requirements.txt

 root@[...]:/challenge# python train_model.py -d training_data -m model -v

 root@[...]:/challenge# python run_model.py -d holdout_data -m model -o holdout_outputs -v

 root@[...]:/challenge# python evaluate_model.py -d holdout_data -o holdout_outputs
 [...]

 root@[...]:/challenge# exit
 Exit

What else do I need?

This repository does not include code for evaluating your entry. Please see the evaluation code repository for code and instructions for evaluating your entry using the Challenge scoring metric.

How do I learn more? How do I share more?

Please see the Challenge website for more details. Please post questions and concerns on the Challenge discussion forum. Please do not make pull requests, which may share information about your approach.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Python example code for the George B. Moody PhysioNet Challenge 2025

What's in this repository?

How do I run these scripts?

How do I create data for these scripts?

CODE-15% dataset

SaMi-Trop dataset

PTB-XL dataset

Which scripts I can edit?

How do I train, save, load, and run my model?

How do I run these scripts in Docker?

What else do I need?

How do I learn more? How do I share more?

Useful links

About

Uh oh!

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 11 Commits
Dockerfile		Dockerfile
LICENSE		LICENSE
README.md		README.md
evaluate_model.py		evaluate_model.py
helper_code.py		helper_code.py
prepare_code15_data.py		prepare_code15_data.py
prepare_ptbxl_data.py		prepare_ptbxl_data.py
prepare_samitrop_data.py		prepare_samitrop_data.py
requirements.txt		requirements.txt
run_model.py		run_model.py
team_code.py		team_code.py
train_model.py		train_model.py

License

physionetchallenges/python-example-2025

Folders and files

Latest commit

History

Repository files navigation

Python example code for the George B. Moody PhysioNet Challenge 2025

What's in this repository?

How do I run these scripts?

How do I create data for these scripts?

CODE-15% dataset

SaMi-Trop dataset

PTB-XL dataset

Which scripts I can edit?

How do I train, save, load, and run my model?

How do I run these scripts in Docker?

What else do I need?

How do I learn more? How do I share more?

Useful links

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages