Self-supervised learning for personal sensing in mood disorders

This codebase was developed by Filippo Corponi and Bryan M. Li. It is part of the paper "Wearable Data From Subjects Playing Super Mario, Taking University Exams, or Performing Physical Exercise Help Detect Acute Mood Disorder Episodes via Self-Supervised Learning: Prospective, Exploratory, Observational Study", published in JMIR mHealth and uHealth. If you find this code or any of the ideas in the paper useful, please consider citing:

@article{corponi2024wearable,
    author="Corponi, Filippo and Li, Bryan M and Anmella, Gerard and Valenzuela-Pascual, Cl{\`a}udia and Mas, Ariadna and Pacchiarotti, Isabella and Valent{\'i}, Marc and Grande, Iria and Benabarre, Antoni and Garriga, Marina and Vieta, Eduard and Young, Allan H and Lawrie, Stephen M and Whalley, Heather C and Hidalgo-Mazzei, Diego and Vergari, Antonio",
    title="Wearable Data From Subjects Playing Super Mario, Taking University Exams, or Performing Physical Exercise Help Detect Acute Mood Disorder Episodes via Self-Supervised Learning: Prospective, Exploratory, Observational Study",
    journal="JMIR mHealth and uHealth",
    year="2024",
    month="Jul",
    day="17",
    volume="12",
    pages="e55094",
    issn="2291-5222",
    doi="10.2196/55094",
    url="https://mhealth.jmir.org/2024/1/e55094",
}

Setup

Software development environment setup

Create a new conda environment with Python 3.10.
```
conda create -n ssl python=3.10
```
Activate the ssl environment.
```
conda activate ssl
```
Install all dependencies and packages with the setup.sh script.
```
sh setup.sh
```

Data Pre-processing

data/README.md details the structure of the dataset.

On-/off-body & sleep/wake detection

As HR starts being recorded with a 10-second lag with respect to other channels, the first 10 seconds are dropped from channels other than HR. While channels should all stop at the same time, as a failsafe, channels are cropped to the shortest channel duration.
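The alignment step above can be sketched as follows. This is a minimal illustration, not the code in preprocess_ds.py: the function name `align_channels`, the plain-list representation, and the sampling rates are assumptions made for the example, and IBI is left out because it is not sampled at a fixed rate.

```python
# Sampling rates in Hz. These are assumptions for illustration,
# not values read from the repository.
SAMPLING_RATES = {"ACC": 32, "BVP": 64, "EDA": 4, "TEMP": 4, "HR": 1}
HR_LAG_SECONDS = 10  # HR starts 10 seconds after the other channels


def align_channels(channels, rates=SAMPLING_RATES, lag=HR_LAG_SECONDS):
    """Drop the first `lag` seconds from non-HR channels, then crop
    every channel to the shortest duration (in whole seconds)."""
    aligned = {}
    for name, samples in channels.items():
        if name != "HR":
            samples = samples[lag * rates[name]:]
        aligned[name] = samples
    # Failsafe: find the shortest channel duration in whole seconds.
    shortest = min(len(s) // rates[n] for n, s in aligned.items())
    return {n: s[: shortest * rates[n]] for n, s in aligned.items()}


# Toy recording: 60 s for most channels, 50 s of HR (it started 10 s late),
# and an EDA channel that stopped one second early.
channels = {
    "ACC": [0.0] * (60 * 32),
    "BVP": [0.0] * (60 * 64),
    "EDA": [0.0] * (59 * 4),
    "TEMP": [0.0] * (60 * 4),
    "HR": [0.0] * 50,
}
aligned = align_channels(channels)
durations = {n: len(s) / SAMPLING_RATES[n] for n, s in aligned.items()}
print(durations)
```

After alignment every channel covers the same time span, so sample indices can be converted to a common timeline.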

On-/off-body detection

We considered EDA measurements smaller than 0.05 μS as indicative of off-body status. Furthermore, as we noticed EDA values greater than the sensor range (i.e., 100 μS), as well as TEMP values outside the physiological range (30-40°C), we set both to off-body. On-body sequences must last longer than a given number of minutes (specified with --wear_minimum_minutes); otherwise they are set to off-body.
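A minimal sketch of these rules is given below. It is not the repository's implementation: `on_body_mask` and `min_samples` are hypothetical names, the sketch assumes EDA and TEMP are sampled at the same rate, and the handling of values exactly on a threshold is an assumption. In practice `min_samples` would be derived from --wear_minimum_minutes, e.g. minutes × 60 × sampling rate.

```python
EDA_MIN, EDA_MAX = 0.05, 100.0   # microsiemens
TEMP_MIN, TEMP_MAX = 30.0, 40.0  # degrees Celsius


def on_body_mask(eda, temp, min_samples):
    """Return one boolean per sample (True = on-body).

    `min_samples` is the wear minimum expressed in samples; on-body runs
    that are not longer than it are set to off-body.
    """
    mask = [
        EDA_MIN <= e <= EDA_MAX and TEMP_MIN <= t <= TEMP_MAX
        for e, t in zip(eda, temp)
    ]
    start = None
    # The trailing False is a sentinel that closes a run ending at the last sample.
    for i, flag in enumerate(mask + [False]):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if i - start <= min_samples:
                mask[start:i] = [False] * (i - start)
            start = None
    return mask


# Toy signals: low EDA at samples 5-7, out-of-range EDA at sample 10,
# and an out-of-range temperature at sample 13.
eda = [0.5] * 5 + [0.01] * 3 + [0.5] * 2 + [150.0] + [0.5] * 4
temp = [33.0] * 13 + [45.0] + [33.0]
mask = on_body_mask(eda, temp, min_samples=2)
print(mask)
```

Only the first run of five samples survives here, because every later on-body run is at most two samples long.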

Sleep/wake detection

--sleep_algorithm specifies which sleep/wake detection algorithm to use: van Hees et al. 2015 or Scripps. Note that both require a minimum amount of on-body recording time to operate. A mask is returned where wake = 0, sleep = 1, off-body = 2.
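The mask encoding can be illustrated with a short sketch. It does not implement either sleep algorithm; it only assumes that per-sample on-body flags and sleep flags are already available, and `combine_masks` is a hypothetical helper name.

```python
WAKE, SLEEP, OFF_BODY = 0, 1, 2


def combine_masks(on_body, asleep):
    """Merge on-body and sleep flags into one mask.

    Off-body takes precedence: a sample that is not worn is labelled 2
    regardless of what the sleep flag says.
    """
    return [
        OFF_BODY if not worn else (SLEEP if sleeping else WAKE)
        for worn, sleeping in zip(on_body, asleep)
    ]


on_body = [True, True, False, True]
asleep = [False, True, True, False]
status = combine_masks(on_body, asleep)
print(status)
```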

python preprocess_ds.py --output_dir data/preprocessed/unsegmented --overwrite

Please see --help for all available options. data/preprocessed/unsegmented contains the preprocessed recordings. Specifically, each folder maps to a preprocessed recording. channels.h5 is a dictionary storing the processed E4 channels, i.e. ACC, BVP, EDA, HR, IBI, TEMP, as well as the masks computed during preprocessing.

Segmentation

Segmentation is carried out on sleep/wake sequences independently. The sleep/wake status of a given segment is saved as part of that segment's label. Segmentation returns recording segments, whose length is set with --segment_length, containing the following channels: ACC_x, ACC_y, ACC_z, BVP, EDA, TEMP. HR and IBI, since they are both derived from BVP, are not used in deep-learning models. The optional flag --flirt adds an extra channel to the segments named FLIRT, which contains acc, eda, and hrv features extracted on the segment using the FLIRT feature generation toolkit. As FLIRT does not provide any built-in feature extractor for temperature, we extract the mean and standard deviation of temperature across the segment. Only a single row of features is derived per segment (in other words, window_length for FLIRT is set equal to segment_length). FLIRT features are only computed on labelled segments, that is, segments from recordings collected and annotated at Hospital Clínic, Barcelona.

python segment.py --output_dir data/preprocessed/sl512_ss128 --segmentation_mode 1 --segment_length 512 --step_size 128 --overwrite

Please see --help for all available options.
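The windowing described in this section can be sketched as below. This is an illustration under assumptions rather than the logic of segment.py: `segment_by_state` and `temp_features` are hypothetical names, the signal and mask are assumed to share one sampling rate, and the use of the population standard deviation is a guess.

```python
from itertools import groupby
from statistics import fmean, pstdev


def segment_by_state(signal, mask, segment_length, step_size):
    """Window each contiguous wake (0) or sleep (1) run on its own.

    Off-body samples (2) are skipped, and no window crosses a change of
    state. Returns a list of (state, segment) pairs.
    """
    segments = []
    position = 0
    for state, run in groupby(mask):
        run_length = sum(1 for _ in run)
        if state in (0, 1):
            chunk = signal[position:position + run_length]
            for start in range(0, run_length - segment_length + 1, step_size):
                segments.append((state, chunk[start:start + segment_length]))
        position += run_length
    return segments


def temp_features(temp_segment):
    """One row of temperature features per segment: mean and standard deviation."""
    return {"temp_mean": fmean(temp_segment), "temp_std": pstdev(temp_segment)}


# Toy example: 8 wake samples, 2 off-body samples, 10 sleep samples.
signal = [float(i) for i in range(20)]
mask = [0] * 8 + [2] * 2 + [1] * 10
segments = segment_by_state(signal, mask, segment_length=4, step_size=2)
states = [state for state, _ in segments]
print(len(segments), states)
print(temp_features(segments[0][1]))
```

A run of length n yields floor((n - segment_length) / step_size) + 1 windows when n is at least segment_length, and none otherwise, so short sleep or wake runs contribute no segments.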

Exacerbation vs Euthymia detection

The target task is binary time-series classification: identifying whether a recording segment was taken from a subject experiencing an acute mood disorder episode of any polarity (depression, mania, mixed features) or from someone with a historical mood disorder diagnosis who was clinically stable at the time of recording (a condition referred to as euthymia in psychiatric parlance).
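As a sketch of how such a binary target could be derived, the snippet below collapses a clinical status into a 0/1 label. The status strings, the choice of the acute class as the positive label, and the helper name `binary_label` are all assumptions for illustration; the repository may encode labels differently.

```python
# Hypothetical status strings; an acute episode of any polarity maps to 1.
ACUTE_STATES = {"depression", "mania", "mixed"}
STABLE_STATE = "euthymia"


def binary_label(status):
    """Map a clinical status to the binary target (1 = acute episode, 0 = euthymia)."""
    if status in ACUTE_STATES:
        return 1
    if status == STABLE_STATE:
        return 0
    raise ValueError(f"unknown status: {status!r}")


labels = [binary_label(s) for s in ["mania", "euthymia", "depression", "mixed"]]
print(labels)
```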

Fully-supervised learning

Classical Machine Learning (XGBoost)

python train_cml.py --dataset data/preprocessed/sl512_ss128 --output_dir runs/sl_xgboost_test --clear_output_dir

Deep-learning

python train_ann.py --task_mode 3 --dataset data/preprocessed/sl512_ss128 --output_dir runs/sl_ann_test