This codebase was developed by Filippo Corponi and Bryan M. Li. It is part of the paper "Wearable Data From Subjects Playing Super Mario, Taking University Exams, or Performing Physical Exercise Help Detect Acute Mood Disorder Episodes via Self-Supervised Learning: Prospective, Exploratory, Observational Study", published in JMIR mHealth and uHealth. If you find this code or any of the ideas in the paper useful, please consider citing:
@article{corponi2024wearable,
author="Corponi, Filippo and Li, Bryan M and Anmella, Gerard and Valenzuela-Pascual, Cl{\`a}udia and Mas, Ariadna and Pacchiarotti, Isabella and Valent{\'i}, Marc and Grande, Iria and Benabarre, Antoni and Garriga, Marina and Vieta, Eduard and Young, Allan H and Lawrie, Stephen M and Whalley, Heather C and Hidalgo-Mazzei, Diego and Vergari, Antonio",
title="Wearable Data From Subjects Playing Super Mario, Taking University Exams, or Performing Physical Exercise Help Detect Acute Mood Disorder Episodes via Self-Supervised Learning: Prospective, Exploratory, Observational Study",
journal="JMIR mHealth and uHealth",
year="2024",
month="Jul",
day="17",
volume="12",
pages="e55094",
issn="2291-5222",
doi="10.2196/55094",
url="https://mhealth.jmir.org/2024/1/e55094",
}
Software development environment setup
- Create a new conda environment with Python 3.10.
conda create -n ssl python=3.10
- Activate the ssl virtual environment.
conda activate ssl
- Install all dependencies and packages with the setup.sh script.
sh setup.sh
data/README.md details the structure of the dataset.
As HR starts being recorded with a 10-second lag with respect to other channels, the first 10 seconds are dropped from channels other than HR. While channels should all stop at the same time, as a failsafe, channels are cropped to the shortest channel duration.
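The alignment step described above can be sketched as follows. This is a minimal illustration, not the code in preprocess_ds.py: the function name align_channels is hypothetical, the sampling rates are the nominal Empatica E4 rates and are assumed here, and IBI is left out because it is not regularly sampled.

```python
# Nominal Empatica E4 sampling rates in Hz (an assumption; check your own export).
SAMPLING_RATES = {"ACC": 32, "BVP": 64, "EDA": 4, "TEMP": 4, "HR": 1}
HR_LAG_SECONDS = 10


def align_channels(channels, rates=SAMPLING_RATES, hr_lag=HR_LAG_SECONDS):
    """Drop the HR lag from non-HR channels, then crop all to the shortest duration.

    `channels` maps a channel name to a list of samples (hypothetical layout).
    """
    aligned = {}
    for name, samples in channels.items():
        # HR already starts 10 seconds late, so only the other channels are trimmed.
        offset = 0 if name == "HR" else hr_lag * rates[name]
        aligned[name] = samples[offset:]
    # Failsafe: shortest duration, in whole seconds, across all channels.
    duration = min(len(s) // rates[n] for n, s in aligned.items())
    return {n: s[: duration * rates[n]] for n, s in aligned.items()}
```

After this step every channel covers the same time span, so a given second maps to the same index range in each channel once scaled by its sampling rate.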
We considered EDA measurements smaller than 0.05 μS as indicative of off-body status. Furthermore, as we noticed occurrences of values greater than the EDA sensor range (i.e., 100 μS), as well as instances of TEMP values outside the physiological range (30-40°C), we set both to off-body. On-body sequences need to last more than a given number of minutes (specified with --wear_minimum_minutes); otherwise they are set to off-body.
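A rough sketch of these off-body rules is given below. The function name on_body_mask, the default of 5 minutes, and the assumption that EDA and TEMP share a 4 Hz sampling rate are illustrative choices, not values taken from the repository.

```python
def on_body_mask(eda, temp, rate_hz=4, wear_minimum_minutes=5):
    """Return one boolean per sample, True where the device is judged on-body.

    `eda` is in microsiemens and `temp` in degrees Celsius; both are assumed
    to be sampled at `rate_hz`. The default minimum wear time is hypothetical.
    """
    # Thresholds from the text: EDA within [0.05, 100] and TEMP within [30, 40].
    mask = [
        0.05 <= e <= 100.0 and 30.0 <= t <= 40.0
        for e, t in zip(eda, temp)
    ]
    min_samples = wear_minimum_minutes * 60 * rate_hz
    start = None
    # Iterate one step past the end so a trailing on-body run is also closed.
    for i in range(len(mask) + 1):
        on = i < len(mask) and mask[i]
        if on and start is None:
            start = i
        elif not on and start is not None:
            # Runs that do not last more than the minimum are set to off-body.
            if i - start <= min_samples:
                mask[start:i] = [False] * (i - start)
            start = None
    return mask
```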
--sleep_algorithm specifies which of the two sleep detection algorithms to use: van Hees et al. 2015 or Scripps. Note that both require a minimum amount of on-body recording time to operate. A mask is returned where wake = 0, sleep = 1, off-body = 2.
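To make the mask encoding concrete, the sketch below merges an on-body indicator and a sleep indicator into a single mask. The function name combine_masks and the two boolean inputs are hypothetical; the sleep indicator stands in for the output of whichever sleep algorithm is selected.

```python
WAKE, SLEEP, OFF_BODY = 0, 1, 2


def combine_masks(on_body, asleep):
    """Merge two boolean sequences into one mask: wake = 0, sleep = 1, off-body = 2."""
    return [
        # Off-body takes precedence: sleep status is undefined when the device is not worn.
        OFF_BODY if not worn else (SLEEP if sleeping else WAKE)
        for worn, sleeping in zip(on_body, asleep)
    ]
```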
python preprocess_ds.py --output_dir data/preprocessed/unsegmented --overwrite
Please see --help for all available options.
data/preprocessed/unsegmented contains preprocessed recordings. Specifically, each folder maps to a preprocessed recording. channels.h5 is a dictionary storing processed E4 channels, i.e. ACC, BVP, EDA, HR, IBI, TEMP, as well as the masks computed during the preprocessing.
Segmentation is carried out on sleep/wake sequences independently. Sleep/wake status for a given segment is saved as part of that segment label. Segmentation returns recording segments, whose length is set with --segment_length, containing the following channels: ACC_x, ACC_y, ACC_z, BVP, EDA, TEMP. HR and IBI, since they are both derived from BVP, are not used in deep-learning models.
The optional flag --flirt adds an extra channel to the segments named FLIRT, which contains acc, eda, and hrv features extracted from the segment with the feature generation toolkit FLIRT. As FLIRT does not provide any built-in feature extractor for temperature, we extracted the average and standard deviation across the segment. Only a single row of features is derived per segment (in other words, window_length for FLIRT is set equal to segment_length). FLIRT features are only computed on labelled segments, that is, segments from recordings collected and annotated at Hospital Clínic, Barcelona.
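The two temperature features mentioned above amount to a per-segment mean and standard deviation. A minimal sketch follows; the function name temp_features and the feature keys are hypothetical, and the population standard deviation is used here as an assumption, since the text does not say which estimator was applied.

```python
import statistics


def temp_features(temp_segment):
    """Return one row of temperature features for a single segment."""
    return {
        "temp_mean": statistics.fmean(temp_segment),
        # Population standard deviation (assumed; a sample estimator is equally plausible).
        "temp_std": statistics.pstdev(temp_segment),
    }
```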
python segment.py --output_dir data/preprocessed/sl512_ss128 --segmentation_mode 1 --segment_length 512 --step_size 128 --overwrite
Please see --help for all available options.
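The idea of segmenting sleep and wake sequences independently can be sketched with a sliding window that never crosses a change in the mask. This is an illustration under assumptions, not the logic of segment.py: the function name segment_by_state is hypothetical, positions are expressed in mask steps, and off-body runs and incomplete trailing windows are simply skipped.

```python
from itertools import groupby

OFF_BODY = 2


def segment_by_state(mask, segment_length, step_size):
    """Return (start, stop, state) windows that stay within one sleep/wake run."""
    segments = []
    position = 0
    for state, run in groupby(mask):
        run_length = sum(1 for _ in run)
        if state != OFF_BODY:
            # Last start index that still leaves room for a full window in this run.
            last_start = position + run_length - segment_length
            for start in range(position, last_start + 1, step_size):
                segments.append((start, start + segment_length, state))
        position += run_length
    return segments
```

With segment_length 512 and step_size 128, as in the command above, consecutive windows from the same run overlap by 384 steps, and each window carries the sleep/wake state of the run it came from.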
The target task is (binary) time-series classification, specifically identifying whether a recording segment was taken from a subject experiencing an acute mood disorder episode of any polarity (depression, mania, mixed features) or from someone with a historical mood disorder diagnosis but clinically stable at the time of recording (a condition referred to as euthymia in psychiatric parlance).
python train_cml.py --dataset data/preprocessed/sl512_ss128 --output_dir runs/sl_xgboost_test --clear_output_dir
python train_ann.py --task_mode 3 --dataset data/preprocessed/sl512_ss128 --output_dir runs/sl_ann_test
python pre_train.py --pretext_task masked_prediction --dataset data/preprocessed/sl512_ss128 --output_dir runs/masked_prediction_test
The --unlabelled_data_resampling_percentage and --filter_collections flags are used for ablation analyses.
python train_ann.py --task_mode 1 --path2pretraining_res runs/masked_prediction_test --output_dir runs/masked_prediction_fine_tuning_test
Please see --help for all available options.
The E4SelfLearning collection is available to download from: huggingface.co/datasets/FcmC/E4SelfLearning. We hereby acknowledge and list the open-access datasets, recorded with an Empatica E4, that make up the E4SelfLearning collection:
- ADARP by Sah et al. 2022
- BIG IDEAs Lab by Bent et al. 2021
- In-Gauge En-Gage by Gao et al. 2022
- Nurses Stress Detection by Hosseini et al. 2022
- PPG-DaLiA by Reiss et al. 2019
- SPS by Iqbal et al. 2022
- Toadstool by Svoren et al. 2020
- UE4W by Hinkle et al. 2022
- WEEE by Gashi et al. 2022
- WESAD by Schmidt et al. 2018
- WESD by Amin et al. 2022