Extended research on 'Toward a Foundation Model for Time Series Data' (CIKM '23)
The code has only been tested with the environment listed below:
- python=3.8.13
- numpy=1.19.5
- torch=1.10.2+cu102
- tslearn=0.5.2
- scipy=1.6.2
- numba
- sktime
Please follow the steps below to reproduce the experiments.
- download the UCR archive from https://www.cs.ucr.edu/~eamonn/time_series_data_2018/ and unzip the dataset
- change line 16 in
  ./config/script/script_config_0.py
  to the path of the directory that contains the UCR archive
- change the current folder to
  ./config/script/
- run
  python script_config_0.py
  to generate all the config files
- change the current folder back to
  .
- run the following commands to fine-tune/test models (a small driver sketch that loops over all of them appears after these steps):
  python script_ucr_nn_0.py --method_name trf_tc_c_0000 --is_freeze false --aggregation_mode class_token
  for transformer + TimeCLR (the proposed method) with class token
  python script_ucr_nn_0.py --method_name trf_tc_c_0000 --is_freeze false --aggregation_mode flatten
  for transformer + TimeCLR (the proposed method) with flatten
  python script_ucr_nn_0.py --method_name trf_tc_c_0000 --is_freeze false --aggregation_mode pooling --pooling_mode gt
  for transformer + TimeCLR (the proposed method) with global token
  python script_ucr_nn_0.py --method_name trf_tc_c_0000 --is_freeze false --aggregation_mode pooling --pooling_mode st
  for transformer + TimeCLR (the proposed method) with segment token
  python script_ucr_nn_0.py --method_name trf_tc_c_0000 --is_freeze true --aggregation_mode class_token
  for transformer + TimeCLR (the proposed method) with class token and a frozen backbone
  python script_ucr_nn_0.py --method_name trf_tc_c_0000 --is_freeze true --aggregation_mode flatten
  for transformer + TimeCLR (the proposed method) with flatten and a frozen backbone
  python script_ucr_nn_0.py --method_name trf_tc_c_0000 --is_freeze true --aggregation_mode pooling --pooling_mode gt
  for transformer + TimeCLR (the proposed method) with global token and a frozen backbone
  python script_ucr_nn_0.py --method_name trf_tc_c_0000 --is_freeze true --aggregation_mode pooling --pooling_mode st
  for transformer + TimeCLR (the proposed method) with segment token and a frozen backbone
- run
  python script_result_0.py
  to get the experiment results.
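For convenience, the eight fine-tuning commands above can be launched from a small driver script. This is only a sketch under the assumption that the command-line interface is exactly as shown above; it adds nothing beyond a loop over the flag combinations.

```python
# sketch: run every fine-tuning configuration listed above, one after another
import subprocess

aggregation_options = [
    ["--aggregation_mode", "class_token"],                      # class token
    ["--aggregation_mode", "flatten"],                          # flatten
    ["--aggregation_mode", "pooling", "--pooling_mode", "gt"],  # global token
    ["--aggregation_mode", "pooling", "--pooling_mode", "st"],  # segment token
]

for is_freeze in ["false", "true"]:          # full fine-tuning vs. frozen backbone
    for extra_args in aggregation_options:
        cmd = ["python", "script_ucr_nn_0.py",
               "--method_name", "trf_tc_c_0000",
               "--is_freeze", is_freeze] + extra_args
        print("running:", " ".join(cmd))
        subprocess.run(cmd, check=True)      # stop immediately if a run fails
```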
Time-series classification with a foundation model
- Good performance in the pre-trained domain, but the model struggles with fine-tuning and tends to overfit
- This is largely due to the lack of time-series data
- Temporal information and local patterns are missed during classification
Time-series data aggregation with temporal information
- ViT-style aggregation (class token, representation vector)
- Pooling (static temporal pooling, dynamic temporal pooling)
- For high performance without large computation, the pre-trained backbone is frozen and fine-tuning is applied only to the downstream task (the aggregation options are sketched in code below)
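As an illustration of these aggregation options, here is a minimal PyTorch sketch. The tensor shapes, the mode names, the segment count, and the use of mean pooling are assumptions made for illustration; the repository's actual aggregation code lives in the scripts above.

```python
import torch

def aggregate(encoder_out: torch.Tensor, mode: str) -> torch.Tensor:
    """Collapse a transformer output of shape (batch, seq_len, d_model) into one vector per series.

    Assumes position 0 holds the class token when mode == "class_token".
    """
    if mode == "class_token":          # ViT-style: keep only the class-token embedding
        return encoder_out[:, 0, :]
    if mode == "flatten":              # keep every time step (largest downstream input)
        return encoder_out.flatten(start_dim=1)
    if mode == "global_pooling":       # one global token: average over all time steps
        return encoder_out.mean(dim=1)
    if mode == "segment_pooling":      # static temporal pooling: average within equal segments
        batch, seq_len, d_model = encoder_out.shape
        n_segments = 4                 # illustrative choice
        trimmed = encoder_out[:, : seq_len - seq_len % n_segments, :]
        segments = trimmed.reshape(batch, n_segments, -1, d_model)
        return segments.mean(dim=2).flatten(start_dim=1)
    raise ValueError(f"unknown aggregation mode: {mode}")

# example: 8 series, 128 time steps, embedding size 64
pooled = aggregate(torch.randn(8, 128, 64), "segment_pooling")
print(pooled.shape)  # torch.Size([8, 256]) = 4 segments x 64 dims
```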
Data
- UCR Time Series Classification Archive
- Labelled time-series data from a variety of sources, including medical, financial, biological, industrial, and environmental domains (a minimal loading sketch follows below)
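A minimal loading sketch, assuming the standard UCRArchive_2018 layout (one folder per dataset containing tab-separated `<name>_TRAIN.tsv` / `<name>_TEST.tsv` files whose first column is the class label). The path and dataset name below are placeholders, and variable-length datasets would need extra handling.

```python
import os
import numpy as np

def load_ucr_dataset(archive_dir: str, name: str):
    """Load one UCR dataset, assuming <archive_dir>/<name>/<name>_TRAIN.tsv and
    <name>_TEST.tsv with the class label in the first column."""
    def _read(split: str):
        data = np.loadtxt(os.path.join(archive_dir, name, f"{name}_{split}.tsv"),
                          delimiter="\t")
        return data[:, 1:], data[:, 0].astype(int)   # (series values, labels)

    x_train, y_train = _read("TRAIN")
    x_test, y_test = _read("TEST")
    return x_train, y_train, x_test, y_test

# example usage (placeholder path and dataset name)
# x_train, y_train, x_test, y_test = load_ucr_dataset("/path/to/UCRArchive_2018", "GunPoint")
```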
Model Architecture
- The foundation model is composed of a backbone and a head
- The transformer uses fixed positional encoding, 4 encoder layers, and an input/output size of 64, and is pre-trained with TimeCLR, a contrastive-learning pre-training method that extends SimCLR (a SimCLR-style loss sketch follows after this list)
- Previous research uses a class token and a ViT-based time-series foundation model, which is trained during the pre-training process
- The output network, projector, and classifier are also trained during pre-training (see the backbone/head sketch below)
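A minimal sketch of the backbone/head split with the hyperparameters from these notes (fixed sinusoidal positional encoding, 4 encoder layers, model size 64). The number of attention heads, the input projection, the class-token handling, the maximum length, and the head shape are assumptions, not the repository's exact implementation.

```python
import math
import torch
import torch.nn as nn

class Backbone(nn.Module):
    """Transformer backbone: fixed sinusoidal positional encoding + 4 encoder layers (d_model=64)."""
    def __init__(self, d_model: int = 64, n_layers: int = 4, n_heads: int = 4, max_len: int = 1024):
        super().__init__()
        self.input_proj = nn.Linear(1, d_model)               # scalar time step -> embedding
        self.cls_token = nn.Parameter(torch.zeros(1, 1, d_model))
        pe = torch.zeros(max_len, d_model)                    # fixed (non-learned) positional encoding
        pos = torch.arange(max_len).unsqueeze(1).float()
        div = torch.exp(torch.arange(0, d_model, 2).float() * (-math.log(10000.0) / d_model))
        pe[:, 0::2], pe[:, 1::2] = torch.sin(pos * div), torch.cos(pos * div)
        self.register_buffer("pe", pe)
        layer = nn.TransformerEncoderLayer(d_model, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, x):                                     # x: (batch, seq_len, 1)
        h = self.input_proj(x) + self.pe[: x.size(1)]
        cls = self.cls_token.expand(x.size(0), -1, -1)
        return self.encoder(torch.cat([cls, h], dim=1))       # (batch, seq_len + 1, d_model)

class Head(nn.Module):
    """Downstream classifier attached on top of the aggregated representation."""
    def __init__(self, in_dim: int, n_classes: int):
        super().__init__()
        self.fc = nn.Linear(in_dim, n_classes)

    def forward(self, z):
        return self.fc(z)
```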
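Since TimeCLR is described as extending SimCLR, the contrastive pre-training objective can be illustrated with the standard SimCLR NT-Xent loss over two augmented views of each series. This is only an illustration of the SimCLR-style loss, not TimeCLR's exact formulation or augmentation scheme.

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.5) -> torch.Tensor:
    """SimCLR-style NT-Xent loss; z1, z2 are (batch, dim) embeddings of two views of the same series."""
    batch = z1.size(0)
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # (2B, dim), unit norm
    sim = z @ z.t() / temperature                        # temperature-scaled cosine similarities
    sim.fill_diagonal_(float("-inf"))                    # a sample is never its own positive
    # the positive for row i is i + B (and vice versa)
    targets = torch.cat([torch.arange(batch) + batch, torch.arange(batch)])
    return F.cross_entropy(sim, targets)

# example: embeddings of two augmented views from the backbone/projector
loss = nt_xent_loss(torch.randn(16, 64), torch.randn(16, 64))
```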
Freeze Backbone
- Uses the transformer pre-trained with TimeCLR (a minimal freezing sketch appears at the end of these notes)
- Because pre-training already aggregates the time series with a class token, this experimental setting favors the class token
- Dynamic temporal pooling is not run under optimal conditions; it uses a lower gamma value and a smaller number of segments
- In the frozen case, the flatten method overfit even though it had the most parameters, and pooling mitigated this
- Considering temporal information, the class token and static temporal pooling performed best, which suggests they aggregate local patterns in the time series well
- As expected, removing the pre-trained head and attaching a new head for the downstream task yielded better results, and the class token alone had more limitations, such as overfitting
- The pre-trained head is not useful for the downstream task, since accuracy improves more with a separately trained head
- Experiment with other datasets that were not used in pre-training to further study multi-domain and zero-shot inference
- Research aggregation techniques that better capture temporal information
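To make the frozen-backbone setting concrete, here is a minimal fine-tuning sketch that reuses the hypothetical Backbone and Head classes from the earlier sketch: the backbone parameters are frozen and excluded from the optimizer, so only the new downstream head is updated.

```python
import torch
import torch.nn as nn

backbone = Backbone()                      # hypothetical class from the earlier sketch
# in practice the TimeCLR pre-trained weights would be loaded here, e.g. backbone.load_state_dict(...)
for p in backbone.parameters():
    p.requires_grad = False                # freeze the pre-trained backbone
backbone.eval()

head = Head(in_dim=64, n_classes=5)        # new downstream head (5 classes only as an example)
optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)   # only the head is optimized
criterion = nn.CrossEntropyLoss()

def train_step(x, y):
    """One fine-tuning step; x: (batch, seq_len, 1), y: (batch,) integer labels."""
    with torch.no_grad():                  # no gradients flow through the frozen backbone
        features = backbone(x)[:, 0, :]    # class-token aggregation
    logits = head(features)
    loss = criterion(logits, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```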