AdityaLab/Time-MMD

Time-MMD: A New Multi-Domain Multimodal Dataset for Time Series Analysis

🚩News (2024.08) Our library MM-TSFlib for multimodal time series forecasting based on Time-MMD has been significantly enhanced, featuring more comprehensive functionality, clearer code, and improved documentation.

🚩News (2024.08) Time-MMD now supports the imputation task. Please check Downstream_Tasks/Imputation for a use case.

🚩News (2024.08) The dataset description has been enhanced.

Introduction

Time-MMD is the first multi-domain, multimodal time series dataset covering 9 primary data domains. It ensures fine-grained alignment between the textual and numerical series, eliminates data contamination, and provides high usability. (Check our paper for more details!)

Dataset Overview

Time-MMD consists of 1) numerical sequences and 2) textual sequences. Every record carries a pair of timestamps (start, end), which makes it possible to adapt the data to various tasks or demands.

The structure of this repo is:

- Readme.MD
- numerical
    - Agriculture
        - Agriculture.csv
    - (Domain Name)
        - (Domain Name).csv
    ...
- textual
    - (Domain Name)
        - (Domain Name)_report.csv
        - (Domain Name)_search.csv
- Downstream_Tasks
    - ShortTerm Forecasting
    - LongTerm Forecasting
    - Imputation
    - Anomaly Detection

Here, Downstream_Tasks shows how Time-MMD supports different downstream tasks. For Short-Term and Long-Term Forecasting, please check our library MM-TSFlib for detailed usage examples. We plan to support more tasks and domains in the future. Please feel free to let us know your needs.
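As a rough illustration of the layout above, the following sketch builds the three file paths for one domain. The helper name `domain_paths` and the root folder name `Time-MMD` are assumptions made for this example and are not part of the repository.

```python
from pathlib import Path


def domain_paths(root, domain):
    """Return the expected CSV paths for one domain (hypothetical helper).

    Follows the directory tree described in this README:
    numerical/<Domain>/<Domain>.csv and textual/<Domain>/<Domain>_{report,search}.csv
    """
    root = Path(root)
    return {
        "numerical": root / "numerical" / domain / f"{domain}.csv",
        "report": root / "textual" / domain / f"{domain}_report.csv",
        "search": root / "textual" / domain / f"{domain}_search.csv",
    }


# "Time-MMD" is an assumed name for the local clone of this repo.
paths = domain_paths("Time-MMD", "Agriculture")
for kind, path in paths.items():
    print(kind, path.as_posix())
```

The function only constructs paths and does not check that the files exist, so it can be used before the data is downloaded.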

Numerical Data

The numerical data of each domain is a CSV file with the following format:

start_date, end_date, OT, (other variable 1), (other variable 2), ...

Here, OT represents the default target variable for prediction in each dataset. Its specific meaning differs by domain.

For specific data sources, please refer to Appendix C of our paper.
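A minimal sketch of reading a file in the numerical format above, using only the Python standard library. It assumes that dates are written in ISO form (YYYY-MM-DD) and that OT is numeric; the sample rows and the helper name `load_numerical` are made up for illustration and are not taken from the dataset.

```python
import csv
import io
from datetime import date


def load_numerical(handle, target="OT"):
    """Parse rows of start_date, end_date, OT, ... into dicts (hypothetical helper).

    Assumes ISO-formatted dates; adjust the parsing if the files differ.
    """
    # skipinitialspace tolerates headers written as "start_date, end_date, OT".
    reader = csv.DictReader(handle, skipinitialspace=True)
    rows = []
    for raw in reader:
        rows.append(
            {
                "start": date.fromisoformat(raw["start_date"]),
                "end": date.fromisoformat(raw["end_date"]),
                "target": float(raw[target]),
            }
        )
    # Sort chronologically so the series can be windowed for forecasting.
    rows.sort(key=lambda r: r["start"])
    return rows


# Illustrative values only, not real Time-MMD data.
sample = io.StringIO(
    "start_date, end_date, OT, other_variable_1\n"
    "2020-02-01, 2020-02-29, 2.0, 11.0\n"
    "2020-01-01, 2020-01-31, 1.5, 10.0\n"
)
numerical = load_numerical(sample)
print([r["target"] for r in numerical])
```

To read a real file, pass an open file handle instead of the in-memory sample, for example `open(path, newline="")`.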

Textual Data

The textual data of each domain consists of two CSV files, one for report data and another for search data. All data are in a unified format:

start_date, end_date, fact, pred

[Figure: counts of relevant reports (a, left) and search results (b, right) in Time-MMD over time]

For specific data sources, please refer to Appendix C of our paper.
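Because both modalities carry start and end timestamps, text can be paired with a numerical time step by comparing dates. The sketch below shows one possible pairing rule, which is an assumption of this example rather than the method used in the paper or in MM-TSFlib: it keeps only text whose interval ended before a chosen cutoff date, within a lookback window, so no text from after the cutoff is used. The helper names, the sample rows, and the ISO date format are likewise assumed.

```python
import csv
import io
from datetime import date, timedelta


def load_text(handle):
    """Parse rows of start_date, end_date, fact, pred (hypothetical helper)."""
    reader = csv.DictReader(handle, skipinitialspace=True)
    return [
        {
            "start": date.fromisoformat(r["start_date"]),
            "end": date.fromisoformat(r["end_date"]),
            "fact": r["fact"],
            "pred": r["pred"],
        }
        for r in reader
    ]


def facts_before(text_rows, cutoff, lookback_days=30):
    """Return `fact` strings whose interval ended in [cutoff - lookback, cutoff)."""
    earliest = cutoff - timedelta(days=lookback_days)
    picked = [r for r in text_rows if earliest <= r["end"] < cutoff]
    picked.sort(key=lambda r: r["end"])
    return [r["fact"] for r in picked]


# Illustrative rows only, not real Time-MMD data.
sample = io.StringIO(
    "start_date,end_date,fact,pred\n"
    '2020-01-01,2020-01-15,"Hypothetical fact A",\n'
    '2020-01-16,2020-01-31,"Hypothetical fact B","Hypothetical outlook"\n'
    '2020-02-01,2020-02-15,"Hypothetical fact C",\n'
)
text_rows = load_text(sample)
context = facts_before(text_rows, cutoff=date(2020, 2, 1), lookback_days=20)
print(context)
```

Filtering on the end timestamp with a strict inequality is a conservative choice: a text record that is still open at the cutoff is excluded rather than risk including information from the period being predicted.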

Use Case

For the multi-modal time-series forecasting task based on the Time-MMD dataset, you may check our library MM-TSFlib for detailed usage examples.

Citation

If you find this repo useful, please cite our paper.

@misc{liu2024timemmd,
      title={Time-MMD: A New Multi-Domain Multimodal Dataset for Time Series Analysis}, 
      author={Haoxin Liu and Shangqing Xu and Zhiyuan Zhao and Lingkai Kong and Harshavardhan Kamarthi and Aditya B. Sasanur and Megha Sharma and Jiaming Cui and Qingsong Wen and Chao Zhang and B. Aditya Prakash},
      year={2024},
      eprint={2406.08627},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}

Contact

If you have any questions or suggestions, feel free to contact: [email protected]