
How to reproduce the results as shown in the paper? #102

Closed
liu-jc opened this issue Jun 9, 2024 · 14 comments
Labels
question Further information is requested

Comments

liu-jc commented Jun 9, 2024

Hi Chronos team,

Thanks for the great work! I would like to know how we can reproduce the results shown in the paper, e.g., Figure 4. Also, could you provide some evaluation scripts/code to facilitate model evaluation?

I am aware that some code snippets are provided in #75. But as mentioned there, "While many datasets in GluonTS have the same name as the ones used in the paper, they may be different from the evaluation in the paper in crucial aspects such as prediction length and number of rolls." Given that, I wonder if you could provide scripts to help us reproduce the results.

abdulfatir added the question label on Jun 9, 2024
abdulfatir (Contributor)

@liu-jc We are working towards releasing the evaluation datasets. Once we have that, I will inform you. Please keep an eye out for the update.

liu-jc (Author) commented Jun 15, 2024

Hi @abdulfatir, thanks for the reply! I also noticed that in the README you mention "Fixed an off-by-one error in bin indices in the output_transform". Does that mean that if we are using the checkpoint on Hugging Face, we get the version from before this bug was fixed?

abdulfatir (Contributor)

@liu-jc The issue was not in the model checkpoints themselves but in the inference code. The decoded values were shifted by one, which led to some avoidable discrepancy.
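
As a toy illustration (uniform bins and made-up token ids, not the actual Chronos quantizer), here is what an off-by-one shift in bin indices does to the decoded values:

import numpy as np

# Toy uniform quantization over [-5, 5]; the actual Chronos binning differs.
n_bins = 10
bin_edges = np.linspace(-5, 5, n_bins + 1)
bin_centers = (bin_edges[:-1] + bin_edges[1:]) / 2  # value each bin decodes to

token_ids = np.array([2, 3, 7, 9])  # made-up predicted bin indices

correct = bin_centers[token_ids]      # intended decoding
shifted = bin_centers[token_ids - 1]  # off-by-one decoding
print(correct - shifted)              # each decoded value is offset by one bin width (1.0 here)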

liu-jc (Author) commented Jun 16, 2024

@abdulfatir thanks for the answer. So, just to confirm: if we are using the latest code for inference, we should not have this problem?

abdulfatir (Contributor)

@liu-jc Yes.

abdulfatir (Contributor) commented Jun 27, 2024

Update: We have just open-sourced the datasets used in the paper (thanks @shchur!). Please check the updated README. We have also released an evaluation script and backtest configs to compute the WQL and MASE numbers as reported in the paper. Please follow the instructions in this README to evaluate on the in-domain and zero-shot benchmarks.

liu-jc (Author) commented Jun 30, 2024

Hi @abdulfatir,

Thanks for the effort in releasing datasets and evaluation scripts! Those are tremendously helpful for the community. A few more questions I would like to ask:

  1. When we use synthetic data, which frequency should we set here?
    FileDataset(path=Path(data_path), freq="h"),
    Or does the frequency set here not affect training at all?
  2. For the datasets, do you have any plan to release the splits and the script for running TSMixup to produce the Chronos training data? As I understand it, the actual training data is not the same as the datasets you provided on Hugging Face. We're interested in how we can obtain the actual training data, if it can be shared.

abdulfatir (Contributor)

  1. freq has no meaning for Chronos, as it is not used by the model in any way.
  2. We have added the exact training corpus used in the paper here.
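
For reference, here is a minimal sketch of loading one of these corpora with datasets.load_dataset (the repository and config names below are assumptions; check the dataset card linked in the README for the exact identifiers and field names):

from datasets import load_dataset

# Assumed repository and config names -- verify against the dataset card.
corpus = load_dataset(
    "autogluon/chronos_datasets",
    "training_corpus_tsmixup_10m",
    split="train",
    streaming=True,  # iterate without materializing the full corpus in memory
)

first = next(iter(corpus))
print(first.keys())  # fields likely include "timestamp" and "target"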

liu-jc (Author) commented Jul 8, 2024

Hi @abdulfatir,

Thanks for the clarification. Having the training corpus is super helpful. I wonder if you plan to provide some code snippets or a guide for using datasets.load_dataset to train the model. The current training script uses the GluonTS FileDataset. I am thinking about directly changing all training_data_paths in the YAML file to the ones for TSMixup & KernelSynth and replacing FileDataset with datasets.load_dataset, but I'm not sure whether that will cause problems. Could you provide any suggestions here?

abdulfatir (Contributor)

@liu-jc We have no such plan at the moment due to other priorities. My guess is that you should be able to use load_dataset after some tweaking, but I haven't really tried it, so I cannot say for sure how it should be done. The most straightforward way would be to convert the Hugging Face dataset into the GluonTS format. You can modify this snippet to do that:

from pathlib import Path
from typing import List, Optional, Union

import numpy as np
from gluonts.dataset.arrow import ArrowWriter


def convert_to_arrow(
    path: Union[str, Path],
    time_series: Union[List[np.ndarray], np.ndarray],
    start_times: Optional[Union[List[np.datetime64], np.ndarray]] = None,
    compression: str = "lz4",
):
    """Write a collection of time series to a GluonTS-compatible Arrow file at `path`."""
    if start_times is None:
        # Set an arbitrary start time
        start_times = [np.datetime64("2000-01-01 00:00", "s")] * len(time_series)

    assert len(time_series) == len(start_times)

    dataset = [
        {"start": start, "target": ts} for ts, start in zip(time_series, start_times)
    ]
    ArrowWriter(compression=compression).write_to_file(
        dataset,
        path=path,
    )


if __name__ == "__main__":
    # Generate 20 random time series of length 1024
    time_series = [np.random.randn(1024) for i in range(20)]

    # Convert to GluonTS arrow format
    convert_to_arrow("./noise-data.arrow", time_series=time_series)

liu-jc (Author) commented Jul 9, 2024

Hi @abdulfatir,

Thanks for pointing out this snippet. I asked that question because I previously tried to directly replace FileDataset with datasets.load_dataset, but I ran into some problems even though both are supposed to work with the Arrow format. I then found out that the GluonTS Arrow format does not seem to be the same as the Hugging Face Arrow format. I think the provided snippet should do the job here.

Another question: if we want to stick with the GluonTS FileDataset, is there a way to read large datasets in a streaming fashion (similar to streaming=True for datasets.load_dataset)? I ask because I found that RAM consumption is much higher with FileDataset, and for some large datasets we even ran into OOM problems.

abdulfatir (Contributor)

ArrowWriter(compression=compression).write_to_file can take a generator as input. It doesn't have to be a materialized iterable like a list. You can create a function that yields one item at a time from the HF dataset and pass this generator into write_to_file.
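
For example, here is a rough sketch of that approach, assuming the HF records have "timestamp" and "target" fields and using placeholder dataset identifiers (adapt both to the actual schema):

from pathlib import Path

import numpy as np
from datasets import load_dataset
from gluonts.dataset.arrow import ArrowWriter


def generate_entries(hf_dataset):
    # Yield one GluonTS-style entry at a time instead of building a list in memory.
    for record in hf_dataset:
        yield {
            "start": np.datetime64(record["timestamp"][0], "s"),
            "target": np.asarray(record["target"], dtype=np.float32),
        }


if __name__ == "__main__":
    # Assumed dataset identifiers -- replace with the corpus you actually want to convert.
    hf_dataset = load_dataset(
        "autogluon/chronos_datasets",
        "training_corpus_kernel_synth_1m",
        split="train",
        streaming=True,
    )
    ArrowWriter(compression="lz4").write_to_file(
        generate_entries(hf_dataset),
        path=Path("./kernel-synth.arrow"),
    )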

abdulfatir (Contributor)

In general though, I would recommend a machine with large RAM for pretraining.

lostella (Contributor)

Closing in favor of #150
