Add AutoregressionTask #2639
Conversation
@adamjstewart See below: this test passes when I run it locally, but it is failing here. I don't understand this error message; do you know how to resolve it?
Looks like a great start! For datasets, I'm wondering if we also want to consider spatiotemporal datasets like the one you found and whether we should add GeoDatasets. For models, I'm wondering if we should also add a GNN since that's commonly used for non-gridded data. Some kind of model that combines both and can make spatiotemporal predictions would be cool. But none of those are required for a first pass. Let's first clean this up and get it merged.
I don't know why the tests are failing, but it's only the minimum version tests, so worst case scenario we can bump the minimum supported versions. We're already doing that when we drop Python 3.10 support anyway, so I would ignore them for now and focus on everything else.
@@ -1,5 +1,6 @@
Dataset,Task,Source,License,# Samples,# Classes,Size (px),Resolution (m),Bands
`ADVANCE`_,C,"Google Earth, Freesound","CC-BY-4.0","5,075",13,512x512,0.5,RGB
`Air Quality`_,"R,T","UCI Machine Learning Repository","CC-BY-4.0","9,358",,,,
Confirmed the license from https://archive.ics.uci.edu/dataset/360/air+quality
self.val_dataset = Subset(dataset, val_indices)
self.test_dataset = Subset(dataset, test_indices)

def on_after_batch_transfer(
Why is this needed? Is there a problem with the base class implementation?
I don't think we want to be applying a Kornia augmentation (`self.aug`) to this data since it is not an image, so this was my way of avoiding that. I think your suggestion of a no-op data augmentation would make this override unnecessary.
""" | ||
super().__init__(AirQuality, batch_size, num_workers, **kwargs) | ||
self.val_split_pct = val_split_pct | ||
self.test_split_pct = test_split_pct |
Should we add some kind of no-op data augmentation to override the base class?
That would be great. Is there something like that in Kornia? I didn't see anything when I looked through the docs. Or is there another way to do it?
No, not really. We would have to write our own, or use one that won't apply, like normalization.
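For illustration, a minimal sketch of such a no-op transform (the class name `NoOpAugmentation` and assigning it to `self.aug` are assumptions, not code from this PR):

```python
import torch
from torch import Tensor


class NoOpAugmentation(torch.nn.Module):
    """Identity transform that returns the batch unchanged."""

    def forward(self, batch: dict[str, Tensor]) -> dict[str, Tensor]:
        # Nothing is applied: non-image data passes through untouched.
        return batch
```

Assigning an instance of this to `self.aug` would let the base class `on_after_batch_transfer` run unchanged on non-image batches.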
output_sequence_len: The number of steps to predict forward. Defaults to 1.
num_layers: Number of LSTM layers in the encoder and decoder. Defaults to 1.
teacher_force_prob: Probability of using teacher forcing. If None, does not
    use teacher forcing. Defaults to None.
Instead of documenting and passing all of these arguments one at a time, maybe we could just have an extra `**kwargs` argument that gets passed to the LSTM class?
That sounds great, I'll work on that.
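A rough sketch of that suggestion, assuming the encoder and decoder wrap `torch.nn.LSTM` (class and parameter names here are illustrative, not the final API):

```python
from typing import Any

import torch


class LSTMSeq2Seq(torch.nn.Module):
    """Encoder-decoder LSTM for sequence-to-sequence forecasting."""

    def __init__(self, input_size: int, hidden_size: int, **kwargs: Any) -> None:
        super().__init__()
        # Arguments such as num_layers, dropout, or bidirectional no longer
        # need to be documented and passed one at a time: they are forwarded
        # verbatim to torch.nn.LSTM.
        self.encoder = torch.nn.LSTM(input_size, hidden_size, batch_first=True, **kwargs)
        self.decoder = torch.nn.LSTM(input_size, hidden_size, batch_first=True, **kwargs)
```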
decoder_indices: list[int] | None = None,
timesteps_ahead: int = 1,
num_layers: int = 1,
loss: str = 'mse',
Suggested change:
- loss: str = 'mse',
+ loss: Literal['mse', 'mae'] = 'mse',
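The `Literal` annotation lets static type checkers flag unsupported loss names at call sites. A sketch of how the validated string might then be mapped to a loss module (the helper below is hypothetical, not code from this PR):

```python
from typing import Literal

import torch.nn as nn


def configure_loss(loss: Literal['mse', 'mae'] = 'mse') -> nn.Module:
    # The Literal type restricts callers to the two supported values.
    losses: dict[str, nn.Module] = {'mse': nn.MSELoss(), 'mae': nn.L1Loss()}
    return losses[loss]
```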
batch: The output of your DataLoader.
batch_idx: Integer displaying index of this batch.
"""
self._shared_step(batch, batch_idx, 'val')
Can we add plotting here?
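For reference, a hedged sketch of what validation plotting could look like for a forecasting task; the helper name and the (time, features) tensor layout are assumptions:

```python
import matplotlib.pyplot as plt
from torch import Tensor


def plot_forecast(past: Tensor, future: Tensor, prediction: Tensor) -> plt.Figure:
    """Plot one feature of one sample: history, ground truth, and forecast."""
    fig, ax = plt.subplots()
    t0 = past.shape[0]
    ax.plot(range(t0), past[:, 0].cpu(), label='input sequence')
    ax.plot(range(t0, t0 + future.shape[0]), future[:, 0].cpu(), label='ground truth')
    ax.plot(range(t0, t0 + prediction.shape[0]), prediction[:, 0].cpu(), label='prediction')
    ax.legend()
    return fig
```

Called for the first batch of each validation epoch, the returned figure could then be handed to the logger, for example via TensorBoard's `add_figure`.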
metrics = getattr(self, f'{stage}_metrics', None)
if metrics:
    metrics(y_hat, future_steps)
    self.log_dict({f'{k}': v for k, v in metrics.compute().items()})
Do we revert normalization anywhere before calculating metrics? Are we normalizing both inputs and outputs, or only inputs?
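If targets are normalized, metrics would be reported in normalized units unless the transform is inverted first. A sketch of one way to do that, assuming per-feature `mean` and `std` statistics are available (the names are assumptions):

```python
from torch import Tensor


def denormalize(x: Tensor, mean: Tensor, std: Tensor) -> Tensor:
    """Invert z-score normalization so metrics are in the original units."""
    return x * std + mean
```

Inside `_shared_step`, both `y_hat` and `future_steps` could then be denormalized before the metrics update.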
class TestLSTMSeq2Seq:
    @torch.no_grad()
I wonder if we should be adding `torch.no_grad` to more of our tests to speed them up...
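For context, `torch.no_grad` works as a decorator as well as a context manager; applied to a test method, it disables autograd tracking for every forward pass inside. A self-contained example (the test body is illustrative, not from this PR):

```python
import torch


class TestLSTMSeq2Seq:
    @torch.no_grad()
    def test_output_shape(self) -> None:
        # No computation graph is built here, which saves time and memory.
        lstm = torch.nn.LSTM(input_size=4, hidden_size=8, batch_first=True)
        x = torch.randn(2, 10, 4)
        out, _ = lstm(x)
        assert out.shape == (2, 10, 8)
```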
Co-authored-by: Adam J. Stewart <[email protected]>
This PR adds the following:
- An `AirQuality` dataset and a corresponding datamodule
- An `LSTMSeq2Seq` model
- An `AutoregressionTask` trainer
Together, these additions enable training air quality forecasting models, and they will support other autoregression tasks as additional datasets and models are added.