
Variable sequence length #139

Open
GrgicevicLukaNTNU opened this issue May 29, 2023 · 2 comments
@GrgicevicLukaNTNU

1. Feature description

Enable support for variable sequence lengths in the input data.

2. Motivation

Some of the input training data are composed of multiple concatenated time series of different lengths.

3. Your contribution

I will try to help.

@GrgicevicLukaNTNU GrgicevicLukaNTNU added the enhancement and new feature labels on May 29, 2023
@WenjieDu
Owner

Many thanks, Luka!

Adding some info here as we discussed on Slack:

In your dataset, every sample may have a different sequence length. Yes, as you said, RNN cells accept variable sequence lengths, but for some other models, like attention-based ones, the sizes of the attention matrices are fixed. Therefore, to accomplish this, you should find the maximum length among your samples and pad the shorter ones to that length. You can add a mask array called padding_mask to indicate which parts of each sample are padded.
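
A minimal sketch of that padding scheme in plain PyTorch (not a PyPOTS API; the tensor shapes here are illustrative assumptions, and only the name `padding_mask` comes from the suggestion above):

```python
import torch
from torch.nn.utils.rnn import pad_sequence

# three series of different lengths, each with 2 features
series = [torch.randn(5, 2), torch.randn(3, 2), torch.randn(7, 2)]
lengths = torch.tensor([s.size(0) for s in series])  # [5, 3, 7]

# pad the shorter samples with zeros up to the max length -> (3, 7, 2)
X = pad_sequence(series, batch_first=True)

# padding_mask: True at real time steps, False at padded ones -> (3, 7)
padding_mask = torch.arange(X.size(1))[None, :] < lengths[:, None]
```

For attention-based models, the inverse of this mask (`~padding_mask`) can be passed as `key_padding_mask` to `torch.nn.MultiheadAttention` so the padded steps are ignored.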

@WenjieDu
Owner

WenjieDu commented Jul 8, 2023

I did give this feature request some thought. From my personal view, so far we cannot make a general method or function that works for all models, because this feature is very model-specific and we have to implement it separately for each model (or at least for each kind of model). For example, for self-attention models like SAITS, we can add attention masks over the padded parts to enable training on variable-length input, and for RNN models like BRITS, we can leverage torch.nn.utils.rnn.pack_padded_sequence() and torch.nn.utils.rnn.pad_packed_sequence() to help with it.

Therefore, the workload will be very large. We can start with the models we're familiar with and handle them one by one.
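
For the RNN route, a minimal sketch of the two utilities mentioned above, assuming a padded batch X and a lengths tensor as in the earlier snippet (the plain nn.GRU here merely stands in for a model like BRITS):

```python
import torch
from torch import nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

X = torch.randn(3, 7, 2)           # padded batch: (batch, max_len, features)
lengths = torch.tensor([5, 3, 7])  # true length of each sample

rnn = nn.GRU(input_size=2, hidden_size=16, batch_first=True)

# pack so the RNN never runs over the padded steps; lengths must be on the CPU
packed = pack_padded_sequence(X, lengths.cpu(), batch_first=True,
                              enforce_sorted=False)
packed_out, h_n = rnn(packed)

# unpack back to a padded tensor of shape (3, 7, 16)
out, out_lengths = pad_packed_sequence(packed_out, batch_first=True,
                                       total_length=7)
```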
