Is there any solution to support variable sequence length during training? #5

ruoyxue · 2024-03-11T12:05:51Z

No description provided.

proger · 2024-03-19T12:56:21Z

Hi! The current workaround is to pad the output to the nearest power of two before scanning. Could you tell us more about your use case?
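For illustration, the padding workaround could look roughly like the sketch below. It is plain Python on lists rather than this library's API; the helper names `next_power_of_two` and `pad_to_power_of_two` are hypothetical, and right-padding with zeros is an assumption. Because a causal scan only looks backwards, padding appended at the end should not change the results for the real positions, which can be sliced off afterwards.

```python
def next_power_of_two(n: int) -> int:
    """Smallest power of two that is >= n (n must be positive)."""
    if n < 1:
        raise ValueError("sequence length must be at least 1")
    return 1 << (n - 1).bit_length()


def pad_to_power_of_two(seq, pad_value=0.0):
    """Right-pad seq to the next power-of-two length.

    Returns the padded list and the original length, so the caller can
    drop the padded positions after the scan (hypothetical helper).
    """
    original_len = len(seq)
    target_len = next_power_of_two(original_len)
    padded = list(seq) + [pad_value] * (target_len - original_len)
    return padded, original_len


padded, n = pad_to_power_of_two([1.0, 2.0, 3.0, 4.0, 5.0])
# After scanning `padded`, keep only the first `n` outputs.
```

A sequence whose length is already a power of two is returned unchanged by this sketch.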

kklemon · 2024-10-10T20:25:56Z

Not the original author here, but my specific use case is linear RNNs. These have an initial hidden state $h_0$. When an implementation does not accept an initial state explicitly, as is the case here, $h_0$ can be provided as the first token with gate[0] = 0. However, this shortens the actual sequence that can be processed to $2^{n}-1$ tokens. This constraint may seem odd to an outsider who is not familiar with the details of the underlying parallel scan implementation.
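The trick can be checked with a small sequential reference, sketched here under the assumption that the scan computes $h_t = g_t \cdot h_{t-1} + x_t$ starting from zero. The functions `linear_scan` and `scan_with_prepended_state` are hypothetical stand-ins written in plain Python, not this library's API.

```python
def linear_scan(gates, tokens, h0=0.0):
    """Sequential reference for h_t = g_t * h_{t-1} + x_t (assumed convention)."""
    h = h0
    out = []
    for g, x in zip(gates, tokens):
        h = g * h + x
        out.append(h)
    return out


def scan_with_prepended_state(gates, tokens, h0):
    """Emulate an initial state with a scan that always starts from zero.

    h0 is prepended as an extra token whose gate is 0, so the first step
    yields 0 * 0 + h0 = h0. The extra output is dropped afterwards.
    """
    out = linear_scan([0.0] + list(gates), [h0] + list(tokens))
    return out[1:]


gates = [0.5, 0.5, 0.5]
tokens = [1.0, 2.0, 3.0]
explicit = linear_scan(gates, tokens, h0=4.0)
prepended = scan_with_prepended_state(gates, tokens, h0=4.0)
# Both give [3.0, 3.5, 4.75].
```

The prepended token occupies one slot, which is why a scan limited to length $2^{n}$ leaves only $2^{n}-1$ positions for the actual sequence.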

I would therefore suggest two solutions for improving this situation:

1. Allow providing an initial element explicitly. From my understanding of the underlying CUDA implementation, this should be relatively easy to do.
2. Drop the power-of-two sequence length constraint. This is probably more difficult to implement and may come with a slight performance penalty, but it would also cover more use cases, such as variable-length inference.
