
Is there any solution to support variable sequence length during training? #5

Open · ruoyxue opened this issue Mar 11, 2024 · 2 comments

ruoyxue commented Mar 11, 2024

No description provided.

proger (Owner) commented Mar 19, 2024

Hi! The current workaround is to pad the inputs to the next power of two before scanning. Could you tell more about your use case?
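
A minimal sketch of that workaround, assuming a `scan(gates, tokens)` entry point from `accelerated_scan.warp`, tensors shaped `(batch, dim, seqlen)`, and the recurrence `h[t] = gates[t] * h[t-1] + tokens[t]`; the import path, shapes, and the `padded_scan` helper name are assumptions, not necessarily the library's actual interface:

```python
import math
import torch
from accelerated_scan.warp import scan  # assumed import path

def padded_scan(gates: torch.Tensor, tokens: torch.Tensor) -> torch.Tensor:
    """Scan a sequence whose length is not a power of two by right-padding.

    gates, tokens: (batch, dim, seqlen). Trailing positions padded with
    gate = 1 and token = 0 only carry the last state forward, so the first
    `seqlen` outputs are identical to an unpadded scan.
    """
    batch, dim, seqlen = tokens.shape
    padded_len = 1 << math.ceil(math.log2(seqlen))  # next power of two
    if padded_len == seqlen:
        return scan(gates, tokens)
    pad = padded_len - seqlen
    gates_p = torch.nn.functional.pad(gates, (0, pad), value=1.0)
    tokens_p = torch.nn.functional.pad(tokens, (0, pad), value=0.0)
    return scan(gates_p, tokens_p)[..., :seqlen]  # drop the padded outputs
```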

kklemon commented Oct 10, 2024

Not the original author here, but my specific use case is linear RNNs. These have an initial hidden state $h_0$, which can be supplied by prepending it as the first token and setting gate[0] = 0 when an implementation does not allow passing an initial state explicitly, as is the case here (sketched below). However, this shortens the remaining actual sequence length that can be processed to $2^{n}-1$. The power-of-two constraint may also seem odd to an outsider who is not familiar with the details of the underlying parallel scan implementation.
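
A sketch of that trick under the same assumed `scan(gates, tokens)` interface and `(batch, dim, seqlen)` shapes as above; `scan_with_initial_state` is a hypothetical helper name, and note that the extended length `seqlen + 1` must itself be a power of two here:

```python
import torch
from accelerated_scan.warp import scan  # assumed import path

def scan_with_initial_state(gates: torch.Tensor, tokens: torch.Tensor,
                            h0: torch.Tensor) -> torch.Tensor:
    """Inject an initial hidden state h0 (shape (batch, dim)) by prepending it.

    With gate[0] = 0 the first step computes h[0] = 0 * (anything) + h0,
    so the scan effectively starts from h0. The extra position is dropped
    afterwards, which is why the usable sequence length shrinks to 2**n - 1.
    """
    batch, dim, seqlen = tokens.shape
    zero_gate = torch.zeros(batch, dim, 1, dtype=gates.dtype, device=gates.device)
    gates = torch.cat([zero_gate, gates], dim=-1)
    tokens = torch.cat([h0.view(batch, dim, 1), tokens], dim=-1)
    out = scan(gates, tokens)          # length seqlen + 1 must be a power of two
    return out[..., 1:]                # discard the position occupied by h0
```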

I would therefore suggest two solutions for improving this situation:

  • Allow providing an initial element explicitly. From my understanding of the underlying CUDA implementation, this should be relatively easy to do.
  • Drop the power-of-two sequence length constraint. This is probably more difficult to implement and may come with a slight performance penalty, but it would also cover more use cases such as variable-length inference.
