Details about text generation #4
I am sorry I only just found your generation scripts. When the input length reaches the maximum, the next input is only the last word. However, I am still confused about the overlap between the input and the memory, because the left-hand part of the current input may already be stored in the memory. Could you tell me why you generate text this way?
@theseventhflow Hello! This confused me as well initially, but after some research, I gathered that you always decode up until the maximum sequence length before actually shifting anything into memory (and into compressed memory, if it overflows).
So, for example, if your maximum sequence length is 4 and your prime is, say, 2 characters (t h), you would keep feeding the growing input back in one token at a time (see the sketch below), and so on.
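A minimal sketch of that schedule, assuming a character-level model, a maximum sequence length of 4, and the prime `t h` (the predicted characters are only illustrative):

```
input: t h        -> predicts e
input: t h e      -> predicts _
input: t h e _    -> predicts q    # input is now at max length; this segment is shifted into memory
input: q          -> predicts u    # only the newest token is fed; earlier context is attended to via memory
input: q u        -> predicts i
...
```

And a rough PyTorch version of the same loop. The `memories=` keyword and the `(logits, memories)` return signature are assumptions made for illustration, not necessarily the exact interface of compressive-transformer-pytorch:

```python
import torch

@torch.no_grad()
def generate(model, prime, steps, max_seq_len):
    # prime: (1, prime_len) tensor of token ids
    model.eval()
    out = prime
    segment = prime        # tokens fed to the model within the current segment
    memories = None        # memory (and compressed memory) carried across segments

    for _ in range(steps):
        # assumed interface: the model returns logits plus updated memories
        logits, new_memories = model(segment, memories=memories)
        next_token = logits[:, -1].argmax(dim=-1, keepdim=True)   # greedy, for simplicity
        out = torch.cat((out, next_token), dim=-1)

        if segment.shape[1] >= max_seq_len:
            # the segment is full: only now shift it into memory and
            # start the next segment from just the newest token
            memories = new_memories
            segment = next_token
        else:
            # keep growing the segment; do not touch the memories yet
            segment = torch.cat((segment, next_token), dim=-1)

    return out
```

In other words, the memory is only updated once a full segment has been decoded, so the tokens currently being fed in never overlap with the tokens whose hidden states already live in memory.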
Hi lucidrains,
Thank you for your excellent code.
I am curious about the generation scripts. Could you tell me how to generate text with the compressive transformer? Because it has compressive memory, perhaps we cannot simply use the currently predicted word as the input for the next step (input length == 1). In addition, if the prompt has 100 words and we use tokens[0:100], tokens[1:101], tokens[2:102], ... as the inputs for the following timesteps, then tokens[1:100] may overlap with the memory, because the memory already contains hidden states for tokens[1:100].
I would very much appreciate it if you could provide the generation scripts!
Thank you