
Details about text generation #4

Open · theseventhflow opened this issue Dec 22, 2020 · 3 comments

@theseventhflow

Hi lucidrains,
Thank you for your excellent code.
I am curious about the generation script. Could you tell me how to generate text with the compressive transformer? Because it has compressive memory, perhaps we cannot simply use the current predicted word as the input for the next step (input length == 1). In addition, if the prompt has 100 words and we use tokens[0:100], tokens[1:101], tokens[2:102], ... as the inputs for the following timesteps, then tokens[1:100] would overlap with the memory, because the memory already contains the hidden states for tokens[1:100].

I would really appreciate it if you could provide the generation script!

Thank you

@theseventhflow (Author)

Sorry, I have just found your generation script. When the input length reaches the maximum, the next input is only the last word. However, I am still confused about the overlap between input and memory, because the left part of the current input may already have been stored in the memory. Could you tell me why you generate this way?

@lucidrains (Owner)

@theseventhflow Hello! This confused me as well initially, but after some research, I gathered that you always decode up to the maximum sequence length before actually shifting the hidden states into memory (and into compressed memory, if it overflows).

@lucidrains (Owner) commented Dec 23, 2020

so, for example, if your maximum sequence length is 4, and your prime is, say, 2 tokens (t h), you would do:

  1. t h e
  2. t h e y -> filled, so shift hidden states [t h e y] into memory
  3. [t h e y] c
  4. [t h e y] c a
  5. [t h e y] c a m e -> filled, so shift hidden states again; [t h e y] is compressed into compressed memory {x}, assuming a compression ratio of 4
  6. {x} [c a m e] t
  7. {x} [c a m e] t o

and so on
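
To make the schedule above concrete, here is a minimal sketch of that decoding loop in PyTorch. It assumes a hypothetical `model(window, memories=...)` that returns `(logits, memories)` and internally shifts hidden states into memory (and compressed memory) whenever it processes a window; that interface and the greedy sampling are illustrative, not necessarily this repository's exact API.

```python
import torch

def generate(model, prime, steps, max_seq_len = 4):
    # prime: (batch, seq) tensor of token ids, shorter than max_seq_len
    seq = prime.clone()
    window = prime.clone()   # tokens not yet committed to memory
    mem = None               # (memory, compressed memory), empty at first

    for _ in range(steps):
        logits, new_mem = model(window, memories = mem)

        # greedy decoding for simplicity
        next_token = logits[:, -1].argmax(dim = -1, keepdim = True)
        seq = torch.cat((seq, next_token), dim = -1)

        if window.shape[-1] == max_seq_len:
            # this forward pass processed a full window, so its hidden
            # states were shifted into memory (with the oldest memories
            # overflowing into compressed memory); the next window starts
            # fresh with only the newly sampled token
            mem = new_mem
            window = next_token
        else:
            # window not yet full: keep the old memory and re-feed the
            # grown window, so its tokens never overlap with the memory
            window = torch.cat((window, next_token), dim = -1)

    return seq
```

Note that until the window fills, each step re-feeds the grown window against the old memory, so the window's tokens are never also in memory; the memory is only committed when the window reaches the maximum sequence length, which is what resolves the overlap concern above.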
