Inference only uses last token except in the first forward pass #565
tom-huntington started this conversation in General
Replies: 1 comment
-
Lines 75 to 79 in eff383b — so after `forward` is called on Line 254 in eff383b, the keys and values for the audio features are also cached, but a hook is not used for these. I guess masking is what makes the caching possible.
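The caching pattern described above can be sketched framework-free. This is a simplified illustration, not Whisper's actual implementation (which installs PyTorch forward hooks on the key/value projection modules); the class and function names here are invented for the example. The key idea: the first forward pass projects the whole prompt, while every later pass projects only the newest token and appends its key/value pair to a per-layer cache that attention then reads in full.

```python
# Minimal, framework-free sketch of per-layer key/value caching.
# All names here are illustrative; Whisper achieves the same effect by
# registering forward hooks on the key/value Linear layers so their
# outputs are stored and concatenated across decoding steps.

class KVCache:
    def __init__(self):
        # layer_name -> list of (key, value) pairs, one per cached position
        self.cache = {}

    def append(self, layer_name, token_kvs):
        """Append newly computed (key, value) pairs for one layer and
        return the full cached sequence that attention would read."""
        self.cache.setdefault(layer_name, []).extend(token_kvs)
        return self.cache[layer_name]


def decode_step(cache, tokens, first_pass):
    """One decoder step: project every token on the first pass,
    but only the most recent token on subsequent passes."""
    new_tokens = tokens if first_pass else tokens[-1:]
    # Stand-in for the real key/value projections of each new token.
    token_kvs = [(f"k({t})", f"v({t})") for t in new_tokens]
    return cache.append("decoder_layer_0", token_kvs)


cache = KVCache()
# First pass: the whole prompt is projected and cached.
seq = decode_step(cache, ["<sot>", "hello"], first_pass=True)
assert len(seq) == 2
# Later passes: only the last token is projected; the cache grows by one.
seq = decode_step(cache, ["<sot>", "hello", "world"], first_pass=False)
assert len(seq) == 3
```

This is why feeding only the last token is correct: with a causal mask, earlier positions can never attend to later ones, so their key/value vectors never change and can be reused verbatim.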
-
Edit: Whoops, ignore this issue — this is just how key/value caching is implemented.
whisper/whisper/decoding.py
Lines 141 to 143 in eff383b
This is probably much more efficient, although I'm surprised: I thought Whisper would be using the full power of autoregressive language models, but it doesn't.
So this must mean there is no control over where the timestamp tokens are emitted — they just get filtered out here:
whisper/whisper/transcribe.py
Line 195 in eff383b
Actually, you can just take the argmax of the timestamp logits to get the timestamps for each word: #3 (comment)
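The idea in that linked comment can be sketched as follows. This is a hypothetical illustration, not the repo's API: restrict the logits to the timestamp portion of the vocabulary, take the argmax, and convert the winning index to seconds. The constants match Whisper's multilingual tokenizer (timestamp tokens start at id 50364 and advance in 0.02 s steps), but double-check them against the tokenizer you actually load.

```python
# Hypothetical sketch: recover a timestamp from decoder logits by taking
# the argmax over the timestamp slice of the vocabulary.
# Constants are assumptions matching Whisper's multilingual tokenizer.

TIMESTAMP_BEGIN = 50364   # id of the first timestamp token, <|0.00|>
TIME_PRECISION = 0.02     # seconds represented by one timestamp step

def timestamp_from_logits(logits):
    """Return the most likely timestamp, in seconds.

    `logits` is a flat sequence covering the whole vocabulary; only the
    slice from TIMESTAMP_BEGIN onward (the timestamp tokens) is considered.
    """
    ts_logits = logits[TIMESTAMP_BEGIN:]
    best = max(range(len(ts_logits)), key=ts_logits.__getitem__)
    return best * TIME_PRECISION

# Toy example: a vocabulary of TIMESTAMP_BEGIN + 5 entries where the
# third timestamp token (index 2, i.e. <|0.04|>) scores highest.
logits = [0.0] * (TIMESTAMP_BEGIN + 5)
logits[TIMESTAMP_BEGIN + 2] = 9.0
print(timestamp_from_logits(logits))  # → 0.04
```

In practice you would do this per decoding step on the real logits tensor rather than on a Python list, but the slicing-and-argmax logic is the same.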