I want to build a visualization system for transformers, specifically self-attention. Ideally it would work for Vision Transformers as well as Language Models.
This is the most important component (imo) to visualize properly in the Transformer architecture. I can think of two levels of visualization for this.
In-depth visualization
This visualization will show (1) the Key, Query, and Value feed-forward layers, (2) the matrices returned by these layers that are then multiplied together, (3) the softmax operation that combines the Keys and Queries into attention scores, and (4) the linear combination of the Values into the final output.
It should take in either text (broken down into tokens), an image (broken into patches), or raw vectors (the output of a feed-forward layer). These are then passed into a self-attention layer, which places the tokens (of whatever type) along the left and top edges of a matrix visualization. The matrix itself is a 2D heatmap of the softmaxed (normalized) attention scores. Finally, the scores are combined with the Values to form the output of the self-attention module.
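Here is a minimal sketch of what such a layer could look like in PyTorch, assuming single-head attention. The class name `VisualizableSelfAttention` and the dict of intermediates are just illustrative, not an existing API; the point is to return every tensor the visualization needs, with the softmaxed `weights` matrix being the data for the 2D heatmap:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VisualizableSelfAttention(nn.Module):
    """Single-head self-attention that also returns the intermediates
    the visualization needs (Q, K, V, raw scores, softmaxed weights)."""

    def __init__(self, dim):
        super().__init__()
        # (1) the Key, Query, and Value feed-forward (linear) layers
        self.to_q = nn.Linear(dim, dim)
        self.to_k = nn.Linear(dim, dim)
        self.to_v = nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, x):
        # x: (batch, seq_len, dim) -- token, patch, or feature vectors
        q, k, v = self.to_q(x), self.to_k(x), self.to_v(x)   # (2) projected matrices
        scores = q @ k.transpose(-2, -1) * self.scale        # raw Query-Key compatibility
        weights = F.softmax(scores, dim=-1)                  # (3) normalized scores -> heatmap data
        out = weights @ v                                    # (4) linear combination of the Values
        return out, {"q": q, "k": k, "v": v, "scores": scores, "weights": weights}

# usage: intermediates["weights"][b] is the (seq_len x seq_len) matrix to render
tokens = torch.randn(1, 8, 64)
out, intermediates = VisualizableSelfAttention(64)(tokens)
print(intermediates["weights"].shape)  # torch.Size([1, 8, 8])
```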
ImageToPatches
I will need to make a layer that splits an image into patches. The patches are necessary to represent the image as a sequence.
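A minimal sketch of that step, using `torch.nn.functional.unfold` to tile the image into non-overlapping patches (the function name `image_to_patches` is hypothetical):

```python
import torch
import torch.nn.functional as F

def image_to_patches(img: torch.Tensor, patch_size: int) -> torch.Tensor:
    """Split (batch, channels, H, W) into a sequence of flattened patches:
    (batch, num_patches, channels * patch_size * patch_size)."""
    b, c, h, w = img.shape
    assert h % patch_size == 0 and w % patch_size == 0, "image must divide evenly into patches"
    # unfold extracts sliding blocks; with stride == kernel size they tile the image exactly
    patches = F.unfold(img, kernel_size=patch_size, stride=patch_size)
    # unfold returns (batch, c * p * p, num_patches); transpose to sequence-first layout
    return patches.transpose(1, 2)

img = torch.randn(1, 3, 224, 224)
seq = image_to_patches(img, patch_size=16)
print(seq.shape)  # torch.Size([1, 196, 768]) -- 14x14 patches, each 3*16*16 values
```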