Update chapters/en/unit7/video-processing/transformers-based-models.mdx
Co-authored-by: Woojun Jung <[email protected]>
mreraser and jungnerd authored Oct 8, 2024
1 parent 8faa705 commit 48f7543
Showing 1 changed file with 1 addition and 1 deletion.
@@ -90,7 +90,7 @@ The approach in Model 1 was somewhat inefficient, as it contextualized all patch
</div>
<small>Factorised encoder (Model 2). Taken from the <a href = "https://arxiv.org/abs/2103.15691">original paper</a>.</small>

-First, only spatial interactions are contextualized through Spatial Transformer Encoder (=ViT). Then, each frame is encoded to a single embedding, fed into the Temporal Transformer Encoder(=general transformer).
+First, only spatial interactions are contextualized through Spatial Transformer Encoder(=ViT). Then, each frame is encoded to a single embedding, fed into the Temporal Transformer Encoder(=general transformer).

**complexity : O(n_h^2 x n_w^2 + n_t^2)**
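The paragraph above describes the factorised encoder: spatial attention over the patches of each frame, then temporal attention over one embedding per frame, which is why the cost splits into the two terms in the complexity expression. Below is a minimal PyTorch sketch of that structure. It is not the chapter's or the ViViT paper's implementation; the class name `FactorisedEncoder`, the layer counts, the dimensions, and the use of mean pooling to get one embedding per frame are illustrative assumptions.

```python
# Minimal sketch of a factorised encoder (ViViT "Model 2"), assuming PyTorch.
# Hyperparameters and pooling choice are illustrative, not the paper's exact values.
import torch
import torch.nn as nn

class FactorisedEncoder(nn.Module):
    def __init__(self, dim=768, spatial_layers=12, temporal_layers=4, heads=12):
        super().__init__()
        spatial_layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        temporal_layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        # Spatial Transformer Encoder (ViT-style): attends over patches within a frame.
        self.spatial_encoder = nn.TransformerEncoder(spatial_layer, spatial_layers)
        # Temporal Transformer Encoder: attends over one embedding per frame.
        self.temporal_encoder = nn.TransformerEncoder(temporal_layer, temporal_layers)

    def forward(self, patch_tokens):
        # patch_tokens: (batch, n_t, n_h * n_w, dim) - patch embeddings per frame
        b, n_t, n_p, dim = patch_tokens.shape
        # 1) Spatial attention within each frame: ~O((n_h * n_w)^2) per frame.
        x = self.spatial_encoder(patch_tokens.reshape(b * n_t, n_p, dim))
        # 2) Collapse each frame to a single embedding (mean pooling here;
        #    a CLS token is another common choice).
        frame_embeddings = x.mean(dim=1).reshape(b, n_t, dim)
        # 3) Temporal attention across frames: ~O(n_t^2).
        return self.temporal_encoder(frame_embeddings)

# Example: 2 clips, 8 frames, 14x14 patches per frame, 768-dim tokens
video_tokens = torch.randn(2, 8, 14 * 14, 768)
out = FactorisedEncoder()(video_tokens)  # (2, 8, 768)
```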

