Very good work!
But after a brief reading of the VisionTransformerDiffPruning model code in vit_l2_3keep_senet.py, I was puzzled by the token pruning. Token pruning implies a reduction in the number of tokens (Figure 2 in the paper), yet I didn't find any reduction in the number of tokens in VisionTransformerDiffPruning. Instead, there is a mask distinguishing informative tokens from placeholders; a representative token is then obtained based on the mask and concatenated with x (x = torch.cat((x, represent_token), dim=1)). This is what confuses me: the number of tokens in the feature x is not reduced. Doesn't this hurt efficiency?
Maybe I misunderstood, and I hope you can give a detailed explanation.
I look forward to your reply.
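For context, here is a minimal PyTorch sketch (my own illustration, not code from this repo) of the distinction I mean: mask-based "soft" pruning leaves the token count unchanged, while "hard" pruning gathers the kept tokens and actually shrinks the sequence:

```python
import torch

# Soft pruning: uninformative tokens are suppressed via a mask,
# but the tensor shape (batch, tokens, dim) stays the same.
x = torch.randn(2, 197, 768)                    # (batch, tokens, dim)
mask = (torch.rand(2, 197) > 0.5).float()       # 1 = keep, 0 = placeholder
x_soft = x * mask.unsqueeze(-1)                 # still (2, 197, 768)

# Hard pruning: kept tokens are gathered, so the sequence length shrinks.
# (Shown for a single sample, since n_keep can differ per sample.)
keep_idx = mask[0].nonzero(as_tuple=True)[0]    # indices of kept tokens
x_hard = x[0:1].index_select(1, keep_idx)       # (1, n_keep, 768)

print(x_soft.shape, x_hard.shape)
```

In the soft case, every subsequent attention layer still processes all 197 tokens, so the FLOPs are unchanged unless the attention is also masked or the tokens are physically removed, which is the source of my question.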