
Why does RWKV6 have GPU memory occupancy similar to ViT methods on lower-resolution images? #33

Closed
thucz opened this issue Jul 19, 2024 · 1 comment

Comments

@thucz

thucz commented Jul 19, 2024

I tried to train the RWKV6 block on 256x256 images. However, I found almost no GPU memory reduction relative to ViT. So what is the advantage of Vision-RWKV6 in this setting?

@duanduanduanyuchen
Collaborator

Hi, thanks for your interest in VRWKV!
The GPU memory reduction of VRWKV/VRWKV6 at low resolution is minimal (VRWKV may even use more VRAM or run slower, because the standard attention mechanism has been heavily optimized over many versions of common DL frameworks). VRWKV shows its advantages in higher-resolution scenarios. In low-resolution cases, we show that VRWKV achieves performance comparable to ViT, so it has the potential to replace ViT in most current tasks while demonstrating its advantages at high resolutions.
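For intuition, here is a rough back-of-the-envelope sketch (not from the VRWKV code) of why the gap only opens up at high resolution: softmax attention materializes an N x N score matrix per head, so its activation memory grows quadratically in the number of patch tokens, while an RWKV-style linear block keeps roughly O(N * C) activations. Patch size 16, 12 heads, and 768 channels below are assumed ViT-B-style values, and the constants are illustrative, not measured.

```python
# Dominant activation-memory terms per block, in MiB (fp16 assumed).
# Attention: per-head N x N score matrix -> O(heads * N^2).
# RWKV-style linear block: token activations only -> O(N * C).

BYTES_FP16 = 2

def tokens(resolution: int, patch: int = 16) -> int:
    """Number of patch tokens for a square image."""
    return (resolution // patch) ** 2

def attn_scores_mib(n: int, heads: int = 12) -> float:
    """Memory for the N x N attention score matrices across heads."""
    return heads * n * n * BYTES_FP16 / 2**20

def linear_acts_mib(n: int, channels: int = 768) -> float:
    """Memory for O(N * C) token activations in a linear block."""
    return n * channels * BYTES_FP16 / 2**20

for res in (256, 512, 1024, 2048):
    n = tokens(res)
    print(f"{res:>4}px: N={n:>5} tokens | "
          f"attn scores ~{attn_scores_mib(n):9.1f} MiB | "
          f"linear acts ~{linear_acts_mib(n):7.1f} MiB")
```

At 256x256 (N = 256) the quadratic term is only about 1.5 MiB per block, which is dwarfed by weights and MLP activations, so the two architectures look nearly identical in VRAM; at 2048x2048 (N = 16384) it reaches several GiB per block, which is where the linear mechanism pays off.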
