
Why does RWKV6 have GPU memory occupancy similar to ViT methods on lower-resolution images? #33

Closed
thucz opened this issue Jul 19, 2024 · 1 comment

Comments

@thucz

thucz commented Jul 19, 2024

I tried to train the RWKV6 block on 256x256 images. However, I found almost no GPU memory reduction relative to ViT. So what is the advantage of Vision-RWKV6 in this setting?

@duanduanduanyuchen
Collaborator

Hi, thanks for your interest in VRWKV!
The GPU memory reduction of VRWKV/VRWKV6 at low resolution is minimal (VRWKV may even use more VRAM or run slower, because the standard attention mechanism has been heavily optimized over many versions of common DL frameworks). VRWKV shows its advantages in higher-resolution scenarios. In low-resolution cases, we show that VRWKV achieves performance comparable to ViT, so it has the potential to replace ViT in most current tasks while demonstrating its advantages at high resolutions.
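For intuition, here is a rough back-of-the-envelope sketch (not from the VRWKV code) of why the gap only opens up at high resolution: softmax attention materializes an N x N score matrix per head, so its activation memory grows quadratically in the number of patch tokens, while an RWKV-style linear block keeps roughly O(N * C) activations. Patch size 16, 12 heads, and 768 channels below are assumed ViT-B-style values, and the constants are illustrative, not measured.

```python
# Dominant activation-memory terms per block, in MiB (fp16 assumed).
# Attention: per-head N x N score matrix -> O(heads * N^2).
# RWKV-style linear block: token activations only -> O(N * C).

BYTES_FP16 = 2

def tokens(resolution: int, patch: int = 16) -> int:
    """Number of patch tokens for a square image."""
    return (resolution // patch) ** 2

def attn_scores_mib(n: int, heads: int = 12) -> float:
    """Memory for the N x N attention score matrices across heads."""
    return heads * n * n * BYTES_FP16 / 2**20

def linear_acts_mib(n: int, channels: int = 768) -> float:
    """Memory for O(N * C) token activations in a linear block."""
    return n * channels * BYTES_FP16 / 2**20

for res in (256, 512, 1024, 2048):
    n = tokens(res)
    print(f"{res:>4}px: N={n:>5} tokens | "
          f"attn scores ~{attn_scores_mib(n):9.1f} MiB | "
          f"linear acts ~{linear_acts_mib(n):7.1f} MiB")
```

At 256x256 (N = 256) the quadratic term is only about 1.5 MiB per block, which is dwarfed by weights and MLP activations, so the two architectures look nearly identical in VRAM; at 2048x2048 (N = 16384) it reaches several GiB per block, which is where the linear mechanism pays off.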
