
Is it possible to run inference with OVIS 1.6 on a single 4090 GPU? #22

Open
Raven625 opened this issue Sep 23, 2024 · 6 comments

@Raven625

Could anyone please advise whether it is possible to run inference with OVIS 1.6 on a single 4090 GPU? After loading, the model appears to consume approximately 20 GB of VRAM. I attempted inference, but the demo exited due to insufficient memory. Are there any solutions to this issue?

@leave-zym

Same question here. Is there a way to run inference with quantization?
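
For what it's worth, transformers' standard bitsandbytes integration might work here. A minimal 4-bit loading sketch, untested with Ovis's custom remote code, so treat it as an assumption rather than a confirmed path:

    import torch
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    # 4-bit NF4 quantization via bitsandbytes (assumption: Ovis's
    # trust_remote_code model is compatible with this integration)
    quant_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )
    model = AutoModelForCausalLM.from_pretrained(
        "AIDC-AI/Ovis1.6-Gemma2-9B",
        quantization_config=quant_config,
        multimodal_max_length=8192,
        trust_remote_code=True,
    )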

@thunder95

Same issue.

@FennelFetish

You can offload some layers of the LLM and/or the visual tokenizer to the CPU using a device map.
I use this function to generate the device map:

    def makeDeviceMap(llmGpuLayers: int, visGpuLayers: int) -> dict:
        # Ovis1.6-Gemma2-9B has 42 LLM decoder layers (0-41) and 27 vision
        # encoder layers (0-26). Clamp the requested GPU layer counts so the
        # first and last layers always stay on the GPU.
        llmGpuLayers = min(llmGpuLayers, 41)
        visGpuLayers = min(visGpuLayers, 26)

        deviceMap = dict()
        cpu = "cpu"
        cuda = 0

        # Embeddings, final norm, output head and visual token
        # embedding stay on the GPU.
        deviceMap["llm.model.embed_tokens"] = cuda
        deviceMap["llm.model.norm"] = cuda
        deviceMap["llm.lm_head.weight"] = cuda
        deviceMap["vte.weight"] = cuda

        # First llmGpuLayers decoder layers go to the GPU, the rest to the
        # CPU; the last layer (41) is pinned to the GPU.
        deviceMap["llm.model.layers.0"] = cuda
        for l in range(1, llmGpuLayers):
            deviceMap[f"llm.model.layers.{l}"] = cuda
        for l in range(llmGpuLayers, 41):
            deviceMap[f"llm.model.layers.{l}"] = cpu
        deviceMap["llm.model.layers.41"] = cuda

        # Same scheme for the visual tokenizer's vision encoder layers.
        deviceMap["visual_tokenizer"] = cuda
        deviceMap["visual_tokenizer.backbone.vision_model.encoder.layers.0"] = cuda
        for l in range(1, visGpuLayers):
            deviceMap[f"visual_tokenizer.backbone.vision_model.encoder.layers.{l}"] = cuda
        for l in range(visGpuLayers, 26):
            deviceMap[f"visual_tokenizer.backbone.vision_model.encoder.layers.{l}"] = cpu
        deviceMap["visual_tokenizer.backbone.vision_model.encoder.layers.26"] = cuda

        # print("makeDeviceMap:")
        # for k, v in deviceMap.items():
        #     print(f"{k} -> {v}")

        return deviceMap

It works on my 4090 with the arguments 41 and 6:

        # Requires: import torch; from transformers import AutoModelForCausalLM
        self.model = AutoModelForCausalLM.from_pretrained(
            modelPath,
            torch_dtype=torch.bfloat16,
            multimodal_max_length=8192,
            #attn_implementation='flash_attention_2',
            device_map=self.makeDeviceMap(41, 6),
            trust_remote_code=True
        )
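
Once the model is loaded this way, accelerate records the resolved placement, which makes it easy to verify what actually ended up on the CPU. A small sketch, assuming `model` refers to the model loaded above (here `self.model`):

    import torch

    # hf_device_map is populated by accelerate when a model is
    # loaded with a device_map
    for name, device in model.hf_device_map.items():
        print(f"{name} -> {device}")

    # Peak GPU memory after a test inference shows the remaining headroom
    print(f"peak VRAM: {torch.cuda.max_memory_allocated() / 1024**3:.1f} GiB")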

@nmandic78

nmandic78 commented Sep 28, 2024

I ran their HF demo snippet (https://huggingface.co/AIDC-AI/Ovis1.6-Gemma2-9B) on a 3090 without issues. Ubuntu, ~500 MB of VRAM in use before loading the model, ~21.7 GB during inference.

And it is very good!

@dustinjoe

dustinjoe commented Oct 6, 2024

I am trying to run it on a single 3090. The model itself seems very good, but I can only run inference once before hitting this error:
AttributeError: 'HybridCache' object has no attribute 'max_batch_size'

I also added some details here: #31

Thanks to FennelFetish's comment, the VRAM issue this thread is about has been solved for me.

@aceliuchanghong

Same here. Inference works only once, then I hit the error:
AttributeError: 'HybridCache' object has no attribute 'max_batch_size'
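
This usually points to a transformers version whose HybridCache stores the cache size under a different attribute name than the model's remote code expects. Two hedged options, neither confirmed in this thread: pin the transformers version the model card specifies, or alias the attribute before loading the model. A sketch of the latter, under the assumption that the installed HybridCache exposes `batch_size` instead:

    # Assumption: the installed transformers renamed the attribute to
    # batch_size; alias max_batch_size back for Ovis's remote code.
    from transformers.cache_utils import HybridCache

    if not hasattr(HybridCache, "max_batch_size"):
        HybridCache.max_batch_size = property(lambda self: self.batch_size)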
