Can we use Multi-LORA CPU #128

Open
AndrewNgo-ini opened this issue Dec 5, 2024 · 3 comments

@AndrewNgo-ini

Hi,

I'm currently following this doc: https://huggingface.co/docs/google-cloud/en/examples/gke-tgi-multi-lora-deployment

After hitting an error: "Can't scale up due to exceeded quota" and doing some research, I suspect that my free trial ($300) account cannot increase its GPU quota (even though I have upgraded my account out of the trial, I still have to contact sales).
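
For reference, a region's current GPU quota and usage can be inspected with gcloud; a minimal sketch, assuming the us-central1 region (adjust to whichever region the deployment doc targets):

```bash
# List the region's quotas and keep GPU metrics such as NVIDIA_T4_GPUS;
# each matching entry shows its limit, metric name, and current usage.
gcloud compute regions describe us-central1 | grep -B 1 -A 1 NVIDIA
```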

Is there any way I can run this with a CPU instead?

Thank you

@alvarobartt
Member

Hi @AndrewNgo-ini, I'm afraid you won't be able to run TGI on CPUs with the current container on Google Cloud, as that one is GPU-only (with TPU support coming too).

Anyway, you should be able to run TGI on Intel CPUs as per https://huggingface.co/docs/text-generation-inference/installation_intel#using-tgi-with-intel-cpus, even if it's not the recommended hardware. You should be able to re-use the container at ghcr.io/huggingface/text-generation-inference:2.4.1-intel-cpu; let me know if that works.
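
For reference, a minimal sketch of launching that container on a CPU machine with multi-LoRA enabled; the base model and adapter IDs below are placeholders (swap in the ones from the deployment doc), and it assumes TGI's `--lora-adapters` launcher flag (the `LORA_ADAPTERS` environment variable is the equivalent):

```bash
# Sketch: TGI on Intel CPUs with LoRA adapters loaded at startup.
# Model and adapter IDs are placeholders, not the doc's actual values.
model=google/gemma-2b
adapters=adapter-org/adapter-one,adapter-org/adapter-two

docker run --rm -p 8080:80 \
  --shm-size 1g \
  -v $PWD/data:/data \
  -e HF_TOKEN=$HF_TOKEN \
  ghcr.io/huggingface/text-generation-inference:2.4.1-intel-cpu \
  --model-id $model \
  --lora-adapters $adapters
```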

Hope that helps 🤗 Also happy to know more about your use-case / needs to see how we can support those better in the coming months!

@alvarobartt alvarobartt self-assigned this Dec 5, 2024
@AndrewNgo-ini
Author

Good day @alvarobartt
My use case is a demo for the GDG DevFest day, so latency wouldn't be a problem.

I'll try out the CPU approach, and I wouldn't mind opening a PR to address this problem. Any suggestions on how I can do this properly?

@alvarobartt
Member

Great! At the moment I'm afraid we're not shipping the Text Generation Inference (TGI) DLC for any hardware other than NVIDIA GPUs (with Google TPUs coming soon), so you could still demo it, just without the container being officially hosted on Google Cloud. Anyway, let me know if you run into any issues when running the demo, happy to help! 🤗
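
For the demo itself, once the container is up you can route a request to a given adapter via the `adapter_id` parameter of the `/generate` endpoint; a minimal sketch (the adapter ID is the same placeholder used in the launch command above):

```bash
# Generate with one specific LoRA adapter selected per request;
# omit adapter_id to query the base model instead.
curl http://localhost:8080/generate \
  -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "inputs": "What is Deep Learning?",
    "parameters": {
      "max_new_tokens": 64,
      "adapter_id": "adapter-org/adapter-one"
    }
  }'
```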
