
entrypoint.sh for TGI does not implement the requirements.txt installation process #138

Open
jk1333 opened this issue Jan 9, 2025 · 6 comments
jk1333 commented Jan 9, 2025

Hello team,

The PyTorch inference entrypoint provisions a model-provided requirements.txt, as in this sample:
https://github.com/huggingface/Google-Cloud-Containers/blob/main/containers/pytorch/inference/gpu/2.3.1/transformers/4.46.1/py311/entrypoint.sh
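(For context, the step in question looks roughly like the sketch below; it paraphrases the linked script rather than quoting it, and HF_MODEL_DIR stands in for wherever the model artifacts are staged.)

```bash
# Paraphrased sketch of the provisioning step in the PyTorch inference
# entrypoint.sh: if the staged model artifacts ship a requirements.txt,
# install it before starting the server. HF_MODEL_DIR is a placeholder.
if [[ -f "${HF_MODEL_DIR}/requirements.txt" ]]; then
    pip install --no-cache-dir -r "${HF_MODEL_DIR}/requirements.txt"
fi
```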

But the TGI entrypoint does not contain this step:
https://github.com/huggingface/Google-Cloud-Containers/blob/main/containers/tgi/gpu/3.0.1/entrypoint.sh

Is it missing, or is it handled internally by the text_generation_launcher process?

jk1333 (Author) commented Jan 9, 2025

Also, it seems that a custom handler (handler.py) is not recognized by the TGI container; the Dockerfile has no step for installing the HF Inference Toolkit.
Is this intended, and does TGI not support custom handlers?

alvarobartt (Member) commented

Hi @jk1333, apologies in advance for any misunderstanding!

Indeed, the custom handler only applies to the PyTorch Inference DLC, which is powered internally by the huggingface-inference-toolkit; meaning that TGI does not require you to define either a handler.py or a requirements.txt, as every dependency already comes installed within the DLC.

Are you asking because you ran into a problem when serving a model? Is there anything I can help you with? Please let me know 🤗
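(And if you ever do need extra Python packages on top of the TGI DLC, one hypothetical workaround, not an official mechanism, is to build a derived image; the base image reference below is a placeholder.)

```bash
# Hypothetical workaround (not an official mechanism): extend the TGI DLC
# with extra Python packages by building a derived image. The base image
# reference below is a placeholder for the actual TGI DLC tag.
cat > Dockerfile <<'EOF'
FROM <tgi-dlc-image>
RUN pip install --no-cache-dir <your-extra-package>
EOF
docker build -t tgi-dlc-custom .
```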

alvarobartt self-assigned this Jan 9, 2025
jk1333 (Author) commented Jan 10, 2025

Thanks for confirming, @alvarobartt :) What we want to achieve is collecting the /metrics values (the ones exposed for Prometheus) into Vertex AI's Cloud Monitoring. We also wondered whether we could customize model loading and request handling, but linking that to a custom handler seems quite complicated. Is there a way to bring the /metrics values to other monitoring APIs, e.g. by pushing them? It seems that with the current DLC architecture, once hosted on Vertex AI, these values cannot be used.
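(By "pushing" I mean something along the lines of the sketch below, run wherever /metrics is reachable; PROJECT_ID is a placeholder, tgi_queue_size is one example TGI gauge, and the parsing is deliberately naive.)

```bash
# Hedged sketch: scrape one TGI gauge and push it to Cloud Monitoring as a
# custom metric via the REST API. PROJECT_ID is a placeholder; a real
# sidecar would use a proper Prometheus client or an agent instead.
QUEUE_SIZE=$(curl -s http://localhost:8080/metrics | awk '/^tgi_queue_size /{print $2}')
NOW=$(date -u +%Y-%m-%dT%H:%M:%SZ)
curl -s -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://monitoring.googleapis.com/v3/projects/${PROJECT_ID}/timeSeries" \
  -d '{
    "timeSeries": [{
      "metric": {"type": "custom.googleapis.com/tgi/queue_size"},
      "resource": {"type": "global", "labels": {"project_id": "'"${PROJECT_ID}"'"}},
      "points": [{
        "interval": {"endTime": "'"${NOW}"'"},
        "value": {"doubleValue": '"${QUEUE_SIZE:-0}"'}
      }]
    }]
  }'
```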

alvarobartt (Member) commented

AFAIK that's one of the Vertex AI constraints: endpoints other than /predict are not accessible. So if you want to access the /metrics endpoint, I'd recommend using either Google Kubernetes Engine (GKE) or even Cloud Run (which recently added GPU support).
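(On GKE, for example, scraping it could look like the minimal sketch below; the Deployment name "tgi" and port 8080 are placeholders for your own setup.)

```bash
# Minimal sketch, assuming a Deployment named "tgi" whose container serves
# the TGI API on port 8080; both names are placeholders for your setup.
kubectl port-forward deploy/tgi 8080:8080 &
sleep 2  # give the port-forward a moment to establish

# TGI exposes Prometheus-format metrics on the same HTTP port as the API.
curl -s http://localhost:8080/metrics | grep '^tgi_' | head
```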

jk1333 (Author) commented Jan 17, 2025

Thanks for the discussion!
I found one small issue and want to share it.
I'm testing both the TGI and vLLM containers with the same model (Llama3.1-8b).
I found that the TGI container does not support streaming or /chat/completions on Vertex AI endpoints.
Maybe this comes down to configuration or the integration parts.
I'm leaving my test code here if you want to try it:
https://github.com/jk1333/mlops/blob/main/TGI_CustomHandler.ipynb

Thanks for your interest!

alvarobartt (Member) commented Jan 17, 2025

Hi @jk1333, thanks for sharing; I'll have a look at your code. For Vertex AI we expose the MESSAGES_API_ENABLED=true environment variable, which needs to be provided to the DLC when running on Vertex AI if you want to enable the /v1/chat/completions endpoint instead of the default /generate endpoint. Enabling the chat completions API lets you send messages formatted following the OpenAI OpenAPI specification. I'm afraid streaming won't work, though, as Vertex AI Endpoints don't support streaming AFAIK; so you can benefit from the OpenAI-compatible API via Vertex AI, but not from streaming (keep stream=false within the completions payload).
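(For reference, calling such a deployment could look like the hedged sketch below; PROJECT_ID, REGION, and ENDPOINT_ID are placeholders, and the exact instance schema is an assumption, so check the Vertex AI examples in this repository for the canonical payload.)

```bash
# Hedged sketch: calling a TGI DLC deployed on Vertex AI with
# MESSAGES_API_ENABLED=true. PROJECT_ID, REGION, and ENDPOINT_ID are
# placeholders, and the instance schema below is an assumption.
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://${REGION}-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/${REGION}/endpoints/${ENDPOINT_ID}:predict" \
  -d '{
    "instances": [
      {
        "messages": [{"role": "user", "content": "What is TGI?"}],
        "max_tokens": 128,
        "stream": false
      }
    ]
  }'
```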
