
entrypoint.sh for TGI does not implement the requirements.txt installation process #138

Open
jk1333 opened this issue Jan 9, 2025 · 6 comments
jk1333 commented Jan 9, 2025

Hello team,

The PyTorch inference entrypoint provisions a model-provided requirements.txt, as in this sample:
https://github.com/huggingface/Google-Cloud-Containers/blob/main/containers/pytorch/inference/gpu/2.3.1/transformers/4.46.1/py311/entrypoint.sh
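(For context, the step in question looks roughly like the sketch below; it paraphrases the linked script rather than quoting it, and HF_MODEL_DIR stands in for wherever the model artifacts are staged.)

```bash
# Paraphrased sketch of the provisioning step in the PyTorch inference
# entrypoint.sh: if the staged model artifacts ship a requirements.txt,
# install it before starting the server. HF_MODEL_DIR is a placeholder.
if [[ -f "${HF_MODEL_DIR}/requirements.txt" ]]; then
    pip install --no-cache-dir -r "${HF_MODEL_DIR}/requirements.txt"
fi
```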

But the TGI entrypoint does not contain this step:
https://github.com/huggingface/Google-Cloud-Containers/blob/main/containers/tgi/gpu/3.0.1/entrypoint.sh

Is it missing, or is it handled internally by the text_generation_launcher process?

jk1333 (Author) commented Jan 9, 2025

Also, it seems that a custom handler (handler.py) is not recognized by the TGI container; the Dockerfile has no step for installing the HF Inference Toolkit.
Is this intended, and does TGI not support custom handlers?

alvarobartt (Member) commented

Hi @jk1333, apologies in advance for any misunderstanding!

Indeed, the custom handler only applies to the PyTorch Inference DLC, which is powered internally by the huggingface-inference-toolkit; meaning that TGI does not require you to define either a handler.py or a requirements.txt, as every dependency already comes installed within the DLC.

Are you asking because you ran into a problem when serving a model? Is there anything I can help you with? Please let me know 🤗
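(And if you ever do need extra Python packages on top of the TGI DLC, one hypothetical workaround, not an official mechanism, is to build a derived image; the base image reference below is a placeholder.)

```bash
# Hypothetical workaround (not an official mechanism): extend the TGI DLC
# with extra Python packages by building a derived image. The base image
# reference below is a placeholder for the actual TGI DLC tag.
cat > Dockerfile <<'EOF'
FROM <tgi-dlc-image>
RUN pip install --no-cache-dir <your-extra-package>
EOF
docker build -t tgi-dlc-custom .
```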

alvarobartt self-assigned this Jan 9, 2025
jk1333 (Author) commented Jan 10, 2025

Thanks for confirming, @alvarobartt :) What we want to achieve is collecting the /metrics values (the ones exposed for Prometheus) into Vertex AI's Cloud Monitoring. We also wondered whether we could customize model loading and request handling, but linking that to a custom handler seems quite complicated. Is there a way to bring the /metrics values to other monitoring APIs, e.g. by pushing them? It seems that with the current DLC architecture, once hosted on Vertex AI, these values cannot be used.
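(By "pushing" I mean something along the lines of the sketch below, run wherever /metrics is reachable; PROJECT_ID is a placeholder, tgi_queue_size is one example TGI gauge, and the parsing is deliberately naive.)

```bash
# Hedged sketch: scrape one TGI gauge and push it to Cloud Monitoring as a
# custom metric via the REST API. PROJECT_ID is a placeholder; a real
# sidecar would use a proper Prometheus client or an agent instead.
QUEUE_SIZE=$(curl -s http://localhost:8080/metrics | awk '/^tgi_queue_size /{print $2}')
NOW=$(date -u +%Y-%m-%dT%H:%M:%SZ)
curl -s -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://monitoring.googleapis.com/v3/projects/${PROJECT_ID}/timeSeries" \
  -d '{
    "timeSeries": [{
      "metric": {"type": "custom.googleapis.com/tgi/queue_size"},
      "resource": {"type": "global", "labels": {"project_id": "'"${PROJECT_ID}"'"}},
      "points": [{
        "interval": {"endTime": "'"${NOW}"'"},
        "value": {"doubleValue": '"${QUEUE_SIZE:-0}"'}
      }]
    }]
  }'
```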

alvarobartt (Member) commented

AFAIK that's one of the Vertex AI constraints: endpoints other than /predict are not accessible. So if you want to access the /metrics endpoint, I'd recommend using either Google Kubernetes Engine (GKE) or even Cloud Run (which recently added GPU support).
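(On GKE, for example, scraping it could look like the minimal sketch below; the Deployment name "tgi" and port 8080 are placeholders for your own setup.)

```bash
# Minimal sketch, assuming a Deployment named "tgi" whose container serves
# the TGI API on port 8080; both names are placeholders for your setup.
kubectl port-forward deploy/tgi 8080:8080 &
sleep 2  # give the port-forward a moment to establish

# TGI exposes Prometheus-format metrics on the same HTTP port as the API.
curl -s http://localhost:8080/metrics | grep '^tgi_' | head
```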

jk1333 (Author) commented Jan 17, 2025

Thanks for the discussion!
I found one small issue and want to share it.
I'm testing both the TGI and vLLM containers with the same model (Llama3.1-8b).
I found that the TGI container does not support streaming or /chat/completions on Vertex AI endpoints.
Maybe this comes down to configuration or the integration parts.
I'm leaving my test code here if you want to try it:
https://github.com/jk1333/mlops/blob/main/TGI_CustomHandler.ipynb

Thanks for your interest!

alvarobartt (Member) commented Jan 17, 2025

Hi @jk1333, thanks for sharing; I'll have a look at your code. For Vertex AI we expose the MESSAGES_API_ENABLED=true environment variable, which needs to be provided to the DLC when running on Vertex AI if you want to enable the /v1/chat/completions endpoint instead of the default /generate endpoint. Enabling the chat completions API lets you send messages formatted following the OpenAI OpenAPI specification. I'm afraid streaming won't work, though, as Vertex AI Endpoints don't support streaming AFAIK; so you can benefit from the OpenAI-compatible API via Vertex AI, but not from streaming (keep stream=false within the completions payload).
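(For reference, calling such a deployment could look like the hedged sketch below; PROJECT_ID, REGION, and ENDPOINT_ID are placeholders, and the exact instance schema is an assumption, so check the Vertex AI examples in this repository for the canonical payload.)

```bash
# Hedged sketch: calling a TGI DLC deployed on Vertex AI with
# MESSAGES_API_ENABLED=true. PROJECT_ID, REGION, and ENDPOINT_ID are
# placeholders, and the instance schema below is an assumption.
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://${REGION}-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/${REGION}/endpoints/${ENDPOINT_ID}:predict" \
  -d '{
    "instances": [
      {
        "messages": [{"role": "user", "content": "What is TGI?"}],
        "max_tokens": 128,
        "stream": false
      }
    ]
  }'
```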
