refactor: TGI Generator refactoring #7412
Conversation
Pull Request Test Coverage Report for Build 8528725379 (Details)
💛 - Coveralls
logger = logging.getLogger(__name__)
# TODO: remove the default model in Haystack 2.3.0, as explained in the deprecation warning
Let's instead open an issue for this (and the other deprecated changes) and add it to the 2.3.0 milestone (after creating it). We can add a link to this issue here.
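For reference, a minimal sketch of how such a default-model deprecation warning might be emitted; the helper name and message wording are assumptions, not taken from this diff:

```python
import warnings
from typing import Optional


def _warn_on_default_model(model: Optional[str], url: Optional[str]) -> None:
    # Hypothetical helper: warn when the user relies on the default-model fallback,
    # so that the fallback can be removed in Haystack 2.3.0.
    if model is None and url is None:
        warnings.warn(
            "Neither `model` nor `url` was provided; falling back to the default model "
            "is deprecated and will raise an error starting from Haystack 2.3.0.",
            DeprecationWarning,
        )
```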
Key Features and Compatibility:
- Primary Compatibility: designed to work seamlessly with any non-based model deployed using the TGI
- Primary Compatibility: designed to work seamlessly with models deployed using the TGI
  framework. For more information on TGI, visit [text-generation-inference](https://github.com/huggingface/text-generation-inference)

- Hugging Face Inference Endpoints: Supports inference of TGI chat LLMs deployed on Hugging Face
- Hugging Face Inference Endpoints: Supports inference of LLMs deployed on Hugging Face
  inference endpoints. For more details, refer to [inference-endpoints](https://huggingface.co/inference-endpoints)

- Inference API Support: supports inference of TGI LLMs hosted on the rate-limited Inference
- Inference API Support: supports inference of LLMs hosted on the rate-limited Inference
  API tier. Learn more about the Inference API at [inference-api](https://huggingface.co/inference-api).
  Discover available chat models using the following command: `wget -qO- https://api-inference.huggingface.co/framework/text-generation-inference | grep chat`
  and simply use the model ID as the model parameter for this component. You'll also need to provide a valid
  Hugging Face API token as the token parameter.
  In this case, you need to provide a valid Hugging Face token.

- Custom TGI Endpoints: supports inference of TGI chat LLMs deployed on custom TGI endpoints. Anyone can
  deploy their own TGI endpoint using the TGI framework. For more details, refer to [inference-endpoints](https://huggingface.co/inference-endpoints)

Input and Output Format:
- String Format: This component uses the str format for structuring both input and output,
Can we remove this "market'y"-sounding docstring and merge the 3 links into the sentence above, similar to:
This component can be used with the HuggingFace TGI framework, Inference Endpoints and Inference API
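For illustration, a minimal usage sketch of the generator described by this docstring, assuming Haystack 2.x conventions; the import path, the `Secret` helper, the example model ID, and the `HF_API_TOKEN` environment variable name are assumptions, not part of this diff:

```python
from haystack.components.generators import HuggingFaceTGIGenerator
from haystack.utils import Secret

# Sketch: query a model hosted on the HF Inference API by passing a model ID
# (e.g. one discovered with the wget command above) and a Hugging Face token.
generator = HuggingFaceTGIGenerator(
    model="mistralai/Mistral-7B-v0.1",
    token=Secret.from_env_var("HF_API_TOKEN"),
)
result = generator.run(prompt="What is the capital of France?")
print(result["replies"][0])
```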
- The HuggingFaceTGIGenerator component requires specifying either a `url` or a `model` parameter.
  Starting from Haystack 2.3.0, the component will raise an error if neither parameter is provided.
- The `warm_up` method of the HuggingFaceTGIGenerator component is deprecated and will be removed in the 2.3.0 release.
We should also mention the removal of the keys in the usage dict.
"total_tokens": prompt_token_count + chunks[-1].meta.get("generated_tokens", 0), | ||
}, | ||
"model": self._client.model, | ||
"usage": {"completion_tokens": chunks[-1].meta.get("generated_tokens", 0)}, |
Let's keep the keys with values of zero until 2.3.0.
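A possible shape for the metadata if the zero-valued keys are kept until 2.3.0, sketched as a small helper; the helper name and the choice to report the total as the completion count are assumptions, not the final code:

```python
def _build_usage_meta(model: str, generated_tokens: int) -> dict:
    # Hypothetical sketch: keep "prompt_tokens" and "total_tokens" until 2.3.0.
    # The prompt token count is 0 because the tokenizer was removed in this refactoring.
    return {
        "model": model,
        "usage": {
            "completion_tokens": generated_tokens,
            "prompt_tokens": 0,
            "total_tokens": generated_tokens,
        },
    }
```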
"prompt_tokens": prompt_token_count, | ||
"total_tokens": prompt_token_count + len(tgr.details.tokens), | ||
}, | ||
"usage": {"completion_tokens": len(tgr.details.tokens)}, |
Same as above.
Superseded by #7464
Related Issues
Proposed Changes:
- clearer: the user must specify either `model` or `url`. Previously, we asked to specify the model even if it was not used, and this required accessing the web (see Using local models for TGI #7087). The change is not breaking because temporary deprecation and fallback mechanisms have been implemented. (See the sketch after this list.)
- the component no longer depends on `transformers`, but only on the much lighter `huggingface_hub`
- remove the tokenizer: it depended on `transformers`, required network access, and was only used to count prompt tokens; when using this component, users never pay per token used
- if the user specifies a `url`, this component can run entirely on a local network and does not require access to the HF Hub (requested in tokenizer kwarg for HuggingFaceTGIGenerator and HuggingFaceTGIChatGenerator #7229)
- validation is only performed in `__init__` (not in `warm_up`)
- removed the too restrictive check described in HF TGI generators are restricting too much the available models #7384
- the applied changes seem compatible with a similar refactoring of the ChatGenerator
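As an illustration of the new contract, a hedged sketch of constructing the component with only a `url` pointing at a self-hosted TGI container; the import path is assumed from Haystack 2.x conventions, and the local URL is a placeholder:

```python
from haystack.components.generators import HuggingFaceTGIGenerator

# Sketch: a TGI container running on the local network; no HF Hub access
# and no token are required in this setup.
generator = HuggingFaceTGIGenerator(url="http://localhost:8080")
result = generator.run(prompt="Summarize the benefits of local inference.")
print(result["replies"][0])

# Passing neither `model` nor `url` currently only emits a deprecation warning;
# starting from Haystack 2.3.0 it will raise an error.
```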
How did you test it?
CI, adapted tests
extensive manual tests: with HF Inference API, local TGI container and paid HF Inference Endpoint
Checklist
The PR title uses one of the conventional commit prefixes: `fix:`, `feat:`, `build:`, `chore:`, `ci:`, `docs:`, `style:`, `refactor:`, `perf:`, `test:`.