
refactor: TGI Generator refactoring #7412

Closed
wants to merge 10 commits into from

Conversation

@anakin87 (Member) commented Mar 22, 2024

Related Issues

Proposed Changes:

  • Clearer API: the user must specify either `model` or `url`.
    Previously, the model had to be specified even when it was not used, and this required web access (see Using local models for TGI #7087).
    The change is not breaking, because temporary deprecation and fallback mechanisms have been implemented.

  • The component no longer depends on `transformers`, only on the much lighter `huggingface_hub`.

  • Removed the tokenizer: it depended on `transformers`, required network access, and was only used to count prompt tokens; users of this component never pay per token, so the count is unnecessary.

  • If the user specifies a `url`, the component can run entirely on a local network and does not require access to the HF Hub (requested in tokenizer kwarg for HuggingFaceTGIGenerator and HuggingFaceTGIChatGenerator #7229).

  • Validation is performed only in `__init__` (not in `warm_up`).

  • Removed the overly restrictive model check described in HF TGI generators are restricting too much the available models #7384.

  • The applied changes seem compatible with a similar refactoring of the ChatGenerator.
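The model-or-url resolution described above could be sketched as follows. This is illustrative only: the helper name and the fallback default are assumptions, not the actual Haystack implementation.

```python
import warnings

# Placeholder default used only by the temporary deprecation fallback;
# the real default model is an assumption here.
DEFAULT_MODEL = "mistralai/Mistral-7B-v0.1"


def resolve_model_or_url(model=None, url=None):
    """Return a (model, url) pair, requiring the user to set one of the two.

    For backward compatibility, fall back to a default model with a
    deprecation warning instead of raising immediately.
    """
    if url:
        # A url points at a local/custom TGI endpoint: no Hub access is
        # needed, and any model value is purely informational.
        return model, url
    if model:
        return model, None
    warnings.warn(
        "Provide either `model` or `url`. Falling back to the default model "
        f"'{DEFAULT_MODEL}'; this fallback will be removed in a future release.",
        DeprecationWarning,
    )
    return DEFAULT_MODEL, None
```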

How did you test it?

CI with adapted tests;
extensive manual tests with the HF Inference API, a local TGI container, and a paid HF Inference Endpoint.

Checklist

@github-actions github-actions bot added type:documentation Improvements on the docs topic:tests 2.x Related to Haystack v2.0 and removed type:documentation Improvements on the docs labels Mar 22, 2024
@coveralls (Collaborator) commented Mar 22, 2024

Pull Request Test Coverage Report for Build 8528725379

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • 4 unchanged lines in 1 file lost coverage.
  • Overall coverage increased (+0.01%) to 89.492%

Files with Coverage Reduction:
  components/generators/hugging_face_tgi.py — 4 new missed lines (95.45% file coverage)

Totals:
  Change from base Build 8522427635: +0.01%
  Covered Lines: 5570
  Relevant Lines: 6224

💛 - Coveralls

@anakin87 (Member, Author) commented Mar 27, 2024

TODO:
- integrate the recent refactoring (#7425)
- remove too restrictive model checking (no longer needed)
- warm_up: make it pass with a deprecation warning
- reorganize the conditions in __init__
- if neither `model` nor `url` is provided: fall back to a default model but raise a deprecation warning
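The deprecated `warm_up` behavior from the TODO list could look like the sketch below. The class name and message are assumptions, not the actual Haystack code: `warm_up` is kept as a no-op that only warns, so existing pipelines that call it keep working until its removal.

```python
import warnings


class TGIGeneratorSketch:
    """Illustrative sketch: validation moves to __init__, warm_up becomes a no-op."""

    def __init__(self, model: str):
        # Validation now happens at construction time, not in warm_up.
        if not model:
            raise ValueError("`model` must be provided")
        self.model = model

    def warm_up(self):
        # Kept only for backward compatibility; emits a deprecation warning
        # instead of failing, as described in the TODO above.
        warnings.warn(
            "warm_up() is deprecated and no longer needed; "
            "validation is performed in __init__.",
            DeprecationWarning,
        )
```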

@github-actions github-actions bot added the type:documentation Improvements on the docs label Apr 2, 2024
@anakin87 anakin87 marked this pull request as ready for review April 2, 2024 16:52
@anakin87 anakin87 requested review from a team as code owners April 2, 2024 16:52
@anakin87 anakin87 requested review from dfokina, shadeMe and masci and removed request for a team April 2, 2024 16:52
@anakin87 anakin87 changed the title TGI Generator refactoring refactor: TGI Generator refactoring Apr 2, 2024


logger = logging.getLogger(__name__)


# TODO: remove the default model in Haystack 2.3.0, as explained in the deprecation warning
Reviewer (Contributor) commented:

Let's instead open an issue for this (and the other deprecated changes) and add it to the 2.3.0 milestone (after creating it). We can add a link to this issue here.

Comment on lines 36 to 49
  Key Features and Compatibility:
- - Primary Compatibility: designed to work seamlessly with any non-based model deployed using the TGI
+ - Primary Compatibility: designed to work seamlessly with models deployed using the TGI
  framework. For more information on TGI, visit [text-generation-inference](https://github.com/huggingface/text-generation-inference)

- - Hugging Face Inference Endpoints: Supports inference of TGI chat LLMs deployed on Hugging Face
+ - Hugging Face Inference Endpoints: Supports inference of LLMs deployed on Hugging Face
  inference endpoints. For more details, refer to [inference-endpoints](https://huggingface.co/inference-endpoints)

- - Inference API Support: supports inference of TGI LLMs hosted on the rate-limited Inference
+ - Inference API Support: supports inference of LLMs hosted on the rate-limited Inference
  API tier. Learn more about the Inference API at [inference-api](https://huggingface.co/inference-api).
  Discover available chat models using the following command: `wget -qO- https://api-inference.huggingface.co/framework/text-generation-inference | grep chat`
- and simply use the model ID as the model parameter for this component. You'll also need to provide a valid
- Hugging Face API token as the token parameter.
+ In this case, you need to provide a valid Hugging Face token.

  - Custom TGI Endpoints: supports inference of TGI chat LLMs deployed on custom TGI endpoints. Anyone can
  deploy their own TGI endpoint using the TGI framework. For more details, refer to [inference-endpoints](https://huggingface.co/inference-endpoints)

  Input and Output Format:
  - String Format: This component uses the str format for structuring both input and output,
Reviewer (Contributor) commented:
Can we remove this marketing-sounding docstring and merge the 3 links into the sentence above, similar to:

This component can be used with the HuggingFace TGI framework, Inference Endpoints and Inference API

- |
- The HuggingFaceTGIGenerator component requires specifying either a `url` or `model` parameter.
Starting from Haystack 2.3.0, the component will raise an error if neither parameter is provided.
- The `warm_up` method of the HuggingFaceTGIGenerator component is deprecated and will be removed in 2.3.0 release.
Reviewer (Contributor) commented:
We should also mention the removal of the keys in the usage dict.

"total_tokens": prompt_token_count + chunks[-1].meta.get("generated_tokens", 0),
},
"model": self._client.model,
"usage": {"completion_tokens": chunks[-1].meta.get("generated_tokens", 0)},
Reviewer (Contributor) commented:

Let's keep the keys with values of zero until 2.3.0.

"prompt_tokens": prompt_token_count,
"total_tokens": prompt_token_count + len(tgr.details.tokens),
},
"usage": {"completion_tokens": len(tgr.details.tokens)},
Reviewer (Contributor) commented:

Same as above.
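The reviewer's suggestion to keep the usage keys with zero values until 2.3.0 could look like the sketch below. The function name is hypothetical: with the tokenizer removed, prompt tokens can no longer be counted, so the old keys are retained with zero/derived values instead of being dropped, avoiding a silent breaking change for downstream code.

```python
def build_usage(completion_tokens: int) -> dict:
    """Build a usage dict that keeps the pre-refactoring keys.

    prompt_tokens is zero because the tokenizer (and with it prompt token
    counting) was removed; total_tokens therefore equals completion_tokens.
    """
    return {
        "completion_tokens": completion_tokens,
        "prompt_tokens": 0,  # no tokenizer available to count these
        "total_tokens": completion_tokens,  # the prompt side contributes 0
    }
```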

@masci masci removed their request for review April 3, 2024 13:47
@anakin87 (Member, Author) commented Apr 4, 2024

Superseded by #7464

@anakin87 anakin87 closed this Apr 4, 2024
@anakin87 anakin87 deleted the tgi-refactor branch October 14, 2024 15:33
Labels
2.x Related to Haystack v2.0 topic:tests type:documentation Improvements on the docs
3 participants