refactor: TGI Generator refactoring #7412
Conversation
Pull Request Test Coverage Report for Build 8528725379 (Details)
💛 - Coveralls
logger = logging.getLogger(__name__)
# TODO: remove the default model in Haystack 2.3.0, as explained in the deprecation warning
Let's instead open an issue for this (and the other deprecated changes) and add it to the 2.3.0 milestone (after creating it). We can add a link to this issue here.
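For reference, a minimal sketch of how such a default-model deprecation warning might be emitted; the helper name and message wording are assumptions, not taken from this diff:

```python
import warnings
from typing import Optional


def _warn_on_default_model(model: Optional[str], url: Optional[str]) -> None:
    # Hypothetical helper: warn when the user relies on the default-model fallback,
    # so that the fallback can be removed in Haystack 2.3.0.
    if model is None and url is None:
        warnings.warn(
            "Neither `model` nor `url` was provided; falling back to the default model "
            "is deprecated and will raise an error starting from Haystack 2.3.0.",
            DeprecationWarning,
        )
```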
Key Features and Compatibility:
- Primary Compatibility: designed to work seamlessly with any non-based model deployed using the TGI
- Primary Compatibility: designed to work seamlessly with models deployed using the TGI
  framework. For more information on TGI, visit [text-generation-inference](https://github.com/huggingface/text-generation-inference)

- Hugging Face Inference Endpoints: Supports inference of TGI chat LLMs deployed on Hugging Face
- Hugging Face Inference Endpoints: Supports inference of LLMs deployed on Hugging Face
  inference endpoints. For more details, refer to [inference-endpoints](https://huggingface.co/inference-endpoints)

- Inference API Support: supports inference of TGI LLMs hosted on the rate-limited Inference
- Inference API Support: supports inference of LLMs hosted on the rate-limited Inference
  API tier. Learn more about the Inference API at [inference-api](https://huggingface.co/inference-api).
  Discover available chat models using the following command: `wget -qO- https://api-inference.huggingface.co/framework/text-generation-inference | grep chat`
  and simply use the model ID as the model parameter for this component. You'll also need to provide a valid
  Hugging Face API token as the token parameter.
  In this case, you need to provide a valid Hugging Face token.

- Custom TGI Endpoints: supports inference of TGI chat LLMs deployed on custom TGI endpoints. Anyone can
  deploy their own TGI endpoint using the TGI framework. For more details, refer to [inference-endpoints](https://huggingface.co/inference-endpoints)

Input and Output Format:
- String Format: This component uses the str format for structuring both input and output,
Can we remove this "market'y"-sounding docstring and merge the 3 links into the sentence above, similar to:
This component can be used with the HuggingFace TGI framework, Inference Endpoints and Inference API
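For illustration, a minimal usage sketch of the generator described by this docstring, assuming Haystack 2.x conventions; the import path, the `Secret` helper, the example model ID, and the `HF_API_TOKEN` environment variable name are assumptions, not part of this diff:

```python
from haystack.components.generators import HuggingFaceTGIGenerator
from haystack.utils import Secret

# Sketch: query a model hosted on the HF Inference API by passing a model ID
# (e.g. one discovered with the wget command above) and a Hugging Face token.
generator = HuggingFaceTGIGenerator(
    model="mistralai/Mistral-7B-v0.1",
    token=Secret.from_env_var("HF_API_TOKEN"),
)
result = generator.run(prompt="What is the capital of France?")
print(result["replies"][0])
```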
- The HuggingFaceTGIGenerator component requires specifying either a `url` or a `model` parameter.
  Starting from Haystack 2.3.0, the component will raise an error if neither parameter is provided.
- The `warm_up` method of the HuggingFaceTGIGenerator component is deprecated and will be removed in the 2.3.0 release.
We should also mention the removal of the keys in the usage dict.
"total_tokens": prompt_token_count + chunks[-1].meta.get("generated_tokens", 0), | ||
}, | ||
"model": self._client.model, | ||
"usage": {"completion_tokens": chunks[-1].meta.get("generated_tokens", 0)}, |
Let's keep the keys with values of zero until 2.3.0.
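A possible shape for the metadata if the zero-valued keys are kept until 2.3.0, sketched as a small helper; the helper name and the choice to report the total as the completion count are assumptions, not the final code:

```python
def _build_usage_meta(model: str, generated_tokens: int) -> dict:
    # Hypothetical sketch: keep "prompt_tokens" and "total_tokens" until 2.3.0.
    # The prompt token count is 0 because the tokenizer was removed in this refactoring.
    return {
        "model": model,
        "usage": {
            "completion_tokens": generated_tokens,
            "prompt_tokens": 0,
            "total_tokens": generated_tokens,
        },
    }
```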
"prompt_tokens": prompt_token_count, | ||
"total_tokens": prompt_token_count + len(tgr.details.tokens), | ||
}, | ||
"usage": {"completion_tokens": len(tgr.details.tokens)}, |
Same as above.
Superseded by #7464
Related Issues
Proposed Changes:
- clearer: the user must specify either `model` or `url`. Previously, we asked to specify the model even if it was not used, and this required accessing the web (see Using local models for TGI #7087). The change is not breaking because temporary deprecation and fallback mechanisms have been implemented. (See the sketch after this list.)
- the component no longer depends on `transformers`, but only on the much lighter `huggingface_hub`
- remove the tokenizer: it depended on `transformers`, required network access, and was only used to count prompt tokens; when using this component, users never pay per token used
- if the user specifies a `url`, this component can run entirely on a local network and does not require access to the HF Hub (requested in tokenizer kwarg for HuggingFaceTGIGenerator and HuggingFaceTGIChatGenerator #7229)
- validation is only performed in `__init__` (not in `warm_up`)
- removed the too restrictive check described in HF TGI generators are restricting too much the available models #7384
- the applied changes seem compatible with a similar refactoring of the ChatGenerator
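As an illustration of the new contract, a hedged sketch of constructing the component with only a `url` pointing at a self-hosted TGI container; the import path is assumed from Haystack 2.x conventions, and the local URL is a placeholder:

```python
from haystack.components.generators import HuggingFaceTGIGenerator

# Sketch: a TGI container running on the local network; no HF Hub access
# and no token are required in this setup.
generator = HuggingFaceTGIGenerator(url="http://localhost:8080")
result = generator.run(prompt="Summarize the benefits of local inference.")
print(result["replies"][0])

# Passing neither `model` nor `url` currently only emits a deprecation warning;
# starting from Haystack 2.3.0 it will raise an error.
```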
How did you test it?
CI, adapted tests
extensive manual tests: with HF Inference API, local TGI container and paid HF Inference Endpoint
Checklist
The PR title uses one of the conventional commit prefixes: `fix:`, `feat:`, `build:`, `chore:`, `ci:`, `docs:`, `style:`, `refactor:`, `perf:`, `test:`.