Releases: huggingface/huggingface_hub
[v0.23.3] Patch: fix details not returned in `InferenceClient.text_generation`
Release 0.23.0 introduced a breaking change in `InferenceClient.text_generation`. When `details=True` is passed, the `details` attribute in the output is always None. The patch release fixes this. See #2316 for more details.
Full Changelog: v0.23.2...v0.23.3
[v0.23.2] Patch: Support `max_shard_size` as string in `split_state_dict_into_shards_factory`
`split_state_dict_into_shards_factory` now accepts string values as `max_shard_size` (ex: `"5MB"`), in addition to integer values. Related PR: #2286.
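To illustrate what accepting a string such as `"5MB"` involves, here is a minimal sketch of a size parser. The helper name `parse_size` and the decimal multipliers are assumptions made for this example; this is not the library's own implementation.

```python
import re

# Decimal (SI) multipliers. This choice is an assumption of the sketch.
_UNITS = {"KB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12}


def parse_size(size):
    """Return a size in bytes from an int or a string such as "5MB"."""
    if isinstance(size, int):
        # Integers are already byte counts and pass through unchanged.
        return size
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([KMGT]B)\s*", size.upper())
    if match is None:
        raise ValueError(f"Unrecognized size: {size!r}")
    number, unit = match.groups()
    return int(float(number) * _UNITS[unit])


print(parse_size("5MB"))  # 5000000
print(parse_size(1024))   # 1024
```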
Full Changelog: v0.23.1...v0.23.2
v0.23.1 hot-fix: optimize HTTP calls in `HfFileSystem`
See #2271 for more details.
Full Changelog: v0.23.0...v0.23.1
v0.23.0: LLMs with tools, seamless downloads, and much more!
📁 Seamless download to local dir
The 0.23.0 release comes with a big revamp of the download process, especially when it comes to downloading to a local directory. Previously, the process still involved the cache directory and symlinks, which led to misconceptions and a suboptimal user experience. The new workflow involves a `.cache/huggingface/` folder, similar to the `.git/` one, that keeps track of the progress of a download. The main features are:
- no symlinks
- no local copy
- don't re-download when not necessary
- same behavior on both Unix and Windows
- unrelated to cache-system
Example to download q4 GGUF file for microsoft/Phi-3-mini-4k-instruct-gguf:
# Download the q4 GGUF file to a local directory
huggingface-cli download microsoft/Phi-3-mini-4k-instruct-gguf Phi-3-mini-4k-instruct-q4.gguf --local-dir=data/phi3
With this addition, interrupted downloads are now resumable! This applies to downloads in both local and cache directories, which should greatly improve UX for users with slow/unreliable connections. In this regard, the `resume_download` parameter is now deprecated (not relevant anymore).
- Revamp download to local dir process by @Wauplin in #2223
- Rename `.huggingface/` folder to `.cache/huggingface/` by @Wauplin in #2262
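The same download can be done from Python. The sketch below wraps `hf_hub_download` with the repo, filename and target folder taken from the CLI example above; the wrapper name `download_q4_gguf` is made up for illustration, and the import is done lazily so that defining the function needs neither the package nor network access.

```python
def download_q4_gguf(local_dir="data/phi3"):
    """Download the q4 GGUF file into `local_dir` and return its local path."""
    # Imported here so that the function can be defined without the package installed.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(
        repo_id="microsoft/Phi-3-mini-4k-instruct-gguf",
        filename="Phi-3-mini-4k-instruct-q4.gguf",
        local_dir=local_dir,
    )
```

Calling `download_q4_gguf()` a second time after an interruption should pick up where the previous attempt stopped, as described above.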
💡 Grammar and Tools in InferenceClient
It is now possible to provide a list of tools when chatting with a model using the `InferenceClient`! This major improvement has been made possible thanks to TGI, which handles them natively.
>>> from huggingface_hub import InferenceClient
# Ask for weather in the next days using tools
>>> client = InferenceClient("meta-llama/Meta-Llama-3-70B-Instruct")
>>> messages = [
...     {"role": "system", "content": "Don't make assumptions about what values to plug into functions. Ask for clarification if a user request is ambiguous."},
...     {"role": "user", "content": "What's the weather like the next 3 days in San Francisco, CA?"},
... ]
>>> tools = [
...     {
...         "type": "function",
...         "function": {
...             "name": "get_current_weather",
...             "description": "Get the current weather",
...             "parameters": {
...                 "type": "object",
...                 "properties": {
...                     "location": {
...                         "type": "string",
...                         "description": "The city and state, e.g. San Francisco, CA",
...                     },
...                     "format": {
...                         "type": "string",
...                         "enum": ["celsius", "fahrenheit"],
...                         "description": "The temperature unit to use. Infer this from the users location.",
...                     },
...                 },
...                 "required": ["location", "format"],
...             },
...         },
...     },
...     ...
... ]
>>> response = client.chat_completion(
...     model="meta-llama/Meta-Llama-3-70B-Instruct",
...     messages=messages,
...     tools=tools,
...     tool_choice="auto",
...     max_tokens=500,
... )
>>> response.choices[0].message.tool_calls[0].function
ChatCompletionOutputFunctionDefinition(
    arguments={
        'location': 'San Francisco, CA',
        'format': 'fahrenheit',
        'num_days': 3
    },
    name='get_n_day_weather_forecast',
    description=None
)
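The model's reply above only names a function and its arguments; running it is left to the caller. Here is a minimal sketch of that dispatch step, assuming a hypothetical local implementation of `get_n_day_weather_forecast` and a hand-written registry (neither is part of `huggingface_hub`):

```python
import json


def get_n_day_weather_forecast(location, format, num_days):
    # Placeholder: a real tool would query a weather service here.
    return f"{num_days}-day forecast for {location} in {format}"


# Maps the tool names advertised to the model onto local callables.
TOOL_REGISTRY = {"get_n_day_weather_forecast": get_n_day_weather_forecast}


def run_tool_call(name, arguments):
    """Look up the requested tool and call it with the model-provided arguments."""
    if isinstance(arguments, str):
        # Some backends return the arguments as a JSON-encoded string.
        arguments = json.loads(arguments)
    try:
        tool = TOOL_REGISTRY[name]
    except KeyError:
        raise ValueError(f"Model requested an unknown tool: {name!r}") from None
    return tool(**arguments)


result = run_tool_call(
    "get_n_day_weather_forecast",
    {"location": "San Francisco, CA", "format": "fahrenheit", "num_days": 3},
)
print(result)  # 3-day forecast for San Francisco, CA in fahrenheit
```

Rejecting unknown names explicitly keeps a hallucinated tool name from turning into an opaque `KeyError`.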
It is also possible to provide grammar rules to the `text_generation` task. This ensures that the output follows a precise JSON Schema specification or matches a regular expression. For more details, check out the Guidance guide from the Text-Generation-Inference docs.
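As a rough sketch of how a grammar could be passed, the helpers below build the grammar payload and hand it to `text_generation`. The `{"type": ..., "value": ...}` shape follows the TGI Guidance guide as understood at the time of writing and should be checked against your installed version; the helper names and the schema are made up for this example.

```python
def json_grammar(schema):
    """Constrain the output to a JSON Schema (payload shape is an assumption, see above)."""
    return {"type": "json", "value": schema}


def regex_grammar(pattern):
    """Constrain the output to a regular expression."""
    return {"type": "regex", "value": pattern}


# Hypothetical schema used only for illustration.
ANIMAL_SCHEMA = {
    "type": "object",
    "properties": {
        "animals": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["animals"],
}


def generate_animals(client, prompt):
    """`client` is expected to be an `InferenceClient` pointing at a TGI-served model."""
    return client.text_generation(
        prompt,
        max_new_tokens=100,
        grammar=json_grammar(ANIMAL_SCHEMA),
    )
```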
⚙️ Other
Mention more `chat-completion` task instead of `conversation` in documentation.
`chat-completion` relies on server-side rendering in all cases, including when the model is `transformers`-backed. Previously, this was only the case for TGI-backed models, and templates were rendered client-side otherwise.
Improved logic to determine whether a model is served via TGI or `transformers`.
🌐 📚 Korean community is on fire!
The PseudoLab team is a non-profit dedicated to making AI more accessible in the Korean-speaking community. In the past few weeks, their team of contributors managed to translate (almost) the entire `huggingface_hub` documentation. Huge shout-out to the coordination on this task! Documentation can be accessed here.
- 🌐 [i18n-KO] Translated `guides/webhooks_server.md` to Korean by @nuatmochoi in #2145
- 🌐 [i18n-KO] Translated `reference/login.md` to Korean by @SeungAhSon in #2151
- 🌐 [i18n-KO] Translated `package_reference/tensorboard.md` to Korean by @fabxoe in #2173
- 🌐 [i18n-KO] Translated `package_reference/inference_client.md` to Korean by @cjfghk5697 in #2178
- 🌐 [i18n-KO] Translated `reference/inference_endpoints.md` to Korean by @harheem in #2180
- 🌐 [i18n-KO] Translated `package_reference/file_download.md` to Korean by @seoyoung-3060 in #2184
- 🌐 [i18n-KO] Translated `package_reference/cache.md` to Korean by @nuatmochoi in #2191
- 🌐 [i18n-KO] Translated `package_reference/collections.md` to Korean by @boyunJang in #2214
- 🌐 [i18n-KO] Translated `package_reference/inference_types.md` to Korean by @fabxoe in #2171
- 🌐 [i18n-KO] Translated `guides/upload.md` to Korean by @junejae in #2139
- 🌐 [i18n-KO] Translated `reference/repository.md` to Korean by @junejae in #2189
- 🌐 [i18n-KO] Translated `package_reference/space_runtime.md` to Korean by @boyunJang in #2213
- 🌐 [i18n-KO] Translated `guides/repository.md` to Korean by @cjfghk5697 in #2124
- 🌐 [i18n-KO] Translated `guides/model_cards.md` to Korean by @SeungAhSon in #2128
- 🌐 [i18n-KO] Translated `guides/community.md` to Korean by @seoulsky-field in #2126
- 🌐 [i18n-KO] Translated `guides/cli.md` to Korean by @harheem in #2131
- 🌐 [i18n-KO] Translated `guides/search.md` to Korean by @seoyoung-3060 in #2134
- 🌐 [i18n-KO] Translated `guides/inference.md` to Korean by @boyunJang in #2130
- 🌐 [i18n-KO] Translated `guides/manage-spaces.md` to Korean by @boyunJang in #2220
- 🌐 [i18n-KO] Translating `guides/hf_file_system.md` to Korean by @heuristicwave in #2146
- 🌐 [i18n-KO] Translated `package_reference/hf_api.md` to Korean by @fabxoe in #2165
- 🌐 [i18n-KO] Translated `package_reference/mixins.md` to Korean by @fabxoe in #2166
- 🌐 [i18n-KO] Translated `guides/inference_endpoints.md` to Korean by @usr-bin-ksh in #2164
- 🌐 [i18n-KO] Translated `package_reference/utilities.md` to Korean by @cjfghk5697 in #2196
- fix ko docs by @Wauplin (direct commit on main)
- 🌐 [i18n-KO] Translated package_reference/serialization.md to Korean by @seoyoung-3060 in #2233
- 🌐 [i18n-KO] Translated package_reference/hf_file_system.md to Korean by @SeungAhSon in #2174
🛠️ Misc improvements
User API
@bilgehanertan added support for 2 new routes:
- `get_user_overview` to retrieve high-level information about a user: username, avatar, number of models/datasets/Spaces, number of likes and upvotes, number of interactions in discussion, etc.
- User API endpoints by @bilgehanertan in #2147
CLI tag
@bilgehanertan added a new command to the CLI to handle tags. It is now possible to:
- tag a repo
>>> huggingface-cli tag Wauplin/my-cool-model v1.0
You are about to create tag v1.0 on model Wauplin/my-cool-model
Tag v1.0 created on Wauplin/my-cool-model
- retrieve the list of tags for a repo
>>> huggingface-cli tag Wauplin/gradio-space-ci -l --repo-type space
Tags for space Wauplin/gradio-space-ci:
0.2.2
0.2.1
0.2.0
0.1.2
0.0.2
0.0.1
- delete a tag on a repo
>>> huggingface-cli tag -d Wauplin/my-cool-model v1.0
You are about to delete tag v1.0 on model Wauplin/my-cool-model
Proceed? [Y/n] y
Tag v1.0 deleted on Wauplin/my-cool-model
For more details, check out the CLI guide.
- CLI Tag Functionality by @bilgehanertan in #2172
🧩 ModelHubMixin
`ModelHubMixin` got a set of nice improvements to generate model cards and handle custom data types in the `config.json` file. More info in the integration guide.
- `ModelHubMixin`: more metadata + arbitrary config types + proper guide by @Wauplin in #2230
- Fix ModelHubMixin when class is a dataclass by @Wauplin in #2159
- Do not document private attributes of ModelHubMixin by @Wauplin in #2216
- Add support for pipeline_tag in ModelHubMixin by @Wauplin in #2228
⚙️ Other
In a shared environment, it is now possible to set a custom path `HF_TOKEN_PATH` as an environment variable so that each user of the cluster has their own access token.
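A minimal sketch of how this could look on a shared machine. The directory layout under `/shared/hf_tokens` and the helper name `configure_token_path` are assumptions made for this example; the variable is set before `huggingface_hub` is imported, on the assumption that the library reads it at import time.

```python
import os
from pathlib import Path


def configure_token_path(username, base_dir="/shared/hf_tokens"):
    """Point HF_TOKEN_PATH at a per-user token file and return that path."""
    token_path = Path(base_dir) / username / "token"
    os.environ["HF_TOKEN_PATH"] = str(token_path)
    return token_path


# Hypothetical user name; on a real cluster this could come from getpass.getuser().
path = configure_token_path("alice")
print(path.name)  # token
```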
Thanks to @Y4suyuki and @lappemic, most custom errors defined in `huggingface_hub` are now aggregated in the same module. This makes it very easy...
[v0.22.2] Hot-fix: correctly handle proxies
Full Changelog: v0.22.1...v0.22.2
[v0.22.1] Hot-fix: correctly handle dataclasses in ModelHubMixin
Fixed a bug breaking the SetFit integration.
What's Changed
- Fix use other chat completion providers by @Wauplin in #2153
- Fix ModelHubMixin when class is a dataclass by @Wauplin in #2159
Full Changelog: v0.22.0...v0.22.1
v0.22.0: Chat completion, inference types and hub mixins!
Discuss the release in our Community Tab. Feedback is welcome!! 🤗
✨ InferenceClient
Support for inference tools continues to improve in `huggingface_hub`. On the menu in this release? A new `chat_completion` API and fully typed inputs/outputs!
Chat-completion API!
A long-awaited API has just landed in `huggingface_hub`! `InferenceClient.chat_completion` follows most of OpenAI's API, making it much easier to integrate with existing tools.
Technically speaking, it uses the same backend as the `text-generation` task but requires a preprocessing step to format the list of messages into a single text prompt. The chat template is rendered server-side when models are powered by TGI, which is the case for most LLMs: Llama, Zephyr, Mistral, Gemma, etc. Otherwise, the templating happens client-side, which requires the `minijinja` package to be installed. We are actively working on bridging this gap, aiming at rendering all templates server-side in the future.
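To make the preprocessing step concrete, here is a minimal sketch of turning a list of messages into a single prompt. The template below is a hypothetical one written for this example; real chat templates are defined per model and are not rendered with plain string formatting like this.

```python
def render_prompt(messages):
    """Flatten chat messages into one text prompt using a made-up template."""
    parts = [f"<|{m['role']}|>\n{m['content']}</s>\n" for m in messages]
    # Leave the prompt open on the assistant turn so the model continues from there.
    parts.append("<|assistant|>\n")
    return "".join(parts)


prompt = render_prompt([{"role": "user", "content": "What is the capital of France?"}])
print(prompt)
```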
>>> from huggingface_hub import InferenceClient
>>> messages = [{"role": "user", "content": "What is the capital of France?"}]
>>> client = InferenceClient("HuggingFaceH4/zephyr-7b-beta")
# Batch completion
>>> client.chat_completion(messages, max_tokens=100)
ChatCompletionOutput(
    choices=[
        ChatCompletionOutputChoice(
            finish_reason='eos_token',
            index=0,
            message=ChatCompletionOutputChoiceMessage(
                content='The capital of France is Paris. The official name of the city is "Ville de Paris" (City of Paris) and the name of the country\'s governing body, which is located in Paris, is "La République française" (The French Republic). \nI hope that helps! Let me know if you need any further information.'
            )
        )
    ],
    created=1710498360
)
# Stream new tokens one by one
>>> for token in client.chat_completion(messages, max_tokens=10, stream=True):
...     print(token)
ChatCompletionStreamOutput(choices=[ChatCompletionStreamOutputChoice(delta=ChatCompletionStreamOutputDelta(content='The', role='assistant'), index=0, finish_reason=None)], created=1710498504)
ChatCompletionStreamOutput(choices=[ChatCompletionStreamOutputChoice(delta=ChatCompletionStreamOutputDelta(content=' capital', role='assistant'), index=0, finish_reason=None)], created=1710498504)
(...)
ChatCompletionStreamOutput(choices=[ChatCompletionStreamOutputChoice(delta=ChatCompletionStreamOutputDelta(content=' may', role='assistant'), index=0, finish_reason=None)], created=1710498504)
ChatCompletionStreamOutput(choices=[ChatCompletionStreamOutputChoice(delta=ChatCompletionStreamOutputDelta(content=None, role=None), index=0, finish_reason='length')], created=1710498504)
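In practice, the streamed chunks are usually stitched back into a single string. Here is a minimal sketch of that loop, using stand-in objects that only mimic the `choices[0].delta.content` shape shown above (the `fake_chunk` helper is made up for this example so that it runs offline):

```python
from types import SimpleNamespace


def collect_stream(chunks):
    """Concatenate the text carried by each streamed chunk."""
    pieces = []
    for chunk in chunks:
        content = chunk.choices[0].delta.content
        if content is not None:  # the final chunk carries no content
            pieces.append(content)
    return "".join(pieces)


def fake_chunk(content):
    # Stand-in for a streamed chunk; only the fields used above are present.
    delta = SimpleNamespace(content=content)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta, index=0)])


text = collect_stream([fake_chunk("The"), fake_chunk(" capital"), fake_chunk(None)])
print(text)  # The capital
```

With a real client, the iterable passed to `collect_stream` would be the result of `client.chat_completion(messages, stream=True)`.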
- Implement `InferenceClient.chat_completion` + use new types for text-generation by @Wauplin in #2094
- Fix InferenceClient.text_generation for non-tgi models by @Wauplin in #2136
- Fix use other chat completion providers by @Wauplin in #2153
Inference types
We are currently working towards more consistency in task definitions across the Hugging Face ecosystem. This is no easy job but a major milestone has recently been achieved! All inputs and outputs of the main ML tasks are now fully specified as JSON Schema objects. This is the first brick needed to have consistent expectations when running inference across our stack: transformers (Python), transformers.js (TypeScript), Inference API (Python), Inference Endpoints (Python), Text Generation Inference (Rust), Text Embeddings Inference (Rust), InferenceClient (Python), Inference.js (TypeScript), etc.
Integrating those definitions will require more work but `huggingface_hub` is one of the first tools to integrate them. As a start, all `InferenceClient` return values are now typed dataclasses. Furthermore, typed dataclasses have been generated for all tasks' inputs and outputs. This means you can now integrate them in your own library to ensure consistency with the Hugging Face ecosystem. Specifications are open-source (see here) meaning anyone can access and contribute to them. Python's generated classes are documented here.
Here is a short example showcasing the new output types:
>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient()
>>> client.object_detection("people.jpg")
[
    ObjectDetectionOutputElement(
        score=0.9486683011054993,
        label='person',
        box=ObjectDetectionBoundingBox(xmin=59, ymin=39, xmax=420, ymax=510)
    ),
    ...
]
Note that those dataclasses are backward-compatible with the dict-based interface that was previously in use. In the example above, both `ObjectDetectionBoundingBox(...).xmin` and `ObjectDetectionBoundingBox(...)["xmin"]` are correct, even though the former should be the preferred solution from now on.
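One way to picture this dual access is a dataclass that also answers to `[...]`. The classes below are a simplified illustration written for this example and are not how `huggingface_hub` implements its output types.

```python
from dataclasses import dataclass


class DictCompatible:
    """Mixin giving dataclass fields dict-style read access."""

    def __getitem__(self, key):
        try:
            return getattr(self, key)
        except AttributeError:
            # Mirror dict behavior for missing keys.
            raise KeyError(key) from None


@dataclass
class BoundingBox(DictCompatible):
    xmin: int
    ymin: int
    xmax: int
    ymax: int


box = BoundingBox(xmin=59, ymin=39, xmax=420, ymax=510)
print(box.xmin, box["xmin"])  # 59 59
```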
- Generate inference types + start using output types by @Wauplin in #2036
- Add = None at optional parameters by @LysandreJik in #2095
- Fix inference types shared between tasks by @Wauplin in #2125
🧩 ModelHubMixin
`ModelHubMixin` is an object that can be used as a parent class for the objects in your library in order to provide built-in serialization methods to upload and download pretrained models from the Hub. This mixin is adapted into a `PyTorchModelHubMixin` that can serialize and deserialize any PyTorch model. The 0.22 release brings its share of improvements to these classes:
1. Better support of init values. If you instantiate a model with some custom arguments, the values will be automatically stored in a `config.json` file and restored when reloading the model from pretrained weights. This should unlock integrations with external libraries in a much smoother way.
2. Library authors integrating the hub mixin can now define custom metadata for their library: library name, tags, document url and repo url. These are to be defined only once when integrating the library. Any model pushed to the Hub using the library will then be easily discoverable thanks to those tags.
3. A base modelcard is generated for each saved model. This modelcard includes default tags (e.g. `model_hub_mixin`) and custom tags from the library (see 2.). You can extend/modify this modelcard by overriding the `generate_model_card` method.
>>> import torch
>>> import torch.nn as nn
>>> from huggingface_hub import PyTorchModelHubMixin
# Define your Pytorch model exactly the same way you are used to
>>> class MyModel(
...         nn.Module,
...         PyTorchModelHubMixin, # multiple inheritance
...         library_name="keras-nlp",
...         tags=["keras"],
...         repo_url="https://github.com/keras-team/keras-nlp",
...         docs_url="https://keras.io/keras_nlp/",
...         # ^ optional metadata to generate model card
...     ):
...     def __init__(self, hidden_size: int = 512, vocab_size: int = 30000, output_size: int = 4):
...         super().__init__()
...         self.param = nn.Parameter(torch.rand(hidden_size, vocab_size))
...         self.linear = nn.Linear(output_size, vocab_size)
...
...     def forward(self, x):
...         return self.linear(x + self.param)
# 1. Create model
>>> model = MyModel(hidden_size=128)
# Config is automatically created based on input + default values
>>> model._hub_mixin_config
{"hidden_size": 128, "vocab_size": 30000, "output_size": 4}
# 2. (optional) Save model to local directory
>>> model.save_pretrained("path/to/my-awesome-model")
# 3. Push model weights to the Hub
>>> model.push_to_hub("my-awesome-model")
# 4. Initialize model from the Hub => config has been preserved
>>> model = MyModel.from_pretrained("username/my-awesome-model")
>>> model._hub_mixin_config
{"hidden_size": 128, "vocab_size": 30000, "output_size": 4}
# Model card has been correctly populated
>>> from huggingface_hub import ModelCard
>>> card = ModelCard.load("username/my-awesome-model")
>>> card.data.tags
["keras", "pytorch_model_hub_mixin", "model_hub_mixin"]
>>> card.data.library_name
"keras-nlp"
For more details on how to integrate these classes, check out the integration guide.
- Fix `ModelHubMixin`: pass config when `__init__` accepts **kwargs by @Wauplin in #2058
- [PyTorchModelHubMixin] Fix saving model with shared tensors by @NielsRogge in #2086
- Correctly inject config in `PytorchModelHubMixin` by @Wauplin in #2079
- Fix passing kwargs in PytorchHubMixin by @Wauplin in #2093
- Generate modelcard in `ModelHubMixin` by @Wauplin in #2080
- Fix ModelHubMixin: save config only if doesn't exist by @Wauplin in [#2105...
[v0.21.4] Hot-fix: Fix saving model with shared tensors
Release v0.21 introduced a breaking change making it impossible to save a `PytorchModelHubMixin`-based model that has shared tensors. This has been fixed in #2086.
Full Changelog: v0.21.3...v0.21.4
[v0.21.3] Hot-fix: ModelHubMixin pass config when `__init__` accepts `**kwargs`
More details in #2058.
Full Changelog: v0.21.2...v0.21.3
v0.21.2: hot-fix: [HfFileSystem] Fix glob with pattern without wildcards
See #2056. (+#2050 shipped as v0.21.1).
Full Changelog: v0.21.0...v0.21.2