Add pytorch/inference/{cpu,gpu}/2.3.1/transformers/4.48.0/py311
#135
Description
This PR bumps the version of `huggingface-inference-toolkit` to 0.5.4 to release the latest PyTorch DLC for Inference, which comes with bumped versions of `transformers`, `diffusers`, `huggingface_hub` and `accelerate`.

The dependency bump for `transformers` mainly introduces new architectures such as ModernBERT, ColPali and Falcon 3, as well as several fixes and improvements overall. Also, `diffusers` comes with new `text-to-image` pipelines for SANA and Flux Control.

Read more about the latest releases for each dependency in their respective release notes:

- `transformers` at https://github.com/huggingface/transformers/releases/tag/v4.48.0
- `diffusers` at https://github.com/huggingface/diffusers/releases/tag/v0.32.2
- `accelerate` at https://github.com/huggingface/accelerate/releases/tag/v1.2.1
- `huggingface_hub` at https://github.com/huggingface/huggingface_hub/releases/tag/v0.27.0

Additionally, this PR updates the `entrypoint.sh` to make it more robust and consistent in formatting, while also adding the `requirements.txt` installation when `HF_MODEL_DIR` is set. Previously, when running the PyTorch Inference DLC on GKE with custom code mounted via a volume, the `requirements.txt` was not being installed, so custom code with custom requirements couldn't be used; now the `requirements.txt` will be installed whenever a path is provided in `HF_MODEL_ID`, `HF_MODEL_DIR` or `AIP_STORAGE_URI`.

Finally, it also adds `flash-attn` as a dependency of the GPU image, so as to benefit from it in the scenarios where it applies, e.g. `answerdotai/ModernBERT-base` as described in https://huggingface.co/answerdotai/ModernBERT-base#usage, among many others.
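
The conditional `requirements.txt` installation described above can be sketched roughly as follows. This is a hypothetical illustration, not the actual `entrypoint.sh` from this PR: the `install_requirements` function name is invented for the sketch, and `pip install` is replaced by `echo` so the control flow can be exercised without touching the environment; only the `HF_MODEL_DIR` variable name comes from the PR description.

```shell
#!/bin/bash
# Hypothetical sketch: install custom requirements if the mounted model
# directory (e.g. HF_MODEL_DIR on GKE) contains a requirements.txt file.

install_requirements() {
  local model_dir="$1"
  if [[ -f "${model_dir}/requirements.txt" ]]; then
    # The real entrypoint would run something like:
    #   pip install -r "${model_dir}/requirements.txt"
    echo "installing requirements from ${model_dir}/requirements.txt"
  else
    echo "no requirements.txt found in ${model_dir}, skipping"
  fi
}

# Demo: a fake model directory with custom requirements, and one without.
demo_dir="$(mktemp -d)"
echo "einops" > "${demo_dir}/requirements.txt"
install_requirements "${demo_dir}"
install_requirements "/nonexistent"
rm -rf "${demo_dir}"
```

The same check would run for whichever of `HF_MODEL_ID`, `HF_MODEL_DIR` or `AIP_STORAGE_URI` resolves to a local path holding the custom code.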