Releases: huggingface/transformers
IDEFICS, GPTQ Quantization
IDEFICS
The IDEFICS model was proposed in OBELICS: An Open Web-Scale Filtered Dataset of Interleaved Image-Text Documents by Hugo Laurençon, Lucile Saulnier, Léo Tronchon, Stas Bekman, Amanpreet Singh, Anton Lozhkov, Thomas Wang, Siddharth Karamcheti, Alexander M. Rush, Douwe Kiela, Matthieu Cord, Victor Sanh
IDEFICS is the first open state-of-the-art visual language model at the 80B scale!
The model accepts arbitrary sequences of image and text and produces text, similarly to a multimodal ChatGPT.
Blogpost: hf.co/blog/idefics
Playground: HuggingFaceM4/idefics_playground
MPT
MPT has been added and is now officially supported within Transformers. The repositories from MosaicML have been updated to work best with the model integration within Transformers.
- [MPT] Add MosaicML's MPT model to transformers by @ArthurZucker & @younesbelkada in #24629
GPTQ Integration
GPTQ quantization is now supported in Transformers through the optimum library. The backend relies on the auto_gptq library, from which we use the GPTQ and QuantLinear classes.
See below for an example of the API, quantizing a model using the new GPTQConfig configuration utility.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig
model_name = "facebook/opt-125m"
tokenizer = AutoTokenizer.from_pretrained(model_name)
config = GPTQConfig(bits=4, dataset = "c4", tokenizer=tokenizer, group_size=128, desc_act=False)
# works also with device_map (cpu offload works but not disk offload)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16, quantization_config=config)
Most models under the TheBloke namespace with the suffix GPTQ should be supported. For example, to load a GPTQ-quantized model such as TheBloke/Llama-2-13B-chat-GPTQ, simply run (after installing the latest optimum and auto-gptq libraries):
from transformers import AutoModelForCausalLM, AutoTokenizer
model_name = "TheBloke/Llama-2-13B-chat-GPTQ"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
For more information about this feature, we recommend taking a look at the following announcement blogpost: https://huggingface.co/blog/gptq-integration
Pipelines
A new pipeline, dedicated to text-to-audio and text-to-speech models, has been added to Transformers. It currently supports the three text-to-audio models integrated into transformers: SpeechT5ForTextToSpeech, MusicGen and Bark.
See below for an example:
from transformers import pipeline

pipe = pipeline(model="suno/bark")
output = pipe("Hey it's HuggingFace on the phone!")

audio = output["audio"]
sampling_rate = output["sampling_rate"]
Classifier-Free Guidance decoding
Classifier-Free Guidance decoding is a text generation technique developed by EleutherAI, announced in this paper. With this technique, you can increase prompt adherence in generation. You can also set it up with negative prompts, ensuring your generation doesn't go in specific directions. See its docs for usage instructions.
- add CFG for .generate() by @Vermeille in #24654
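A minimal sketch of the API, assuming the guidance_scale and negative_prompt_ids arguments exposed by generate() (the checkpoint, prompts and values below are illustrative only):

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("Today, a dragon flew over Paris, France,", return_tensors="pt")
# guidance_scale > 1 increases adherence to the prompt
outputs = model.generate(**inputs, max_new_tokens=30, guidance_scale=1.5)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

# Optionally steer generation away from a negative prompt
neg_inputs = tokenizer("A very happy story:", return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=30,
    guidance_scale=1.5,
    negative_prompt_ids=neg_inputs["input_ids"],
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))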
Task guides
A new task guide going into Visual Question Answering has been added to Transformers.
- VQA task guide by @MKhalusova in #25244
Model deprecation
We continue the deprecation of models that was introduced in #24787.
By deprecating, we indicate that we will stop maintaining such models, but there is no intention of actually removing them and breaking support (they might one day move into a separate repo or onto the Hub, but we would still add the necessary imports to preserve backward compatibility). The main point is that we stop testing those models. This choice is driven by how little the models are used, and aims to ease the burden on our CI so that it can focus on more critical aspects of the library.
- Deprecate unused OpenLlama architecture by @tomaarsen in #24922
Translation Efforts
There are ongoing efforts to translate the transformers documentation into other languages. These efforts are driven by groups independent of Hugging Face, and their work is greatly appreciated as it further lowers the barrier of entry to ML and Transformers.
If you'd like to kickstart such an effort or help out on an existing one, please feel free to reach out by opening an issue.
- 🌐 [i18n-KO] Translated tasks/document_question_answering.md to Korean by @jungnerd in #24588
- 🌐 [i18n-KO] Fixed Korean and English quicktour.md by @wonhyeongseo in #24664
- 🌐 [i18n-KO] Updated Korean serialization.md by @wonhyeongseo in #24686
- 🌐 [i18n-KO] Translated performance.md to Korean by @augustinLib in #24883
- 🌐 [i18n-KO] Translated testing.md to Korean by @Sunmin0520 in #24900
- 🌐 [i18n-KO] Translated perf_train_cpu.md to Korean by @seank021 in #24911
- 🌐 [i18n-KO] Translated <tf_xla>.md to Korean by @54data in #24904
- 🌐 [i18n-KO] Translated perf_hardware.md to Korean by @augustinLib in #24966
- 🌐 [i18n-KO] Translated hpo_train.md to Korean by @harheem in #24968
- 🌐 [i18n-KO] Translated perf_infer_cpu.md to Korean by @junejae in #24920
- 🌐 [i18n-KO] Translated pipeline_webserver.md to Korean by @kihoon71 in #24828
- 🌐 [i18n-KO] Translated transformers_agents.md to Korean by @sim-so in #24881
- 🌐 [i18n-KO] Translated perf_infer_gpu_many.md to Korean by @heuristicwave in #24943
- 🌐 [i18n-KO] Translated perf_infer_gpu_one.md to Korean by @eenzeenee in #24978
- 🌐 [i18n-KO] Translated add_tensorflow_model.md to Korean by @keonju2 in #25017
- 🌐 [i18n-KO] Translated perf_train_cpu_many.md to Korean by @nuatmochoi in #24923
- 🌐 [i18n-KO] Translated add_new_model.md to Korean by @mjk0618 in #24957
- 🌐 [i18n-KO] Translated model_summary.md to Korean by @0525hhgus in #24625
- 🌐 [i18n-KO] Translated philosophy.md to Korean by @TaeYupNoh in #25010
- 🌐 [i18n-KO] Translated perf_train_tpu_tf.md to Korean by @0525hhgus in #25433
- 🌐 [i18n-KO] Translated docs: ko: pr_checks.md to Korean by @sronger in #24987
Explicit input data format for image processing
An input_data_format argument has been added to image transforms and ImageProcessor methods, allowing users to explicitly set the data format of the images being processed. This enables processing of images with a non-standard number of channels (e.g. 4) and removes errors that occurred when the channel dimension was ambiguous and the data format had to be inferred.
import numpy as np
from transformers import ViTImageProcessor
# Interpreted as channels-first below: 4 channels, height 6, width 3
img = np.random.randint(0, 256, (4, 6, 3))
image_processor = ViTImageProcessor()
# Explicitly tell the processor how the channel dimension is laid out
inputs = image_processor(img, image_mean=0, image_std=1, input_data_format="channels_first")
- Input data format by @amyeroberts in #25464
- Add input_data_format argument, image transforms by @amyeroberts in #25462
Documentation clarification about efficient inference through torch.scaled_dot_product_attention & Flash Attention
Many users are not aware that torch's scaled_dot_product_attention can be forced to dispatch to Flash Attention kernels. This leads to considerable speedups and memory savings, and is also compatible with quantized models. We decided to make this explicit to users in the documentation.
- [Docs / BetterTransformer ] Added more details about flash attention + SDPA : #25265
In a nutshell, one can just run:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-350m")
model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m").to("cuda")
# convert the model to BetterTransformer
model.to_bettertransformer()
input_text = "Hello my dog is cute and"
inputs = tokenizer(input_text, return_tensors="pt").to("cuda")
with torch.backends.cuda.sdp_kernel(enable_flash=True, enable_math=False, enable_mem_efficient=False):
    outputs = model.generate(**inputs)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
to enable Flash Attention in their model. Note, however, that this feature does not support padding yet.
FSDP and DeepSpeed Changes
Users will no longer encounter CPU RAM OOM when using FSDP to train very large models in multi-GPU or multi-node multi-GPU settings.
Users no longer have to pass fsdp_transformer_layer_cls_to_wrap, as the code now uses _no_split_modules by default, which is available for most popular models. DeepSpeed ZeRO-3 init now works properly with the Accelerate launcher + Trainer.
- add util for ram efficient loading of model when using fsdp by @pacman100 in #25107
- fix fsdp checkpointing issues by @pacman100 in #24926
- fsdp fixes and enhancements by @pacman100 in #24980
- fix deepspeed load best model at end when the model gets sharded by @pacman100 in #25057
- resolving zero3 init when using accelerate config with Trainer by @pacman100 in #25227
- fix z3 init when using accelerate launcher by @pacman100 in #25589
Breaking changes
Default optimizer in the Trainer class
The defaul...
v4.31.0: Llama v2, MusicGen, Bark, MMS, EnCodec, InstructBLIP, Umt5, MRA, ViViT
New models
Llama v2
Llama 2 was proposed in Llama 2: Open Foundation and Fine-Tuned Chat Models by Hugo Touvron et al. It builds upon the LLaMA architecture, adding Grouped Query Attention for efficient inference.
- Add support for Llama 2 by @ArthurZucker in #24891
Musicgen
The MusicGen model was proposed in the paper Simple and Controllable Music Generation by Jade Copet, Felix Kreuk, Itai Gat, Tal Remez, David Kant, Gabriel Synnaeve, Yossi Adi and Alexandre Défossez.
MusicGen is a single stage auto-regressive Transformer model capable of generating high-quality music samples conditioned on text descriptions or audio prompts. The text descriptions are passed through a frozen text encoder model to obtain a sequence of hidden-state representations. MusicGen is then trained to predict discrete audio tokens, or audio codes, conditioned on these hidden-states. These audio tokens are then decoded using an audio compression model, such as EnCodec, to recover the audio waveform.
Through an efficient token interleaving pattern, MusicGen does not require a self-supervised semantic representation of the text/audio prompts, thus eliminating the need to cascade multiple models to predict a set of codebooks (e.g. hierarchically or upsampling). Instead, it is able to generate all the codebooks in a single forward pass.
- Add Musicgen by @sanchit-gandhi in #24109
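For illustration, here is a minimal text-to-music generation sketch using the newly added classes (the facebook/musicgen-small checkpoint and the generation settings are just one possible choice):

from transformers import AutoProcessor, MusicgenForConditionalGeneration

processor = AutoProcessor.from_pretrained("facebook/musicgen-small")
model = MusicgenForConditionalGeneration.from_pretrained("facebook/musicgen-small")

# Condition generation on a text description
inputs = processor(text=["80s pop track with bassy drums and synth"], padding=True, return_tensors="pt")
# The generated tensor contains the decoded audio waveform
audio_values = model.generate(**inputs, max_new_tokens=256)
sampling_rate = model.config.audio_encoder.sampling_rate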
Bark
Bark is a transformer-based text-to-speech model proposed by Suno AI in suno-ai/bark.
MMS
The MMS model was proposed in Scaling Speech Technology to 1,000+ Languages by Vineel Pratap, Andros Tjandra, Bowen Shi, Paden Tomasello, Arun Babu, Sayani Kundu, Ali Elkahky, Zhaoheng Ni, Apoorv Vyas, Maryam Fazel-Zarandi, Alexei Baevski, Yossi Adi, Xiaohui Zhang, Wei-Ning Hsu, Alexis Conneau, Michael Auli
- Add MMS CTC Fine-Tuning by @patrickvonplaten in #24281
EnCodec
The EnCodec neural codec model was proposed in High Fidelity Neural Audio Compression by Alexandre Défossez, Jade Copet, Gabriel Synnaeve, Yossi Adi.
InstructBLIP
The InstructBLIP model was proposed in InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning by Wenliang Dai, Junnan Li, Dongxu Li, Anthony Meng Huat Tiong, Junqi Zhao, Weisheng Wang, Boyang Li, Pascale Fung, Steven Hoi. InstructBLIP leverages the BLIP-2 architecture for visual instruction tuning.
- Add InstructBLIP by @NielsRogge in #23460
Umt5
The UMT5 model was proposed in UniMax: Fairer and More Effective Language Sampling for Large-Scale Multilingual Pretraining by Hyung Won Chung, Xavier Garcia, Adam Roberts, Yi Tay, Orhan Firat, Sharan Narang, Noah Constant.
- [Umt5] Add google's umt5 to transformers by @ArthurZucker in #24477
MRA
The MRA model was proposed in Multi Resolution Analysis (MRA) for Approximate Self-Attention by Zhanpeng Zeng, Sourav Pal, Jeffery Kline, Glenn M Fung, and Vikas Singh.
ViViT
The Vivit model was proposed in ViViT: A Video Vision Transformer by Anurag Arnab, Mostafa Dehghani, Georg Heigold, Chen Sun, Mario Lučić, Cordelia Schmid. The paper proposes one of the first successful pure-transformer based set of models for video understanding.
Python 3.7
The last version to support Python 3.7 was 4.30.x, as Python 3.7 reached end-of-life on June 27, 2023 and is no longer supported by the Python Software Foundation.
PyTorch 1.9
The last version to support PyTorch 1.9 was 4.30.x. As its release was more than 2 years ago and we're looking forward to using features available in PyTorch 1.10 and up, we no longer support PyTorch 1.9 for v4.31 and up.
RoPE scaling
This PR adds RoPE scaling to the LLaMa and GPTNeoX families of models. It allows us to extrapolate and go beyond the original maximum sequence length (e.g. 2048 tokens on LLaMA), without fine-tuning. It offers two strategies:
- Linear scaling
- Dynamic NTK scaling
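A minimal configuration sketch, assuming the rope_scaling argument added by this PR (the checkpoint name and scaling factor are illustrative only):

from transformers import AutoModelForCausalLM

# Linear scaling by a factor of 2 roughly doubles the usable context length
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    rope_scaling={"type": "linear", "factor": 2.0},
)

# Dynamic NTK scaling would use {"type": "dynamic", "factor": 2.0} instead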
Agents
Tools now return a type that is specific to agents. This type can return a serialized version of itself (a string), that either points to a file on-disk or to the object's content. This should make interaction with text-based systems much simpler.
- Tool types by @LysandreJik in #24032
Tied weights load
Models with potentially tied weights dropped some keys from the state dict even when the weights were not tied. This has now been fixed and, more generally, the whole experience of loading a model with a state dict that doesn't match exactly should be improved in this release.
Whisper word-level timestamps
This PR adds a method of predicting timestamps at the word (or even token) level, by analyzing the cross-attentions and applying dynamic time warping.
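A rough usage sketch through the ASR pipeline, assuming the return_timestamps="word" option (the checkpoint and audio file are placeholders):

from transformers import pipeline

pipe = pipeline("automatic-speech-recognition", model="openai/whisper-tiny")
result = pipe("sample.flac", return_timestamps="word")
# Each chunk carries a word and its (start, end) timestamps
print(result["chunks"])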
Auto model addition
A new auto model is added, AutoModelForTextEncoding. It is to be used when you want to extract the text encoder from an encoder-decoder architecture.
- [AutoModel] Add AutoModelForTextEncoding by @sanchit-gandhi in #24305
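For instance, a minimal sketch of pulling the encoder out of an encoder-decoder checkpoint (the t5-small checkpoint is just an example):

from transformers import AutoModelForTextEncoding

# Loads only the text encoder of the T5 encoder-decoder model
encoder = AutoModelForTextEncoding.from_pretrained("t5-small")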
Model deprecation
Transformers is growing a lot and to ease a bit the burden of maintenance on our side, we have taken the decision to deprecate models that are not used a lot. Those models will never actually disappear from the library, but we will stop testing them or accepting PRs modifying them.
The criteria to identify models to deprecate was less than 1,000 unique downloads in the last 30 days for models that are at least one year old. The list of deprecated models is:
- BORT
- M-CTC-T
- MMBT
- RetriBERT
- TAPEX
- Trajectory Transformer
- VAN
Breaking changes
Fixes an issue with stripped spaces for the T5 family tokenizers. If this negatively impacts inference/training with your models, please let us know by opening an issue.
- ⚠️⚠️ [T5Tokenize] Fix T5 family tokenizers ⚠️⚠️ by @ArthurZucker in #24565
Bugfixes and improvements
- add trust_remote_code option to CLI download cmd by @radames in #24097
- Fix typo in Llama docstrings by @Kh4L in #24020
- Avoid GPT-2 daily CI job OOM (in TF tests) by @ydshieh in #24106
- [Lllama] Update tokenization code to ensure parsing of the special tokens [core] by @ArthurZucker in #24042
- PLAM => PaLM by @xingener in #24129
- [bnb] Fix bnb config json serialization by @younesbelkada in #24137
- Correctly build models and import call_context for older TF versions by @Rocketknight1 in #24138
- Generate: PT's top_p enforces min_tokens_to_keep when it is 1 by @gante in #24111
- fix bugs with trainer by @pacman100 in #24134
- Fix TF Rag OOM issue by @ydshieh in #24122
- Fix SAM OOM issue on CI by @ydshieh in #24125
- Fix XGLM OOM on CI by @ydshieh in #24123
- [SAM] Fix sam slow test by @younesbelkada in #24140
- [lamaTokenizerFast] Update documentation by @ArthurZucker in #24132
- [BlenderBotSmall] Update doc example by @ArthurZucker in #24092
- Fix Pipeline CI OOM issue by @ydshieh in #24124
- [documentation] grammatical fixes in image_classification.mdx by @LiamSwayne in #24141
- Fix typo in streamers.py by @freddiev4 in #24144
- [tests] fix bitsandbytes import issue by @stas00 in #24151
- Avoid OOM in doctest CI by @ydshieh in #24139
- Fix Wav2Vec2 CI OOM by @ydshieh in #24190
- Fix push to hub by @NielsRogge in #24187
- Change ProgressCallback to use dynamic_ncols=True by @gmlwns2000 in #24101
- [i18n]Translated "attention.mdx" to korean by @kihoon71 in #23878
- Generate: force caching on the main model, in assisted generation by @gante in #24177
- Fix device issue in OpenLlamaModelTest::test_model_parallelism by @ydshieh in #24195
- Update GPTNeoXLanguageGenerationTest by @ydshieh in #24193
- typo: fix typos in CONTRIBUTING.md and deepspeed.mdx by @zsj9509 in #24184
- Generate: detect special architectures when loaded from PEFT by @gante in #24198
- 🌐 [i18n-KO] Translated tasks_summary.mdx to Korean by @kihoon71 in #23977
- 🚨🚨🚨 Replace DataLoader logic for Accelerate in Trainer, remove unneeded tests 🚨🚨🚨 by @muellerzr in #24028
- Fix _load_pretrained_model by @SunMarc in #24200
- Fix steps bugs in no trainer examples by @Ethan-yt in #24197
- Skip RWKV test in past CI by @ydshieh in #24204
- Remove unnecessary aten::to overhead in llama by @fxmarty in #24203
- Update WhisperForAudioClassification doc example by @ydshieh in #24188
- Finish dataloader integration by @muellerzr in #24201
- Add the number of model test failures to slack CI report by @ydshieh in #24207
- fix: TextIteratorStreamer cannot work with pipeline by @yuanwu2017 in #23641
- Update (TF)SamModelIntegrationTest by @ydshieh in #24199
- Improving error message when using use_safetensors=True. by @Narsil in #24232
- Safely import pytest in testing_utils.py by @amyeroberts in #24241
- fix overflow when training mDeberta in fp16 by @sjrl in #24116
- deprecate use_mps_device by @pacman100 in #24239
- Tied params cleanup by @sgugger in #24211
...
v4.30.2: Patch release
- Fix push to hub by @NielsRogge in #24187
- Fix how we detect the TF package by @Rocketknight1 in #24255
v4.30.1: Patch release
- Fix bnb config json serialization in #24137 by @younesbelkada
- Correctly build models and import call_context for older TF versions in #24138 by @Rocketknight1
- Fix bugs with trainer in #24134 by @pacman100
v4.30.0: 100k, Agents improvements, Safetensors core dependency, Swiftformer, Autoformer, MobileViTv2, timm-as-a-backbone
100k
Transformers has just reached 100k stars on GitHub, and to celebrate we wanted to highlight 100 projects in the vicinity of transformers, so we have decided to create an awesome-transformers page to do just that.
We accept PRs to add projects to the list!
- Top 100 by @LysandreJik in #22912
- Add LlamaIndex to awesome-transformers.md by @ravi03071991 in #23484
- add cleanlab to awesome-transformers tools list by @jwmueller in #23440
4-bit quantization and QLoRA
By leveraging the bitsandbytes library by @TimDettmers, we add 4-bit support to transformers models!
- 4-bit QLoRA via bitsandbytes (4-bit base model + LoRA) by @TimDettmers in #23479
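A minimal loading sketch, assuming the load_in_4bit flag exposed by from_pretrained (the checkpoint choice is illustrative; bitsandbytes and accelerate need to be installed):

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "facebook/opt-350m"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# Weights are quantized to 4-bit on the fly while loading
model = AutoModelForCausalLM.from_pretrained(model_name, load_in_4bit=True, device_map="auto")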
Agents
The Agents framework has been improved and continues to be stabilized. Among bug fixes, here are the important new features that were added:
- Local agent capabilities, to load a generative model directly from transformers instead of relying on APIs.
- Prompts are now hosted on the Hub, which means that anyone can fork the prompts and update them with theirs, to let other community contributors re-use them.
- We add an AzureOpenAiAgent class to support Azure OpenAI agents.
- Add local agent by @sgugger in #23438
- Enable prompts on the Hub by @sgugger in #23662
- Add AzureOpenAiAgent by @sgugger in #24058
Safetensors
The safetensors library is a safe serialization framework for machine learning tensors. It has been audited and will become the default serialization framework for several organizations (Hugging Face, EleutherAI, Stability AI).
It has now become a core dependency of transformers.
New models
Swiftformer
The SwiftFormer paper introduces a novel efficient additive attention mechanism that effectively replaces the quadratic matrix multiplication operations in the self-attention computation with linear element-wise multiplications. A series of models called ‘SwiftFormer’ is built based on this, which achieves state-of-the-art performance in terms of both accuracy and mobile inference speed. Even their small variant achieves 78.5% top-1 ImageNet1K accuracy with only 0.8 ms latency on iPhone 14, which is more accurate and 2× faster compared to MobileViT-v2.
- Add swiftformer by @shehanmunasinghe in #22686
Autoformer
This model augments the Transformer as a deep decomposition architecture, which can progressively decompose the trend and seasonal components during the forecasting process.
MobileViTv2
MobileViTV2 is the second version of MobileViT, constructed by replacing the multi-headed self-attention in MobileViT with separable self-attention.
- Add MobileViTv2 by @shehanmunasinghe in #22820
PerSAM
PerSAM proposes a minimal modification to SAM to allow DreamBooth-like personalization, enabling segmentation of concepts in new images using just one example.
- Add PerSAM [bis] by @NielsRogge in #23659
Timm backbone
We add support for loading timm weights within the AutoBackbone API in transformers. timm models can be instantiated through the TimmBackbone class, and then used with any vision model that needs a backbone.
- Add TimmBackbone model by @amyeroberts in #22619
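As a rough sketch, assuming TimmBackbone.from_pretrained accepts a timm model identifier such as resnet18 (the checkpoint name is illustrative only):

from transformers import TimmBackbone

# Instantiate a timm model as a transformers backbone
backbone = TimmBackbone.from_pretrained("resnet18")
# The backbone can then be plugged into vision models that expect an AutoBackbone-compatible backbone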
Image to text pipeline conditional support
We add conditional text generation to the image-to-text pipeline, allowing the model to continue an initial text prompt conditioned on an image.
- [image-to-text pipeline] Add conditional text support + GIT by @NielsRogge in #23362
TensorFlow implementations
- Add TensorFlow implementation of EfficientFormer by @D-Roberts in #22620
Accelerate Migration
A major rework of the internals of the Trainer is underway, leveraging accelerate instead of redefining them in transformers. This should unify both frameworks and lead to increased interoperability and more efficient development.
- Smangrul/accelerate mp integrate by @pacman100 in #23148
- Smangrul/accelerate ddp integrate by @pacman100 in #23151
- fix trainer slow tests related to hyperparam search by @pacman100 in #24011
- remove the extra accelerator.prepare by @pacman100 in #23914
- move fsdp handling to accelerate by @pacman100 in #23158
- shift torch dynamo handling to accelerate by @pacman100 in #23168
- accelerate deepspeed and gradient accumulation integrate by @pacman100 in #23236
- fix executable batch size issue by @pacman100 in #24067
- fix accelerator prepare during eval only mode by @pacman100 in #24014
- reset accelerate env variables after each test by @pacman100 in #24107
- Fix translation no_trainer by @muellerzr in #23407
- Update error message when Accelerate isn't installed by @muellerzr in #23373
- Fix parallel mode check by @muellerzr in #23409
- Muellerzr fix deepspeed by @muellerzr in #23657
- Update all no_trainer with skip_first_batches by @muellerzr in #23664
- Fix sagemaker DP/MP by @muellerzr in #23681
- Log the right train_batch_size if using auto_find_batch_size and also log the adjusted value seperately. by @muellerzr in #23800
- Up pinned accelerate version by @muellerzr in #24089
- Move import check to before state reset by @muellerzr in #23906
- Upgrade safetensors version by @muellerzr in #23911
- Act on deprecations in Accelerate no_trainer examples by @muellerzr in #24053
- Oops, missed one by @muellerzr in #24054
Bugfixes and improvements
- chore: allow protobuf 3.20.3 requirement by @jose-turintech in #22759
- Fix link displayed for custom tools by @sgugger in #23274
- Remove missplaced test file by @sgugger in #23275
- Bring back the PR Refactor doctests + add CI to main by @ydshieh in #23271
- [gpt] Gpt2 fix half precision causal mask by @younesbelkada in #23256
- Temporary tolerance fix for flaky whipser PT-TF equiv. test by @amyeroberts in #23257
- Add top_k argument to post-process of conditional/deformable-DETR by @CreatlV in #22787
- transformers-cli -> huggingface-cli by @AlpinDale in #23276
- Temporarily increase tol for PT-FLAX whisper tests by @amyeroberts in #23288
- Added missing " in CHAT_PROMPT_TEMPLATE by @galatolofederico in #23287
- Update custom_tools.mdx: fix link by @mishig25 in #23292
- Update transformers_agents.mdx by @mishig25 in #23289
- Convert numpy arrays to lists before saving the evaluation metrics as json by @harisankar95 in #23268
- Fix doctest files fetch issue by @ydshieh in #23277
- skip test_run_squad_no_trainer for now by @ydshieh in #23302
- Better check for packages availability by @apbard in #23163
- Add gradient_checkpointing parameter to FlaxWhisperEncoder by @raghavanone in #23300
- Agents extras by @LysandreJik in #23301
- Fix broken links in the agent docs by @sgugger in #23297
- Fix typo in gradio-tools docs by @freddyaboulton in #23305
- Fix image segmentation tool test by @sgugger in #23306
- unpin tf prob by @ydshieh in #23293
- Revert "search buffers for dtype" by @sgugger in #23308
- Remove LanguageIdentificationTool in __init__.py as we don't have it yet by @ydshieh in #23326
- Fix docker image (caused by tensorflow_text) by @ydshieh in #23321
- Compute the mask in-place, with less memory reads, and on CUDA on XLNetLMHeadModel by @lezcano in #23332
- Only add files with modification outside doc blocks by @ydshieh in #23327
- [docs] Fix Agents and Tools docstring by @stevhliu in #23313
- OR am I crazy? by @hwuebben in #23295
- Handle padding warning in generation when using inputs_embeds by @zrthxn in #23131
- replaced assert with raise ValueError for t5, switch_transformers, pix2struct, mt5, longt5, gptsan_japanese. by @susnato in #23273
- Use cu118 with cudnn >= 8.6 in docker file by @ydshieh in #23339
- Removing one of the twice defined position_embeddings in LongFormer by @GregorySenay in #23343
- Fix issue introduced in PR #23163 by @ydshieh in #23363
- Typo suggestion by @richardachen in #23360
- Fix some is_xxx_available by @ydshieh in #23365
- Fix BigBirdForMaskedLM doctest by @ydshieh in #23369
- Fix OwlViTForObjectDetection.image_guided_detection doc example by @ydshieh in #23370
- Revert "Only add files with modification outside doc blocks" by @ydshieh in #23371
- [Bugfix] OPTDecoderLayer does not return attentions when gradient_checkpointing and training is enabled. by @gmlwns2000 in #23367
- Skip failing AlignModelTest::test_multi_gpu_data_parallel_forward by @ydshieh in #23374
- Fix test typos - audio feature extractors by @LWprogramming in #23310
- Added type hints for Graphormer pytorch version by @dewasahu2003 in #23073
- Replace NumPy Operations with JAX NumPy Equivalents for JIT Compilation Compatibility by @gojiteji in #23356
- Use mkstemp to replace deprecated mktemp by @ready-research in #23372
- Fix RwkvModel by @ydshieh in #23392
- Update test_batched_inference_image_captioning_conditioned by @ydshieh in #23391
- OPT/BioGPT: Improved attention mask shape exception by @gante in #23270
- Fix chat prompt in HFAgent by @IvanSedykh in #23335
- 🌐 [i18n-KO] Translated asr.mdx to Korean by @sim-so in #23106
- Minor fixes in transformers-tools by @Wauplin in #23364
- [Pix2Struct] Add conditional generation on docstring example by @younesbelkada in #23399
- Generate: faster can_generate check on TF and Flax by @gante in #23398
- [AutoModel] fix torch_dtype=auto in from_pretrained by @stas00 in #23379
- Docs: add link to assisted generation blog post by @gante in #23397
- Build with non Python files by @sgugger in #23405
- Generate: add test to check KV format by @gante in #23403
- Replace appends with list compr...
v4.29.2: Patch release
Fixes the package so non-Python files (like CUDA kernels) are properly included.
v4.29.1: Patch release
Reverts a regression in the FSDP integration.
Adds pip install transformers["agent"] to have all the dependencies agents rely on.
Fixes the documentation about agents.
- Revert "search buffers for dtype" in #23308 by @sgugger
- Fix image segmentation tool test in #23306 by @sgugger
- Fix typo in gradio-tools docs in #23305 by @freddyaboulton
- Fix broken links in the agent docs in #23297 by @sgugger
- Agents extras in #23301 by @LysandreJik
- Update transformers_agents.mdx in #23289 by @mishig25
- Update custom_tools.mdx: fix link in #23292 by @mishig25
v4.29.0: Transformers Agents, SAM, RWKV, FocalNet, OpenLLaMa
Transformers Agents
Transformers Agent is a new API that lets you use the library and Diffusers by prompting an agent (which is a large language model) in natural language. That agent will then output code using a set of predefined tools, leveraging the appropriate (and state-of-the-art) models for the task the user wants to perform. It is fully multimodal and extensible by the community. Learn more in the docs
- Transformers Agents by @LysandreJik @patrickvonplaten and @sgugger in #23214
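A quick sketch of the prompting workflow, assuming the HfAgent class backed by a free inference endpoint (the endpoint URL and prompt are illustrative):

from transformers import HfAgent

# The agent prompts a remote LLM, which writes code that calls the predefined tools
agent = HfAgent("https://api-inference.huggingface.co/models/bigcode/starcoder")
agent.run("Draw me a picture of rivers and lakes.")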
SAM
SAM (Segment Anything Model) was proposed in Segment Anything by Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alex Berg, Wan-Yen Lo, Piotr Dollar, Ross Girshick.
The model can be used to predict segmentation masks of any object of interest given an input image.
- Add Segment Anything Model (SAM) by @ArthurZucker in #22654
- [SAM] Correct arxiv link by @younesbelkada in #22886
- Fix SAM example in documentation by @fxmarty in #22887
- [SAM] Change to facebook/sam-vit-base by @younesbelkada in #22891
- Small sam patch by @ArthurZucker in #22920
- [SAM] Add sam doc by @younesbelkada in #22984
- Make sam ONNX exportable by @fxmarty in #22915
- DocumentQuestionAnsweringPipeline only for fast ⚡ tokenizers by @ydshieh in #22745
- Add automatic-mask-generation pipeline for Segment Anything Model (SAM) by @ArthurZucker in #22840
- Expose AutoModelForMaskGeneration by @fxmarty in #22910
RWKV
RWKV suggests a tweak to the traditional Transformer attention to make it linear. This way, the model can be used as a recurrent network: passing inputs for timestamp 0 and timestamp 1 together is the same as passing inputs at timestamp 0, then inputs at timestamp 1 along with the state of timestamp 0 (see the example below).
This can be more efficient than a regular Transformer and can deal with sentences of any length (even if the model uses a fixed context length for training).
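A minimal sketch of the recurrent usage described above, assuming the state argument and output of RwkvModel (the RWKV/rwkv-4-169m-pile checkpoint is just an example):

import torch
from transformers import AutoTokenizer, RwkvModel

model = RwkvModel.from_pretrained("RWKV/rwkv-4-169m-pile")
tokenizer = AutoTokenizer.from_pretrained("RWKV/rwkv-4-169m-pile")

inputs = tokenizer("This is an example.", return_tensors="pt")
# Feed the whole sequence at once
whole = model(inputs["input_ids"]).last_hidden_state

# Feed the first two tokens, then the rest along with the returned state
first = model(inputs["input_ids"][:, :2])
rest = model(inputs["input_ids"][:, 2:], state=first.state)

# Both paths should give (numerically close to) the same hidden states
print(torch.allclose(torch.cat([first.last_hidden_state, rest.last_hidden_state], dim=1), whole, atol=1e-5))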
- Add RWKV-4 by @sgugger and @younesbelkada in #22797
FocalNet
The FocalNet model was proposed in Focal Modulation Networks by Jianwei Yang, Chunyuan Li, Xiyang Dai, Lu Yuan, Jianfeng Gao. FocalNets completely replace self-attention (used in models like ViT and Swin) by a focal modulation mechanism for modeling token interactions in vision. The authors claim that FocalNets outperform self-attention based models with similar computational costs on the tasks of image classification, object detection, and segmentation.
- Add FocalNet by @NielsRogge in #21532
- Add focalnet backbone by @alaradirik in #23104
OpenLLaMa
The Open-Llama model was proposed in the Open-Llama project by community developer s-JoL.
The model is mainly based on LLaMA with some modifications, incorporating memory-efficient attention from Xformers, stable embedding from Bloom, and shared input-output embedding from PaLM. The model is pre-trained on both Chinese and English, which gives it better performance on Chinese language tasks.
Assisted Generation
Assisted generation is a new technique that lets you speed up generation with large language models by using a smaller model as an assistant. The assistant model does multiple forward passes while the LLM merely validates the tokens it proposes. This can lead to speed-ups of up to 10x!
- Generate: Add assisted generation by @gante in #22211
- Generate: assisted generation with sample (take 2) by @gante in #22949
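A rough sketch of the API, assuming the assistant_model argument accepted by generate() (the checkpoints are illustrative; the assistant must share the main model's tokenizer):

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/opt-1.3b")
model = AutoModelForCausalLM.from_pretrained("facebook/opt-1.3b")
# A much smaller model from the same family acts as the assistant
assistant = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")

inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, assistant_model=assistant, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))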
Code on the Hub from another repo
To avoid duplicating the model code in multiple repos when using the code on the Hub feature, loading such models will now save in their config the repo in which the code is. This way there is one source of ground truth for code on the Hub models.
- Use code on the Hub from another repo by @sgugger in #22698
- Use code on the Hub from another repo by @sgugger in #22814
Breaking changes
This release has three breaking changes compared to version v4.28.0.
The first one focuses on fixing training issues for Pix2Struct. This slightly affects the results, but should result in the model training much better.
- 🚨🚨🚨 [
Pix2Struct
] Attempts to fix training issues 🚨🚨🚨 by @younesbelkada in #23004
The second one is aligning the ignore index in the LUKE model with other models in the library. This breaks the convention that models should stick to their original implementation, but it was necessary in order to stay consistent with the rest of the library.
Finally, the third breaking change aims to harmonize the training procedure for most recent additions to transformers. It is now users' responsibility to fill the padding tokens of the labels with the correct value. This PR addresses the issue that was raised for other architectures such as Luke or Pix2Struct.
- 🚨🚨🚨 [
Blip
] remove labels masking by @younesbelkada in #23024
Bugfixes and improvements
- Change torch_dtype to str when saved_model=True in save_pretrained for TF models by @ydshieh in #22740
- 🌐 [i18n-KO] Translated training.mdx to Korean by @gabrielwithappy in #22670
- Remove DS_BUILD_AIO=1 by @ydshieh in #22741
- [trainer] update url by @stas00 in #22747
- fix(llama): fix LlamaTokenzier by @rockmagma02 in #22746
- Generate: handle text conditioning with multimodal encoder-decoder models by @gante in #22748
- Revert (for now) the change on Deta in #22437 by @ydshieh in #22750
- Fix serving_output for TF composite models (encoder-decoder like models) by @ydshieh in #22743
- 🌐 [i18n-KO] Translated sequence_classification.mdx to Korean by @0525hhgus in #22655
- [Examples] TPU-based training of a language model using TensorFlow by @sayakpaul in #21657
- Pix2struct: doctest fix by @gante in #22761
- Generate: pin number of beams in BART test by @gante in #22763
- Fix a mistake in Llama weight converter log output. by @aljungberg in #22764
- Fix failing torchscript tests for CpmAnt model by @ydshieh in #22766
- [WIP]🌐 [i18n-KO] Translated tutorial/proprecssing.mdx to Korean by @sim-so in #22578
- Tweak ESM tokenizer for Nucleotide Transformer by @Rocketknight1 in #22770
- Fix word_ids hyperlink by @mayankagarwals in #22765
- Seq2SeqTrainer: Evict decoder_input_ids only when it is created from labels by @gante in #22772
- Indexing fix - CLIP checkpoint conversion by @amyeroberts in #22776
- Move labels to the same device as logits for Whisper by @oscar-garzon in #22779
- Generate: add CJK support to TextStreamer by @bcol23 in #22664
- Fix test_word_time_stamp_integration for Wav2Vec2ProcessorWithLMTest by @ydshieh in #22800
- 🌐 [i18n-KO] Translated custom_models.mdx to Korean by @HanNayeoniee in #22534
- [i18n-KO] fix: docs: ko: sagemaker anchors and _toctree.yml by @jungnerd in #22549
- improve(llama): Faster apply_rotary_pos_emb by @fpgaminer in #22785
- Fix sneaky torch dependency in TF example by @Rocketknight1 in #22804
- 🌐 [i18n-KO] Translated tasks/translation.mdx to Korean by @wonhyeongseo in #22805
- Don't use LayoutLMv2 and LayoutLMv3 in some pipeline tests by @ydshieh in #22774
- Fix squeeze into torch 1.x compatible form in llama model by @DyeKuu in #22808
- Remove accelerate from tf test reqs by @muellerzr in #22777
- Simplify update metadata job by @sgugger in #22811
- Revert "Use code on the Hub from another repo" by @sgugger in #22813
- Introduce PartialState as the device handler in the Trainer by @muellerzr in #22752
- Mark auto models as important by @sgugger in #22815
- TTS fine-tuning for SpeechT5 by @hollance in #21824
- 🌐 [i18n-KO] Fix anchor links for docs auto_tutorial, training by @gabrielwithappy in #22796
- Fix Past CI not running against the latest main by @ydshieh in #22823
- Fix test_eos_token_id_int_and_list_top_k_top_sampling by @ydshieh in #22826
- Update accelerate version + warning check fix by @muellerzr in #22833
- Fix from_pretrained when model is instantiated on the meta device by @sgugger in #22837
- Raise err if minimum Accelerate version isn't available by @muellerzr in #22841
- Make ClipSeg compatible with model parallelism by @youssefadr in #22844
- fix SpeechT5 doc comments by @hollance in #22854
- move preprocess_logits_for_metrics before _nested_gather in trainer.e… by @ChenyangLiu in #22603
- feat(model parallelism): move labels to the same device as logits for M2M100 by @elabongaatuo in #22850
- use accelerate@main in CI by @ydshieh in #22859
- Remove 'main' from doc links by @amyeroberts in #22860
- Show diff between 2 CI runs on Slack reports by @ydshieh in #22798
- Remove some pipeline skip cases by @ydshieh in #22865
- Fixup multigpu local_rank by @muellerzr in #22869
- Fix to removing ESM special tokens by @Rocketknight1 in #22870
- XGLM: Fix left-padding (PT and TF) by @gante in #22828
- Patching clip model to create mask tensor on the device by @shanmugamr1992 in #22711
- fix: Correct small typo in docstring by @oscar-defelice in #22857
- Generation: only search for eos_token if set by @xloem in #22875
- Change schedule CI time by @ydshieh in #22884
- fix warning function call creating logger error (max_length and max_new_tokens) by @QuentinAmbard in #22889
- [Examples/TensorFlow] minor refactoring to allow compatible datasets to work by @sayakpaul in #22879
- moved labels to the same device as logits for OTP, CODEGEN ,gptj and pixel2struct model by @sushmanthreddy in #2...
v4.28.1: Patch release
v4.28.0: LLaMa, Pix2Struct, MatCha, DePlot, MEGA, NLLB-MoE, GPTBigCode
LLaMA
The LLaMA model was proposed in LLaMA: Open and Efficient Foundation Language Models. It is a collection of foundation language models ranging from 7B to 65B parameters. You can request access to the weights here, then use the conversion script to generate a checkpoint compatible with Hugging Face.
Pix2Struct, MatCha, DePlot
Pix2Struct is a pretrained image-to-text model for purely visual language understanding, which can be finetuned on tasks containing visually-situated language. Pix2Struct has been fine-tuned on various tasks and datasets, ranging from image captioning and visual question answering (VQA) over different inputs (books, charts, science diagrams) to captioning UI components, and others.
- Add Pix2Struct by @younesbelkada in #21400
- Add DePlot + MatCha on transformers by @younesbelkada in #22528
Mega
MEGA proposes a new approach to self-attention with each encoder layer having a multi-headed exponential moving average in addition to a single head of standard dot-product attention, giving the attention mechanism stronger positional biases. This allows MEGA to perform competitively to Transformers on standard benchmarks including LRA while also having significantly fewer parameters. MEGA’s compute efficiency allows it to scale to very long sequences, making it an attractive option for long-document NLP tasks.
GPTBigCode
The model is an optimized GPT-2 model with support for Multi-Query Attention.
- Add GPTBigCode model (Optimized GPT2 with MQA from Santacoder & BigCode) by @jlamypoirier in #22575
NLLB-MoE
The mixture of experts version of the NLLB release has been added to the library.
- NLLB-MoE Adds the moe model by @ArthurZucker in #22024
Serializing 8bit models
- [bnb] Let's make serialization of int8 models possible by @younesbelkada in #22177
You can now push 8bit models and/or load 8bit models directly from the Hub, save memory and load your 8bit models faster! An example repo here
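As a rough sketch of the workflow (the checkpoint and repo names are placeholders; bitsandbytes must be installed and you need to be logged in to the Hub):

from transformers import AutoModelForCausalLM

# Load a model in 8-bit with bitsandbytes
model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m", load_in_8bit=True, device_map="auto")
# The quantized weights can now be pushed to (and later loaded from) the Hub directly
model.push_to_hub("my-username/opt-350m-8bit")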
Breaking Changes
Ordering of height and width for the BLIP image processor
Notes from the PR:
The BLIP image processor incorrectly passed in the dimensions to resize in the order (width, height). This is reordered to be correct.
In most cases, this won't have an effect as the default height and width are the same. However, this is not backwards compatible for custom configurations with different height, width settings and direct calls to the resize method with different height, width values.
- 🚨🚨🚨 Fix ordering of height, width for BLIP image processor by @amyeroberts in #22466
Prefix tokens for the NLLB tokenizer
The big problem was the prefix and suffix tokens of the NLLB tokenizer.
Previous behaviour:
>>> from transformers import NllbTokenizer
>>> tokenizer = NllbTokenizer.from_pretrained("facebook/nllb-200-distilled-600M")
>>> tokenizer("How was your day?").input_ids
[13374, 1398, 4260, 4039, 248130, 2, 256047]
>>> # 2: '</s>'
>>> # 256047 : 'eng_Latn'
New behaviour:
>>> from transformers import NllbTokenizer
>>> tokenizer = NllbTokenizer.from_pretrained("facebook/nllb-200-distilled-600M")
>>> tokenizer("How was your day?").input_ids
[256047, 13374, 1398, 4260, 4039, 248130, 2]
In case you have pipelines that were relying on the old behavior, here is how you would enable it once again:
>>> from transformers import NllbTokenizer
>>> tokenizer = NllbTokenizer.from_pretrained("facebook/nllb-200-distilled-600M", legacy_behaviour=True)
- 🚨🚨🚨
[NLLB Tokenizer]
Fix the prefix tokens 🚨🚨🚨 by @ArthurZucker in #22313
TensorFlow ports
The BLIP model is now available in TensorFlow.
- Add TF port of BLIP by @Rocketknight1 in #22090
Export TF Generate with a TF tokenizer
As the title says, this PR adds the possibility to export TF generate with a TF-native tokenizer -- the full thing in a single TF graph.
Task guides
A new task guide has been added, focusing on depth estimation.
- Depth estimation task guide by @MKhalusova in #22205
Bugfixes and improvements
- Load optimizer state on CPU to avoid CUDA OOM by @sgugger in #22159
- Run all tests by default by @sgugger in #22162
- Fix: unfinished_sequences with correct device by @Stxr in #22184
- Revert 22152 MaskedImageCompletionOutput changes by @amyeroberts in #22187
- Regression pipeline device by @sgugger in #22190
- Update BridgeTowerForContrastiveLearning by @abhiwand in #22145
- t5 remove data dependency by @prathikr in #22097
- Fix DeepSpeed CI by @ydshieh in #22194
- Fix typo in Align docs by @alaradirik in #22199
- Update expected values in MgpstrModelIntegrationTest by @ydshieh in #22195
- Italian Translation of migration.mdx by @Baelish03 in #22183
- Update tiny model creation script by @ydshieh in #22202
- Temporarily fix ONNX model exporting error by @SatyaJandhyalaAtMS in #21830
- [XGLM] Add accelerate support for XGLM by @younesbelkada in #22207
- fixes a typo in WhisperFeatureExtractor docs. by @susnato in #22208
- Hotfix for natten issue with torch 2.0.0 on CircleCI by @ydshieh in #22218
- fix typos in llama.mdx by @keturn in #22223
- fix code example in mgp-str doc by @wdp-007 in #22219
- Use dash==2.8.1 for now for daily CI by @ydshieh in #22227
- LLaMA house-keeping by @sgugger in #22216
- fix AutoTP in deepspeed could not work for bloom by @sywangyi in #22196
- Add LlamaForSequenceClassification by @lewtun in #22209
- Removed .mdx extension in two links by @MKhalusova in #22230
- fix(docs): fix task guide links in model docs by @Seb0 in #22226
- Fix natten by @alihassanijr in #22229
- Revert "Use dash==2.8.1 for now for daily CI" by @ydshieh in #22233
- Fix Unnecessary move of tensors from CPU to GPU in LlamaRotaryEmbedding by @ma787639046 in #22234
- [trainer] param count for deepspeed zero3 by @stas00 in #22193
- Update training_args.py -- a nightly install is not required anymore for torch.compile by @pminervini in #22266
- [Docs] fix typos in some tokenizer docs by @yesinkim in #22256
- Italian translation perf_infer_cpu by @nickprock in #22243
- [Trainer] Add optional communication backends for torch.distributed when using GPU by @heya5 in #22247
- Fix the gradient checkpointing bug of the llama model by @yqy2001 in #22270
- Fix balanced and auto device_map by @sgugger in #22271
- Rework a bit the LLaMA conversion script by @sgugger in #22236
- Proper map location for optimizer load by @sgugger in #22273
- Fix doc links by @amyeroberts in #22274
- Move torch.compile() wrapping after DDP/FSDP wrapping to ensure correct graph breaks during training by @ani300 in #22279
- Example of pad_to_multiple_of for padding and truncation guide & docstring update by @MKhalusova in #22278
- Update vision docstring bool masked pos by @amyeroberts in #22237
- replace_8bit_linear modules_to_not_convert default value fix by @BlackSamorez in #22238
- Fix error in mixed precision training of TFCvtModel by @gcuder in #22267
- More doctests by @ydshieh in #22268
- fix more doctests by @ydshieh in #22292
- Add translation perf_infer_gpu_one for it by @davidegazze in #22296
- Restore fp16 support on xla gpu device by @ymwangg in #22300
- Correct NATTEN function signatures and force new version by @alihassanijr in #22298
- [deepspeed] offload + non-cpuadam optimizer exception doc by @stas00 in #22044
- Final update of doctest by @ydshieh in #22299
- Add MaskedImageModelingOutput by @alaradirik in #22212
- Enable traced model for text-generation task by @jiqing-feng in #22265
- add low_cpu_mem_usage option in run_clm.py example which will benefit… by @sywangyi in #22288
- fix: Allow only test_file in pytorch and flax summarization by @connor-henderson in #22293
- Fix position embeddings for GPT-J and CodeGen by @njhill in #22069
- Fixed bug to calculate correct xpath_sub_list in MarkupLMTokenizer by @silentghoul-spec in #22302
- Enforce max_memory for device_map strategies by @sgugger in #22311
- Beef up Llama tests by @gante in #22314
- docs: Resolve incorrect type typo in trainer methods by @tomaarsen in #22316
- Chunkable token classification pipeline by @luccailliau in #21771
- Fix PipelineTests skip conditions by @ydshieh in #22320
- [deepspeed zero3] need generate(synced_gpus=True, ...) by @stas00 in #22242
- [gptj] support older pytorch version by @stas00 in #22325
- Move common properties to BackboneMixin by @amyeroberts in #21855
- Backbone add mixin tests by @amyeroberts in #22542
- Backbone add out indices by @amyeroberts in #22493
- [MBart] Add accelerate support for MBart by @younesbelkada in #22309
- Fixed gradient checkpoint bug for TimeSeriesTransformer by @mollerup23 in #22272
- Mention why one needs to specify max_steps in Trainer by @lhoestq in #22333
- Fix various imports by @sgugger in #22281
- Minor typo in pipeline FillMaskPipeline's documentation. by @SamuelLarkin in #22339
- Added type hints to TFDeiTModel by @Batese2001 in #22327
- Fix --bf16 option support for Neuron after PR #22300 by @jeffhataws in #22307
- Generate: add test for left-padding support by @gante in #22322
- Enable training Llama with model or pipeline parallelism by @kooshi in #22329
- Automatically create/update tiny models by @ydshieh in #22275
- [HFTracer] Make embeddings ops take on the dtype of the weight by @jamesr66a in #22347
- Fix...