v4.30.0: 100k, Agents improvements, Safetensors core dependency, Swiftformer, Autoformer, MobileViTv2, timm-as-a-backbone
100k
Transformers has just reached 100k stars on GitHub! To celebrate, we wanted to highlight 100 projects built around transformers, so we created an awesome-transformers page to do just that.
We accept PRs to add projects to the list!
- Top 100 by @LysandreJik in #22912
- Add LlamaIndex to awesome-transformers.md by @ravi03071991 in #23484
- add cleanlab to awesome-transformers tools list by @jwmueller in #23440
4-bit quantization and QLoRA
By leveraging the bitsandbytes library by @TimDettmers, we add 4-bit support to transformers models!
- 4-bit QLoRA via bitsandbytes (4-bit base model + LoRA) by @TimDettmers in #23479
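To illustrate the general idea behind blockwise 4-bit quantization, here is a minimal sketch. Each block of weights is stored as small integer codes plus one floating-point scale. QLoRA then keeps this quantized base model frozen and trains LoRA adapters on top of it. This is not the bitsandbytes implementation. Its 4-bit formats, such as the NF4 data type used by QLoRA, rely on a non-uniform codebook and larger blocks, whereas this sketch swaps in plain uniform levels in [-7, 7]. The helper names `quantize_4bit` and `dequantize_4bit` are hypothetical.

```python
# Hypothetical helpers illustrating blockwise 4-bit quantization; not the bitsandbytes API.
# Uses uniform signed levels in [-7, 7] instead of the NF4 codebook.

def quantize_4bit(values, block_size=4):
    """Quantize floats to 4-bit integer codes with one absmax scale per block."""
    codes, scales = [], []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        absmax = max(abs(v) for v in block) or 1.0  # all-zero block: any scale works
        scale = absmax / 7  # map [-absmax, absmax] onto [-7, 7]
        scales.append(scale)
        codes.extend(max(-7, min(7, round(v / scale))) for v in block)
    return codes, scales


def dequantize_4bit(codes, scales, block_size=4):
    """Recover approximate floats: code * scale of the block it belongs to."""
    return [code * scales[i // block_size] for i, code in enumerate(codes)]


weights = [0.5, -1.0, 0.25, 0.0, 2.0, -2.0, 1.0, 0.1]
codes, scales = quantize_4bit(weights)
approx = dequantize_4bit(codes, scales)  # each value is within scale / 2 of the original
```

Storing one scale per block, rather than one per tensor, keeps a single outlier from degrading the precision of every other weight. In transformers itself, you request 4-bit loading through the bitsandbytes integration (for example `load_in_4bit=True`) rather than by calling helpers like these.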
Agents
The Agents framework has been improved and continues to be stabilized. Along with bug fixes, these are the most important new features:
- Local agent capabilities, to load a generative model directly from transformers instead of relying on APIs.
- Prompts are now hosted on the Hub, which means that anyone can fork the prompts and update them with their own, so that other community contributors can reuse them.
- An AzureOpenAiAgent class to support Azure OpenAI agents.
- Add local agent by @sgugger in #23438
- Enable prompts on the Hub by @sgugger in #23662
- Add AzureOpenAiAgent by @sgugger in #24058
Safetensors
The safetensors library is a safe serialization framework for machine learning tensors. It has been audited and will become the default serialization framework for several organizations (Hugging Face, EleutherAI, Stability AI). It is now a core dependency of transformers.
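The safetensors file layout is simple. It consists of an 8-byte little-endian header length, then a JSON header that maps each tensor name to its dtype, shape and byte offsets, and finally the raw tensor bytes. Reading it involves no pickle, so loading a file cannot execute arbitrary code. The stdlib-only sketch below writes and reads that layout for float32 data, to show why the format is easy to audit. It is illustrative only: it skips the validation the real library performs, the function names are hypothetical, and in practice you would use the safetensors package itself.

```python
import json
import struct

# Hypothetical helpers mimicking the safetensors layout for float32 tensors only.

def save_f32_tensors(tensors):
    """Serialize {name: (shape, flat list of floats)} into safetensors-style bytes."""
    header, buffer = {}, bytearray()
    for name, (shape, values) in tensors.items():
        begin = len(buffer)
        buffer += struct.pack(f"<{len(values)}f", *values)  # little-endian float32
        # Offsets are relative to the start of the data buffer, not of the file.
        header[name] = {"dtype": "F32", "shape": list(shape), "data_offsets": [begin, len(buffer)]}
    header_bytes = json.dumps(header).encode("utf-8")
    return struct.pack("<Q", len(header_bytes)) + header_bytes + bytes(buffer)


def load_f32_tensors(blob):
    """Parse the bytes back into {name: (shape, values)} without executing any code."""
    (header_len,) = struct.unpack("<Q", blob[:8])  # u64 header size
    header = json.loads(blob[8:8 + header_len])
    data = blob[8 + header_len:]
    out = {}
    for name, info in header.items():
        if name == "__metadata__":  # optional free-form metadata entry
            continue
        begin, end = info["data_offsets"]
        count = (end - begin) // 4  # 4 bytes per float32
        out[name] = (tuple(info["shape"]), list(struct.unpack(f"<{count}f", data[begin:end])))
    return out
```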
New models
Swiftformer
The SwiftFormer paper introduces a novel efficient additive attention mechanism that replaces the quadratic matrix multiplication operations in the self-attention computation with linear element-wise multiplications. The resulting series of models, called SwiftFormer, achieves state-of-the-art performance in terms of both accuracy and mobile inference speed. Even the small variant achieves 78.5% top-1 ImageNet-1K accuracy with only 0.8 ms latency on an iPhone 14, making it more accurate and 2× faster than MobileViT-v2.
- Add swiftformer by @shehanmunasinghe in #22686
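The following simplified sketch shows why efficient additive attention is linear in the number of tokens. The queries are first pooled into a single global query using one learned weight vector. That global query then interacts with every key element-wise, so no n×n attention matrix is ever formed. Several details are assumptions, not the SwiftFormer code. The per-token weights are L2-normalized over tokens, which is our reading of the reference implementation. The learned output projections are omitted, i.e. treated as identity. The names `additive_attention` and `w_a` are hypothetical.

```python
import math

def additive_attention(queries, keys, w_a):
    """Linear-time additive attention sketch over n tokens with d features each."""
    d = len(w_a)
    # One scalar score per token: project each query onto the learned vector w_a.
    scores = [sum(q_j * w_j for q_j, w_j in zip(q, w_a)) / math.sqrt(d) for q in queries]
    # Normalize the scores over tokens (L2 normalization here; an assumption).
    norm = math.sqrt(sum(s * s for s in scores)) or 1.0
    alphas = [s / norm for s in scores]
    # Pool all queries into a single global query vector: O(n * d).
    global_q = [sum(a * q[j] for a, q in zip(alphas, queries)) for j in range(d)]
    # Element-wise product with every key plus a query residual: O(n * d).
    # (Learned output projections are omitted, i.e. taken as identity.)
    return [[g * k_j + q_j for g, k_j, q_j in zip(global_q, k, q)] for k, q in zip(keys, queries)]
```

By contrast, standard self-attention computes a score for every pair of tokens. That is where its quadratic cost in sequence length comes from.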
Autoformer
Autoformer turns the Transformer into a deep decomposition architecture that progressively decomposes the trend and seasonal components of a time series during the forecasting process.
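The building block behind this design is a series decomposition. A moving average extracts the slowly varying trend, and the remainder is treated as the seasonal part. Autoformer applies such blocks repeatedly inside its encoder and decoder. The minimal sketch below handles a single variable with an odd kernel size. It pads the ends by repeating the first and last values, which we understand the reference implementation to do. The function name `series_decomp` is our own choice.

```python
def series_decomp(series, kernel_size=3):
    """Split a 1-D series into (seasonal, trend) using a moving average."""
    if kernel_size % 2 == 0:
        raise ValueError("this sketch assumes an odd kernel_size")
    pad = (kernel_size - 1) // 2
    # Replicate the end points so the trend has the same length as the input.
    padded = [series[0]] * pad + list(series) + [series[-1]] * pad
    trend = [sum(padded[i:i + kernel_size]) / kernel_size for i in range(len(series))]
    # Whatever the moving average does not explain is the seasonal component.
    seasonal = [x - t for x, t in zip(series, trend)]
    return seasonal, trend
```

Because seasonal = series - trend, the two parts always sum back to the input. Each decomposition block therefore separates the signal without losing information.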
MobileViTv2
MobileViTv2 is the second version of MobileViT, constructed by replacing the multi-head self-attention in MobileViT with separable self-attention.
- Add MobileViTv2 by @shehanmunasinghe in #22820
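As we read the MobileViTv2 paper, separable self-attention works in three branches. First, a linear layer maps each token to a single score, and a softmax over tokens turns these into context scores. Second, the context scores weight the key projections into one context vector. Third, that context vector scales the ReLU-activated value projections element-wise before an output projection. This costs O(k·d) for k tokens instead of O(k²·d). The sketch below uses plain Python lists, and the weight names `w_i`, `w_k`, `w_v` and `w_o` are hypothetical.

```python
import math

def matvec(weight, x):
    """Multiply a (rows x cols) weight matrix by a vector of length cols."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


def separable_self_attention(tokens, w_i, w_k, w_v, w_o):
    """Separable self-attention sketch over a list of d-dimensional tokens."""
    # Branch I: one scalar score per token, softmax over tokens -> context scores.
    scores = [sum(w * v for w, v in zip(w_i, t)) for t in tokens]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    context_scores = [e / total for e in exps]
    # Branch K: the context vector is the score-weighted sum of key projections.
    keys = [matvec(w_k, t) for t in tokens]
    d = len(keys[0])
    context = [sum(c * k[j] for c, k in zip(context_scores, keys)) for j in range(d)]
    # Branch V: ReLU(value projection), scaled element-wise by the shared context vector.
    out = []
    for t in tokens:
        v = [max(0.0, x) for x in matvec(w_v, t)]
        out.append(matvec(w_o, [c * x for c, x in zip(context, v)]))
    return out
```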
PerSAM
PerSAM proposes a minimal modification to SAM that allows DreamBooth-like personalization, enabling it to segment a concept in new images from just one example.
- Add PerSAM [bis] by @NielsRogge in #23659
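One ingredient of PerSAM is a location prior. A target embedding is averaged from the reference image's features inside the single given mask. Each location of the new image is then scored by cosine similarity to that embedding. The most and least similar locations become positive and negative point prompts for SAM. The sketch below shows only this step, on plain feature vectors with locations flattened into a list. Real features would come from SAM's image encoder, and the function names are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def location_prior(ref_feats, ref_mask, test_feats):
    """Pick positive/negative point prompts for a new image from one reference mask."""
    # Target embedding: average of the reference features inside the mask.
    inside = [f for f, m in zip(ref_feats, ref_mask) if m]
    dim = len(inside[0])
    target = [sum(f[j] for f in inside) / len(inside) for j in range(dim)]
    # Confidence map: similarity of every test-image location to the target.
    conf = [cosine(target, f) for f in test_feats]
    positive = max(range(len(conf)), key=conf.__getitem__)  # most target-like location
    negative = min(range(len(conf)), key=conf.__getitem__)  # least target-like location
    return positive, negative, conf
```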
Timm backbone
We add support for loading timm weights within the AutoBackbone API in transformers. timm models can be instantiated through the TimmBackbone class, and then used with any vision model that needs a backbone.
- Add TimmBackbone model by @amyeroberts in #22619
Image to text pipeline conditional support
We add conditional text generation to the image-to-text pipeline, allowing the model to continue an initial text prompt conditioned on an image.
- [image-to-text pipeline] Add conditional text support + GIT by @NielsRogge in #23362
TensorFlow implementations
- Add TensorFlow implementation of EfficientFormer by @D-Roberts in #22620
Accelerate Migration
A major rework of the Trainer internals is underway, leveraging accelerate rather than reimplementing the same logic in transformers. This should unify both frameworks, increase interoperability and make development more efficient.
- Smangrul/accelerate mp integrate by @pacman100 in #23148
- Smangrul/accelerate ddp integrate by @pacman100 in #23151
- fix trainer slow tests related to hyperparam search by @pacman100 in #24011
- remove the extra accelerator.prepare by @pacman100 in #23914
- move fsdp handling to accelerate by @pacman100 in #23158
- shift torch dynamo handling to accelerate by @pacman100 in #23168
- accelerate deepspeed and gradient accumulation integrate by @pacman100 in #23236
- fix executable batch size issue by @pacman100 in #24067
- fix accelerator prepare during eval only mode by @pacman100 in #24014
- reset accelerate env variables after each test by @pacman100 in #24107
- Fix translation no_trainer by @muellerzr in #23407
- Update error message when Accelerate isn't installed by @muellerzr in #23373
- Fix parallel mode check by @muellerzr in #23409
- Muellerzr fix deepspeed by @muellerzr in #23657
- Update all no_trainer with skip_first_batches by @muellerzr in #23664
- Fix sagemaker DP/MP by @muellerzr in #23681
- Log the right train_batch_size if using auto_find_batch_size and also log the adjusted value separately. by @muellerzr in #23800
- Up pinned accelerate version by @muellerzr in #24089
- Move import check to before state reset by @muellerzr in #23906
- Upgrade safetensors version by @muellerzr in #23911
- Act on deprecations in Accelerate no_trainer examples by @muellerzr in #24053
- Oops, missed one by @muellerzr in #24054
Bugfixes and improvements
- chore: allow protobuf 3.20.3 requirement by @jose-turintech in #22759
- Bring back the PR "Refactor doctests + add CI" to main by @ydshieh in #23271
- [gpt] Gpt2 fix half precision causal mask by @younesbelkada in #23256
- Temporary tolerance fix for flaky Whisper PT-TF equiv. test by @amyeroberts in #23257
- Add top_k argument to post-process of conditional/deformable-DETR by @CreatlV in #22787
- transformers-cli -> huggingface-cli by @AlpinDale in #23276
- Temporarily increase tol for PT-FLAX whisper tests by @amyeroberts in #23288
- Added missing " in CHAT_PROMPT_TEMPLATE by @galatolofederico in #23287
- Convert numpy arrays to lists before saving the evaluation metrics as json by @harisankar95 in #23268
- skip test_run_squad_no_trainer for now by @ydshieh in #23302
- Add gradient_checkpointing parameter to FlaxWhisperEncoder by @raghavanone in #23300
- Agents extras by @LysandreJik in #23301
- Fix typo in gradio-tools docs by @freddyaboulton in #23305
- Remove LanguageIdentificationTool in __init__.py as we don't have it yet by @ydshieh in #23326
- Fix docker image (caused by tensorflow_text) by @ydshieh in #23321
- Compute the mask in-place, with less memory reads, and on CUDA on XLNetLMHeadModel by @lezcano in #23332
- Only add files with modification outside doc blocks by @ydshieh in #23327
- [docs] Fix Agents and Tools docstring by @stevhliu in #23313
- Handle padding warning in generation when using inputs_embeds by @zrthxn in #23131
- replaced assert with raise ValueError for t5, switch_transformers, pix2struct, mt5, longt5, gptsan_japanese. by @susnato in #23273
- Use cu118 with cudnn >= 8.6 in docker file by @ydshieh in #23339
- Removing one of the twice defined position_embeddings in LongFormer by @GregorySenay in #23343
- Typo suggestion by @richardachen in #23360
- Fix OwlViTForObjectDetection.image_guided_detection doc example by @ydshieh in #23370
- Revert "Only add files with modification outside doc blocks" by @ydshieh in #23371
- [Bugfix] OPTDecoderLayer does not return attentions when gradient_checkpointing and training is enabled. by @gmlwns2000 in #23367
- Skip failing AlignModelTest::test_multi_gpu_data_parallel_forward by @ydshieh in #23374
- Fix test typos - audio feature extractors by @LWprogramming in #23310
- Added type hints for Graphormer pytorch version by @dewasahu2003 in #23073
- Replace NumPy Operations with JAX NumPy Equivalents for JIT Compilation Compatibility by @gojiteji in #23356
- Use mkstemp to replace deprecated mktemp by @ready-research in #23372
- Update test_batched_inference_image_captioning_conditioned by @ydshieh in #23391
- OPT/BioGPT: Improved attention mask shape exception by @gante in #23270
- Fix chat prompt in HFAgent by @IvanSedykh in #23335
- 🌐 [i18n-KO] Translated asr.mdx to Korean by @sim-so in #23106
- [Pix2Struct] Add conditional generation on docstring example by @younesbelkada in #23399
- Generate: faster can_generate check on TF and Flax by @gante in #23398
- [AutoModel] fix torch_dtype=auto in from_pretrained by @stas00 in #23379
- Docs: add link to assisted generation blog post by @gante in #23397
- Replace appends with list comprehension. by @ttsugriy in #23359
- Why crash the whole run when HFHub gives a 50x error? by @ropoctl in #23320
- Run doctest (in PRs) only when some doc example(s) are modified by @ydshieh in #23387
- Update ConvNextV2ModelIntegrationTest::test_inference_image_classification_head by @ydshieh in #23402
- Use dict.items to avoid unnecessary lookups. by @ttsugriy in #23415
- [SAM] fix sam slow test by @younesbelkada in #23376
- Return early once stop token is found. by @ttsugriy in #23421
- [Reland] search model buffers for dtype as the last resort by @cyyever in #23319
- Add Missing tokenization test [electra] by @IMvision12 in #22997
- Small fixes and link in the README by @LysandreJik in #23428
- TF: embeddings out of bounds check factored into function by @gante in #23427
- Encoder-Decoder: add informative exception when the decoder is not compatible by @gante in #23426
- Remove hardcoded prints in Trainer by @hugoabonizio in #23432
- Fix device issue in SwiftFormerModelIntegrationTest::test_inference_image_classification_head by @ydshieh in #23435
- Generate: skip left-padding tests on old models by @gante in #23437
- remove unnecessary print in gpt neox sequence classifier by @cfhammill in #23433
- 🌐 [i18n-KO] Translated tasks/zero_shot_object_detection.mdx to Korean by @HanNayeoniee in #23430
- Fix (skip) a pipeline test for RwkvModel by @ydshieh in #23444
- Fix DecisionTransformerConfig docstring by @joaoareis in #23450
- Make RwkvModel accept attention_mask but discard it internally by @ydshieh in #23442
- Less flaky test_assisted_decoding_matches_greedy_search by @ydshieh in #23451
- Add an option to log result from the Agent by @sgugger in #23454
- fix bug in group_texts function, that was inserting short batches by @BodaSadalla98 in #23429
- feat: Whisper prompting by @connor-henderson in #22496
- Remove .data usages in optimizations.py by @alanwaketan in #23417
- TF port of the Segment Anything Model (SAM) by @Rocketknight1 in #22970
- [RWKV] Rwkv fix for 8bit inference by @younesbelkada in #23468
- Use config to set name and description if not present by @sgugger in #23473
- Fix PretrainedConfig min_length docstring by @joaoareis in #23471
- Fix: Change tensors to integers for torch.dynamo and torch.compile compatibility by @loevlie in #23475
- [Blip] Remove redundant shift right by @younesbelkada in #23153
- Fix confusing transformers installation in CI by @ydshieh in #23465
- Fix tests/repo_utils/test_get_test_info.py by @ydshieh in #23485
- Debug example code for MegaForCausalLM by @Tylersuard in #23382
- Fix tensor device while attention_mask is not None by @zspo in #23538
- Fix accelerate logger bug by @younesbelkada in #23650
- Bugfix: LLaMA layer norm incorrectly changes input type and consumes lots of memory by @TimDettmers in #23535
- Fix wav2vec2 is_batched check to include 2-D numpy arrays by @LWprogramming in #23223
- changing the requirements to a cpu torch version that works by @sshahrokhi in #23483
- Fix SAM tests and use smaller checkpoints by @Rocketknight1 in #23656
- small fix to remove unused eos in processor when it's not used. by @Narsil in #23408
- Fix typo in a parameter name for open llama model by @aaalexlit in #23637
- 🌐 [i18n-KO] Translated tasks/monocular_depth_estimation.mdx to Korean by @HanNayeoniee in #23621
- [SAM] Fixes pipeline and adds a dummy pipeline test by @younesbelkada in #23684
- TF version compatibility fixes by @Rocketknight1 in #23663
- [Blip] Fix blip doctest by @younesbelkada in #23698
- is_batched fix for remaining 2-D numpy arrays by @LWprogramming in #23309
- Skip TFCvtModelTest::test_keras_fit_mixed_precision for now by @ydshieh in #23699
- fix: load_best_model_at_end error when load_in_8bit is True by @dkqkxx in #23443
- add GPTJ/bloom/llama/opt into model list and enhance the jit support by @sywangyi in #23291
- Paged Optimizer + Lion Optimizer for Trainer by @TimDettmers in #23217
- Export to ONNX doc refocused on using optimum, added tflite by @MKhalusova in #23434
- fix: use bool instead of uint8/byte in Deberta/DebertaV2/SEW-D to make it compatible with TensorRT by @uchuhimo in #23683
- Better TF docstring types by @Rocketknight1 in #23477
- TF SAM memory reduction by @Rocketknight1 in #23732
- fix: delete duplicate sentences in document_question_answering.mdx by @jungnerd in #23735
- fix: Whisper generate, move text_prompt_ids trim up for max_new_tokens calculation by @connor-henderson in #23724
- Overhaul TF serving signatures + dummy inputs by @Rocketknight1 in #23234
- [Whisper] Reduce batch size in tests by @sanchit-gandhi in #23736
- Fix the regex in get_imports to support multiline try blocks and excepts with specific exception types by @dakinggg in #23725
- Remove the last few TF serving sigs by @Rocketknight1 in #23738
- Fix pip install --upgrade accelerate command in modeling_utils.py by @tloen in #23747
- Fix push_to_hub in Trainer when nothing needs pushing by @sgugger in #23751
- Revamp test selection for the example tests by @sgugger in #23737
- [LongFormer] code nits, removed unused parameters by @ArthurZucker in #23749
- [Nllb-Moe] Fix nllb moe accelerate issue by @younesbelkada in #23758
- [OPT] Doc nit, using fast is fine by @ArthurZucker in #23789
- Update trainer.mdx class_weights example by @amitportnoy in #23787
- no_cuda does not take effect in non distributed environment by @sywangyi in #23795
- Enable code-specific revision for code on the Hub by @sgugger in #23799
- add type hint in pipeline model argument by @y3sar in #23740
- TF SAM shape flexibility fixes by @Rocketknight1 in #23842
- 🌐 [i18n-KO] Translated fast_tokenizers.mdx to Korean by @kihoon71 in #22956
- [i18n-KO] Translated video_classification.mdx to Korean by @kihoon71 in #23026
- 🌐 [i18n-KO] Translated troubleshooting.mdx to Korean by @0525hhgus in #23166
- Adds a FlyteCallback by @peridotml in #23759
- Update collating_graphormer.py by @clefourrier in #23862
- [LlamaTokenizerFast] nit update post_processor on the fly by @ArthurZucker in #23855
- #23388 Issue: Update RoBERTa configuration by @vijethmoudgalya in #23863
- [from_pretrained] improve the error message when _no_split_modules is not defined by @ArthurZucker in #23861
- Editing issue with pickle def with lambda function by @Natyren in #23869
- Adds AutoProcessor.from_pretrained support for MCTCTProcessor by @Ubadub in #23856
- 🌐 [i18n-KO] Translated pad_truncation.mdx to Korean by @sim-so in #23823
- Fix bug leading to missing token in GPTSanJapaneseTokenizer by @passaglia in #23883
- Fix last instances of kbit -> quantized by @sgugger in #23797
- fix(configuration_llama): add keys_to_ignore_at_inference to LlamaConfig by @calico-1226 in #23891
- Fix Trainer when model is loaded on a different GPU by @sgugger in #23792
- Support shared tensors by @thomasw21 in #23871
- ensure banned_mask and indices in same device by @cauyxy in #23901
- Unpin numba by @sanchit-gandhi in #23162
- [bnb] add warning when no linear by @younesbelkada in #23894
- fix: Replace add_prefix_space in get_prompt_ids with manual space for FastTokenizer compatibility by @connor-henderson in #23796
- [RWKV] Fix RWKV 4bit by @younesbelkada in #23910
- add conditional statement for auxiliary loss calculation by @harisankar95 in #23899
- Raise error if loss can't be calculated - ViT MIM by @amyeroberts in #23872
- Bug fix - flip_channel_order for channels first images by @amyeroberts in #23701
- Update the update metadata job to use upload_folder by @sgugger in #23917
- [PushToHub] Make it possible to upload folders by @NielsRogge in #23920
- Skip device placement for past key values in decoder models by @sgugger in #23919
- [Flax Whisper] Update decode docstring by @sanchit-gandhi in #23908
- Effectively allow encoder_outputs input to be a tuple in pix2struct by @fxmarty in #23932
- rename DocumentQuestionAnsweringTool parameter input to match docstring by @Adam-D-Lewis in #23939
- Update stale.yml to use HuggingFaceBot by @LysandreJik in #23941
- Make TF ESM inv_freq non-trainable like PyTorch by @Rocketknight1 in #23940
- Revert "Update stale.yml to use HuggingFaceBot" by @LysandreJik in #23943
- #23675 Registering Malay language by @soongbren in #23689
- Modify device_map behavior when loading a model using from_pretrained by @SunMarc in #23922
- use _make_causal_mask in clip/vit models by @kashif in #23942
- Fix ReduceLROnPlateau object has no attribute 'get_last_lr' by @wasupandceacar in #23944
- [MMS] Scaling Speech Technology to 1,000+ Languages | Add attention adapter to Wav2Vec2 by @patrickvonplaten in #23813
- add new mms functions to doc by @patrickvonplaten in #23954
- 🌐 [i18n-KO] Translated object_detection.mdx to Korean by @kihoon71 in #23164
- Trainer: fixed evaluate raising KeyError for ReduceLROnPlateau by @claudius-kienle in #23952
- [Whisper Tokenizer] Skip special tokens when decoding with timestamps by @sanchit-gandhi in #23945
- Add an option to reduce compile() console spam by @Rocketknight1 in #23938
- Fix typo in doc comment of BitsAndBytesConfig by @ledyba in #23978
- Skip test_multi_gpu_data_parallel_forward for MobileViTV2ModelTest by @ydshieh in #24017
- Auto tokenizer registration by @Bearnardd in #23965
- expose safe_serialization argument in the pipeline API by @yessenzhar in #23775
- Pix2Struct: fix wrong broadcast axis of attention mask in visual encoder by @affjljoo3581 in #23976
- TensorBoard callback no longer adds hparams by @bri25yu in #23999
- 🌐 [i18n-KO] Translated tasks_explained.mdx to Korean by @0525hhgus in #23844
- 🌐 [i18n-KO] Translated language-modeling.mdx by @wonhyeongseo in #23969
- 🌐 [i18n-KO] Translated bertology.mdx to Korean by @wonhyeongseo in #23968
- Use TruncatedNormal from Keras initializers by @hvaara in #24036
- Prevent ZeroDivisionError on trainer.evaluate if model and dataset are tiny by @tomaarsen in #24049
- Modification of one text example file should trigger said test by @sgugger in #24051
- Tiny fix for check_self_hosted_runner.py by @ydshieh in #24052
- Reduce memory usage in TF building by @Rocketknight1 in #24046
- Move TF building to an actual build() method by @Rocketknight1 in #23760
- Use new parametrization based weight norm if available by @ezyang in #24030
- bring back filtered_test_list_cross_tests.txt by @ydshieh in #24055
- Fix device placement for model-parallelism in generate for encoder/de… by @sgugger in #24025
- Generate: increase left-padding test atol by @gante in #23448
- [Wav2Vec2] Fix torch script by @patrickvonplaten in #24062
- Add support for non-rust implemented tokenization for __getitem__ method. by @jacklanda in #24039
- Support PEFT models when saving the model using trainer by @younesbelkada in #24073
- [Hub] Add safe_serialization in push_to_hub by @younesbelkada in #24074
- Fix is_optimum_neuron_available by @michaelbenayoun in #23961
- [bnb] Fix bnb skip modules by @younesbelkada in #24043
- Make the TF dummies even smaller by @Rocketknight1 in #24071
- Fix expected value in tests of the test fetcher by @sgugger in #24077
- Update delete_doc_comment_trigger.yml by @mishig25 in #24084
- Do not prepare lr scheduler as it has the right number of steps by @sgugger in #24088
- Fix a tiny typo in WhisperForConditionalGeneration::generate docstring by @sadra-barikbin in #24045
- [Trainer] Correct behavior of _load_best_model for PEFT models by @younesbelkada in #24103
Significant community contributions
The following contributors have made significant changes to the library over the last release:
- @shehanmunasinghe
- @TimDettmers
- @elisim
- @kihoon71
- @D-Roberts
  - Add TensorFlow implementation of EfficientFormer (#22620)
- @soongbren