Falcon, Code Llama, ViTDet, DINO v2, VITS
Falcon
Falcon is a class of causal decoder-only models built by TII. The largest Falcon checkpoints have been trained on >=1T tokens of text, with a particular emphasis on the RefinedWeb corpus. They are made available under the Apache 2.0 license.
Falcon’s architecture is modern and optimized for inference, with multi-query attention and support for efficient attention variants like FlashAttention. Both ‘base’ models, trained only as causal language models, and ‘instruct’ models, which have received further fine-tuning, are available.
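A minimal sketch of loading one of these checkpoints (the `tiiuae/falcon-7b` checkpoint name and the generation settings are illustrative, not prescribed by this release):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative checkpoint name; other Falcon checkpoints load the same way
checkpoint = "tiiuae/falcon-7b"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

inputs = tokenizer("The Falcon models are", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```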
- Falcon port by @Rocketknight1 in #24523
- Falcon: Add RoPE scaling by @gante in #25878
- Add proper Falcon docs and conversion script by @Rocketknight1 in #25954
- Put Falcon back by @LysandreJik in #25960
- [`Falcon`] Remove SDPA for falcon to support earlier versions of PyTorch (< 2.0) by @younesbelkada in #25947
Code Llama
Code Llama is a family of large language models for code, based on Llama 2, providing state-of-the-art performance among open models, infilling capabilities, support for large input contexts, and zero-shot instruction-following ability for programming tasks.
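A minimal infilling sketch (the checkpoint name, the prompt, and the `<FILL_ME>` convention follow the Code Llama documentation; treat the details as illustrative):

```python
from transformers import CodeLlamaTokenizer, LlamaForCausalLM

checkpoint = "codellama/CodeLlama-7b-hf"  # illustrative checkpoint name
tokenizer = CodeLlamaTokenizer.from_pretrained(checkpoint)
model = LlamaForCausalLM.from_pretrained(checkpoint)

# <FILL_ME> marks the span the model should infill
prompt = 'def remove_non_ascii(s: str) -> str:\n    """ <FILL_ME>\n    return result\n'
input_ids = tokenizer(prompt, return_tensors="pt")["input_ids"]
generated_ids = model.generate(input_ids, max_new_tokens=128)

# Decode only the newly generated tokens, then splice them into the prompt
filling = tokenizer.batch_decode(
    generated_ids[:, input_ids.shape[1]:], skip_special_tokens=True
)[0]
print(prompt.replace("<FILL_ME>", filling))
```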
- [`CodeLlama`] Add support for `CodeLlama` by @ArthurZucker in #25740
- [`CodeLlama`] Fix CI by @ArthurZucker in #25890
ViTDet
ViTDet reuses the ViT model architecture, adapted to object detection.
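A minimal sketch of running the backbone on a dummy image (randomly initialized from the default configuration; no pretrained checkpoint is assumed here):

```python
import torch
from transformers import VitDetConfig, VitDetModel

# Randomly initialized backbone from the default configuration
config = VitDetConfig()
model = VitDetModel(config)

# Dummy batch of one 224x224 RGB image
pixel_values = torch.randn(1, 3, 224, 224)
with torch.no_grad():
    outputs = model(pixel_values)
print(outputs.last_hidden_state.shape)  # features keep their spatial layout
```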
- Add ViTDet by @NielsRogge in #25524
DINO v2
DINO v2 is the next iteration of the DINO model. It is added as a backbone class, allowing it to be re-used in downstream models.
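A minimal sketch of the new backbone class (the `facebook/dinov2-base` checkpoint name and the chosen `out_features` are assumptions for illustration):

```python
import torch
from transformers import Dinov2Backbone

# Illustrative checkpoint and output stages
backbone = Dinov2Backbone.from_pretrained(
    "facebook/dinov2-base", out_features=["stage2", "stage5"]
)

pixel_values = torch.randn(1, 3, 224, 224)
with torch.no_grad():
    outputs = backbone(pixel_values)
for feature_map in outputs.feature_maps:
    print(feature_map.shape)
```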
- [DINOv2] Add backbone class by @NielsRogge in #25520
VITS
VITS (Variational Inference with adversarial learning for end-to-end Text-to-Speech) is an end-to-end speech synthesis model that predicts a speech waveform conditional on an input text sequence. It is a conditional variational autoencoder (VAE) comprised of a posterior encoder, decoder, and conditional prior.
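A minimal synthesis sketch (the `facebook/mms-tts-eng` checkpoint name is an assumption for illustration):

```python
import torch
from transformers import VitsModel, VitsTokenizer

# Illustrative English TTS checkpoint
tokenizer = VitsTokenizer.from_pretrained("facebook/mms-tts-eng")
model = VitsModel.from_pretrained("facebook/mms-tts-eng")

inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

waveform = outputs.waveform[0]  # 1D tensor of audio samples
print(waveform.shape, model.config.sampling_rate)
```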
Breaking changes:
- 🚨🚨🚨 [`Refactor`] Move third-party related utility files into `integrations/` folder 🚨🚨🚨 by @younesbelkada in #25599

Moves all utility files related to third-party libraries (outside the HF ecosystem) into `integrations/` instead of keeping them directly in `transformers`.
To keep the previous behaviour, update your import as follows:

```diff
- from transformers.deepspeed import HfDeepSpeedConfig
+ from transformers.integrations import HfDeepSpeedConfig
```
Bugfixes and improvements
- [DOCS] MusicGen Docs Update by @xNul in #25510
- [MINOR:TYPO] by @cakiki in #25646
- Pass the proper token to PEFT integration in auto classes by @sgugger in #25649
- Put IDEFICS in the right section of the doc by @sgugger in #25650
- TF 2.14 compatibility by @Rocketknight1 in #25630
- Fix bloom add prefix space by @ArthurZucker in #25652
- removing unnecessary extra parameter by @rafaelpadilla in #25643
- Adds `TRANSFORMERS_TEST_BACKEND` by @vvvm23 in #25655
- stringify config by @AleksanderWWW in #25637
- Add input_embeds functionality to gpt_neo Causal LM by @gaasher in #25659
- Update doc toctree by @ydshieh in #25661
- Add Llama2 resources by @wonhyeongseo in #25531
- [`SPM`] Patch `spm` Llama and T5 by @ArthurZucker in #25656
- [`GPTNeo`] Add input_embeds functionality to gpt_neo Causal LM by @ArthurZucker in #25664
- fix wrong path in some doc by @ydshieh in #25658
- Remove `utils/documentation_tests.txt` by @ydshieh in #25680
- Prevent Dynamo graph fragmentation in GPTNeoX with torch.baddbmm fix by @norabelrose in #24941
- ⚠️ [CLAP] Fix dtype of logit scales in init by @sanchit-gandhi in #25682
- Sets the stalebot to 10 AM CEST by @LysandreJik in #25678
- Fix `pad_token` check condition by @ydshieh in #25685
- [DOCS] Added docstring example for EpsilonLogitsWarper #24783 by @sanjeevk-os in #25378
- correct resume training steps number in progress bar by @pphuc25 in #25691
- Generate: general test for decoder-only generation from `inputs_embeds` by @gante in #25687
- Fix typo in `configuration_gpt2.py` by @susnato in #25676
- fix ram efficient fsdp init by @pacman100 in #25686
- [`LlamaTokenizer`] make unk_token_length a property by @ArthurZucker in #25689
- Update list of persons to tag by @sgugger in #25708
- docs: Resolve typos in warning text by @tomaarsen in #25711
- Fix failing `test_batch_generation` for bloom by @ydshieh in #25718
- [`PEFT`] Fix peft version by @younesbelkada in #25710
- Fix number of minimal calls to the Hub with peft integration by @sgugger in #25715
- [`AutoGPTQ`] Add correct installation of GPTQ library + fix slow tests by @younesbelkada in #25713
- Generate: nudge towards `do_sample=False` when `temperature=0.0` by @gante in #25722
- [`from_pretrained`] Simpler code for peft by @ArthurZucker in #25726
- [idefics] idefics-9b test use 4bit quant by @stas00 in #25734
- ImageProcessor - check if input pixel values between 0-255 by @amyeroberts in #25688
- [`from_pretrained`] Fix failing PEFT tests by @younesbelkada in #25733
- [ASR Pipe Test] Fix CTC timestamps error message by @sanchit-gandhi in #25727
- 🌐 [i18n-KO] Translated `visual_question_answering.md` to Korean by @wonhyeongseo in #25679
- [`PEFT`] Fix PeftConfig save pretrained when calling `add_adapter` by @younesbelkada in #25738
- fixed typo in speech encoder decoder doc by @asusevski in #25745
- Add FlaxCLIPTextModelWithProjection by @pcuenca in #25254
- Generate: add missing logits processors docs by @gante in #25653
- [DOCS] Add example for HammingDiversityLogitsProcessor by @jessthebp in #25481
- Generate: logits processors are doctested and fix broken doctests by @gante in #25692
- [CLAP] Fix logit scales dtype for fp16 by @sanchit-gandhi in #25754
- [`Sentencepiece`] make sure `legacy` do not require `protobuf` by @ArthurZucker in #25684
- fix encoder hook by @SunMarc in #25735
- Docs: fix indentation in `HammingDiversityLogitsProcessor` by @gante in #25756
- Add type hints for several pytorch models (batch-3) by @nablabits in #25705
- Correct attention mask dtype for Flax GPT2 by @liutianlin0121 in #25636
- fix a typo in docsting by @statelesshz in #25759
- [idefics] small fixes by @stas00 in #25764
- Add docstrings and fix VIVIT examples by @Geometrein in #25628
- [`LlamaFamiliy`] add a tip about dtype by @ArthurZucker in #25794
- Add type hints for several pytorch models (batch-2) by @nablabits in #25557
- Add type hints for pytorch models (final batch) by @nablabits in #25750
- Add type hints for several pytorch models (batch-4) by @nablabits in #25749
- [idefics] fix vision's `hidden_act` by @stas00 in #25787
- Arde/fsdp activation checkpointing by @arde171 in #25771
- Fix incorrect Boolean value in deepspeed example by @tmm1 in #25788
- fixing name position_embeddings to object_queries by @Lorenzobattistela in #24652
- Resolving Attribute error when using the FSDP ram efficient feature by @pacman100 in #25820
- [`Docs`] More clarifications on BT + FA by @younesbelkada in #25823
- fix register by @zspo in #25779
- Minor wording changes for Code Llama by @osanseviero in #25815
- [`LlamaTokenizer`] `tokenize` nits. by @ArthurZucker in #25793
- fix warning trigger for embed_positions when loading xglm by @MattYoon in #25798
- 🌐 [i18n-KO] Translated peft.md to Korean by @nuatmochoi in #25706
- 🌐 [i18n-KO] Translated `model_memory_anatomy.md` to Korean by @mjk0618 in #25755
- Error with checking args.eval_accumulation_steps to gather tensors by @chaumng in #25819
- Tests: detect lines removed from "utils/not_doctested.txt" and doctest ALL generation files by @gante in #25763
- 🌐 [i18n-KO] Translated `add_new_pipeline.md` to Korean by @heuristicwave in #25498
- 🌐 [i18n-KO] Translated `community.md` to Korean by @sim-so in #25674
- 🤦 update warning to If you want to use the new behaviour, set `legacy=… by @ArthurZucker in #25833
- update remaining `Pop2Piano` checkpoints by @susnato in #25827
- [AutoTokenizer] Add data2vec to mapping by @sanchit-gandhi in #25835
- MaskFormer, Mask2former - reduce memory load by @amyeroberts in #25741
- Support loading base64 images in pipelines by @InventivetalentDev in #25633
- Update README.md by @NinoRisteski in #25834
- Generate: models with custom `generate()` return `True` in `can_generate()` by @gante in #25838
- Update README.md by @NinoRisteski in #25832
- minor typo fix in PeftAdapterMixin docs by @tmm1 in #25829
- Add flax installation in daily doctest workflow by @ydshieh in #25860
- Add Blip2 model in VQA pipeline by @jpizarrom in #25532
- Remote tools are turned off by @LysandreJik in #25867
- Fix imports by @ydshieh in #25869
- fix max_memory for bnb by @SunMarc in #25842
- Docs: fix example failing doctest in `generation_strategies.md` by @gante in #25874
- pin pandas==2.0.3 by @ydshieh in #25875
- Reduce CI output by @ydshieh in #25876
- [ViTDet] Fix doc tests by @NielsRogge in #25880
- For xla tensors, use an alternative way to get a unique id by @qihqi in #25802
- fix ds z3 checkpointing when `stage3_gather_16bit_weights_on_model_save=False` by @pacman100 in #25817
- Modify efficient GPU training doc with now-available adamw_bnb_8bit optimizer by @veezbo in #25807
- [`TokenizerFast`] `can_save_slow_tokenizer` as a property for when `vocab_file`'s folder was removed by @ArthurZucker in #25626
- Save image_processor while saving pipeline (ImageSegmentationPipeline) by @raghavanone in #25884
- [`InstructBlip`] FINAL Fix instructblip test by @younesbelkada in #25887
- Add type hints for tf models batch 1 by @nablabits in #25853
- Update `setup.py` by @ydshieh in #25893
- Smarter check for `is_tensor` by @sgugger in #25871
- remove torch_dtype override by @SunMarc in #25894
- fix FSDP model resume optimizer & scheduler by @pkumc in #25852
- Better error message for pipeline loading by @ydshieh in #25912
- Remove broken docs for MusicGen by @osanseviero in #25905
- Revert frozen training arguments by @muellerzr in #25903
- [VITS] Add to TTA pipeline by @sanchit-gandhi in #25906
- [MMS] Update docs with HF TTS implementation by @sanchit-gandhi in #25907
- [VITS] Only trigger tokenizer warning for uroman by @sanchit-gandhi in #25915
- Update-llama-code by @ArthurZucker in #25826
- Update model_memory_anatomy.md by @NinoRisteski in #25896
- Skip offload tests for `ViTDet` by @ydshieh in #25913
- Fix typos by @omahs in #25936
- Update community.md by @NinoRisteski in #25928
- Update autoclass_tutorial.md by @NinoRisteski in #25929
- Update README.md by @NinoRisteski in #25941
- [MMS] Fix pip install in docs by @sanchit-gandhi in #25949
- [VITS] Handle deprecated weight norm by @sanchit-gandhi in #25946
- Import deepspeed utilities from integrations by @osanseviero in #25919
- Update README.md by @NinoRisteski in #25922
- [VITS] Fix init test by @sanchit-gandhi in #25945
- Fix failing test by @LysandreJik in #25963
- Fix smart check by @ydshieh in #25955
- Add type hints for tf models final batch by @nablabits in #25883
Significant community contributions
The following contributors have made significant changes to the library over the last release:
- @nablabits
- Add type hints for several pytorch models (batch-3) (#25705)
- Add type hints for several pytorch models (batch-2) (#25557)
- Add type hints for pytorch models (final batch) (#25750)
- Add type hints for several pytorch models (batch-4) (#25749)
- Add type hints for tf models batch 1 (#25853)
- Add type hints for tf models final batch (#25883)
- @Lorenzobattistela
- fixing name position_embeddings to object_queries (#24652)
- @hollance
- add VITS model (#24085)