v4.47.0: PaliGemma-2, I-JEPA, OLMo-2, LayerSkip, Tensor Parallel
New models
PaliGemma-2
PaliGemma 2 and PaliGemma are lightweight open vision-language models (VLM) inspired by PaLI-3, and based on open components like the SigLIP vision model and the Gemma language model. PaliGemma takes both images and text as inputs and can answer questions about images with detail and context, meaning that PaliGemma can perform deeper analysis of images and provide useful insights, such as captioning for images and short videos, object detection, and reading text embedded within images.
PaliGemma 2 is available in 3B, 10B, and 28B parameter sizes, which are based on Gemma 2 2B, 9B, and 27B models, respectively. The original PaliGemma models are available in the 3B size. For more information on Gemma model variants, see the Gemma models list. PaliGemma model variants support different pixel resolutions for image inputs, including 224 x 224, 448 x 448, and 896 x 896 pixels.
I-JEPA
The I-JEPA model was proposed in Image-based Joint-Embedding Predictive Architecture by Mahmoud Assran, Quentin Duval, Ishan Misra, Piotr Bojanowski, Pascal Vincent, Michael Rabbat, Yann LeCun, Nicolas Ballas. I-JEPA is a self-supervised learning method that predicts the representations of one part of an image based on other parts of the same image. This approach focuses on learning semantic features without relying on pre-defined invariances from hand-crafted data transformations, which can bias specific tasks, or on filling in pixel-level details, which often leads to less meaningful representations.
OLMo 2
The OLMo2 model is the successor of the OLMo model, which was proposed in OLMo: Accelerating the Science of Language Models.
The architectural changes from the original OLMo model to this model are:
- RMSNorm is used instead of standard layer norm.
- Norm is applied to attention queries and keys.
- Norm is applied after attention/feedforward layers rather than before.
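As a rough illustration of the first and third points above, here is a minimal pure-Python sketch of RMSNorm and of a block that normalizes the sublayer output before adding the residual. The function names, the `eps` value, and the way the residual is combined are assumptions made for this example, not the OLMo2 modeling code.

```python
import math
from typing import Callable, List, Optional


def rms_norm(x: List[float], weight: Optional[List[float]] = None, eps: float = 1e-6) -> List[float]:
    """Scale x by the inverse of its root mean square (no mean subtraction, unlike LayerNorm)."""
    mean_square = sum(v * v for v in x) / len(x)
    scale = 1.0 / math.sqrt(mean_square + eps)
    if weight is None:
        weight = [1.0] * len(x)  # identity gain when no learned weight is given
    return [v * scale * w for v, w in zip(x, weight)]


def post_norm_block(x: List[float], sublayer: Callable[[List[float]], List[float]]) -> List[float]:
    """Apply the norm to the sublayer output, then add the residual (assumed ordering)."""
    normed = rms_norm(sublayer(x))
    return [a + b for a, b in zip(x, normed)]
```

A pre-norm block would instead compute `x + sublayer(rms_norm(x))`; the only difference is where the normalization sits relative to the attention or feedforward sublayer.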
Commits:
- Add OLMo November 2024 by @2015aroras in #34551
- Rename OLMo November to OLMo2 by @2015aroras in #34864
Layer-Skip Llama
We add support for Meta's Layer-Skip Llama 3.2 1B model.
The Llama3.2 1B model was continually pretrained with the LayerSkip recipe (early exit loss and layer dropout), as presented in Layer Skip: Enabling Early Exit Inference and Self-Speculative Decoding. It is capable of self-speculative decoding: draft tokens are decoded with the earlier layers and verified with the remaining layers.
- Self-speculation (Layer-Skip Llama) by @ArthurZucker in #34240
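The draft-then-verify idea can be sketched with toy next-token functions. This is a simplified greedy illustration, not the transformers implementation: `draft_next` stands in for an early-exit forward pass, `full_next` for the full model, and both toy functions are hypothetical. A real implementation verifies all draft tokens in one batched forward pass rather than in a Python loop.

```python
from typing import Callable, List

NextToken = Callable[[List[int]], int]


def greedy_decode(next_token: NextToken, prompt: List[int], max_new_tokens: int) -> List[int]:
    """Plain greedy decoding: one model call per generated token."""
    seq = list(prompt)
    for _ in range(max_new_tokens):
        seq.append(next_token(seq))
    return seq


def self_speculative_decode(draft_next: NextToken, full_next: NextToken,
                            prompt: List[int], max_new_tokens: int,
                            num_draft: int = 4) -> List[int]:
    """Draft a few tokens cheaply, keep the prefix the full model agrees with."""
    seq = list(prompt)
    target_len = len(prompt) + max_new_tokens
    while len(seq) < target_len:
        # Draft phase: propose up to num_draft tokens with the cheap model.
        draft: List[int] = []
        for _ in range(min(num_draft, target_len - len(seq))):
            draft.append(draft_next(seq + draft))
        # Verify phase: accept draft tokens until the first disagreement.
        for i, token in enumerate(draft):
            expected = full_next(seq + draft[:i])
            if token != expected:
                seq.extend(draft[:i])
                seq.append(expected)  # replace the rejected token with the full model's choice
                break
        else:
            seq.extend(draft)  # every draft token was accepted
    return seq


# Toy stand-ins (hypothetical): the draft model is wrong on every third position.
def full_next(prefix: List[int]) -> int:
    return (3 * sum(prefix) + len(prefix)) % 11


def draft_next(prefix: List[int]) -> int:
    guess = full_next(prefix)
    return (guess + 1) % 11 if len(prefix) % 3 == 0 else guess
```

Because every kept token is either confirmed or replaced by the full model, the output matches plain greedy decoding with the full model; the potential speed-up comes from the draft being cheaper to run.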
Tensor Parallel implementation
This PR uses the torch.distributed.tensor.parallel subpackage to implement Tensor Parallel for Llama (as an example).
The motivation is multi-fold:
- to make modeling code as simple as the single-worker case: all manual TP implementations under if self.config.pretraining_tp > 1 can be removed.
- to make tensor parallelism easily accessible by users: added a model.tensor_parallel(device_mesh) method that allows users to turn a single-proc model into a parallel model.
This is the first PR of many to simplify and enable Tensor Parallel across models.
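To make the underlying idea concrete, the sketch below shows the arithmetic of tensor parallelism on a two-layer MLP in pure Python. It does not use torch.distributed.tensor.parallel and is not the PR's code: the helper names are hypothetical, the "workers" are simulated by a loop, and ReLU is an arbitrary choice of nonlinearity. The first weight matrix is split by columns and the second by rows, so each worker holds only a slice of the weights and a final sum (an all-reduce in a real setup) recovers the full output.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def relu(m: Matrix) -> Matrix:
    return [[max(0.0, v) for v in row] for row in m]


def split_columns(m: Matrix, parts: int) -> List[Matrix]:
    width = len(m[0]) // parts
    return [[row[p * width:(p + 1) * width] for row in m] for p in range(parts)]


def split_rows(m: Matrix, parts: int) -> List[Matrix]:
    height = len(m) // parts
    return [m[p * height:(p + 1) * height] for p in range(parts)]


def tensor_parallel_mlp(x: Matrix, w_up: Matrix, w_down: Matrix, workers: int = 2) -> Matrix:
    """Column-wise shard of w_up, row-wise shard of w_down, then sum the partial outputs."""
    partials = []
    for up, down in zip(split_columns(w_up, workers), split_rows(w_down, workers)):
        hidden = relu(matmul(x, up))           # each worker computes a slice of the hidden units
        partials.append(matmul(hidden, down))  # and its partial contribution to the output
    # Stand-in for the all-reduce: add the partial outputs elementwise.
    out = partials[0]
    for part in partials[1:]:
        out = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(out, part)]
    return out
```

The elementwise nonlinearity can be applied locally because each worker owns complete hidden units, which is why the column-then-row split needs only one communication step per MLP.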
Farewell, Python 3.8
Python 3.8 has reached end of life, so we are dropping it from our CI.
GGUF improvements
Several improvements have been made to the GGUF support in transformers, notably by adding new architectures to the list of supported architectures.
- Add T5 GGUF loading support by @junejae in #33389
- Add GGUF for Mamba by @VladOS95-cyber in #34200
- Add Nemotron GGUF Loading Support by @farrosalferro in #34725
- Improve gguf tensor processing by @VladOS95-cyber in #34515
- Fix use_parallel_residual and qkv_bias for StableLM GGUF config extraction by @Isotr0py in #34450
Fast processors
We continue the work to improve the speed of fast processors as detailed in this roadmap.
We contribute a fast processor to RT-DETR.
- Add Image Processor Fast RT-DETR by @yonigozlan in #34354
New pipelines
A new pipeline has been added to transformers: image-text-to-text!
The pipeline supports the following inputs:
- unbatched images and text - images=image, text=text
- batched images and text - images = [image, image], text= [text, text]
- several images per prompt (only for models supporting the use of an image token) - images = [[image, image], [image]] or images=[image, image, image], text = ["... ......", "......"]
- Chat templates (for models supporting them).
- Add image text to text pipeline by @yonigozlan in #34170
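The input shapes listed above can be read as different ways of pairing images with prompts. The helper below is a hypothetical sketch of that pairing, not the pipeline's own code: the function name is invented, strings stand in for image objects, and `<image>` is an assumed placeholder, since the real image token is model-specific.

```python
from typing import Any, List, Sequence, Tuple, Union

IMAGE_TOKEN = "<image>"  # assumed placeholder; the actual token depends on the model


def pair_images_with_prompts(images: Any, text: Union[str, Sequence[str]]) -> List[Tuple[List[Any], str]]:
    """Return one (images, prompt) pair per prompt for the input shapes listed above."""
    # Unbatched: a single prompt with one image (or one list of images).
    if isinstance(text, str):
        group = list(images) if isinstance(images, (list, tuple)) else [images]
        return [(group, text)]

    prompts = list(text)
    images = list(images)

    # Nested: one list of images per prompt.
    if images and all(isinstance(group, (list, tuple)) for group in images):
        if len(images) != len(prompts):
            raise ValueError("need one image list per prompt")
        return [(list(group), prompt) for group, prompt in zip(images, prompts)]

    # Flat list: split according to the number of image tokens in each prompt.
    counts = [prompt.count(IMAGE_TOKEN) for prompt in prompts]
    if any(counts) and sum(counts) == len(images):
        pairs, start = [], 0
        for prompt, count in zip(prompts, counts):
            pairs.append((images[start:start + count], prompt))
            start += count
        return pairs

    # Batched: exactly one image per prompt.
    if len(images) == len(prompts):
        return [([image], prompt) for image, prompt in zip(images, prompts)]

    raise ValueError("cannot match images to prompts")
```

For example, a flat list of three images with the prompts `"<image><image> compare"` and `"<image> describe"` is split into two images for the first prompt and one for the second.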
Notable refactors
Separate chat templates into a single file
We have had several issues with chat templates because they're stored as single lines in the JSON config files:
- Impossible to review diffs
- Very hard to edit in the web UI (or in general)
- Differences between processor templates in chat_template.json and tokenizer templates in tokenizer_config.json causing confusion
- Some models use multiple templates, requiring a template dict, but we're trying to discourage that in future and move those models to single templates with conditional behaviour instead
The solution:
- Just move chat templates to a single chat_template.jinja file in the repo
- If multiple templates are required, then they should still be stored in the JSON file. This is not supported for Processor classes, so processors should always be able to save their template as a raw Jinja file. In general, we'll be gently deprecating multiple templates in future.
- If a chat_template.jinja file is present, it overrides the JSON files. If a tokenizer is loaded with both Jinja and JSON chat templates and resaved, it should save only the Jinja file, and not have any chat_template entry in tokenizer_config.json.
For now, we continue saving in the old format by default. I'll probably keep it this way for several versions before making the new format the default, to ensure that most users are able to load the new format before it becomes common. Until then, the new format should mostly be used for testing, to make sure it's ready for deployment when we do the switch.
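The override rule for chat_template.jinja can be sketched as a small lookup function. This is an illustration of the precedence described in this section, not the loading code in transformers: `resolve_chat_template` is a hypothetical helper, and it only handles the single-template case where the JSON entry is a plain string.

```python
import json
from pathlib import Path
from typing import Optional


def resolve_chat_template(repo_dir: str) -> Optional[str]:
    """Return the chat template for a local repo, preferring the raw Jinja file."""
    repo = Path(repo_dir)

    # A chat_template.jinja file, when present, overrides any JSON entry.
    jinja_file = repo / "chat_template.jinja"
    if jinja_file.is_file():
        return jinja_file.read_text(encoding="utf-8")

    # Otherwise fall back to the legacy single-line entry in tokenizer_config.json.
    config_file = repo / "tokenizer_config.json"
    if config_file.is_file():
        config = json.loads(config_file.read_text(encoding="utf-8"))
        template = config.get("chat_template")
        # Multiple templates (a non-string entry) are out of scope for this sketch.
        return template if isinstance(template, str) else None

    return None
```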
- Separate chat templates into a single file by @Rocketknight1 in #33957
Large modular logic refactor
This PR largely reworks the logic we use in the modular converter. It is (hopefully) clearer and more maintainable. Instead of going in all directions, adding stuff, then deleting it if not needed, we now do the following:
- visit the whole modular file (record imports/functions/classes/assignments nodes)
- create function dependency mapping
- for each import coming from another model:
  - visit the corresponding file
  - create function dependency mapping
  - update mapping with function/assignment from the modular (updated/new functions)
  - create the class dependency graph based on merged dependencies
- update dependency graph of the modular with the functions and assignments imported from the other files
- for each class recorded in the modular:
  - if inheriting from class in another file:
    - replace call to super
    - find the dependencies after the node was replaced
    - follow (updated with modular defs) dependency mapping to add all nodes
  - else:
    - only add needed imported functions (and their dependencies)
  - determine the needed imports and add them
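The "merge the mappings, then follow the dependency mapping to add all nodes" steps above amount to a transitive closure over a name-to-dependencies map. The sketch below shows that idea with plain dictionaries. It is not the converter's code: the function names and the example entries are hypothetical, and the real converter works on syntax-tree nodes rather than strings.

```python
from typing import Dict, Set

DependencyMap = Dict[str, Set[str]]


def merge_mappings(imported: DependencyMap, modular: DependencyMap) -> DependencyMap:
    """Start from the imported file's mapping; definitions in the modular file override it."""
    merged = {name: set(deps) for name, deps in imported.items()}
    for name, deps in modular.items():
        merged[name] = set(deps)
    return merged


def collect_dependencies(name: str, mapping: DependencyMap) -> Set[str]:
    """Return every function or assignment reachable from `name` through the mapping."""
    seen: Set[str] = set()
    stack = [name]
    while stack:
        current = stack.pop()
        for dep in mapping.get(current, ()):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen
```

For instance, if the modular file redefines a helper so that it calls a new function, merging first ensures the closure for a class that uses that helper also picks up the new function.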
- Large modular logic refactoring by @Cyrilvallez in #34487
Community bugfixes and improvements
- Remove graph breaks for torch.compile() in flash_attention_forward when Lllama Model is padding free tuned by @Abhishek-TAMU in #33932
- Better defaults by @ArthurZucker in #34026
- translated gguf.md into chinese by @blueingman in #34163
- CI: fix failures by @zucchini-nlp in #34371
- Zamba is an LM by @LysandreJik in #34342
- add code generation to natural language processing section by @furtnerthomas in #34333
- Fix pil_torch_interpolation_mapping import in image_processing_detr_fast by @yonigozlan in #34375
- Add code sample docstrings and checkpoint reference for GLM models by @h3110Fr13nd in #34360
- refactor: remove redundant if-condition and improve type correctness for convert_tokens_to_ids by @winstxnhdw in #34030
- Ignore unsupported kwarg in ProcessorMixin call by @yonigozlan in #34285
- [PEFT] Add warning for missing key in LoRA adapter by @BenjaminBossan in #34068
- Fix torch.fx issue related to the new loss_kwargs keyword argument by @michaelbenayoun in #34380
- Correct the new defaults by @Cyrilvallez in #34377
- [auto. ping] Avoid sending empty info + add more team members by @ydshieh in #34383
- Fix glm by @Cyrilvallez in #34388
- Use non nested images and batched text Idefics2/3 by @yonigozlan in #34222
- Fix onnx non-expotable inplace aten op by @IlyasMoutawwakil in #34376
- Fix right padding in LLaVA models by @zucchini-nlp in #34305
- no filter by @ydshieh in #34391
- SynthID: better example by @gante in #34372
- Tests: upgrade test_eager_matches_sdpa_generate by @gante in #34386
- Fix bnb training test failure by @matthewdouglas in #34414
- Avoid check expected exception when it is on CUDA by @ydshieh in #34408
- Fix typos in agents_advanced.md by @rudydel in #34405
- [docs] Cache implementations by @stevhliu in #34325
- Fix pix2struct by @IlyasMoutawwakil in #34374
- pin tensorflow_probability<0.22 in docker files by @ydshieh in #34381
- Tiny update after #34383 by @ydshieh in #34404
- Fix batch size handling in prediction_loop for DataLoaderShard by @zeus2611 in #34343
- exclude fsdp from delay_optimizer_creation by @eljandoubi in #34140
- New option called "best" for args.save_strategy. by @seanswyi in #31817
- [docs] update input documentation for MAMBA2 and MISTRAL models to include cache_position and attention_mask details by @h3110Fr13nd in #34322
- 🌐 [i18n-KO] Translated model_doc/barthez.md to Korean by @Jwaminju in #33980
- Apply linting to the important code blocks to make it readable by @ShubhamJagtap2000 in #34449
- Torchao weights only + prequantized compability by @SunMarc in #34355
- [i18n-ar] Translated file : docs/source/ar/fast_tokenizers.md into Arabic by @AhmedAlmaghz in #33034
- enable average tokens across devices by @techkang in #34373
- feat: run benchmarks on A100 by @McPatate in #34287
- Add post_process_depth_estimation for GLPN by @alex-bene in #34413
- LLaVA: latency issues by @zucchini-nlp in #34460
- Generation: fix test by @zucchini-nlp in #34369
- Fix CI by @zucchini-nlp in #34458
- use a tinymodel to test generation config which aviod timeout by @techkang in #34482
- 🚨🚨🚨 [SuperPoint] Fix keypoint coordinate output and add post processing by @sbucaille in #33200
- Simplify running tests in a subprocess by @ydshieh in #34213
- Fix perplexity computation in perplexity.md by @Framartin in #34387
- Fixes for Modular Converter on Windows by @hlky in #34266
- Fix regression loading dtype by @SunMarc in #34409
- Bert is ExecuTorch compatible by @guangy10 in #34424
- manual head_dim for mixtral model by @wavy-jung in #34281
- fix-qwen2vl-no-position_ids by @simonJJJ in #33487
- Bug fix for drop path decay rate in swin transformer by @abhi-glitchhg in #34291
- MobileBERT is ExecuTorch compatible by @guangy10 in #34473
- Albert is ExecuTorch compatible by @guangy10 in #34476
- Adding optimizer_cls_and_kwargs to Trainer.__init__ by @apoorvkh in #34358
- Fix performance in get_imports regexp by @AlekseyLobanov in #34298
- fix incorrect warning by @yonigozlan in #34416
- Un-deprecate timeout arg in pipelines by @Rocketknight1 in #34382
- Roberta is ExecuTorch compatible by @guangy10 in #34425
- Fix format mistake in string repr of tokenizer objects by @gpetho in #34493
- Mllama: update docs by @zucchini-nlp in #34334
- VLMs: fix number of image tokens by @zucchini-nlp in #34332
- Tests: move generate tests to the right mixin and delete redundant tests by @gante in #34464
- fix pixtral processor by @molbap in #34486
- Use torch 2.5 in scheduled CI by @ydshieh in #34465
- Fix super tiny extra space typo by @fzyzcjy in #34440
- UPDATE Documentation for #TRANSLATING.md Documentation into Multiple Languages.(Changes made) by @anshumangahlot in #34226
- enable QA bf16 pipeline by @jiqing-feng in #34483
- Fix: img size mismatch caused by incorrect unpadding in LLaVA-Next by @jp1924 in #34522
- Fix step shifting when accumulate gradient by @kibitzing in #33673
- avoid calling gc.collect and cuda.empty_cache by @ydshieh in #34514
- Qwen2VL: skip base input_ids-inputs_embeds equivalence check by @gante in #34535
- fix(DPT,Depth-Anything) Address expected_slice errors inside inference tests by @philkuz in #34518
- feat: add benchmarks pg indexes by @McPatate in #34536
- make test_eager_matches_sdpa_inference less flaky by @ydshieh in #34512
- Bug Fix for issue #34294 by @fpgaminer in #34295
- [CLIPSeg] Make interpolate_pos_encoding default to True by @NielsRogge in #34419
- update doc by @jiqing-feng in #34478
- [i18n-ar] Translated file : docs/source/ar/multilingual.md into Arabic by @AhmedAlmaghz in #33048
- Blip: get/set input embeddings correctly by @zucchini-nlp in #34152
- BLIP: enable generation tests by @zucchini-nlp in #34174
- 🔴 🔴 fix query_pre_attn_scalar different of num_heads in default gemma2 config by @molbap in #34540
- [i18n-HI] Translated accelerate page to Hindi by @karthik-script in #34443
- Update trainer for easier handling of accumulate, compile fixes, and proper reporting by @muellerzr in #34511
- VLM: special multimodal Tokenizer by @zucchini-nlp in #34461
- MPS: isin_mps_friendly can support 0D tensors by @gante in #34538
- Add text support to the Trainer's TensorBoard integration by @JacobLinCool in #34418
- [i18n-HI] Translated TFLite page to Hindi by @karthik-script in #34572
- 🌐 [i18n-KO] Translated perf_train_special.md to Korean by @maximizemaxwell in #34590
- 🌐 [i18n-KO] Update README_ko.md by @J4BEZ in #33098
- fix TrainerState doc because num_input_tokens_seen is unused by defau… by @techkang in #34593
- Fix Whisper CI by @ydshieh in #34541
- Skip DeepSpeed ZeRO Stage 3 model initialization when bnb by @eljandoubi in #34395
- FIX: Broken repr of TorchAoConfig by @BenjaminBossan in #34560
- Load sub-configs from composite configs by @zucchini-nlp in #34410
- DistilBERT is ExecuTorch compatible by @guangy10 in #34475
- Remove unused test_dataset by @thisisiron in #34516
- Revert "Fix Whisper CI" by @ydshieh in #34605
- Fix #34494 assistant tokens when truncated by @yonigottesman in #34531
- Remove @slow for test_eager_matches_sdpa_inference by @ydshieh in #34558
- Changing repr in torchao to show quantized Linear by @MekkCyber in #34202
- Fix torchvision interpolation CI by @yonigozlan in #34539
- 🌐 [i18n-KO] Translated convbert.md to Korean by @ahnjj in #34599
- fix(dvclive): pass fake dataset to avoid exception in trainer init by @shcheklein in #34455
- 🌐 [i18n-KO] Translated timesformer.md to Korean by @mreraser in #33972
- 🌐 [i18n-KO] Translated bert.md to Korean by @maximizemaxwell in #34627
- [i18n-ar] Translated file : docs/source/ar/trainer.md into Arabic by @AhmedAlmaghz in #33080
- Update llm_engine.py by @louisbrulenaudet in #33332
- Agents: turn any Space into a Tool with Tool.from_space() by @aymeric-roucher in #34561
- [docs] update not-working model revision by @faaany in #34682
- [i18n-ar] Translated file : docs/source/ar/torchscript.md into Arabic by @AhmedAlmaghz in #33079
- Agents: Small fixes in streaming to gradio + add tests by @aymeric-roucher in #34549
- 🌐 [i18n-KO] Translated marian.md to Korean by @maximizemaxwell in #34698
- [docs] Broken link in generation_strategies by @pcuenca in #34717
- Fix example in EsmConfig docstring by @yuanx749 in #34653
- [docs] add xpu device check by @faaany in #34684
- Retain newlines in chat template when continue_final_message=True by @lewtun in #34253
- Update llava.md by @LysandreJik in #34749
- fix(wandb): pass fake dataset to avoid exception in trainer (see #34455) by @CezaPasc in #34720
- add xpu path for awq by @jiqing-feng in #34712
- FSDP grad accum fix by @winglian in #34645
- Remove FSDP wrapping from sub-models. by @eljandoubi in #34452
- 🧼 remove v4.44 deprecations by @gante in #34245
- VLMs: patch_size -> num_image_tokens in processing by @zucchini-nlp in #33424
- Fix broken link by @ofek in #34618
- fix a typo bug where 'id2label' was incorrectly written as 'i2label' when reading config by @ZuoChenFttS in #34637
- Fix skip of test_training_gradient_checkpointing by @dvrogozh in #34723
- make sure to disable gradients for integer tensor by @winglian in #32943
- [docs] make empty_cache device-agnostic by @faaany in #34774
- [docs] add XPU besides CUDA, MPS etc. by @faaany in #34777
- [tests] add XPU part to testing by @faaany in #34778
- fix: Update pixel_values parameter in hf_model input by @thisisiron in #34782
- Fix callback key name by @jung-hunsoo in #34762
- fix: Wrong task mentioned in docs by @ecyht2 in #34757
- Allow handling files as args for a tool created with Tool.from_space by @aymeric-roucher in #34687
- Fix Whisper CI by @ydshieh in #34617
- protect tensor parallel usage by @ArthurZucker in #34800
- Trainer hyperparameter search kwargs docs update by @GuillemGSubies in #34459
- feat: allow to use hf-hub models for timm backbone by @cgebbe in #34729
- Support gradient checkpointing in Qwen2VL ViT by @li-plus in #34724
- Fix: siglip image processor rgb_convert is not being applied correctly. by @jp1924 in #34301
- fix cpu bnb path by @jiqing-feng in #34647
- Gemma capping by @ArthurZucker in #34282
- Fix cache_utils for optimum.quanto kvcache quantization by @SunMarc in #34750
- Modular fix by @Cyrilvallez in #34802
- MLU devices : Checks if mlu is available via an cndev-based check which won't trigger the drivers and leave mlu by @huismiling in #34326
- 🚨🚨🚨 fix(Mask2Former): torch export 🚨🚨🚨 by @philkuz in #34393
- Feature: print tokens per second during training by @tibor-reiss in #34507
- Add do_convert_rgb to vit by @jp1924 in #34523
- Fix post process function called in the instance segmentation example of mask2former by @OnTheThirdDay in #34588
- fix crash in tiiuae/falcon-11B-vlm image-to-text generation by @sywangyi in #34728
- Add support for OpenAI api "image_url" input in chat for image-text-to-text pipeline by @yonigozlan in #34562
- Add Image Processor Fast Deformable DETR by @yonigozlan in #34353
- Run test_medium_seamless_m4t_pt in subprocess to avoid many failures by @ydshieh in #34812
- Fix check_training_gradient_checkpointing by @ydshieh in #34806
- Added image-text-to-text pipeline to task guide by @merveenoyan in #34783
- Translate attention.md into Chinese by @wwwbai in #34716
- LLaVA OV: fix unpadding precision by @zucchini-nlp in #34779
- Fix low memory beam search by @zucchini-nlp in #34746
- Fix the memory usage issue of logits in generate() by @kjohew in #34813
- fix(DPT,Depth-Anything) torch.export by @philkuz in #34103
- Fix: take into account meta device by @tibor-reiss in #34134
- Fix hyperparameter search when optuna+deepseed by @corentin-ryr in #34642
- Fix CI by tweaking torchao tests by @SunMarc in #34832
- Fix CI slack reporting issue by @ydshieh in #34833
- VLMs: enable generation tests - last batch by @zucchini-nlp in #34484
- Change logging level from warning to info for max_steps overriding num_train_epochs by @qgallouedec in #34810
- Fix ds nvme by @eljandoubi in #34444
- Fix heuristic scheduling for UAG by @jmamou in #34805
- Refactor StarCoder2 using modular by @Cyrilvallez in #34015
- Watermarking: fix order by @zucchini-nlp in #34849
- Update checks for torch.distributed.tensor to require torch >= 2.5 by @loadams in #34816
- Remove quantization related config from dequantized model by @konradkalita in #34856
- Auto compile when static cache by @ArthurZucker in #34247
- Speculative decoding: Test the target distribution (to prevent issues like #32867) by @keyboardAnt in #34553
- smol improvements to support more flexible usage by @andimarafioti in #34857
- [CI] Skip EETQ tests while package is broken with latest transformers by @BenjaminBossan in #34854
- Bitnet test fix to avoid using gated model by @MekkCyber in #34863
- Fix support for image processors modifications in modular by @yonigozlan in #34866
- Fix: Enable prefill phase key value caching of nemotron/minitron models by @jeongin601 in #34742
- Add safe_globals to resume training on PyTorch 2.6 by @dvrogozh in #34632
- Cache: init empty cache when use_cache by @zucchini-nlp in #34274
- BLIP: fix generation after hub update by @zucchini-nlp in #34876
- [Deberta/Deberta-v2] Refactor code base to support compile, export, and fix LLM by @ArthurZucker in #22105
- 🔴 Mllama: fix base prefix by @zucchini-nlp in #34874
- Sum gathered input tokens by @techkang in #34554
- allow unused input parameters passthrough when chunking in asr pipelines by @VictorAtIfInsurance in #33889
- prepare_fa2_from_position_ids function bugfix by @meliksahturker in #33269
- chore: fix some typos by @wanxiangchwng in #34891
- Fix convert_tokens_to_string when decoder is None by @dszeto in #34569
- [peft] Given that self.active_adapter is deprecated, avoid using it by @tomaarsen in #34804
- Fix Qwen2 failing tests by @jla524 in #34819
- Fix : BitNet tests by @MekkCyber in #34895
- [AWQ, CI] Bump AWQ version used in docker image by @BenjaminBossan in #34922
- fix static cache data type miss-match by @jiqing-feng in #34799
- Fix test_auto_backbone_timm_model_from_pretrained by @ydshieh in #34877
- Upgrade torch version to 2.5 in dockerfile for quantization CI by @MekkCyber in #34924
- Fix failling GGML test by @MekkCyber in #34871
- Updated documentation and added conversion utility by @ViktorooReps in #34319
- making gpt2 fx traceable by @xuzifei-dmatrix in #34633
- Fix import structure for Fast Image processors by @yonigozlan in #34859
- VideoLLaVA: add default values by @zucchini-nlp in #34916
- Skipping aqlm non working inference tests till fix merged by @MekkCyber in #34865
- [Whisper] Fix whisper integration tests by @eustlb in #34111
- Add Pytorch Tensor Parallel support for Mistral by @VladOS95-cyber in #34927
- change apply_rotary_pos_emb of Glmmodel for GLM-Edge Series model by @zRzRzRzRzRzRzR in #34629
- Fix torch.onnx.export of Qwen2-VL vision encoder by @xenova in #34852
- Update the Python version in the Chinese README to match the English README. by @vansin in #34870
- [i18n-ar] Translated file : docs/source/ar/benchmarks.md into Arabic by @AhmedAlmaghz in #33023
- [docs] use device-agnostic API instead of cuda by @faaany in #34913
- [doc] use full path for run_qa.py by @faaany in #34914
- docs: HUGGINGFACE_HUB_CACHE -> HF_HUB_CACHE by @imba-tjd in #34904
- [i18n-zh]Translated tiktoken.md into chinese by @blueingman in #34936
- [FlexAttention] Update gemma2 by @ArthurZucker in #34942
- Fix : Add PEFT from source to CI docker by @MekkCyber in #34969
- Avoid calling get_max_length by @ydshieh in #34971
- Fix flaky test execution caused by Thread by @ydshieh in #34966
- 🌐 [i18n-KO] Translated encoder-decoder.md to Korean by @maximizemaxwell in #34880
- [docs] add explanation to release_memory() by @faaany in #34911
- [i18n-zh]Translated perf_train_special.md into Chinese by @blueingman in #34948
- Fix typo in code block in vipllava.md by @yuanx749 in #34957
- Fixed typo in VisitWebpageTool by @sergiopaniego in #34978
- [PEFT] Set eval mode when loading PEFT adapter by @BenjaminBossan in #34509
- Fix save_pretrained for partially offloaded models by @kylesayrs in #34890
- 🚨🚨🚨 Changed DINOv2Config default patch size to 14 by @OFSkean in #34568
- Refine the code of Universal Assisted Generation by @xinpengzz in #34823
- Allow compressed-tensors quantized model to be trained by @horheynm in #34520
- Offloaded cache: fix generate by @zucchini-nlp in #34921
- Fix utils/check_bad_commit.py (for auto ping in CI) by @ydshieh in #34943
- Add optimized PixtralImageProcessorFast by @mgoin in #34836
- Improve .from_pretrained type annotations by @qubvel in #34973
- Fix docker CI : install autogptq from source by @MekkCyber in #35000
- Let server decide default repo visibility by @Wauplin in #34999
- 🚨🚨🚨 Uniformize kwargs for TrOCR Processor by @tibor-reiss in #34587
- Update timm version by @qubvel in #35005
- fix: double verbs by @SamuelLarkin in #35008
- Update FillMaskPipeline.__call__ signature and docstring by @alvarobartt in #35006
- Only cast cu_seqlens when tracing by @xenova in #35016
- fix variable undefined bug when return_tensors is not specified in llava processing by @chenweize1998 in #34953
- Optimize memory usage of mllama encoder by @milesial in #34930
- Typo in warning switching to optimum-quanto by @Bojun-Feng in #35028
- Add type hints for forward functions in Gemma2 by @jla524 in #35034
- Fix test_eager_matches_sdpa_inference for XPU backend by @dvrogozh in #34889
- Multiple typo fixes in Tutorials docs by @henryhmko in #35035
- add docstring example for compute_loss_func by @secrettoad in #35020
- [i18n-ar] Translated file : docs/source/ar/notebooks.md into Arabic by @AhmedAlmaghz in #33049
- [docs] add the missing import for Image and bug fix by @faaany in #34776
- Translate bertlogy.md into Chinese by @wwwbai in #34908
- Automatic compilation in generate: do not rely on inner function by @Cyrilvallez in #34923
- Add token cost + runtime monitoring to Agent and HfEngine children by @aymeric-roucher in #34548
- Fix BertGeneration by @ydshieh in #35043
- fix speecht5 failure issue in test_peft_gradient_checkpointing_enable… by @sywangyi in #34454
- [docs] fix example code bug by @faaany in #35054
- Translate community.md into Chinese by @wwwbai in #35013
- [docs] use device-agnostic instead of cuda by @faaany in #35047
- [docs] use device-agnostic API instead of hard-coded cuda by @faaany in #35048
- Fix pad_token_tensor is None in warning by @tshu-w in #34005
- Add Pytorch Tensor Parallel support for Qwen2, Qwen2Moe, Starcoder2 by @VladOS95-cyber in #35007
- [GPTNeoX] Flex Attention + Refactor by @vasqu in #34896
- Support for easier multimodal use of modular by @Cyrilvallez in #35056
- [docs] add a comment that offloading requires CUDA GPU by @faaany in #35055
- [docs] Increase visibility of torch_dtype="auto" by @stevhliu in #35067
- Informative by @ydshieh in #35059
- [Whisper] Fix whisper tokenizer by @eustlb in #34537
- [tokenizers] bump to 0.21 by @ArthurZucker in #34972
- Update Mistral conversion script by @Cyrilvallez in #34829
- Fix tie_word_embeddings handling for GGUF models by @Isotr0py in #35085
- Deprecate quanto and switch to optimum-quanto by @MekkCyber in #35001
- BLIP: this is correct now by @zucchini-nlp in #35081
- [trainer] fix the GA model_accepts_loss_kwargs by @ArthurZucker in #34915
- Fix flaky Hub CI (test_trainer.py) by @ydshieh in #35062
- Adaptive dynamic number of speculative tokens by @jmamou in #34156
Significant community contributions
The following contributors have made significant changes to the library over the last release:
- @AhmedAlmaghz
  - [i18n-ar] Translated file : docs/source/ar/fast_tokenizers.md into Arabic (#33034)
  - [i18n-ar] Translated file : docs/source/ar/multilingual.md into Arabic (#33048)
  - [i18n-ar] Translated file : docs/source/ar/trainer.md into Arabic (#33080)
  - [i18n-ar] Translated file : docs/source/ar/torchscript.md into Arabic (#33079)
  - [i18n-ar] Translated file : docs/source/ar/benchmarks.md into Arabic (#33023)
- @maximizemaxwell
- @2015aroras
- @mgoin
  - Add optimized PixtralImageProcessorFast (#34836)