Release v4.47.0: PaliGemma-2, I-JEPA, OLMo-2, LayerSkip, Tensor Parallel · huggingface/transformers

New models

PaliGemma-2

PaliGemma 2 and PaliGemma are lightweight open vision-language models (VLM) inspired by PaLI-3, and based on open components like the SigLIP vision model and the Gemma language model. PaliGemma takes both images and text as inputs and can answer questions about images with detail and context, meaning that PaliGemma can perform deeper analysis of images and provide useful insights, such as captioning for images and short videos, object detection, and reading text embedded within images.

PaliGemma 2 is available in 3B, 10B, and 28B parameter sizes, which are based on Gemma 2 2B, 9B, and 27B models, respectively. The original PaliGemma models are available in the 3B size. For more information on Gemma model variants, see the Gemma models list. PaliGemma model variants support different pixel resolutions for image inputs, including 224 x 224, 448 x 448, and 896 x 896 pixels.

I-JEPA

The I-JEPA model was proposed in Image-based Joint-Embedding Predictive Architecture by Mahmoud Assran, Quentin Duval, Ishan Misra, Piotr Bojanowski, Pascal Vincent, Michael Rabbat, Yann LeCun, Nicolas Ballas. I-JEPA is a self-supervised learning method that predicts the representations of one part of an image based on other parts of the same image. This approach focuses on learning semantic features without relying on pre-defined invariances from hand-crafted data transformations, which can bias specific tasks, or on filling in pixel-level details, which often leads to less meaningful representations.

Add I-JEPA by @jmtzt in #33125

OLMo 2

The OLMo2 model is the successor of the OLMo model, which was proposed in OLMo: Accelerating the Science of Language Models.

The architectural changes from the original OLMo model to this model are:

RMSNorm is used instead of standard layer norm.
Norm is applied to attention queries and keys.
Norm is applied after attention/feedforward layers rather than before.

Commits:

Add OLMo November 2024 by @2015aroras in #34551
Rename OLMo November to OLMo2 by @2015aroras in #34864

Layer-Skip Llama

We add support for Meta's Layer-Skip Llama 3.2 1B model.

The Llama3.2 1B model was continually pretrained with LayerSkip recipe, early exit loss and layer dropout, as presented in Layer Skip: Enabling Early Exit Inference and Self-Speculative Decoding and is capable of performing self-speculative decoding: decode with earlier layers and verify with remaining layers.

Self-speculation (Layer-Skip Llama) by @ArthurZucker in #34240

Tensor Parallel implementation

This PR uses the torch.distributed.tensor.parallel subpackage to implement Tensor Parallel for Llama (as an example).

The motivation is multi-fold:

to make modeling code simple as single-worker case:
all manual TP implementations under if self.config.pretraining_tp > 1 can be removed.
to make tensor parallelism easily accessible by users:
added a model.tensor_parallel(device_mesh) method that allows users to turn a single-proc model into a parallel model. !- Please guide me to a right place to put this function/method if PreTrainedModel is not a preferred place. -!

This is the first PR of many to simplify and enable Tensor Parallel across models.

Simplify Tensor Parallel implementation with PyTorch TP by @kwen2501 in #34184

Farewell, Python 3.8

Python 3.8 reaches end of life, and, as such, we drop it from our CI.

Drop support for Python 3.8 by @ydshieh in #34314

GGUF improvements

Several improvements have been done to the GGUF support in transformers; notably by adding new architectures to the list of supported architectures.

Add T5 GGUF loading support by @junejae in #33389
Add GGUF for Mamba by @VladOS95-cyber in #34200
Add Nemotron GGUF Loading Support by @farrosalferro in #34725
Improve gguf tensor processing by @VladOS95-cyber in #34515
Fix use_parallel_residual and qkv_bias for StableLM GGUF config extraction by @Isotr0py in #34450

Fast processors

We continue the work to improve the speed of fast processors as detailed in this roadmap.

We contribute a fast processor to RT-DETR.

Add Image Processor Fast RT-DETR by @yonigozlan in #34354

New pipelines

A new pipeline has been added to transformers: image-text-to-text!

the pipeline support the following inputs:

unbatched images and text - images=image, text=text
batched images and text - images = [image, image], text= [text, text]
several images per prompt (only for models supporting the use of an image token) - images = [[image, image], [image]] or images=[image, image, image], text = ["... ......", "......"]
Chat templates (for models supporting them).

Add image text to text pipeline by @yonigozlan in #34170

Notable refactors

Separate chat templates into a single file

We have had several issues with chat templates because they're stored as single lines in the JSON config files:

Impossible to review diffs
Very hard to edit in the web UI (or in general)
Differences between processor templates in chat_template.json and tokenizer templates in tokenizer_config.json causing confusion
Some models use multiple templates, requiring a template dict, but we're trying to discourage that in future and move those models to single templates with conditional behaviour instead

The solution:

Just move chat templates to a single chat_template.jinja file in the repo
If multiple templates are required, then they should still be stored in the JSON file. This is not supported for Processor classes, so processors should always be able to save their template as a raw Jinja file. In general, we'll be gently deprecating multiple templates in future.
If a chat_template.jinja file is present, it overrides the JSON files. If a tokenizer is loaded with both Jinja and JSON chat templates and resaved, it should save only the Jinja file, and not have any chat_template entry in tokenizer_config.json.

For now, we continue saving in the old format by default. I'll probably keep it this way for several versions before making the new format the default, to ensure that most users are able to load the new format before it becomes common. Until then, the new format should mostly be used for testing, to make sure it's ready for deployment when we do the switch.

Separate chat templates into a single file by @Rocketknight1 in #33957

Large modular logic refactor

This PR largely rework the logic we use in the modular converter. It is (hopefully) clearer and maintainable. Instead of going in all directions, adding stuff, then deleting it if not needed, we now do the following:

visit all the modular file (record imports/functions/classes/assignments nodes)
- create function dependency mapping
for each import coming from another model:
- visit the corresponding file
- create function dependency mapping
- update mapping with function/assignment from the modular (updated/new functions)
- create the class dependency graph based on merged dependencies
update dependency graph of the modular with the functions and assignments imported from the other files
for each class recorded in the modular:
- if inherithing from class in another file:
  - replace call to super
  - find the dependencies after the node was replaced
  - follow (updated with modular defs) dependency mapping to add all nodes
- else:
  - only add needed imported functions (and their dependencies)
determine the needed imports and add them

Large modular logic refactoring by @Cyrilvallez in #34487

Community bugfixes and improvements

Remove graph breaks for torch.compile() in flash_attention_forward when Lllama Model is padding free tuned by @Abhishek-TAMU in #33932
Better defaults by @ArthurZucker in #34026
translated gguf.md into chinese by @blueingman in #34163
CI: fix failures by @zucchini-nlp in #34371
Zamba is an LM by @LysandreJik in #34342
add code generation to natural language processing section by @furtnerthomas in #34333
Fix pil_torch_interpolation_mapping import in image_processing_detr_fast by @yonigozlan in #34375
Add code sample docstrings and checkpoint reference for GLM models by @h3110Fr13nd in #34360
refactor: remove redundant if-condition and improve type correctness for convert_tokens_to_ids by @winstxnhdw in #34030
Ignore unsupported kwarg in ProcessorMixin call by @yonigozlan in #34285
[PEFT] Add warning for missing key in LoRA adapter by @BenjaminBossan in #34068
Fix torch.fx issue related to the new loss_kwargs keyword argument by @michaelbenayoun in #34380
Correct the new defaults by @Cyrilvallez in #34377
[auto. ping] Avoid sending empty info + add more team members by @ydshieh in #34383
Fix glm by @Cyrilvallez in #34388
Use non nested images and batched text Idefics2/3 by @yonigozlan in #34222
Fix onnx non-expotable inplace aten op by @IlyasMoutawwakil in #34376
Fix right padding in LLaVA models by @zucchini-nlp in #34305
no filter by @ydshieh in #34391
SynthID: better example by @gante in #34372
Tests: upgrade test_eager_matches_sdpa_generate by @gante in #34386
Fix bnb training test failure by @matthewdouglas in #34414
Avoid check expected exception when it is on CUDA by @ydshieh in #34408
Fix typos in agents_advanced.md by @rudydel in #34405
[docs] Cache implementations by @stevhliu in #34325
Fix pix2struct by @IlyasMoutawwakil in #34374
pin tensorflow_probability<0.22 in docker files by @ydshieh in #34381
Tiny update after #34383 by @ydshieh in #34404
Fix batch size handling in prediction_loop for DataLoaderShard by @zeus2611 in #34343
exclude fsdp from delay_optimizer_creation by @eljandoubi in #34140
New option called "best" for args.save_strategy. by @seanswyi in #31817
[docs] update input documentation for MAMBA2 and MISTRAL models to include cache_position and attention_mask details by @h3110Fr13nd in #34322
🌐 [i18n-KO] Translated model_doc/barthez.md to Korean by @Jwaminju in #33980
Apply linting to the important code blocks to make it readable by @ShubhamJagtap2000 in #34449
Torchao weights only + prequantized compability by @SunMarc in #34355
[i18n-ar] Translated file : docs/source/ar/fast_tokenizers.md into Arabic by @AhmedAlmaghz in #33034
enable average tokens across devices by @techkang in #34373
feat: run benchmarks on A100 by @McPatate in #34287
Add post_process_depth_estimation for GLPN by @alex-bene in #34413
LLaVA: latency issues by @zucchini-nlp in #34460
Generation: fix test by @zucchini-nlp in #34369
Fix CI by @zucchini-nlp in #34458
use a tinymodel to test generation config which aviod timeout by @techkang in #34482
🚨🚨🚨 [SuperPoint] Fix keypoint coordinate output and add post processing by @sbucaille in #33200
Simplify running tests in a subprocess by @ydshieh in #34213
Fix perplexity computation in perplexity.md by @Framartin in #34387
Fixes for Modular Converter on Windows by @hlky in #34266
Fix regression loading dtype by @SunMarc in #34409
Bert is ExecuTorch compatible by @guangy10 in #34424
manual head_dim for mixtral model by @wavy-jung in #34281
fix-qwen2vl-no-position_ids by @simonJJJ in #33487
Bug fix for drop path decay rate in swin transformer by @abhi-glitchhg in #34291
MobileBERT is ExecuTorch compatible by @guangy10 in #34473
Albert is ExecuTorch compatible by @guangy10 in #34476
Adding optimizer_cls_and_kwargs to Trainer.__init__ by @apoorvkh in #34358
Fix performance in get_imports regexp by @AlekseyLobanov in #34298
fix incorrect warning by @yonigozlan in #34416
Un-deprecate timeout arg in pipelines by @Rocketknight1 in #34382
Roberta is ExecuTorch compatible by @guangy10 in #34425
Fix format mistake in string repr of tokenizer objects by @gpetho in #34493
Mllama: update docs by @zucchini-nlp in #34334
VLMs: fix number of image tokens by @zucchini-nlp in #34332
Tests: move generate tests to the right mixin and delete redundant tests by @gante in #34464
fix pixtral processor by @molbap in #34486
Use torch 2.5 in scheduled CI by @ydshieh in #34465
Fix super tiny extra space typo by @fzyzcjy in #34440
UPDATE Documentation for #TRANSLATING.md Documentation into Multiple Languages.(Changes made) by @anshumangahlot in #34226
enable QA bf16 pipeline by @jiqing-feng in #34483
Fix: img size mismatch caused by incorrect unpadding in LLaVA-Next by @jp1924 in #34522
Fix step shifting when accumulate gradient by @kibitzing in #33673
avoid calling gc.collect and cuda.empty_cache by @ydshieh in #34514
Qwen2VL: skip base input_ids-inputs_embeds equivalence check by @gante in #34535
fix(DPT,Depth-Anything) Address expected_slice errors inside inference tests by @philkuz in #34518
feat: add benchmarks pg indexes by @McPatate in #34536
make test_eager_matches_sdpa_inference less flaky by @ydshieh in #34512
Bug Fix for issue #34294 by @fpgaminer in #34295
[CLIPSeg] Make interpolate_pos_encoding default to True by @NielsRogge in #34419
update doc by @jiqing-feng in #34478
[i18n-ar] Translated file : docs/source/ar/multilingual.md into Arabic by @AhmedAlmaghz in #33048
Blip: get/set input embeddings correctly by @zucchini-nlp in #34152
BLIP: enable generation tests by @zucchini-nlp in #34174
🔴 🔴 fix query_pre_attn_scalar different of num_heads in default gemma2 config by @molbap in #34540
[i18n-HI] Translated accelerate page to Hindi by @karthik-script in #34443
Update trainer for easier handling of accumulate, compile fixes, and proper reporting by @muellerzr in #34511
VLM: special multimodal Tokenizer by @zucchini-nlp in #34461
MPS: isin_mps_friendly can support 0D tensors by @gante in #34538
Add text support to the Trainer's TensorBoard integration by @JacobLinCool in #34418
[i18n-HI] Translated TFLite page to Hindi by @karthik-script in #34572
🌐 [i18n-KO] Translated perf_train_special.md to Korean by @maximizemaxwell in #34590
🌐 [i18n-KO] Update README_ko.md by @J4BEZ in #33098
fix TrainerState doc because num_input_tokens_seen is unused by defau… by @techkang in #34593
Fix Whisper CI by @ydshieh in #34541
Skip DeepSpeed ZeRO Stage 3 model initialization when bnb by @eljandoubi in #34395
FIX: Broken repr of TorchAoConfig by @BenjaminBossan in #34560
Load sub-configs from composite configs by @zucchini-nlp in #34410
DistilBERT is ExecuTorch compatible by @guangy10 in #34475
Remove unused test_dataset by @thisisiron in #34516
Revert "Fix Whisper CI" by @ydshieh in #34605
Fix #34494 assistant tokens when truncated by @yonigottesman in #34531
Remove @slow for test_eager_matches_sdpa_inference by @ydshieh in #34558
Changing repr in torchao to show quantized Linear by @MekkCyber in #34202
Fix torchvision interpolation CI by @yonigozlan in #34539
🌐 [i18n-KO] Translated convbert.md to Korean by @ahnjj in #34599
fix(dvclive): pass fake dataset to avoid exception in trainer init by @shcheklein in #34455
🌐 [i18n-KO] Translated timesformer.md to Korean by @mreraser in #33972
🌐 [i18n-KO] Translated bert.md to Korean by @maximizemaxwell in #34627
[i18n-ar] Translated file : docs/source/ar/trainer.md into Arabic by @AhmedAlmaghz in #33080
Update llm_engine.py by @louisbrulenaudet in #33332
Agents: turn any Space into a Tool with Tool.from_space() by @aymeric-roucher in #34561
[docs] update not-working model revision by @faaany in #34682
[i18n-ar] Translated file : docs/source/ar/torchscript.md into Arabic by @AhmedAlmaghz in #33079
Agents: Small fixes in streaming to gradio + add tests by @aymeric-roucher in #34549
🌐 [i18n-KO] Translated marian.md to Korean by @maximizemaxwell in #34698
[docs] Broken link in generation_strategies by @pcuenca in #34717
Fix example in EsmConfig docstring by @yuanx749 in #34653
[docs] add xpu device check by @faaany in #34684
Retain newlines in chat template when continue_final_message=True by @lewtun in #34253
Update llava.md by @LysandreJik in #34749
fix(wandb): pass fake dataset to avoid exception in trainer (see #34455) by @CezaPasc in #34720
add xpu path for awq by @jiqing-feng in #34712
FSDP grad accum fix by @winglian in #34645
Remove FSDP wrapping from sub-models. by @eljandoubi in #34452
🧼 remove v4.44 deprecations by @gante in #34245
VLMs: patch_size -> num_image_tokens in processing by @zucchini-nlp in #33424
Fix broken link by @ofek in #34618
fix a typo bug where 'id2label' was incorrectly written as 'i2label' when reading config by @ZuoChenFttS in #34637
Fix skip of test_training_gradient_checkpointing by @dvrogozh in #34723
make sure to disable gradients for integer tensor by @winglian in #32943
[docs] make empty_cache device-agnostic by @faaany in #34774
[docs] add XPU besides CUDA, MPS etc. by @faaany in #34777
[tests] add XPU part to testing by @faaany in #34778
fix: Update pixel_values parameter in hf_model input by @thisisiron in #34782
Fix callback key name by @jung-hunsoo in #34762
fix: Wrong task mentioned in docs by @ecyht2 in #34757
Allow handling files as args for a tool created with Tool.from_space by @aymeric-roucher in #34687
Fix Whisper CI by @ydshieh in #34617
protect tensor parallel usage by @ArthurZucker in #34800
Trainer hyperparameter search kwargs docs update by @GuillemGSubies in #34459
feat: allow to use hf-hub models for timm backbone by @cgebbe in #34729
Support gradient checkpointing in Qwen2VL ViT by @li-plus in #34724
Fix: siglip image processor rgb_convert is not being applied correctly. by @jp1924 in #34301
fix cpu bnb path by @jiqing-feng in #34647
Gemma capping by @ArthurZucker in #34282
Fix cache_utils for optimum.quanto kvcache quantization by @SunMarc in #34750
Modular fix by @Cyrilvallez in #34802
MLU devices : Checks if mlu is available via an cndev-based check which won't trigger the drivers and leave mlu by @huismiling in #34326
🚨🚨🚨 fix(Mask2Former): torch export 🚨🚨🚨 by @philkuz in #34393
Feature: print tokens per second during training by @tibor-reiss in #34507
Add do_convert_rgb to vit by @jp1924 in #34523
Fix post process function called in the instance segmentation example of mask2former by @OnTheThirdDay in #34588
fix crash in tiiuae/falcon-11B-vlm image-to-text generation by @sywangyi in #34728
Add support for OpenAI api "image_url" input in chat for image-text-to-text pipeline by @yonigozlan in #34562
Add Image Processor Fast Deformable DETR by @yonigozlan in #34353
Run test_medium_seamless_m4t_pt in subprocess to avoid many failures by @ydshieh in #34812
Fix check_training_gradient_checkpointing by @ydshieh in #34806
Added image-text-to-text pipeline to task guide by @merveenoyan in #34783
Translate attention.md into Chinese by @wwwbai in #34716
LLaVA OV: fix unpadding precision by @zucchini-nlp in #34779
Fix low memory beam search by @zucchini-nlp in #34746
Fix the memory usage issue of logits in generate() by @kjohew in #34813
fix(DPT,Depth-Anything) torch.export by @philkuz in #34103
Fix: take into account meta device by @tibor-reiss in #34134
Fix hyperparameter search when optuna+deepseed by @corentin-ryr in #34642
Fix CI by tweaking torchao tests by @SunMarc in #34832
Fix CI slack reporting issue by @ydshieh in #34833
VLMs: enable generation tests - last batch by @zucchini-nlp in #34484
Change logging level from warning to info for max_steps overriding num_train_epochs by @qgallouedec in #34810
Fix ds nvme by @eljandoubi in #34444
Fix heuristic scheduling for UAG by @jmamou in #34805
Refactor StarCoder2 using modular by @Cyrilvallez in #34015
Watermarking: fix order by @zucchini-nlp in #34849
Update checks for torch.distributed.tensor to require torch >= 2.5 by @loadams in #34816
Remove quantization related config from dequantized model by @konradkalita in #34856
Auto compile when static cache by @ArthurZucker in #34247
Speculative decoding: Test the target distribution (to prevent issues like #32867) by @keyboardAnt in #34553
smol improvements to support more flexible usage by @andimarafioti in #34857
[CI] Skip EETQ tests while package is broken with latest transformers by @BenjaminBossan in #34854
Bitnet test fix to avoid using gated model by @MekkCyber in #34863
Fix support for image processors modifications in modular by @yonigozlan in #34866
Fix: Enable prefill phase key value caching of nemotron/minitron models by @jeongin601 in #34742
Add safe_globals to resume training on PyTorch 2.6 by @dvrogozh in #34632
Cache: init empty cache when use_cache by @zucchini-nlp in #34274
BLIP: fix generation after hub update by @zucchini-nlp in #34876
[Deberta/Deberta-v2] Refactor code base to support compile, export, and fix LLM by @ArthurZucker in #22105
🔴 Mllama: fix base prefix by @zucchini-nlp in #34874
Sum gathered input tokens by @techkang in #34554
allow unused input parameters passthrough when chunking in asr pipelines by @VictorAtIfInsurance in #33889
prepare_fa2_from_position_ids function bugfix by @meliksahturker in #33269
chore: fix some typos by @wanxiangchwng in #34891
Fix convert_tokens_to_string when decoder is None by @dszeto in #34569
[peft] Given that self.active_adapter is deprecated, avoid using it by @tomaarsen in #34804
Fix Qwen2 failing tests by @jla524 in #34819
Fix : BitNet tests by @MekkCyber in #34895
[AWQ, CI] Bump AWQ version used in docker image by @BenjaminBossan in #34922
fix static cache data type miss-match by @jiqing-feng in #34799
Fix test_auto_backbone_timm_model_from_pretrained by @ydshieh in #34877
Upgrade torch version to 2.5 in dockerfile for quantization CI by @MekkCyber in #34924
Fix failling GGML test by @MekkCyber in #34871
Updated documentation and added conversion utility by @ViktorooReps in #34319
making gpt2 fx traceable by @xuzifei-dmatrix in #34633
Fix import structure for Fast Image processors by @yonigozlan in #34859
VideoLLaVA: add default values by @zucchini-nlp in #34916
Skipping aqlm non working inference tests till fix merged by @MekkCyber in #34865
[Whisper] Fix whisper integration tests by @eustlb in #34111
Add Pytorch Tensor Parallel support for Mistral by @VladOS95-cyber in #34927
change apply_rotary_pos_emb of Glmmodel for GLM-Edge Series model by @zRzRzRzRzRzRzR in #34629
Fix torch.onnx.export of Qwen2-VL vision encoder by @xenova in #34852
Update the Python version in the Chinese README to match the English README. by @vansin in #34870
[i18n-ar] Translated file : docs/source/ar/benchmarks.md into Arabic by @AhmedAlmaghz in #33023
[docs] use device-agnostic API instead of cuda by @faaany in #34913
[doc] use full path for run_qa.py by @faaany in #34914
docs: HUGGINGFACE_HUB_CACHE -> HF_HUB_CACHE by @imba-tjd in #34904
[i18n-zh]Translated tiktoken.md into chinese by @blueingman in #34936
[FlexAttention] Update gemma2 by @ArthurZucker in #34942
Fix : Add PEFT from source to CI docker by @MekkCyber in #34969
Avoid calling get_max_length by @ydshieh in #34971
Fix flaky test execution caused by Thread by @ydshieh in #34966
🌐 [i18n-KO] Translated encoder-decoder.md to Korean by @maximizemaxwell in #34880
[docs] add explanation to release_memory() by @faaany in #34911
[i18n-zh]Translated perf_train_special.md into Chinese by @blueingman in #34948
Fix typo in code block in vipllava.md by @yuanx749 in #34957
Fixed typo in VisitWebpageTool by @sergiopaniego in #34978
[PEFT] Set eval mode when loading PEFT adapter by @BenjaminBossan in #34509
Fix save_pretrained for partially offloaded models by @kylesayrs in #34890
🚨🚨🚨 Changed DINOv2Config default patch size to 14 by @OFSkean in #34568
Refine the code of Universal Assisted Generation by @xinpengzz in #34823
Allow compressed-tensors quantized model to be trained by @horheynm in #34520
Offloaded cache: fix generate by @zucchini-nlp in #34921
Fix utils/check_bad_commit.py (for auto ping in CI) by @ydshieh in #34943
Add optimized PixtralImageProcessorFast by @mgoin in #34836
Improve .from_pretrained type annotations by @qubvel in #34973
Fix docker CI : install autogptq from source by @MekkCyber in #35000
Let server decide default repo visibility by @Wauplin in #34999
🚨🚨🚨 Uniformize kwargs for TrOCR Processor by @tibor-reiss in #34587
Update timm version by @qubvel in #35005
fix: double verbs by @SamuelLarkin in #35008
Update FillMaskPipeline.__call__ signature and docstring by @alvarobartt in #35006
Only cast cu_seqlens when tracing by @xenova in #35016
fix variable undefined bug when return_tensors is not specified in llava processing by @chenweize1998 in #34953
Optimize memory usage of mllama encoder by @milesial in #34930
Typo in warning switching to optimum-quanto by @Bojun-Feng in #35028
Add type hints for forward functions in Gemma2 by @jla524 in #35034
Fix test_eager_matches_sdpa_inference for XPU backend by @dvrogozh in #34889
Multiple typo fixes in Tutorials docs by @henryhmko in #35035
add docstring example for compute_loss_func by @secrettoad in #35020
[i18n-ar] Translated file : docs/source/ar/notebooks.md into Arabic by @AhmedAlmaghz in #33049
[docs] add the missing import for Image and bug fix by @faaany in #34776
Translate bertlogy.md into Chinese by @wwwbai in #34908
Automatic compilation in generate: do not rely on inner function by @Cyrilvallez in #34923
Add token cost + runtime monitoring to Agent and HfEngine children by @aymeric-roucher in #34548
Fix BertGeneration by @ydshieh in #35043
fix speecht5 failure issue in test_peft_gradient_checkpointing_enable… by @sywangyi in #34454
[docs] fix example code bug by @faaany in #35054
Translate community.md into Chinese by @wwwbai in #35013
[docs] use device-agnostic instead of cuda by @faaany in #35047
[docs] use device-agnostic API instead of hard-coded cuda by @faaany in #35048
Fix pad_token_tensor is None in warning by @tshu-w in #34005
Add Pytorch Tensor Parallel support for Qwen2, Qwen2Moe, Starcoder2 by @VladOS95-cyber in #35007
[GPTNeoX] Flex Attention + Refactor by @vasqu in #34896
Support for easier multimodal use of modular by @Cyrilvallez in #35056
[docs] add a comment that offloading requires CUDA GPU by @faaany in #35055
[docs] Increase visibility of torch_dtype="auto" by @stevhliu in #35067
Informative by @ydshieh in #35059
[Whisper] Fix whisper tokenizer by @eustlb in #34537
[tokenizers] bump to 0.21 by @ArthurZucker in #34972
Update Mistral conversion script by @Cyrilvallez in #34829
Fix tie_word_embeddings handling for GGUF models by @Isotr0py in #35085
Deprecate quanto and switch to optimum-quanto by @MekkCyber in #35001
BLIP: this is correct now by @zucchini-nlp in #35081
[trainer] fix the GA model_accepts_loss_kwargs by @ArthurZucker in #34915
Fix flaky Hub CI (test_trainer.py) by @ydshieh in #35062
Adaptive dynamic number of speculative tokens by @jmamou in #34156

Significant community contributions

The following contributors have made significant changes to the library over the last release:

@AhmedAlmaghz
- [i18n-ar] Translated file : docs/source/ar/fast_tokenizers.md into Arabic (#33034)
- [i18n-ar] Translated file : docs/source/ar/multilingual.md into Arabic (#33048)
- [i18n-ar] Translated file : docs/source/ar/trainer.md into Arabic (#33080)
- [i18n-ar] Translated file : docs/source/ar/torchscript.md into Arabic (#33079)
- [i18n-ar] Translated file : docs/source/ar/benchmarks.md into Arabic (#33023)
@maximizemaxwell
- 🌐 [i18n-KO] Translated perf_train_special.md to Korean (#34590)
- 🌐 [i18n-KO] Translated bert.md to Korean (#34627)
- 🌐 [i18n-KO] Translated marian.md to Korean (#34698)
- 🌐 [i18n-KO] Translated encoder-decoder.md to Korean (#34880)
@2015aroras
- Add OLMo November 2024 (#34551)
- Rename OLMo November to OLMo2 (#34864)
@mgoin
- Add optimized PixtralImageProcessorFast (#34836)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

v4.47.0: PaliGemma-2, I-JEPA, OLMo-2, LayerSkip, Tensor Parallel