
v4.24.0: ESM-2/ESMFold, LiLT, Flan-T5, Table Transformer and Contrastive search decoding

Released by @sgugger on 01 Nov 15:45 · commit 94b3f54

ESM-2/ESMFold

ESM-2 and ESMFold are new state-of-the-art Transformer protein language and folding models from Meta AI's Fundamental AI Research Team (FAIR). ESM-2 is trained with a masked language modeling objective, and it can be easily transferred to sequence and token classification tasks for proteins. Checkpoints exist in various sizes, from 8 million parameters up to a huge 15 billion parameter model.
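
As a quick sketch of the sequence API (the `facebook/esm2_t6_8M_UR50D` checkpoint name and the toy sequence below are illustrative assumptions), ESM-2 loads through the standard auto classes and can fill in masked residues:

```python
import torch
from transformers import AutoTokenizer, EsmForMaskedLM

# Smallest ESM-2 checkpoint (8M parameters); the larger ones share this API
tokenizer = AutoTokenizer.from_pretrained("facebook/esm2_t6_8M_UR50D")
model = EsmForMaskedLM.from_pretrained("facebook/esm2_t6_8M_UR50D")

# Mask one residue of a toy protein sequence and let the model fill it in
sequence = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"
inputs = tokenizer(sequence, return_tensors="pt")
inputs["input_ids"][0, 5] = tokenizer.mask_token_id  # token 5 = a residue after the CLS token

with torch.no_grad():
    logits = model(**inputs).logits

predicted_id = logits[0, 5].argmax(-1).item()
print("Predicted residue:", tokenizer.convert_ids_to_tokens(predicted_id))
```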

ESMFold is a state-of-the-art single-sequence protein folding model which produces high-accuracy predictions significantly faster than previous approaches. Unlike previous protein folding tools such as AlphaFold2 and OpenFold, ESMFold uses a pretrained protein language model to generate token embeddings that are used as input to the folding model, and so does not require a multiple sequence alignment (MSA) of related proteins as input. As a result, proteins can be folded in a single forward pass of the model, without any external databases or search/alignment tools at inference time. This hugely reduces the time and compute requirements for folding.

Transformer protein language models were introduced in the paper Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences by Alexander Rives, Joshua Meier, Tom Sercu, Siddharth Goyal, Zeming Lin, Jason Liu, Demi Guo, Myle Ott, C. Lawrence Zitnick, Jerry Ma, and Rob Fergus.

ESMFold was introduced in the paper Language models of protein sequences at the scale of evolution enable accurate structure prediction by Zeming Lin, Halil Akin, Roshan Rao, Brian Hie, Zhongkai Zhu, Wenting Lu, Allan dos Santos Costa, Maryam Fazel-Zarandi, Tom Sercu, Sal Candido, and Alexander Rives.
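
A minimal folding sketch, assuming the `facebook/esmfold_v1` checkpoint (not named in the notes above) and a toy sequence; a single forward pass returns predicted atom coordinates:

```python
import torch
from transformers import AutoTokenizer, EsmForProteinFolding

tokenizer = AutoTokenizer.from_pretrained("facebook/esmfold_v1")
model = EsmForProteinFolding.from_pretrained("facebook/esmfold_v1")

# One plain sequence is enough: no MSA, databases, or alignment tools needed
sequence = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"
inputs = tokenizer([sequence], return_tensors="pt", add_special_tokens=False)

with torch.no_grad():
    outputs = model(**inputs)

# Predicted 3D atom positions for the folded structure
print(outputs.positions.shape)
```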

LiLT

LiLT makes it possible to combine any pre-trained RoBERTa text encoder with a lightweight Layout Transformer, enabling LayoutLM-like document understanding for many languages.

It was proposed in LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding by Jiapeng Wang, Lianwen Jin, Kai Ding.
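
A minimal sketch of feeding words plus layout boxes to LiLT; the `SCUT-DLVCLab/lilt-roberta-en-base` checkpoint and the toy OCR words/boxes are illustrative assumptions:

```python
import torch
from transformers import AutoTokenizer, LiltModel

tokenizer = AutoTokenizer.from_pretrained("SCUT-DLVCLab/lilt-roberta-en-base")
model = LiltModel.from_pretrained("SCUT-DLVCLab/lilt-roberta-en-base")

# Toy OCR output: words and their bounding boxes on a 0-1000 normalized grid
words = ["Invoice", "no.", "12345"]
boxes = [[75, 40, 180, 60], [185, 40, 230, 60], [235, 40, 300, 60]]

encoding = tokenizer(words, is_split_into_words=True, return_tensors="pt")

# LiLT expects one box per token: map each token back to its word's box
token_boxes = [
    [0, 0, 0, 0] if word_idx is None else boxes[word_idx]
    for word_idx in encoding.word_ids()
]
encoding["bbox"] = torch.tensor([token_boxes])

with torch.no_grad():
    outputs = model(**encoding)
print(outputs.last_hidden_state.shape)
```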

Flan-T5

FLAN-T5 is an enhanced version of T5 that has been finetuned on a mixture of tasks.

It was released in the paper Scaling Instruction-Finetuned Language Models by Hyung Won Chung, Le Hou, Shayne Longpre, Barret Zoph, Yi Tay, William Fedus, Eric Li, Xuezhi Wang, Mostafa Dehghani, Siddhartha Brahma, Albert Webson, Shixiang Shane Gu, Zhuyun Dai, Mirac Suzgun, Xinyun Chen, Aakanksha Chowdhery, Sharan Narang, Gaurav Mishra, Adams Yu, Vincent Zhao, Yanping Huang, Andrew Dai, Hongkun Yu, Slav Petrov, Ed H. Chi, Jeff Dean, Jacob Devlin, Adam Roberts, Denny Zhou, Quoc V. Le, and Jason Wei.
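
Because FLAN-T5 is instruction-tuned, a plain natural-language prompt works directly; a minimal sketch, assuming the `google/flan-t5-small` checkpoint:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-small")

# No task-specific head or fine-tuning needed: just describe the task
prompt = "Translate English to German: How old are you?"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```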

Table Transformer

Table Transformer is a DETR-based model that performs table extraction and table structure recognition on unstructured documents.

It was proposed in PubTables-1M: Towards comprehensive table extraction from unstructured documents by Brandon Smock, Rohith Pesala, Robin Abraham.
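
A detection sketch, assuming the `microsoft/table-transformer-detection` checkpoint and a local page image; the post-processing mirrors the usual DETR workflow:

```python
import torch
from PIL import Image
from transformers import AutoImageProcessor, TableTransformerForObjectDetection

processor = AutoImageProcessor.from_pretrained("microsoft/table-transformer-detection")
model = TableTransformerForObjectDetection.from_pretrained("microsoft/table-transformer-detection")

image = Image.open("page.png").convert("RGB")  # hypothetical scanned document page
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Keep detections above a confidence threshold, rescaled to the image size
target_sizes = torch.tensor([image.size[::-1]])
results = processor.post_process_object_detection(
    outputs, threshold=0.9, target_sizes=target_sizes
)[0]
for score, label, box in zip(results["scores"], results["labels"], results["boxes"]):
    print(model.config.id2label[label.item()], round(score.item(), 3), box.tolist())
```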

Contrastive search decoding

Contrastive search decoding is a new state-of-the-art generation method that aims to reduce the repetitive patterns into which generation models often fall.

It was introduced in A Contrastive Framework for Neural Text Generation by Yixuan Su, Tian Lan, Yan Wang, Dani Yogatama, Lingpeng Kong, Nigel Collier.

  • Adding the state-of-the-art contrastive search decoding methods for the codebase of generation_utils.py by @gmftbyGMFTBY in #19477
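
Contrastive search is activated through the standard generate() API by combining `penalty_alpha` with `top_k`; a minimal sketch with GPT-2 (the prompt and hyperparameter values are illustrative):

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("DeepMind Company is", return_tensors="pt")

# penalty_alpha > 0 together with top_k > 1 triggers contrastive search:
# candidate tokens are scored by model confidence minus a degeneration
# penalty that discourages tokens too similar to the existing context
outputs = model.generate(**inputs, penalty_alpha=0.6, top_k=4, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```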

Safety and security

We continue to explore the new pickle-free serialization format offered by the safetensors library, this time by adding support for TensorFlow models. More checkpoints have been converted to this format. Support is still experimental.
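
As a sketch of the intended workflow (the `safe_serialization` flag and local paths below are assumptions, and the feature itself is experimental):

```python
from transformers import AutoModel

# Save a checkpoint in the pickle-free safetensors format instead of the
# default pickle-based weights file (requires the safetensors package)
model = AutoModel.from_pretrained("bert-base-uncased")
model.save_pretrained("./bert-safetensors", safe_serialization=True)

# Loading picks up the model.safetensors file automatically
reloaded = AutoModel.from_pretrained("./bert-safetensors")
```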

🚨 Breaking changes

The following changes are bugfixes that we chose to make even though they change the resulting behavior. We mark them as breaking changes, so if you use this part of the codebase, we recommend you take a look at the PRs to understand exactly what was changed.

  • 🚨🚨🚨 TF: Remove TFWrappedEmbeddings (breaking: TF embedding initialization updated for encoder-decoder models) by @gante in #19263
  • 🚨🚨🚨 [Breaking change] Deformable DETR intermediate representations by @Narsil in #19678

Bugfixes and improvements

Significant community contributions

The following contributors have made significant changes to the library over the last release:

  • @arnaudstiegler
    • Make LayoutLM tokenizers independent from BertTokenizer (#19351)
  • @asofiaoliveira
    • Make XLMRoberta model and config independent from Roberta (#19359)
  • @srhrshr
    • Decouples XLMProphet model from Prophet (#19406)
  • @Davidy22
    • Make bert_japanese and cpm independent of their inherited modules (#19431)
    • Clean up deprecation warnings (#19654)
  • @mathieujouffroy
    • [CvT] Tensorflow implementation (#18597)
    • using trunc_normal for weight init & cls_token (#19486)
  • @IMvision12
    • New (#19481)
    • Added type hints to DebertaV2ForMultipleChoice Pytorch (#19536)
    • Update modeling_markuplm.py (#19723)
    • Update modeling_layoutlmv3.py (#19753)
  • @501Good
    • Make MobileBert tokenizers independent from Bert (#19531)
  • @mukesh663
    • Removed Bert interdependency from Funnel transformer (#19655)
• Fixed pegasus config doctest (#19722)
    • [Doctest] Fixing doctest configuration_pegasus_x.py (#19725)
  • @D3xter1922
• Removed XLMModel inheritance from FlaubertModel (torch+tf) (#19432)
  • @falcaopetri
    • Allow user-managed Pool in Wav2Vec2ProcessorWithLM.batch_decode (#18351)
    • Fix bug in Wav2Vec2's GPU tests (#19803)
  • @gmftbyGMFTBY
    • Adding the state-of-the-art contrastive search decoding methods for the codebase of generation_utils.py (#19477)
  • @davialvb
    • [ custom_models.mdx ] - Translated to Portuguese the custom models tutorial. (#19779)
    • Added translation of run_scripts.mdx to Portuguese Issue #16824 (#19800)
    • Added translation of converting_tensorflow_models.mdx to Portuguese Issue #16824 (#19824)
    • Added translation of serialization.mdx to Portuguese Issue #16824 (#19869)
  • @alceballosa
    • Spanish translation of multiple_choice.mdx, question_answering.mdx. (#19821)