Added LayoutLMv3 #2178

Open
wants to merge 6 commits into
base: master

carrycooldude

Description

This PR fixes the LayoutLMv3 checkpoint conversion script to properly handle different spatial embedding dimensions between the base and large models. The base model uses 128 dimensions for all spatial embeddings, while the large model uses 171 dimensions for x/y coordinates and 170 dimensions for height/width.

Changes Made

  • Added dynamic detection of spatial embedding dimensions from the Hugging Face model
  • Implemented padding for smaller embeddings to match the maximum dimension
  • Updated projection matrices to use consistent dimensions
  • Added detailed debug output for spatial embedding shapes

Technical Details

The conversion script now:

  1. Detects individual dimensions for x, y, h, w embeddings
  2. Uses the maximum dimension (171 for large model) for all embeddings
  3. Pads smaller embeddings (170) with zeros to match the larger dimension
  4. Creates projection matrices with consistent dimensions
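The four steps above can be sketched with NumPy as follows. This is a minimal illustration, not the conversion script from this PR: the function name `pad_spatial_embeddings`, the dictionary keys, and the row count of 16 are assumptions made for the example, and random arrays stand in for the Hugging Face weights.

```python
import numpy as np


def pad_spatial_embeddings(tables):
    """Zero-pad each table's feature axis to the widest table's width.

    `tables` maps a name ("x", "y", "h", "w") to a 2D array of shape
    (num_positions, embedding_dim). Hypothetical helper for illustration.
    """
    # Steps 1 and 2: detect each width and take the maximum.
    target_dim = max(table.shape[-1] for table in tables.values())
    padded = {}
    for name, table in tables.items():
        pad_width = target_dim - table.shape[-1]
        # Step 3: append zeros on the right of the feature axis only.
        padded[name] = np.pad(table, ((0, 0), (0, pad_width)))
    return padded, target_dim


# Stand-in weights using the large-model widths quoted in the description:
# 171 for x/y and 170 for h/w. The 16 rows are an arbitrary placeholder.
rng = np.random.default_rng(0)
tables = {
    "x": rng.normal(size=(16, 171)),
    "y": rng.normal(size=(16, 171)),
    "h": rng.normal(size=(16, 170)),
    "w": rng.normal(size=(16, 170)),
}
padded, target_dim = pad_spatial_embeddings(tables)

# Step 4: a projection built on the concatenated tables now sees a
# consistent input width of 4 * target_dim.
concatenated = np.concatenate([padded[k] for k in ("x", "y", "h", "w")], axis=-1)
```

Because the padding is zeros, the extra column contributes nothing to a downstream matrix product, so the padded table behaves like the original one as long as the matching projection rows are handled consistently.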

Testing

  • Successfully converted both base and large models
  • Verified output shapes match expected dimensions
  • Confirmed no dimension mismatch errors during conversion

Output Example

Screenshot from 2025-03-30 12-50-29

@divyashreepathihalli
Collaborator

@carrycooldude Thank you for the PR. The code structure does not match the KerasHub style.
Please go through the guide here: https://github.com/keras-team/keras-hub/blob/master/CONTRIBUTING_MODELS.md
Take a look at other model folders.
What would the task model look like?
The preset file contents should be just metadata and the Kaggle Hub path.
Can you provide a model code usage example?
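
For reference, a preset entry limited to metadata and a Kaggle path might look roughly like the sketch below. The layout is an assumption based on how existing KerasHub preset files appear to be organized; the preset name, description text, and handle are hypothetical placeholders, not values from this PR.

```python
# Hypothetical preset registry: the key, the description, and the handle
# below are placeholders for illustration only.
backbone_presets = {
    "layoutlmv3_base": {
        "metadata": {
            "description": "Placeholder description of the base checkpoint.",
            "path": "layoutlmv3",
        },
        # Weights and config live on Kaggle, not in the preset file.
        "kaggle_handle": "kaggle://<org>/layoutlmv3/keras/layoutlmv3_base/<version>",
    },
}
```

The point of the shape is that the file carries no model code or weights, only descriptive metadata and a pointer to where the assets are hosted.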

@@ -0,0 +1,152 @@
"""Tests for LayoutLMv3 backbone."""
Collaborator

Remove this docstring at the start of the file.

@sachinprasadhs
Collaborator

Adding general code structuring comments.

  • Add all the files directly under the model directory; we don't recommend using subdirectories.
  • We don't encourage TensorFlow-specific operations (anything under tf.); model designs should be backend-agnostic.
  • The code does not follow the general code format used in Keras Hub. I suggest you study other model implementations in detail.
  • Arguments need to be described fully, including the type of data each one accepts and its default value.

Refer to any existing model implementation here: https://github.com/keras-team/keras-hub/tree/master/keras_hub/src/models

The test cases should also follow the template used by the other models.

@sachinprasadhs left a comment
Collaborator

I have added a few comments; most of them are general practices that we follow. Incorporate these suggested changes across all the files.
Also remove the files and directories that are not required, such as the env directory.

@@ -0,0 +1 @@

Collaborator

Remove this directory and file

@@ -0,0 +1,4 @@
"""LayoutLMv3 document classifier."""
Collaborator

This file needs to be empty. All imports are handled in the keras_hub/api directory and are generated automatically whenever you run git commit -m "<message>".
Make sure you run pre-commit install first.

@@ -0,0 +1,15 @@
from keras_hub.src.models.layoutlmv3.layoutlmv3_backbone import LayoutLMv3Backbone
Collaborator

This file is mainly for registering presets. Look at other models to understand the format we follow.


def __init__(
self,
vocab_size: int = 30522,
Collaborator

Remove type annotations everywhere; we don't use type annotations in Keras Hub.

References:
- [LayoutLMv3 Paper](https://arxiv.org/abs/2204.08387)
- [LayoutLMv3 GitHub](https://github.com/microsoft/unilm/tree/master/layoutlmv3)
"""
Collaborator

This entire docstring needs to be inside the Backbone class.

"""

import os
from typing import Dict, List, Optional, Tuple, Union
Collaborator

Remove this once type annotation is removed


from .layoutlmv3_tokenizer import LayoutLMv3Tokenizer
from .layoutlmv3_presets import backbone_presets
from .layoutlmv3_transformer import LayoutLMv3TransformerLayer
Collaborator

Change from relative imports to absolute imports everywhere.

maintaining spatial relationships in documents.

Args:
vocab_size: int, defaults to 30522. Size of the vocabulary.
Collaborator

The format we follow for Args is:
vocab_size: int. Size of the vocabulary. Defaults to 30522.

Follow this format for every argument, and make sure each entry conveys complete and accurate information.
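
As a small illustration of that Args layout combined with an unannotated signature, consider the sketch below. `ExampleBackbone` is a hypothetical plain-Python stand-in, not the LayoutLMv3 backbone, and the `hidden_dim` argument with its default of 768 is an assumption added only to show a second entry.

```python
class ExampleBackbone:
    """Hypothetical stand-in class used only to show the docstring layout.

    Args:
        vocab_size: int. Size of the vocabulary. Defaults to 30522.
        hidden_dim: int. Width of the hidden states. Defaults to 768.
    """

    # No type annotations in the signature: types live in the docstring.
    def __init__(self, vocab_size=30522, hidden_dim=768):
        self.vocab_size = vocab_size
        self.hidden_dim = hidden_dim


backbone = ExampleBackbone()
```

Each entry reads name, type, description, then default, so a reader can scan the defaults without opening the signature.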

```
"""

presets = backbone_presets
Collaborator

No need for this here.

self.use_rel_pos = use_rel_pos
self.rel_pos_bins = rel_pos_bins
self.max_rel_pos = max_rel_pos
self.spatial_embedding_dim = spatial_embedding_dim
Collaborator

This should come last.
You can follow the order below:

# === Layers ===

# === Functional Model ===

# === Config ===


3 participants