Add ViViT(Video Vision Transformer) to KerasCV #2335

Open
wants to merge 14 commits into
base: master
aditya02shah

What does this PR do?

Adding ViViT model

Overview:
This PR adds the ViViT model to KerasCV, together with the relevant test cases.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue? Please add a link
    to it if that's the case.
  • Did you write any new necessary tests?
  • If this adds a new model, can you run a few training steps on TPU in Colab to ensure that no XLA-incompatible ops are used?

Who can review?

@divyashreepathihalli

@divyashreepathihalli
Collaborator

Thanks for the PR, @aditya02shah. Could you please add a Colab demo to verify the results and also share the weights file with us? How does this compare to the HF implementation?

@aditya02shah
Author

@divyashreepathihalli This implementation closely aligns to the one used in keras-examples.
Here is a Colab Demo. It is similar to the HF implementation, but easier to use and with simpler functionality.

@pranavvp16
Contributor

@aditya02shah What is expected here is that the outputs of your model's last layer match the outputs of the HF implementation. Am I right, @divyashreepathihalli?

@innat-asj
Contributor

FYI, Official implementation: https://github.com/google-research/scenic/tree/aaeaa203bfbbaf3d2c6d9865fe86d1379cfe4a58/scenic/projects/vivit

@divyashreepathihalli
Collaborator

divyashreepathihalli commented Feb 12, 2024

> @divyashreepathihalli This implementation closely aligns to the one used in keras-examples. Here is a Colab Demo. It is similar to the HF implementation, but easier to use and with simpler functionality.

Thanks Adithya! If the outputs match the example, that is good enough. But I would like to see a Colab demo that uses the changes from your PR.
You can test your changes in Colab by installing your fork like this:
!pip install -q git+https://github.com/<your-github-username>/keras-cv.git@<branch-name-which-has-the-changes>

@aditya02shah
Author

@divyashreepathihalli I've created a Colab demo that incorporates the changes from my pull request. You can access it here

Collaborator

@divyashreepathihalli divyashreepathihalli left a comment

Thank you for the PR, @aditya02shah. I have left a few cleanup comments. Also, let's make sure the tests pass.

keras_cv/models/video_classification/vivit.py (outdated, resolved)
keras_cv/models/video_classification/vivit.py (resolved)
        self.patch_size = patch_size

    def build(self, input_shape):
        self.projection = keras.layers.Conv3D(
Collaborator

Define all layers in __init__ and build them here, e.g. self.layer_name.build(expected_input_shape).
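
The pattern requested in this review comment can be sketched as follows. This is a minimal illustration, not the code from the PR: the class name `PatchEmbedding`, the `embed_dim` argument, and the `Reshape` sub-layer are hypothetical, since the diff context above shows only the first line of `build`. It assumes a Keras installation where `keras.layers.Conv3D` and `keras.layers.Reshape` are available.

```python
import keras
from keras import layers


class PatchEmbedding(layers.Layer):
    """Hypothetical tubelet-embedding layer, for illustration only."""

    def __init__(self, embed_dim, patch_size, **kwargs):
        super().__init__(**kwargs)
        self.embed_dim = embed_dim
        self.patch_size = patch_size
        # Sub-layers are created in __init__, where no input shape is needed.
        self.projection = layers.Conv3D(
            filters=embed_dim,
            kernel_size=patch_size,
            strides=patch_size,
            padding="valid",
        )
        self.flatten = layers.Reshape(target_shape=(-1, embed_dim))

    def build(self, input_shape):
        # Weights are created in build, once the input shape is known.
        self.projection.build(input_shape)
        projected_shape = self.projection.compute_output_shape(input_shape)
        self.flatten.build(projected_shape)
        self.built = True

    def call(self, videos):
        # (batch, frames, height, width, channels) -> (batch, patches, embed_dim)
        return self.flatten(self.projection(videos))


# Build explicitly with a symbolic batch dimension.
layer = PatchEmbedding(embed_dim=16, patch_size=(8, 8, 8))
layer.build((None, 28, 28, 28, 1))
```

Creating sub-layers in `__init__` keeps the layer's structure visible before any data is seen, while deferring weight creation to `build` lets the shapes be inferred from the first input.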

@aditya02shah
Author

@divyashreepathihalli I have made the recommended changes. You can find the colab for the latest commit here

@divyashreepathihalli
Collaborator

Thanks @aditya02shah!
One additional chore: please add keras_cv/models/video_classification \ to this file, at lines 72 and 86:
https://github.com/keras-team/keras-cv/blob/master/.kokoro/github/ubuntu/gpu/build.sh

PS: We will fix this overhead soon, but in the meantime this is what we need to do.

@aditya02shah
Author

@divyashreepathihalli No worries, I have updated the build script!

@divyashreepathihalli divyashreepathihalli added the kokoro:force-run Runs Tests on GPU label Feb 27, 2024
@kokoro-team kokoro-team removed the kokoro:force-run Runs Tests on GPU label Feb 27, 2024
Collaborator

@divyashreepathihalli divyashreepathihalli left a comment

Mostly LGTM, just one nit regarding the build method.

keras_cv/models/video_classification/vivit.py (resolved)
@aditya02shah
Author

@divyashreepathihalli I have made revisions to the build method. Colab for the latest changes.

@divyashreepathihalli divyashreepathihalli added the kokoro:force-run Runs Tests on GPU label Mar 4, 2024
@kokoro-team kokoro-team removed the kokoro:force-run Runs Tests on GPU label Mar 4, 2024
@divyashreepathihalli
Collaborator

Thanks for the update, @aditya02shah! There is one error that needs to be fixed:

_________________________ ViViT_Test.test_saved_model __________________________

self = 

    @pytest.mark.large  # Saving is slow, so mark these large.
    def test_saved_model(self):
        input_shape = (28, 28, 28, 1)
        num_classes = 11
        patch_size = (8, 8, 8)
        layer_norm_eps = 1e-6
        projection_dim = 128
        num_heads = 8
        num_layers = 8
    
>       model = ViViT(
            projection_dim=projection_dim,
            patch_size=patch_size,
            inp_shape=input_shape,
            transformer_layers=num_layers,
            num_heads=num_heads,
            layer_norm_eps=layer_norm_eps,
            num_classes=num_classes,
        )

keras_cv/models/video_classification/vivit_test.py:135: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
keras_cv/models/video_classification/vivit.py:107: in __init__
    super().__init__(**kwargs)
keras_cv/models/task.py:30: in __init__
    super().__init__(*args, **kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = 
args = (), kwargs = {}, previous_value = True

    def _method_wrapper(self, *args, **kwargs):
      previous_value = getattr(self, "_self_setattr_tracking", True)
      self._self_setattr_tracking = False  # pylint: disable=protected-access
      try:
>       result = method(self, *args, **kwargs)
E       TypeError: __init__() missing 2 required positional arguments: 'inputs' and 'outputs'

/tmpfs/venv/lib/python3.9/site-packages/tensorflow/python/trackable/base.py:204: TypeError
__________________________ ViViT_Test.test_vivit_call __________________________

self = 

    def test_vivit_call(self):
        input_shape = (28, 28, 28, 1)
        num_classes = 11
        patch_size = (8, 8, 8)
        layer_norm_eps = 1e-6
        projection_dim = 128
        num_heads = 8
        num_layers = 8
    
>       model = ViViT(
            projection_dim=projection_dim,
            patch_size=patch_size,
            inp_shape=input_shape,
            transformer_layers=num_layers,
            num_heads=num_heads,
            layer_norm_eps=layer_norm_eps,
            num_classes=num_classes,
        )

keras_cv/models/video_classification/vivit_test.py:67: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
keras_cv/models/video_classification/vivit.py:107: in __init__
    super().__init__(**kwargs)
keras_cv/models/task.py:30: in __init__
    super().__init__(*args, **kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = 
args = (), kwargs = {}, previous_value = True

    def _method_wrapper(self, *args, **kwargs):
      previous_value = getattr(self, "_self_setattr_tracking", True)
      self._self_setattr_tracking = False  # pylint: disable=protected-access
      try:
>       result = method(self, *args, **kwargs)
E       TypeError: __init__() missing 2 required positional arguments: 'inputs' and 'outputs'

/tmpfs/venv/lib/python3.9/site-packages/tensorflow/python/trackable/base.py:204: TypeError
______________________ ViViT_Test.test_vivit_construction ______________________

self = 
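
One plausible reading of the TypeError above is that the parent constructor reached through `super().__init__(**kwargs)` expects functional-style `inputs` and `outputs`, and the subclass forwards neither. The sketch below reproduces that failure mode in plain Python. `FunctionalBase`, `BrokenModel`, and `FixedModel` are hypothetical stand-ins, not KerasCV or Keras classes, and the placeholder strings stand in for real symbolic tensors; whether this is the actual cause in the PR is an assumption.

```python
class FunctionalBase:
    """Hypothetical stand-in for a base class that requires inputs/outputs."""

    def __init__(self, inputs, outputs, **kwargs):
        self.inputs = inputs
        self.outputs = outputs
        self.name = kwargs.get("name", "model")


class BrokenModel(FunctionalBase):
    def __init__(self, num_classes, **kwargs):
        # Mirrors the failing call in the traceback: nothing is forwarded.
        super().__init__(**kwargs)
        self.num_classes = num_classes


class FixedModel(FunctionalBase):
    def __init__(self, num_classes, **kwargs):
        # Placeholder strings stand in for symbolic input/output tensors.
        inputs = "video_input"
        outputs = f"logits[{num_classes}]"
        super().__init__(inputs=inputs, outputs=outputs, **kwargs)
        self.num_classes = num_classes


def try_build(model_cls):
    """Return the constructed model, or the TypeError message on failure."""
    try:
        return model_cls(num_classes=11)
    except TypeError as err:
        return str(err)


broken_result = try_build(BrokenModel)
fixed_result = try_build(FixedModel)
print(broken_result)
print(fixed_result.outputs)
```

If this reading is right, the fix in the PR would be to build the input and output tensors inside the model's `__init__` and pass them to the parent constructor, as `FixedModel` does with its placeholders.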

5 participants