Help with Understanding the Depth of RepViT-M2.3 Model and Code Verification #69

Open
MiguelMC-UNEX opened this issue Jun 16, 2024 · 0 comments

Hi everyone,

I am currently working with the RepViT-M2.3 model and am trying to understand the correct configuration for its depth. Specifically, I want to verify that my implementation of the Multi_Level_Extract class aligns with the model specification. Here is the code I have so far:

import torch
from torch import nn

class Multi_Level_Extract(nn.Module):
    """Four-stage convolutional feature extractor; each stage halves the
    spatial resolution, for an overall stride of 16."""

    def __init__(self, out_channels):
        super().__init__()
        self.seq = nn.Sequential(
            # Stem: 7x7 conv, stride 2 (e.g. 224 -> 112)
            nn.Conv2d(3, out_channels[0], 7, 2, 3, bias=False),
            nn.ReLU(inplace=True),
            # Stage 2: 3x3 conv, stride 2 (112 -> 56)
            nn.Conv2d(out_channels[0], out_channels[1], 3, 2, 1, bias=False),
            nn.ReLU(inplace=True),
            # Stage 3: 3x3 conv, stride 2 (56 -> 28)
            nn.Conv2d(out_channels[1], out_channels[2], 3, 2, 1, bias=False),
            nn.ReLU(inplace=True),
            # Stage 4: 3x3 conv, stride 2 (28 -> 14)
            nn.Conv2d(out_channels[2], out_channels[3], 3, 2, 1, bias=False),
        )

    def forward(self, x):
        return self.seq(x)
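
As a quick sanity check (assuming a standard 224x224 RGB input), I verified the output shape of the extractor like this:

# Shape check for Multi_Level_Extract; the out_channels values here match
# the m2_3 config shown below.
extractor = Multi_Level_Extract([64, 128, 256, 640])
x = torch.randn(1, 3, 224, 224)
y = extractor(x)
print(y.shape)  # torch.Size([1, 640, 14, 14]) -- overall stride 16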

According to my understanding, the depth of the RepViT-M2.3 model should be 34 layers. Here is the relevant configuration in my SelfAttention class:

class SelfAttention(nn.Module):
    def __init__(self, model_type="m2_3", pretrained=True):
        super().__init__()
        # Per-variant hyperparameters; "depth" is the value I am unsure about.
        model_config = {
            "m2_3": {
                "d_model": 640,    # embedding dimension of the last stage
                "depth": 34,       # number of blocks -- is this correct?
                "heads": 16,
                "mlp_dim": 2560,   # 4 * d_model
                "model_path": "./model/repvit_m2_3_distill_450e.pth",
                "out_channels": [64, 128, 256, 640],
            },
            # Other configurations...
        }
        # Rest of the class implementation...
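
One idea I had for verifying the depth is to inspect the checkpoint itself and count its blocks. This is only a sketch: the "features.N." key prefix is my assumption about how the upstream RepViT implementation names its block container, and the "model" nesting is also a guess, so both may need adjusting:

import torch

# Count distinct block indices in the checkpoint's state dict.
state = torch.load("./model/repvit_m2_3_distill_450e.pth", map_location="cpu")
state = state.get("model", state)  # some checkpoints wrap weights under "model"

block_ids = {int(k.split(".")[1]) for k in state if k.startswith("features.")}
print(f"feature blocks in checkpoint: {len(block_ids)}")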

My questions are:

1. Is a depth of 34 layers correct for the RepViT-M2.3 model? I have seen different sources mention varying depths, and I want to make sure my configuration is accurate.
2. Does my implementation of the Multi_Level_Extract class align with the RepViT-M2.3 model specification? Is there anything I need to change to better fit the model's architecture? (My comparison plan is sketched below.)
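
For question 2, my plan is to print the reference model's top-level modules and compare them against Multi_Level_Extract. This assumes the upstream RepViT repo layout, where model/repvit.py exposes a repvit_m2_3 constructor; if the import path differs, it would need adjusting:

# Print the official model's top-level modules for a side-by-side comparison.
from model.repvit import repvit_m2_3

net = repvit_m2_3()
for name, module in net.named_children():
    print(name, "->", type(module).__name__)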
