
Some questions about QSAttn. #76

Open · nhw649 opened this issue Oct 23, 2023 · 8 comments

Comments


nhw649 commented Oct 23, 2023

(1) When computing the category codes, src is the support feature, so why do src, category_code, and tsp need to go through QSAttn? This is not mentioned in the paper.
(2) Why is QSAttn applied only in the first layer?

In deformable_transformer.py:

def forward_supp_branch(self, src, pos, reference_points, spatial_shapes, level_start_index, padding_mask, tsp, support_boxes):
    # self attention
    src2 = self.self_attn(self.with_pos_embed(src, pos), reference_points, src, spatial_shapes, level_start_index, padding_mask)
    src = src + self.dropout1(src2)
    src = self.norm1(src)

    support_img_h, support_img_w = spatial_shapes[0, 0], spatial_shapes[0, 1]
    supp_roi = torchvision.ops.roi_align(
        src.transpose(1, 2).reshape(src.shape[0], -1, support_img_h, support_img_w),
        support_boxes,
        output_size=(7, 7),
        spatial_scale=1 / 32.0,
        aligned=True)

    supp_roi = supp_roi.mean(3).mean(2)  # [episode_size, C]
    category_code = supp_roi.sigmoid() # [episode_size, C]

    if self.QSAttn:
        # siamese attention
        src, tsp = self.siamese_attn(src,
                                     inverse_sigmoid(category_code).unsqueeze(0).expand(src.shape[0], -1, -1),
                                     category_code.unsqueeze(0).expand(src.shape[0], -1, -1),
                                     tsp)

        # ffn
        src = self.forward_ffn(src + tsp)
    else:
        src = self.forward_ffn(src)

    return src, category_code
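
For concreteness, here is a minimal shape walkthrough of the category-code computation above (all sizes here are made-up assumptions for illustration, not values from this repo):

    import torch
    import torchvision

    # Illustrative sizes: 5 support images (episode_size), C = 256 channels,
    # and a single-level 32x32 feature map, i.e. stride-32 features.
    episode_size, C, H, W = 5, 256, 32, 32

    # src as it arrives in forward_supp_branch: [episode_size, H*W, C]
    src = torch.randn(episode_size, H * W, C)

    # One support box per image in (batch_index, x1, y1, x2, y2) format,
    # given in image coordinates (hence spatial_scale=1/32 below).
    support_boxes = torch.tensor(
        [[i, 0.0, 0.0, 512.0, 512.0] for i in range(episode_size)]
    )

    feat = src.transpose(1, 2).reshape(episode_size, C, H, W)  # [episode, C, H, W]
    supp_roi = torchvision.ops.roi_align(
        feat, support_boxes, output_size=(7, 7),
        spatial_scale=1 / 32.0, aligned=True,
    )                                                          # [episode, C, 7, 7]
    category_code = supp_roi.mean(3).mean(2).sigmoid()         # [episode, C]
    print(category_code.shape)  # torch.Size([5, 256])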
@nanfangAlan

You can find the answers in Sec. 4.2 and Table 8.

@ZhangGongjie (Owner)

@Yuxuan-W The implementation is consistent with the paper. If you read the code carefully, you will find that CAM is not used at every encoder layer.


nhw649 commented Nov 16, 2023

> @Yuxuan-W The implementation is consistent with the paper. If you read the code carefully, you will find that CAM is not used at every encoder layer.

When computing the category codes, src is the support feature, so why do src, category_code, and tsp need to go through QSAttn? This is not mentioned in the paper, and it cannot be seen from Fig. 4. (See the forward_supp_branch snippet above.)


@Yuxuan-W

> @Yuxuan-W The implementation is consistent with the paper. If you read the code carefully, you will find that CAM is not used at every encoder layer.

Sorry for my careless reading; I have removed my comment for clarity.

@ZhangGongjie (Owner)

@Yuxuan-W No problem. Any discussions are welcome.

@nhw649 Do allow me some time to address your concern, as I have a full-time job now and haven't read my own paper in a long time.


nhw649 commented Nov 16, 2023

> @Yuxuan-W No problem. Any discussions are welcome.
>
> @nhw649 Do allow me some time to address your concern, as I have a full-time job now and haven't read my own paper in a long time.

Okay, please do. Thank you.

@Yuxuan-W

> @Yuxuan-W The implementation is consistent with the paper. If you read the code carefully, you will find that CAM is not used at every encoder layer.
>
> When computing the category codes, src is the support feature, so why do src, category_code, and tsp need to go through QSAttn? This is not mentioned in the paper, and it cannot be seen from Fig. 4.


In the DeformableTransformerEncoder, you will find that self.QSAttn == True only for the first layer. However, if you look at the category code computed by the first layer, namely category_code[0], you will find that it is not involved in siamese_attn. In other words, only category_code[1:] are affected by siamese_attn, but they are never used.

So I guess it's just an implementation convenience. There's no problem here, and it is consistent with the paper.
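
To make that concrete, here is a minimal sketch of the behavior described above (the class structure and attribute names are assumptions inferred from the snippet, not the repo's exact code):

    import copy
    import torch.nn as nn

    class DeformableTransformerEncoderSketch(nn.Module):
        """Hypothetical sketch: siamese attention (QSAttn) only in layer 0."""

        def __init__(self, encoder_layer, num_layers):
            super().__init__()
            self.layers = nn.ModuleList(
                [copy.deepcopy(encoder_layer) for _ in range(num_layers)]
            )
            for i, layer in enumerate(self.layers):
                # Only the first layer runs siamese attention in
                # forward_supp_branch; all later layers skip it.
                layer.QSAttn = (i == 0)

        def forward_supp_branch(self, src, tsp, **kwargs):
            category_codes = []
            for layer in self.layers:
                src, code = layer.forward_supp_branch(src, tsp=tsp, **kwargs)
                category_codes.append(code)
            # category_codes[0] is computed before layer 0's siamese_attn
            # touches src, so it is unaffected by it; the later codes see a
            # src that siamese_attn already modified, but per the discussion
            # above they are not consumed downstream.
            return src, category_codes

Under that reading, removing siamese_attn from the support branch would leave the category code that is actually used unchanged, which matches the "implementation convenience" interpretation above.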

@ZhangGongjie (Owner)

I hope @Yuxuan-W's comments have addressed your concerns, @nhw649. Thank you, @Yuxuan-W!
