
Some questions about QSAttn. #76

Open · nhw649 opened this issue Oct 23, 2023 · 8 comments

Comments


nhw649 commented Oct 23, 2023

(1) When computing the category codes, src is the support feature, so why do src, category_code, and tsp need to go through QSAttn? This is not mentioned in the paper.
(2) Why is QSAttn applied only in the first layer?

In deformable_transformer.py:

def forward_supp_branch(self, src, pos, reference_points, spatial_shapes, level_start_index, padding_mask, tsp, support_boxes):
    # self attention
    src2 = self.self_attn(self.with_pos_embed(src, pos), reference_points, src, spatial_shapes, level_start_index, padding_mask)
    src = src + self.dropout1(src2)
    src = self.norm1(src)

    support_img_h, support_img_w = spatial_shapes[0, 0], spatial_shapes[0, 1]
    supp_roi = torchvision.ops.roi_align(
        src.transpose(1, 2).reshape(src.shape[0], -1, support_img_h, support_img_w),
        support_boxes,
        output_size=(7, 7),
        spatial_scale=1 / 32.0,
        aligned=True)

    supp_roi = supp_roi.mean(3).mean(2)  # [episode_size, C]
    category_code = supp_roi.sigmoid() # [episode_size, C]

    if self.QSAttn:
        # siamese attention
        src, tsp = self.siamese_attn(src,
                                     inverse_sigmoid(category_code).unsqueeze(0).expand(src.shape[0], -1, -1),
                                     category_code.unsqueeze(0).expand(src.shape[0], -1, -1),
                                     tsp)

        # ffn
        src = self.forward_ffn(src + tsp)
    else:
        src = self.forward_ffn(src)

    return src, category_code
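
For concreteness, here is a minimal shape walkthrough of the category-code computation above (all sizes here are made-up assumptions for illustration, not values from this repo):

    import torch
    import torchvision

    # Illustrative sizes: 5 support images (episode_size), C = 256 channels,
    # and a single-level 32x32 feature map, i.e. stride-32 features.
    episode_size, C, H, W = 5, 256, 32, 32

    # src as it arrives in forward_supp_branch: [episode_size, H*W, C]
    src = torch.randn(episode_size, H * W, C)

    # One support box per image in (batch_index, x1, y1, x2, y2) format,
    # given in image coordinates (hence spatial_scale=1/32 below).
    support_boxes = torch.tensor(
        [[i, 0.0, 0.0, 512.0, 512.0] for i in range(episode_size)]
    )

    feat = src.transpose(1, 2).reshape(episode_size, C, H, W)  # [episode, C, H, W]
    supp_roi = torchvision.ops.roi_align(
        feat, support_boxes, output_size=(7, 7),
        spatial_scale=1 / 32.0, aligned=True,
    )                                                          # [episode, C, 7, 7]
    category_code = supp_roi.mean(3).mean(2).sigmoid()         # [episode, C]
    print(category_code.shape)  # torch.Size([5, 256])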
@nanfangAlan

You can find the answers in Sec. 4.2 and Table 8.

@ZhangGongjie (Owner)

@Yuxuan-W The implementation is consistent with the paper. If you read the code carefully, you will find that CAM is not used at every encoder layer.


nhw649 commented Nov 16, 2023

> @Yuxuan-W The implementation is consistent with the paper. If you read the code carefully, you will find that CAM is not used at every encoder layer.

When computing the category codes, src is the support feature, so why do src, category_code, and tsp need to go through QSAttn? This is not mentioned in the paper, and it cannot be seen from Fig. 4. (See the forward_supp_branch snippet above.)


@Yuxuan-W

> @Yuxuan-W The implementation is consistent with the paper. If you read the code carefully, you will find that CAM is not used at every encoder layer.

Sorry for my careless reading; I have removed my comment for clarity.

@ZhangGongjie (Owner)

@Yuxuan-W No problem. Any discussions are welcome.

@nhw649 Do allow me some time to address your concern, as I have a full-time job now and haven't read my own paper in a long time.


nhw649 commented Nov 16, 2023

> @Yuxuan-W No problem. Any discussions are welcome.
>
> @nhw649 Do allow me some time to address your concern, as I have a full-time job now and haven't read my own paper in a long time.

Okay, please do. Thank you.

@Yuxuan-W

> @Yuxuan-W The implementation is consistent with the paper. If you read the code carefully, you will find that CAM is not used at every encoder layer.
>
> When computing the category codes, src is the support feature, so why do src, category_code, and tsp need to go through QSAttn? This is not mentioned in the paper, and it cannot be seen from Fig. 4.


In the DeformableTransformerEncoder, you will find that self.QSAttn == True only for the first layer. However, if you look at the category code computed by the first layer, namely category_code[0], you will find that it is not involved in siamese_attn. In other words, only category_code[1:] are affected by siamese_attn, but they are never used.

So I guess it's just an implementation convenience. There's no problem here, and it is consistent with the paper.
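
To make that concrete, here is a minimal sketch of the behavior described above (the class structure and attribute names are assumptions inferred from the snippet, not the repo's exact code):

    import copy
    import torch.nn as nn

    class DeformableTransformerEncoderSketch(nn.Module):
        """Hypothetical sketch: siamese attention (QSAttn) only in layer 0."""

        def __init__(self, encoder_layer, num_layers):
            super().__init__()
            self.layers = nn.ModuleList(
                [copy.deepcopy(encoder_layer) for _ in range(num_layers)]
            )
            for i, layer in enumerate(self.layers):
                # Only the first layer runs siamese attention in
                # forward_supp_branch; all later layers skip it.
                layer.QSAttn = (i == 0)

        def forward_supp_branch(self, src, tsp, **kwargs):
            category_codes = []
            for layer in self.layers:
                src, code = layer.forward_supp_branch(src, tsp=tsp, **kwargs)
                category_codes.append(code)
            # category_codes[0] is computed before layer 0's siamese_attn
            # touches src, so it is unaffected by it; the later codes see a
            # src that siamese_attn already modified, but per the discussion
            # above they are not consumed downstream.
            return src, category_codes

Under that reading, removing siamese_attn from the support branch would leave the category code that is actually used unchanged, which matches the "implementation convenience" interpretation above.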

@ZhangGongjie (Owner)

I hope @Yuxuan-W's comments have addressed your concerns, @nhw649. Thank you, @Yuxuan-W!
