Issues with Running MaskCLIP++ Demo #3

Closed
@oneHFR

Description

Hi~~ I have followed the official installation instructions to set up the environment and downloaded all the necessary weights.

Environment Specifications:

  • torch.__version__: 2.5.1+cu124
  • torch.version.cuda: 12.4
  • torch.backends.cudnn.version: 90100
  • Python: 3.10.16
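
These values can be printed with a short snippet like the following (a minimal sketch; the attributes used are standard PyTorch API):

import sys
import torch

# Report the exact toolchain versions used in this issue
print("torch:", torch.__version__)
print("cuda:", torch.version.cuda)
print("cudnn:", torch.backends.cudnn.version())
print("python:", sys.version)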

Installation Steps Followed:

  1. Installation:

  2. Preparations:

    • Datasets: Prepared as per [Preparing Datasets for MaskCLIP++](datasets/README.md).
    • Pretrained CLIP Models: Downloaded automatically from Hugging Face.
    • Mask Generators: Downloaded the required mask generator models manually using the provided URLs and placed them in the specified paths.
    • Fine-tuned Weights: Downloaded the checkpoint fine-tuned on the COCO-Stuff dataset.
  3. Demo Usage:

    config="configs/coco-stuff/eva-clip-vit-l-14-336/maft-l/maskclippp_coco-stuff_eva-clip-vit-l-14-336_wtext_maft-l_ens.yaml"
    ckpt="output/ckpts/maskclippp/maskclippp_coco-stuff_eva-clip-vit-l-14-336_wtext.pth"
    python demo/app.py \
        --config-file $config \
        --opts \
        MODEL.WEIGHTS $ckpt \
        MODEL.MASK_FORMER.TEST.PANOPTIC_ON False \
        MODEL.MASK_FORMER.TEST.INSTANCE_ON False \
        MODEL.MASK_FORMER.TEST.SEMANTIC_ON True

    Here is the exact command I ran:

    python demo/app.py --config-file configs/coco-stuff/eva-clip-vit-l-14-336/maft-l/maskclippp_coco-stuff_eva-clip-vit-l-14-336_wtext_maft-l_ens.yaml --opts MODEL.WEIGHTS output/ckpts/maskclippp/maskclippp_coco-stuff_eva-clip-vit-l-14-336_wtext.pth MODEL.MASK_FORMER.TEST.PANOPTIC_ON False MODEL.MASK_FORMER.TEST.INSTANCE_ON False MODEL.MASK_FORMER.TEST.SEMANTIC_ON True

Issue:

During execution, I encountered several issues, mostly library version mismatches. I was able to resolve some of them with minor code modifications. For example, I removed the return type annotation as shown below:

# Original function definition with return type annotation
def extract_features(self, inputs: PaddedList) -> Dict[str, Tensor]:

# Modified function definition without the return type annotation
def extract_features(self, inputs: PaddedList):
    if self._finetune_none:
        self.eval()
        with torch.no_grad():
            return self._extract_features(inputs.images)
    else:
        return self._extract_features(inputs.images)
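
If the failures are only about evaluating the annotation types at import time, a less invasive fix (a sketch, assuming the annotations are the actual culprit) is to keep the full signature and defer annotation evaluation with `from __future__ import annotations`, placed at the top of the module; every annotation then becomes a lazily evaluated string:

from __future__ import annotations  # annotations are no longer evaluated at runtime

from typing import Dict

import torch
from torch import Tensor

# Same body as above; the original signature can stay intact
def extract_features(self, inputs: PaddedList) -> Dict[str, Tensor]:
    if self._finetune_none:
        self.eval()
        with torch.no_grad():
            return self._extract_features(inputs.images)
    return self._extract_features(inputs.images)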

However, I am now stuck at the following error:

[01/08 03:13:38 detectron2]: Predefined Classes: ['cocostuff']
User Classes: []
Available features: dict_keys(['stage1_f', 'stage2_f', 'stage3_f', 'stage4_f', 'input_f', 'test_t_embs_f', 'num_synonyms'])
Required features: ['stage2_f', 'stage3_f', 'stage4_f']

/root/miniconda3/envs/mcp/lib/python3.10/site-packages/torch/functional.py:534: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3595.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]

Traceback (most recent call last):
  File "/root/autodl-tmp/MaskCLIPpp/demo/app.py", line 90, in process_image
    predictions, visualized_output = meta_demo.run_on_image(bgr_image)
  File "/root/autodl-tmp/MaskCLIPpp/demo/predictor.py", line 220, in run_on_image
    predictions = self.predictor(image)
  File "/root/autodl-tmp/MaskCLIPpp/demo/detectron2/detectron2/engine/defaults.py", line 351, in __call__
    predictions = self.model([inputs])[0]
  File "/root/autodl-tmp/MaskCLIPpp/demo/../maskclippp/maskclippp.py", line 623, in forward
    encode_dict.update(self.visual_encoder(images, masks=valid_masks_by_imgs))  # List(B) of Q,D
  File "/root/autodl-tmp/MaskCLIPpp/demo/../maskclippp/vencoder/eva_clip_vit.py", line 382, in extract_features
    return self._extract_features(inputs.images, masks)
  File "/root/autodl-tmp/MaskCLIPpp/demo/../maskclippp/vencoder/eva_clip_vit.py", line 338, in _extract_features
    attn_biases, areas = self._masks_to_attn_biases(masks, curr_grid_size)
  File "/root/autodl-tmp/MaskCLIPpp/demo/../maskclippp/vencoder/eva_clip_vit.py", line 310, in _masks_to_attn_biases
    attn_bias[:, :, :Q, -hw:].copy_(down_mask)
RuntimeError: output with shape [3, 16, 1, 1064] doesn't match the broadcast shape [3, 16, 3, 1064]
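
For what it's worth, the mismatch reproduces in isolation with the exact shapes from the error message (a hypothetical standalone sketch; in the real code these tensors are built inside _masks_to_attn_biases):

import torch

# Shapes copied from the RuntimeError above. The destination slice keeps a
# query dimension of 1, while down_mask carries 3 masks; copy_ cannot
# broadcast a larger source into a smaller target slice.
Q, hw = 3, 1064
attn_bias = torch.zeros(3, 16, 1, hw)   # query dim is 1 instead of Q
down_mask = torch.zeros(3, 16, Q, hw)
attn_bias[:, :, :Q, -hw:].copy_(down_mask)  # RuntimeError: same shapes as above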

Current Progress:

I have modified parts of the code to address compatibility issues. The error now occurs at the following line:

attn_bias[:, :, :Q, -hw:].copy_(down_mask)
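
One way to narrow this down is to log the operand shapes just before the failing line (a debugging sketch; the variable names follow the traceback, and the surrounding code may differ):

# Hypothetical instrumentation inside _masks_to_attn_biases, right before
# the failing copy_, to see where the query dimension collapses to 1
print("attn_bias:", tuple(attn_bias.shape),
      "down_mask:", tuple(down_mask.shape),
      "Q:", Q, "hw:", hw)
attn_bias[:, :, :Q, -hw:].copy_(down_mask)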

Question:

Are the modifications above actually far from sufficient, so that more configuration is still needed before the demo can run? Running the command below produces a series of errors, and each one makes me suspect a missing step somewhere: a mask argument that is never passed in, feature dimensions that no longer match and cannot be propagated further, and so on:

python demo/app.py --config-file configs/coco-stuff/eva-clip-vit-l-14-336/maft-l/maskclippp_coco-stuff_eva-clip-vit-l-14-336_wtext_maft-l_ens.yaml --opts MODEL.WEIGHTS output/ckpts/maskclippp/maskclippp_coco-stuff_eva-clip-vit-l-14-336_wtext.pth MODEL.MASK_FORMER.TEST.PANOPTIC_ON False MODEL.MASK_FORMER.TEST.INSTANCE_ON False MODEL.MASK_FORMER.TEST.SEMANTIC_ON True

Could you please advise on what additional installations or configurations are required to successfully run the demo? Thank you very much; I look forward to your reply!


