Description
Hi~~ I have followed the official installation instructions to set up the environment and installed all the necessary weights.
Environment Specifications:
```text
torch.__version__            : 2.5.1+cu124
torch.version.cuda           : 12.4
torch.backends.cudnn.version : 90100
Python                       : 3.10.16
```
Installation Steps Followed:
- Installation:
  - Followed the [installation instructions](INSTALL.md) provided in the repository.
- Preparations:
  - Datasets: prepared as per [Preparing Datasets for MaskCLIP++](datasets/README.md).
  - Pretrained CLIP models: downloaded automatically from Hugging Face.
  - Mask generators: downloaded the required mask generator models manually using the provided URLs and placed them in the specified paths.
  - Fine-tuned weights: downloaded the checkpoint fine-tuned on the COCO-Stuff dataset.
Demo Usage:
```shell
config="configs/coco-stuff/eva-clip-vit-l-14-336/maft-l/maskclippp_coco-stuff_eva-clip-vit-l-14-336_wtext_maft-l_ens.yaml"
ckpt="output/ckpts/maskclippp/maskclippp_coco-stuff_eva-clip-vit-l-14-336_wtext.pth"
python demo/app.py \
  --config-file $config \
  --opts \
  MODEL.WEIGHTS $ckpt \
  MODEL.MASK_FORMER.TEST.PANOPTIC_ON False \
  MODEL.MASK_FORMER.TEST.INSTANCE_ON False \
  MODEL.MASK_FORMER.TEST.SEMANTIC_ON True
```
Here is mine:
```shell
python demo/app.py --config-file configs/coco-stuff/eva-clip-vit-l-14-336/maft-l/maskclippp_coco-stuff_eva-clip-vit-l-14-336_wtext_maft-l_ens.yaml --opts MODEL.WEIGHTS output/ckpts/maskclippp/maskclippp_coco-stuff_eva-clip-vit-l-14-336_wtext.pth MODEL.MASK_FORMER.TEST.PANOPTIC_ON False MODEL.MASK_FORMER.TEST.INSTANCE_ON False MODEL.MASK_FORMER.TEST.SEMANTIC_ON True
```
Issue:
During execution I encountered several issues, mostly related to library version mismatches. I was able to resolve some of them with minor code modifications. For example, I removed the return type annotation as shown below:
```python
# Original function definition with a return type annotation
def extract_features(self, inputs: PaddedList) -> Dict[str, Tensor]:
    ...

# Modified definition with the return type annotation removed
def extract_features(self, inputs: PaddedList):
    if self._finetune_none:
        self.eval()
        with torch.no_grad():
            return self._extract_features(inputs.images)
    else:
        return self._extract_features(inputs.images)
```
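For reference, a workaround that keeps the annotation instead of deleting it is to stop it from being evaluated at import time, either by quoting it or via `from __future__ import annotations` (PEP 563). A minimal sketch, where `PaddedList`, `Dict` and `Tensor` are deliberately left undefined to mimic a type that fails to import under a mismatched library version:

```python
# Quoted annotations are stored as strings and never evaluated, so undefined
# names in them no longer raise at definition or call time.
def extract_features(self, inputs: "PaddedList") -> "Dict[str, Tensor]":
    return {"input_f": inputs}

result = extract_features(None, "image-batch")
print(result)  # {'input_f': 'image-batch'}
```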
However, I am now stuck at the following error:
```
[01/08 03:13:38 detectron2]: Predefined Classes: ['cocostuff']
User Classes: []
Available features: dict_keys(['stage1_f', 'stage2_f', 'stage3_f', 'stage4_f', 'input_f', 'test_t_embs_f', 'num_synonyms'])
Required features: ['stage2_f', 'stage3_f', 'stage4_f']
/root/miniconda3/envs/mcp/lib/python3.10/site-packages/torch/functional.py:534: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3595.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
Traceback (most recent call last):
  File "/root/autodl-tmp/MaskCLIPpp/demo/app.py", line 90, in process_image
    predictions, visualized_output = meta_demo.run_on_image(bgr_image)
  File "/root/autodl-tmp/MaskCLIPpp/demo/predictor.py", line 220, in run_on_image
    predictions = self.predictor(image)
  File "/root/autodl-tmp/MaskCLIPpp/demo/detectron2/detectron2/engine/defaults.py", line 351, in __call__
    predictions = self.model([inputs])[0]
  File "/root/autodl-tmp/MaskCLIPpp/demo/../maskclippp/maskclippp.py", line 623, in forward
    encode_dict.update(self.visual_encoder(images, masks=valid_masks_by_imgs))  # List(B) of Q,D
  File "/root/autodl-tmp/MaskCLIPpp/demo/../maskclippp/vencoder/eva_clip_vit.py", line 382, in extract_features
    return self._extract_features(inputs.images, masks)
  File "/root/autodl-tmp/MaskCLIPpp/demo/../maskclippp/vencoder/eva_clip_vit.py", line 338, in _extract_features
    attn_biases, areas = self._masks_to_attn_biases(masks, curr_grid_size)
  File "/root/autodl-tmp/MaskCLIPpp/demo/../maskclippp/vencoder/eva_clip_vit.py", line 310, in _masks_to_attn_biases
    attn_bias[:, :, :Q, -hw:].copy_(down_mask)
RuntimeError: output with shape [3, 16, 1, 1064] doesn't match the broadcast shape [3, 16, 3, 1064]
```
Current Progress:
I have modified parts of the code to work around the compatibility issues. The error now occurs at the following line:

```python
attn_bias[:, :, :Q, -hw:].copy_(down_mask)
```
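The broadcasting failure can be reproduced in isolation. A minimal NumPy sketch, under the assumption that `attn_bias` was allocated with a query dimension of 1 while `down_mask` carries Q=3 masks (shapes taken from the traceback; `np.copyto` stands in for `Tensor.copy_`, whose broadcast rules are analogous):

```python
import numpy as np

Q, hw = 3, 1064  # three masks, 1064 spatial tokens (sizes from the traceback)

# Hypothesis: attn_bias was allocated with a query dimension of 1.
attn_bias = np.zeros((3, 16, 1, hw + 1))
down_mask = np.ones((3, 16, Q, hw))

dst = attn_bias[:, :, :Q, -hw:]
print(dst.shape)  # (3, 16, 1, 1064): slicing :Q on a size-1 dim silently keeps size 1

try:
    np.copyto(dst, down_mask)  # stands in for dst.copy_(down_mask)
except ValueError as exc:
    print("mismatch:", exc)    # broadcasting cannot shrink Q=3 down to 1
```

If this is indeed the mechanism, the fix would lie on the allocation side (reserving room for all Q queries in `attn_bias`) rather than at the `copy_` call itself.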
Question:
Are the modifications above sufficient to run the demo, or is more configuration still missing? Running the demo with the following command leads to a series of errors, each of which makes me suspect a skipped step somewhere, e.g. a mask argument that is never passed in, or feature dimensions that do not match and cannot be propagated further:
```shell
python demo/app.py --config-file configs/coco-stuff/eva-clip-vit-l-14-336/maft-l/maskclippp_coco-stuff_eva-clip-vit-l-14-336_wtext_maft-l_ens.yaml --opts MODEL.WEIGHTS output/ckpts/maskclippp/maskclippp_coco-stuff_eva-clip-vit-l-14-336_wtext.pth MODEL.MASK_FORMER.TEST.PANOPTIC_ON False MODEL.MASK_FORMER.TEST.INSTANCE_ON False MODEL.MASK_FORMER.TEST.SEMANTIC_ON True
```
Could you please advise on what additional installations or configurations are required to successfully run the demo?