Description
Hi~~ I have followed the official installation instructions to set up the environment and installed all the necessary weights.
Environment Specifications:
```text
torch.__version__            : 2.5.1+cu124
torch.version.cuda           : 12.4
torch.backends.cudnn.version : 90100
Python                       : 3.10.16
```
Installation Steps Followed:
- Installation:
  - Followed the [installation instructions](INSTALL.md) provided in the repository.
- Preparations:
  - Datasets: prepared as per [Preparing Datasets for MaskCLIP++](datasets/README.md).
  - Pretrained CLIP models: downloaded automatically from Hugging Face.
  - Mask generators: downloaded the required mask generator models manually using the provided URLs and placed them in the specified paths.
  - Fine-tuned weights: downloaded the checkpoint fine-tuned on the COCO-Stuff dataset.
Demo Usage:
```shell
config="configs/coco-stuff/eva-clip-vit-l-14-336/maft-l/maskclippp_coco-stuff_eva-clip-vit-l-14-336_wtext_maft-l_ens.yaml"
ckpt="output/ckpts/maskclippp/maskclippp_coco-stuff_eva-clip-vit-l-14-336_wtext.pth"
python demo/app.py \
  --config-file $config \
  --opts \
  MODEL.WEIGHTS $ckpt \
  MODEL.MASK_FORMER.TEST.PANOPTIC_ON False \
  MODEL.MASK_FORMER.TEST.INSTANCE_ON False \
  MODEL.MASK_FORMER.TEST.SEMANTIC_ON True
```
Here is mine:
```shell
python demo/app.py --config-file configs/coco-stuff/eva-clip-vit-l-14-336/maft-l/maskclippp_coco-stuff_eva-clip-vit-l-14-336_wtext_maft-l_ens.yaml --opts MODEL.WEIGHTS output/ckpts/maskclippp/maskclippp_coco-stuff_eva-clip-vit-l-14-336_wtext.pth MODEL.MASK_FORMER.TEST.PANOPTIC_ON False MODEL.MASK_FORMER.TEST.INSTANCE_ON False MODEL.MASK_FORMER.TEST.SEMANTIC_ON True
```
Issue:
During execution I encountered several issues, mostly related to library version mismatches. I was able to resolve some of them with minor code modifications. For example, I removed the return type annotation as shown below:
```python
# Original function definition with a return type annotation
def extract_features(self, inputs: PaddedList) -> Dict[str, Tensor]:
    ...

# Modified definition with the return type annotation removed
def extract_features(self, inputs: PaddedList):
    if self._finetune_none:
        self.eval()
        with torch.no_grad():
            return self._extract_features(inputs.images)
    else:
        return self._extract_features(inputs.images)
```
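For reference, a workaround that keeps the annotation instead of deleting it is to stop it from being evaluated at import time, either by quoting it or via `from __future__ import annotations` (PEP 563). A minimal sketch, where `PaddedList`, `Dict` and `Tensor` are deliberately left undefined to mimic a type that fails to import under a mismatched library version:

```python
# Quoted annotations are stored as strings and never evaluated, so undefined
# names in them no longer raise at definition or call time.
def extract_features(self, inputs: "PaddedList") -> "Dict[str, Tensor]":
    return {"input_f": inputs}

result = extract_features(None, "image-batch")
print(result)  # {'input_f': 'image-batch'}
```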
However, I am now stuck at the following error:
```
[01/08 03:13:38 detectron2]: Predefined Classes: ['cocostuff']
User Classes: []
Available features: dict_keys(['stage1_f', 'stage2_f', 'stage3_f', 'stage4_f', 'input_f', 'test_t_embs_f', 'num_synonyms'])
Required features: ['stage2_f', 'stage3_f', 'stage4_f']
/root/miniconda3/envs/mcp/lib/python3.10/site-packages/torch/functional.py:534: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3595.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
Traceback (most recent call last):
  File "/root/autodl-tmp/MaskCLIPpp/demo/app.py", line 90, in process_image
    predictions, visualized_output = meta_demo.run_on_image(bgr_image)
  File "/root/autodl-tmp/MaskCLIPpp/demo/predictor.py", line 220, in run_on_image
    predictions = self.predictor(image)
  File "/root/autodl-tmp/MaskCLIPpp/demo/detectron2/detectron2/engine/defaults.py", line 351, in __call__
    predictions = self.model([inputs])[0]
  File "/root/autodl-tmp/MaskCLIPpp/demo/../maskclippp/maskclippp.py", line 623, in forward
    encode_dict.update(self.visual_encoder(images, masks=valid_masks_by_imgs))  # List(B) of Q,D
  File "/root/autodl-tmp/MaskCLIPpp/demo/../maskclippp/vencoder/eva_clip_vit.py", line 382, in extract_features
    return self._extract_features(inputs.images, masks)
  File "/root/autodl-tmp/MaskCLIPpp/demo/../maskclippp/vencoder/eva_clip_vit.py", line 338, in _extract_features
    attn_biases, areas = self._masks_to_attn_biases(masks, curr_grid_size)
  File "/root/autodl-tmp/MaskCLIPpp/demo/../maskclippp/vencoder/eva_clip_vit.py", line 310, in _masks_to_attn_biases
    attn_bias[:, :, :Q, -hw:].copy_(down_mask)
RuntimeError: output with shape [3, 16, 1, 1064] doesn't match the broadcast shape [3, 16, 3, 1064]
```
Current Progress:
I have modified parts of the code to work around the compatibility issues. The error now occurs at the following line:

```python
attn_bias[:, :, :Q, -hw:].copy_(down_mask)
```
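The broadcasting failure can be reproduced in isolation. A minimal NumPy sketch, under the assumption that `attn_bias` was allocated with a query dimension of 1 while `down_mask` carries Q=3 masks (shapes taken from the traceback; `np.copyto` stands in for `Tensor.copy_`, whose broadcast rules are analogous):

```python
import numpy as np

Q, hw = 3, 1064  # three masks, 1064 spatial tokens (sizes from the traceback)

# Hypothesis: attn_bias was allocated with a query dimension of 1.
attn_bias = np.zeros((3, 16, 1, hw + 1))
down_mask = np.ones((3, 16, Q, hw))

dst = attn_bias[:, :, :Q, -hw:]
print(dst.shape)  # (3, 16, 1, 1064): slicing :Q on a size-1 dim silently keeps size 1

try:
    np.copyto(dst, down_mask)  # stands in for dst.copy_(down_mask)
except ValueError as exc:
    print("mismatch:", exc)    # broadcasting cannot shrink Q=3 down to 1
```

If this is indeed the mechanism, the fix would lie on the allocation side (reserving room for all Q queries in `attn_bias`) rather than at the `copy_` call itself.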
Question:
Are the modifications above sufficient to run the demo, or is more configuration still missing? Running the demo with the following command leads to a series of errors, each of which makes me suspect a skipped step somewhere, e.g. a mask argument that is never passed in, or feature dimensions that do not match and cannot be propagated further:
```shell
python demo/app.py --config-file configs/coco-stuff/eva-clip-vit-l-14-336/maft-l/maskclippp_coco-stuff_eva-clip-vit-l-14-336_wtext_maft-l_ens.yaml --opts MODEL.WEIGHTS output/ckpts/maskclippp/maskclippp_coco-stuff_eva-clip-vit-l-14-336_wtext.pth MODEL.MASK_FORMER.TEST.PANOPTIC_ON False MODEL.MASK_FORMER.TEST.INSTANCE_ON False MODEL.MASK_FORMER.TEST.SEMANTIC_ON True
```
Could you please advise on what additional installations or configurations are required to successfully run the demo?