Sniffing Threatening Open-World Objects in Autonomous Driving by Open-Vocabulary Models (ACMMM 2024)
Autonomous driving (AD) is a typical application that requires effectively exploiting multimedia information.
For AD, it is critical to ensure safety by detecting unknown objects in an open world, driving the demand for open world object detection (OWOD).
However, existing OWOD methods treat generic objects beyond the known classes in the training set as unknown objects and prioritize recall in evaluation.
This encourages excessive false positives and endangers the safety of AD. To address this issue, we restrict the definition of unknown objects to threatening objects in AD, and introduce a new evaluation protocol, built upon a new metric named U-ARecall, to alleviate the biased evaluation caused by neglecting false positives.
Under the new evaluation protocol, we re-evaluate existing OWOD methods and discover that they typically perform poorly in AD.
Then, we propose a novel OWOD paradigm for AD based on fine-tuning foundational open-vocabulary models (OVMs), as they can exploit rich linguistic and visual prior knowledge for OWOD.
Following this new paradigm, we propose a brand-new OWOD solution, which effectively addresses two core challenges of fine-tuning OVMs via two novel techniques: 1) the maintenance of open-world generic knowledge by a dual-branch architecture; 2) the acquisition of scenario-specific knowledge by the visual-oriented contrastive learning scheme.
Besides, a dual-branch prediction fusion module is proposed to avoid post-processing and hand-crafted heuristics.
Extensive experiments show that our proposed method surpasses classic OWOD methods in unknown object detection by a large margin. Our main contributions are summarized as follows:
- We devise a more suitable evaluation protocol for AD-oriented OWOD, which includes restricting unknown objects to threatening objects in AD and introducing a new evaluation metric to alleviate the biased evaluation in OWOD. Based on this evaluation protocol, we re-evaluate existing OWOD methods and establish the AD-oriented OWOD benchmark.
- We propose a new OWOD paradigm for AD based on fine-tuning OVMs, exploiting rich prior knowledge, including high-level language semantic and diverse visual patterns, to distinguish between threatening objects and false positives.
- We identify and address two core challenges when fine-tuning OVMs: preserving open-world generic knowledge by dual-branch architecture and acquiring scenario-specific knowledge by visual-oriented contrastive learning.
- We propose a prediction fusion module that can integrate predictions from multiple branches without the need of post-processing and hand-crafted heuristics, serving as a generalized method applicable to transformer-based detectors.
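As a rough intuition for the last two points, the snippet below is a conceptual sketch (not the authors' implementation) of dual-branch prediction fusion for a transformer-based detector: per-query outputs from a frozen, generic-knowledge branch and a fine-tuned, scenario-specific branch are merged by concatenation and top-k scoring instead of NMS or other hand-crafted heuristics. All names and shapes here are assumptions.

```python
# Conceptual sketch only (not the authors' implementation): fusing per-query
# predictions from a frozen OVM branch and a fine-tuned branch of a
# transformer-based detector by concatenation + top-k selection,
# instead of NMS-style post-processing or hand-crafted heuristics.
import torch

def fuse_dual_branch(scores_frozen, boxes_frozen, scores_tuned, boxes_tuned, num_keep=300):
    """scores_*: (num_queries,) confidence per query; boxes_*: (num_queries, 4).
    Returns the num_keep highest-scoring predictions across both branches."""
    scores = torch.cat([scores_frozen, scores_tuned], dim=0)   # (2 * num_queries,)
    boxes = torch.cat([boxes_frozen, boxes_tuned], dim=0)      # (2 * num_queries, 4)
    keep = scores.topk(min(num_keep, scores.numel())).indices  # indices of the best queries
    return scores[keep], boxes[keep]

# Toy usage with random tensors standing in for the two branches' outputs.
if __name__ == "__main__":
    q = 900
    s_f, b_f = torch.rand(q), torch.rand(q, 4)
    s_t, b_t = torch.rand(q), torch.rand(q, 4)
    fused_scores, fused_boxes = fuse_dual_branch(s_f, b_f, s_t, b_t)
    print(fused_scores.shape, fused_boxes.shape)  # torch.Size([300]) torch.Size([300, 4])
```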
Our code is based on the dev-3.x branch of mmdetection, which provides the implementation of GroundingDINO.
Note: make sure to check out the dev-3.x branch.
conda create --name ad-owod python==3.8 -y
conda activate ad-owod
conda install pytorch torchvision -c pytorch
pip install -U openmim
mim install mmengine
mim install "mmcv>=2.0.0"
pip install -v -e .
Optional:
pip install peft  # for LoRA fine-tuning
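Optionally, you can sanity-check the environment with a short Python snippet (assumes the installation above succeeded):

```python
# Optional sanity check for the environment set up above.
import torch, mmcv, mmdet, mmengine

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("mmengine:", mmengine.__version__)
print("mmcv:", mmcv.__version__)    # should be >= 2.0.0
print("mmdet:", mmdet.__version__)  # dev-3.x series (3.x)
```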
Download the GroundingDINO pretrained weights from mmdetection and place them in the weights folder.
AD-OWOD/
├── configs
│   └── auto_driving_grounding_dino   (fine-tuning methods and our method)
├── mmdet
│   ├── datasets
│   │   ├── soda.py
│   │   └── bdd.py
│   ├── hooks
│   │   └── load_weight_hook.py   (loads the weights of the dual-branch network)
│   └── models
│       ├── detectors
│       │   ├── auto_driving_grounding_dino.py
│       │   ├── adapter_grounding_dino.py
│       │   ├── linear_prob_grounding_dino.py
│       │   └── lora_grounding_dino.py
│       └── dense_heads
│           └── auto_driving_grounding_dino_head.py
└── eval_tools   (data conversion tools, visualization tools, and evaluation tools)
    ├── soda_format_to_voc.py
    ├── bdd_format_to_voc.py
    ├── coda_format_to_voc.py
    ├── convert_to_pretest.py
    ├── convert_coda_to_pretest.py
    ├── soda_visualization.py
    ├── bdd_visualization.py
    ├── coda_visualization.py
    ├── output_visualization.py
    ├── eval.py   (evaluation)
    └── voc_eval_offical.py   (evaluation)
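The custom detectors, heads, datasets, and hooks listed above plug into mmdetection 3.x through its registry mechanism. The snippet below shows the generic registration pattern for orientation only; the class name is hypothetical and not taken from this repo.

```python
# Generic mmdetection 3.x registration pattern (illustrative; the class name is hypothetical).
from mmdet.registry import MODELS
from mmdet.models.detectors import GroundingDINO

@MODELS.register_module()
class MyGroundingDINOVariant(GroundingDINO):
    """Once registered, the class can be selected by name, e.g.
    model = dict(type='MyGroundingDINOVariant', ...) in a config file."""
```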
Note: If you cannot access the Hugging Face website, you need to download bert-base-uncased
from https://huggingface.co/bert-base-uncased/tree/main and change the path of lang_model_name
in the config file.
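For instance, if bert-base-uncased is downloaded to ./weights/bert-base-uncased, the override could look like the sketch below (the local path is an assumption; the exact structure depends on the config file you use):

```python
# Sketch of pointing the text encoder to a local BERT copy in a Grounding DINO config.
# './weights/bert-base-uncased' is an assumed local path; adjust it to your download location.
lang_model_name = './weights/bert-base-uncased'

model = dict(
    language_model=dict(
        type='BertModel',
        name=lang_model_name,
    ),
)
```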
AD-OWOD/
└── data/
    ├── CODA/
    │   ├── train   (train and val set of SODA)
    │   ├── val   (val set of CODA)
    │   └── annotations
    │       ├── train.json
    │       ├── val.json
    │       └── annotation.json   (CODA)
    └── BDD/
        ├── train
        ├── val
        └── annotations
            ├── bdd100k_labels_images_det_coco_train.json
            ├── bdd100k_labels_images_det_coco_val.json
            └── annotation.json   (CODA)
CODA:
- Download the train and val set of SODA from https://soda-2d.github.io/download.html.
- Download the val set of CODA from https://coda-dataset.github.io/.
- Prepare the annotation files: (1) get the train.json file by running eval_tools/merge_json.py; (2) get val.json by running eval_tools/coda_format_to_voc.py and eval_tools/convert_coda_to_coco.py. You can also directly download the annotation files from here.
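For reference, merging two COCO-format annotation files (roughly what eval_tools/merge_json.py does for the SODA train and val splits) can be sketched as follows; the input file names are assumptions, not the repo's actual paths:

```python
# Simplified sketch of merging two COCO-format annotation files into one
# (roughly what eval_tools/merge_json.py does for the SODA train/val splits).
# The input paths below are assumptions; adjust them to where the SODA annotations live.
import json

def merge_coco(path_a, path_b, out_path):
    with open(path_a) as f:
        a = json.load(f)
    with open(path_b) as f:
        b = json.load(f)

    # Re-index images/annotations of the second file to avoid id collisions.
    img_offset = max(img["id"] for img in a["images"]) + 1
    ann_offset = max(ann["id"] for ann in a["annotations"]) + 1
    for img in b["images"]:
        img["id"] += img_offset
    for ann in b["annotations"]:
        ann["id"] += ann_offset
        ann["image_id"] += img_offset

    merged = {
        "images": a["images"] + b["images"],
        "annotations": a["annotations"] + b["annotations"],
        "categories": a["categories"],  # assumes both splits share the same category list
    }
    with open(out_path, "w") as f:
        json.dump(merged, f)

merge_coco("data/CODA/annotations/soda_train.json",
           "data/CODA/annotations/soda_val.json",
           "data/CODA/annotations/train.json")
```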
BDD:
- Download the train and val set of BDD100K from https://doc.bdd100k.com/download.html.
To train SGROD on a single node with 4 GPUs, run
bash tools/dist_train.sh configs/auto_driving_grounding_dino/auto_driving_grounding_dino_swin-t_16xb2_1x_soda.py 4
You can also run any of the other configurations defined in configs/auto_driving_grounding_dino.
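For quick single-GPU debugging, the same config can also be launched programmatically via mmengine's Runner (this is essentially what mmdetection's tools/train.py does under the hood); a minimal sketch:

```python
# Single-GPU training sketch via mmengine's Runner; useful for quick debugging runs.
from mmengine.config import Config
from mmengine.runner import Runner

cfg = Config.fromfile(
    'configs/auto_driving_grounding_dino/auto_driving_grounding_dino_swin-t_16xb2_1x_soda.py')
cfg.work_dir = './work_dirs/auto_gd_soda'  # where checkpoints and logs are written
Runner.from_cfg(cfg).train()
```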
To reproduce any of the aforementioned results, please download our pretrained weights and place them in the 'weights' directory.
- Run the demo/image_demo.py script to output a JSON file for testing.
python demo/image_demo.py ./data/CODA/val/ $config --weights $weights --texts "$text_prompts" --no-save-pred --no-save-vis --save-json-path ./val_json_output_dir/$save_json
For example:
# SODA
python demo/image_demo.py ./data/CODA/val/ configs/auto_driving_grounding_dino/auto_driving_grounding_dino_swin-t_16xb2_1x_soda.py --weights weights/auto_gd_soda.pth --texts "pedestrian . cyclist . car . truck . bus . tricycle . vehicle . roadblock . obstacle ." --no-save-pred --no-save-vis --save-json-path ./val_json_output_dir/auto_gd_soda.json
# BDD
python demo/image_demo.py ./data/CODA/val/ configs/auto_driving_grounding_dino/auto_driving_grounding_dino_swin-t_16xb2_1x_bdd.py --weights weights/auto_gd_bdd.pth --texts "person . rider . car . bus . truck . bike . motor . traffic light . traffic sign . train . vehicle . roadblock . obstacle ." --no-save-pred --no-save-vis --save-json-path ./val_json_output_dir/auto_gd_bdd.json
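The exact schema of the saved JSON is defined by the demo script, so if evaluation complains about the format, a quick generic inspection can help (a sketch, assuming the SODA output path from the example above):

```python
# Quick look at the structure of the JSON produced by demo/image_demo.py
# before feeding it to eval_tools/eval.py (the exact schema depends on the script).
import json

with open('val_json_output_dir/auto_gd_soda.json') as f:
    preds = json.load(f)

print(type(preds))
if isinstance(preds, dict):
    print("keys:", list(preds.keys())[:5])
elif isinstance(preds, list):
    print("entries:", len(preds))
    print("first entry:", preds[0])
```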
- Run the eval_tools/eval.py script to output the metrics.
python eval_tools/eval.py val_json_output_dir/$save_json val_json_output_dir/annotations.json 0. $data
For example:
# SODA
python eval_tools/eval.py val_json_output_dir/auto_gd_soda.json val_json_output_dir/annotations.json 0. soda
# BDD
python eval_tools/eval.py val_json_output_dir/auto_gd_bdd.json val_json_output_dir/annotations.json 0. bdd
Note: For more training and evaluation details, please check the mmdetection repository.
Should you have any questions, please contact: [email protected]
Acknowledgments:
This work builds on the code bases of previous works such as mmdetection, UnSniffer, and GroundingDINO. Please consider citing these works as well.