AXERA-TECH/OWLVIT-ONNX-AX650-CPP

Latest commit: 8175948 · Jan 3, 2024

OWLVIT

https://huggingface.co/docs/transformers/model_doc/owlvit

Demo video: owlvit.1.mp4

Get ONNX Model

README

Build

mkdir build
cd build

For x86 with onnxruntime:

cmake -DONNXRUNTIME_DIR=${onnxruntime_dir} -DOpenCV_DIR=${opencv_cmake_file_dir} ..

For AX650 (cross-compiled):

cmake -DONNXRUNTIME_DIR=${onnxruntime_dir} -DOpenCV_DIR=${opencv_cmake_file_dir} -DBSP_MSP_DIR=${msp_out_dir} -DBUILD_WITH_AX650=ON -DCMAKE_TOOLCHAIN_FILE=../toolchains/aarch64-none-linux-gnu.toolchain.cmake ..

Then, for either target:

make -j4

aarch64-none-linux-gnu libraries (needed for the AX650 build):
onnxruntime
opencv

Run

/opt/test/owlvit # ./main --ienc owlvit-image.axmodel --tenc owlvit-text.onnx -d owlvit-post.onnx -v vocab.txt -i ssd_horse.jpg -t text.txt --thread 8
Engine creating handle is done.
Engine creating context is done.
Engine get io info is done.
Engine alloc io is done.
[I][                            init][ 280]: BGR MODEL
[I][              load_image_encoder][  17]: input size 768 768
[I][              load_image_encoder][  29]: image feature len 442368
[I][              load_image_encoder][  32]: pred box cnt  576
[I][               load_text_encoder][ 152]: text feature len 512
[I][                            main][ 120]: image_src [ssd_horse.jpg]
[I][                            main][ 121]: text_src [text.txt]
encode text Inference Cost time : 0.190662s
post Inference Cost time : 0.0550382s
a photo of person 268.899292 20.153463 88.163696 235.837906
a photo of person 428.696014 123.745819 19.836823 55.102310
horse 191.756058 55.418949 229.225601 318.581055
a photo of car 0.000000 98.398750 145.470108 92.571877
a photo of dog 145.470108 203.093140 57.306412 156.490570
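
The detection lines at the end of the log can be consumed by another program. Below is a minimal C++ sketch of one way to parse them. It is not code from this repository: the `Detection` struct and `parse_detection` function are hypothetical names. It also assumes that the four numbers are x, y, width, and height in pixels of the input image, which the sample values suggest but the log does not state.

```cpp
#include <cstddef>
#include <exception>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical container for one printed detection (not a type from this repo).
struct Detection {
    std::string label;  // text prompt, may contain spaces ("a photo of person")
    float x, y, w, h;   // ASSUMED meaning: top-left x, top-left y, width, height
};

// Parse a line such as
//   "a photo of person 268.899292 20.153463 88.163696 235.837906".
// The last four whitespace-separated tokens are read as numbers and everything
// before them is joined back into the label. Lines that do not match
// (for example the timing or "[I][...]" log lines) yield std::nullopt.
std::optional<Detection> parse_detection(const std::string& line) {
    std::istringstream iss(line);
    std::vector<std::string> tokens;
    for (std::string tok; iss >> tok;) tokens.push_back(tok);
    if (tokens.size() < 5) return std::nullopt;  // need a label plus 4 numbers

    Detection det{};
    float* fields[4] = {&det.x, &det.y, &det.w, &det.h};
    const std::size_t first_num = tokens.size() - 4;
    try {
        for (std::size_t i = 0; i < 4; ++i) {
            const std::string& tok = tokens[first_num + i];
            std::size_t used = 0;
            *fields[i] = std::stof(tok, &used);
            if (used != tok.size()) return std::nullopt;  // trailing junk, e.g. "0.05s"
        }
    } catch (const std::exception&) {
        return std::nullopt;  // token was not a number
    }

    for (std::size_t i = 0; i < first_num; ++i) {
        if (i > 0) det.label += ' ';
        det.label += tokens[i];
    }
    return det;
}
```

Feeding each line of the program's output through `parse_detection` and keeping only the results that hold a value would separate the detections from the engine and timing messages.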