From Detection to Narration and Explanation #13817
arcyleung started this conversation in Show and tell
I'm super interested in this.
Hello friends, Frigate has been working amazingly well for my needs, and I recently built a small integration on top of it. It combines the detection capability of TensorRT with a vision-text model such as LLaVA to narrate and explain events as they are recorded, producing a text-searchable transcript.
I'd like to gauge whether there is community interest in this kind of integration for multi-modal (image + video) workflows, much like how TensorRT YOLO is currently integrated. I'm willing to put in the effort to polish it further and contribute it upstream.
You can find a video demo here and the code on my repo. Thanks!
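To give a rough idea of the flow described above (a detector flags an event, a vision-text model narrates it, and the narration lands in a searchable transcript), here is a minimal sketch. It is not the code from the repo and does not use any Frigate, TensorRT, or LLaVA API: `DetectionEvent`, `Transcript`, `narrate_event`, and `fake_describe` are all hypothetical names made up for illustration, and the captioning model is replaced by a stub that returns a fixed sentence.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class DetectionEvent:
    """Hypothetical stand-in for an event emitted by the object detector."""
    camera: str
    label: str        # class reported by the detector, e.g. "person"
    timestamp: float  # seconds since the epoch
    snapshot: bytes   # encoded image captured for the event


@dataclass
class TranscriptEntry:
    camera: str
    timestamp: float
    text: str


@dataclass
class Transcript:
    """In-memory transcript with naive keyword search (assumption: a real
    setup would likely persist entries and use a proper text index)."""
    entries: List[TranscriptEntry] = field(default_factory=list)

    def add(self, entry: TranscriptEntry) -> None:
        self.entries.append(entry)

    def search(self, query: str) -> List[TranscriptEntry]:
        # An entry matches when every query term occurs in its text,
        # compared case-insensitively as plain substrings.
        terms = query.lower().split()
        return [e for e in self.entries
                if all(t in e.text.lower() for t in terms)]


def narrate_event(event: DetectionEvent,
                  describe: Callable[[bytes, str], str],
                  transcript: Transcript) -> TranscriptEntry:
    """Ask the vision-text model to describe the event and record the result.

    `describe` is any callable taking (image_bytes, prompt) and returning
    text; wiring it to an actual model is left out of this sketch.
    """
    prompt = f"Describe what the {event.label} is doing in this image."
    text = describe(event.snapshot, prompt)
    entry = TranscriptEntry(event.camera, event.timestamp, text)
    transcript.add(entry)
    return entry


def fake_describe(image: bytes, prompt: str) -> str:
    """Stub standing in for a vision-text model; returns a canned caption."""
    return "A person in a red jacket walks up the driveway carrying a box."
```

With these pieces, `narrate_event(DetectionEvent("front_door", "person", 1700000000.0, b""), fake_describe, transcript)` adds one entry to `transcript`, and `transcript.search("red jacket")` then returns that entry. Passing the model in as a plain callable keeps the detector side and the narration side loosely coupled, so either can be swapped independently.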