Replies: 1 comment
@jongwook do you think this could be helpful for Whisper?
-
Hi all,
Has anyone benchmarked Whisper inference? For example, have you run Whisper on a suitably large dataset (say, 1 to 5 hours of audio) under NVIDIA Nsight Systems?
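For reference, here is a minimal sketch of the kind of run I mean; the model size ("small") and audio path ("audio.wav") are placeholders:

```python
# profile_whisper.py: a minimal Whisper run to trace under Nsight Systems.
# "small" and "audio.wav" are placeholders; any model size and clip work.
import whisper

model = whisper.load_model("small")

# beam_size > 1 exercises the beam search decoding path we want to observe
result = model.transcribe("audio.wav", beam_size=5)
print(result["text"])
```

Running it with, e.g., `nsys profile -o whisper_report python profile_whisper.py` should show whether there are gaps in the GPU timeline while the Python-side search runs between decoder forward passes.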
I recently wrote a GPU-accelerated beam search decoder for a customer using a similarly architected (Transformer-based) speech recognition model. Their original pipeline was particularly slow because beam search ran on the CPU in Python, taking about 50% of total inference time (!). Whisper also does its beam search on the CPU in Python, so I have a hunch it may have a similar bottleneck.
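One cheap way to test that hunch without Nsight is PyTorch's built-in profiler; a sketch, assuming the same placeholder model and audio file as above (CUDA activity naturally requires a GPU):

```python
import whisper
from torch.profiler import profile, ProfilerActivity

model = whisper.load_model("small")  # placeholder size, as above

# Aggregate CPU time (which includes the Python beam search loop) vs.
# CUDA time (encoder/decoder forward passes) over one transcription.
with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
    model.transcribe("audio.wav", beam_size=5)

print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=20))
```

If self CPU time dwarfs CUDA time during decoding, that would support the CPU beam search hypothesis.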
I am working on including it in https://github.com/nvidia-riva/riva-asrlib-decoder/ so that others can benefit, and Whisper comes to mind as a potential beneficiary. My implementation is missing patience, which I know Whisper's Python CPU implementation has, but that can be added.
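For anyone unfamiliar: as I understand Whisper's implementation, patience (Freitag & Al-Onaizan, 2017) relaxes the beam search stopping criterion, so the search keeps collecting finished hypotheses until roughly beam_size * patience of them are complete instead of stopping at beam_size. A minimal sketch of that stopping rule (the function name and signature are illustrative, not Whisper's API):

```python
def beam_search_is_finished(finished_hypotheses, beam_size, patience=1.0):
    """Illustrative stopping rule only, not Whisper's actual API.

    With patience p, the search keeps going until round(beam_size * p)
    finished hypotheses have been collected; patience=1.0 reduces to
    ordinary beam search termination after beam_size finished hypotheses.
    """
    max_candidates = round(beam_size * patience)
    return len(finished_hypotheses) >= max_candidates
```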
If anyone could let me know (1) whether beam search is a bottleneck in Whisper, (2) if not, what the bottlenecks are, and (3) whether this repo is open to this sort of contribution, that would definitely help me prioritize. Many thanks.