
Shazam-like Voice Detection AI for line being spoken/sung #391

Open
bhajneet opened this issue Jan 26, 2020 · 4 comments
Labels
Type: Story (feature or requirement written from the user's perspective using non-technical language)

Comments

@bhajneet
Member

I don't think this is easily possible, but some users are requesting that we use the mics in phones/laptops/tablets to "hear" what is being sung and automatically suggest the line to the user, or, if detection is accurate enough, present it automatically.

Any thoughts or ideas on how to achieve this? I have marked this as low priority (would be nice to have) and on hold (not planning to work on this) unless we have better ideas of how it can be done and how long it would potentially take. Currently I have no roadmap in mind for achieving this.

bhajneet added the Status: On Hold, Type: Story, and Impacts: Few (does not affect many end-users) labels on Jan 26, 2020
@Harjot1Singh
Member

It might be possible for hukamnama, since the clarity of words is much higher, but you would still need Punjabi speech-to-text for it to function. That said, if we can get good clarity on some syllables in each word, this could actually be effective.

Requirements:

  1. Some way of searching on a partial mixture of letters in each word. If the line is har har har gun gaavao, the speech-to-text might hear hr h r gn gaao, and we need to be able to feed that in and still get a match (basically, a search that matches on "any letters of each word", a bit like the first-letter search).

  2. Some sort of STT for gurmukhi
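A minimal sketch of what requirement 1 could look like, assuming plain Python and treating each heard fragment as a letter-subsequence of the corresponding word in a candidate line. Function names and the scoring scheme here are illustrative assumptions, not Shabad OS code:

```python
def is_subsequence(fragment: str, word: str) -> bool:
    """True if fragment's letters appear, in order, within word.

    e.g. "hr" matches "har", "gn" matches "gun".
    """
    letters = iter(word)
    # `ch in letters` consumes the iterator up to the match,
    # so order is preserved across checks.
    return all(ch in letters for ch in fragment)


def score_line(heard: str, line: str) -> float:
    """Fraction of heard fragments that subsequence-match the line, word by word."""
    heard_words = heard.split()
    line_words = line.split()
    if not heard_words:
        return 0.0
    hits = sum(
        1
        for fragment, word in zip(heard_words, line_words)
        if is_subsequence(fragment, word)
    )
    return hits / len(heard_words)


def best_match(heard: str, lines: list[str]) -> str:
    """Return the candidate line with the highest fragment-match score."""
    return max(lines, key=lambda line: score_line(heard, line))
```

Here `best_match` simply returns the highest-scoring line; a real implementation would also need a confidence threshold, transliteration-aware normalisation, and tolerance for missed or merged words.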

But if you'd like to try making this work for kirtan too, we could potentially run some ML models on existing kirtan recordings (so we can map the many ways of singing a shabad back to the shabad itself), and then classify incoming sound and see what it matches. The pro is that this could be effective for singing, but the downside is that shabads without a sung example in our training dataset (and I imagine there will be many) will not be classifiable/detectable without further training.
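As a rough illustration of this classification idea (not a proposal for the actual model), the sketch below labels a new recording by its nearest neighbour among known shabad recordings. In practice the feature vectors would be MFCCs or learned embeddings rather than the placeholder lists used here, and every name is hypothetical:

```python
import math


def euclidean(a: list[float], b: list[float]) -> float:
    """Straight-line distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class ShabadClassifier:
    """Toy nearest-neighbour classifier over audio feature vectors."""

    def __init__(self) -> None:
        self.examples: list[tuple[list[float], str]] = []

    def train(self, features: list[float], shabad_id: str) -> None:
        """Store one labelled recording (many recordings may share a shabad)."""
        self.examples.append((features, shabad_id))

    def classify(self, features: list[float]) -> str:
        """Return the shabad whose stored recording is closest.

        Note the downside described above: a shabad with no training
        example can never be returned.
        """
        _, shabad_id = min(
            (euclidean(features, f), sid) for f, sid in self.examples
        )
        return shabad_id
```

The design mirrors the trade-off in the comment: adding more sung variants of a shabad just means more labelled examples, but coverage is limited to shabads seen in training.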

BUT... here's an idea: an SOS opt-in to the mic. That way, we could attempt to use Shabad OS input (the sound plus whatever is being shown on the projector) to automatically train and improve our dataset. This sounds like something that could be very interesting, but it is potentially out of mainstream scope for now.

@preetcharan
@Harjot1Singh @bhajneet Wow, I just stumbled across this conversation while marking my Slack messages as read. My uncle is a very wealthy businessman and wanted to give me money to get this feature done. I told him I work with you guys and that I don't think it's so practical: by the time the tune of har raam naam jap laha starts on the vaja, I already have the shabad up, and there are so many similar shabads where the second half is different. Technically I have no idea how to achieve it, since, as you know, I'm not into coding, AI, or speech recognition. BUT, from an audio perspective, if you could make Shabad OS listen to a USB interface feed taken in for the broadcast, you should have a clear enough sound there to decipher.

This is a very interesting project, and if you do know of someone we could pay to get this done, then I could potentially get funds for it.

@bhajneet
Member Author

bhajneet commented Feb 6, 2020

Depends on shabados/gurmukhi-utils#22

@bhajneet
Member Author

bhajneet commented Feb 6, 2020

> BUT... here's an idea. SOS opt-in to mic. That way, we could attempt to use Shabad OS input (of sound + whatever is being shown on the projector) to automatically train/improve our dataset. This sounds like something that could be very interesting, but potentially out of mainstream scope for now.

Though interesting, it seems a bit too much like snooping, or a breach of privacy. It's also unnecessary given the vast amount of keertan/kirtan audio files and videos readily available.

Projects
Status: Triage
Development

No branches or pull requests

3 participants