
Shazam-like Voice Detection AI for line being spoken/sung #391

Open
bhajneet opened this issue Jan 26, 2020 · 4 comments
Labels
Type: Story (feature or requirement written from the user's perspective using non-technical language)

Comments

@bhajneet
Member

I don't think this is easily possible, but some users are requesting that we use the mics in phones/laptops/tablets to "hear" what is being sung and automatically suggest the line to the user, or, if detection is accurate enough, present it automatically.

Any thoughts or ideas on how to achieve this? I have marked this as low priority (would be nice to have) and on hold (not planning to work on this) unless we have better ideas of how it can be done and how long it would potentially take. Currently I have no roadmap in mind for achieving this.

bhajneet added the Status: On Hold, Type: Story, and Impacts: Few (does not affect many end-users) labels on Jan 26, 2020
@Harjot1Singh
Member

It might be possible for hukamnama, since the clarity of words is much higher, but you would still need Punjabi speech-to-text for it to function. That said, if we can get good clarity on some syllables in each word, this could actually be effective.

Requirements:

  1. Some way of searching on a partial mixture of letters in each word. If the line is har har har gun gaavao, the speech-to-text might hear hr h r gn gaao, and we need to be able to feed that in and still get a match (basically, a search that matches on "any letters of each word", a bit like the first-letter search).

  2. Some sort of STT for gurmukhi
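A minimal sketch of what requirement 1 could look like, assuming plain Python and treating each heard fragment as a letter-subsequence of the corresponding word in a candidate line. Function names and the scoring scheme here are illustrative assumptions, not Shabad OS code:

```python
def is_subsequence(fragment: str, word: str) -> bool:
    """True if fragment's letters appear, in order, within word.

    e.g. "hr" matches "har", "gn" matches "gun".
    """
    letters = iter(word)
    # `ch in letters` consumes the iterator up to the match,
    # so order is preserved across checks.
    return all(ch in letters for ch in fragment)


def score_line(heard: str, line: str) -> float:
    """Fraction of heard fragments that subsequence-match the line, word by word."""
    heard_words = heard.split()
    line_words = line.split()
    if not heard_words:
        return 0.0
    hits = sum(
        1
        for fragment, word in zip(heard_words, line_words)
        if is_subsequence(fragment, word)
    )
    return hits / len(heard_words)


def best_match(heard: str, lines: list[str]) -> str:
    """Return the candidate line with the highest fragment-match score."""
    return max(lines, key=lambda line: score_line(heard, line))
```

Here `best_match` simply returns the highest-scoring line; a real implementation would also need a confidence threshold, transliteration-aware normalisation, and tolerance for missed or merged words.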

But if you'd like to try making this work for kirtan too, we could potentially run some ML models on existing kirtan recordings (so we can map the many ways of singing a shabad back to the shabad itself), and then classify incoming sound and see what it matches. The pro is that this could be effective for singing, but the downside is that shabads without a sung example in our training dataset (and I imagine there will be many) will not be classifiable/detectable without further training.
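As a rough illustration of this classification idea (not a proposal for the actual model), the sketch below labels a new recording by its nearest neighbour among known shabad recordings. In practice the feature vectors would be MFCCs or learned embeddings rather than the placeholder lists used here, and every name is hypothetical:

```python
import math


def euclidean(a: list[float], b: list[float]) -> float:
    """Straight-line distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class ShabadClassifier:
    """Toy nearest-neighbour classifier over audio feature vectors."""

    def __init__(self) -> None:
        self.examples: list[tuple[list[float], str]] = []

    def train(self, features: list[float], shabad_id: str) -> None:
        """Store one labelled recording (many recordings may share a shabad)."""
        self.examples.append((features, shabad_id))

    def classify(self, features: list[float]) -> str:
        """Return the shabad whose stored recording is closest.

        Note the downside described above: a shabad with no training
        example can never be returned.
        """
        _, shabad_id = min(
            (euclidean(features, f), sid) for f, sid in self.examples
        )
        return shabad_id
```

The design mirrors the trade-off in the comment: adding more sung variants of a shabad just means more labelled examples, but coverage is limited to shabads seen in training.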

BUT... here's an idea: an SOS opt-in to the mic. That way, we could attempt to use Shabad OS input (the sound plus whatever is being shown on the projector) to automatically train and improve our dataset. This sounds like something that could be very interesting, but it is potentially out of mainstream scope for now.

@preetcharan
@Harjot1Singh @bhajneet Wow, I just stumbled across this conversation while marking my Slack messages as read. My uncle is a very wealthy businessman and wanted to give me money to get this feature done. I told him I work with you guys and that I don't think it's so practical: by the time the tune of har raam naam jap laha starts on the vaja, I already have the shabad up, and there are so many similar shabads where the second half is different. Technically I have no idea how to achieve it, since, as you know, I'm not into coding, AI, or speech recognition. BUT, from an audio perspective, if you could make Shabad OS listen to a USB interface feed taken in for the broadcast, you should have a clear enough sound there to decipher.

This is a very interesting project, and if you do know of someone we could pay to get this done, then I could potentially get funds for it.

@bhajneet
Member Author

bhajneet commented Feb 6, 2020

Depends on shabados/gurmukhi-utils#22

@bhajneet
Member Author

bhajneet commented Feb 6, 2020

> BUT... here's an idea. SOS opt-in to mic. That way, we could attempt to use Shabad OS input (of sound + whatever is being shown on the projector) to automatically train/improve our dataset. This sounds like something that could be very interesting, but potentially out of mainstream scope for now.

Though interesting, it seems a bit too much like snooping, or a breach of privacy. It's also unnecessary given the vast amount of keertan/kirtan audio files and videos readily available.

Projects
Status: Triage
Development

No branches or pull requests

3 participants