Detect wrongly-set caption language (LLM?) #5709
Comments
The idea is great, but I don't know of any device-embedded LLMs. @nicolas-raoul, can you point me to any that you know about?
@shankarpriyank I think Pixel 8+ and recent Samsung devices are planned to get that soon. Hopefully the emulator will get it soon too. https://developer.android.com/ai/aicore https://blog.google/products/pixel/pixel-feature-drop-december-2023/
We are considering working on this with @vtalos. Maybe a tool like Apache Tika would be suitable for language recognition. Alternatively, there are GitHub repositories that offer similar functionality, such as https://github.com/shuyo/language-detection or https://github.com/optimaize/language-detector. What do you think, @nicolas-raoul?
@karyotakisg These projects would add weight to the APK, and as I said, it is low-priority. Also, an embedded LLM sounds like a fun thing to try. :-)
We are a team of 5 students working on a university project to contribute to OSS projects. We would love to take on this issue and solve it.
@ChristoJobyAntony To check whether every needed component is available, would you mind running this sample app?
I built https://github.com/android/ai-samples and am currently running it on my Pixel 9 Pro.
So I restarted the app, and am now getting this:
Looks like I am not alone: android/ai-samples#3. Any luck on your side, @ChristoJobyAntony?
@nicolas-raoul, thank you for the assignment. Unfortunately, I don't have a physical Pixel 9 to run the ai-samples application; the closest alternative I have is a Pixel 7. I was hoping I would be able to emulate Gemini Nano through AVD, but it turns out that is not possible. Could we look at a more general solution that is compatible with a wider range of devices?
No worries!
If you have any ideas, feel free to share, but I am afraid no solution exists yet that would not increase the APK size. :'-(
This might be a somewhat rudimentary approach, but would it be sufficient to scan the caption for characters from a different script? I understand that many languages share the same alphabet (especially Latin); however, we could still show a warning or other UI feedback when few or none of the characters in the caption belong to the script of the selected language. This would at least prompt the user to correct their selection and could filter out a majority of the errors.
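For illustration only, the script check proposed above could be sketched as follows. This is Python rather than app code, and everything in it is an assumption made for the sketch: the `EXPECTED_SCRIPT` table, the function names, the 20% threshold, and the shortcut of reading a character's script from the first word of its Unicode name. It also shows the limitation already mentioned: a French caption tagged as English passes unnoticed, because both use Latin script.

```python
import unicodedata
from collections import Counter

# Hypothetical and deliberately tiny mapping from language code to script.
# A real table would have to cover every language Commons supports.
EXPECTED_SCRIPT = {
    "en": "LATIN",
    "fr": "LATIN",
    "ur": "ARABIC",
    "ru": "CYRILLIC",
    "hi": "DEVANAGARI",
    "el": "GREEK",
}


def script_of(ch):
    """Approximate a character's script by the first word of its Unicode name."""
    try:
        return unicodedata.name(ch).split()[0]
    except ValueError:  # character has no Unicode name
        return "UNKNOWN"


def script_share(caption, language):
    """Fraction of the caption's letters written in the expected script.

    Returns None when no judgment is possible (language missing from the
    table, or caption without any letters).
    """
    expected = EXPECTED_SCRIPT.get(language)
    letters = [ch for ch in caption if ch.isalpha()]
    if expected is None or not letters:
        return None
    counts = Counter(script_of(ch) for ch in letters)
    return counts[expected] / len(letters)


def looks_mismatched(caption, language, threshold=0.2):
    """True when too few letters belong to the selected language's script."""
    share = script_share(caption, language)
    return share is not None and share < threshold
```

A caller would show the warning only when `looks_mismatched(caption, selected_language)` is true, and stay silent whenever the check cannot decide.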
There are hundreds of languages and dozens of scripts, so even this mapping would take up some APK size as well as a lot of effort... I think it is worth waiting for embedded LLMs to become more widespread, as this is not an urgent issue.
I understand, thank you for your prompt feedback. I understand that the focus is on an LLM-based solution. I will ask my team whether we can get hold of a device that supports the new AI-Core SDK. If not, I will unassign myself from this issue.
Since none of our team members has access to a device with the AI-Core feature, I will be unassigning myself from this issue.
https://developers.google.com/ml-kit/language/identification We can try this. Wdyt?
@neeldoshii Thanks for the tip! That could indeed be another option, possibly worth comparing against.
The first task could be to get a list of 10 captions in each language supported by Commons, possibly via a SPARQL query. This can then be used for benchmarking and unit testing.
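Once such a caption list exists, a benchmark over it could look roughly like the sketch below. It is an illustration in Python, not app code: `benchmark` and its arguments are hypothetical names, and `detect` stands in for whichever detector is being evaluated (embedded LLM, ML Kit, or anything else), assumed here to take a caption and return a language code or `None`.

```python
from collections import defaultdict


def benchmark(samples, detect):
    """Compute per-language accuracy of a caption language detector.

    samples: iterable of (caption, true_language_code) pairs.
    detect:  callable taking a caption and returning a language code or None.
    Returns a dict mapping each language code to the fraction of its
    captions that the detector identified correctly.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for caption, language in samples:
        totals[language] += 1
        if detect(caption) == language:
            correct[language] += 1
    return {language: correct[language] / totals[language] for language in totals}
```

Reporting accuracy per language rather than as one overall number makes it easier to see which languages a given detector handles poorly, and the same sample list can back the unit tests.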
(unfortunately you will need a physical Pixel 8 or above to implement this)
Many Commons contributors contribute in various languages, for instance in Urdu when posting a picture of a local dish then in English when posting a picture showing a technology. That's great, but they often forget to select the right language for the caption:
The app should try to detect when there seems to be a language mismatch, and show a popup such as:
Implementation: Per our privacy policy, we cannot call third-party APIs. This task is probably not important enough to justify training a dedicated ML/LLM model, but it is a great use case for a device-embedded LLM on devices where that feature is available.