Detect wrongly-set caption language (LLM?) #5709

Open
nicolas-raoul opened this issue May 2, 2024 · 16 comments

@nicolas-raoul
Member

nicolas-raoul commented May 2, 2024

(unfortunately you will need a physical Pixel 8 or above to implement this)

Many Commons contributors contribute in various languages, for instance in Urdu when posting a picture of a local dish then in English when posting a picture showing a technology. That's great, but they often forget to select the right language for the caption:

Screenshot_20240502-095800.png

The app should try to detect when there seems to be a language mismatch, and show a popup such as:

Your caption seems to be in Japanese, but English is declared as the caption language. Do you want to declare the caption language as being Japanese?
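For illustration, the popup text could be assembled from two language tags roughly like this. It is only a sketch: the class and method names are made up, and in the app the sentence would presumably come from a translatable string resource rather than being hardcoded in English.

```java
import java.util.Locale;

final class CaptionLanguagePrompt {
    /**
     * Builds the mismatch prompt. Both arguments are BCP 47 language tags
     * such as "ja" or "en". (Hypothetical helper, not existing app code.)
     */
    static String mismatchMessage(String detectedTag, String declaredTag) {
        // Resolve human-readable language names, displayed in English.
        String detected = Locale.forLanguageTag(detectedTag).getDisplayLanguage(Locale.ENGLISH);
        String declared = Locale.forLanguageTag(declaredTag).getDisplayLanguage(Locale.ENGLISH);
        return "Your caption seems to be in " + detected + ", but " + declared
                + " is declared as the caption language. "
                + "Do you want to declare the caption language as being " + detected + "?";
    }
}
```

Calling `CaptionLanguagePrompt.mismatchMessage("ja", "en")` produces the example sentence above.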

Implementation: Per our privacy policy, we cannot call third-party APIs. This task is probably not important enough to justify training a dedicated ML/LLM model, but it is a great use case for a device-embedded LLM on devices where that feature is available.

@shankarpriyank
Contributor

The idea is great, but I don't know of any device-embedded LLMs. @nicolas-raoul, can you point me to any that you know about?

@nicolas-raoul
Member Author

@shankarpriyank I think Pixel 8+ and recent Samsung devices are planned to get that soon. Hopefully the emulator will get it soon too.

https://developer.android.com/ai/aicore

https://blog.google/products/pixel/pixel-feature-drop-december-2023/

https://www.samsung.com/us/galaxy-ai/

@karyotakisg
Contributor

We are considering working on this with @vtalos. Maybe a tool like Apache Tika would be suitable for language recognition. Alternatively, there are GitHub repositories that offer similar functionality, like https://github.com/shuyo/language-detection or https://github.com/optimaize/language-detector. What do you think, @nicolas-raoul?

@nicolas-raoul
Member Author

@karyotakisg These projects would add weight to the APK, and as I said it is low-priority. Also, embedded LLM sounds like a fun thing to try. :-)
If anyone has a device with AiCore (I think that currently means Samsung S23 Ultra or Pixel 8 Pro) please let us know.

@ChristoJobyAntony
Contributor

We are a team of 5 students working on a University project to contribute to OSS projects. We would love to take on this issue and solve it.

@nicolas-raoul
Member Author

@ChristoJobyAntony To check whether every needed component is available, would you mind running this sample app?
https://github.com/android/ai-samples
Please describe below your experience/process while doing it, thanks a lot! 🙂

@nicolas-raoul
Member Author

nicolas-raoul commented Oct 11, 2024

I built https://github.com/android/ai-samples and am currently running it on my Pixel 9 Pro.
Somehow I was still getting this message after 10 minutes:

model is unavailable yet and downloading in background

So I restarted the app, and am now getting this:

Failed to check model availability.
com.google.ai.edge.aicore.UnknownException: AICore failed with error type 2-INFERENCE_ERROR and error code 8-NOT_AVAILABLE: Required LLM feature not found
at com.google.ai.edge.aicore.GenerativeAIException$Companion.from$java_com_google_android_apps_aicore_client_client(com.google.ai.edge.aicore:aicore@@0.0.1-exp01:7)
at com.google.ai.edge.aicore.GenerativeModel$prepareInferenceEngine$2.invokeSuspend(com.google.ai.edge.aicore:aicore@@0.0.1-exp01:9)
at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:108)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:644)
at java.lang.Thread.run(Thread.java:1012)
Caused by: com.google.ai.edge.aicore.InferenceException: AICore failed with error type 2-INFERENCE_ERROR and error code 8-NOT_AVAILABLE: Required LLM feature not found
at com.google.ai.edge.aicore.GenerativeAIException$Companion.from$java_com_google_android_apps_aicore_client_client(com.google.ai.edge.aicore:aicore@@0.0.1-exp01:4)
at com.google.ai.edge.aicore.GenerativeModel.getAiFeature(com.google.ai.edge.aicore:aicore@@0.0.1-exp01:9)
at com.google.ai.edge.aicore.GenerativeModel.access$getAiFeature(com.google.ai.edge.aicore:aicore@@0.0.1-exp01:1)
at com.google.ai.edge.aicore.GenerativeModel$getAiFeature$1.invokeSuspend(Unknown Source:14)
... 5 more

Looks like I am not alone: android/ai-samples#3
Maybe because my device is rooted: android/ai-samples#3 (comment)

Any luck on your side @ChristoJobyAntony?

@ChristoJobyAntony
Contributor

@nicolas-raoul, thank you for the assignment. Unfortunately, I don't have a physical Pixel 9 to run the ai-samples application; the closest alternative I have is a Pixel 7. I was hoping to emulate Gemini Nano through AVD, but it turns out that is not possible.

Is it possible that we look at a more general solution that is compatible with a wider range of devices?

@nicolas-raoul
Member Author

@ChristoJobyAntony

No worries!
Feel free to unassign yourself and choose a different issue if you want.

Is it possible that we look at a more general solution that is compatible with a wider range of devices?

If you get any idea feel free to share, but I am afraid no solution exists yet that would not increase the APK size. :'-(

@ChristoJobyAntony
Contributor

ChristoJobyAntony commented Oct 11, 2024

If you get any idea feel free to share, but I am afraid no solution exists yet that would not increase the APK size. :'-(

This might be a rather rudimentary approach, but would it be sufficient to scan the caption for characters from a different script? I understand that several languages share the same alphabet (especially Latin); however, we could still show a warning or other UI feedback when the caption contains few or no characters from the selected language's script. This would at least prompt the user to correct their selection and could filter out a majority of the errors.
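A minimal sketch of that script-scanning idea, using only `Character.UnicodeScript` from the standard library. The class name, method names, and the five-entry language-to-script table are hypothetical placeholders; a real table would need to cover every caption language, and languages written in several scripts (Japanese, for example) would need a set of scripts rather than a single one.

```java
import java.lang.Character.UnicodeScript;
import java.util.EnumMap;
import java.util.Map;

final class CaptionScriptCheck {
    // Hypothetical, deliberately tiny mapping from language code to expected script.
    private static final Map<String, UnicodeScript> EXPECTED = Map.of(
            "en", UnicodeScript.LATIN,
            "fr", UnicodeScript.LATIN,
            "ur", UnicodeScript.ARABIC,
            "ru", UnicodeScript.CYRILLIC,
            "el", UnicodeScript.GREEK);

    /** Returns the script with the most letters in the text, or null if it has no letters. */
    static UnicodeScript dominantScript(String text) {
        Map<UnicodeScript, Integer> counts = new EnumMap<>(UnicodeScript.class);
        // Count letters only, so digits, spaces, and punctuation are ignored.
        text.codePoints()
            .filter(Character::isLetter)
            .forEach(cp -> counts.merge(UnicodeScript.of(cp), 1, Integer::sum));
        UnicodeScript best = null;
        int bestCount = 0;
        for (Map.Entry<UnicodeScript, Integer> e : counts.entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }

    /** True when the caption's dominant script differs from the one expected for the declared language. */
    static boolean looksMismatched(String caption, String declaredLanguage) {
        UnicodeScript expected = EXPECTED.get(declaredLanguage);
        UnicodeScript actual = dominantScript(caption);
        // Unknown languages and letter-free captions never trigger a warning.
        return expected != null && actual != null && expected != actual;
    }
}
```

With this sketch, an Urdu caption declared as English is flagged, while a French caption declared as English is not, which is exactly the shared-alphabet limitation mentioned above.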

@nicolas-raoul
Member Author

scan the caption for characters from a different script

There are hundreds of languages and dozens of scripts, so even the language-to-script mapping would add to the APK size and take a lot of effort... I think it is worth waiting for embedded LLMs to become more widespread, as this is not an urgent issue.

@ChristoJobyAntony
Contributor

I understand, thank you for your prompt feedback. I see that the focus is on an LLM-based solution. I will ask my team whether we can get hold of a device that supports the new AICore SDK. If not, I will unassign myself from this issue.

@ChristoJobyAntony
Contributor

ChristoJobyAntony commented Oct 14, 2024

Since none of our team members has access to a device with the AICore feature, I will be unassigning myself from this issue.

@ChristoJobyAntony ChristoJobyAntony removed their assignment Oct 14, 2024
@nicolas-raoul nicolas-raoul self-assigned this Jan 9, 2025
@neeldoshii
Contributor

https://developers.google.com/ml-kit/language/identification

We can try this.
Step 1: Check which language the user has set for the caption in our app.
Step 2: Pass the caption string to Google ML Kit to identify the language it is written in.
Step 3: If the language codes from Step 1 and Step 2 differ, alert the user with an alert dialog; otherwise show nothing.
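The three steps above can be sketched as follows. The detector is passed in as a plain function so the comparison logic can be tried without any Android dependency; it stands in for whatever backend is chosen (ML Kit, AICore, or something else) and is not a real library API. ML Kit's identifier is asynchronous, so real code would run this comparison in its success callback. Treating `und` as "undetermined" follows the BCP 47 convention, which I believe ML Kit also uses, but that is an assumption to verify.

```java
import java.util.Locale;
import java.util.Optional;
import java.util.function.Function;

final class CaptionLanguageCheck {
    /** Conventional tag for "language could not be determined" (assumed detector behaviour). */
    static final String UNDETERMINED = "und";

    /**
     * Returns the detected language tag if it differs from the declared one,
     * or empty if they match or detection was inconclusive.
     * The detector parameter is a hypothetical stand-in for the real backend.
     */
    static Optional<String> suggestedLanguage(String caption, String declaredTag,
                                              Function<String, String> detector) {
        // Step 1 happens at the call site: declaredTag is what the user selected.
        if (caption == null || caption.trim().isEmpty()) {
            return Optional.empty();
        }
        // Step 2: identify the language of the caption text.
        String detected = detector.apply(caption);
        if (detected == null || UNDETERMINED.equals(detected)) {
            return Optional.empty();
        }
        // Step 3: compare base languages only, so "en-GB" and "en" count as the same.
        String detectedLang = Locale.forLanguageTag(detected).getLanguage();
        String declaredLang = Locale.forLanguageTag(declaredTag).getLanguage();
        return detectedLang.equals(declaredLang) ? Optional.empty() : Optional.of(detected);
    }
}
```

A non-empty result would be the trigger for the alert dialog; an empty one means no dialog.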

Wdyt?

@nicolas-raoul
Member Author

@neeldoshii Thanks for the tip! That could be another option indeed, possibly worth comparing against. mlkit:language-id seems to require an additional ~1MB download (either as a bigger APK or as a download via Play), whereas if we are already using AICore for caption tokenization then there is no additional download needed.

@nicolas-raoul
Member Author

The first task could be to get a list of 10 captions in each language supported by Commons, possibly via a SPARQL query. This can then be used for benchmarking and unit testing.
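Once such a list exists, a benchmark could be as simple as the sketch below, which computes per-language accuracy for any detector. The `Sample` type and method names are hypothetical, the detector is again a plain function standing in for the real backend, and fetching the captions (via SPARQL or otherwise) is left out.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

final class CaptionBenchmark {
    /** A caption together with the language it is known to be written in. */
    static final class Sample {
        final String language;
        final String caption;

        Sample(String language, String caption) {
            this.language = language;
            this.caption = caption;
        }
    }

    /** Returns, per language, the fraction of samples the detector labeled correctly. */
    static Map<String, Double> accuracyByLanguage(List<Sample> samples,
                                                  Function<String, String> detector) {
        // language -> {correct, total}; insertion order is kept for stable reports.
        Map<String, int[]> tallies = new LinkedHashMap<>();
        for (Sample s : samples) {
            int[] t = tallies.computeIfAbsent(s.language, k -> new int[2]);
            if (s.language.equals(detector.apply(s.caption))) {
                t[0]++;
            }
            t[1]++;
        }
        Map<String, Double> result = new LinkedHashMap<>();
        tallies.forEach((lang, t) -> result.put(lang, (double) t[0] / t[1]));
        return result;
    }
}
```

The same sample list could back unit tests by asserting a minimum accuracy per language.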

@nicolas-raoul nicolas-raoul added gsoc Google Summer of Code and removed good first issue labels Jan 19, 2025