Translate title/abstract from source language #4

Open
1 of 6 tasks
pvgenuchten opened this issue Jun 6, 2024 · 7 comments
@pvgenuchten
Contributor

pvgenuchten commented Jun 6, 2024

If a metadata record is in a non-English language, add an English translation.

This should happen prior to indexing, so that any English filters also include this record. Indicate in the UI that the translation is machine-generated, with an option to switch to the original.

Use the EU translation service, Google, DeepL, ... or an LLM for translation.

DoD

  • Implement API to call (EU) translation service
  • Update SWR data model to support translations
  • Develop translator component that can work at harvest time, in batch mode, etc.
  • Implement workflow for title/abstract translation
  • Show translations and indicator in catalogue UI
  • Operationalize and run translations
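
A minimal sketch of how the data model could carry translations and feed the index, so that English filters also match translated records and the UI can show a machine-generated indicator. The class names, field names and index keys (`Record`, `Translation`, `title_en_machine_generated`, ...) are assumptions for illustration, not the actual SWR data model.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class Translation:
    text: str
    machine_generated: bool = True  # drives the "machine translated" indicator in the UI


@dataclass
class Record:
    identifier: str
    language: str  # source language code, e.g. "nl"
    title: str
    abstract: str
    # translations[field_name][language_code] -> Translation
    translations: Dict[str, Dict[str, Translation]] = field(default_factory=dict)

    def add_translation(self, field_name: str, lang: str, text: str,
                        machine_generated: bool = True) -> None:
        self.translations.setdefault(field_name, {})[lang] = Translation(
            text, machine_generated
        )

    def localized(self, field_name: str, lang: str = "en") -> Tuple[str, bool]:
        """Return (text, machine_generated), falling back to the original text."""
        if lang != self.language:
            hit = self.translations.get(field_name, {}).get(lang)
            if hit is not None:
                return hit.text, hit.machine_generated
        return getattr(self, field_name), False


def to_index_document(record: Record, lang: str = "en") -> dict:
    """Build the document sent to the index; the original text is kept alongside."""
    doc = {"identifier": record.identifier, "language": record.language}
    for name in ("title", "abstract"):
        text, machine = record.localized(name, lang)
        doc[f"{name}_original"] = getattr(record, name)
        doc[f"{name}_{lang}"] = text
        doc[f"{name}_{lang}_machine_generated"] = machine
    return doc
```

Keeping the original next to the translation is what would let the UI offer the switch back to the source text.
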
@pvgenuchten
Contributor Author

pvgenuchten commented Jun 6, 2024

Runs as a microservice that fetches strings from the source database and stores its results in a translations table as hash, key, translation, language. If a hash is already in the database, it returns the translation instantly.

There is a risk (for shorter strings) that the same string exists in two places but with a different meaning. Should strings to be translated have a minimum length?
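
The hash lookup and the minimum-length guard could be sketched as follows. This is only an illustration: SQLite stands in for the real database, `key` is assumed to hold the source string, the hash is assumed to cover the target language plus the text, and the threshold of 20 characters is a placeholder since no value has been agreed. `translate_fn` is a hypothetical callable wrapping whichever translation service is used.

```python
import hashlib
import sqlite3

MIN_LENGTH = 20  # assumed threshold; the actual value is still an open question


def make_store(path=":memory:"):
    """Create the translations table (hash, key, translation, language)."""
    conn = sqlite3.connect(path)
    conn.execute(
        'CREATE TABLE IF NOT EXISTS translations ('
        ' hash TEXT PRIMARY KEY, "key" TEXT, translation TEXT, language TEXT)'
    )
    return conn


def text_hash(text, target_lang):
    # Including the target language keeps entries for different languages apart.
    return hashlib.sha256(f"{target_lang}\n{text}".encode("utf-8")).hexdigest()


def translate_cached(conn, text, target_lang, translate_fn, min_length=MIN_LENGTH):
    """Return a cached translation if the hash is known, else translate and store."""
    if len(text.strip()) < min_length:
        return None  # too short to translate reliably without context
    h = text_hash(text, target_lang)
    row = conn.execute(
        "SELECT translation FROM translations WHERE hash = ?", (h,)
    ).fetchone()
    if row is not None:
        return row[0]  # cache hit: no call to the external service
    translation = translate_fn(text, target_lang)
    conn.execute(
        'INSERT INTO translations (hash, "key", translation, language)'
        " VALUES (?, ?, ?, ?)",
        (h, text, translation, target_lang),
    )
    conn.commit()
    return translation
```

A minimum length only reduces the ambiguity risk; it does not remove it, since two longer identical strings would still share one cached translation.
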

@robknapen

These are two Python packages that it can be based on. They already cover a lot.

https://deep-translator.readthedocs.io/en/latest/?badge=latest
https://pypi.org/project/translators/

We would need to add an API, a database, and external API key(s) (when using a paid service in the background).
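
As a rough sketch of what a thin layer on top of such a package might look like, the wrapper below keeps the backend pluggable so a paid service could be swapped in later. `TranslationService`, `register` and the backend signature are hypothetical names; the only library call used is `GoogleTranslator(source=..., target=...).translate(text)` from deep-translator, imported lazily so the module loads without it installed.

```python
from typing import Callable, Dict

# (text, source_lang, target_lang) -> translated text
Backend = Callable[[str, str, str], str]


def google_backend(text: str, source_lang: str, target_lang: str) -> str:
    # Lazy import: only needed when this backend is actually called.
    from deep_translator import GoogleTranslator

    return GoogleTranslator(source=source_lang, target=target_lang).translate(text)


class TranslationService:
    """Dispatches translation requests to a named, registered backend."""

    def __init__(self) -> None:
        self._backends: Dict[str, Backend] = {}

    def register(self, name: str, backend: Backend) -> None:
        self._backends[name] = backend

    def translate(self, text: str, target_lang: str = "en",
                  source_lang: str = "auto", backend: str = "google") -> str:
        if backend not in self._backends:
            raise KeyError(f"unknown translation backend: {backend}")
        return self._backends[backend](text, source_lang, target_lang)
```

Usage would be along the lines of `service.register("google", google_backend)` followed by `service.translate("Bodemkaart")`; backends that need an API key could read it from configuration inside their own function.
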

@robknapen

@pvgenuchten
Contributor Author

An initial prototype of a service wrapper on top of the EU service has been developed and deployed at https://api.soilwise-he.containers.wur.nl/translate/docs

@roblokers

What still remains is to operationalize the translation, particularly how to set up the workflow and get the results to the UI.

@roblokers

DoD added to issue description

@robknapen

I think setting up workflows and operationalising them is a bigger issue, and a discussion that needs to happen first. E.g. do we base the whole system on an event bus, or service discovery, or batch processing and polling, and so on?

Definitely not something that can just be assigned to me :-)

(Additionally: I am used to a Definition of Done being more about quality aspects that indicate when an increment is complete, not functional or component requirements. Those I consider to be a different thing.)


3 participants