Add solution to check consistency of translations #72

Open
benoit74 opened this issue Sep 17, 2024 · 22 comments
Labels
enhancement New feature or request good first issue Good for newcomers

Comments

@benoit74
Collaborator

With #69, we have introduced i18n.

However, currently there is no tooling to ensure consistency:

  • identify unused strings (strings present in JSON files but not used - anymore - in JS or Python codebase)
  • identify missing strings (strings present in JS or Python codebase but not present in JSON files - must be present in at least en for proper operation and in qqq for proper TranslateWiki operation)
@benoit74 benoit74 added enhancement New feature or request good first issue Good for newcomers labels Sep 17, 2024
@advait-zx

Flag any keys that are present in the JSON files but not used in the code.
How can I contribute to this issue?

@benoit74
Collaborator Author

I think we first need to check if such a tool does not already exist. Be aware that while all translated strings are stored in JSON files, they are used both in .vue and Jinja2 templates. They could theoretically soon be used in .py files.

If it does not exist, then I would suggest developing a small Python script doing the job. Be aware that we need both directions, i.e. flag any keys present in the JSON but not used in the code AND flag any keys used in the code but not present in at least en and qqq JSON files.

This Python script should exit with a non-zero exit code when problem(s) are detected, and we can add this script to invoke tasks.py and the QA CI.
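The exit-code requirement above could look roughly like the sketch below. The function names (`exit_code`, `main`) and the `UNUSED`/`MISSING` output format are illustrative assumptions, not an existing script in this repository; how the two key sets are collected is left out here.

```python
import sys


def exit_code(json_keys, code_keys):
    """Print problems in both directions and return 0 (clean) or 1 (problems)."""
    # Direction 1: present in the JSON files but never referenced in the code.
    unused = json_keys - code_keys
    # Direction 2: referenced in the code but absent from the JSON files.
    missing = code_keys - json_keys
    for key in sorted(unused):
        print(f"UNUSED  {key}")
    for key in sorted(missing):
        print(f"MISSING {key}")
    return 1 if (unused or missing) else 0


def main(json_keys, code_keys):
    # A non-zero status is what makes a CI job (or an invoke task) fail.
    sys.exit(exit_code(json_keys, code_keys))
```

Returning the status from a plain function and calling `sys.exit` only in `main` keeps the comparison easy to test without terminating the test process.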

I will soon merge #69, so this will be easier to work on.

@benoit74
Collaborator Author

@advait-zx do you confirm you will work on it, shall I assign you the issue?

@advait-zx

yeah but i will need help for this issue

@benoit74
Collaborator Author

Isn't what I've described a sufficient starting point to make progress and, if needed, prepare a PR? If not, unfortunately I can't help more than that.

@Prakhar-Shankar

Hi @benoit74, can I work on this issue?

@benoit74
Collaborator Author

benoit74 commented Jan 8, 2025

Hi @Prakhar-Shankar, sure. Note that at first, no code is needed: you need to search for a tool capable of fulfilling this issue's requirements.

@Prakhar-Shankar

Sure, I will first search for a tool and describe the approach. If no such tool exists, I will try to write a Python script. I will try to provide descriptions before committing anything.

@Prakhar-Shankar

Hi @benoit74, I found a tool called vue-i18n-extract. This tool can flag all the strings that are unused or missing. I ran the tool and here is the output -
Image

Also we can make a JSON file where the outputs can be stored.

@benoit74
Collaborator Author

benoit74 commented Jan 9, 2025

Seems promising!

Is it possible to check all languages at once?

The problem is that:

  • some of these keys are in fact used; I don't get why your tool reports them as unused, see e.g.
    text: this.t('newRequest.fetchingDefinition')
  • some of these keys are used in the Python backend, so we need a tool capable of handling both Vue.JS and Python. Or two tools whose results we merge: it might be better to rely on an existing tool to detect which paths are unused, because this is probably complex, and then merge its results with those of a custom tool of our own
  • we should be able to ignore some key(s): for now, language is not used on purpose
  • it seems to do only the first point of my initial comment; we need to do the other direction as well (check which paths are used in the code but are not present in at least qqq and en)

Note that we need to integrate this check to the CI, so we need to check all languages and we need a clean return code (different than 0 when there is a problem, so that it will fail the CI).
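As a rough illustration of checking all languages at once, the sketch below assumes each locale file has already been reduced to a set of dotted key paths. The function names and the `keys_by_locale` mapping are hypothetical; only the requirement that en and qqq contain every used key comes from this thread.

```python
REQUIRED_LOCALES = ("en", "qqq")  # per this issue: both must hold every used key


def missing_in_required(used_keys, keys_by_locale, required=REQUIRED_LOCALES):
    """Map each required locale to the sorted list of used keys it lacks."""
    problems = {}
    for locale in required:
        present = keys_by_locale.get(locale, set())
        lacking = sorted(used_keys - present)
        if lacking:
            problems[locale] = lacking
    return problems


def unused_anywhere(used_keys, keys_by_locale):
    """Keys defined in at least one locale file but never used in the code."""
    defined = set().union(*keys_by_locale.values())
    return sorted(defined - used_keys)
```

A CI wrapper could then fail whenever either function returns a non-empty result.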

@Prakhar-Shankar

Got it, the issue with this tool is, it can check only .vue files and not the python files.
I am still searching for a tool which can analyze the python files as well.

Therefore, we may have to use two different tools and later combine the result as you said. If nothing works out we may have to write a python script to check the .py files.

Coming to your third point, this tool is capable of finding missing keys as well. I don't know why it didn't work; I will look into it and verify the process. I will also keep the CI part in mind.
Thanks for the instructions, I will try to get the solution soon.

@Prakhar-Shankar

Hi @benoit74, I have been searching extensively for a similar tool that can handle the .py files, but I don't think any such tool is available at the moment.

@benoit74
Collaborator Author

OK, then we will have to build our own logic in Python, probably based on regexp. Luckily in Python we do not have many patterns (we use either i18n.t in Python code or translate in Jinja2 templates - and currently only Jinja2 templates in fact), and supporting only what we use is OK in a first version.
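A minimal regexp sketch for these two patterns follows. The exact call syntax, `i18n.t("some.key", ...)` in Python and `translate('some.key')` in Jinja2, is an assumption about how the codebase invokes them, and the function names are hypothetical. Only string-literal keys are matched; a dynamic key such as `i18n.t(key)` cannot be resolved this way.

```python
import re

# A key is one or more dot-separated words, so a lone "." never matches.
PY_CALL = re.compile(r"""\bi18n\.t\(\s*(['"])(\w+(?:\.\w+)*)\1""")
JINJA_CALL = re.compile(r"""\btranslate\(\s*(['"])(\w+(?:\.\w+)*)\1""")


def keys_in_python(text):
    """Literal keys passed to i18n.t(...) in Python source text."""
    return {m.group(2) for m in PY_CALL.finditer(text)}


def keys_in_template(text):
    """Literal keys passed to translate(...) in Jinja2 template text."""
    return {m.group(2) for m in JINJA_CALL.finditer(text)}
```

Anchoring each pattern on the call name, rather than on any dotted identifier, is what keeps ordinary attribute accesses out of the result.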

Given that we want to have an overview of both Python and Vue.JS usage, and Vue.JS usage is also limited to few patterns, I wonder if it doesn't make more sense to search for labels used only with our custom Python script for both languages, and avoid spending time integrating vue-i18n-extract results with Python results. No strong opinion, on my side, more an open question. WDYT?
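One way a single script could cover both languages is to pick a pattern per file extension while walking the source tree. In the sketch below, the extension list, the skipped directory names, and the Vue-side forms `t('...')` and `$t('...')` are assumptions; only `this.t('...')`, `i18n.t` and `translate` appear in this thread.

```python
import re
from pathlib import Path

KEY = r"""(['"])(\w+(?:\.\w+)*)\1"""
# t(...) or $t(...), not preceded by a word character (so split(...) is skipped).
VUE_CALL = re.compile(r"(?<![\w$])\$?t\(\s*" + KEY)
PATTERNS = {
    ".vue": VUE_CALL,
    ".ts": VUE_CALL,
    ".py": re.compile(r"\bi18n\.t\(\s*" + KEY),
    ".html": re.compile(r"\btranslate\(\s*" + KEY),
}
SKIP_DIRS = {"node_modules", "dist", ".git"}  # hypothetical exclusions


def used_keys(root):
    """Collect literal translation keys from every supported file under root."""
    root = Path(root)
    keys = set()
    for path in root.rglob("*"):
        pattern = PATTERNS.get(path.suffix)
        if pattern is None or not path.is_file():
            continue
        if SKIP_DIRS & set(path.relative_to(root).parts):
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        keys.update(m.group(2) for m in pattern.finditer(text))
    return keys
```

Keeping the patterns in one table makes it cheap to add a new file type or call form later.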

@Prakhar-Shankar

Yes, I think if we are building a Python script, we can use it for both kinds of files (if we face any issue, we already have a tool for .vue which can be used anytime). I will try my best to implement this.
Thanks a lot.

@Prakhar-Shankar

Hey @benoit74, I have written the script. It checks all the JSON files and also checks all the .ts, .vue, .py and .html files.
This is what the output looks like -

Image

Image

It also shows all the missing keys -

Image

There are more results, but I would have to paste many screenshots.

@benoit74
Collaborator Author

Thank you

Some first feedback:

  • please add an option to ignore some known unused keys (language and @metadata so far), ignoring the key itself and all its children
  • faq, unit, ... are used (they have children) so they should not be reported as unused
  • I did not check all missing keys, but I'm pretty sure most of them do exist
  • the missing ., id and key keys look like parsing bugs

Note that you do not need to paste screenshots, you can simply copy-paste text.
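The first two points of this feedback can be sketched as two small filters over sets of dotted key paths. The function names are hypothetical; the ignored roots are the two named above.

```python
IGNORED_ROOTS = ("language", "@metadata")  # known unused on purpose


def is_ignored(key, ignored=IGNORED_ROOTS):
    """True for an ignored root itself or for any of its children."""
    return any(key == root or key.startswith(root + ".") for root in ignored)


def drop_ignored(keys, ignored=IGNORED_ROOTS):
    """Remove ignored roots and everything nested under them."""
    return {k for k in keys if not is_ignored(k, ignored)}


def drop_parents(keys):
    """Keep only leaf paths: a key with children is a container, not a string."""
    return {k for k in keys if not any(o.startswith(k + ".") for o in keys)}
```

Matching on `root + "."` rather than on a bare prefix means a sibling such as `languages.title` is not dropped along with `language`.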

@Prakhar-Shankar

sure thanks a lot for guiding me. I will look into it.

@Prakhar-Shankar

Hi @benoit74, I have been trying to write a script, but I am facing a problem.
When I write a regex pattern, it lists units.timeLimit.plural as a missing key, while

"units": {
    "timeLimit": {
      "singular": "hour",
      "plural": "hours"
    }
}

is already present in the JSON file. Can you suggest a solution for this kind of problem?

@benoit74
Collaborator Author

benoit74 commented Feb 2, 2025

The solution is probably to flatten the JSON file, since your regex pattern will build a list. You will hence have two lists: the list of keys found with regex, and the list of keys found in the JSON(s). Or even better, two sets. Then it is just a matter of subtracting the two sets in both directions to find keys which are missing in the translations and keys which are unused in the code.
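A minimal sketch of the flatten-then-subtract idea, using the units example from the previous comment. The `flatten` helper and the `home.title` key are made up for illustration.

```python
def flatten(tree, prefix=""):
    """Return the set of dotted paths to every leaf string in a nested dict."""
    keys = set()
    for name, value in tree.items():
        path = f"{prefix}.{name}" if prefix else name
        if isinstance(value, dict):
            keys |= flatten(value, path)  # recurse into nested objects
        else:
            keys.add(path)
    return keys


translations = {"units": {"timeLimit": {"singular": "hour", "plural": "hours"}}}
json_keys = flatten(translations)
code_keys = {"units.timeLimit.plural", "home.title"}  # as found by the regex

missing = code_keys - json_keys  # used in the code, absent from the JSON
unused = json_keys - code_keys   # present in the JSON, never used
```

Because only leaves are emitted, container keys such as units or units.timeLimit never show up as unused on their own.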

@Prakhar-Shankar

@benoit74, I have written the script and this is the result I am getting. Please take a look.

Unused Keys:

Missing Keys:

  • mainStore.loading
  • mainStore.offlinerNotFound
  • mainStore.snackbarDisplayed
  • mainStore.taskData
  • mainStore.taskData.downloadLink
  • mainStore.taskData.hasEmail
  • mainStore.taskData.limitHit
  • mainStore.taskFailed
  • mainStore.taskNotFound
  • mainStore.taskProgression
  • mainStore.taskRequested
  • mainStore.taskSucceeded
  • mainStore.taskUrl
  • selectedLanguage.rtl

@benoit74
Collaborator Author

benoit74 commented Feb 6, 2025

All the missing keys which are reported are wrong: these "values" are not used as translation keys in the codebase at all. Your regex is probably catching too much.

You should also intentionally add an unused key in a JSON file (both a top-level one and a nested one) and check that it pops up in your tool. And intentionally add a missing key both in Python codebase and in TS codebase and check that it pops up in your tool.
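This planted-key check could be automated along the lines below. Everything in the sketch is hypothetical: the `checker(translations, sources)` interface returning `(unused, missing)`, the fixture contents, and the call syntax in the fake sources. It reports both planted keys that were not detected and false positives like the ones listed earlier.

```python
# Fixture: two planted unused keys (top-level and nested) and two planted
# missing keys (one in Python code, one in TS/Vue code).
TRANSLATIONS = {
    "plantedTop": "x",
    "nested": {"plantedChild": "y"},
    "known": {"key": "z"},
}
SOURCES = {
    "app.py": 'i18n.t("known.key"); i18n.t("planted.missingPy")',
    "App.vue": "this.t('planted.missingTs'); mainStore.loading = true",
}
EXPECTED_UNUSED = {"plantedTop", "nested.plantedChild"}
EXPECTED_MISSING = {"planted.missingPy", "planted.missingTs"}


def planted_key_check(checker):
    """Run checker on the fixture and list every disagreement (empty = pass)."""
    unused, missing = checker(TRANSLATIONS, SOURCES)
    problems = []
    for label, got, want in (
        ("unused", set(unused), EXPECTED_UNUSED),
        ("missing", set(missing), EXPECTED_MISSING),
    ):
        problems += [f"{label}: not reported: {k}" for k in sorted(want - got)]
        problems += [f"{label}: false positive: {k}" for k in sorted(got - want)]
    return problems
```

A checker that reported mainStore.loading as missing would fail here with a false-positive entry, which is the over-matching case described above.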

@Prakhar-Shankar

Sure, I got it. I will try intentionally adding both types of keys.
Thanks

Development

No branches or pull requests

3 participants