Localization - vertical files #794

Open
matyaskopp opened this issue Sep 25, 2023 · 1 comment
@matyaskopp
Collaborator

I still need help reaching agreement on the localization of vertical files. Here is a sample from ParlaMint-BE:

<speech id="ParlaMint-BE_2022-06-09-voorlopig-55-plenair-ip185x.u1" 
        text_id="ParlaMint-BE_2022-06-09-voorlopig-55-plenair-ip185x" 
        subcorpus="War" 
        lang="Multilingual" 
        body="Eerste Kamer" 
        term="55" 
        session="-" 
        meeting="ip185" 
        sitting="-" 
        agenda="-" 
        date="2022-06-09" 
        title="Belgisch parlementair corpus ParlaMint-BE, plenaire zitting van 09-06-2022" 
        speaker_role="Voorzitter" 
        speaker_id="TillieuxEliane" 
        speaker_name="Tillieux, Eliane" 
        speaker_mp="MP" 
        speaker_minister="notMinister" 
        speaker_party="PS" 
        speaker_party_name="Parti Socialiste" 
        party_status="Coalition" 
        party_orientation="Centre-left to left" 
        speaker_gender="F" 
        speaker_birth="1966">
...

I can see multiple problems:

  1. The corpus is only partially translated, so a query will contain mixed en/nl values.
  2. The corpus is multilingual (fr/nl), so users may expect French values as well.
  3. If someone improves the translations (uses a different term or adds a missing translation) in a future release (ParlaMint 4/5?), old queries will stop working.
  4. What is the plan for the all-in-one corpus (ParlaMint-XX) in NoSketch Engine? Will we use the English values?
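
To make problems 1 and 3 concrete, here is a minimal Python sketch. It assumes that a query restricts a structure attribute by matching its full value exactly; the helper names `parse_attrs` and `matches` are hypothetical, and the alternative term "Kamervoorzitter" is invented purely for illustration and does not come from ParlaMint.

```python
import re

# A few attributes copied from the ParlaMint-BE sample above.
SPEECH_TAG = (
    '<speech lang="Multilingual" body="Eerste Kamer" '
    'speaker_role="Voorzitter" speaker_mp="MP" '
    'speaker_party_name="Parti Socialiste" party_status="Coalition">'
)


def parse_attrs(tag):
    """Return the attribute/value pairs of one vertical-file structure tag."""
    return dict(re.findall(r'([\w-]+)="([^"]*)"', tag))


def matches(attrs, name, value):
    """Exact-match restriction on one attribute (assumed query semantics)."""
    return attrs.get(name) == value


attrs = parse_attrs(SPEECH_TAG)

# Problem 1: a single query has to mix Dutch and English values.
mixed_query_hits = (
    matches(attrs, "speaker_role", "Voorzitter")
    and matches(attrs, "party_status", "Coalition")
)

# Problem 3: if a later release used a different term (hypothetical here),
# a query written against the old value would no longer match.
future_attrs = dict(attrs, speaker_role="Kamervoorzitter")
old_query_still_hits = matches(future_attrs, "speaker_role", "Voorzitter")
```

With these assumptions, the mixed-language query matches the current sample, while the old `speaker_role` value stops matching once the term is changed.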
matyaskopp added the "help wanted" and "question" labels Sep 25, 2023
TomazErjavec added this to the Future milestone Sep 30, 2023
@TomazErjavec
Collaborator

> I still need help reaching agreement on the localization of vertical files.

In short: it isn't perfect, but it is a first step. I think that for ideopolitical reasons, if nothing else, researchers in country XX looking at the parliament of XX deserve to have the metadata in their native language. And given that we have most of the metadata in both en and xx, why not display it in xx?

That said:

> I can see multiple problems:
>
> 1. The corpus is only partially translated, so a query will contain mixed en/nl values.

True, but most of it is translated (or at least should be, depending on the partner); I think everything except for "Multilingual", "MP", "minister" and "F".

> 2. The corpus is multilingual (fr/nl), so users may expect French values as well.

Yes, this is a limitation, I agree. Then again, at least for the concordancers, we probably wouldn't want two corpora for some countries that differ only in the language of the metadata or, maybe even worse, all the metadata available in two languages as separate attributes. That would get messy.

> 3. If someone improves the translations (uses a different term or adds a missing translation) in a future release (ParlaMint 4/5?), old queries will stop working.

If somebody changes the English term, they won't work either. Anyway, the assumption that you can swap in a new version of the corpus and everything else keeps working doesn't hold now either: attributes changed between 2.1 and 3.0, and again between 3.0 and 4.0.

> 4. What is the plan for the all-in-one corpus (ParlaMint-XX) in NoSketch Engine? Will we use the English values?

Yes.
