Finalize ParlaMint script #157
Comments
The finalization script can be split into two tasks:
Do not use |
The script currently contains country-specific (ParlaMint I corpus) modifications
I think that all these changes can be moved to fixings/v2tov3 and validate-parlamint.xsl validation should be extended to cover these known issues. The new partners should add this content themselves. |
…or both root and component files (#157)
I have now made a new script, parlamint2release, in 084d3ec: ParlaMint/Scripts/parlamint2release.xsl, lines 10 to 29 in 084d3ec
|
@TomazErjavec I am now checking old issues, and I discovered a suggestion about deleting and reintroducing brackets in notes: #195
Would you like me to do it? |
We have to be a bit careful now, as the notes have already been translated to English, and I match them to the originals based on the form of the original note. But if the transformation is deterministic and commented, I guess I can apply the same transformation in the matching process, so, yes, pls. do it. I guess parlamint2release is the right script for this. |
Understood. I will implement a function for this.
Giving it a second thought: do we want to normalize all notes by removing and reintroducing parentheses, as suggested here: #195 (comment)? At the time of discussing this, we had no experience with this kind of long sequence of notes: ParlaMint/Data/ParlaMint-SE/ParlaMint-SE_2017-12-12-prot-201718--48.xml, lines 85 to 116 in 53c4c19
I suggest normalizing spaces based on context:
|
Hm, good points. Thinking about this further, maybe we should:
Namely, I am a bit frightened of having complicated, context-dependent rules for the transformation, as we are bound to overlook something in some of the corpora, i.e. make a mess. |
OK, then removing paired boundary brackets is the best way. I will implement that. |
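For illustration only, the "remove paired boundary brackets" rule could be sketched as below. This is a Python sketch, not the XSLT code in parlamint2release; the function names, the set of bracket pairs, and the choice to strip nested wrapping pairs repeatedly are all assumptions. The point is that the rule is deterministic and context-free, so the same function could be reused when matching translated notes to their originals.

```python
# Hypothetical set of bracket pairs; the real script may handle a different set.
PAIRS = {"(": ")", "[": "]", "{": "}"}


def _balanced(inner: str, open_ch: str, close_ch: str) -> bool:
    """True if open_ch/close_ch are properly nested inside `inner`."""
    depth = 0
    for ch in inner:
        if ch == open_ch:
            depth += 1
        elif ch == close_ch:
            depth -= 1
            if depth < 0:  # a closer appears before its opener
                return False
    return depth == 0


def strip_boundary_brackets(note: str) -> str:
    """Remove bracket pairs that wrap the whole note; leave all others alone."""
    text = note.strip()
    while len(text) >= 2 and PAIRS.get(text[0]) == text[-1]:
        inner = text[1:-1]
        # "(a) b (c)" starts and ends with brackets, but they are not a pair.
        if not _balanced(inner, text[0], text[-1]):
            break
        text = inner.strip()
    return text
```

A note such as `(Applause)` would become `Applause`, while `(a) b (c)` stays unchanged because its first and last brackets do not belong to the same pair. Applying the function twice gives the same result as applying it once.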
OK, great, thanks.
Far from it that my suggestions are always right but nice of you to say so :) |
I think we are done here (37d6946) so, closing. |
This issue collects ideas on what the finalization script should (and probably shouldn't) do.
tagUsage numbers
extent/measure. (Numbers of speeches in component files should be provided by partners)
reference/covid
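As a rough illustration of the tagUsage item, the sketch below counts how often each TEI element occurs inside `<text>` of a component file and renders the counts as `tagUsage` elements with the TEI `gi` and `occurs` attributes. It is written in Python rather than XSLT, the function names are hypothetical, and it assumes that "tagUsage numbers" means per-element occurrence counts with the `text` element itself included; the actual finalization script may count differently.

```python
from collections import Counter
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"


def count_tag_usage(xml_text: str) -> Counter:
    """Count TEI elements inside <text> (header elements are ignored)."""
    root = ET.fromstring(xml_text)
    text = root.find(f"{{{TEI_NS}}}text")  # direct child of <TEI>
    counts: Counter = Counter()
    if text is None:  # e.g. a root file without its own <text>
        return counts
    prefix = f"{{{TEI_NS}}}"
    for el in text.iter():  # includes the <text> element itself
        if isinstance(el.tag, str) and el.tag.startswith(prefix):
            counts[el.tag[len(prefix):]] += 1
    return counts


def render_tag_usage(counts: Counter) -> list[str]:
    """Render counts as tagUsage elements, sorted by element name."""
    return [f'<tagUsage gi="{gi}" occurs="{n}"/>' for gi, n in sorted(counts.items())]
```

Counts for a root file could then be obtained by summing the per-component `Counter` objects.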