-
Notifications
You must be signed in to change notification settings - Fork 52
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactorise parlamint-add-common-content and parlamint2final #675
Comments
My suggestions:
|
Nice ones; I'd call it parlamint4v3 :) |
Here is my proposal for what parlamint-add-common-content should set or correct (both in root and component files, and taking into account number and date formatting, where applicable):
Anything else @matyaskopp? |
As next week is pretty busy, and there is time pressure to finalise the distro scripts (it will take 3 weeks to reprocess all the corpora), I did this myself, with the final dev commit being cc7adde. I think add-common-content works fine (except for some doubts abut BE commission sessions), cf. ParlaMint/Scripts/parlamint-add-common-content.xsl Lines 687 to 691 in cc7adde
Also, parlamint2release might be ok too. The idea was also to leave the "fixing" of metadata (like removal GB "special" speakers) in add-common-content (even though it isn't really), but to move everything to do with annotated words into parlamint2release.xsl. What doesn't work is factorisation, this might be my fault (and, to an extent probably is), but right now the parlamint2distro script has factorisation commented out: ParlaMint/Scripts/parlamint2distro.pl Lines 335 to 337 in cc7adde
The reason is, it produces a mess. Options:
Thoughts? |
It never worked in a partially factorized corpus, because it checks whether any of the types listOrg, listPerson, taxonomy is factorized: ParlaMint/Scripts/parlamint2distro.pl Lines 324 to 329 in cc7adde
If any of these types is factorized, then factorization is skipped: ParlaMint/Scripts/parlamint2distro.pl Lines 331 to 338 in cc7adde
I believe the factorization script is working on partially factorized files, but it does not copy files in |
So maybe the fix here would be to check if all the above are factorised?
Yes please! |
I changed the distro script to factorize every time - even if it is already factorized. Not tested !!! ParlaMint/Scripts/parlamint2distro.pl Lines 324 to 342 in feba24b
Done and tested (copy firstly parse xml - so indentation can change in the file) |
Alas, after another 3 hours trying parlamint2distro.pl still doesn't work. I don't really understand the factorisation stuff it seems, and the whole distro with add-common, parlamint2release and factorisation has gotten so complicated that my head hurts. The distro script now dies with
I would be of coruse grateful for any help, can be directly on tantra of course. |
This has all been now fixed, closing. |
We now have to somewhat similar scripts parlamint-add-common-content and parlamint2final, i.e. they both:
It is not good practice anyway to have simiar code in two places, but parlamint2final also has the bug (cf. #662 (comment))) that it calculates tagUsages on the original data, but then changes various tags in the bodies of components, leading to incorrect final tagUsages.
I propose that we change this so that:
@matyaskopp, do you agree with this change? I can start implementing it, if so.
The text was updated successfully, but these errors were encountered: