schemaSpec: affiliation has to affiliate person with organization #177

Closed
matyaskopp opened this issue Mar 3, 2022 · 3 comments
Labels: 🕮 Documentation (Improvements or additions to documentation) · help wanted (Extra attention is needed)

Comments

@matyaskopp
Collaborator

matyaskopp commented Mar 3, 2022

The <affiliation> element should affiliate a person with an organization in some role. I don't want to discuss @role here; that should be done in a separate issue. I want to discuss the relationship between the person and the organization. I believe there are three ways it can be implemented:

  1. The best solution is to use an affiliation/@ref attribute that refers to org/@xml:id. It can be supplemented by affiliation/@ana pointing to the event related to that org (@corresp would be better, but that is for another discussion: improper usage of ana attibute #80).
  2. Another solution can use affiliation/@key, where the value should be some well-known name of an organization, such as "OSN". This can be used when we don't want to introduce the organization in listOrg, but the name is sufficiently unambiguous.
  3. The organization is stored in the text value of the affiliation element. This should be allowed only when the other options fail.

With respect to the above observations, I have discovered the following bugs (the list is definitely not complete):

```xml
<affiliation ref="#parliament.PSP9" role="MP" from="2021-10-09T14:00:00"/>
```

where the @ref target is an event:

```xml
<event xml:id="parliament.PSP9" from="2021-10-09">
  <label xml:lang="cs">9. volební období</label>
  <label xml:lang="en">term 9</label>
</event>
```

but it should be an organization. The correct solution is:

```xml
<affiliation ref="#parliament" ana="#parliament.PSP9" role="MP" from="2021-10-09T14:00:00"/> <!-- ana or corresp -->
<affiliation from="2018-04-10" ref="#parla.lower" role="MP"/>
<affiliation ana="#S.8" from="2016-12-13" role="chairperson" to="2017-03-10"/>
```

@TomazErjavec, do we want to fix these issues? I wanted to make @ref obligatory, but I am not sure about that, as it would break most of the corpora.
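To make this kind of bug hunting repeatable, a small checker could flag every affiliation whose @ref is missing or resolves to something other than an <org>. The following is a minimal sketch using only Python's standard xml.etree.ElementTree; it is not part of the ParlaMint tooling. It assumes that persons and organizations live in one document and are referenced by local "#id" pointers (real corpora may split listOrg and listPerson into separate files), and the sample is a simplified, hypothetical fragment built from the examples above.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Simplified, hypothetical fragment: one org with one term (event) and one person.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <listOrg>
    <org xml:id="parliament">
      <listEvent>
        <event xml:id="parliament.PSP9" from="2021-10-09"/>
      </listEvent>
    </org>
  </listOrg>
  <listPerson>
    <person xml:id="P1">
      <affiliation ref="#parliament.PSP9" role="MP" from="2021-10-09T14:00:00"/>
      <affiliation ref="#parliament" ana="#parliament.PSP9" role="MP" from="2021-10-09T14:00:00"/>
      <affiliation ana="#S.8" from="2016-12-13" role="chairperson" to="2017-03-10"/>
    </person>
  </listPerson>
</TEI>"""


def check_affiliations(root):
    """Return (problem, attributes) pairs for affiliations that do not point to an <org>."""
    # Map every xml:id to the local name of the element that carries it.
    id_to_tag = {
        el.get(XML_ID): el.tag.replace(TEI, "")
        for el in root.iter()
        if el.get(XML_ID) is not None
    }
    problems = []
    for aff in root.iter(TEI + "affiliation"):
        ref = aff.get("ref")
        if ref is None:
            problems.append(("missing @ref", dict(aff.attrib)))
        elif not ref.startswith("#"):
            # Only same-document pointers are handled in this sketch.
            problems.append(("@ref is not a local #pointer", dict(aff.attrib)))
        else:
            target = id_to_tag.get(ref[1:])
            if target is None:
                problems.append(("dangling @ref", dict(aff.attrib)))
            elif target != "org":
                problems.append((f"@ref points to <{target}>", dict(aff.attrib)))
    return problems


if __name__ == "__main__":
    for problem, attrs in check_affiliations(ET.fromstring(SAMPLE)):
        print(problem, attrs)
```

On the sample, the first affiliation is reported because its @ref resolves to an <event>, and the chairperson affiliation is reported because it has no @ref at all; the corrected MP affiliation passes.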

matyaskopp added the help wanted and 🕮 Documentation labels on Mar 3, 2022
@TomazErjavec
Collaborator

Quite a mess, thanks for spotting it. I would say:

  • I would definitely not use option 2, i.e. affiliation/@key, for many reasons: we would be introducing a new mechanism, since it is still a kind of pointing, but not in any specified way, and you can't even specify the language of the value (as it is an attribute). I am less sure whether to keep both options 1 and 3, i.e. affiliation/@ref or, for cases where we can't be bothered to make an org, affiliation/text(). But I am leaning towards having just option 1, because the whole point of the ParlaMint encoding is to have one way of encoding a given piece of information, and because it is easy to make a script that changes 3 to 1.

  • For the affiliations currently without @ref, there is no simple answer. <affiliation role="member"/> is completely useless and should be corrected, or the whole affiliation removed. Others, like 'minister' or 'chairperson', are semi-useful: you know what the person was, which can be useful, but not of what exactly. And for 'MP' the @ref is more or less redundant, since it is obvious what they are an MP of, and it could even be inserted automatically.
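The "3 to 1" conversion mentioned above could look roughly like the following. This is a hypothetical Python sketch, not an existing ParlaMint script: the function name text_to_ref, the matching rule (exact match against org/orgName, ignoring case and extra whitespace) and the sample organization are all assumptions. Texts that cannot be matched are returned for manual review rather than guessed.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical sample: one known org and two option-3 affiliations.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <listOrg>
    <org xml:id="party.ABC"><orgName>Example Party</orgName></org>
  </listOrg>
  <listPerson>
    <person xml:id="P1">
      <affiliation role="member"> example  party </affiliation>
      <affiliation role="member">Unknown Club</affiliation>
    </person>
  </listPerson>
</TEI>"""


def normalise(name):
    """Comparison key: collapse whitespace and ignore case."""
    return " ".join(name.split()).casefold()


def text_to_ref(root):
    """Turn option-3 affiliations (org name as text) into option 1 (@ref).

    Returns the texts that could not be matched to any org/orgName.
    """
    names = {}
    for org in root.iter(TEI + "org"):
        org_id = org.get(XML_ID)
        # Only direct orgName children, so nested orgs keep their own names.
        for org_name in org.findall(TEI + "orgName"):
            if org_id and org_name.text:
                names[normalise(org_name.text)] = org_id
    unresolved = []
    for aff in root.iter(TEI + "affiliation"):
        text = (aff.text or "").strip()
        if aff.get("ref") is not None or not text:
            continue  # already option 1, or nothing to convert
        org_id = names.get(normalise(text))
        if org_id is None:
            unresolved.append(text)
        else:
            aff.set("ref", "#" + org_id)
            aff.text = None  # the name now lives in the referenced org
    return unresolved


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    print("unresolved:", text_to_ref(root))
    print(ET.tostring(root, encoding="unicode"))
```

Matching only on exact, normalized names is deliberately conservative; fuzzier matching would convert more affiliations but could silently merge distinct organizations.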

In general, I would propose (and this is most likely another issue) to have per-corpus "fix" scripts, so we can change encodings for the better without breaking validity. Then, when the ParlaMint I partners start making their new corpora, we can tell them either to fix their scripts or simply to use the fix-XX.xsl script for V3.
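The real per-corpus fixes would be XSLT (fix-XX.xsl). Purely to illustrate the two rules from the bullet points above, here is a hypothetical Python version. The function name fix_affiliations, the parliament_id parameter and the exact conditions are assumptions: only role="MP" gets an automatic @ref, a role="member" affiliation with no @ref, @ana or text is dropped, and other roles such as 'chairperson' are left untouched for manual correction.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"

# Hypothetical sample built from the affiliations quoted earlier in the issue.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <listPerson>
    <person xml:id="P1">
      <affiliation role="MP" from="2018-04-10"/>
      <affiliation role="member"/>
      <affiliation ana="#S.8" from="2016-12-13" role="chairperson" to="2017-03-10"/>
    </person>
  </listPerson>
</TEI>"""


def fix_affiliations(root, parliament_id):
    """Hypothetical per-corpus fix; returns (refs_added, affiliations_removed)."""
    added = 0
    to_remove = []
    for parent in root.iter():
        for aff in parent.findall(TEI + "affiliation"):
            if aff.get("ref") is not None:
                continue  # already points somewhere
            role = aff.get("role")
            if role == "MP":
                # For MPs the organization is obvious, so @ref can be inserted automatically.
                aff.set("ref", "#" + parliament_id)
                added += 1
            elif role == "member" and aff.get("ana") is None and not (aff.text or "").strip():
                # A bare member affiliation says nothing about the organization.
                to_remove.append((parent, aff))
    # Remove only after the traversal, so the tree is not changed while iterating.
    for parent, aff in to_remove:
        parent.remove(aff)
    return added, len(to_remove)


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    print(fix_affiliations(root, "parla.lower"))  # (1, 1) for this sample
```

Keeping the rules per corpus (here, just the parliament_id argument) means each corpus can opt into the fixes without changing the shared schema.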

@matyaskopp
Collaborator Author

> I would definitely not use option 2, i.e. affiliation/@key, for many reasons: we would be introducing a new mechanism, since it is still a kind of pointing, but not in any specified way, and you can't even specify the language of the value (as it is an attribute).

OK, I got carried away by my future plans for the @key attribute. In the future (I am not sure I will be able to deliver it in the ParlaMint 3 data), I will use the @key attribute for named entity normalization in NER, which is a small step towards named entity linking. It can resolve namesakes, e.g. Václav Klaus in the CZ data: without a deeper analysis of the context, we can't be sure whether the name refers to the father or the son.

As for organizations in the teiHeader: if we assume that organizations with similar names are identical, then we can convert all of them to solution 1.
Great observation about the language; it makes @key strange in the affiliation context.

> In general, I would propose (and this is most likely another issue) to have per-corpus "fix" scripts, so we can change encodings for the better without breaking validity. Then, when the ParlaMint I partners start making their new corpora, we can tell them either to fix their scripts or simply to use the fix-XX.xsl script for V3.

OK, I have created an issue on the v2tov3 fixes: #183

@TomazErjavec
Collaborator

The way affiliations refer to organisations is now fixed, so there is nothing more to do here. Closing.
