Parse and back-map values from XML document #58

Closed
etj opened this issue Mar 2, 2021 · 2 comments

etj commented Mar 2, 2021

Investigate which elements are read from an uploaded XML metadata document and set on the GeoNode model.
Check whether more fields can be imported.
Check for possible improvements within GeoNode, and whether parsing can be refined by other Django apps.

etj self-assigned this Mar 2, 2021

etj commented Mar 2, 2021

Current parsing is performed here:
https://github.com/GeoNode/geonode/blob/3.1/geonode/layers/metadata.py#L72-L117

Parsed fields:

  • metadata language (not data language)
  • hierarchy
  • metadata datestamp
  • title
  • abstract
  • purpose
  • supplemental information
  • temporal extent
  • topic category
  • keywords (thesauri are not parsed)
  • other constraints (very naive parsing)
  • lineage

etj commented Apr 13, 2021

  • Remove existing keyword mapping and create a new keyword mapping by parsing keywords again.

    • Thesauri
      The default GeoNode parser compares Thesaurus.title against //gmd:thesaurusName/gmd:CI_Citation/gmd:title/gco:CharacterString/text().
      RNDT metadata (which applies INSPIRE TG2) may express the thesaurus title as //gmd:thesaurusName/gmd:CI_Citation/gmd:title/gmx:Anchor, in which case the title XPath does not match.
      Since RNDT assumes that matching is done on the thesaurus href rather than on the title, we'll check:
      • if the thesaurus title is a CharacterString, use that title
      • if the thesaurus title is an Anchor, search for a thesaurus such that Thesaurus.about == gmx:Anchor/@xlink:href
        • if such a thesaurus exists, create a thesaurus dict whose title is the title of the thesaurus in the DB, so that the thesaurus is matched when the dataset's keywords are saved
        • else, use gmx:Anchor/text() as the title
    • Keywords
      The same considerations apply to keywords as to thesauri.
      The default GeoNode parser compares ThesaurusKeyword.alt_label or ThesaurusKeywordLabel.label against //gmd:MD_Keywords/gmd:keyword/gco:CharacterString/text().
      RNDT metadata may express the keyword as //gmd:MD_Keywords/gmd:keyword/gmx:Anchor, in which case the keyword text XPath does not match.
      Since RNDT assumes that matching is done on the keyword href rather than on the text():
      • if the keyword is a CharacterString, use its text()
      • if the keyword is an Anchor, search the declared thesaurus for a keyword such that ThesaurusKeyword.about == gmx:Anchor/@xlink:href
        • if such a keyword exists, add an entry to the given thesaurus dict, set to the alt_label of the keyword found in the DB
        • else, discard the keyword and log an error (or should it be saved as a free keyword?)
  • Add parsing for access constraints --> custom val
    We have to check for a block like this:

    <gmd:resourceConstraints>
     <gmd:MD_LegalConstraints>
        <gmd:accessConstraints>
           <gmd:MD_RestrictionCode codeList="http://standards.iso.org/iso/19139/resources/gmxCodelists.xml#MD_RestrictionCode"
                                   codeListValue="otherRestrictions"/>
        </gmd:accessConstraints>
        <gmd:otherConstraints>
           <gmx:Anchor xlink:href="http://inspire.ec.europa.eu/metadata-codelist/LimitationsOnPublicAccess/noLimitations">No limitations to public access</gmx:Anchor>
        </gmd:otherConstraints>
     </gmd:MD_LegalConstraints>
    </gmd:resourceConstraints>

    Take the element at this XPath:

    gmd:resourceConstraints/gmd:MD_LegalConstraints[gmd:accessConstraints/gmd:MD_RestrictionCode/@codeListValue="otherRestrictions"]/gmd:otherConstraints
    
    • If that element contains a gmx:Anchor, the format is RNDT compliant:
      • take gmx:Anchor/@xlink:href
      • if the URI exists in the related Thesaurus LimitationsOnPublicAccess, store its value in val['constraints_other'];
      • else store the value http://inspire.ec.europa.eu/metadata-codelist/LimitationsOnPublicAccess/noLimitations
    • else a gco:CharacterString should exist
      • store its value in a local variable; it is reused when parsing the use constraints.
  • Add parsing for use constraints --> custom rndt
    Check for a block like this:

     <gmd:resourceConstraints>
        <gmd:MD_LegalConstraints>
           ...
           <gmd:useConstraints>
              <gmd:MD_RestrictionCode 
                 codeList="http://standards.iso.org/iso/19139/resources/gmxCodelists.xml#MD_RestrictionCode"
                 codeListValue="otherRestrictions">altri vincoli</gmd:MD_RestrictionCode>
           </gmd:useConstraints>
           <gmd:otherConstraints>
              <gmx:Anchor
                 xlink:href="http://inspire.ec.europa.eu/metadata-codelist/ConditionsApplyingToAccessAndUse/noConditionsApply">Nessuna condizione applicabile</gmx:Anchor>
           </gmd:otherConstraints>
        </gmd:MD_LegalConstraints>
     </gmd:resourceConstraints>

    Take the element at this XPath:

    gmd:resourceConstraints/gmd:MD_LegalConstraints[gmd:useConstraints/gmd:MD_RestrictionCode/@codeListValue="otherRestrictions"]/gmd:otherConstraints
    
    • If that element contains a gmx:Anchor
      • take gmx:Anchor/@xlink:href
      • if the URI exists in the related Thesaurus ConditionsApplyingToAccessAndUse, store its value in custom {"rndt": {'constraints_other': value}}
      • else store its text(), plus the text value extracted under "access constraints" if any, in custom {"rndt": {'constraints_other': value}}
    • else a gco:CharacterString should exist
      • store its text(), plus the text value extracted under "access constraints" if any, in custom {"rndt": {'constraints_other': value}}
  • add parsing for resolutions --> custom rndt

  • add parsing for accuracy --> custom rndt
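
The thesaurus and keyword rules above can be sketched with the standard library's ElementTree. This is only an illustration, not the parser that was committed: `resolve_keywords`, `text_or_href` and the two lookup dicts are hypothetical names, and the dicts stand in for the `Thesaurus.about` and `ThesaurusKeyword.about` queries against the DB.

```python
import logging
import xml.etree.ElementTree as ET

NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
    "gmx": "http://www.isotc211.org/2005/gmx",
    "xlink": "http://www.w3.org/1999/xlink",
}
XLINK_HREF = "{%s}href" % NS["xlink"]
logger = logging.getLogger(__name__)


def text_or_href(parent):
    """Return (text, href) for the CharacterString or Anchor child of parent.

    href is None for a gco:CharacterString; both are None if neither exists.
    """
    if parent is None:
        return None, None
    cs = parent.find("gco:CharacterString", NS)
    if cs is not None:
        return (cs.text or "").strip(), None
    anchor = parent.find("gmx:Anchor", NS)
    if anchor is not None:
        return (anchor.text or "").strip(), anchor.get(XLINK_HREF)
    return None, None


def resolve_keywords(md_keywords, thesauri_by_about, keywords_by_thesaurus):
    """Resolve one gmd:MD_Keywords element into a thesaurus dict.

    thesauri_by_about:     {thesaurus about URI: thesaurus title}  (assumed DB stand-in)
    keywords_by_thesaurus: {thesaurus title: {keyword about URI: alt_label}}  (assumed)
    """
    title_el = md_keywords.find("gmd:thesaurusName/gmd:CI_Citation/gmd:title", NS)
    t_text, t_href = text_or_href(title_el)
    # Anchor: prefer the DB title matched by href, else fall back to the Anchor text.
    title = thesauri_by_about.get(t_href, t_text) if t_href else t_text

    known = keywords_by_thesaurus.get(title, {})
    keywords = []
    for kw_el in md_keywords.findall("gmd:keyword", NS):
        k_text, k_href = text_or_href(kw_el)
        if k_href is None:
            # CharacterString keyword: keep its text as-is.
            if k_text:
                keywords.append(k_text)
        elif k_href in known:
            # Anchor keyword found in the declared thesaurus: use its alt_label.
            keywords.append(known[k_href])
        else:
            # Anchor keyword not found: discard and log, as proposed above.
            logger.error("Keyword %s not found in thesaurus %r", k_href, title)
    return {"title": title, "keywords": keywords}
```

Whether an unmatched Anchor keyword should be dropped or kept as a free keyword is still open above; the sketch takes the "discard and log" branch.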

For other info about constraints also see #59.
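
A rough sketch of the access and use constraints rules described in this comment, again with ElementTree. ElementTree's XPath subset does not support a predicate such as `[gmd:accessConstraints/gmd:MD_RestrictionCode/@codeListValue="otherRestrictions"]`, so the filter is applied in Python. The function names, the `access_uris` / `use_uris` sets (standing in for the LimitationsOnPublicAccess and ConditionsApplyingToAccessAndUse thesauri) and the space used to join the two texts are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
    "gmx": "http://www.isotc211.org/2005/gmx",
    "xlink": "http://www.w3.org/1999/xlink",
}
XLINK_HREF = "{%s}href" % NS["xlink"]
NO_LIMITATIONS = (
    "http://inspire.ec.europa.eu/metadata-codelist/"
    "LimitationsOnPublicAccess/noLimitations"
)


def first_other_constraints(root, kind):
    """Return the first gmd:otherConstraints whose sibling gmd:<kind> holds an
    MD_RestrictionCode with codeListValue="otherRestrictions", or None."""
    path = ".//gmd:resourceConstraints/gmd:MD_LegalConstraints"
    for legal in root.findall(path, NS):
        codes = legal.findall("gmd:%s/gmd:MD_RestrictionCode" % kind, NS)
        if any(c.get("codeListValue") == "otherRestrictions" for c in codes):
            return legal.find("gmd:otherConstraints", NS)
    return None


def child_text(element, tag):
    """Stripped text of the named child, or '' when it is missing or empty."""
    return (element.findtext(tag, default="", namespaces=NS) or "").strip()


def parse_constraints(root, access_uris, use_uris):
    """Return (vals, custom) following the access/use constraints rules."""
    vals, custom = {}, {}
    access_text = ""  # free text kept aside for the use constraints step

    oc = first_other_constraints(root, "accessConstraints")
    if oc is not None:
        anchor = oc.find("gmx:Anchor", NS)
        if anchor is not None:
            # RNDT-compliant: known URI, else fall back to noLimitations.
            href = anchor.get(XLINK_HREF)
            vals["constraints_other"] = href if href in access_uris else NO_LIMITATIONS
        else:
            access_text = child_text(oc, "gco:CharacterString")

    oc = first_other_constraints(root, "useConstraints")
    if oc is not None:
        anchor = oc.find("gmx:Anchor", NS)
        href = anchor.get(XLINK_HREF) if anchor is not None else None
        if href is not None and href in use_uris:
            value = href
        else:
            # Unknown Anchor or CharacterString: text plus any access text.
            tag = "gmx:Anchor" if anchor is not None else "gco:CharacterString"
            value = " ".join(p for p in (child_text(oc, tag), access_text) if p)
        custom["rndt"] = {"constraints_other": value}
    return vals, custom
```

The sketch looks only at the first matching `gmd:MD_LegalConstraints` block for each kind; a document that carries several such blocks would need a policy that this comment does not specify.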

mattiagiupponi added a commit that referenced this issue Apr 14, 2021
mattiagiupponi added a commit that referenced this issue Apr 14, 2021
@etj etj closed this as completed in 91a5ca6 Apr 26, 2021
etj added a commit that referenced this issue Apr 26, 2021
* [Fixes #58] First skeleton of RNDTParser

* [Fixes #58] Improve keyword and thesaurus parsing for RNDTMetadataParser

* [Fixes #58] Improve keyword and thesaurus parsing for RNDTMetadataParser

* [Fixes #58] Improve keyword and thesaurus parsing for RNDTMetadataParser

* [Fixes #58] Add comments and change gathering of thesauri titles

* [Fixes #58] Improements of gathering thesauri titles

* [Fixes #58] Migration to ElementTree + migration fixes, add parsing for use and access_constrains

* #58 Fix typos

* #58 Add resolutions and accuracy handling

* Task #81: add storer skeleton

* Task #58: reversing custom and vals for access and use costraints

* Task #81 rndt storer

* Task #81 Check fi the dict is not None

Co-authored-by: Emanuele Tajariol <[email protected]>
etj added a commit that referenced this issue Apr 28, 2021