Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace XSLT with TS #11

Merged
merged 50 commits into from
Feb 29, 2024
Merged

Replace XSLT with TS #11

merged 50 commits into from
Feb 29, 2024

Conversation

nleanba
Copy link
Collaborator

@nleanba nleanba commented Feb 21, 2024

Open todos:

  • figures
  • refactor code to be more sensible (currently, a lot is a 1:1 translation from XSLT and thus very convoluted. If it was immediately obvious to me how to streamline somthing, it was done, but XSLT is hard)
  • integration into pipeline
  • running a test run over all treatments and comparing for (unwanted) changes

it only reports on removed/changed lines (i.e. additions are presumed to be fine)
note that this makes some assumptions as to where xml and rdf are located
in case of defining treatments
neccessairy because some information needs to be collated
also various fixes
@retog
Copy link
Collaborator

retog commented Feb 28, 2024

I didn't think changing it for makePublication would be worth it.

I tend to disagree. It makes the NOTES at the top of the file inaccurate, and even if that is fixed it still increases complexity. I think the additional time needed when making future changes far outweighs the tedious but relatively small amount of work to make it consistent now.

@nleanba
Copy link
Collaborator Author

nleanba commented Feb 29, 2024

I've added fish to the container but I still can't execute the script

root@fbc1a93c34b7:/workspaces/gg2rdf# ./test_noxslt.fish ./ex.xml 
File ./ex.ttl doesn't exist! Aborting
root@fbc1a93c34b7:/workspaces/gg2rdf# ls ex.xml 
ex.xml
root@fbc1a93c34b7:/workspaces/gg2rdf# ./test_noxslt.fish `pwd`/ex.xml 
File /workspaces/gg2rdf/ex.ttl doesn't exist! Aborting

The error is because it checks for {$ttlReferenceDir}/ex.ttl, which probaly does not exist in the docker container.

./test_noxslt.fish was never intended to be run in the container, but is only for manual testing. If you wish to use it in a container, you should already be running an interactive shell in the container, install fish from there.

If you wish to simply check what output gg2rdf.ts produces, just call deno run --allow-read --allow-write ./gg2rdf.ts -i <xml filename> -o <ttl filename>

Dockerfile Outdated Show resolved Hide resolved
Dockerfile Show resolved Hide resolved
@nleanba
Copy link
Collaborator Author

nleanba commented Feb 29, 2024

I didn't think changing it for makePublication would be worth it.

I tend to disagree. It makes the NOTES at the top of the file inaccurate, and even if that is fixed it still increases complexity. I think the additional time needed when making future changes far outweighs the tedious but relatively small amount of work to make it consistent now.

i have changed this now

@retog
Copy link
Collaborator

retog commented Feb 29, 2024 via email

@retog
Copy link
Collaborator

retog commented Feb 29, 2024

Experimenting with 000040332F2853C295734E7BD4190F05. I see the current ttl version has 106 triples,the one generated by the new transformer 118.

These are the additional triples:

 <http://taxon-concept.plazi.org/id/Animalia/Saigona_Matsumura_1910> <http://plazi.org/vocab/treatment#hasTaxonName> <http://taxon-name.plazi.org/id/Animalia/Saigona> .
 <http://taxon-concept.plazi.org/id/Animalia/Saigona_Matsumura_1910> <http://rs.tdwg.org/dwc/terms/class> "Insecta" .
 <http://taxon-concept.plazi.org/id/Animalia/Saigona_Matsumura_1910> <http://rs.tdwg.org/dwc/terms/family> "Dictyopharidae" .
 <http://taxon-concept.plazi.org/id/Animalia/Saigona_Matsumura_1910> <http://rs.tdwg.org/dwc/terms/genus> "Saigona" .
 <http://taxon-concept.plazi.org/id/Animalia/Saigona_Matsumura_1910> <http://rs.tdwg.org/dwc/terms/kingdom> "Animalia" .
 <http://taxon-concept.plazi.org/id/Animalia/Saigona_Matsumura_1910> <http://rs.tdwg.org/dwc/terms/order> "Hemiptera" .
 <http://taxon-concept.plazi.org/id/Animalia/Saigona_Matsumura_1910> <http://rs.tdwg.org/dwc/terms/phylum> "Arthropoda" .
 <http://taxon-concept.plazi.org/id/Animalia/Saigona_Matsumura_1910> <http://rs.tdwg.org/dwc/terms/rank> "genus" .
 <http://taxon-concept.plazi.org/id/Animalia/Saigona_Matsumura_1910> <http://rs.tdwg.org/dwc/terms/scientificNameAuthorship> "Matsumura, 1910" .
 <http://taxon-concept.plazi.org/id/Animalia/Saigona_Matsumura_1910> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://filteredpush.org/ontologies/oa/dwcFP#TaxonConcept> .
43a54
 <http://taxon-concept.plazi.org/id/Animalia/Saigona_baiseensis_Zheng_2021> <http://rs.tdwg.org/dwc/terms/scientificNameAuthorship> "Zheng & Chen, 2021" .
100a112
 <http://treatment.plazi.org/id/000040332F2853C295734E7BD4190F05> <http://purl.org/dc/elements/1.1/title> "Saigona baiseensis Zheng & Chen 2021, sp. nov." .

The additional triples look sound, the previously missing title matches the one on https://treatment.plazi.org/id/000040332F2853C295734E7BD4190F05

I do see two error messages:

Error: Invalid Authority for <http://taxon-concept.plazi.org/id/Animalia/fulgoroidesINVALID>
Error: Invalid Authority for <http://taxon-concept.plazi.org/id/Animalia/fulgoroidesINVALID>

@retog
Copy link
Collaborator

retog commented Feb 29, 2024

Regarding

`output(...)` should not be assumed to run synchronous,
  and all data passed to it should still be valid under reordering of calls.

In turtle the order plays a role with regard to base and prefix, so I think this requirement should be dropped.

@nleanba
Copy link
Collaborator Author

nleanba commented Feb 29, 2024

I do see two error messages:

Error: Invalid Authority for <http://taxon-concept.plazi.org/id/Animalia/fulgoroidesINVALID>
Error: Invalid Authority for <http://taxon-concept.plazi.org/id/Animalia/fulgoroidesINVALID>

Those Errors indicate that it found mentions of these taxa, but it could not figure out their authorities to turn them into citations (augments or deprecates). The rdf output matches the behaviour of the xslt, but with more insight into why it was generated in the way it was.

@retog retog merged commit fc44dea into main Feb 29, 2024
1 check passed
@retog retog deleted the no_xslt branch February 29, 2024 13:58
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really see the point of having this?
I'd rather test as directly on a bunch files from a current clone of the treatment-xml repo, given that no single file can cover all edge cases and the xml may change

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it greatly facilitates debugging

@nleanba
Copy link
Collaborator Author

nleanba commented Feb 29, 2024

@retog did you rebase this into main?

this is rather inelegant, as now all my commits are marked as unverified.

i think a merge commit would have been better

@nleanba nleanba restored the no_xslt branch February 29, 2024 14:04
@retog
Copy link
Collaborator

retog commented Feb 29, 2024

@retog did you rebase this into main?

this is rather inelegant, as now all my commits are marked as unverified.

i think a merge commit would have been better

I did. I didn't know about this implication.

This was referenced Feb 29, 2024
@nleanba nleanba deleted the no_xslt branch March 11, 2024 13:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants