Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generic LDNodes and RO-Crate parsing rework #496

Merged
merged 59 commits into from
Mar 10, 2025
Merged

Generic LDNodes and RO-Crate parsing rework #496

merged 59 commits into from
Mar 10, 2025

Conversation

HLWeil
Copy link
Member

@HLWeil HLWeil commented Feb 27, 2025

Reworked Generic JSON-LD Objects

This is a generic JSON-LD implementation that represents those aspects of the JSON-LD specification which are relevant for handling ARC-RO-Crate:

  • Datamodel
    • LDContext (Contains mappings from compacted terms to extended IRIs)
    • LDNode (Our basic objects with @id, types and fields)
    • LDGraph (Basically a collection of nodes; Top-level object in the RO-Crate)
    • LDRef (Object containing only @id, meant for linking)
    • LDValue Expanded, self-contextualizing values
  • Algorithms
    • Flatten
    • Compact
    • WIP Unflatten
  • Generic JSON encoders and decoders

Added ARC-RO-Crate profile layer

This layer acts like a ARC-RO-Crate view on the generic JSON-LD model. All actual handled objects are either FSharp primitive types or generic JSON-LD objects. Through making use of static methods which in turn use static IRIs, a loose typing is emulated. These static IRI mostly consist of types and properties from schema.org and bioschemas.org.

  • Added a set of static classes, containing functions for

    • getting properties
    • setting properties
    • generating default @ids
    • validating given LDNode against requirements for this type
    • create functions for LDNodes matching the requiremenets for this type
  • For the following types/profiles:

    • Comment
    • DefinedTerm
    • PropertyValue
    • PostalAddress
    • Organization
    • Person
    • ScholarlyArticle
    • Sample
    • File
    • LabProtocol
    • LabProcess
    • Dataset
  • Added contexts of RO-Crate specification 1.1 and 1.2DRAFT

  • Added context for all other fields exclusive to the ARC-RO-Crate profile

ARC Scaffold <-> ARC-RO-Crate conversion layer

Added functions for Conversion between Scaffold and RO-Crate representation types.
E.g.

  • Ontology Annotation <-> DefinedTerm
  • ArcTable <-> LabProcess Sequence
  • ArcInvestigation <-> Dataset

Added top-level members to ARC Scaffold object to parse to and from an RO-Crate JSON-LD metadata string.


To-do:

  • Top level parsers
  • DOI and PubmedID as identifier
  • Bioschemas context
  • Some @id are empty (LabProtocol)
  • Top-level roundabout tests
  • Link datafiles instead of fragment selectors from datasets

To-do after this PR is merged:

  • Datamap support
  • Make use of @list for ordered process inputs and outputs
  • CI job, running the RO-Crate output for ARC(Prototype?) against RO-Crate validator

closes #326
closes #400
closes #429
closes #434
closes #465
closes #494
closes #495

kMutagene and others added 30 commits February 10, 2025 15:30
- Json: parse `@type`/`additionalType` from either string (-> singleton) or array field, always write array
- add tests
@HLWeil
Copy link
Member Author

HLWeil commented Feb 27, 2025

@floWetzels could you also take a look onto whether #326 is actually fixed or not (e.g. by inspecting a written RO-Crate metadata json)

@floWetzels
Copy link
Collaborator

I tested the metadata json file generated for the ArcPrototype with the rocrate-validator. I found the following errors that need to be fixed:

  • Studies und assays have wrong ids - the path to the isa-xlsx file. It should be the path to the directory ending with a /.
  • Studies and assays are missing the hasPart property
  • The root data entity has no license
  • The root data entity has no datePublished

Furthermore, I noticed the following when parsing the file manually:

  • Paths must be relative to the root of the ro-crate
  • Ad-hoc ids (e.g. for processes, paramters, parameter values) should begin with a # symbol
  • The id of PropertyValue objects looks weird to me. It contains the propertyID, which could be a potential source for errors. Previously, it was generated only from name and value, if I am not mistaken.

@floWetzels
Copy link
Collaborator

Looks a lot better now! The validator accepts the RO-Crate. When looking through the file, I noticed a few more things:

  • Some @id values contain spaces. This appears almost throughout all types, e.g. Organization, PropertyValue, Process, Protocol, Sample etc. This is not necessarily a problem, the json-ld playground is able to handle it for example. It is also not completely avoidable, since file paths can contain spaces. I would prefer to replace them in ad-hoc ids though. The old export did so with a simple replace statement.
  • For some types, ad-hoc ids are generated although a URI or similar identifier exists. I found two examples in the ArcPrototype, DefinedTerm and Person. A DefinedTerm should use its termCode as id, if it exists. For Person, I found an instance where an email address exists, but an ad-hoc id was assigned.
  • Some termCode entries of DefinedTerm contain multiple URIs separated by an underscore. This should not be the case, right? Here's an example: "termCode": "http://purl.obolibrary.org/obo/scoro_http://purl.org/spar/scoro/principal-investigator".

@HLWeil
Copy link
Member Author

HLWeil commented Mar 4, 2025

I incorporated your suggested fixes and included the test file in d3d786e

@HLWeil HLWeil marked this pull request as ready for review March 4, 2025 12:04
@kMutagene
Copy link
Member

@HLWeil i think adding a test that performs the validation of the ro-crate file that @floWetzels did manually is needed. If not in this PR then via separate issue

@HLWeil
Copy link
Member Author

HLWeil commented Mar 6, 2025

Yes!

I included it in the "to-do after PR" so I'll open up the issues when I merge this

Copy link
Member

@kMutagene kMutagene left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comments

static member genId(name : string) =
$"{name}"

static member validate(dt : LDNode, ?context : LDContext) =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the equivalent of the missing file profile for nfdi4plants/isa-ro-crate-profile#57?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... I mean yesn't.

It isn't missing, File = Data

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the missing organization profile for nfdi4plants/isa-ro-crate-profile#57 ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the missing postaladdress profile for nfdi4plants/isa-ro-crate-profile#57 ? If we have 0 mandatory fields, maybe we just don't need to support this type?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well there were discussions about it for the specification. Also requests for the implementation: #465

I currently handle it in ARC Scaffold by just writing the json blob in the string field.

static member setValueReference(pv : LDNode, valueReference : string, ?context : LDContext) =
pv.SetProperty(PropertyValue.valueReference, valueReference, ?context = context)

static member validate(pv : LDNode, ?context : LDContext) =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this missing validate functions for PropertyValue - DOI and PropertyValue - PubMedID ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's currently not a validate function for doi and pubmedID.

The convertes can be found here:

static member tryDecomposeDOI (doi : LDNode, ?context : LDContext) : string option =

static member setAuthors(s : LDNode, authors : ResizeArray<LDNode>, ?context : LDContext) =
s.SetProperty(ScholarlyArticle.author, authors, ?context = context)

static member tryGetUrl(s : LDNode, ?context : LDContext) =
Copy link
Member

@kMutagene kMutagene Mar 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We dropped url as property of this profile

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are you deleting all these types?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was a misunderstanding between me, myself and I

@@ -14,7 +14,7 @@ let mandatory_properties = Person(
let all_properties = Person(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How are these tests still passing when you deleted all the types tested? maybe not seeing the forest for the trees here

Tests.ScholarlyArticle.main
Tests.LDContext.main
Tests.LDNode.main
//Tests.Dataset.main
Copy link
Member

@kMutagene kMutagene Mar 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm i think i found the answer, you are not running the tests. Why fix them then?

@HLWeil
Copy link
Member Author

HLWeil commented Mar 7, 2025

Alright the comments should be resolved.

  • I reincluded the initial ROCrate Types, and renamed the static classes using the LD prefix.
  • LDPropertyValue now has DOI and PubMedID helper functions
  • Implemented and tested DOI and PubMedID backwards compatability (using url and sameAs)

@HLWeil HLWeil merged commit b3d1537 into main Mar 10, 2025
2 checks passed
@HLWeil HLWeil deleted the ld-object-context branch March 10, 2025 12:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment