Experiments with RDA PIDINST and PIDINST-SCHEMA

This repository contains some simple experiments with the RDA PIDINST schema and the creation of a simple ontology based on it. The schema is available at rdawg-pidinst. The ontology is available at pidinst.ttl, with SHACL shapes at pidinst-shapes.ttl. The repository also contains a sample PIDINST JSON schema, extracted from the ODIS Architecture GitHub Issue, in the file pidinst-schema.json, and a sample instance of that schema in pidinst-example.json. A data prototype is in the Sarowar Hossain GitHub repository. Analysis was done with ChatGPT (GPT-4) using the Bing web plugin for Retrieval Augmented Generation. Guidance for Dataset follows ESIP Science on Schema.org (SOSO) and Google Dataset Search.

Tools

devcontainers for VSCode
kglab for graph manipulation using:
- pyshacl for SHACL validation
- rdflib for RDF manipulation
nbdev

Sample Use Case

The owner of the instrument is the NSF facility MagLab (https://ror.org/03s53g630). Josiah Carberry’s (https://orcid.org/0000-0002-1825-0097) data acquisition info:

The experimental data (spectra) are acquired by a spectrometer (#1 HRS750, #2 IsoPlane, Teledyne Princeton Instruments). The spectrometers are (almost) fully automated and controlled via the LightField software (Teledyne Princeton Instruments). LightField automatically saves the acquired data and all experiment settings (spectrometer settings) in one file. https://www.princetoninstruments.com/products/software-family/lightfield

LightField saves files in *.SPE format (whatever it means).

DS’ file- / folder- name convention:

Folder name: PI name_Experiment ID_Magnet system-Instrument_Start date

  Josiah_Carberry_P19401-E011-DC_SCM3-HRS750_02-12-2023
  Josiah_Carberry_none_B114-IsoPlane_02-12-2023
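The folder convention above can be split mechanically. The following is a minimal sketch, not part of the repository: `FolderName` and `parse_folder_name` are hypothetical names, and the sketch assumes that only the PI name may contain underscores and that the magnet system itself contains no hyphen. The start date is kept as text because the convention does not state the day/month order.

```python
from typing import NamedTuple


class FolderName(NamedTuple):
    pi_name: str
    experiment_id: str
    magnet_system: str
    instrument: str
    start_date: str  # kept as text; day/month order is not stated in the convention


def parse_folder_name(name: str) -> FolderName:
    """Split 'PI name_Experiment ID_Magnet system-Instrument_Start date'."""
    # The PI name may itself contain underscores, so split from the right.
    pi_name, experiment_id, system_instrument, start_date = name.rsplit("_", 3)
    # Assumption: magnet system and instrument are joined by the first hyphen.
    magnet_system, instrument = system_instrument.split("-", 1)
    return FolderName(pi_name, experiment_id, magnet_system, instrument, start_date)


for folder in (
    "Josiah_Carberry_P19401-E011-DC_SCM3-HRS750_02-12-2023",
    "Josiah_Carberry_none_B114-IsoPlane_02-12-2023",
):
    print(parse_folder_name(folder))
```

Splitting from the right keeps the PI name intact even though it uses the same separator as the other fields.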

File name:
- Type of the experiment: PL, Ra(man), Re(flectance), Tr(ansmittance)
- Sample short name: ****
- Magnetic field: ***T (or from to )
- Temperature: ***K
- Light source: SC, 532nm, 785nm, …
- Power: ***mW or uW, or percentage
- Central frequency / wl/energy: ***cm-1, nm, eV
- Slit: value: *** um
- Acq.time: ***min or sec
- Objective NA: ***NA
- Other: gate voltage, pressure, …

  PL_WSe2-MoSe2_00.0T_to_05.2T_ 10K_633nm-100uW_720nm_30um_2min_0.65NA.SPE
  Ra_CsPr_30T_7.2K_532nm-2mW_550cm-1_30um_3x2min_0.82NA.SPE
  Re_InSe_0T_5K_SC-20%600meV_50um_5sec 0.65NA_Gate Sweep -10V to +20V.SPE
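File names that follow the convention strictly can be tokenized on underscores and classified by their unit suffix. This is a rough sketch under that assumption; `parse_spe_name`, `TOKEN_PATTERNS`, and the field names are hypothetical, not taken from the repository. Tokens that do not match a pattern (field sweeps, embedded spaces, free-text "Other" entries, as in the first and third examples) are collected under `unparsed` rather than guessed at.

```python
import re

NUM = r"\d+(?:\.\d+)?"

# Hypothetical field names; each pattern must match a whole token.
TOKEN_PATTERNS = [
    ("magnetic_field", re.compile(rf"{NUM}T")),
    ("temperature", re.compile(rf"{NUM}K")),
    ("light_source_power", re.compile(rf"(?:SC|{NUM}nm)-{NUM}(?:mW|uW|%)")),
    ("central_value", re.compile(rf"{NUM}(?:cm-1|nm|meV|eV)")),
    ("slit", re.compile(rf"{NUM}um")),
    ("acquisition_time", re.compile(rf"(?:\d+x)?{NUM}(?:min|sec)")),
    ("objective_na", re.compile(rf"{NUM}NA")),
]


def parse_spe_name(filename: str) -> dict:
    """Classify the underscore-separated tokens of a *.SPE file name."""
    stem = filename[:-4] if filename.upper().endswith(".SPE") else filename
    tokens = stem.split("_")
    parsed = {"experiment_type": tokens[0], "sample": tokens[1], "unparsed": []}
    for token in tokens[2:]:
        for name, pattern in TOKEN_PATTERNS:
            # First matching, not-yet-filled field wins.
            if name not in parsed and pattern.fullmatch(token):
                parsed[name] = token
                break
        else:
            parsed["unparsed"].append(token)
    return parsed


print(parse_spe_name("Ra_CsPr_30T_7.2K_532nm-2mW_550cm-1_30um_3x2min_0.82NA.SPE"))
```

Values are returned as raw strings so that nothing is lost; converting them to numbers with units would be a separate step.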

Thematic references in sample data

ChatGPT was used for the thematic analysis of the sample data; the chat is linked as PIDINST JSON Chat.

The sample data provided has the following semantic themes that potentially map to persistent identifiers:

Observation (sosa:Observation, schema:Observation)
Experiment (Activity)
Instrument (PIDINST, sosa, handle.net registry)
Sample (RRID)
Agent (Person, Organization, Software)
- Person (ORCID)
- Organization (ROR)
- Software (SBOM, CodeMeta)
Location (W3C locn, schema.org)
Dataset (DOI)
Physical Parameters: There are various physical parameters involved in the experiments, such as type of experiment (PL, Ra, Re, Tr), sample name, magnetic field, temperature, light source, power, central frequency/wavelength/energy, slit value, acquisition time, objective numerical aperture (NA), and others like gate voltage, pressure. Ontology patterns representing these physical parameters, their units of measurement (T, K, nm, mW/uW/%, cm-1/nm/eV, um, min/sec, NA, V), and their roles in the experiment would be necessary.
Experimental Procedure: The narrative also implies a certain procedure or workflow that is followed in conducting the experiments and saving the data. This could be represented by an ontology pattern describing scientific procedures or workflows.

Thematically, we have something that can be defined as an Affordance: the variables that an instrument can measure, together with its potential settings. (LLM agents tend to confuse these affordances with the settings actually used.) We need a specification of these “affordances” in the PIDINST doc. This is separate from the settings that were used in an “Observation”, which is an instantiation of a particular set of affordances in an “Activity” that is an experiment.

graph LR
    subgraph "Agent"
        P[Person] -. "rdfs:subClassOf" .-> A[Agent]
        O[Organization] -. "rdfs:subClassOf" .-> A
        S[Software] -. "rdfs:subClassOf" .-> A
    end
    subgraph "Entity"
        I[Instrument] -. "rdfs:subClassOf" .-> E[Entity]
        Sm[Sample] -. "rdfs:subClassOf" .-> E
        D[Dataset] -. "rdfs:subClassOf" .-> E
        PP[PhysicalParameters] -. "rdfs:subClassOf" .-> E
    end
    subgraph "Activity"
        Ob[Observation] -. "rdfs:subClassOf" .-> Act[Activity]
        Ex[Experiment] -. "rdfs:subClassOf" .-> Act
    end
    subgraph "Plan"
        EP[ExperimentalProcedure] -. "rdfs:subClassOf" .-> Pl[Plan]
    end

    P -- "prov:wasAssociatedWith" --> Ob
    Ob -- "prov:wasPartOf" --> Ex
    I -- "prov:used" --> Ob
    Sm -- "prov:used" --> Ob
    D -- "prov:wasGeneratedBy" --> Ob
    O -- "prov:actedOnBehalfOf" --> P
    S -- "prov:actedOnBehalfOf" --> I
    L[Location] -- "prov:atLocation" --> Ob
    L -- "prov:atLocation" --> I
    L -- "prov:atLocation" --> Sm
    PP -- "prov:used" --> Ob
    EP -- "prov:wasUsedBy" --> Ex
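The core of the diagram can also be written down as JSON-LD. The sketch below is illustrative only: the `https://example.org/maglab/` identifiers are hypothetical placeholders for real PIDs, and only a subset of the relations is shown. Note that in PROV-O the properties prov:used and prov:wasAssociatedWith have the activity as their subject, while prov:wasGeneratedBy has the entity as its subject, so the triples are written in that direction here.

```python
import json

# Hypothetical namespace for local identifiers; replace with real PIDs.
EX = "https://example.org/maglab/"

graph = {
    "@context": {
        "prov": "http://www.w3.org/ns/prov#",
        "sosa": "http://www.w3.org/ns/sosa/",
    },
    "@graph": [
        {
            "@id": EX + "observation/1",
            "@type": ["prov:Activity", "sosa:Observation"],
            # PROV-O direction: the activity is the subject of these properties.
            "prov:wasAssociatedWith": {"@id": "https://orcid.org/0000-0002-1825-0097"},
            "prov:used": [
                {"@id": EX + "instrument/HRS750"},
                {"@id": EX + "sample/CsPr"},
            ],
        },
        {
            "@id": EX + "dataset/1",
            "@type": "prov:Entity",
            # The dataset (entity) points back at the observation that produced it.
            "prov:wasGeneratedBy": {"@id": EX + "observation/1"},
        },
    ],
}

print(json.dumps(graph, indent=2))
```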

The narrative describes a complex experimental setup involving various physical parameters, instruments, and naming conventions. Here are the key themes or components present in the narrative, along with corresponding ontology patterns that might be necessary to describe them:

Instruments: Spectrometers (#1 HRS750, #2 IsoPlane, Teledyne Princeton Instruments) and the controlling software (LightField software). This would require an ontology pattern that describes scientific instruments, their models, manufacturers, and associated software.
Experimental Data: The acquired data and experiment settings saved in a .spe file. An ontology pattern representing data objects, their formats (.spe), and the parameters/settings they contain would be necessary.
File and Directory Naming Conventions: The narrative describes a detailed naming convention for both directories and files. This could be represented using an ontology pattern that encapsulates the structure and semantics of these naming conventions.
Physical Parameters: There are various physical parameters involved in the experiments, such as type of experiment (PL, Ra, Re, Tr), sample name, magnetic field, temperature, light source, power, central frequency/wavelength/energy, slit value, acquisition time, objective numerical aperture (NA), and others like gate voltage, pressure. Ontology patterns representing these physical parameters, their units of measurement (T, K, nm, mW/uW/%, cm-1/nm/eV, um, min/sec, NA, V), and their roles in the experiment would be necessary.
Experimental Procedure: The narrative also implies a certain procedure or workflow that is followed in conducting the experiments and saving the data. This could be represented by an ontology pattern describing scientific procedures or workflows.
Samples: Different samples are used in the experiments, as seen in the file naming convention. An ontology pattern representing scientific samples, their types, and characteristics would be needed.
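As an illustration of the Instruments pattern, an instrument record could be sketched as a JSON-LD document. Everything here is an assumption made for illustration: the property names loosely follow the RDA PIDINST metadata schema (Name, Owner, Manufacturer, Model), the `@vocab` value is the namespace this README proposes rather than a confirmed, dereferenceable vocabulary, and the `@id` is a hypothetical placeholder, not a registered identifier.

```python
import json

# Namespace proposed in this README; whether these terms resolve there is an assumption.
PIDINST_NS = "https://w3id.org/pidinst/schema/"

instrument = {
    "@context": {"@vocab": PIDINST_NS},
    "@id": "https://example.org/instrument/hrs750",  # hypothetical, not a real PID
    "@type": "Instrument",
    "Name": "HRS750 spectrometer",
    "Owner": {
        "ownerName": "MagLab",
        "ownerIdentifier": "https://ror.org/03s53g630",
    },
    "Manufacturer": {"manufacturerName": "Teledyne Princeton Instruments"},
    "Model": {"modelName": "HRS750"},
    # The controlling software is linked by URL until it has its own identifier.
    "RelatedIdentifier": [
        "https://www.princetoninstruments.com/products/software-family/lightfield"
    ],
}

print(json.dumps(instrument, indent=2))
```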

Remember, the role of an ontology is to provide a common framework for representing knowledge. Depending on the complexity and specific requirements of your use case, you might use existing ontologies (like the OBO Foundry ontologies), or develop custom ontologies tailored to your needs.

References

Development of the Ocean Data and Information System (ODIS) architecture

Science on Schema.org

Schema.org IoT

IoT and Schema.org: Getting Started

BioSchemas

Other Ontologies

Refactor Use case Narrative

The experimental data (spectra) are acquired by a spectrometer (#1 HRS750, #2 IsoPlane, Teledyne Princeton Instruments). The spectrometers are (almost) fully automated and controlled via the LightField software (Teledyne Princeton Instruments). LightField automatically saves the acquired data and all experiment settings (spectrometer settings) in one file (spe file format). Each experiment is saved in a folder with the following directory naming convention: “PI name_Experiment ID_Magnet system-Instrument_Start date”. Inside the folder is stored the spe file with the following file naming convention:

- Type of the experiment: PL, Ra(man), Re(flectance), Tr(ansmittance)
- Sample short name: ****
- Magnetic field: ***T (or from to )
- Temperature: ***K
- Light source: SC, 532nm, 785nm, …
- Power: ***mW or uW, or percentage
- Central frequency / wl/energy: ***cm-1, nm, eV
- Slit: value: *** um
- Acq.time: ***min or sec
- Objective NA: ***NA
- Other: gate voltage, pressure, …

For example:

  PL_WSe2-MoSe2_00.0T_to_05.2T_ 10K_633nm-100uW_720nm_30um_2min_0.65NA.SPE
  Ra_CsPr_30T_7.2K_532nm-2mW_550cm-1_30um_3x2min_0.82NA.SPE
  Re_InSe_0T_5K_SC-20%600meV_50um_5sec 0.65NA_Gate Sweep -10V to +20V.SPE

LLMs

Use Keywords and descriptions for parameters
Model the instrument as a product
Model the settings more formally.

ChatGPT summary

Certainly, the following provides a range of options that the MagLab could consider when developing their semantic infrastructure, taking into consideration both minimal and more comprehensive approaches.

Minimal Approach - Schema.org metadata for experiments:
- Description: Use Schema.org terms to describe the key features of your experiments. This involves adding appropriate Schema.org types (like Dataset, Person, Organization, etc.) and properties to your HTML metadata.
- FAIR perspective: This approach would increase the Findability and Interoperability of your datasets, as Schema.org is widely recognized and used by major search engines, aiding in data discovery. However, this is a general-purpose vocabulary and may lack the specificity required to fully describe scientific experiments.
Mid-level Approach - Schema.org + ESIP Science-on-Schema.org guidelines:
- Description: Extend the minimal approach by following ESIP’s guidelines for using Schema.org in a scientific context. This provides a richer description of your data and includes terms from QUDT to express the units of each variableMeasured.
- FAIR perspective: By adhering to a community-accepted standard like ESIP, you enhance the Interoperability and Reusability of your data, as others in the scientific community will have a better understanding of the metadata and can use it more effectively.
Advanced Approach - Adopt a more descriptive Ontology Design Pattern (ODP):
- Description: Implement a more formal ontology structure by creating a pattern that separates Experiment, Instrument, and Sample as distinct entities that participate in the Experiment process. This can be achieved with Semanticscience Integrated Ontology (SIO), which provides a rich semantic framework for describing scientific entities and processes. Here, Instrument and Sample participate in Experiment, which is a subclass of SIO:Process (or prov:Activity for alignment with PROV-O).
- FAIR perspective: This approach would greatly enhance the Interoperability and Reusability of your data by providing a robust and expressive semantic model for your experiments. This detailed semantic information can improve data discovery and understanding, aiding both human users and automated processes.
Comprehensive Approach - Integrate with existing ontologies and identifiers:
- Description: Further expand the advanced approach by aligning with existing ontologies and identifier systems. This could include leveraging the PhysicalSamples ontology and Instrument identifiers (PIDINST) from RDA, BioSchemas recommendations, and RRIDs for tracking samples. Use OWL-Time and W3C locn for representing time and location, respectively.
- FAIR perspective: This approach ensures maximal Findability, Accessibility, Interoperability, and Reusability of your data by adhering to well-established standards and practices. It promotes data integration, comparison, and reuse across different studies and scientific fields.
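To make the minimal and mid-level approaches more concrete, here is a rough sketch of a Schema.org Dataset description with variableMeasured entries whose units point at QUDT. It is an illustration under assumptions, not the MagLab's metadata: the `property_value` helper and the description text are made up for this example, and the QUDT unit IRIs should be checked against the QUDT vocabulary before use.

```python
import json


def property_value(name: str, unit_text: str, unit_code: str) -> dict:
    """Build a schema.org PropertyValue for use in variableMeasured."""
    return {
        "@type": "PropertyValue",
        "name": name,
        "unitText": unit_text,
        "unitCode": unit_code,  # assumed QUDT unit IRI
    }


dataset = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "name": "Ra_CsPr_30T_7.2K_532nm-2mW_550cm-1_30um_3x2min_0.82NA.SPE",
    "description": "Raman spectrum acquired with LightField (illustrative text).",
    "creator": {
        "@type": "Person",
        "@id": "https://orcid.org/0000-0002-1825-0097",
        "name": "Josiah Carberry",
    },
    "variableMeasured": [
        property_value("magnetic field", "tesla", "http://qudt.org/vocab/unit/T"),
        property_value("temperature", "kelvin", "http://qudt.org/vocab/unit/K"),
    ],
}

print(json.dumps(dataset, indent=2))
```

Embedding the printed JSON in a landing page inside a script element of type application/ld+json is the usual way to expose it to crawlers.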

In summary, the chosen approach depends on the specific needs and resources of the MagLab. A more comprehensive approach offers greater FAIR compliance and utility for data users but requires more effort to implement and maintain. Consider the FAIR principles as guiding concepts, promoting maximum use and reuse of your valuable data. Remember that being FAIR is a journey, not a destination, and even small steps towards better metadata can have a significant impact on data utility.

Work

We should probably use https://w3id.org/pidinst/schema/ for namespace
Sanity check whether a “platform” / “instrument” / “sensor” field is needed in the top-level JSON-LD
Sample PIDINST landing Page + Metadata
Guidance Document Instrument.md
PIDINST Schema dereference via w3id.org
ODP for “experiment” leveraging SIO and Prov-O
RO-Crate for experiment
HDF5 for experiment
Complete Wikidata for Princeton Instruments HRS750
Complete Wikidata for Princeton Instruments IsoPlane
Complete Wikidata for LightField
Prompt engineering for @context


charlesvardeman/pidinst-experiments
