
Add forge methods to access/query datasets from external sources #367

Draft
wants to merge 50 commits into master

Conversation

crisely09 (Contributor)
This is not complete; there are inconsistencies to be fixed, but I didn't manage to fix them in time.
Basically, IT DOESN'T WORK YET.
I am leaving it open for now.

ssssarah and others added 24 commits July 5, 2024 16:13
* load local configuration when configuration tests

* keep commons imports together

* fix duplicate keyword argument issue

* use context path from default config in test_to_resource test

* rm extra store_config.pop

* refactor store config
* pass view when sparql, elastic call, todo search

* rm useless constants from store

* turn view to endpoint

* endpoint to view param

* rename param

* rename param2

* keyword param from forge call

* missing underscore

* git status

* make endpoint refac

* edit querying notebook to showcase feature, todo set up better view

* refac notebook edit

* change view creation mapping

* check filters not provided as keyword arg

* fix querying notebook, retrieve using resource url

* test make endpoint function

* use *filters for the store interface and implementations
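These commits thread a view keyword through the query calls and switch the store interface to positional *filters. A minimal sketch of the resulting usage; the configuration path and view id are hypothetical, and `view` as the final parameter name is an assumption based on the rename commits:

```python
from kgforge.core import KnowledgeGraphForge

forge = KnowledgeGraphForge("config.yml")  # hypothetical configuration path

# Query against a specific view (parameter name assumed from the rename commits).
results = forge.sparql(
    "SELECT ?id WHERE { ?id a <https://example.org/Dataset> }",
    view="https://example.org/views/custom-sparql-view",  # hypothetical view id
)

# Filters are forwarded positionally (*filters) through the store interface.
datasets = forge.search({"type": "Dataset"}, limit=3)
```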
* added timeouts to every `requests` call

* centralise default request timeout

* rm import

* use constant in file get
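The timeout commits route every `requests` call through one shared constant. A sketch of the pattern; the constant's name, value, and location are assumptions:

```python
import requests

DEFAULT_REQUEST_TIMEOUT = 300  # seconds; centralised constant (name and value assumed)

response = requests.get("https://example.org/resource", timeout=DEFAULT_REQUEST_TIMEOUT)
response.raise_for_status()
```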
* change signatures to allow for boolean change_schema on update

* change_schema implemented for single resource update

* refactor batch actions

* lint

* began batch update schema

* change schema many

* progress

* rm / join

* rm useless change

* fix

* change schema to update schema

* update instead of change

* example notebook for schema method, todo update

* notebook example with update

* lint

* improve notebook

* fix one test

* keep unconstrained schema only for update endpoint else _

* same url building in one and many

* add timeout

* schema id optional in update

* rename local parse

* rename keep_unconstrained to use_unconstrained_id

* rm extra docstring param
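Taken together, these commits make the schema id an optional keyword on update. A minimal sketch of the resulting calls, assuming the forge instance from the earlier sketch and an already-retrieved resource; the schema id is hypothetical:

```python
# Hypothetical schema id; schema_id is optional per the commits above.
forge.update(resource, schema_id="https://example.org/schemas/dataset")

# Without a schema id, the unconstrained schema ("_") is kept for the
# update endpoint only, per "keep unconstrained schema only for update
# endpoint else _".
forge.update(resource)
```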

* rm second request for metadata

* add query param annotate only if retrieve source is true

* retrieval error if issue creating resource from response

* rm cross bucket check with source

* add todo

* separate metadata call if cross bucket and source

* refac

* fixes and notebook update

* check deployment endpoint self, may need to be checking multiple values

* revert to self.endpoint

* updated notebook to show retrieval

* fix response name

* better comments

* clarify comment

* comment fix

* improve markdown
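A hedged sketch of retrieval with these flags, assuming the forge instance from the earlier sketch; the resource id is hypothetical and the parameter names are taken from the commit messages:

```python
resource = forge.retrieve(
    "https://example.org/org/project/some-resource-id",  # hypothetical id
    cross_bucket=True,      # triggers the separate metadata call noted above
    retrieve_source=True,   # adds the `annotate` query parameter
)
```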
* re-do metadata fetch until endpoint is fixed

* better notebook

* rename variables

* code style

* fix replace in comments

* update comments
…False (#382)

* return resource as_json optionally when forge.elastic

* as_resource instead of as_json, default True

* skeleton to enable building resources from different values in the es response payload

* example of forge.elastic as_resource = False in getting started Querying notebook
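The commits settle on an as_resource keyword (default True) for forge.elastic. A minimal sketch, assuming the forge instance from the earlier sketch:

```python
# With as_resource=False, raw Elasticsearch hit payloads are returned
# instead of kgforge Resource objects.
hits = forge.elastic('{"query": {"match_all": {}}}', as_resource=False)
```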
* Set JSON-LD context version to 1.1
* this enables non-IRI-delimiting characters not present in rdflib.plugins.shared.jsonld.context.URI_GEN_DELIMS (e.g. '_' in "NCBITaxon": "http://purl.obolibrary.org/obo/NCBITaxon_") to be used when defining a JSON-LD context prefix
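For illustration, a context that relies on this; only the NCBITaxon prefix comes from the commit message, the surrounding dict is a sketch:

```python
context = {
    "@context": {
        "@version": 1.1,  # JSON-LD 1.1 allows prefixes ending in characters
                          # outside URI_GEN_DELIMS, such as '_'
        "NCBITaxon": "http://purl.obolibrary.org/obo/NCBITaxon_",
    }
}
```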
Currently pySHACL throws a ReportableRuntimeError("Evaluation path too deep!\n{}".format(path_str)) exception when evaluating a shape if the length of the transitive closure of its sh:node property is greater than or equal to 30.

Given a node shape, this PR addresses this by:

* Fixing the pySHACL deep node shape path evaluation error: first recursively collect all the property shapes directly defined (through sh:property) by the node shape or inherited from the node shape's parents, then link those collected property shapes to the node shape through sh:property, and finally remove the node shape <-> parent shape relationships
* Aligning the expected data model for a SHACL shape between the RDF StoreService and DirectoryService
* Fixing a couple of issues:
  * forge.prefixes() was raising the pandas.DataFrame error "If using all scalar values, you must pass an index"
  * fixed forge.types() to properly collect types from rdfservice.class_to_shape
  * fixed forge.template() when using an RDF model based on a store service: "Unable to generate template: 'tuple' object has no attribute 'traverse'"
* Adding support for inference when using the RdfModel:
  * support for importing ontologies from schemas using owl:imports
  * use forge.validate(resource, inference="inference_value", type_='AType') with inference_value as in https://github.com/RDFLib/pySHACL/blob/v0.25.0/pyshacl/validate.py#L81; inference_value="rdfs" seems to be enough to extend the resource with the transitive closures of rdfs:subClassOf and/or rdfs:subPropertyOf relations as per the RDFS entailment rules (https://www.w3.org/TR/rdf-mt/)

* Validation now fails when a type not present in the resource is provided as the value of the type_ argument, unless inference is enabled (e.g. with inference='rdfs') and the resource's type is a subClassOf of type_
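The validate call with inference is spelled out in the description above; restated as a runnable line, assuming the forge instance from the earlier sketch and a resource whose type is (a subclass of) AType:

```python
# inference="rdfs" extends the resource with the rdfs:subClassOf /
# rdfs:subPropertyOf closures before validation, per the RDFS entailment rules.
forge.validate(resource, inference="rdfs", type_="AType")
```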
MFSY and others added 22 commits July 5, 2024 16:32
…cursive_resolve when resolving a str jsonld context (#402)
* Add alternateName to agent resolver

* Added property also to the agent-to-resource mapping
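A hedged sketch of resolving an agent that now matches on alternateName as well; the scope value and strategy are assumptions based on the kgforge resolving API:

```python
from kgforge.core.commons.strategies import ResolvingStrategy

# "J. Doe" is a hypothetical alternateName; scope="agent" is assumed.
agent = forge.resolve("J. Doe", scope="agent", strategy=ResolvingStrategy.BEST_MATCH)
```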
* Make add_image method of a Dataset, and not of KnowledgeGraphForge

* Update notebooks with example of dataset.add_image method
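A minimal sketch of the relocated method, assuming the forge instance from the earlier sketch; the dataset name and image path are hypothetical:

```python
from kgforge.specializations.resources import Dataset

dataset = Dataset(forge, name="Interesting dataset")
dataset.add_image(path="./image.png")  # now a Dataset method, not a forge method
forge.register(dataset)
```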
resolve with target species returns only species, and target strain returns only strains
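A hedged sketch of target-scoped resolving; the scope and target values are assumptions based on the commit message:

```python
# target="species" returns only species; target="strain" returns only strains.
species = forge.resolve("Mus musculus", scope="ontology", target="species")
strain = forge.resolve("Wistar", scope="ontology", target="strain")
```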
…able (#406)

* Add method when initializing forge to export environment variable

* Remove addition in setup.py and try os.environ instead of os.system
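A minimal sketch of the os.environ approach the commits land on; the variable name and value are hypothetical:

```python
import os

# Export the variable in-process at forge initialization (os.environ),
# rather than shelling out with os.system; the name here is hypothetical.
os.environ["SOME_FORGE_VARIABLE"] = "value"
```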
…-file-content-length` to header (#403)

* Remove nexus_sdk from nexus store when uploading files

* Change content length header to be
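A hedged sketch of the upload pattern without nexus_sdk; the exact header name is truncated in the commit title ("…-file-content-length"), so a placeholder is used, and the URL is hypothetical:

```python
import os
import requests

path = "./data.bin"
url = "https://example.org/files/org/project"  # hypothetical upload endpoint
headers = {"<prefix>-file-content-length": str(os.path.getsize(path))}  # name truncated in source
with open(path, "rb") as f:
    response = requests.put(url, data=f, headers=headers, timeout=300)
response.raise_for_status()
```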
* split file get call and property access

* split prepare download one

* lint

---------

Co-authored-by: Leonardo Cristella <[email protected]>
* rm sdk usage from bluebrainexus store file

* rm sdk usage from utils

* rm sdk usage from service

* rm nexussdk from test

* lint

* fix patch

* change usage of project_fetch function for successful patching in tests

* rename module of sdk methods

* remove leftover image from store

* missing file

* restore config

* remove s from file name

* missing the file, again

---------

Co-authored-by: Cristina E. González-Espinoza <[email protected]>
* Check for schema id to use schema endpoint

* Add example in unit test

* Add missing parameter inside _register_one. Add notebook example of schema handling

* check if resource_id is given
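A hedged sketch of the URL-building check these commits describe, falling back to the unconstrained schema segment when no schema id is given; all names here are hypothetical:

```python
from typing import Optional
from urllib.parse import quote_plus

def _registration_url(base_url: str, bucket: str, schema_id: Optional[str]) -> str:
    # Use the schema-constrained endpoint when a schema id is given,
    # otherwise fall back to the unconstrained schema segment "_".
    segment = quote_plus(schema_id) if schema_id is not None else "_"
    return f"{base_url}/resources/{bucket}/{segment}"
```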