
Create python module to validate processed data #18

Closed
wants to merge 28 commits into main from validation_modularization

Conversation

@horstf horstf commented Apr 13, 2022

No description provided.

@horstf horstf changed the title Validation modularization Create python module to validate processed data Apr 13, 2022
@horstf horstf marked this pull request as ready for review May 3, 2022 08:41
@horstf horstf requested a review from joergfunger May 3, 2022 08:41
@joergfunger (Member) left a comment

Not sure if the merge request is up to date; at least the test files are currently not working.

@horstf horstf marked this pull request as draft May 6, 2022 10:01
@joergfunger (Member) commented

@horstf we have updated the structure, could you try to adapt your code to the new structure so that we can merge it into main?

@horstf (Collaborator, Author) commented Jul 19, 2022

> @horstf we have updated the structure, could you try to adapt your code to the new structure so that we can merge it into main?

Is that the structure in the /lebedigital folder? I.e. refactor my code into an appropriate /lebedigital/validation folder?

@horstf (Collaborator, Author) commented Jul 26, 2022

I've copied the validation submodule into the new lebedigital package. I haven't updated the dodo.py file yet because the current version is not using the new package for now.

@horstf horstf marked this pull request as ready for review July 26, 2022 10:04
@horstf horstf requested a review from joergfunger August 3, 2022 10:11
@joergfunger (Member) left a comment

Thanks for the pull request. There are some modifications required to adapt to the new structure, and some documentation issues.

lebedigital/validation.py (two outdated review threads, resolved)
@@ -0,0 +1,78 @@
from pyshacl import validate
Member:

Not sure what that file does, but all files should either be in lebedigital (the actual functions), in tests (the tests for these functions), or in minimumWorkingExample (the pydoit workflow that processes the data for the minimum working example). The folder usecases/Concrete is deprecated.

Collaborator Author:

That file is not used anymore; I think that was an accidental commit.

Member:

This file is still in Concrete; could you please remove it?

@horstf horstf requested a review from joergfunger August 24, 2022 11:48
@joergfunger (Member) left a comment

Thanks for the changes; I still have a few comments. Where are the tests related to validation.py? In addition, the folder usecases/Concrete is deprecated.

@@ -0,0 +1,78 @@
from pyshacl import validate
Member:

This file is still in Concrete; could you please remove it?

@@ -8,5 +8,6 @@ SPARQLWrapper==1.8.5
requests==2.22.0
Member:

This file is also outdated; the conda environment at the top level is the only one to be used.

Collaborator Author:

We can ignore these commits then; all the libraries I need are already part of the environment file.

Collaborator Author:

I've also deleted the validation.py from Concrete.

@horstf (Collaborator, Author) commented Sep 5, 2022

@joergfunger For the tests, is there a graph that I can use to test my code? The usecases/MinimumWorkingExample/emodul/processed_data folder is empty, but I would need an EM_Graph.ttl file or similar to test the validation. Otherwise I can also just create my own from the data.

Member:

Could you talk to @PoNeYvIf about that? For the tests, it would also be possible (and maybe even better) to create your own very simple test cases (they do not have to be related to a very specific ontology, but rather a very general one such as foaf or prov). In particular, make sure that deliberate errors exist so that they can be returned and processed. We could also have a short phone call on that.

Collaborator Author:

E.g. if we know that we test for machine IDs via SHACL, we could get the machine IDs from the global KG that are referred to in the local KG. This would of course mean that every time we add a rule we have to think about the SPARQL query and the information that could be contained globally rather than locally; I don't know if this is feasible.

Collaborator:

Yes, let's try to create the queries, let me know when it's okay for you.

Member:

Not sure if I would always download the complete global graph. It would rather be necessary to check whether particular instances that are referred to in the local graph (e.g. the testing machine that has already been created, or the mix design we are referring to when adding a new Young's modulus test) exist. So for the specific tests, we would first generate the mix design and upload it to the global KG. Then we would create a Young's modulus test that references an instance of the mix design (without creating the same instance again). By uploading that to the KG, these two data sets (mix design and Young's modulus test) are automatically connected. And before uploading the Young's modulus KG, we would have to check whether this instance of the mix design already exists in the global KG (and potentially check further properties apart from the id, but that is not necessary right now).

@joergfunger (Member) commented Oct 11, 2022

As for the SPARQL query, we somehow know during local KG generation what classes we expect. So the SPARQL query should return the ids of all instances of a given class; that should be quite general and would not require rewriting the query each time we have a new test.

Collaborator Author:

Maybe we should have a talk together with @firmao to discuss the specifics?

@firmao (Collaborator) commented Oct 14, 2022 via email

@joergfunger (Member) commented

@horstf What is the status with this merge request? It seems that only the integration into the dodo file in minimum working example is missing.

@joergfunger joergfunger linked an issue Dec 5, 2022 that may be closed by this pull request
@joergfunger (Member) commented

@horstf The actions are not working, which prevents me from merging the branch. In addition, I checked the other files. I guess there was a misunderstanding with the dodo.py, since you changed the one in Concrete. I guess we would have to remove that directory (it was already deprecated, but not removed, because it includes all the old stuff). The actual dodo.py we are working with is in usecases/MinimumWorkingExample; could you update that there as well? I guess the problem is that we do not have a working generation of the KG there yet. @AidaZt @soudehMasoudian @alFrie how are we going to handle this merge request? Or should @horstf integrate it in pydoit without actually including it in the tasks, so that you could add it afterwards once the KG is up and running?

@horstf (Collaborator, Author) commented Jan 10, 2023

Hi, is there any update on how I should proceed?

@joergfunger (Member) commented

Could we have a short Teams meeting? Maybe that is easier.

@horstf (Collaborator, Author) commented Jan 10, 2023

> Could we have a short Teams meeting? Maybe that is easier.

Yes, the date you suggested in your Teams invitation works fine.

@horstf horstf closed this Jan 16, 2023
@horstf horstf deleted the validation_modularization branch January 16, 2023 13:41
Successfully merging this pull request may close these issues: shacl validation
4 participants