
Verification #133

Closed · wants to merge 81 commits

Conversation

@domna (Collaborator) commented Jun 22, 2023

This is a simple verification. I just load everything from a provided file and write it into a dict. There are still several issues:

  • The checker should run through and show all errors instead of stopping at the first invalid field (maybe even a good feature for writing a file?)
  • Large datasets shouldn't actually be loaded; they should rather be checked for type, shape and dimension as provided by h5py's dtype, shape and ndim (see the sketch after this list)
  • Units are not checked, since pynxtools does not check them either
  • Check NXtransformation units
  • Types are not checked (?)
  • Tests
  • Timestamp validity (with timezone!)
  • Follow the logging routine in the validator instead of throwing errors
  • Internal representation for reader errors (wrapper class for logging?)
  • Follow links and check if the type matches
  • NXdata validation
  • Loaded fields should be properly namefitted
  • Report on namefitting, i.e., report if a field name is not properly fitted or is generally invalid (e.g., it contains whitespace or capitalized words)

@domna (Collaborator, Author) commented Jun 22, 2023

This nyaml entry, SOLUTE(NXsample_component), still gives an error. The template verification demands the path

ValueError: The data entry corresponding to /ENTRY[entry]/SAMPLE[sample]/SOLUTE/solvent 
is required and hasn't been supplied by the reader.

while the script here adds this key:

"/ENTRY[entry]/SAMPLE[sample]/SAMPLE_COMPONENT[solute2]/solvent": "solvent1"

Which one of these should it be: SOLUTE[solute2], SAMPLE_COMPONENT[solute2] or SOLUTE?

@domna (Collaborator, Author) commented Jun 22, 2023

@sherjeelshabih @sanbrock What do you think of this approach? It is roughly working already (except for the error in the previous comment, but I'm actually not sure if this is a general bug in the template generation, @sherjeelshabih?), and IMO it would be helpful to have a verification tool for NeXus files. If you think this is a good path, I would continue adding more verification features in other parts of the code (i.e., the points listed above) as well as tests.

@domna marked this pull request as draft June 22, 2023 15:33
@sherjeelshabih (Collaborator) commented:

> This nyaml entry, SOLUTE(NXsample_component), still gives an error. The template verification demands the path
>
> ValueError: The data entry corresponding to /ENTRY[entry]/SAMPLE[sample]/SOLUTE/solvent 
> is required and hasn't been supplied by the reader.
>
> while the script here adds this key:
>
> "/ENTRY[entry]/SAMPLE[sample]/SAMPLE_COMPONENT[solute2]/solvent": "solvent1"
>
> Which one of these should it be: SOLUTE[solute2], SAMPLE_COMPONENT[solute2] or SOLUTE?

I think SOLUTE[solute2] might work. The parts outside the brackets are used to look for the NXDL node, so the function get_node_at_nxdl_path receives a concatenation of what is outside the brackets. You can check if that function finds this NXDL path correctly: /ENTRY/SAMPLE/SOLUTE/solvent. I keep forgetting the exact caps/no-caps format that goes in there. But as far as the converter sees this, the part outside the brackets is used to get the NXDL node for the NeXus info it needs. Then it just writes whatever you have in the [brackets] to the HDF5 file.
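
To illustrate the convention described above (a sketch only; `template_key_to_nxdl_path` is a hypothetical helper, while the actual lookup in pynxtools happens via get_node_at_nxdl_path):

```python
import re

def template_key_to_nxdl_path(template_key: str) -> str:
    """Drop the concrete instance names in [brackets], keeping only the
    parts used to look up the NXDL node.

    "/ENTRY[entry]/SAMPLE[sample]/SOLUTE[solute2]/solvent"
    -> "/ENTRY/SAMPLE/SOLUTE/solvent"
    """
    return re.sub(r"\[[^\]]*\]", "", template_key)
```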

@sherjeelshabih (Collaborator) commented:

> @sherjeelshabih @sanbrock What do you think of this approach? It is roughly working already (except for the error in the previous comment, but I'm actually not sure if this is a general bug in the template generation, @sherjeelshabih?), and IMO it would be helpful to have a verification tool for NeXus files. If you think this is a good path, I would continue adding more verification features in other parts of the code (i.e., the points listed above) as well as tests.

I really like this approach. I already see that you are using the same validation routine used while converting. This will then help improve verification there too. That's awesome.

I think not loading large datasets should get a nice wrapper that also allows readers, from the other side, to tell this verification routine not to load large sets until it actually writes the HDF5. Then for writing we can have a streamIO object or whatever one calls it. For the verification of HDF5 files this will be easier, as we can rely on h5py's lazy-loading functionality. Maybe for this PR we can focus on the simple HDF5 case and just keep this in mind.

I think shape and dim aren't checked, but types were being checked. Or do you mean the types don't deal with what comes out of h5py.dtype?
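
For reference, a minimal sketch of what matching an h5py/numpy dtype against an NXDL type name could look like (the `nxdl_type_matches` helper and the exact mapping are assumptions for illustration, not the check pynxtools performs):

```python
import numpy as np

def nxdl_type_matches(dtype: np.dtype, nxdl_type: str) -> bool:
    """Very rough dtype check against a few common NXDL type names."""
    if nxdl_type == "NX_INT":
        return np.issubdtype(dtype, np.integer)
    if nxdl_type == "NX_FLOAT":
        return np.issubdtype(dtype, np.floating)
    if nxdl_type == "NX_NUMBER":
        return np.issubdtype(dtype, np.number)
    if nxdl_type == "NX_BOOLEAN":
        return np.issubdtype(dtype, np.bool_)
    if nxdl_type == "NX_CHAR":
        # h5py returns strings as bytes ("S"), unicode ("U") or object dtype
        return dtype.kind in ("S", "U", "O")
    return True  # types not covered by this sketch are not rejected
```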

@domna (Collaborator, Author) commented Jun 23, 2023

> I think SOLUTE[solute2] might work. The parts outside the brackets are used to look for the NXDL node, so the function get_node_at_nxdl_path receives a concatenation of what is outside the brackets. You can check if that function finds this NXDL path correctly: /ENTRY/SAMPLE/SOLUTE/solvent. I keep forgetting the exact caps/no-caps format that goes in there. But as far as the converter sees this, the part outside the brackets is used to get the NXDL node for the NeXus info it needs. Then it just writes whatever you have in the [brackets] to the HDF5 file.

OK, I will just check what works. SOLUTE[solute2] will be a bit trickier to implement in the reverse direction, as I'm currently just reading the NX_class attribute for this; I would need to get the actual group name from the NXDL to resolve this from the file. Maybe it would be helpful to allow SAMPLE_COMPONENT[solute2] for a group named SOLUTE in the validation step and just check whether the actual naming is correct for the upper/lower-case combinations (e.g., something like SOLUTE_fixedpart should just allow names like test1_fixedpart, test2_fixedpart, etc.).
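
A rough sketch of such a namefitting check (`name_fits` below is hypothetical and not the routine pynxtools actually uses; it only assumes that uppercase segments of the NXDL name are renameable while lowercase segments are fixed):

```python
import re

def name_fits(nxdl_name: str, actual_name: str) -> bool:
    """Check whether a concrete group/field name fits an NXDL name.

    Uppercase runs in the NXDL name are treated as placeholders that may be
    replaced by any lowercase token; lowercase parts must match literally.

    name_fits("SOLUTE_fixedpart", "test1_fixedpart")  -> True
    name_fits("SOLUTE_fixedpart", "Test_fixedpart")   -> False (uppercase letter)
    """
    pattern = re.sub(r"[A-Z]+", "[a-z0-9]+", re.escape(nxdl_name))
    return re.fullmatch(pattern, actual_name) is not None
```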

@domna (Collaborator, Author) commented Jun 23, 2023

> @sherjeelshabih @sanbrock What do you think of this approach? It is roughly working already (except for the error in the previous comment, but I'm actually not sure if this is a general bug in the template generation, @sherjeelshabih?), and IMO it would be helpful to have a verification tool for NeXus files. If you think this is a good path, I would continue adding more verification features in other parts of the code (i.e., the points listed above) as well as tests.

> I really like this approach. I already see that you are using the same validation routine used while converting. This will then help improve verification there too. That's awesome.

Yes, this is why I started this: I felt most of the parts needed for validation are already there, I just did not have an easy way to use them from the command line.

> I think not loading large datasets should get a nice wrapper that also allows readers, from the other side, to tell this verification routine not to load large sets until it actually writes the HDF5. Then for writing we can have a streamIO object or whatever one calls it. For the verification of HDF5 files this will be easier, as we can rely on h5py's lazy-loading functionality. Maybe for this PR we can focus on the simple HDF5 case and just keep this in mind.

Yes, sounds good 👍️ This would also help with building this for verification, as we could just pass a small class instead of the actual data and let the verification routine check its contents against the NXDL.
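
As a sketch of that idea (the `DatasetStub` class is hypothetical, just illustrating the kind of object a reader could put into the template instead of the actual array):

```python
import h5py

class DatasetStub:
    """Stand-in for a large dataset: verification can check dtype/shape/ndim
    against the NXDL, and the values are only read when the HDF5 output is
    actually written."""

    def __init__(self, file_path: str, h5_path: str):
        self.file_path = file_path
        self.h5_path = h5_path
        with h5py.File(file_path, "r") as h5file:
            dataset = h5file[h5_path]
            self.dtype = dataset.dtype
            self.shape = dataset.shape
            self.ndim = dataset.ndim

    def read(self):
        """Deferred load of the actual values (used only at write time)."""
        with h5py.File(self.file_path, "r") as h5file:
            return h5file[self.h5_path][()]
```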

> I think shape and dim aren't checked, but types were being checked. Or do you mean the types don't deal with what comes out of h5py.dtype?

I did not actually check in detail what is already being checked by the code and what is not (the points above are also a somewhat chaotic collection of things I did not want to forget to address here 😅).

@coveralls commented Jul 4, 2023

Pull Request Test Coverage Report for Build 8845625971

Details

  • 210 of 239 (87.87%) changed or added relevant lines in 7 files are covered.
  • 5 unchanged lines in 4 files lost coverage.
  • Overall coverage increased (+0.2%) to 78.534%

Changes Missing Coverage:

| File | Covered Lines | Changed/Added Lines | % |
| --- | --- | --- | --- |
| pynxtools/nexus/nexus.py | 0 | 1 | 0.0% |
| tests/dataconverter/test_helpers.py | 32 | 34 | 94.12% |
| pynxtools/dataconverter/helpers.py | 166 | 192 | 86.46% |

Files with Coverage Reduction:

| File | New Missed Lines | % |
| --- | --- | --- |
| pynxtools/definitions/dev_tools/utils/nxdl_utils.py | 1 | 74.23% |
| pynxtools/dataconverter/convert.py | 1 | 75.69% |
| pynxtools/dataconverter/helpers.py | 1 | 88.85% |
| pynxtools/dataconverter/template.py | 2 | 86.05% |

Totals Coverage Status:

  • Change from base Build 8833244405: 0.2%
  • Covered Lines: 2817
  • Relevant Lines: 3587

💛 - Coveralls

@domna (Collaborator, Author) commented Jun 10, 2024

This is superseded by #333, which includes everything from this PR but uses the new validation algorithm.

@domna closed this Jun 10, 2024