Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve search for fields when they are not directly in the appdef #386

Open
domna opened this issue Jul 18, 2024 · 2 comments
Open

Improve search for fields when they are not directly in the appdef #386

domna opened this issue Jul 18, 2024 · 2 comments

Comments

@domna
Copy link
Collaborator

domna commented Jul 18, 2024

When we try to found out if a key is documented in any of the parent appdefs or base classes we check all children names and then try to namefit the best match. However, this part does not into account probable NX classes provided with the dict. get_all_direct_children_names has an nx_class parameter now which can be used.

A second problem is that concept may be shielded. We had this for NXmpes/NXsample/temperature(NXenvironment) and NXsample/temperature(NX_FLOAT), which in principle should both be visible but get_all_direct_children_names is just a set with names so it only shows the one closest to the appdef.

This is the code part which needs to be updated:

children = node.get_all_direct_children_names()
best_name = best_namefit_of(name, children)

Probably it's best to search through the inheritance chain with lxml directly, instead of using get_all_direct_children_names in combination with search_add_child_for, which just adds the node based on name. With lxml the node should be searched, namefitted and the NexusNode shall be created with add_node_from, which takes an xml element directly (hence no ambiguity). It is a similar technique to what I employed for the sibling namefitting:

def _check_sibling_namefit(self):

cc @RubelMozumder @mkuehbach (as new maintainers of the validation part)

@rettigl
Copy link
Collaborator

rettigl commented Jul 18, 2024

I generally don't understand how such an algorithm should find out the intended match. See e.g. DATA and AXISNAME in NXdata. As soon as there are more than one unnamed Field/Group, the match cannot be unique, no? As this can lead to wrong behavior then, I would suggest to not do any automatic concept fitting, but always require naming the concept for unnamed fields.
Otherwise, how can we catch e.g. a typo of a named field?
I noticed this as after renaming "bias" to "bias_env" in NXsample of NXmpes, there was no complaint for the old "bias" field, but it was namefitted as an NXbeam, which does not make any sense. I propose this has to raise a warning instead.
Also, this wrongly namefitted group did not have any NXclass attribute.

@domna
Copy link
Collaborator Author

domna commented Jul 18, 2024

I generally don't understand how such an algorithm should find out the intended match.

I agree in most cases this is problematic and this should also be reflected in the changes to the parts above.

See e.g. DATA and AXISNAME in NXdata.

This ones are even more special as we do not even have an NXclass attribute available. However, there is a special treatment for NXdata which associates this concepts based on the @signal and @axes attributes. Generally, resolving fields associated to ambiguous uppercase groups is not really possible in nexus (if we don't have the special treatment as for NXdata). We could resolve this on the writing side by using an uppercase notation there, too. This would, however, create problems on the reading side as the reader would not have the same information available and hence the association of the concept will be arbitrary (i.e. based on the reader implementation - but there is no "right" way to figure it out).

Also, this wrongly namefitted group did not have any NXclass attribute.

For this we should remove that the algorithm tries to namefit if it doesn't find a fitting exact name and only allow association based on uppercase parts in the path. It eventually turned out that is a design flaw that the algorithm even tries (as there is no benefit, because we almost always have multiple upper case concepts we could fit to).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants