-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
String parsing and expression evaluation in json_map reader #252
base: master
Are you sure you want to change the base?
Conversation
Pull Request Test Coverage Report for Build 7990178235Details
💛 - Coveralls |
Hey Matthias! Thank you for your pull request. I roughly went over your changes. They look very nicely done. I'll check the details of the date parsing function later. It seems quite nice and clear already. I just wanted to ask you the motivation behind using eval for expressions. We don't use eval on purpose in most of the code. It can, in many cases, be a security threat. Especially on one of the use cases of this tool within web apps such as Nomad. If you would like to add some programmability in the mapping file, we will need to strongly sanitize the expressions before we pass them to eval. And almost no sanitation is good enough. I will instead suggest forming our own syntax around this like with the date parsing and only calling safe numpy functions from the code, like np.linspace. Allowing dynamic code with eval is very dangerous. |
Hey Sherjeel I understand that there may be security concerns. For my purposes, I just need linspace. I wrote the code more general so that it could accommodate more use cases and keep the syntax simple. It is up to you as the maintainer to decide how general or specific the syntax should be. I'm happy to change the code according to your suggestions. |
Hello @mmpsi, thank you for your PR. I think these are all very nice suggestions to incorporate into the reader. I agree with @sherjeelshabih that we should avoid using Actually, we are currently redesigning our generic readers, which includes the json map reader, into one which supports and aligns our use cases. The idea is also that we have a config file with special notations which allows conveniently reading data from another source (see https://github.com/FAIRmat-NFDI/pynxtools-mpes/blob/main/tests/data/config_file.json for an example). The change is handled in #250. I will also add formula parsing and your string parsing routines there. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've reviewed the date part of the code.
I would say let us get the generic reader a bit more formalized before you refactor the linspace functionality. It should be quite straightforward though. I just don't want you to refactor it twice when we already are shaping out the generic reader.
try: | ||
value = parse_opts["parse_string"] | ||
if is_path(value): | ||
value = get_val_nested_keystring_from_dict(value[1:], data) | ||
except (KeyError, TypeError): | ||
continue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most of the cases here will result in an exception and hit continue. It will be better to replace this try except block with an if statement. It should perform better in this case.
try: | |
value = parse_opts["parse_string"] | |
if is_path(value): | |
value = get_val_nested_keystring_from_dict(value[1:], data) | |
except (KeyError, TypeError): | |
continue | |
if "parse_string" not in parse_opts: | |
continue | |
value = parse_opts["parse_string"] | |
if is_path(value): | |
value = get_val_nested_keystring_from_dict(value[1:], data) | |
|
||
if "regexp" in parse_opts: | ||
match = re.match(parse_opts["regexp"], value) | ||
groups = match.groups('') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just for consistency you can do this:
groups = match.groups('') | |
groups = match.groups("") |
I propose two extensions of the json_map reader to handle custom date-time formats and expression evaluation. These changes serve the following use cases:
The proposed code introduces additional syntax to the json mapping file. It is fully backward compatible with existing mappings. Documentation in the code and README is included.