Bug: No error message when no value calculated for HSE #402
Comments
We could store a NaN-like value in such cases, with a check before storing both node and edge features. We can also add a warning message in the exposure.py script. @DaniBodor, we'll discuss who will pick this up next week.
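A minimal sketch of the NaN-placeholder idea, assuming the HSE values have already been computed into a dict keyed by residue ID. The helper name `hse_or_nan`, the dict layout, and the 3-value length are hypothetical illustrations, not deeprank2's actual API:

```python
import math
import warnings

# Hypothetical: number of values stored per residue for the HSE feature.
HSE_LENGTH = 3


def hse_or_nan(residue_id, hse_values):
    """Return the HSE values for a residue, or NaNs plus a warning if none were calculated.

    `hse_values` is assumed to map residue IDs to tuples of HSE values.
    """
    value = hse_values.get(residue_id)
    if value is None:
        # Warn so the user can see which feature and residue failed,
        # instead of the feature silently going missing from the graph.
        warnings.warn(
            f"HSE could not be calculated for residue {residue_id!r}; storing NaN instead."
        )
        return (math.nan,) * HSE_LENGTH
    return tuple(value)
```

Because every residue then receives a value of the same length, all graphs end up with the same node features, and the warning points directly at the feature that failed.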
Maybe we can look for a way to systematically check for NaN/missing values across all features after generating the graph, and output an error listing the missing features and/or offer an option to set such values to 0.
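A minimal sketch of such a post-generation check, assuming a hypothetical plain-dict layout where each node ID maps to a dict of feature name to float (deeprank2's real graph classes differ). By default it raises an error listing the problematic features; passing `fill_value` (a hypothetical parameter) fills them instead:

```python
import math


def check_missing_features(node_features, expected_features, fill_value=None):
    """Check every node for missing or NaN features after graph generation.

    node_features: dict mapping node ID -> dict of feature name -> float.
    expected_features: iterable of feature names every node should have.
    fill_value: if None, raise an error listing the problematic features;
        otherwise replace missing/NaN values with this value in place.
    """
    problems = {}  # feature name -> list of node IDs with a missing/NaN value
    for node_id, features in node_features.items():
        for name in expected_features:
            value = features.get(name)
            if value is None or (isinstance(value, float) and math.isnan(value)):
                if fill_value is None:
                    problems.setdefault(name, []).append(node_id)
                else:
                    features[name] = fill_value
    if problems:
        details = "; ".join(
            f"{name}: {len(nodes)} node(s), e.g. {nodes[0]!r}"
            for name, nodes in sorted(problems.items())
        )
        raise ValueError(f"Missing or NaN feature values found: {details}")
```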
There are several opinions about how to set NaN values, and it depends a lot on the feature (different features may call for different values), so I wouldn't enforce any default. I would say it is up to the user to decide how to fill them. Integrating this into the code base properly, while giving full flexibility about which value to fill in for each feature and without breaking anything, is not trivial at all, I think. We also need an approach that doesn't add too much overhead, which is why I would do the check before writing the features to the HDF5 files. We could also just add a NaN count to each histogram, or build a dict during graph generation and print out at the end how many NaNs are present in each feature (something the user can access and notice). Together with this, we can improve the warnings in the feature modules for such cases.
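The per-feature NaN counter and user-chosen fill values could look roughly like this. This is a sketch only: it assumes the same hypothetical plain-dict graph layout (node ID to feature dict), and the function names are illustrative. No default fill value is applied; only features the user lists in `fill_values` are changed:

```python
import math
from collections import Counter


def is_missing(value):
    """True for None or a float NaN."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def count_nans(graphs):
    """Count missing values per feature name across a list of graphs."""
    counts = Counter()
    for graph in graphs:
        for features in graph.values():
            for name, value in features.items():
                if is_missing(value):
                    counts[name] += 1
    return counts


def fill_nans(graphs, fill_values):
    """Replace missing values only for features the user chose a fill value for."""
    for graph in graphs:
        for features in graph.values():
            for name, fill in fill_values.items():
                if name in features and is_missing(features[name]):
                    features[name] = fill
```

At the end of graph generation, looping over `sorted(count_nans(graphs).items())` and printing each feature name with its count gives the user the summary described above, at the cost of one extra pass over the features.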
Good point about defaulting NaNs. I still think it would be a good idea to have a default check for NaNs during graph creation (maybe after each feature module is called) and before the HDF5 file is created, so that future/custom feature modules that don't handle this themselves still produce a default error message that makes clear what the problem is and where it happened.
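One way to sketch a default check after each feature module: treat each module as a hypothetical callable that adds features to the graph in place, and after each call verify that every feature it added is present and non-NaN on every node. The graph layout and `run_feature_modules` are illustrative assumptions, not deeprank2's real module interface:

```python
import math


def _missing(value):
    return value is None or (isinstance(value, float) and math.isnan(value))


def run_feature_modules(graph, modules):
    """Call each feature module on `graph` and check its output right after it runs.

    graph: dict mapping node ID -> dict of feature name -> float (hypothetical layout).
    modules: callables that add features to `graph` in place.
    """
    for module in modules:
        before = {node: set(feats) for node, feats in graph.items()}
        module(graph)
        # Feature names this module added to at least one node.
        added = set()
        for node, feats in graph.items():
            added |= set(feats) - before.get(node, set())
        # A feature is bad if any node lacks it or holds a NaN/None value.
        bad = set()
        for feats in graph.values():
            for name in added:
                if name not in feats or _missing(feats[name]):
                    bad.add(name)
        if bad:
            module_name = getattr(module, "__name__", repr(module))
            raise ValueError(
                f"Feature module {module_name!r} produced missing/NaN values for: {sorted(bad)}"
            )
```

Naming the module in the error covers custom modules that don't validate their own output, which is the case raised in this comment.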
This issue is stale because it has been open for 30 days with no activity.
@gcroci2, has this been addressed/solved yet?
Nope. We can add a check for
Describe the bug
When generating graphs from a sample set of PDB files containing micro-environments created from pMHC structures, and saving them to an HDF5 file, the following error occurred:
This error message did not make it clear what was actually going wrong. After adding some print statements, I found that the problem arose in the following steps of the graph.py script:
Because no HSE could be calculated for some of the PDB files, these steps produce a None or empty value that is not added as a feature. This causes a discrepancy in node features between the graphs, which results in the error shown above.
It would be nice if, whenever the HSE feature (or any other feature) is not or cannot be calculated, a clear error message indicated which feature ran into the problem, so that the user can easily identify it without having to sift through the process themselves.
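A minimal sketch of a check that would turn the feature discrepancy into a readable error before anything is written to HDF5. It assumes a hypothetical plain-dict layout (graph ID to node ID to feature dict) rather than deeprank2's actual graph classes; `check_consistent_features` is an illustrative name:

```python
def check_consistent_features(graphs):
    """Raise a clear error if the graphs do not all have the same node feature names.

    graphs: dict mapping graph ID (e.g. the PDB file name) -> dict of node ID -> feature dict.
    """
    feature_sets = {}
    for graph_id, nodes in graphs.items():
        names = set()
        for feats in nodes.values():
            names |= set(feats)
        feature_sets[graph_id] = names

    # Every feature seen in any graph; each graph is expected to have all of them.
    all_features = set().union(*feature_sets.values())
    errors = []
    for graph_id, names in sorted(feature_sets.items()):
        missing = all_features - names
        if missing:
            errors.append(f"{graph_id}: missing {sorted(missing)}")
    if errors:
        raise ValueError("Node features differ between graphs; " + "; ".join(errors))
```

With the failing input described here, the error would name the affected graph and the missing feature (e.g. HSE) instead of surfacing later as an opaque HDF5 error.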
To reproduce this error, here is an example PDB file that works:
1AKJ_1_ILE.txt
And a file that runs into this error:
1AKJ_2_LEU.txt
Running the second file with:
This should reproduce the error. The main "issue" is the lack of an error message from the exposure.py script, which gave no indication that the problem lay there.
EDITED by @DaniBodor to fix the code blocks.