Make it possible to filter out all NaN values #65

Open
sferics opened this issue Aug 9, 2023 · 2 comments
Labels
enhancement New feature or request

Comments


sferics commented Aug 9, 2023

Is your feature request related to a problem? Please describe.

I tried to use the "filters" argument of the read_bufr function to filter out NaN values.
My filter was a very simple lambda function: filter = lambda x: pandas.notna(x)

When I used it to remove missing data for a single parameter, it worked fine. But when I applied it to many parameters, the returned pandas DataFrame shrank and no longer contained the desired data, or was even empty.

I suspect that this is due to the nature of the filter conditions. In the documentation, you mention that they are connected with logical AND: https://pdbufr.readthedocs.io/en/latest/read_bufr.html#combining-conditions

The problem for me is that without filtering I get quite a large DataFrame with many missing values, which I then have to remove afterwards. I've noticed that a lot of columns actually contain nothing but NaN values.

Describe the solution you'd like

It would be nice to have the option to connect conditions with logical OR instead. Maybe that could already solve my problem.
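The difference between the two ways of combining conditions can be sketched with plain pandas, without touching pdbufr itself. This is only an illustration: the column names and values are made up, and the masks imitate the behaviour described above rather than reproduce how pdbufr evaluates filters internally.

```python
import numpy as np
import pandas as pd

# Hypothetical decoded data: each parameter is missing in different rows.
df = pd.DataFrame(
    {
        "airTemperature": [280.1, np.nan, 281.4, np.nan],
        "windSpeed": [np.nan, 5.2, 3.1, np.nan],
        "pressure": [np.nan, np.nan, 101300.0, np.nan],
    }
)

present = df.notna()

# AND: a row survives only if every parameter is present.
and_rows = df[present.all(axis=1)]

# OR: a row survives if at least one parameter is present.
or_rows = df[present.any(axis=1)]

print(len(df), len(and_rows), len(or_rows))
```

With AND, adding more parameters can only shrink the result, which matches the behaviour reported above; with OR, only rows that are missing everything are discarded.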

Describe alternatives you've considered

Another solution I can imagine is having the option to use the equivalent of "df.loc[:, parameter].notna().any()" on each column (parameter) before returning the DataFrame. If this condition returns False for a column, i.e., it consists only of missing values, the column gets dropped.
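As a rough sketch of that per-column check, applied to an already created DataFrame: the helper name drop_all_nan_columns and the sample data are hypothetical, and nothing like this is known to exist in pdbufr.

```python
import numpy as np
import pandas as pd


def drop_all_nan_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Keep only the columns that hold at least one non-missing value."""
    # notna().any() is True when a column has some data and False when
    # it consists only of missing values, so the False columns are dropped.
    keep = [col for col in df.columns if df.loc[:, col].notna().any()]
    return df.loc[:, keep]


# Hypothetical example: "humidity" contains nothing but NaN.
sample = pd.DataFrame(
    {
        "airTemperature": [280.1, np.nan],
        "humidity": [np.nan, np.nan],
        "windSpeed": [np.nan, 5.2],
    }
)
trimmed = drop_all_nan_columns(sample)
print(list(trimmed.columns))
```

The same result can be obtained in one step with sample.loc[:, sample.notna().any()]; the loop is written out only to mirror the expression quoted above.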

Ideally, this would be done before the DataFrame is created internally.

Additional context

My solution for now is to call df.dropna(how="all") on both axes after I've created the DataFrame. But this is not a very efficient way to do it, especially for large amounts of data.
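For reference, the workaround looks roughly like this. The DataFrame below is a made-up stand-in for the result of read_bufr; only the two dropna calls are the actual workaround.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for a DataFrame returned by read_bufr.
df = pd.DataFrame(
    {
        "a": [1.0, np.nan, 3.0],
        "b": [np.nan, np.nan, np.nan],  # column with no data at all
        "c": [np.nan, np.nan, 6.0],
    }
)

# Drop rows in which every value is missing, then columns in which
# every value is missing.
cleaned = df.dropna(axis=0, how="all").dropna(axis=1, how="all")

print(cleaned.shape)
```

Each call returns a new DataFrame, so the full unfiltered result has to be built in memory before anything is discarded, which is the inefficiency mentioned above.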

Organisation

Meteo Service weather research

sferics added the enhancement label Aug 9, 2023
@sandorkertesz
Collaborator

Please see #58

@sferics
Author

sferics commented Aug 9, 2023

Oh, thanks! I overlooked that... Yes, that is exactly what I meant. I would be really happy to see such a feature in this great piece of software in future.
Keep up the good work! Best regards
