Is your feature request related to a problem? Please describe.
I tried to use the "filters" argument of the read_bufr function to filter out NaN values.
My filter was a very simple lambda function: filter = lambda x: pandas.notna(x)
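When I used it to remove missing data for a single parameter, it worked fine. But when I selected many parameters, the returned pandas DataFrame shrank and no longer contained the desired data, or was even empty.
I suspect that this is due to the nature of the filter conditions. In the documentation, you mention that they are combined with logical AND: https://pdbufr.readthedocs.io/en/latest/read_bufr.html#combining-conditions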
The problem for me is that without filtering I get a fairly large DataFrame with many missing values, which I have to remove afterwards. I've noticed that many columns actually contain only NaN values.
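pdbufr's documentation states that filter conditions are combined with logical AND. The minimal pandas sketch below roughly emulates that rule on an already decoded table to show why the result shrinks; it is not pdbufr's internal code, and the column values and the helper `and_filter` are made up for illustration. With one parameter two rows survive; once all three parameters must pass `pandas.notna` at the same time, no row is left.

```python
import numpy as np
import pandas as pd

# Illustrative data (made up): each parameter is missing in some rows.
df = pd.DataFrame(
    {
        "airTemperature": [280.1, np.nan, 275.3, np.nan],
        "dewpointTemperature": [np.nan, 270.2, 271.0, np.nan],
        "windSpeed": [np.nan, np.nan, np.nan, 4.5],
    }
)


def and_filter(frame, params):
    """Hypothetical helper: keep rows where EVERY listed parameter is not missing (AND)."""
    mask = frame[params].notna().all(axis=1)
    return frame[mask]


single = and_filter(df, ["airTemperature"])
combined = and_filter(df, ["airTemperature", "dewpointTemperature", "windSpeed"])
print(len(single))    # 2 rows survive
print(len(combined))  # 0 rows: no row has all three parameters
```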
Describe the solution you'd like
It would be nice to have the option to combine conditions with logical OR instead. That alone might already solve my problem.
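As a rough sketch of the requested OR semantics, assuming it behaves like a row-wise "at least one parameter present" check, here is the same idea applied after decoding with plain pandas. The helper `or_filter` and the data are hypothetical; an OR option inside read_bufr itself does not exist as far as this issue knows, which is exactly what is being requested.

```python
import numpy as np
import pandas as pd

# Illustrative data (made up); the last row has no values at all.
df = pd.DataFrame(
    {
        "airTemperature": [280.1, np.nan, 275.3, np.nan, np.nan],
        "dewpointTemperature": [np.nan, 270.2, 271.0, np.nan, np.nan],
        "windSpeed": [np.nan, np.nan, np.nan, 4.5, np.nan],
    }
)


def or_filter(frame, params):
    """Hypothetical helper: keep rows where AT LEAST ONE listed parameter is not missing (OR)."""
    mask = frame[params].notna().any(axis=1)
    return frame[mask]


result = or_filter(df, ["airTemperature", "dewpointTemperature", "windSpeed"])
print(result)  # rows 0-3 are kept, only the all-missing row 4 is dropped
```

The only difference from an AND combination is `.any(axis=1)` instead of `.all(axis=1)`.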
Describe alternatives you've considered
Another solution I can imagine is an option to apply the equivalent of "df.loc[:, parameter].notna().any()" to each column (parameter) before returning the DataFrame. If this condition returns False for a column, i.e. the column consists only of missing values, the column is dropped.
Ideally, this would be done before the DataFrame is created internally.
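A minimal sketch of dropping all-missing columns before any DataFrame exists, assuming the decoded data is available as one dict per observation (a hypothetical intermediate form; pdbufr's real internals may look different). The helpers `is_missing` and `frame_without_empty_columns` are illustrative names, and treating `None` and float NaN as "missing" is an assumption.

```python
import math

import pandas as pd


def is_missing(value):
    """Treat None and float NaN as missing (an assumption about how gaps are encoded)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def frame_without_empty_columns(records):
    """Build a DataFrame from dict records, leaving out keys that are never present.

    Columns keep the order in which their keys first appear.
    """
    order = {}  # every key, in first-appearance order (dicts preserve insertion order)
    present = set()  # keys with at least one non-missing value
    for record in records:
        for key, value in record.items():
            order.setdefault(key, None)
            if not is_missing(value):
                present.add(key)
    keep = [key for key in order if key in present]
    # Copy only the kept keys, so all-missing columns are never materialised.
    rows = [{key: record[key] for key in keep if key in record} for record in records]
    return pd.DataFrame(rows, columns=keep)


records = [
    {"stationNumber": 1, "airTemperature": 280.1, "snowDepth": None},
    {"stationNumber": 2, "airTemperature": float("nan"), "snowDepth": None},
]
df = frame_without_empty_columns(records)
print(list(df.columns))  # ['stationNumber', 'airTemperature']
```

The column check is a single pass over the records, so the cost is paid once while decoding rather than by scanning a finished DataFrame.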
Additional context
My solution for now is to call df.dropna(how="all") along both axes after I've created the DataFrame. But this is not very efficient, especially for large amounts of data.
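For reference, the current workaround spelled out with pandas' `DataFrame.dropna`, using made-up data: the first call removes rows in which every value is missing, the second removes columns in which every value is missing. Each call builds a new DataFrame, which is part of why this gets costly for large amounts of data.

```python
import numpy as np
import pandas as pd

# Illustrative data (made up): rows 1-2 and the snowDepth column are entirely missing.
df = pd.DataFrame(
    {
        "airTemperature": [280.1, np.nan, np.nan],
        "snowDepth": [np.nan, np.nan, np.nan],
    }
)

# Drop all-missing rows (axis=0), then all-missing columns (axis=1).
cleaned = df.dropna(axis=0, how="all").dropna(axis=1, how="all")
print(cleaned)  # one row (index 0) with only the airTemperature column
```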
Organisation
Meteo Service weather research
Oh, thanks! I overlooked that... Yes, that is exactly what I meant. I would be really happy to see such a feature in this great piece of software in the future.
Keep up the good work! Best regards