ENH: Add limit_area to `fillna` #60161

Open
2 of 3 tasks
joshdunnlime opened this issue Oct 31, 2024 · 3 comments
Labels
Enhancement Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate Needs Discussion Requires discussion from core team before further action

Comments

@joshdunnlime

Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

The pandas methods interpolate, ffill and bfill all have the limit_area option, but fillna does not. It would be nice to add it.
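As a rough illustration of the gap described above, the sketch below contrasts interpolate with limit_area="inside" against a plain scalar fillna. The sample Series is made up for illustration; only the existing interpolate and fillna calls are used.

```python
import numpy as np
import pandas as pd

# Made-up sample data: NaNs at both edges and one NaN between valid values.
s = pd.Series([np.nan, 1.0, np.nan, 3.0, np.nan])

# interpolate can restrict filling to NaNs surrounded by valid values.
inside_only = s.interpolate(limit_area="inside")

# fillna with a scalar has no such option, so every NaN is filled.
filled_everywhere = s.fillna(0.0)

print(inside_only.tolist())
print(filled_everywhere.tolist())
```

With limit_area="inside", the leading and trailing NaN are left alone, while fillna replaces all three.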

Feature Description

DataFrame.fillna(value=None, *, method=None, axis=None, inplace=False, limit=None, limit_area=None, downcast=<no_default>)

Alternative Solutions

Add method='constant' to interpolate. The obvious downside is that constant is not a method in the SciPy interpolation API.

Additional Context

See #56492, which added this functionality to ffill and bfill. It would be nice to have better API consistency between these methods and interpolate.

@joshdunnlime joshdunnlime added Enhancement Needs Triage Issue that has not been reviewed by a pandas team member labels Oct 31, 2024
@rhshadrach
Member

rhshadrach commented Oct 31, 2024

interpolate, ffill, and bfill all fill values using values near the given location. With the method argument being deprecated, fillna does not. It doesn't seem appropriate to have limit_area because fillna does not work with nearby values.

@rhshadrach rhshadrach added Needs Info Clarification about behavior needed to assess issue Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate and removed Needs Triage Issue that has not been reviewed by a pandas team member labels Oct 31, 2024
@joshdunnlime
Author

I get that it doesn't fill with nearby values, but I don't see why it couldn't fill missing values with some knowledge of the rows around them. After all, it has the limit kwarg.
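For reference, a small sketch of what the existing limit keyword does on fillna: it caps how many NaNs are filled, counting from the start, so the result already depends on position. The sample Series is made up for illustration.

```python
import numpy as np
import pandas as pd

# Made-up sample data with three NaNs.
s = pd.Series([np.nan, 1.0, np.nan, 3.0, np.nan])

# limit=1 fills only the first NaN encountered; later NaNs are left as-is.
limited = s.fillna(0.0, limit=1)

print(limited.tolist())
```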

What about my other suggestion of having a constant method on interpolate? (I still think it makes more sense to have limit_area on fillna, as it's a very valid use case, as shown by ffill etc.)

@rhshadrach
Member

> I get that it doesn't fill with nearby values but I don't see why it couldn't fill missing values with some knowledge of the rows around it?

I'm not saying it couldn't. But this increases the scope of the function which I think is undesirable.

> What about my other suggestion of having a constant term on interpolate?

I don't think that fits the definition of "interpolate", so I find this undesirable from an API design perspective.

One can implement this behavior as follows:

import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [np.nan, 1, np.nan, 2], "b": [1, np.nan, 2, np.nan]})
isna = df.isna()

# inside
mask = isna & (~isna).cummax() & (~isna).loc[::-1].cummax()
print(df.mask(mask, 5.0))
#      a    b
# 0  NaN  1.0
# 1  1.0  5.0
# 2  5.0  2.0
# 3  2.0  NaN

# outside
mask = isna & (isna.cummin() | isna.loc[::-1].cummin())
print(df.mask(mask, 5.0))
#      a    b
# 0  5.0  1.0
# 1  1.0  NaN
# 2  NaN  2.0
# 3  2.0  5.0

Perhaps this is too technical to be expected from users though. cc @pandas-dev/pandas-core for any thoughts.
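One way to make the masking approach above less technical for users would be to wrap it in a small helper. The sketch below is only that: fillna_limit_area is a hypothetical name, not a pandas function, and it fills column-wise with a scalar only.

```python
import numpy as np
import pandas as pd


def fillna_limit_area(df, value, limit_area):
    """Hypothetical helper (not part of pandas): fill NaNs with a constant,
    restricted to gaps inside or outside the valid data of each column."""
    isna = df.isna()
    notna = ~isna
    # True from the first valid value of each column onward.
    after_first = notna.cummax()
    # True up to the last valid value of each column (scan bottom-up, then restore order).
    before_last = notna.iloc[::-1].cummax().iloc[::-1]
    inside = isna & after_first & before_last
    if limit_area == "inside":
        mask = inside
    elif limit_area == "outside":
        mask = isna & ~inside
    else:
        raise ValueError("limit_area must be 'inside' or 'outside'")
    return df.mask(mask, value)


df = pd.DataFrame({"a": [np.nan, 1, np.nan, 2], "b": [1, np.nan, 2, np.nan]})
filled_inside = fillna_limit_area(df, 5.0, "inside")
filled_outside = fillna_limit_area(df, 5.0, "outside")
print(filled_inside)
print(filled_outside)
```

On the example frame this should reproduce the two outputs shown above. A column with no valid values has no "inside", so all of its NaNs count as outside here; that is a choice made for this sketch, not documented pandas behavior.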

@rhshadrach rhshadrach added Needs Discussion Requires discussion from core team before further action and removed Needs Info Clarification about behavior needed to assess issue labels Nov 1, 2024