How to convert Series to compliant Column? #190
Comments
One alternative would be to pass a namespace to the function:
def calculate_std_of_series(ser: Any, namespace):
    ser_compliant = namespace.dataframe_from_dict({'col': ser}).get_column_by_name('col')
    return ser_compliant.std(correction=1)
but this feels unnecessarily complicated (not to mention that the particular |
Didn't we just add It's hard to say more without more context; it sounds like this is a private function? And is there a need for backwards compatibility with it? Your proposed Perhaps we should talk through how |
|
it doesn't, the return value here is just a single value:
>>> ser = pd.Series([1,2,3])
>>> ser.mean()
2.0
>>> ser.rename('col').to_frame().__dataframe_standard__().get_column_by_name('col').mean()
2.0
We already have a way to go from non-compliant DataFrame to compliant DataFrame: |
To clarify: we have that as an example on how to go about adoption, and not in the standard itself.
That's fine to add in the purpose and scope section I'd say, same as for dataframes. And then in I think you can just implement it there and see how it works? It's not a standardization question. |
But there is no equivalent |
I don't think it's just an example: if a library has a compliant namespace, then dataframe-api/protocol/purpose_and_scope.md Lines 53 to 67 in 04650ba
For now I'll implement it the same way that I've implemented
(this is good enough for testing, but longer term we would upstream |
Oh wait, I was going by Marco's statement and a quick grep. It's actually in the protocol "purpose and scope" section, not in the API standard one:
So there's no analogy at all here, the API standard has nothing for a dataframe either? And to go to a standard object, the answer so far is "use a constructor function". |
That's an example, not a required method. |
Strictly speaking it's indeed not a requirement, as we can't "require" something in third-party libraries that are not adhering to a specification. But it's still the only way for those third-party libraries to opt in to supporting the standard through a separate object. In my mind we discussed that as essentially a requirement, though, at #169 (otherwise there is no way you can know if some object supports the standard, assuming that users will use a pandas/polars/modin/ibis/... dataframe that does not itself support the standard, but only supports converting itself into one) |
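To make the opt-in idea concrete, here is a minimal sketch of how consuming code might check for it. Only the `__dataframe_standard__` name comes from this thread; `ToyFrame`, `ToyCompliantFrame` and `to_compliant` are hypothetical stand-ins invented for illustration, not part of the Standard or of any library.

```python
from typing import Any


class ToyCompliantFrame:
    """Hypothetical stand-in for a standard-compliant DataFrame object."""

    def __init__(self, data: dict):
        self.data = data


class ToyFrame:
    """Hypothetical stand-in for a library's own (non-compliant) dataframe."""

    def __init__(self, data: dict):
        self.data = data

    def __dataframe_standard__(self) -> ToyCompliantFrame:
        # The library opts in by converting itself into a compliant object.
        return ToyCompliantFrame(self.data)


def to_compliant(df: Any) -> Any:
    """Return the compliant object, or fail loudly if `df` has not opted in."""
    convert = getattr(df, "__dataframe_standard__", None)
    if convert is None:
        raise TypeError(
            f"{type(df).__name__} does not provide __dataframe_standard__"
        )
    return convert()
```

Without an entry point like this, the consuming function has no uniform way to tell whether an arbitrary object supports the Standard, which is the point made above.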
I just noticed the same (looking for that example), but shouldn't we move that to the spec docs? (or have it in both places) |
If Let's take the example from the blog post:
def remove_outliers(df, column: str):
    z_score = (df[column] - df[column].mean())/df[column].std()
    return df[z_score.between(-3, 3)]
Could you please write out exactly how to make this function library-agnostic without using |
If it's not required, then we need to align on this as soon as possible, as it looks like there's been some serious miscommunication |
FWIW this blog post doesn't appear at https://data-apis.org/blog/ or is not linked on the home page, so it's not easy to find (unless you know it linked from some other post) |
Thanks, gh-169 does clarify this. We've indeed agreed on it, but we have simply forgotten to document that agreement. So I guess we need to do that, and then having the analogous method for Column seems pretty obvious (I hope it'd be called |
agree on I'll document this better then, but to summarise:
|
Do we need a separate Other option for |
sure, we can rename later, for now I'm just trying to plug holes in what we've got so that the standard can be useful |
Thanks, that list LGTM! If we had that bullet list in a single place, that would really clarify things. |
Just to clarify - the current DataFrame constructor Just pointing this out to double-check we're all on the same page here, I hope I'm not coming across as confrontational 😄 Thanks all for discussing |
Good point, I suspect that we may return to that point again and will want a direct Column constructor at some point.
not at all! |
This issue also mentioned a "column_namespace" dunder, which isn't resolved yet? |
isn't it? dataframe-api/spec/API_specification/dataframe_api/column_object.py Lines 23 to 25 in 2c94312
|
Say I have a function which takes a pandas Series:
I'd like to convert this to support any standard-compliant library. How can I do that?
I'd like to be able to do something like
Currently, I can't do that, because we don't have __series_standard__.
Are we OK to require that the libraries implementing the Standard have a __series_standard__ method in whichever object they use to back their Column objects?
Context of why this is necessary: in plotly, it's possible to pass a Series to 'x' / 'y' / etc:
I expect there'll initially be pushback (just something I've come to expect 😄 ) - if you do disagree that we should have __series_standard__, could you please suggest an alternative for writing library-agnostic functions which operate on Series?
And if the answer is that we don't want to support that, then could you please suggest how plotly could support
in a library-agnostic manner?
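For illustration only, a minimal sketch of what the requested `__series_standard__` hook would allow. The method is the proposal made in this issue, not an existing part of the Standard; `ToySeries` and `ToyCompliantColumn` are hypothetical stand-ins, and `std(correction=1)` mirrors the call used in the comments.

```python
from typing import Any


class ToyCompliantColumn:
    """Hypothetical stand-in for a standard-compliant Column."""

    def __init__(self, values):
        self._values = list(values)

    def std(self, correction: int = 1) -> float:
        # correction=1 gives the sample standard deviation,
        # correction=0 the population standard deviation.
        n = len(self._values)
        mean = sum(self._values) / n
        squared_dev = sum((v - mean) ** 2 for v in self._values)
        return (squared_dev / (n - correction)) ** 0.5


class ToySeries:
    """Hypothetical stand-in for a library's own Series type."""

    def __init__(self, values):
        self.values = list(values)

    def __series_standard__(self) -> ToyCompliantColumn:
        # Proposed hook (an assumption here): convert to a compliant Column.
        return ToyCompliantColumn(self.values)


def calculate_std_of_series(ser: Any) -> float:
    """Library-agnostic: works for any object exposing the proposed hook."""
    if not hasattr(ser, "__series_standard__"):
        raise TypeError(
            f"{type(ser).__name__} does not provide __series_standard__"
        )
    return ser.__series_standard__().std(correction=1)
```

Compared with passing a namespace explicitly, the caller only hands over the Series, and the function never needs to know which library produced it.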