-
Notifications
You must be signed in to change notification settings - Fork 21
address issue with __dataframe_namespace__
and non-compliant dataframes
#169
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Can you expand on this? This sounds like the exact opposite of what I expect to be actually implemented in the near term. |
Hmm, I'm not sure why? I think looking at @MarcoGorelli's Seaborn POC (MarcoGorelli/seaborn#1) shows that internally, the API standard methods are used. You kinda have to do that, otherwise how is it going to work with multiple dataframe libraries? So it's something like: # in a library that consumes dataframes:
def is_dataframe_api_obj(x):
return hasattr(x, '__dataframe_namespace__')
def func1(df):
if is_dataframe_api_obj(df):
pdx, dfx = df.__dataframe_namespace__()
else:
# do whatever the library did pre-standard here if we can't convert, or raise?
# a shim library a la array_api_compat should help to ensure we don't
# really need a second code path here for, say pandas 1.5.0
return _func2(dfx).dataframe # same df type as input
def _func2(dfx):
"internal function, use only API standard methods on dfx"
....
return dfx_out |
The tail end of the OP reads to me like it makes a difference whether pd.DataFrame is built on top of pd.consortium.DataFrame or the other way around. I don't think it should make a difference for seaborn. |
A few thoughts:
|
I think all you need is something like df_compliant = df.__dataframe_standard__() Then, if you want the namespace, you can do namespace = df_compliant.__dataframe_namespace__() and then use it as namespace.concat([df_compliant, other_df_compliant]) EDIT: the suggestion in the original post seems fine too, I just think this might be a bit cleaner than returning a tuple. Both are fine though |
Any thoughts on the above? I'd suggest:
|
dataframe_standard->dataframe_consortium |
let's go further - how about naming aside, OK for separating between EDIT: having slept on it, I do like |
Agreed.
I agree with this too - and that is already documented indeed: https://data-apis.org/dataframe-api/draft/purpose_and_scope.html#checking-a-dataframe-object-for-compliance.
I think that reads a little strangely. It's the name of an organization, and a pretty long/cumbersome name at that (I list my affiliation as |
Sure, shall we go back to that then? @jbrockmendel what's your objection to dataframe_standard? I don't really mind, hardly anyone would see this method anyway, just hoping we can agree on something and document it |
"standard" implies everything else is non-standard, which has normative connotations. |
The whole spec is called dataframe API standard though? It's just what the thing is? I don't understand the reasoning, but do you like |
never mind, i dont care that much |
The
__dataframe_namespace__
method is currently modeled on the corresponding method in the array API. However, we're in a slightly different situation here, with a separate developer-facing API. There is a need to convert to/from theDataFrame
and namespace from a non-standard-compliant user facing dataframe object. We already have the reverse, but this bit is missing - see #156 (comment).Probably the pattern should be:
when going from public API in a df-using library to the standard-using internals. Inside the internals there should then be only usage of the standard-compliant dataframe ideally.
The text was updated successfully, but these errors were encountered: