feat: support casting to and from spark-like structs #1991
base: main
thanks!
😄 sorry could you elaborate please? |
Sure, sorry 😄 Ideally we would want to:

 def test_cast_struct(request: pytest.FixtureRequest, constructor: Constructor) -> None:
     if any(
-        backend in str(constructor) for backend in ("dask", "modin", "cudf", "pyspark")
+        backend in str(constructor) for backend in ("dask", "modin", "cudf")
     ):

However, pyspark converts the following input into a column of type

 data = {
     "a": [
         {"movie ": "Cars", "rating": 4.5},
         {"movie ": "Toy Story", "rating": 4.9},
     ]
 }

and conversion via cast is not supported. I didn't have time today, but I can add a dedicated test for pyspark which initializes a dataframe with a column that is already of type Struct, and then changes the types of its fields. Do you think that would be enough as a test?

(Here is the link to the above test) narwhals/tests/expr_and_series/cast_test.py Lines 238 to 240 in fd8ccac
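
For readers unfamiliar with the skip condition in the diff above: it works by substring-matching backend names against `str(constructor)`. Below is a minimal, standard-library-only sketch of that check. The two constructor functions and the `should_skip` helper are hypothetical stand-ins written for illustration; they are not the project's fixtures, and the sketch assumes constructors are plain functions whose names contain the backend name.

```python
# Backends still skipped after the proposed change ("pyspark" removed).
SKIPPED_BACKENDS = ("dask", "modin", "cudf")


def pyspark_lazy_constructor(data):  # hypothetical stand-in for a test constructor
    return data


def modin_constructor(data):  # hypothetical stand-in for a test constructor
    return data


def should_skip(constructor) -> bool:
    # str() of a plain function looks like "<function NAME at 0x...>",
    # so the backend is detected by a substring match on the function name.
    return any(backend in str(constructor) for backend in SKIPPED_BACKENDS)


print(should_skip(modin_constructor))
print(should_skip(pyspark_lazy_constructor))
```

With "pyspark" dropped from the tuple, a constructor whose name contains "pyspark" is no longer skipped, which is why the test input itself then has to work on that backend.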
|
sure thanks! |
I had already forgotten 🙈 pushed now! |
Great work! I had done something very similar on my side! For testing however, I had a slightly different strategy. Instead of creating a new test, I used the existing
As you can see, when the constructor is PySpark, we need to re-define the column However, I still had an issue when calling the last Have you seen the same thing when you run your test? |
Thanks @osoucy and I am sorry to hear we did duplicate work 🥲
Not really; locally I have no issue with your code either. If you fancy sharing your GitHub commit email, I can add you as a co-author |
Here is my email: [email protected] In that case, it must be an issue with my specific environment python vs pyspark vs pyarrow version. I'm glad it's only me! |
The one used for commits should be something like:
We did some refactor + new features, let us know if you keep having problems with the env in the future 🤔 |
Co-authored-by: Olivier Soucy <[email protected]>
Reason
There are multiple reasons for this PR to happen 😁
Schema.to_pyspark
nw.struct emulating pl.struct
.struct.unnest() and/or Frame.unnest
What type of PR is this? (check all applicable)
Related issues
SparkLike #1743
Checklist
If you have comments or can explain your changes, please do so below
I am having a hard time testing this 🤔