feat: add `all`, `any` and `null_count` Spark Expressions #1724
Conversation
Thank you very much for looking into this! 🙏
Hey @EdAbati, took an initial swing at implementing the …
Thanks for updating! :)
I left another couple of small comments. IMO we can just merge `all`, `any` and `null_count` here and worry about the rest in a follow-up.
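For context, here's a minimal sketch of how `any` and `all` could be expressed as PySpark aggregates, mirroring the `_null_count` pattern in the diff below (the helper names and the use of `bool_or`/`bool_and`, which need pyspark >= 3.5, are my assumptions, not necessarily the PR's exact code):

```python
from pyspark.sql import Column
from pyspark.sql import functions as F

def _any(_input: Column) -> Column:
    # True if at least one value in the column is true.
    return F.bool_or(_input)

def _all(_input: Column) -> Column:
    # True only if every value in the column is true.
    return F.bool_and(_input)
```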
narwhals/_spark_like/expr.py (Outdated)

```python
return self._from_call(_null_count, "null_count", returns_scalar=True)

def replace_strict(
```
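For reference, a possible shape of the `_null_count` helper used above; this is a sketch assuming an indicator-sum approach that also works on older PySpark versions, not necessarily the PR's actual implementation:

```python
from pyspark.sql import Column
from pyspark.sql import functions as F

def _null_count(_input: Column) -> Column:
    # Count nulls by summing a 0/1 indicator over the column.
    return F.sum(F.when(_input.isNull(), 1).otherwise(0))
```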
I am tempted to say that this should not be implemented for now and should just raise a `NotImplementedError` (as we do in Dask).
We would need to be able to access the dataframe (and collect the results) to get the distinct values of the column.
@FBruzzesi and @MarcoGorelli any thoughts?
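Something like this minimal sketch, if we go the same route as Dask (the signature below is illustrative only, not the final API):

```python
from typing import Any, Sequence

def replace_strict(
    self, old: Sequence[Any], new: Sequence[Any], *, return_dtype: Any = None
) -> Any:
    # Verifying that every distinct value of the column is covered by `old`
    # would require collecting the dataframe, which an expression cannot do,
    # so refuse to implement this for now (mirroring the Dask backend).
    msg = "`replace_strict` is not yet supported for the PySpark backend."
    raise NotImplementedError(msg)
```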
> I am tempted to say that this should not be implemented for now and just raise a NotImplementedError. (as we do in Dask)
Sure, we can evaluate if and how to support `replace_strict` later on. Super excited to ship the rest for now 🙌🏼
agree!
Co-authored-by: Edoardo Abati <[email protected]>
Thanks everyone, committed the changes above. Let me know if anything else needs to be changed.
awesome, thanks! just looks like there are some merge conflicts
Might've committed some git crimes resolving that merge conflict :') apologies in advance
```python
result = df.select(nw.all().any())
result = df.select(nw.col("a", "b", "c").any())
```
@MarcoGorelli pyspark fails to maintain the original name with `nw.all().<expression_method>`. @EdAbati maybe you know why and where this is happening already.
Here it would result in:

```
FAILED tests/expr_and_series/any_all_test.py::test_any_all[pyspark] - AssertionError: Expected column name a at index 0, found bool_or(a)
```
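For what it's worth, this looks like PySpark's default naming for aggregate expressions. A small repro sketch (assuming pyspark >= 3.5, where `bool_or` is available; aliasing back to the source column is one possible fix, not necessarily the one narwhals should take):

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(True,), (False,)], ["a"])

# PySpark names the aggregate after the function that produced it:
print(df.agg(F.bool_or("a")).columns)             # ['bool_or(a)']

# Aliasing restores the original column name:
print(df.agg(F.bool_or("a").alias("a")).columns)  # ['a']
```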
Let's ship this! 🚀 Thanks @lucas-nelson-uiuc, and apologies for such a long back-and-forth on this PR
feat: add `all`, `any` and `null_count` Spark Expressions
If you have comments or can explain your changes, please do so below
Added the following methods to `SparkLikeExpr` and `SparkLikeNamespace`:

- `any`
- `all`
- `null_count`
- `any_horizontal`
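A quick usage sketch of the new expressions through the narwhals API (the column names and data here are made up for illustration):

```python
import narwhals as nw

def summarize(df_native):
    df = nw.from_native(df_native)
    aggregates = df.select(
        nw.col("a").any(),          # True if any value in "a" is true
        nw.col("b").all(),          # True only if every value in "b" is true
        nw.col("c").null_count(),   # number of nulls in "c"
    )
    # Row-wise OR across columns:
    row_wise = df.select(nw.any_horizontal(nw.col("a"), nw.col("b")))
    return aggregates.to_native(), row_wise.to_native()
```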
Copied the respective tests over. Couldn't run them without Java on my machine, but running them locally on their respective test datasets worked for me.
Let me know if anything needs to be updated!