Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docs: Revise and rework user-guide/expressions #19360

Merged
merged 13 commits into from
Nov 8, 2024

Conversation

rodrigogiraoserrao
Copy link
Collaborator

@rodrigogiraoserrao rodrigogiraoserrao commented Oct 21, 2024

PLEASE do not review this PR by looking at the diff.
Checkout the PR and build the docs with mkdocs serve.

EDIT: See this comment of mine for outstanding work / possible improvements.

@rodrigogiraoserrao rodrigogiraoserrao changed the title docs: revise and rework user-guide/expressions docs: Revise and rework user-guide/expressions Oct 21, 2024
@github-actions github-actions bot added documentation Improvements or additions to documentation python Related to Python Polars rust Related to Rust Polars and removed title needs formatting labels Oct 21, 2024
Copy link

codecov bot commented Oct 21, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 79.71%. Comparing base (9c08893) to head (0359efa).
Report is 8 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #19360      +/-   ##
==========================================
- Coverage   79.72%   79.71%   -0.01%     
==========================================
  Files        1542     1542              
  Lines      212197   212197              
  Branches     2449     2449              
==========================================
- Hits       169169   169156      -13     
- Misses      42473    42486      +13     
  Partials      555      555              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@ritchie46
Copy link
Member

@stinodego could you review this one this week?

Copy link
Contributor

@stinodego stinodego left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See the points below for some general feedback. Take these as suggestions, not law :)

  • Overview page: There are quite a lot of subchapters - perhaps we can group them in a sensible way at least in the overview (e.g. put all the namespaces (struct, string, ...) together, put all operation types together (window, folds, ...), you get the picture).
  • Basic operations: n_unique feels a bit out of place here - it would fit better in the Aggregation section.
  • Expression expansion: The naming sections feels out of place - it's good to mention suffix/prefix but otherwise it's not related to expression expansion.
  • Expression expansion: We shouldn't list the full selector API in the user guide - we have the API reference for this.
  • Feels like plugins should be its own chapter - not part of Expressions.
  • NumPy functions: feels like this shouldn't be under expressions, rather in an 'interop' chapter somewhere.
  • A lot of Rust examples seem to be missing still.
  • Nice that we can better display math now!

@rodrigogiraoserrao
Copy link
Collaborator Author

rodrigogiraoserrao commented Nov 5, 2024

Thanks a lot for all of the review comments.
I dismissed w/o comment all of those that I fixed in accordance with your suggestions.

Some things I'd like to push back on:

  • Basic operations: n_unique feels a bit out of place here - it would fit better in the Aggregation section.

n_unique is currently even more out of place in “Functions”. Now it's grouped with a couple of other similar functions n_unique, approx_n_unique, value_counts, in a generic section that just lists a couple of random functions to get users started. That's how I look at the new “Basic operations” section.

  • Expression expansion: The naming sections feels out of place - it's good to mention suffix/prefix but otherwise it's not related to expression expansion.

Strongly disagree. The namespace name is only relevant when you're using expression expansion. In all other cases you just use .alias, right? Or what am I missing here..?
In other words, you only ever need to use name because you're using expression expansion and I think it makes total sense to teach both together.

  • Expression expansion: We shouldn't list the full selector API in the user guide - we have the API reference for this.

Yeah, true, but in here I'm grouping the selectors by purpose whereas the API reference just lists everything in alphabetical order.
I thought this could be useful.
If you really think it's not / if others think it's a bad idea, I'll just delete those tables.


As for the missing Rust examples, I wanted my prose / examples reviewed before I put in the effort to translate everything to Rust.
I can write, refactor, delete, recreate, etc the Python code very easily and quickly, but not the Rust code 😆

@rodrigogiraoserrao
Copy link
Collaborator Author

rodrigogiraoserrao commented Nov 5, 2024

As far as the NumPy page goes, should the “Ecosystem” chapter be named “Interoperability” and include NumPy there?
(In which case the current “Ecosystem” page could be split up a bit.)

@stinodego
Copy link
Contributor

n_unique is currently even more out of place in “Functions”.

Well, it just doesn't really feel like a 'basic' operation. approx_n_unique even less so. They are aggregations, we and we have an Aggregations section. So that just makes sense to me. But it's up to you.

The namespace name is only relevant when you're using expression expansion. In all other cases you just use .alias, right? Or what am I missing here..?

I'd argue doing something like pl.col("HELLO").name.lowercase() is perfectly valid outside of expression expansion. Also the struct methods name.map_fields are fine to use on a single Struct column. But as I said, it's just a suggestion - feel free to follow your own instincts here.

Yeah, true, but in here I'm grouping the selectors by purpose whereas the API reference just lists everything in alphabetical order.

Hm, looking at the API reference, we can definitely improve the selectors section. But sure, let's keep the overview in the user guide for now at least!

As for the missing Rust examples, I wanted my prose / examples reviewed before I put in the effort to translate everything to Rust.

Yes, definitely. Those can be done in another PR / by someone else. I was just pointing it out.

As far as the NumPy page goes, should the “Ecosystem” chapter be named “Interoperability” and include NumPy there?
(In which case the current “Ecosystem” page could be split up a bit.)

I don't think the purpose of the Ecosystem chapter really aligns with listing various library-specific interoperability quirks. I would envision an actual Interoperability chapter where we list the considerations for interop with various formats, e.g. which datatypes can be zero-copy to/from NumPy, PyArrow, etc.

@rodrigogiraoserrao
Copy link
Collaborator Author

I'm marking this PR as ready now because it just keeps growing and this already introduces quite some useful improvements.

Outstanding work / possible future improvements:

  • Move “Expressions > NumPy” into a new Interop chapter (cf. this review comment and ensuing conversation)
  • Improve selectors API reference so that we can remove the grouped/tables reference in “Expressions > Expression expansion” (cf. this review comment and ensuing conversation)
  • Consider extracting the subsection on naming in “Expressions > Expression expansion” into its own section on the namespace name while linking from “Expressions > Expression expansion” directly there (cf. this review comment and ensuing conversation)
    • Add note specifying that the namespace name works outside of expression expansion
  • Revise “Expressions > Plugins” as instructions there are not clear / trivial to follow
    • Consider moving into own chapter + add guide page on IO plugins (as requested on Discord already)
  • Revise page on user-defined Python functions
  • Add missing Rust examples

@stinodego stinodego marked this pull request as ready for review November 7, 2024 15:46
Copy link
Contributor

@stinodego stinodego left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good to merge if CI is green 👍

@stinodego stinodego merged commit 772d023 into pola-rs:main Nov 8, 2024
29 checks passed
@rodrigogiraoserrao rodrigogiraoserrao deleted the docs-revise-expressions branch November 8, 2024 10:53
@rodrigogiraoserrao rodrigogiraoserrao self-assigned this Nov 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation python Related to Python Polars rust Related to Rust Polars
Projects
None yet
3 participants