Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pref(rust!, python): Unify sort with SortOptions and SortMultipleOptions #15590

Merged
merged 40 commits into from
Apr 12, 2024

Conversation

CanglongCl
Copy link
Contributor

@CanglongCl CanglongCl commented Apr 11, 2024

Unify all sort, arg_sort and related functions in rust. Sort by single series functions use SortOptions as options whereas sort by multi series functions use SortMultipleOptions. Add related options for python side.

Closes #9820, closes #10365.

SortOptions and SortMultipleOptions

SortOptions

pub struct SortOptions {
    /// If true sort in descending order.
    /// Default `false`.
    pub descending: bool,
    /// Whether place null values last.
    /// Default `false`.
    pub nulls_last: bool,
    /// If true sort in multiple threads.
    /// Default `true`.
    pub multithreaded: bool,
    /// If true maintain the order of equal elements.
    /// Default `false`.
    pub maintain_order: bool,
}

Example:

let s = Series::new("a", [Some(5), Some(2), Some(3), Some(4), None].as_ref());
let sorted = s
    .sort(
        SortOptions::default()
            .with_order_descending(true)
            .with_nulls_last(true)
            .with_multithreaded(false),
    )
    .unwrap();
assert_eq!(
    sorted,
    Series::new("a", [Some(5), Some(4), Some(3), Some(2), None].as_ref())
);

SortMultipleOptions

pub struct SortMultipleOptions {
    /// Order of the columns. Default all `false``.
    pub descending: Vec<bool>,
    /// Whether place null values last. Default `false`.
    pub nulls_last: bool,
    /// Whether sort in multiple threads. Default `true`.
    pub multithreaded: bool,
    /// Whether maintain the order of equal elements. Default `false`.
    pub maintain_order: bool,
}

Example:

let df = df! {
    "a" => [Some(1), Some(2), None, Some(4), None],
    "b" => [Some(5), None, Some(3), Some(2), Some(1)]
}?;

let out = df
    .sort(
        ["a", "b"],
        SortMultipleOptions::default()
            .with_maintain_order(true)
            .with_multithreaded(false)
            .with_order_descendings([false, true])
            .with_nulls_last(true),
    )?;

let expected = df! {
    "a" => [Some(1), Some(2), Some(4), None, None],
    "b" => [Some(5), None, Some(2), Some(3), Some(1)]
}?;

assert_eq!(out, expected);

Rust API changes

Deprecation: DataFrame::sort_with_options.

New implementation:

For DataFrame

pub fn sort_in_place(
    &mut self,
    by: impl IntoVec<SmartString>,
    sort_options: SortMultipleOptions,
)  -> PolarsResult<&mut Self>
pub fn sort(
    &self,
    by: impl IntoVec<SmartString>,
    sort_options: SortMultipleOptions,
) -> PolarsResult<Self>
pub fn top_k(
    &self,
    k: usize,
    by_column: impl IntoVec<SmartString>,
    sort_options: SortMultipleOptions,
) -> PolarsResult<DataFrame>

For LazyFrame

pub fn sort(self, by: impl IntoVec<SmartString>, sort_options: SortMultipleOptions) -> Self
pub fn sort_by_exprs<E: AsRef<[Expr]>>(
    self,
    by_exprs: E,
    sort_options: SortMultipleOptions,
) -> Self
// as well as bottom k
pub fn top_k<E: AsRef<[Expr]>>(
    self,
    k: IdxSize,
    by_exprs: E,
    sort_options: SortMultipleOptions,
) -> Self

For Expr

pub fn sort(self, options: SortOptions) -> Self
pub fn sort_by<E: AsRef<[IE]>, IE: Into<Expr> + Clone>(
    self,
    by: E,
    sort_options: SortMultipleOptions,
) -> Expr
pub fn arg_sort(self, sort_options: SortOptions) -> Self

For Series

pub fn sort(&self, sort_options: SortOptions) -> PolarsResult<Self>

For Functions

pub fn arg_sort_by<E: AsRef<[Expr]>>(by: E, sort_options: SortMultipleOptions) -> Expr

Python API Changes

Added

  • nulls_last: bool = False
  • multithreaded: bool = True
  • maintain_order: bool = False

for all sort related apis if not exists.

@github-actions github-actions bot added breaking rust Change that breaks backwards compatibility for the Rust crate internal An internal refactor or improvement python Related to Python Polars rust Related to Rust Polars labels Apr 11, 2024
@CanglongCl CanglongCl changed the title refactor(rust!, python): Unify sort with SortOptions and SortMultipleOptions. pref(rust!, python): Unify sort with SortOptions and SortMultipleOptions. Apr 11, 2024
Copy link

codecov bot commented Apr 11, 2024

Codecov Report

Attention: Patch coverage is 80.57971% with 134 lines in your changes are missing coverage. Please review.

Project coverage is 81.10%. Comparing base (003f4f2) to head (83be101).
Report is 4 commits behind head on main.

❗ Current head 83be101 differs from pull request most recent head 31b6bbf. Consider uploading reports for the commit 31b6bbf to get more accurate results

Files Patch % Lines
crates/polars-ops/src/chunked_array/top_k.rs 43.24% 21 Missing ⚠️
...rs-core/src/chunked_array/ops/sort/arg_bottom_k.rs 74.54% 14 Missing ⚠️
.../polars-core/src/chunked_array/ops/sort/options.rs 86.25% 11 Missing ⚠️
...n/src/logical_plan/optimizer/simplify_functions.rs 0.00% 10 Missing ⚠️
...ates/polars-core/src/chunked_array/ops/sort/mod.rs 86.00% 7 Missing ⚠️
crates/polars-plan/src/logical_plan/tree_format.rs 0.00% 7 Missing ⚠️
...s-core/src/series/implementations/binary_offset.rs 0.00% 6 Missing ⚠️
...lars-core/src/series/implementations/dates_time.rs 0.00% 6 Missing ⚠️
...polars-core/src/series/implementations/datetime.rs 0.00% 6 Missing ⚠️
...polars-core/src/series/implementations/duration.rs 0.00% 6 Missing ⚠️
... and 12 more
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #15590      +/-   ##
==========================================
- Coverage   81.15%   81.10%   -0.06%     
==========================================
  Files        1367     1369       +2     
  Lines      174928   175204     +276     
  Branches     2530     2530              
==========================================
+ Hits       141971   142099     +128     
- Misses      32482    32629     +147     
- Partials      475      476       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link

codspeed-hq bot commented Apr 11, 2024

CodSpeed Performance Report

Merging #15590 will not alter performance

Comparing CanglongCl:sort (31b6bbf) with main (c8e26ca)

Summary

✅ 22 untouched benchmarks

Copy link
Member

@ritchie46 ritchie46 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this looks good. Can you do a rebase?

@CanglongCl CanglongCl changed the title pref(rust!, python): Unify sort with SortOptions and SortMultipleOptions. pref(rust!, python): Unify sort with SortOptions and SortMultipleOptions Apr 12, 2024
@ritchie46 ritchie46 merged commit 23791bd into pola-rs:main Apr 12, 2024
27 checks passed
@CanglongCl CanglongCl deleted the sort branch April 12, 2024 21:09
@wsyxbcl
Copy link
Contributor

wsyxbcl commented Apr 15, 2024

Regarding with_order_descendings in SortMultipleOptions, what's the preferred use case? For instance, if one aims to sort three columns ["a", "b", "c"] in a specific order, what should the input be?

@CanglongCl
Copy link
Contributor Author

CanglongCl commented Apr 15, 2024

Regarding with_order_descendings in SortMultipleOptions, what's the preferred use case? For instance, if one aims to sort three columns ["a", "b", "c"] in a specific order, what should the input be?

Just be consistent with the columns input. e.g. for ["a", "b", "c"] with order acceding, descending, ascending, input should be SortMultipleOptions::default().with_order_descendings([false, true, false])

If all a, b, c are using descending order, we can also use SortMultipleOptions::default().with_order_descending(true) to apply for all columns

@wsyxbcl
Copy link
Contributor

wsyxbcl commented Apr 16, 2024

@CanglongCl Oops, my mistake. I didn't notice the descending aspect and assumed the order referred to the sequence in which the columns were being sorted. Thanks for the response anyway!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking rust Change that breaks backwards compatibility for the Rust crate internal An internal refactor or improvement python Related to Python Polars rust Related to Rust Polars
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Make sort APIs consistent, ergonomic and extensible Unify sort and sort_with_options
3 participants