Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bump Polars to v0.41.3 #917

Merged
merged 32 commits into from
Jul 24, 2024
Merged

Bump Polars to v0.41.3 #917

merged 32 commits into from
Jul 24, 2024

Conversation

lkarthee
Copy link
Member

@lkarthee lkarthee commented Jun 2, 2024

No description provided.

Copy link
Member Author

@lkarthee lkarthee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finished in 3.5 seconds (3.5s async, 0.00s sync)
610 doctests, 1 property, 1485 tests, 71 failures, 20 excluded, 1 skipped

Randomized with seed 279745

Most of the test cases fail because:

  • pow dtype change
  • pivot_wider, pivot_longer
  • string slice
  • cross join param
  • window functions
  • commented few tests which are crashing polars

native/explorer/rust-toolchain.toml Show resolved Hide resolved
test/explorer/data_frame/grouped_test.exs Outdated Show resolved Hide resolved
@philss
Copy link
Member

philss commented Jul 19, 2024

We are almost there! But a problem regarding the cloud integration is still blocking this. I'm trying to fix it by updating object_store and our custom CloudWriter (Rust side), but I suspect that the problem is more complex due to some Polars' changes. I will keep trying :)

This makes the "ObjectStore" choose to upload using a single request or
a multi request (multi part) upload depending on the size of the chunks.

This change is needed because Polars is now calling the writer without
buffering, so this was breaking the upload.

There are two tests failing for different reasons:
- the IPC upload to a unknown bucket is failing because the new
  CloudWriter is not propagating the error. This is probably an easy
  fix.
- the Lazy "parquet to cloud" is failing for the reason I wrote above.
  This is probably related to this issue:

  pola-rs/polars#17172
This is an attempt to fix the issues with the sink_parquet_cloud.
The hope is that the new version has fixed the issue.
The idea is to avoid depending on shutting down only on "Drop". This is
also important because we need to verify if the file was created.
@philss
Copy link
Member

philss commented Jul 22, 2024

There is a good news and a bad news. The good news is that we have only one failing test! 🎉 🥳
But the bad news is that we may need to submit changes to Polars in order to fix the failing test.

In a nutshell, the problem is on the LazyFrame#sink_parquet_cloud function from Polars that is failing probably because of the reasons I wrote in this issue: pola-rs/polars#17172 (comment)

We can either wait for the fix to land on Polars, or deactivate the feature until there. This is going to affect people using the :streaming option of to_parquet/2 (cloud version). WDYT @josevalim @lkarthee @billylanchantin?


EDIT: I'm going to play with an idea to use the sink_parquet and another process to read the file, that can be a fifo/named pipe, and write to the cloud storage. I will let you know by the end of the day.

@josevalim
Copy link
Member

What do we for when streaming is not supported? We raise or we just do the best we can? I am worried Polars is not ultimately interested in maintaining these features. We have another related PR that is open for quite some time.

@philss
Copy link
Member

philss commented Jul 23, 2024

What do we for when streaming is not supported? We raise or we just do the best we can?

I think we can raise if it's not possible to support a given feature, but we can try to implement "hacks" when possible, until we need to remove the feature all together. Unfortunately we can't have our patches (via monkey patches) to the Polars codebase, and I don't think it's a good idea to maintain a fork to solve our issues. WDYT?

I am worried Polars is not ultimately interested in maintaining these features. We have another related PR that is open for quite some time.

Yeah, I'm worried too. Maybe it's not their focus to maintain this cloud integration in their Rust codebase.

@josevalim
Copy link
Member

Yeah, let's raise for now and add a note to the CHANGELOG. Meanwhile, if you could send a PR to them, it would be great!

The reason it didn't work is that the sink_parquet function will try to
create the file where the "fifo" file is located. This is not going to
work.
@philss
Copy link
Member

philss commented Jul 23, 2024

@josevalim OK, I will add a note to the change log later. My PoC didn't work out, but I was wondering if we could still "enable streaming", but first stream to a temporary file, and them read from that file and send to the cloud. This could keep the convenience of saving to S3 right from Explorer, with the downside of requiring some space in disk, and probably extra time. WDYT?

Meanwhile, if you could send a PR to them, it would be great!

Yeah, I will try that! Something similar to our CloudWriter may work.

@josevalim
Copy link
Member

Let’s just disable for now. People can stay in the old version until Polars is updated with a fix.

@philss philss changed the title Bump polars to v0.40 Bump Polars to v0.41.3 Jul 24, 2024
@lkarthee
Copy link
Member Author

+1 for disabling and we should document it in the changelog.

@philss philss marked this pull request as ready for review July 24, 2024 14:27
Copy link
Contributor

@billylanchantin billylanchantin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! I rewrote the changelog a tiny bit so each bullet was similarly worded (and fixed a typo).

Also I'll be honest, I skimmed the Rust code. But my Rust skills are nascent so I'm only sorta helpful on that side of the house anyway 😅. The Elixir side seems solid, though!

CHANGELOG.md Outdated Show resolved Hide resolved
Co-authored-by: Billy Lanchantin <[email protected]>
@billylanchantin billylanchantin self-requested a review July 24, 2024 16:23
@philss philss merged commit 59504fa into main Jul 24, 2024
4 checks passed
@philss philss deleted the ps-bump-polars-to-v0.39.2 branch July 24, 2024 17:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants