write_parquet encoding no longer recognized by PBI Service parquet connector after Polars 1.5.0 onwards #18819

Open
2 tasks done
darrylthom opened this issue Sep 18, 2024 · 4 comments
Labels: bug, needs triage, python

Comments

@darrylthom

darrylthom commented Sep 18, 2024

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

df.write_parquet("data.parquet")

Log output

No response

Issue description

Just to explain the setup a bit:
The parquet file gets written to a network drive. A report published to PBI Service connects to this parquet file through an on-premises gateway.

Refreshing works on a local copy of the PBI file, but through PBI Service specifically it now gives an error:
Data source error: {"error":{"code":"DM_GWPipeline_Gateway_MashupDataAccessError","pbi.error":{"code":"DM_GWPipeline_Gateway_MashupDataAccessError","parameters":{},"details":[{"code":"DM_ErrorDetailNameCode_UnderlyingErrorCode","detail":{"type":1,"value":"-2147467259"}},{"code":"DM_ErrorDetailNameCode_UnderlyingErrorMessage","detail":{"type":1,"value":"Parquet: class parquet::ParquetException (message: 'Unknown encoding type.'"}},{"code":"DM_ErrorDetailNameCode_UnderlyingHResult","detail":{"type":1,"value":"-2147467259"}},{"code":"Microsoft.Data.Mashup.ValueError.Reason","detail":{"type":1,"value":"DataFormat.Error"}}],"exceptionCulprit":1}}}

This refreshes fine locally -- the problem is PBI Service specifically. I tested generating my parquet files version to version from Polars 1.2 up until current, and I start getting these messages as of Polars 1.5.0's write_parquet specifically.

I believe something changed in the write_parquet output that makes it incompatible with the PBI Service's parquet connector in newer versions. I have analyzed the schema and the metadata, and they are exactly the same in the old output versus the new output.

Expected behavior

As nothing has changed in my schema or metadata, the files should be refreshing, but it seems that write_parquet's encoding is not recognized by PBI Service from 1.5.0 onwards.

Installed versions

polars 1.5.0 to 1.7.1 (tested version by version)
darrylthom changed the title from "write_parquet encoding not recognized by PBI Service parquet connector after Polars 1.5.0 onwards" to "write_parquet encoding no longer recognized by PBI Service parquet connector after Polars 1.5.0 onwards" on Sep 18, 2024
@coastalwhite
Collaborator

My guess is that this has to do with the Boolean Hybrid-RLE encoding.

@darrylthom
Author

> My guess is that this has to do with the Boolean Hybrid-RLE encoding.

Yes, this is it exactly. When I drop my boolean columns from my parquet file in the latest Polars, PBI Service refreshes the file successfully.

@ritchie46
Member

It seems that the service only supports older parquet formats/encodings. For now you can circumvent the issue by writing via pyarrow which allows you to select different encodings.

This is something we could also support to a limited extent.

@darrylthom
Author

darrylthom commented Oct 1, 2024

> It seems that the service only supports older parquet formats/encodings. For now you can circumvent the issue by writing via pyarrow which allows you to select different encodings.
>
> This is something we could also support to a limited extent.

Writing with pyarrow in the meantime worked. I tried raising an issue with the Power BI team, but it got stuck with their triaging vendor, who claimed the problem was with Polars rather than Power BI. They wouldn't escalate it to the product team and recommended that I downgrade instead.
