Issue with processing Parquet files that contain more than one RowGroup #364
Closed
andrewlyakh started this conversation in General
Replies: 2 comments
-
Sorry for the late reply. I understand the frustration, but the best way to fix this is to raise a PR. Unfortunately, the row-based API is not a priority for me right now, and I'm investing time in replacing it in upcoming versions because it is slow and unstable. It has been here since v1 but is becoming a pain to support. That said, if you can use the low-level API or class serialization, that will be the best choice moving forward.
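For reference, here is a minimal sketch of reading every row group through the low-level API instead of the row-based one. It assumes Parquet.Net 4.x, where `ParquetReader.CreateAsync`, `RowGroupCount`, `OpenRowGroupReader` and `ReadColumnAsync` are available; the file name `data.parquet` is a placeholder, and namespaces or member names may differ in other versions.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;
using Parquet;
using Parquet.Data;
using Parquet.Schema; // assumption: DataField lives here in recent 4.x releases

static class ReadAllRowGroups
{
    static async Task Main()
    {
        // "data.parquet" is a placeholder path, not a file from this thread.
        using Stream fs = File.OpenRead("data.parquet");
        using ParquetReader reader = await ParquetReader.CreateAsync(fs);

        DataField[] fields = reader.Schema.GetDataFields();

        // Visit each row group explicitly; no Table or Row objects are involved.
        for (int i = 0; i < reader.RowGroupCount; i++)
        {
            using ParquetRowGroupReader rowGroup = reader.OpenRowGroupReader(i);

            foreach (DataField field in fields)
            {
                DataColumn column = await rowGroup.ReadColumnAsync(field);
                Console.WriteLine($"group {i}, {field.Name}: {column.Data.Length} values");
            }
        }
    }
}
```

Because each column is read one row group at a time, this path does not go through `Table.Add(Row item)` at all.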
-
Thanks for the reply! We'll try to figure out our own solution for this case. If it works, I'll create a PR.
-
Hi!
We're trying to use this very useful library, but we've run into an issue parsing Parquet files that contain more than one RowGroup.
The ParquetExtensions.ReadAsTableAsync() function does not perform row validation for a file with a single row group (this is not a problem for us now).
But if the file contains more than one row group, row validation fails starting from the second group, because Table.Add(Row item) is called for its rows (screenshots 1-3).
Validation fails because the EnumerableType doesn't equal the runtime element type (screenshot 4). The TryExtractIEnumerableType extension method returns Object as the element type instead of Row, so the ValidateList() function throws ArgumentException($"expected a collection of...") (line 74).
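A minimal sketch of how this kind of mismatch can arise in plain .NET, independent of the library. `Row` here is a hypothetical stand-in class and `ElementTypeOf` is a hypothetical helper written for illustration; neither is the library's actual code. The assumption is only that the element type is taken from the collection's generic argument rather than from the items themselves.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the library's Row type.
sealed class Row { }

static class ElementTypeDemo
{
    // Hypothetical helper: returns T if the type implements IEnumerable<T>, else null.
    static Type ElementTypeOf(Type type) =>
        type.GetInterfaces()
            .Where(i => i.IsGenericType
                     && i.GetGenericTypeDefinition() == typeof(IEnumerable<>))
            .Select(i => i.GetGenericArguments()[0])
            .FirstOrDefault();

    static void Main()
    {
        // Every item is a Row, but the container is a List<object>.
        var rows = new List<object> { new Row(), new Row() };

        Type fromContainer = ElementTypeOf(rows.GetType()); // taken from the generic argument
        Type fromItems = rows[0].GetType();                 // taken from an actual item

        Console.WriteLine($"container element type: {fromContainer}");
        Console.WriteLine($"runtime item type:      {fromItems}");
        Console.WriteLine($"equal: {fromContainer == fromItems}");
    }
}
```

If the rows of the second group are held in a container typed on `object`, a strict equality check between the two types would reject them even though every item is a `Row`.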
We would be very grateful if this error could be fixed!
And again, thank you so much for this useful library!