Description
Hi there,
I have a question regarding the `chunked=True` option in `awswrangler.s3.read_parquet()`.
I'm looking to load parquet files from S3 in the most memory-efficient way possible. Our data has a differing number of rows per parquet file, but the same number of columns (11). I'd like the results from `read_parquet()` to come back as one pandas DataFrame per parquet file, i.e. if the `filter_query` matches 10 parquet files, I receive 10 pandas DataFrames in return. Passing a fixed chunk size to `chunked` works if the number of rows is the same every time, but with our data the row count differs from file to file, so hard-coding a chunk size isn't feasible.
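For reference, a rough sketch of what I'm doing (the bucket path is a placeholder, and I've left out the `filter_query` logic mentioned above):

```python
import awswrangler as wr

# Placeholder path; the real call also applies the filter_query described above.
path = "s3://my-bucket/my-dataset/"

# With chunked=True I expected to get back one DataFrame per parquet file.
for df in wr.s3.read_parquet(path=path, dataset=True, chunked=True):
    print(df.shape)
```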
The documentation says:
> If `chunked=True`, a new DataFrame will be returned for each file in your path/dataset.
However, it also seems to be choosing an arbitrary chunk size instead (in my case, chunks of 65536 rows).
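For now I can get per-file DataFrames by listing the objects myself and reading each one individually, roughly like the sketch below (again with a placeholder path, and assuming `list_objects` accepts a `suffix` filter), but I'd prefer to rely on `chunked=True` if it's meant to behave as documented.

```python
import awswrangler as wr

path = "s3://my-bucket/my-dataset/"  # placeholder

# Reading each parquet object on its own guarantees one DataFrame per file.
for key in wr.s3.list_objects(path, suffix=".parquet"):
    df = wr.s3.read_parquet(path=key)
    print(key, df.shape)
```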
Is there something I'm missing here? Thanks very much for your help!