Support for Microsoft.Data.Analysis was added in v4.8.
Because DataFrame is in general less functional than Parquet, only primitive (atomic) columns are supported at the moment. If DataFrame supports more functionality in the future (see the related links below), this integration can be extended.
When reading and writing, this integration will ignore any columns that are not atomic.
The conversion is handled under the hood; as a user you only need to call the WriteAsync() extension method on DataFrame and specify the destination stream to write to, like so:
DataFrame df; // an existing, populated DataFrame
await df.WriteAsync(stream); // stream is any writable System.IO.Stream
As with writing, the conversion is handled under the hood, so you can use the ReadParquetAsDataFrameAsync() extension method on System.IO.Stream to read a Parquet stream into a DataFrame:
DataFrame df = await fs.ReadParquetAsDataFrameAsync();
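To show how the two calls fit together, here is a minimal round-trip sketch that writes a small DataFrame to an in-memory stream and reads it back. The column names and values are made up for illustration, and the `using Parquet.Data.Analysis;` directive is an assumption about the namespace that holds the extension methods; adjust it to match the package version you have installed.

```csharp
using System.IO;
using Microsoft.Data.Analysis;
using Parquet.Data.Analysis; // assumption: namespace containing the DataFrame extension methods

// Build a DataFrame that contains only primitive (atomic) columns.
// "id" and "score" are hypothetical column names used for illustration.
var df = new DataFrame(
    new Int32DataFrameColumn("id", new[] { 1, 2, 3 }),
    new DoubleDataFrameColumn("score", new[] { 0.5, 1.5, 2.5 }));

// Write the DataFrame as Parquet into an in-memory stream.
using var ms = new MemoryStream();
await df.WriteAsync(ms);

// Rewind the stream so that reading starts from the beginning.
ms.Position = 0;

// Read the Parquet data back into a new DataFrame.
DataFrame roundTripped = await ms.ReadParquetAsDataFrameAsync();
```

A MemoryStream is used here only to keep the sketch self-contained; a FileStream or any other seekable stream should work the same way.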
For your convenience, a sample Jupyter notebook is available that demonstrates reading Parquet files into a DataFrame and displaying them.
To run this notebook, you can use VS Code with the Polyglot Notebooks extension.
- Original blog post "An Introduction to DataFrame".
- External GitHub Issues