You can write data by constructing an instance of the ParquetWriter class or by using one of the helper classes. In the simplest case, to write a sample dataset to c:\data\output.parquet,
you would write the following code:
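using System.IO;
using Parquet;
using Parquet.Data;

// Define the schema: an integer "id" column and a string "city" column.
var ds = new DataSet(
    new DataField<int>("id"),
    new DataField<string>("city")
);

// Add two rows; values follow the order of the fields declared above.
ds.Add(1, "London");
ds.Add(2, "Derby");

// Open the destination file and write the dataset through a ParquetWriter.
using (Stream fileStream = File.OpenWrite("c:\\data\\output.parquet"))
{
    using (var writer = new ParquetWriter(fileStream))
    {
        writer.Write(ds);
    }
}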
DataSet is a rich structure representing the data and is used extensively in Parquet.Net.
You can also do the same thing more simply with a helper method:
using(Stream fileStream = File.OpenWrite("c:\\data\\output.parquet"))
{
    ParquetWriter.Write(ds, fileStream);
}
You can set some writer options by passing a WriterOptions instance to the Write
methods. At the moment, the available options are:
- RowGroupsSize: the size of a row group when writing. Parquet can internally split a file into row groups, which can reduce the amount of RAM needed when querying the file in systems like Apache Spark. By default, the row group size is set to 5000 rows.
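
As a rough sketch of how this might look, the snippet below passes a WriterOptions instance with a smaller row group size to the static helper. Two things here are assumptions rather than confirmed API: that RowGroupsSize can be set through an object initializer, and that the Write helper accepts the options through an optional parameter named writerOptions. Both may differ between Parquet.Net versions. The value of 1000 rows is purely illustrative.

```csharp
using System.IO;
using Parquet;
using Parquet.Data;

// Same sample dataset as in the examples above.
var ds = new DataSet(
    new DataField<int>("id"),
    new DataField<string>("city")
);
ds.Add(1, "London");
ds.Add(2, "Derby");

// Assumption: RowGroupsSize is a settable property.
// Illustrative value only: use row groups of 1000 rows instead of the default 5000.
var options = new WriterOptions { RowGroupsSize = 1000 };

using (Stream fileStream = File.OpenWrite("c:\\data\\output.parquet"))
{
    // Assumption: the helper exposes an optional parameter named writerOptions.
    ParquetWriter.Write(ds, fileStream, writerOptions: options);
}
```

With only two rows the setting has no visible effect; it matters once a dataset holds more rows than the configured row group size.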