Writing Data

You can write data by constructing an instance of the ParquetWriter class or by using one of the helper methods. In the simplest case, to write a sample dataset to c:\data\output.parquet, you would write the following code:

using System.IO;
using Parquet;
using Parquet.Data;

var ds = new DataSet(
	new DataField<int>("id"),
	new DataField<string>("city")
);

ds.Add(1, "London");
ds.Add(2, "Derby");

using(Stream fileStream = File.OpenWrite("c:\\data\\output.parquet"))
{
	using(var writer = new ParquetWriter(fileStream))
	{
		writer.Write(ds);
	}
}

DataSet is a rich structure representing the data and is used extensively in Parquet.Net.

You can do the same thing more simply with a static helper method:

using(Stream fileStream = File.OpenWrite("c:\\data\\output.parquet"))
{
	ParquetWriter.Write(ds, fileStream);
}

Setting writer options

You can set writer options by passing a WriterOptions instance to the Write methods. At the moment the following options are available:

  • RowGroupsSize: the number of rows in a row group when writing. Parquet can split a file into row groups, which can reduce the amount of RAM needed when querying the file in systems such as Apache Spark. By default, the row group size is 5000 rows.
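
As a minimal sketch of how the option might be passed, the example below writes the same sample dataset with a smaller row group size. It assumes that RowGroupsSize is a settable property on WriterOptions and that the static Write helper accepts the options through an optional parameter named writerOptions; neither detail is confirmed above, so check the signatures in your version of Parquet.Net. The value 1000 is arbitrary.

```csharp
using System.IO;
using Parquet;
using Parquet.Data;

var ds = new DataSet(
	new DataField<int>("id"),
	new DataField<string>("city")
);

ds.Add(1, "London");
ds.Add(2, "Derby");

// Assumption: RowGroupsSize is a settable property; 1000 is an arbitrary illustrative value.
var writerOptions = new WriterOptions { RowGroupsSize = 1000 };

using(Stream fileStream = File.OpenWrite("c:\\data\\output.parquet"))
{
	// Assumption: the helper takes the options via an optional parameter named writerOptions.
	ParquetWriter.Write(ds, fileStream, writerOptions: writerOptions);
}
```

A smaller row group size generally means more row groups in the file, so the right value depends on how the file will be queried afterwards.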