This project is a native Haskell reader and writer for the Apache ORC file format. It supports reading and writing all types with snappy, zlib, and zstd compression (LZO and LZ4 are not currently supported).
We have property-based round-trip tests for ORC files, and golden tests for the examples from the ORC specification. All example files in the ORC repository work, apart from the LZO- and LZ4-encoded ones, and large files from the TPC-DS benchmarks can be processed.
This project is currently licensed permissively under the BSD 3-Clause licence.
We present a layered API using a withFile pattern. Most users will want to import Orc.Logical and use withOrcFile and putOrcFile.
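For readers unfamiliar with the withFile pattern, the sketch below illustrates the general idea using only `base`. It does not show this library's API: `withBinaryInput` and the file name `example.bin` are hypothetical names introduced for illustration, and the actual signatures of withOrcFile and putOrcFile should be taken from the Orc.Logical module itself.

```haskell
import Control.Exception (bracket)
import System.IO (Handle, IOMode (ReadMode), hClose, hFileSize, openBinaryFile)

-- | Hypothetical helper (not part of this library): open a file, pass the
-- handle to the callback, and close the handle afterwards, even if the
-- callback throws an exception.
withBinaryInput :: FilePath -> (Handle -> IO r) -> IO r
withBinaryInput path = bracket (openBinaryFile path ReadMode) hClose

main :: IO ()
main = do
  -- Create a small file so the example is self-contained.
  writeFile "example.bin" "ORC"
  -- The handle is only valid inside the callback.
  size <- withBinaryInput "example.bin" hFileSize
  print size
```

The point of the pattern is that the resource's lifetime is tied to the callback, so callers cannot forget to release it.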
One of the primary use cases for developing this library was to gather columnar data, which could be used as a C array. As such, we use Storable.Vector for column types, and gather entire stripes into memory.
This memory model differs from that of the C++ and Java versions, which seek through the file far more often but keep less data in memory.
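As a rough illustration of why storable vectors suit the C-array use case, the sketch below exposes a vector's contiguous buffer as a raw pointer. It assumes the `vector` package's `Data.Vector.Storable` module. The names `sumViaPointer` and `sumColumn` are hypothetical, and `sumViaPointer` is a Haskell stand-in for what would normally be a foreign C function taking a pointer and a length. None of this is taken from the library's own code.

```haskell
import qualified Data.Vector.Storable as VS
import Foreign.C.Types (CDouble)
import Foreign.Ptr (Ptr)
import Foreign.Storable (peekElemOff)

-- | Stand-in for a C function such as
--   double sum(const double *xs, int n);
-- It reads n elements directly through the pointer.
sumViaPointer :: Ptr CDouble -> Int -> IO CDouble
sumViaPointer ptr n = go 0 0
  where
    go acc i
      | i >= n = pure acc
      | otherwise = do
          x <- peekElemOff ptr i
          let acc' = acc + x
          acc' `seq` go acc' (i + 1)

-- | Hypothetical helper: hand a column's underlying buffer to the
-- pointer-based routine without copying. The pointer must not escape
-- the callback passed to unsafeWith.
sumColumn :: VS.Vector CDouble -> IO CDouble
sumColumn column =
  VS.unsafeWith column $ \ptr -> sumViaPointer ptr (VS.length column)

main :: IO ()
main = do
  let column = VS.fromList [1.5, 2.5, 4.0]
  total <- sumColumn column
  print total
```

Because the data is already laid out contiguously, passing a column to C this way avoids a marshalling copy, at the cost of holding the whole column in memory.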