progval/datafusion-orc

datafusion-orc

An implementation of ORC file format reading and writing, using the Arrow in-memory format.

Read Apache ORC in Rust.

  • Read ORC files
  • Read stripes (converting protobuf metadata into memory regions)
  • Decode stripes (the math of decoding stripes into e.g. booleans, RLE runs, etc.)
  • Decode ORC data to Arrow DataTypes (async/sync)
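
To make the "decode stripes" step more concrete, here is a minimal sketch of byte run-length decoding and boolean unpacking as described in the Apache ORC specification. It is an illustration only, not this crate's implementation: the function names `decode_byte_rle` and `decode_booleans` are hypothetical, and the sketch assumes the stream has already been decompressed.

```rust
/// Decode ORC byte run-length encoding (hypothetical helper, not this crate's API).
/// Returns `None` if the input ends in the middle of a run or literal group.
fn decode_byte_rle(input: &[u8]) -> Option<Vec<u8>> {
    let mut out = Vec::new();
    let mut i = 0;
    while i < input.len() {
        // The control byte is interpreted as a signed value.
        let control = input[i] as i8;
        i += 1;
        if control >= 0 {
            // Run: the next byte repeats (control + 3) times, i.e. 3..=130 times.
            let value = *input.get(i)?;
            i += 1;
            out.extend(std::iter::repeat(value).take(control as usize + 3));
        } else {
            // Literals: the next -control bytes (1..=128) are copied verbatim.
            let len = -(control as i16) as usize;
            let literals = input.get(i..i + len)?;
            out.extend_from_slice(literals);
            i += len;
        }
    }
    Some(out)
}

/// Unpack `count` booleans from a byte-RLE stream (hypothetical helper).
/// Bits are read from the most significant to the least significant bit.
fn decode_booleans(input: &[u8], count: usize) -> Option<Vec<bool>> {
    let bytes = decode_byte_rle(input)?;
    if bytes.len() * 8 < count {
        return None;
    }
    Some(
        (0..count)
            .map(|n| (bytes[n / 8] >> (7 - n % 8)) & 1 == 1)
            .collect(),
    )
}
```

For example, `decode_byte_rle(&[0x61, 0x00])` expands to one hundred zero bytes, and `decode_booleans(&[0xff, 0x80], 8)` yields one `true` followed by seven `false` values.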

Current Support

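| Column Encoding | Read | Write | Arrow DataType |
| --- | --- | --- | --- |
| SmallInt, Int, BigInt | | | Int16, Int32, Int64 |
| Float, Double | | | Float32, Float64 |
| String, Char, and VarChar | | | Utf8 |
| Boolean | | | Boolean |
| TinyInt | | | Int8 |
| Binary | | | Binary |
| Decimal | | | |
| Date | | | Date32 |
| Timestamp | | | Timestamp(Nanosecond,_) |
| Timestamp instant | | | |
| Struct | | | Struct |
| List | | | List |
| Map | | | Map |
| Union | | | |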

Compression Support

| Compression | Read | Write |
| --- | --- | --- |
| None | | |
| ZLIB | | |
| SNAPPY | | |
| LZO | | |
| LZ4 | | |
| ZSTD | | |

Benchmark

Run cargo bench for simple benchmarks.
