datafusion-orc

Implementation of Apache ORC file format read/write using the Arrow in-memory format

Read Apache ORC in Rust.

  • Read ORC files
  • Read stripes (the conversion from proto metadata to memory regions)
  • Decode stripes (the math of decoding stripes into, e.g., booleans, RLE runs, etc.)
  • Decode ORC data to Arrow data types (async and sync)
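To give a rough idea of what stripe decoding involves, here is a minimal Rust sketch of ORC byte run-length encoding and boolean unpacking as described in the ORC specification. It is not this crate's implementation: decode_byte_rle and decode_booleans are hypothetical names, and the sketch assumes the stream has already been decompressed.

```rust
/// Decode an ORC byte-RLE stream (hypothetical helper, not the crate's API).
/// Control byte 0..=127: a run of (control + 3) copies of the next byte.
/// Control byte -128..=-1 (read as i8): (-control) literal bytes follow.
fn decode_byte_rle(input: &[u8]) -> Result<Vec<u8>, String> {
    let mut out = Vec::new();
    let mut pos = 0;
    while pos < input.len() {
        let control = input[pos] as i8;
        pos += 1;
        if control >= 0 {
            // Run: the minimum run length is 3, so it is added back here.
            let len = control as usize + 3;
            let value = *input.get(pos).ok_or("truncated run")?;
            pos += 1;
            out.extend(std::iter::repeat(value).take(len));
        } else {
            // Literals: the negated control byte is the number of raw bytes.
            let len = (-(control as i32)) as usize;
            let end = pos + len;
            let literals = input.get(pos..end).ok_or("truncated literals")?;
            out.extend_from_slice(literals);
            pos = end;
        }
    }
    Ok(out)
}

/// Booleans are byte-RLE encoded, then unpacked most significant bit first.
fn decode_booleans(input: &[u8], count: usize) -> Result<Vec<bool>, String> {
    let bytes = decode_byte_rle(input)?;
    let bits: Vec<bool> = bytes
        .iter()
        .flat_map(|&b| (0..8).rev().map(move |i| (b >> i) & 1 == 1))
        .take(count)
        .collect();
    if bits.len() < count {
        return Err("not enough bits for requested count".to_string());
    }
    Ok(bits)
}

fn main() {
    // 0x61 = 97, so this is a run of 97 + 3 = 100 zero bytes.
    let run = decode_byte_rle(&[0x61, 0x00]).unwrap();
    assert_eq!(run.len(), 100);

    // 0xfe = -2 as i8, so two literal bytes follow.
    let literals = decode_byte_rle(&[0xfe, 0x44, 0x45]).unwrap();
    assert_eq!(literals, vec![0x44, 0x45]);

    // 0xff = -1: one literal byte 0b1010_0000, read MSB first.
    let bools = decode_booleans(&[0xff, 0b1010_0000], 4).unwrap();
    println!("{:?}", bools); // prints [true, false, true, false]
}
```

Integer columns (SmallInt, Int, BigInt) use the more involved RLE v1/v2 encodings, which follow the same idea of a header that selects between runs and literal groups.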

Current Support

Column Encoding Read Write Arrow DataType
SmallInt, Int, BigInt Int16, Int32, Int64
Float, Double Float32, Float64
String, Char, and VarChar Utf8
Boolean Boolean
TinyInt Int8
Binary Binary
Decimal
Date Date32
Timestamp Timestamp(Nanosecond,_)
Timestamp instant
Struct Struct
List List
Map Map
Union
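The Timestamp row maps ORC timestamps to nanosecond Arrow timestamps. The following is a minimal sketch of that conversion, assuming the ORC specification's encoding: a seconds value relative to 2015-01-01 00:00:00, plus a nanosecond value whose trailing decimal zeros are packed into the low 3 bits. The names are hypothetical, the sketch treats the base as UTC (as for Timestamp instant), and it ignores writer time zones and any special handling of negative timestamps.

```rust
/// Seconds from 1970-01-01 to 2015-01-01 (00:00:00 UTC), the ORC timestamp base.
const ORC_EPOCH_OFFSET_SECS: i64 = 1_420_070_400;

/// Undo ORC's nanosecond packing (hypothetical helper).
/// Low 3 bits = 0: the remaining bits are the nanoseconds as-is.
/// Low 3 bits = z (1..=7): z + 1 trailing zeros were stripped by the writer.
fn parse_nanos(serialized: u64) -> i64 {
    let zeros = (serialized & 0b111) as u32;
    let value = (serialized >> 3) as i64;
    if zeros == 0 {
        value
    } else {
        value * 10_i64.pow(zeros + 1)
    }
}

/// Combine ORC seconds and packed nanos into nanoseconds since the Unix epoch,
/// assuming UTC and no overflow (extreme dates would overflow i64 here).
fn to_arrow_nanos(orc_seconds: i64, serialized_nanos: u64) -> i64 {
    (orc_seconds + ORC_EPOCH_OFFSET_SECS) * 1_000_000_000 + parse_nanos(serialized_nanos)
}

fn main() {
    // 13 = (1 << 3) | 5: value 1 with 5 + 1 = 6 zeros stripped, i.e. 1 ms.
    assert_eq!(parse_nanos(13), 1_000_000);
    println!("{}", to_arrow_nanos(0, 13)); // prints 1420070400001000000
}
```

ORC Date values, by contrast, are already stored as days since 1970-01-01, which is the same unit as Arrow's Date32.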

Compression Support

Compression Read Write
None
ZLIB
SNAPPY
LZO
LZ4
ZSTD
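When a file uses a codec other than None, the ORC specification splits each stream into chunks. Each chunk is prefixed by a 3-byte little-endian header whose value is length * 2 + isOriginal, where "original" means that chunk was stored uncompressed. Below is a minimal sketch of walking those headers before each body would be handed to the codec (ZLIB, SNAPPY, LZO, LZ4, ZSTD). The names are hypothetical, and decompression itself is omitted.

```rust
/// Parsed 3-byte ORC compression chunk header (hypothetical type).
#[derive(Debug, PartialEq)]
struct ChunkHeader {
    length: usize,
    is_original: bool,
}

/// Header value = length * 2 + is_original, stored little-endian in 3 bytes.
fn parse_chunk_header(bytes: [u8; 3]) -> ChunkHeader {
    let raw = u32::from(bytes[0]) | (u32::from(bytes[1]) << 8) | (u32::from(bytes[2]) << 16);
    ChunkHeader {
        length: (raw >> 1) as usize,
        is_original: raw & 1 == 1,
    }
}

/// Split a compressed stream into (header, body) pairs without decompressing.
fn split_chunks(stream: &[u8]) -> Result<Vec<(ChunkHeader, &[u8])>, String> {
    let mut chunks = Vec::new();
    let mut pos = 0;
    while pos < stream.len() {
        let h = stream.get(pos..pos + 3).ok_or("truncated chunk header")?;
        let header = parse_chunk_header([h[0], h[1], h[2]]);
        pos += 3;
        let body = stream
            .get(pos..pos + header.length)
            .ok_or("truncated chunk body")?;
        pos += header.length;
        chunks.push((header, body));
    }
    Ok(chunks)
}

fn main() {
    // Example from the ORC spec: 100,000 compressed bytes -> [0x40, 0x0d, 0x03].
    assert_eq!(
        parse_chunk_header([0x40, 0x0d, 0x03]),
        ChunkHeader { length: 100_000, is_original: false }
    );
    // A 5-byte chunk stored uncompressed: 5 * 2 + 1 = 11 = 0x0b.
    let chunks = split_chunks(&[0x0b, 0x00, 0x00, 1, 2, 3, 4, 5]).unwrap();
    println!("{:?}", chunks[0]);
}
```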

Benchmark

Run cargo bench for simple benchmarks.