Implementation of ORC file format read/write with Arrow in-memory format
Read Apache ORC in Rust.
- Read ORC files
- Read stripes (the conversion from proto metadata to memory regions)
- Decode stripes (the math of turning stripe data into values, e.g. booleans, RLE runs, etc.)
- Decode ORC data to Arrow DataTypes (async and sync)
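
To make the "decode stripes" step concrete, here is a minimal, std-only sketch of two encodings from the ORC specification: byte-level run-length encoding, and the MSB-first bit packing that boolean streams apply before that byte RLE. The function names `decode_byte_rle` and `unpack_booleans` are hypothetical and are not this crate's API; error handling is reduced to `String` messages for brevity.

```rust
/// Hypothetical helper: decode an ORC byte-RLE stream.
/// Each group starts with a control byte c, read as i8:
///   0..=127   => a run of c + 3 copies of the single byte that follows
///   -128..=-1 => -c literal bytes follow verbatim
fn decode_byte_rle(input: &[u8]) -> Result<Vec<u8>, String> {
    let mut out = Vec::new();
    let mut i = 0;
    while i < input.len() {
        let control = input[i] as i8;
        i += 1;
        if control >= 0 {
            // Run: repeat the next byte (control + 3) times.
            let run_len = control as usize + 3;
            let value = *input.get(i).ok_or("truncated run")?;
            i += 1;
            out.extend(std::iter::repeat(value).take(run_len));
        } else {
            // Literals: copy the next -control bytes (widen first so -128 is safe).
            let lit_len = (-(control as i16)) as usize;
            let end = i + lit_len;
            if end > input.len() {
                return Err("truncated literals".to_string());
            }
            out.extend_from_slice(&input[i..end]);
            i = end;
        }
    }
    Ok(out)
}

/// Hypothetical helper: unpack booleans stored 8 per byte, most significant bit first,
/// keeping only the first `count` values (the last byte may be padded).
fn unpack_booleans(bytes: &[u8], count: usize) -> Vec<bool> {
    bytes
        .iter()
        .flat_map(|&byte| (0..8u32).map(move |bit| (byte >> (7 - bit)) & 1 == 1))
        .take(count)
        .collect()
}

fn main() {
    // [0xff, 0x80]: control 0xff (-1) => one literal byte 0x80 => bits 1000_0000.
    let bytes = decode_byte_rle(&[0xff, 0x80]).expect("valid stream");
    let bools = unpack_booleans(&bytes, 8);
    println!("{:?}", bools); // [true, false, false, false, false, false, false, false]
}
```

Layering the bit packing under the byte RLE means long runs of identical flags (such as all-present null masks) collapse to a couple of bytes.
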

Column Encoding | Read | Write | Arrow DataType |
---|---|---|---|
SmallInt, Int, BigInt | ✓ | | Int16, Int32, Int64 |
Float, Double | ✓ | | Float32, Float64 |
String, Char, and VarChar | ✓ | | Utf8 |
Boolean | ✓ | | Boolean |
TinyInt | ✓ | | Int8 |
Binary | ✓ | | Binary |
Decimal | ✗ | | |
Date | ✓ | | Date32 |
Timestamp | ✓ | | Timestamp(Nanosecond,_) |
Timestamp instant | ✗ | | |
Struct | ✓ | | Struct |
List | ✓ | | List |
Map | ✓ | | Map |
Union | ✗ | | |
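
The Timestamp row is the least obvious mapping. Per the ORC specification, a timestamp column stores seconds relative to 2015-01-01 00:00:00 in one stream and nanoseconds in another, with trailing decimal zeros of the nanoseconds folded into the low 3 bits. The sketch below shows how those two values could be combined into nanoseconds since the Unix epoch, as `Timestamp(Nanosecond, _)` expects. It assumes a UTC writer timezone and non-negative seconds, and ignores the timezone and pre-base edge cases a real reader must handle; `decode_nanos`, `to_arrow_nanos` and `ORC_EPOCH_OFFSET_SECS` are hypothetical names, not this crate's API.

```rust
/// Seconds from the Unix epoch to ORC's timestamp base, 2015-01-01 00:00:00 UTC.
/// Assumption: the writer timezone is UTC.
const ORC_EPOCH_OFFSET_SECS: i64 = 1_420_070_400;

/// Undo ORC's trailing-zero compression of the nanosecond stream.
/// Low 3 bits = z. If z == 0 the remaining bits are the exact value;
/// otherwise the remaining bits were divided by 10^(z + 1) before storing.
fn decode_nanos(encoded: u64) -> u32 {
    let zeros = encoded & 0b111;
    let mut value = encoded >> 3;
    if zeros != 0 {
        // Multiply back by 10, (zeros + 1) times.
        for _ in 0..=zeros {
            value *= 10;
        }
    }
    value as u32
}

/// Combine seconds (relative to the ORC base) and decoded nanos into
/// nanoseconds since the Unix epoch. Assumes non-negative ORC seconds.
fn to_arrow_nanos(orc_seconds: i64, nanos: u32) -> i64 {
    (orc_seconds + ORC_EPOCH_OFFSET_SECS) * 1_000_000_000 + nanos as i64
}

fn main() {
    // 500 ms is stored as 5 with z = 7 in the low bits: (5 << 3) | 7 = 47.
    let nanos = decode_nanos(47);
    assert_eq!(nanos, 500_000_000);
    println!("{}", to_arrow_nanos(0, nanos)); // 1420070400500000000
}
```
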

Compression | Read | Write |
---|---|---|
None | ✓ | ✗ |
ZLIB | ✓ | ✗ |
SNAPPY | ✓ | ✗ |
LZO | ✓ | ✗ |
LZ4 | ✓ | ✗ |
ZSTD | ✓ | ✗ |
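
When a file uses any codec other than None, the ORC specification frames each compressed stream as a sequence of chunks, each preceded by a 3-byte little-endian header whose value is `chunk_length * 2 + is_original`; an "original" chunk is stored uncompressed because compressing it did not help. The sketch below parses only that framing. Actual decompression of non-original chunks would need codec implementations (e.g. Snappy or Zstandard crates) and is deliberately left out. `parse_chunk_header` and `split_chunks` are hypothetical names, not this crate's API.

```rust
/// Parse the 3-byte little-endian chunk header.
/// Returns (chunk_length, is_original); is_original => bytes are stored uncompressed.
fn parse_chunk_header(header: [u8; 3]) -> (usize, bool) {
    let value = u32::from(header[0]) | (u32::from(header[1]) << 8) | (u32::from(header[2]) << 16);
    ((value >> 1) as usize, value & 1 == 1)
}

/// Split a compressed stream into (is_original, chunk_bytes) pairs.
/// Non-original chunks would still need to be handed to the file's codec.
fn split_chunks(stream: &[u8]) -> Result<Vec<(bool, &[u8])>, String> {
    let mut chunks = Vec::new();
    let mut pos = 0;
    while pos < stream.len() {
        if pos + 3 > stream.len() {
            return Err("truncated chunk header".into());
        }
        let (len, original) = parse_chunk_header([stream[pos], stream[pos + 1], stream[pos + 2]]);
        pos += 3;
        let end = pos + len;
        if end > stream.len() {
            return Err("truncated chunk body".into());
        }
        chunks.push((original, &stream[pos..end]));
        pos = end;
    }
    Ok(chunks)
}

fn main() {
    // Two example headers: 100,000 compressed bytes, and 5 bytes kept uncompressed.
    println!("{:?}", parse_chunk_header([0x40, 0x0d, 0x03])); // (100000, false)
    println!("{:?}", parse_chunk_header([0x0b, 0x00, 0x00])); // (5, true)
}
```
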

Run `cargo bench` for simple benchmarks.