Commit

Update README.md
nqbao authored Jul 7, 2016
1 parent bd0cf14 commit bb5c397
Showing 1 changed file with 5 additions and 8 deletions.
13 changes: 5 additions & 8 deletions README.md
@@ -1,30 +1,27 @@
# (Poor man) Python ORC Reader
# Python ORC Reader

## What is it?

This is my attempt to write an ORC reader in python. The situation is that we have a lot of ORC files on local disk to consume
by Python but there is no efficient way to access the file without converting it to CSV or compatible format.
This is my attempt to write an ORC reader in Python. We have a lot of ORC files on local disk that need to be consumed from Python, but there is no efficient way to access them without first converting them to CSV or another compatible format.

My approach is to use the [orc-core](https://orc.apache.org/docs/core-java.html) Java library to read ORC files, then use
[py4j](https://github.com/bartdag/py4j) to bridge between Python and Java.

I call it poor man because it may not be a proper approach. This approach may not work or may suffer from performance issue
due to overhead. The proper approach would be using C++ reader from orc-core library. I want to go through this as an
exercise to know more about ORC and py4j.
This approach is not yet validated and may suffer from performance issues due to the bridging overhead. The proper approach would be to use the C++ reader from the orc-core library. I want to go through this as an exercise to learn more about ORC and py4j.
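
To make the py4j bridge idea concrete, here is a minimal sketch of what a call from Python into orc-core could look like. It is not this package's actual API. It assumes a JVM is already running a py4j `GatewayServer` with orc-core and the Hadoop client classes on its classpath, and the helper names `connect` and `count_orc_rows` are made up for illustration.

```python
def connect():
    """Connect to an already running py4j GatewayServer on the default port."""
    # Imported lazily so count_orc_rows can be exercised without py4j installed.
    from py4j.java_gateway import JavaGateway
    return JavaGateway()


def count_orc_rows(gateway, orc_path):
    """Return (row_count, schema_string) for one ORC file.

    `gateway` is assumed to expose the JVM through `gateway.jvm`,
    as a py4j JavaGateway does.
    """
    jvm = gateway.jvm
    # orc-core works with Hadoop's Configuration and Path types.
    conf = jvm.org.apache.hadoop.conf.Configuration()
    path = jvm.org.apache.hadoop.fs.Path(orc_path)
    orc_file = jvm.org.apache.orc.OrcFile
    reader = orc_file.createReader(path, orc_file.readerOptions(conf))
    # Each call below is a round trip to the JVM, which is where the
    # bridging overhead mentioned above comes from.
    return reader.getNumberOfRows(), reader.getSchema().toString()
```

With a gateway running, `count_orc_rows(connect(), "example.orc")` would return the row count and the schema as a string. Reading actual row data would need many more JVM round trips, or batching on the Java side, which is the performance concern raised above.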


## Installation

Until this package is available via pip, you will have to install it as follows:

1. Compile java gateway
Compile java gateway

``` bash
cd java-gateway
mvn clean compile assembly:single
```

2. Run setup.py script to install the package to the system
Run setup.py script to install the package to the system

``` bash
cd python
