diff --git a/README.md b/README.md
index 3cc3b79..e0162b1 100644
--- a/README.md
+++ b/README.md
@@ -1,30 +1,27 @@
-# (Poor man) Python ORC Reader
+# Python ORC Reader
 
 ## What is it?
 
-This is my attempt to write an ORC reader in python. The situation is that we have a lot of ORC files on local disk to consume
-by Python but there is no efficient way to access the file without converting it to CSV or compatible format.
+This is my attempt to write an ORC reader in python. The situation is that we have a lot of ORC files on local disk to consume by Python but there is no efficient way to access the file without converting it to CSV or compatible format.
 
 My approach is to use [orc-core](https://orc.apache.org/docs/core-java.html) java library to read ORC file, then use [py4j](https://github.com/bartdag/py4j) to bridge between Python and Java.
 
-I call it poor man because it may not be a proper approach. This approach may not work or may suffer from performance issue
-due to overhead. The proper approach would be using C++ reader from orc-core library. I want to go through this as an
-exercise to know more about ORC and py4j.
+This approach is not yet validated and or may suffer from performance issue due to overhead. The proper approach would be using C++ reader from orc-core library. I want to go through this as an exercise to know more about ORC and py4j.
 
 ## Installation
 
 Until this package is available on PIP, you will have to install the package as following:
 
-1. Compile java gateway
+Compile java gateway
 
 ``` bash
 cd java-gateway
 mvn clean compile assembly:single
 ```
 
-2. Run setup.py script to install the package to the system
+Run setup.py script to install the package to the system
 
 ``` bash
 cd python