README.md (5 additions & 8 deletions)
@@ -1,30 +1,27 @@
-# (Poor man) Python ORC Reader
+# Python ORC Reader
 
 ## What is it?
 
-This is my attempt to write an ORC reader in python. The situation is that we have a lot of ORC files on local disk to consume
-by Python but there is no efficient way to access the file without converting it to CSV or compatible format.
+This is my attempt to write an ORC reader in python. The situation is that we have a lot of ORC files on local disk to consume by Python but there is no efficient way to access the file without converting it to CSV or compatible format.
 
 My approach is to use [orc-core](https://orc.apache.org/docs/core-java.html) java library to read ORC file, then use
 [py4j](https://github.com/bartdag/py4j) to bridge between Python and Java.
 
-I call it poor man because it may not be a proper approach. This approach may not work or may suffer from performance issue
-due to overhead. The proper approach would be using C++ reader from orc-core library. I want to go through this as an
-exercise to know more about ORC and py4j.
+This approach is not yet validated and or may suffer from performance issue due to overhead. The proper approach would be using C++ reader from orc-core library. I want to go through this as an exercise to know more about ORC and py4j.
 
 
 ## Installation
 
 Until this package is available on PIP, you will have to install the package as following:
 
-1.Compile java gateway
+Compile java gateway
 
 ```bash
 cd java-gateway
 mvn clean compile assembly:single
 ```
 
-2.Run setup.py script to install the package to the system
+Run setup.py script to install the package to the system
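
For readers unfamiliar with the bridge described under "What is it?", the following is a minimal sketch of how Python might call orc-core through py4j. It is not this repository's code. It assumes a py4j `GatewayServer` is already running on the default port with orc-core and Hadoop on its classpath, and the function names `open_gateway` and `describe_orc_file` are hypothetical.

```python
"""Sketch: read ORC metadata from Python through a py4j gateway.

Assumptions (not from the README): a py4j GatewayServer is already running
on the default port, and its JVM has orc-core and Hadoop on the classpath.
"""


def open_gateway():
    """Connect to an already running py4j GatewayServer (default port)."""
    # Imported lazily so describe_orc_file can be exercised without py4j.
    from py4j.java_gateway import JavaGateway

    return JavaGateway()


def describe_orc_file(gateway, path):
    """Return the schema string and row count of the ORC file at `path`."""
    jvm = gateway.jvm
    # Each call below is a round trip to the JVM, which is the kind of
    # overhead the README warns about.
    conf = jvm.org.apache.hadoop.conf.Configuration()
    reader = jvm.org.apache.orc.OrcFile.createReader(
        jvm.org.apache.hadoop.fs.Path(path),
        jvm.org.apache.orc.OrcFile.readerOptions(conf),
    )
    return {
        "schema": reader.getSchema().toString(),
        "rows": reader.getNumberOfRows(),
    }
```

With a gateway running, `describe_orc_file(open_gateway(), "/path/to/file.orc")` would return a dictionary holding the schema string and row count; the path here is a placeholder. Passing the gateway in as an argument keeps the helper easy to test with a stand-in object.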