Commit f507ff7

omniparser initial commit
Author: jf-tech
0 parents, commit f507ff7

122 files changed (+12233, -0 lines changed)

.github/workflows/ci.yml

+43
@@ -0,0 +1,43 @@
+name: CI
+
+on:
+  push:
+    branches: [ master ]
+  pull_request:
+    branches: [ master ]
+
+jobs:
+
+  build:
+    name: CI
+    runs-on: ubuntu-latest
+    steps:
+
+    - name: Set up Go 1.x
+      uses: actions/setup-go@v2
+      with:
+        go-version: ^1.14
+      id: go
+
+    - name: Check out code into the Go module directory
+      uses: actions/checkout@v2
+
+    - name: Get dependencies
+      run: |
+        go get -v -t -d ./...
+        if [ -f Gopkg.toml ]; then
+            curl https://raw.githubusercontent.com/golang/dep/master/install.sh | sh
+            dep ensure
+        fi
+
+    - name: Build
+      run: go build -v ./...
+
+    - name: Test
+      run: go test -v ./... -race -coverprofile=coverage.txt -covermode=atomic
+
+    - name: Codecov
+      uses: codecov/[email protected]
+      with:
+        # Repository upload token - get it from codecov.io. Required only for private repositories
+        token: b5b56be5-5c41-4e4f-b789-eb6d39d53b51

.gitignore

+22
@@ -0,0 +1,22 @@
+# Binaries for programs and plugins
+*.exe
+*.exe~
+*.dll
+*.so
+*.dylib
+op
+
+# Test binary, built with `go test -c`
+*.test
+
+# JetBrains IDE
+*.idea
+
+# Output of the go coverage tool, specifically when used with LiteIDE
+*.out
+
+# Dependency directories (remove the comment below to include it)
+# vendor/
+
+# Coverage
+coverage.txt

Dockerfile

+6
@@ -0,0 +1,6 @@
+FROM golang:1.14-alpine
+WORKDIR /omniparser
+COPY . .
+RUN go build -o cli/op cli/op.go
+RUN cli/op --help
+CMD cli/op server

LICENSE

+21
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2020 JF Technology
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

README.md

+130
@@ -0,0 +1,130 @@
+# omniparser
+![CI](https://github.com/jf-tech/omniparser/workflows/CI/badge.svg) [![codecov](https://codecov.io/gh/jf-tech/omniparser/branch/master/graph/badge.svg)](https://codecov.io/gh/jf-tech/omniparser) [![Go Report Card](https://goreportcard.com/badge/github.com/jf-tech/omniparser)](https://goreportcard.com/report/github.com/jf-tech/omniparser)
+
+A parser in native Golang that ingests and transforms input data of various formats (CSV, txt, XML, EDI, JSON)
+into desired JSON output based on a schema spec written in JSON.
+
+Golang Version: 1.14.2
+
+## Demo in Playground
+
+Use https://omniparser.herokuapp.com/ (might need to wait for a few seconds for heroku instance to wake up)
+for trying out schemas and inputs, yours and from sample library, to see how transform works.
+
+![](./cli/cmd/web/playground-demo.gif)
+
+Take a detailed look at samples here:
+- [json examples](./samples/omniv2/json)
+- [xml examples](./samples/omniv2/xml).
+
+## Simple Example (JSON -> JSON Transform)
+- Input:
+    ```
+    {
+        "order_id": "1234567",
+        "tracking_number": "1z9999999999999999",
+        "items": [
+            {
+                "item_sku": "ab123",
+                "item_price": 12.34,
+                "number_purchased": 5
+            },
+            {
+                "item_sku": "ck763-23",
+                "item_price": 3.12,
+                "number_purchased": 2
+            }
+        ]
+    }
+    ```
+- Schema:
+    ```
+    {
+        "parser_settings": {
+            "version": "omni.2.0",
+            "file_format_type": "json"
+        },
+        "transform_declarations": {
+            "FINAL_OUTPUT": { "xpath": ".", "object": {
+                "order_id": { "xpath": "order_id" },
+                "tracking_number": { "custom_func": {
+                    "name": "upper",
+                    "args": [ { "xpath": "tracking_number" } ]
+                }},
+                "items": { "array": [{ "xpath": "items/*", "object": {
+                    "sku": { "custom_func": {
+                        "name": "substring",
+                        "args": [
+                            { "custom_func": { "name": "upper", "args": [ { "xpath": "item_sku" }]}},
+                            { "const": "0", "_comment": "start index" },
+                            { "const": "5", "_comment": "sub length" }
+                        ]
+                    }},
+                    "total_price": { "custom_func": {
+                        "name": "javascript",
+                        "args": [
+                            { "const": "num * price" },
+                            { "const": "num:int" }, { "xpath": "number_purchased" },
+                            { "const": "price:float" }, { "xpath": "item_price" }
+                        ]
+                    }}
+                }}]}
+            }}
+        }
+    }
+    ```
+- Code:
+    ```
+    schema, err := omniparser.NewSchema("schema-name", strings.NewReader("..."))
+    if err != nil { ... }
+    transform, err := schema.NewTransform("input-name", strings.NewReader("..."), &transformctx.Ctx{})
+    if err != nil { ... }
+    if !transform.Next() { ... }
+    b, err := transform.Read()
+    if err != nil { ... }
+    fmt.Println(string(b))
+    ```
+- Output:
+    ```
+    {
+        "order_id": "1234567",
+        "tracking_number": "1Z9999999999999999",
+        "items": [
+            {
+                "sku": "AB123",
+                "total_price": "61.7"
+            },
+            {
+                "sku": "CK763",
+                "total_price": "6.24"
+            }
+        ]
+    }
+    ```
+
+## Why
+- No good ETL transform/parser library exists in Golang.
+- Even looking into Java and other languages, choices aren't many and all have limitations:
+    - [Smooks](https://www.smooks.org/) is dead, plus its EDI parsing/transform is too heavyweight, needing code-gen.
+    - [BeanIO](http://beanio.org/) can't deal with EDI input.
+    - [Jolt](https://github.com/bazaarvoice/jolt) can't deal with anything other than JSON input.
+    - [JSONata](https://jsonata.org/) still only JSON -> JSON transform.
+- Many of the parsers/transforms don't support streaming read, loading entire input into memory - not acceptable in some situations.
+
+## Requirements
+- Golang 1.14
+
+This is only needed for `javascript` engine integration. Please raise an issue if you think 1.14 is too high, and
+you don't need `javascript` custom_func. Then we may consider moving `javascript` custom_func into a separate
+extension repo/package; the rest of the library is just golang 1.12.
+
+## Recent Feature Additions
+- added trie based high performance `times.SmartParse`.
+- command line interface (one-off `transform` cmd or long-running http `server` mode).
+- javascript engine integration as a custom_func.
+- JSON stream parser.
+- Extensibility:
+    - Ability to provide custom functions.
+    - Ability to provide custom schema handler.
+    - Ability to customize the built-in omniv2 schema handler's parsing code.
+    - Ability to provide a new file format support to built-in omniv2 schema handler.
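
Note on the README's Simple Example: the schema asks for three operations (uppercase the tracking number, take the first five characters of the uppercased SKU, multiply quantity by price). The sketch below reproduces that result by hand to make the intent of the schema concrete. It does not use omniparser at all; it is plain standard-library Go, and the names `order`, `item`, `outOrder`, `outItem`, `substring`, and `transformOrder` are hypothetical. In particular, `substring` here is byte-based and clamps out-of-range arguments, which is an assumption and may differ from the library's `substring` custom_func.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
	"strings"
)

// item and order mirror the README's sample input (hypothetical type names).
type item struct {
	SKU    string  `json:"item_sku"`
	Price  float64 `json:"item_price"`
	Number int     `json:"number_purchased"`
}

type order struct {
	OrderID        string `json:"order_id"`
	TrackingNumber string `json:"tracking_number"`
	Items          []item `json:"items"`
}

// outItem and outOrder mirror the README's sample output.
type outItem struct {
	SKU        string `json:"sku"`
	TotalPrice string `json:"total_price"`
}

type outOrder struct {
	OrderID        string    `json:"order_id"`
	TrackingNumber string    `json:"tracking_number"`
	Items          []outItem `json:"items"`
}

// substring returns up to length bytes of s starting at start.
// Out-of-range arguments are clamped (an assumption made for this sketch).
func substring(s string, start, length int) string {
	if start < 0 {
		start = 0
	}
	if start > len(s) {
		start = len(s)
	}
	end := start + length
	if end > len(s) {
		end = len(s)
	}
	if end < start {
		end = start
	}
	return s[start:end]
}

// transformOrder applies by hand the three operations the sample schema declares.
func transformOrder(input string) (outOrder, error) {
	var in order
	if err := json.Unmarshal([]byte(input), &in); err != nil {
		return outOrder{}, err
	}
	out := outOrder{
		OrderID:        in.OrderID,
		TrackingNumber: strings.ToUpper(in.TrackingNumber),
		Items:          []outItem{},
	}
	for _, it := range in.Items {
		total := float64(it.Number) * it.Price // "num * price" in the schema
		out.Items = append(out.Items, outItem{
			SKU:        substring(strings.ToUpper(it.SKU), 0, 5),
			TotalPrice: strconv.FormatFloat(total, 'f', -1, 64),
		})
	}
	return out, nil
}

func main() {
	const input = `{
		"order_id": "1234567",
		"tracking_number": "1z9999999999999999",
		"items": [
			{"item_sku": "ab123", "item_price": 12.34, "number_purchased": 5},
			{"item_sku": "ck763-23", "item_price": 3.12, "number_purchased": 2}
		]
	}`
	out, err := transformOrder(input)
	if err != nil {
		fmt.Println("transform failed:", err)
		return
	}
	b, _ := json.MarshalIndent(out, "", "    ")
	fmt.Println(string(b))
}
```

Every rule is hard-coded in this version, which is the coupling a schema-driven transform is meant to avoid: with the library, changing the output shape means editing the JSON schema rather than the Go code.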

cli.sh

+3
@@ -0,0 +1,3 @@
+#!/bin/bash
+SCRIPT_DIR=$(cd "$(dirname "$0")" && pwd)
+go run "$SCRIPT_DIR/cli/op.go" "$@"

cli/cmd/rootCmd.go

+20
@@ -0,0 +1,20 @@
+package cmd
+
+import (
+	"github.com/spf13/cobra"
+)
+
+var rootCmd = &cobra.Command{
+	Use:  "op",
+	Long: "op is a CLI of omniparser that ingests data input (such as CSV/XML/JSON/EDI/etc) and transforms into desired output by a schema.",
+}
+
+func init() {
+	rootCmd.AddCommand(transformCmd)
+	rootCmd.AddCommand(serverCmd)
+}
+
+// Execute executes the root command.
+func Execute() error {
+	return rootCmd.Execute()
+}
