
Parser

Run setup.sh to install dependencies and build the parser.

The parser expects one document per line. Each line is a JSON object with a key field (an identifier for the document) and a content field (the document text). In the example below these fields are named item_id and content:

{ "item_id":"doc1", "content":"Here is the content of my document.\nAnd here's another line." }
{ "item_id":"doc2", "content":"Here's another document." }
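If your documents are stored as separate text files, you can use a small shell function to produce this format. The function below is a sketch, and to_jsonl is a hypothetical helper that is not part of this repository. It assumes one .txt file per document and uses the file name (without directory or .txt suffix) as item_id. Inside content, it escapes backslashes, double quotes, tabs, carriage returns and line breaks. It assumes no other control characters occur in the input.

```shell
# Hypothetical helper (not part of this repository): print one JSON line per
# input file. item_id is the file name without directory or ".txt" suffix;
# content is the file text with line breaks encoded as \n.
# Empty files produce no output.
to_jsonl() {
  awk '
    # Escape characters that JSON does not allow raw inside a string.
    # Other control characters are assumed not to occur in the input.
    function esc(s,    out, i, c) {
      out = ""
      for (i = 1; i <= length(s); i++) {
        c = substr(s, i, 1)
        if (c == "\\") out = out "\\\\"
        else if (c == "\"") out = out "\\\""
        else if (c == "\t") out = out "\\t"
        else if (c == "\r") out = out "\\r"
        else out = out c
      }
      return out
    }
    # Print the document collected so far, if any, as a single JSON object.
    function emit() {
      if (have) printf "{ \"item_id\":\"%s\", \"content\":\"%s\" }\n", esc(id), body
    }
    # First line of a new file: flush the previous document and start a new one.
    FNR == 1 {
      emit()
      id = FILENAME
      sub(/.*\//, "", id)
      sub(/\.txt$/, "", id)
      body = esc($0)
      have = 1
      next
    }
    # Later lines of the same file are joined with an escaped newline.
    { body = body "\\n" esc($0) }
    END { emit() }
  ' "$@"
}
```

Usage, assuming your files are in a docs/ directory: `to_jsonl docs/*.txt | ./run.sh -i json -k "item_id" -v "content" > output.tsv`. The escaping is done character by character rather than with gsub, which avoids differences in how awk implementations treat backslashes in replacement strings.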

You can run the NLP pipeline on 1 core as follows:

cat input.json | ./run.sh -i json -k "item_id" -v "content" > output.tsv

You can run the NLP pipeline on 16 cores as follows:

./run_parallel.sh -in="input.json" --parallelism=16 -i json -k "item_id" -v "content"

You can run the NLP pipeline as a REST service as follows:

./run.sh -p 8080

The output is written as TSV (tab-separated values) files that you can load directly into a database.

Setup

This package requires Java 8.
