-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathStanfordNLP
15 lines (7 loc) · 1.33 KB
/
StanfordNLP
1
2
3
4
5
6
7
8
9
10
11
12
StanfordNLP(http://stanfordnlp.github.io/CoreNLP/cmdline.html):
annie@annie-Lenovo ~/Downloads/stanford-corenlp-full-2015-12-09 $ java -cp "*" -Xmx2g edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,ssplit,pos,lemma,ner,parse,dcoref -file input.txt
If this command is run from the distribution directory, it processes the included sample file input.txt. We use a wildcard "*" after -cp to load all jar files in the current directory – it needs to be in quotes. This command writes the output to an XML file named input.txt.out in the same directory.
Notes:
Processing a short text like this is very inefficient. It takes a minute to load everything before processing begins. You should batch your processing.
Stanford CoreNLP requires Java version 1.8 or higher.
Specifying memory: -Xmx2g specifies the amount of RAM that Java will make available for CoreNLP. On a 64-bit machine, Stanford CoreNLP typically requires 2GB to run (and it may need even more, depending on the size of the document to parse). On a 32 bit machine (in 2016, this is most commonly a 32-bit Windows machine), you cannot allocate 2GB of RAM; probably you should try with -Xmx1800m. You are probably okay with a minimum of -Xmx1500m, but this amount of memory is a bit marginal. You probably can’t run some annotators, such as the statistical coref.