Legend single-threaded extractor written in Perl. External libraries such as SVMheaderparse, ParsCit, and PDFBox are invoked.
To build and run the project, run these commands from the application's root directory:

    ant jar    # builds the jar in the dist directory and copies other resources there
    ant run    # starts the program
Everything in the cpy directory is copied to the dist folder along with the jar when the `ant jar` or `ant run` command is run.
Set the config options appropriately in the config/config.properties file. Note that these settings are read only once, at startup, so changing them while the program is running has no effect.
The various Perl modules in the lib directory also have Config files where some options can be set.
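The read-once behavior of config/config.properties can be illustrated with a minimal Java sketch. It assumes, without confirmation from the project's source, that the driver loads the file with java.util.Properties; the class name StartupConfig and the key `threads` used in the usage note are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

// Hypothetical holder for config/config.properties: the file is read once,
// in the constructor, and every later lookup is served from memory.
final class StartupConfig {
    private final Properties props = new Properties();

    StartupConfig(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            props.load(in); // the only time the file is read
        }
    }

    // Edits made to the file after construction are never seen here.
    String get(String key, String defaultValue) {
        return props.getProperty(key, defaultValue);
    }
}
```

For example, `new StartupConfig(Paths.get("config/config.properties")).get("threads", "1")` would keep returning the value that was in the file at startup, even if the file is edited later.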
To stop processing while the program is running, modify the dist/runtime.properties file so that the 'stopProcessing' property is set to true.
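Unlike config.properties, which is read once at startup, a stop flag only works if its file is re-read while the program runs. The sketch below assumes the driver checks dist/runtime.properties before each document; the classes StopFlag and StopAwareLoop and the check-per-document design are hypothetical, not taken from the project's source.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Properties;

// Hypothetical stop check: dist/runtime.properties is re-read on every call,
// so an edit made while the program runs is noticed.
final class StopFlag {
    private final Path runtimeFile;

    StopFlag(Path runtimeFile) {
        this.runtimeFile = runtimeFile;
    }

    boolean stopRequested() {
        Properties props = new Properties();
        try (InputStream in = Files.newInputStream(runtimeFile)) {
            props.load(in);
        } catch (IOException e) {
            return false; // missing or unreadable file: keep processing
        }
        // parseBoolean ignores case; any value other than "true" means false.
        return Boolean.parseBoolean(props.getProperty("stopProcessing", "false").trim());
    }
}

// Hypothetical single-threaded loop that checks the flag before each document.
final class StopAwareLoop {
    static int processAll(List<String> documents, StopFlag flag) {
        int processed = 0;
        for (String doc : documents) {
            if (flag.stopRequested()) {
                break; // stop before starting the next document
            }
            // ... hand `doc` to the extraction pipeline here ...
            processed++;
        }
        return processed;
    }
}
```

Re-reading a small properties file once per document costs little next to parsing a PDF, so polling like this is one simple way a single-threaded loop could honor the flag.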
    project root
    |
    /build        # Java class files generated by the compiler; directory created automatically on build
    build.xml     # Ant build file
    /config       # holds the config.properties file, which contains project settings
    /converters   # contains binaries and needed files for the PDF-to-text converters
    /cpy          # all files in here get copied to the dist directory on ant jar or ant run
    /crfpp        # contains the crf_learn and crf_test binaries and the traindata folder; used by ParsCit
    /dist         # where the built jar is placed, along with working resources used at runtime; generated on ant jar
    /lib          # contains Perl libraries for parsing, jar files required by the Java program, and the parseDocuments.pl script executed by the jar
    /logs         # contains log files from each run of the jar
    /resources    # contains resources, such as dictionaries, used by the Perl scripts during parsing
    /src          # contains Java source code
    /svm-light    # purpose undocumented; presumably SVM-light files used by SVMheaderparse
    /tmp          # holds temporary files
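How the jar might run lib/parseDocuments.pl can be sketched with ProcessBuilder. This is a minimal sketch assuming the jar launches the script through a `perl` executable on the PATH and logs its output to a file; the class PerlInvoker, its parameters, and the argument list are hypothetical, since the script's real arguments are not documented here.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical launcher for lib/parseDocuments.pl. The script's real
// arguments are not documented, so `args` is a placeholder.
final class PerlInvoker {
    // Builds: perl <rootDir>/lib/parseDocuments.pl <args...>
    static List<String> command(File rootDir, List<String> args) {
        List<String> cmd = new ArrayList<>();
        cmd.add("perl"); // assumes a perl executable on the PATH
        cmd.add(new File(new File(rootDir, "lib"), "parseDocuments.pl").getPath());
        cmd.addAll(args);
        return cmd;
    }

    // Runs the script with rootDir as working directory, appends its combined
    // stdout/stderr to logFile, and returns the exit code.
    static int run(File rootDir, List<String> args, File logFile)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command(rootDir, args));
        pb.directory(rootDir);
        pb.redirectErrorStream(true);
        pb.redirectOutput(ProcessBuilder.Redirect.appendTo(logFile));
        return pb.start().waitFor();
    }
}
```

Appending to a file rather than reading the output in Java keeps the child process from blocking on a full pipe buffer, which matters for long parsing runs.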
If an error like the following appears when trying to run TET:

    /lib/ld-linux.so.2: bad ELF interpreter: No such file or directory

it is most likely caused by missing 32-bit libraries (/lib/ld-linux.so.2 is the 32-bit dynamic loader the TET binary requests). See http://stackoverflow.com/questions/8328250/centos-64-bit-bad-elf-interpreter for a solution.
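Because /lib/ld-linux.so.2 is the loader for 32-bit x86 Linux binaries, the error means a 64-bit system lacks the 32-bit runtime that TET expects. A pre-flight check like the hypothetical LoaderCheck class below could report the problem before TET is launched; the class and method names are illustrative only.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical pre-flight check run before launching the 32-bit TET binary.
final class LoaderCheck {
    // The ELF interpreter named in the error message (32-bit x86 loader).
    static final Path LOADER_32 = Paths.get("/lib/ld-linux.so.2");

    static boolean loaderPresent(Path loader) {
        return Files.exists(loader);
    }

    // Returns a hint if the loader is missing, or null if it is present.
    static String diagnose(Path loader) {
        if (loaderPresent(loader)) {
            return null;
        }
        return "Missing " + loader + ": install the 32-bit C library before running TET.";
    }
}
```

Calling `LoaderCheck.diagnose(LoaderCheck.LOADER_32)` at startup would turn the cryptic shell error into an explicit message.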