Writes selected parts of a MARC XML bibliographic data file into a .csv.gz file.
Requires a compiler that supports the following C++11 language features:
- range-based for loop
- auto type specifier
- nullptr identifier
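As a quick way to check that a compiler accepts these three features, the sketch below uses all of them in one small function. The helper `join_fields` and its `missing` parameter are hypothetical and are not part of this repository.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not from this repository): joins fields into one
// comma-separated line. An empty field is replaced by `missing` when a
// placeholder is given, and left empty otherwise.
std::string join_fields(const std::vector<std::string>& fields,
                        const char* missing = nullptr) {  // nullptr identifier
    std::string line;
    auto first = true;                                    // auto type specifier
    for (const auto& field : fields) {                    // range-based for loop
        if (!first) {
            line += ',';
        }
        if (field.empty() && missing != nullptr) {
            line += missing;
        } else {
            line += field;
        }
        first = false;
    }
    return line;
}
```

If this compiles with the C++11 standard enabled (for example `-std=c++11` on GCC or Clang), the listed features are available.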
Input files (XML) should be placed in folder 'data'. Output goes there as well.
For example, the Fennica files are in ~/data/fennica/raw/fennica_*.xml.gz
Note that these files are gzip-compressed; uncompress the XML before parsing!
FOR THE CURRENT CSV CONVERSION PIPELINE, SEE https://github.com/COMHIS/estc-raw-csv-prepicker.
For the old versions, use:
make estc
./estc
make fennica
./fennica
bash split-kungliga.sh
make kungliga
./kungliga
make cerl
./cerl
Contributions by Leo Lahti
MIT License
RapidXML library by Marcin Kalicinski, licensed under the MIT License
Gzstream library by Deepak Bandyopadhyay and Lutz Kettner, licensed under LGPL 2.1
Jul 12: Odd behavior was observed when parsing the language code from field 008 in Kungliga. It turned out that the last characters (positions 38/39) are sometimes missing from the Kungliga 008 field, so the parser was changed to locate the language code by counting from the beginning of the line instead of from the end of the line (as is done for the other catalogs). This now yields recognizable language codes for 99.88% of the entries.
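The difference between the two approaches can be sketched as follows. This is not the parser code from this repository. It assumes the content of field 008 has already been extracted into a string, and it relies on the MARC 21 layout in which the language code occupies character positions 35-37 of a 40-character 008 field (counted from 0). The function names are hypothetical.

```cpp
#include <cstddef>
#include <string>

// MARC 21: the language code sits at positions 35-37 of field 008,
// counted from 0 at the start of the field.
constexpr std::size_t kLangOffset = 35;
constexpr std::size_t kLangLength = 3;
constexpr std::size_t kFullLength = 40;

// Start-relative lookup: works even when positions 38/39 are missing.
// Returns an empty string if the field is too short to hold the code.
std::string language_from_start(const std::string& field008) {
    if (field008.size() < kLangOffset + kLangLength) {
        return std::string();
    }
    return field008.substr(kLangOffset, kLangLength);
}

// End-relative lookup, for comparison: assumes the field is complete, so
// the code starts 5 characters before the end. A truncated field shifts
// the window and yields the wrong characters.
std::string language_from_end(const std::string& field008) {
    const std::size_t tail = kFullLength - kLangOffset;  // 5 characters
    if (field008.size() < tail) {
        return std::string();
    }
    return field008.substr(field008.size() - tail, kLangLength);
}
```

For a complete 40-character field both functions return the same code. For a field that ends right after position 37, only `language_from_start` still returns the language code.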