MARCdata

Writes selected parts of a MARC XML bibliographic data file into a .csv.gz file.

Preliminaries

GNU Make (build automation)
Clang (compiler)
zlib.h (header)

C++11

Requires a compiler that supports the following C++11 language features:

range-based for loop
auto type specifier
nullptr identifier

Usage

Place the input XML files in the folder 'data'. The output is written there as well.

For instance, for Fennica, see ~/data/fennica/raw/fennica_*.xml.gz.

Note that the XML files must be decompressed before parsing!

ESTC

FOR THE CURRENT CSV CONVERSION PIPELINE, SEE https://github.com/COMHIS/estc-raw-csv-prepicker.

For the old versions, use:

make estc
./estc

Fennica

make fennica
./fennica

Kungliga

bash split-kungliga.sh
make kungliga
./kungliga

Göttingen

make cerl
./cerl

Author

Niko Ilomäki

Contributions by Leo Lahti

License

MIT License

RapidXML library by Marcin Kalicinski, licensed under the MIT License

Gzstream library by Deepak Bandyopadhyay and Lutz Kettner, licensed under LGPL 2.1

Log

Jul 12: Odd behavior in language field (008) parsing was observed with Kungliga. It turned out that the last two positions (38/39) are sometimes missing from the Kungliga 008 field, so the parser was changed to locate the language code by counting from the beginning of the field instead of from the end (as is done for the other catalogs). This now yields recognizable language codes for 99.88% of the entries.

