Genealogy PDF document data parser

The primary purpose of the Python code in this folder is to parse text data from the genealogy register published on www.vanderlinde.org.za into a pandas DataFrame. A DataFrame is a very powerful tabular data structure for manipulating large datasets in Python. Once the data is in a DataFrame, it can be exported to many formats, including Excel and (with additional libraries) Word and PDF, as well as many database formats using SQLAlchemy in conjunction with pandas.
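As a rough illustration of that text-to-DataFrame-to-database flow, here is a minimal sketch. It is not taken from process.py: the sample lines, the regular expression, the column names (`code`, `name`, `born`), and the helper names `parse_register` and `export_to_sqlite` are all assumptions made up for this example, and the real register layout may well differ. It uses a plain `sqlite3` connection from the standard library to keep the example self-contained; a SQLAlchemy engine could be passed to `to_sql` in the same position.

```python
import re
import sqlite3

import pandas as pd

# Hypothetical sample lines; the real register layout may differ.
SAMPLE_TEXT = """\
a1 Jan van der Linde * 1700
b1 Pieter van der Linde * 1725
b2 Maria van der Linde * 1728
"""

# Assumed pattern: generation code, full name, then "* <birth year>".
LINE_PATTERN = re.compile(
    r"^(?P<code>[a-z]\d+)\s+(?P<name>.+?)\s+\*\s+(?P<born>\d{4})$"
)


def parse_register(text: str) -> pd.DataFrame:
    """Turn matching lines into rows; lines that do not match are skipped."""
    rows = []
    for line in text.splitlines():
        match = LINE_PATTERN.match(line.strip())
        if match:
            row = match.groupdict()
            row["born"] = int(row["born"])  # store the birth year as a number
            rows.append(row)
    return pd.DataFrame(rows, columns=["code", "name", "born"])


def export_to_sqlite(
    frame: pd.DataFrame, connection: sqlite3.Connection, table: str = "persons"
) -> None:
    """Write the DataFrame to a database table, replacing any existing one."""
    frame.to_sql(table, connection, if_exists="replace", index=False)


df = parse_register(SAMPLE_TEXT)
conn = sqlite3.connect(":memory:")  # in-memory database, nothing written to disk
export_to_sqlite(df, conn)
```

From the same DataFrame, `df.to_excel("register.xlsx", index=False)` would write an Excel file, provided an Excel writer package such as openpyxl is installed.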

This is a work in progress, so the extraction is not yet complete. The main Python code for this process is in process.py.