pindograma/pdf-poll-parser

Welcome to pdf-poll-parser!

This is a collection of Python scripts that use pdfminer to extract polling
data from PDFs published by Brazilian pollsters. The PDFs generally follow
consistent patterns, which makes automated extraction feasible.
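
To illustrate the general idea, here is a minimal sketch of pattern-based
extraction. It is not the code used by the parsers in this repository. It
assumes the pdfminer.six distribution (which provides
pdfminer.high_level.extract_text), and the names extract_pdf_text,
parse_result_lines and RESULT_LINE, as well as the "name followed by a
percentage" line layout, are hypothetical and chosen only for this example.

```python
import re

# Hypothetical layout: a label followed by a percentage, e.g. "Candidate A 34%".
# Real pollster reports differ, so each parser needs its own patterns.
RESULT_LINE = re.compile(r'^(?P<name>.+?)\s+(?P<pct>\d{1,3}(?:[.,]\d+)?)\s*%$')


def extract_pdf_text(path):
    """Return the text of a PDF via pdfminer.six's high-level helper."""
    # Imported lazily so parse_result_lines works without pdfminer installed.
    from pdfminer.high_level import extract_text
    return extract_text(str(path))


def parse_result_lines(text):
    """Collect (name, percentage) pairs from lines matching RESULT_LINE."""
    results = []
    for line in text.splitlines():
        match = RESULT_LINE.match(line.strip())
        if match:
            # Accept a decimal comma as well as a decimal point.
            pct = float(match.group('pct').replace(',', '.'))
            results.append((match.group('name'), pct))
    return results
```

Under these assumptions, parse_result_lines(extract_pdf_text(some_path))
would return the matching lines of one report. Lines that do not fit the
pattern are skipped silently.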

Example usage:
>>> from pathlib import Path
>>> from multidados import MultidadosParser
>>> pathlist = Path('.').glob('data/multidados/*.pdf')
>>> MultidadosParser.parse(pathlist, 'multidados')

Currently supported pollsters are:
- Ibope (2012, 2014, 2016, 2018)
- Escutec (2012, 2016) *
- Paraná Pesquisas (2016, 2018) *
- Datafolha (2012, 2014, 2016, 2018) *
- Multidados (2016)
- Veritá (2012, 2014, 2016, 2018) **

Pollster PDFs were obtained from the following sources:
- Ibope: http://eleicoes.ibopeinteligencia.com/
- Escutec: http://www.escutec.com/
- Paraná Pesquisas: https://www.paranapesquisas.com.br/
- Datafolha: http://datafolha.folha.uol.com.br/
- Multidados: https://www.multidadospesquisa.com.br/

*: The parser fails on 5% or fewer of the PDFs in the set. These outliers
require manual parsing.

**: Not exactly a PDF parser. Instead, it parses JSON obtained from
https://institutoverita.com.br/inc/js/psq.php.
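
For the parsers marked with a single asterisk, it helps to know which files
need manual attention. The sketch below shows one possible way to separate
them. The function parse_all and its parse_one argument are hypothetical and
not part of this repository; parse_one stands for any function that parses a
single file and raises an exception on failure.

```python
def parse_all(paths, parse_one):
    """Apply parse_one to each path and return (results, failed_paths)."""
    results, failed = {}, []
    for path in paths:
        try:
            results[path] = parse_one(path)
        except Exception:
            # Deliberately broad: any failure sends the file to manual review.
            failed.append(path)
    return results, failed
```

The second return value is the list of files to parse by hand. Comparing its
length with the number of input files gives the failure rate for a pollster.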

About

Free polling data from pollster PDFs.
