-
Notifications
You must be signed in to change notification settings - Fork 0
/
README
32 lines (26 loc) · 1.12 KB
/
README
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
Welcome to pdf-poll-parser!
This is a series of Python scripts that uses pdfminer to extract polling data
from PDFs published by Brazilian pollsters. The PDFs generally follow patterns
that allow for this process.
Example usage:
>>> from pathlib import Path
>>> from multidados import MultidadosParser
>>> pathlist = Path('.').glob('data/multidados/*.pdf')
>>> MultidadosParser.parse(pathlist, 'multidados')
Currently supported pollsters are:
- Ibope (2012, 2014, 2016, 2018)
- Escutec (2012, 2016) *
- Paraná Pesquisas (2016, 2018) *
- Datafolha (2012, 2014, 2016, 2018) *
- Multidados (2016)
- Veritá (2012, 2014, 2016, 2018) **
Pollster PDFs were obtained from the following sources:
- Ibope: http://eleicoes.ibopeinteligencia.com/
- Escutec: http://www.escutec.com/
- Paraná Pesquisas: https://www.paranapesquisas.com.br/
- Datafolha: http://datafolha.folha.uol.com.br/
- Multidados: https://www.multidadospesquisa.com.br/
*: Parser fails on 5% or less of the PDF set. These outliers will require
manual parsing.
**: Not exactly a PDF parser. Instead, it parses a JSON obtained from
https://institutoverita.com.br/inc/js/psq.php.