Elections only available in PDF #28
Comments
Results for years 2000, 2001, and early April 2002 and 2003 are also available only in PDF format. Should these be added to this list?
Election 422 (2011-04-05) belongs in this list as well.
This might be a good resource to try: https://github.com/UW-Deepdive-Infrastructure/table-extract Also: https://github.com/WZBSocialScienceCenter/pdftabextract
For the 2006-09-12 primary election (id 437), these offices have only PDF files: Four additional PDFs are labeled recount:
Here's an updated table of elections with no results, or results only in PDF format:
Results for some of the offices in these elections are only in PDF format:
id 1577, 2002-11-05 now has xls data for District Attorney (previously only in PDF?)
Last September, Derek recommended pdftotext, in the Xpdf package.
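For anyone who wants to try the pdftotext route, here's a minimal sketch of how it might look from Python. It assumes the `pdftotext` binary (from Xpdf or Poppler) is on your PATH and that `-layout` output separates table columns with at least two spaces, which may not hold for every WEC file. The function names and the sample text are made up for illustration and are not from any real results file.

```python
import re
import subprocess


def pdf_to_layout_text(pdf_path: str) -> str:
    """Run pdftotext in layout mode and return the extracted text.

    Assumes the pdftotext binary (Xpdf or Poppler) is installed and on PATH.
    The trailing "-" asks pdftotext to write to stdout instead of a file.
    """
    result = subprocess.run(
        ["pdftotext", "-layout", pdf_path, "-"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


def split_columns(line: str) -> list[str]:
    """Split one layout-mode line into cells on runs of two or more spaces.

    This is a heuristic: single spaces are kept inside a cell so that
    labels such as "WARD 1" stay together.
    """
    return [cell for cell in re.split(r"\s{2,}", line.strip()) if cell]


def rows_from_text(text: str) -> list[list[str]]:
    """Turn layout text into rows of cells, skipping blank lines."""
    return [split_columns(line) for line in text.splitlines() if line.strip()]


if __name__ == "__main__":
    # Hypothetical sample standing in for pdftotext -layout output.
    sample = "WARD 1      120     85\n\nWARD 2       98    101\n"
    for row in rows_from_text(sample):
        print(row)
```

Splitting on two or more spaces is the simplest thing that could work. Files with vertical candidate names or tightly packed columns would probably need a column-position approach instead.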
Here's a couple examples of running wxw_assm_60_94_pdf_16250.txt
There are some format differences between pre- and post-2010. Also 2003 and earlier used vertical candidate names, and 2004 and later started showing those horizontally.
The recount results for election 421 (2011-04-05) for Sheboygan district court branch 3 are also only available in PDF.
These elections have results for some or all offices that are available only in PDF:
I pressed WEC to get Excel versions of these added to their site. Their spokesperson said that since they've gone through a couple reorganizations since these elections occurred, they no longer have the database that these files were produced out of. So for them to produce and host Excel files for these elections, they'd have to extract results from the PDFs.
We'll have to figure out how to process these files. I've tried Tabula and PDF python libraries, but everything I've managed to produce has been lossy and/or messy.
Here's an example input file:
http://elections.wi.gov/sites/default/files/elecSpec03_wbw_assm18.pdf