Parser for pulling data out of FEC filings. Open sourced for Transparency Camp 2014.
# Background
The Federal Election Commission (FEC) publishes campaign finance disclosure data on its website for non-Senate federal candidates. Each download is a zip archive of .FEC files. The .FEC format is plain text delimited by the file separator character (ASCII 28), and each row represents one required FEC disclosure form.
There are currently 47 forms and schedules that a row can represent. The name of a row's form appears in the first few characters of that row.
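Because fields are separated by ASCII 28 and each row starts with its form name, a few lines of Python are enough to peek inside one of these zip files. This is a minimal sketch, not parsefec's own reader: it assumes the form code is the entire first field of each row, decodes the text as Latin-1 (an assumption about the encoding), and does not treat correspondence text blocks specially. The `iter_fec_rows` and `count_forms` names are hypothetical.

```python
import io
import zipfile
from collections import Counter

FS = chr(28)  # ASCII 28, the "file separator" control character used between fields


def iter_fec_rows(zip_source):
    """Yield (member_name, fields) for each non-empty row of each .fec file in a zip."""
    with zipfile.ZipFile(zip_source) as zf:
        for name in zf.namelist():
            if not name.lower().endswith(".fec"):
                continue  # skip anything in the archive that is not a .FEC file
            with zf.open(name) as raw:
                # Latin-1 is an assumption; it maps every byte, so decoding never raises.
                for line in io.TextIOWrapper(raw, encoding="latin-1", newline=""):
                    line = line.rstrip("\r\n")
                    if line:
                        yield name, line.split(FS)


def count_forms(zip_source):
    """Tally rows by form code, assumed here to be the first field of each row."""
    return Counter(fields[0].strip() for _, fields in iter_fec_rows(zip_source))


if __name__ == "__main__":
    import sys

    # Usage: python count_forms.py some_download.zip
    for form, n in count_forms(sys.argv[1]).most_common():
        print(form, n)
```

A tally like this is a quick way to see which of the 47 forms and schedules a given download actually contains before deciding which ones to process.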
parsefec is a variation on a script we use at OpenSecrets to load these files into a database. Some of its functionality is aimed at reporters and researchers working with small downloads and no database, so we've made an effort to include those features. From the command line, the `--mode` parameter controls the output: plain text, SQL insert statements for databases you can't access directly, or direct loading through pyodbc. The corresponding values are `--mode=text`, `--mode=inserts`, and `--mode=db`.
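For the inserts mode, the idea is to write SQL text that can be run later wherever you do have database access. A minimal sketch of building one such statement follows; the `sql_literal` and `insert_clause` names, the table and column names, and the choice to write empty fields as `NULL` are assumptions for illustration, not parsefec's actual output format.

```python
def sql_literal(value):
    """Render one field as a SQL literal; empty fields become NULL (an assumption)."""
    if value is None or value == "":
        return "NULL"
    # Standard SQL escapes a single quote inside a string literal by doubling it.
    return "'" + str(value).replace("'", "''") + "'"


def insert_clause(table, columns, values):
    """Build one INSERT statement from parallel lists of column names and values.

    Table and column names are inserted verbatim, so they must come from
    trusted configuration, never from the filing data itself.
    """
    if len(columns) != len(values):
        raise ValueError("expected %d values, got %d" % (len(columns), len(values)))
    return "INSERT INTO %s (%s) VALUES (%s);" % (
        table,
        ", ".join(columns),
        ", ".join(sql_literal(v) for v in values),
    )


if __name__ == "__main__":
    # Hypothetical table and columns for a Schedule A row.
    print(insert_clause("sa11ai", ["form_type", "contributor_name"], ["SA11AI", "O'Brien"]))
```

When a live connection is available instead, pyodbc's `cursor.execute` accepts `?` placeholders plus a sequence of parameters, which lets the driver handle quoting rather than escaping values by hand.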
All of the default settings and directories, including a small zip file, are in the repo. If you download the repository and run `python parsefec.py`, you should see text output. The only requirement for text output is argparse (`pip install argparse`).
For more advanced configuration, see `settings.py`, where you can choose a database driver, which forms to process, the input and output directories, and other options as they are added.
# Usage
#### Command line:
    # Text output using all defaults. Uses directories and schema included in repo.
    > python parsefec.py --mode=text -d='\t'

    # Help
    > python parsefec.py --help
    usage: parsefec.py [-h] [--outdir OUTDIR] [--inputdir INPUTDIR] [--mode MODE]
                       [--delimiter DELIMITER]

    Parser for FEC Electronic Filing data from OpenSecrets.org

    optional arguments:
      -h, --help            show this help message and exit
      --inputdir INPUTDIR, -i INPUTDIR
                            Directory of zip files from
                            ftp://ftp.fec.gov/FEC/electronic/
      --mode MODE, -m MODE  Mode of output: db, inserts (insert statements), text
      --delimiter DELIMITER, -d DELIMITER
                            Delimiter for text output. Use python escapes:
                            --delimiter='\t'
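The help text above is in the format Python's argparse prints. A minimal sketch of a parser that would produce similar help, plus one way to turn a delimiter typed as `'\t'` into an actual tab character, might look like this. It is not parsefec's code: the `python_escapes` and `build_parser` names, the `--outdir` help text, and the escape-decoding approach are all assumptions.

```python
import argparse


def python_escapes(value):
    """Turn escapes typed on the command line (e.g. the two characters backslash, t) into real characters."""
    # Fine for ASCII delimiters; non-ASCII input would be mangled by this round trip.
    return value.encode("utf-8").decode("unicode_escape")


def build_parser():
    parser = argparse.ArgumentParser(
        prog="parsefec.py",
        description="Parser for FEC Electronic Filing data from OpenSecrets.org",
    )
    # The original help output does not describe --outdir; this help text is hypothetical.
    parser.add_argument("--outdir", help="Directory for output files")
    parser.add_argument("--inputdir", "-i",
                        help="Directory of zip files from ftp://ftp.fec.gov/FEC/electronic/")
    parser.add_argument("--mode", "-m", choices=["db", "inserts", "text"], metavar="MODE",
                        help="Mode of output: db, inserts (insert statements), text")
    parser.add_argument("--delimiter", "-d", type=python_escapes,
                        help="Delimiter for text output. Use python escapes: --delimiter='\\t'")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.mode, repr(args.delimiter))
```

Decoding in the `type=` callback means the rest of the program only ever sees the real delimiter character; restricting `--mode` with `choices` makes a typo like `--mode=txt` fail with a usage error instead of being silently ignored.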
#### As a library (in development):
    import parsefec
    parsefec.parseDir('input')
# Correspondence Data
The FEC includes official correspondence from committees in the data as multi-line text documents marked by `[BEGINTEXT]` and `[ENDTEXT]`. By default, these documents are saved in the logs directory.
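Separating these documents from the delimited rows amounts to a small state machine over the lines of a file. A minimal sketch follows; it assumes each marker sits on a line of its own, and `split_correspondence` is a hypothetical name rather than part of parsefec's API.

```python
def split_correspondence(lines):
    """Separate delimited data rows from [BEGINTEXT] ... [ENDTEXT] documents.

    Returns (rows, documents): rows is a list of data lines, documents a list
    of strings, one per correspondence block, with the marker lines removed.
    """
    rows, documents = [], []
    current = None  # list of lines while inside a block, None otherwise
    for line in lines:
        line = line.rstrip("\r\n")
        marker = line.strip().upper()
        if marker == "[BEGINTEXT]":
            current = []  # start collecting a new document
        elif marker == "[ENDTEXT]" and current is not None:
            documents.append("\n".join(current))
            current = None
        elif current is not None:
            current.append(line)  # inside a block: keep the text as-is
        else:
            rows.append(line)  # outside any block: an ordinary delimited row
    if current is not None:
        # An unterminated block at end of file is kept rather than silently dropped.
        documents.append("\n".join(current))
    return rows, documents
```

Keeping the documents out of the row stream matters because a line of free-form letter text would otherwise be read as a row with an unrecognized form code.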
# Logs
Errors found in the data, such as a truncated field, an unrecognized form code, or a data type error like a poorly formed date, are logged to a file, in a log directory, named with the date the script was run. Later runs on the same day append to that day's file. Deleting this file, or the correspondence log, will not cause errors.
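A minimal sketch of this append-per-day behavior, using a malformed date as the example error. The file naming (`YYYYMMDD.log`), the `log_error` and `parse_fec_date` names, and the assumption that dates arrive as `YYYYMMDD` strings are all illustrative rather than taken from parsefec.

```python
import datetime
import os
import tempfile


def log_error(log_dir, message, day=None):
    """Append one line to the log file for `day` (default: today) and return its path."""
    day = day or datetime.date.today()
    os.makedirs(log_dir, exist_ok=True)
    path = os.path.join(log_dir, day.strftime("%Y%m%d") + ".log")
    # Mode "a" creates the file on the first error of the day and appends afterwards,
    # so deleting it is harmless: the next error simply starts a new file.
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(message + "\n")
    return path


def parse_fec_date(value, log_dir):
    """Parse a YYYYMMDD string; log the bad value and return None if it is malformed."""
    try:
        return datetime.datetime.strptime(value, "%Y%m%d").date()
    except ValueError:
        log_error(log_dir, "bad date: %r" % value)
        return None


if __name__ == "__main__":
    demo_dir = tempfile.mkdtemp()
    print(parse_fec_date("20140531", demo_dir))  # a valid date is returned
    print(parse_fec_date("2014-05-31", demo_dir))  # malformed: logged, None returned
    with open(log_error(demo_dir, "run finished"), encoding="utf-8") as fh:
        print(fh.read())
```

Returning `None` instead of raising lets a load continue past one bad field while still leaving a record of it for later review.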
# Similar Projects
ParseFEC is one of several FEC electronic filing parsers; others are available from The New York Times, USA Today, and the Sunlight Foundation.
# Roadmap
- Automatic download from ftp.fec.gov/FEC/electronic/
- Unit tests