Parser for pulling data out of FEC Filings. Open sourced for Transparency Camp 2014

opensecrets/parsefec

parsefec

Parser for pulling data out of FEC filings. Open sourced for Transparency Camp 2014.

#Background

The Federal Election Commission releases campaign finance disclosure data on its website for non-Senate federal candidates. Each file is a zip-compressed archive of .FEC files. The .FEC format is plain text delimited by the file separator character (ASCII 28), and each row represents a required FEC disclosure form.
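As an illustration of the file layout described above, here is a minimal sketch that walks a zip archive and splits each row of every .fec member on ASCII 28. The function name `iter_fec_rows` and the latin-1 encoding are assumptions made for this example, not part of parsefec.

```python
import io
import zipfile

FS = "\x1c"  # ASCII 28, the file separator character


def iter_fec_rows(zip_path):
    """Yield (member_name, fields) for each non-empty row of every .fec member."""
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".fec"):
                continue
            with archive.open(name) as raw:
                # Assumption: latin-1 maps every byte to a character, so decoding never fails.
                text = io.TextIOWrapper(raw, encoding="latin-1")
                # Iterating the file breaks only on newlines. str.splitlines() is
                # avoided because it also treats ASCII 28 as a line boundary.
                for line in text:
                    line = line.rstrip("\r\n")
                    if line:
                        yield name, line.split(FS)
```

Stripping only `"\r\n"` is deliberate: Python counts ASCII 28 as whitespace, so a bare `rstrip()` would drop trailing empty fields.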

There are currently 47 possible forms and schedules any line in the files could represent. The name of the form for the row is in the first few characters of the row.
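To show how the form name can be read from the start of a row, the sketch below takes the first ASCII 28 delimited field of each line and tallies rows per form. The helper names and the sample rows are hypothetical and only illustrate the idea.

```python
from collections import Counter

FS = "\x1c"  # ASCII 28, the file separator character


def form_name(line):
    """Return the first field of a row, which names the form or schedule."""
    return line.split(FS, 1)[0].strip()


def count_forms(lines):
    """Count rows per form name, skipping blank lines."""
    return Counter(form_name(line) for line in lines if line.strip())


# Hypothetical sample rows, not taken from a real filing.
sample = [
    "F3XN" + FS + "C00000000" + FS + "Example Committee\n",
    "SA11AI" + FS + "C00000000" + FS + "Doe\n",
    "SA11AI" + FS + "C00000000" + FS + "Roe\n",
]
counts = count_forms(sample)
```

A tally like this is a quick way to see which forms a download contains before deciding which ones to process.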

parsefec is a variation on a script we use at OpenSecrets to load these files into a database. Some functionality will be more useful to reporters and researchers doing small downloads without a database, so we've made an effort to include those types of features. From the command line, the --mode parameter selects the output: delimited text, insert statements for use in databases where direct access is not available, or direct loading through pyodbc. The values for mode are --mode=text, --mode=inserts, and --mode=db respectively.
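The difference between text output and insert statements can be sketched roughly as follows. This is not parsefec's own code: the function names and the table name `fec_sa` are hypothetical, and every field is treated as a string for simplicity.

```python
def render_text(fields, delimiter="\t"):
    """Join the fields of one row with the chosen delimiter."""
    return delimiter.join(fields)


def render_insert(table, fields):
    """Build an INSERT statement, quoting every field as a string literal."""
    # Doubling single quotes is the standard SQL escape inside a string literal.
    values = ", ".join("'" + field.replace("'", "''") + "'" for field in fields)
    return "INSERT INTO {} VALUES ({});".format(table, values)


row = ["SA11AI", "C00000000", "O'Neil"]  # hypothetical row
text_line = render_text(row)
insert_line = render_insert("fec_sa", row)  # "fec_sa" is a made-up table name
```

Generated statements like these can be saved to a file and run later by someone who does have database access, which is the use case the inserts mode is meant for.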

All of the default settings and directories are in the repo including a small zip file. If you download the repository and run python parsefec.py you should see text output. The only requirement is argparse (pip install argparse) for text output.

For more advanced settings, see settings.py, where you can choose a database driver, which forms to process, and the input/output directories. More settings will be added over time.

#Usage

####Command Line:

# Text output using all defaults.  Uses directories and schema included in repo.
> python parsefec.py --mode=text -d='\t'
# Help
> python parsefec.py --help
usage: parsefec.py [-h] [--outdir OUTDIR] [--inputdir INPUTDIR] [--mode MODE]
                   [--delimiter DELIMITER]

Parser for FEC Electronic Filing data from OpenSecrets.org

optional arguments:
  -h, --help            show this help message and exit
  --inputdir INPUTDIR, -i INPUTDIR
                        Directory of zip files from
                        ftp://ftp.fec.gov/FEC/electronic/
  --mode MODE, -m MODE  Mode of output: db, inserts (insert statements), text
  --delimiter DELIMITER, -d DELIMITER
                        Delimiter for text output. Use python escapes:
                        --delimiter='\t'
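The shell passes `'\t'` to the script as a backslash followed by the letter t, so an option that accepts Python escapes has to decode them. One way this could be done with argparse is sketched below. `decode_escapes` and `build_parser` are hypothetical names, and parsefec may handle this differently.

```python
import argparse


def decode_escapes(value):
    """Turn Python-style escapes such as a literal backslash-t into the real character."""
    # backslashreplace keeps non-ASCII input intact through the round trip.
    return value.encode("ascii", "backslashreplace").decode("unicode_escape")


def build_parser():
    parser = argparse.ArgumentParser(description="Delimiter decoding sketch")
    parser.add_argument(
        "--delimiter", "-d",
        type=decode_escapes,
        default="\t",
        help="Delimiter for text output. Use python escapes.",
    )
    return parser


args = build_parser().parse_args(["--delimiter=\\t"])
```

Here `args.delimiter` holds a single tab character rather than the two characters typed on the command line.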

####As a library (in development):

import parsefec

parsefec.parseDir('input')

#Correspondence Data

The FEC includes official correspondence from committees in the data as multi-line text documents marked by [BEGINTEXT] and [ENDTEXT]. These documents are saved by default in the logs directory.
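Separating these documents from ordinary rows could look something like the sketch below. It assumes each marker sits on a line of its own, which may not hold for every filing, and `split_correspondence` is a hypothetical helper rather than a parsefec function.

```python
def split_correspondence(lines):
    """Return (rows, documents): ordinary rows and [BEGINTEXT]...[ENDTEXT] bodies."""
    rows, documents = [], []
    current = None  # lines of the document being collected, or None outside one
    for line in lines:
        stripped = line.rstrip("\r\n")
        marker = stripped.strip()
        if marker == "[BEGINTEXT]":
            current = []
        elif marker == "[ENDTEXT]":
            if current is not None:
                documents.append("\n".join(current))
            current = None
        elif current is not None:
            current.append(stripped)
        else:
            rows.append(stripped)
    if current is not None:
        # Keep a document whose closing marker is missing instead of dropping it.
        documents.append("\n".join(current))
    return rows, documents
```

Each entry in `documents` is one complete correspondence text, ready to be written to a separate file or log.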

#Logs

Errors encountered in the data, such as truncated fields, unrecognized form codes, and data type errors like poorly formed dates, are stored in a directory with the date the script was executed. Subsequent runs of the script on the same day append to that day's file. Deleting it, or the correspondence log, will not cause errors.
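The append-per-day behavior can be sketched as follows. The file naming (an ISO date plus `.log`) and the `log_error` helper are assumptions made for this example. The actual layout of parsefec's logs directory may differ.

```python
import datetime
import os


def log_error(log_dir, message, today=None):
    """Append one message to the log file for the given day and return its path."""
    today = today or datetime.date.today()
    # Recreate the directory if it was deleted, so removing logs never breaks a run.
    os.makedirs(log_dir, exist_ok=True)
    path = os.path.join(log_dir, today.isoformat() + ".log")
    # Mode "a" creates the file when missing and appends otherwise.
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(message + "\n")
    return path
```

Because the file is opened in append mode and the directory is created on demand, deleting either between runs is harmless.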

#Similar Projects

ParseFEC is one of several FEC electronic filing parsers. Others are available from the NYT, USA Today, and The Sunlight Foundation.

#Roadmap

  • Automatic download from ftp.fec.gov/FEC/electronic/
  • Unit tests
