
Fielded Batch Ingest Helper Script

Lauren Ko edited this page Aug 31, 2016 · 1 revision

Using Fielded Batch Ingest Helper Script

fielded_batch_ingest.py was written to allow bulk uploading of URLs to the Nomination Tool.

Usage is as follows:

fielded_batch_ingest - Adds urls from a text file into the URL table

example: fielded_batch_ingest.py -p <PROJECT_SLUG> -n <NOMINATOR_ID> filename

Takes a list of urls from a text file (with a required project and nominator specified)
and adds the urls to the URL table, adding a surt attribute if none exists.

Arguments are:
-p, specifies the project slug (required)
-n, specifies the nominator id (required)
-c, import the file as a csv file
-d, import file is pickled dictionary format
-h, --help, display help
-v, --verify, verify url is valid and available

Running Script From the Command Line

Make sure your Django settings module can be imported from the PYTHONPATH, and set the DJANGO_SETTINGS_MODULE environment variable to its dotted name.

$ export PYTHONPATH="${PYTHONPATH}:/home/digital3/current/digital3/"
$ export DJANGO_SETTINGS_MODULE="config.settings.production"
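
Before running the ingest, it can help to confirm that the two variables above actually resolve to an importable module. The following is a minimal sketch using only the standard library; the function name settings_module_status is hypothetical and is not part of django-nomination.

```python
import importlib.util
import os


def settings_module_status(environ=None):
    """Return a short message saying whether the settings module can be found."""
    environ = os.environ if environ is None else environ
    name = environ.get("DJANGO_SETTINGS_MODULE")
    if not name:
        return "DJANGO_SETTINGS_MODULE is not set"
    try:
        # For a dotted name, find_spec imports the parent packages,
        # so a missing parent raises ImportError instead of returning None.
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        spec = None
    if spec is None:
        return "cannot find %s on the import path" % name
    return "found %s" % name


if __name__ == "__main__":
    print(settings_module_status())
```

Run it with the same Python executable and the same shell environment you plan to use for fielded_batch_ingest.py.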

Check the field names in the header row of the CSV file and verify that they correspond to existing Metadata_Field names for the project (unless you intend to create extra field names). The URL/entity field should be called "url" in the CSV file.
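
The header check described above can be sketched in a few lines of Python. This is an illustration only: check_csv_header is a hypothetical helper, and the set of known field names is assumed to be supplied by hand (for example, copied from the project's metadata fields) rather than read from the Nomination Tool database.

```python
import csv
import io


def check_csv_header(csv_text, known_fields):
    """Return (has_url_column, unknown_field_names) for the CSV header row."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # an empty input yields an empty header
    has_url = "url" in header
    # Any column other than "url" that is not a known field would be created as new.
    unknown = [name for name in header
               if name != "url" and name not in known_fields]
    return has_url, unknown


if __name__ == "__main__":
    sample = "url,Agency,List_Name\nhttp://1.usa.gov,,analytics-usa-gov-sites-list\n"
    print(check_csv_header(sample, {"Agency", "List_Name"}))
```

An empty list of unknown names means every column maps to a field you already expect; anything listed is a candidate typo or an intentionally new field.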

Make sure you run the script with the Python executable that has the django-nomination app installed (e.g., if you are running the URL Nomination Tool in a virtual environment). If you are using a virtual environment, either activate the environment before running the script or invoke the environment's Python executable by its full path.

Uploading URLs Only (no metadata)

If neither the -c nor the -d flag is passed, the file is expected to contain only URLs, one per line.

$ /home/digital3/current/env/bin/python fielded_batch_ingest.py -p test-project -n 44 ~/test.urls
 -p is the project slug
 -n is the nominator id
 ~/test.urls is the path to the text file containing one URL per line
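
If the URLs come from another program, a short Python sketch like the one below can produce the one-URL-per-line text file. The helper name to_url_lines is hypothetical; dropping blank entries and duplicates is a convenience assumed here, not something the ingest script is known to require.

```python
def to_url_lines(urls):
    """Return text with one URL per line, skipping blanks and repeated URLs."""
    seen = set()
    lines = []
    for url in urls:
        url = url.strip()
        if url and url not in seen:
            seen.add(url)
            lines.append(url)
    # End with a newline so the last URL is a complete line.
    return "\n".join(lines) + "\n" if lines else ""


if __name__ == "__main__":
    text = to_url_lines(["http://1.usa.gov", " http://blogs.cdc.gov/ ", ""])
    with open("test.urls", "w") as handle:
        handle.write(text)
```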

Uploading Via CSV (allows inclusion of metadata)

$ /home/digital3/current/env/bin/python fielded_batch_ingest.py -c -p eth2016_bulk -n 546 ~/test.csv
 -p is the project slug
 -n is the nominator id
 -c indicates we are ingesting a CSV file
 ~/test.csv is the path to the CSV file containing the URLs and metadata to be uploaded

Sample CSV file:

url,Agency,List_Name
http://1.usa.gov,,analytics-usa-gov-sites-list
http://1010ez.med.va.gov,,analytics-usa-gov-sites-list
http://blogs.cdc.gov/,"CDC, Centers for Disease Control and Prevention",GPO-Active-Collections

In this example, Agency and List_Name are metadata fields that already exist in the URL Nomination Tool for the project to which the URLs are being added.
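
A file like the sample can be generated with Python's csv module, which quotes values containing commas (such as the Agency value for blogs.cdc.gov) automatically. This is a minimal sketch; build_csv is a hypothetical helper and the rows are taken from the sample above.

```python
import csv
import io


def build_csv(rows, fieldnames):
    """Return CSV text with a header row; fields containing commas are quoted."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = [
    {"url": "http://1.usa.gov", "Agency": "",
     "List_Name": "analytics-usa-gov-sites-list"},
    {"url": "http://blogs.cdc.gov/",
     "Agency": "CDC, Centers for Disease Control and Prevention",
     "List_Name": "GPO-Active-Collections"},
]

if __name__ == "__main__":
    with open("test.csv", "w", newline="") as handle:
        handle.write(build_csv(rows, ["url", "Agency", "List_Name"]))
```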

Uploading Via Pickle (allows inclusion of multivalued metadata)

Data can also be uploaded to the URL Nomination Tool from a serialized Python object in pickle format. This allows multivalued attributes, which are stored as lists in a Python dictionary.

$ /home/digital3/current/env/bin/python fielded_batch_ingest.py -d -p test-project -n 44 ~/test.pkl
 -p is the project slug
 -n is the nominator id
 -d indicates we are ingesting a file containing a pickled object
 ~/test.pkl is the path to the file containing the pickle
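
The sketch below shows how a dictionary with list-valued attributes could be pickled. The exact dictionary layout that fielded_batch_ingest.py expects is not described on this page: the structure used here (URL as key, field names mapped to lists of values) is an assumption for illustration, so check the script source before relying on it. The helper name serialize_records is hypothetical, and protocol 2 is chosen only because it can be read by both Python 2 and Python 3.

```python
import pickle

# ASSUMED layout: {url: {field_name: [value, ...]}}. Not confirmed by this page.
records = {
    "http://blogs.cdc.gov/": {
        "Agency": ["CDC", "Centers for Disease Control and Prevention"],
        "List_Name": ["GPO-Active-Collections"],
    },
}


def serialize_records(records):
    """Return the pickled bytes for the records dictionary."""
    return pickle.dumps(records, protocol=2)


if __name__ == "__main__":
    with open("test.pkl", "wb") as handle:
        handle.write(serialize_records(records))
```

Only load pickle files from sources you trust, since unpickling can execute arbitrary code.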