Fielded Batch Ingest Helper Script
fielded_batch_ingest.py was written to allow bulk uploading of URLs to the Nomination Tool.
Usage is as follows:
fielded_batch_ingest - Adds urls from a text file into the URL table
example: fielded_batch_ingest.py -p <PROJECT_SLUG> -n <NOMINATOR_ID> filename
Takes a list of urls from a text file (with a required project and nominator specified)
and adds the urls to the URL table, adding a surt attribute if none exists.
Arguments are:
-p, specifies the project slug (required)
-n, specifies the nominator id (required)
-c, import the file as a csv file
-d, import file is pickled dictionary format
-h, --help, display help
-v, --verify, verify url is valid and available
Make sure your Django settings file is importable from the PYTHONPATH, and set the DJANGO_SETTINGS_MODULE environment variable to point at it:
$ export PYTHONPATH="${PYTHONPATH}:/home/digital3/current/digital3/"
$ export DJANGO_SETTINGS_MODULE="config.settings.production"
Check the field names at the top of the CSV file and verify that they correspond to existing Metadata_Field names for the project (unless you intend to create extra field names). The URL/entity field should be called "url" in the CSV file.
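The header check described above can be sketched in Python. This is a minimal illustration, not part of fielded_batch_ingest.py: the function name unexpected_fields and the known_fields list are hypothetical, and the list of existing Metadata_Field names is assumed to be one you have looked up for the project yourself.

```python
import csv
import io


def unexpected_fields(csv_file, known_fields):
    """Return header names that are neither "url" nor in known_fields."""
    reader = csv.reader(csv_file)
    header = next(reader, [])  # first row holds the field names
    allowed = {"url"} | set(known_fields)
    return [name for name in header if name not in allowed]


# Hypothetical sample with a misspelled field name; in practice, pass an
# open handle to the real CSV file instead of this in-memory string.
sample = io.StringIO("url,Agency,List_Nmae\nhttp://1.usa.gov,,some-list\n")
extra = unexpected_fields(sample, ["Agency", "List_Name"])
print(extra)
```

Any name reported here would be treated as a new field name on import, so a non-empty result is worth reviewing before running the script.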
Make sure you run the script with the Python executable that has the django-nomination app installed (for example, if you are running the URL Nomination Tool in a virtual environment). If you are using a virtual environment, either activate it before running the script or give the full path to its Python executable when running it.
If neither the -c nor the -d flag is passed, the file is expected to contain only URLs, one per line.
$ /home/digital3/current/env/bin/python fielded_batch_ingest.py -p test-project -n 44 ~/test.urls
-p is the project slug
-n is the nominator id
~/test.urls is the path to the text file containing one URL per line
$ /home/digital3/current/env/bin/python fielded_batch_ingest.py -c -p eth2016_bulk -n 546 ~/test.csv
-p is the project slug
-n is the nominator id
-c indicates we are ingesting a CSV file
~/test.csv is the path to the CSV file containing the URLs and metadata to be uploaded
Sample CSV file:
url,Agency,List_Name
http://1.usa.gov,,analytics-usa-gov-sites-list
http://1010ez.med.va.gov,,analytics-usa-gov-sites-list
http://blogs.cdc.gov/,"CDC, Centers for Disease Control and Prevention",GPO-Active-Collections
In this example, note that Agency and List_Name are metadata fields that already exist in the URL Nomination Tool for the project to which we are adding URLs.
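A CSV file in this shape can be produced with Python's standard csv module, which quotes values containing commas automatically. The sketch below is only an illustration; the write_rows helper is hypothetical, and the rows are copied from the sample above.

```python
import csv
import io

# Rows taken from the sample CSV above; empty strings mean "no value".
rows = [
    {"url": "http://1.usa.gov", "Agency": "",
     "List_Name": "analytics-usa-gov-sites-list"},
    {"url": "http://blogs.cdc.gov/",
     "Agency": "CDC, Centers for Disease Control and Prevention",
     "List_Name": "GPO-Active-Collections"},
]


def write_rows(out, rows, fieldnames):
    """Write a header line followed by one CSV line per row."""
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


# Written to an in-memory buffer here; a real run would use
# open("test.csv", "w", newline="") instead.
buffer = io.StringIO()
write_rows(buffer, rows, ["url", "Agency", "List_Name"])
csv_text = buffer.getvalue()
print(csv_text)
```

The Agency value containing a comma comes out wrapped in double quotes, matching the third data line of the sample file.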
Data can also be uploaded to the URL Nomination Tool from a serialized Python object in pickle format. Because attribute values can be stored in the Python dictionary as lists, this format supports multivalued attributes.
$ /home/digital3/current/env/bin/python fielded_batch_ingest.py -d -p test-project -n 44 ~/test.pkl
-p is the project slug
-n is the nominator id
-d indicates we are ingesting a file containing a pickled object
~/test.pkl is the path to the file containing the pickle
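A pickle file of this kind might be created as sketched below. The exact structure the script expects is not described here, so the layout used (a list of dictionaries, each with a "url" key and list values for multivalued fields) is an assumption; check the script source before relying on it.

```python
import os
import pickle
import tempfile

# ASSUMPTION: one dictionary per URL, keyed by field name, with a list
# used wherever a field has several values. The real expected layout
# should be confirmed against fielded_batch_ingest.py itself.
records = [
    {
        "url": "http://blogs.cdc.gov/",
        "Agency": ["CDC", "Centers for Disease Control and Prevention"],
        "List_Name": "GPO-Active-Collections",
    },
]

# Written to a temporary directory here; a real run would write ~/test.pkl.
path = os.path.join(tempfile.mkdtemp(), "test.pkl")
with open(path, "wb") as handle:
    pickle.dump(records, handle)

# Read the file back to confirm it round-trips unchanged.
with open(path, "rb") as handle:
    loaded = pickle.load(handle)
print(loaded[0]["Agency"])
```

If the script runs under an older Python version than the one that wrote the file, passing a lower protocol number to pickle.dump may be necessary.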