Skip to content

bcgov/FIT_opendatadownloader

Repository files navigation

FIT Open Data Downloader

Lifecycle:Experimental

GeoBC Foundational Information and Technology (FIT) Section tool for downloading open data and reporting on changes since last download.

Workflow

  1. Based on sources and schedule defined in a provided config file, download spatial data from the internet
  2. Compare downloaded data to cached version on object storage
  3. If changes are detected, write the latest download to object storage along with a change report

Installation

Using pip managed by the target Python environment:

git clone [email protected]:bcgov/FIT_opendatadownloader.git
cd FIT_opendatadownloader
pip install .

Usage

A command line interface is provided:

$ fit_downloader process --help
Usage: fit_downloader process [OPTIONS] CONFIG_FILE

  For each configured layer - download latest, detect changes, write to file

Options:
  -l, --layer TEXT            Layer to process in provided config.
  -p, --prefix                S3 prefix.
  -f, --force                 Force download to out-path without running
                              change detection.
  -s, --schedule [D|W|M|Q|A]  Process only sources with given schedule tag.
  -V, --validate              Validate configuration
  -v, --verbose               Increase verbosity.
  -q, --quiet                 Decrease verbosity.
  --help                      Show this message and exit.

Examples:

  1. Validate a configuration file for a given source:

     fit_downloader process -vV example_config.json
    
  2. Process data defined in sources/CAPRD/victoria.json configuration file, saving to s3://$BUCKET/CAPRD/victoria:

     fit_downloader process -v \
       --prefix s3://$BUCKET/Change_Detection/CAPRD/victoria \
       sources/CAPRD/victoria.json 
    

Configuration

Layers for downloaded are configured per jusrisdiction in sources. Each config .json file has several tag defining how to handle data for the given jurisdiciton:

tag required description
out_layer Y Name of target file/layer (parks, roads, etc)
source Y url or file path to file based source, format readable by GDAL/OGR (required)
protocol Y Type of download (http - file via http, esri - ESRI REST API endpoint, bcgw - download BCGW table via WFS)
fields Y List of source field(s) to retain in the download (required)
schedule Y Download frequency (required, must be one of: [D, W, M, Q, A] - daily/weekly/monthly/quarterly/annual)
source_layer N Name of layer to use within source (optional, defaults to first layer in file)
query N Query to subset data in source/layer (OGR SQL) (optional, currently only supported for sources where protocol is http)
primary_key N List of source field(s) used as primary key (optional, must be a subset of fields)
hash_fields N List of additional source field(s) to add to a synthetic geometry hash based primary key (optional, must be a subset of fields)
metadata_url N Link to source metadata

For the full schema definition, see source.schema.json.

Development and testing

virtual environment

Using GDAL on your system:

$ git clone [email protected]:bcgov/FIT_opendatadownloader.git
$ cd FIT_opendatadownloader
$ python -m venv .venv
$ source .venv/bin/activate
$ pip install -e .[test]
(.venv) $ py.test

Dockerized environment

Using GDAL on a docker image:

To build:

$ git clone [email protected]:bcgov/FIT_opendatadownloader.git
$ cd FIT_opendatadownlaoder
$ docker build -t fit_opendatadownloader .

Drop in to a bash session:

$ docker run --rm -it -v ./:/home/fit_opendatadownloader fit_opendatadownloader bash

About

Monitor open data, report when changes are detected

Resources

License

Code of conduct

Stars

Watchers

Forks

Releases

No releases published

Packages