This repository has been archived by the owner on Jul 29, 2018. It is now read-only.
Mike Tigas edited this page Apr 29, 2014 · 15 revisions

This is a project of the U.S. Open Data Institute to provide a system where open data released by governments can be authenticated by end users — whether or not the data was downloaded from the official source.

Government data releases need to abide by local laws (for example, the District of Columbia Official Code) and should also abide by the Uniform Electronic Legal Material Act (UELMA). One of the UELMA provisions states that “legal material be…authenticated, by providing a method to determine that it is unaltered”.

This project aims to give agencies a web-based interface that provides this functionality.

Data integrity must not be mistaken for authentication. A user should be able to validate that data has not been tampered with since release, and must also be able to confirm that the data was provably distributed by a given agency at some point in time (and is not simply a well-constructed checksum collision).
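
The distinction can be illustrated with a minimal sketch that uses only the Python standard library. A bare SHA-256 checksum shows integrity only: anyone who alters a file can publish a new, matching checksum. A tag that depends on a secret cannot be recomputed by someone who lacks that secret. HMAC is used here purely as a stand-in for the PGP signatures this project has in mind; the key, the function names, and the file contents are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret held only by the agency (stand-in for a PGP private key).
AGENCY_KEY = b"example-agency-secret"


def checksum(data: bytes) -> str:
    """Unkeyed digest: detects changes, but says nothing about who made the file."""
    return hashlib.sha256(data).hexdigest()


def tag(data: bytes, key: bytes) -> str:
    """Keyed digest: can only be produced by someone who holds the key."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def is_authentic(data: bytes, claimed_tag: str, key: bytes) -> bool:
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(tag(data, key), claimed_tag)


original = b"parcel_id,owner\n1,District of Columbia\n"
forged = b"parcel_id,owner\n1,Someone Else\n"

# An attacker can always publish a valid checksum for a forged file...
forged_checksum = checksum(forged)

# ...but cannot produce a valid tag without the agency's key.
forged_tag = tag(forged, b"attacker-guess")
official_tag = tag(original, AGENCY_KEY)
```

Unlike HMAC, a PGP signature is verified with a public key, so end users would not need any secret to check a release. The sketch only shows why an unkeyed checksum is not enough.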


High-level needs

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.

  • Administrators must have some secure ability to log in to the service.
  • Administrators must be able to upload data files and provide metadata about said files.
  • Administrators must be able to edit file metadata and upload new versions of data files.
  • Administrators must be able to remove files from the service.
  • Users must be able to upload a data file to double-check that the file was actually released by the agency and has not been tampered with.
  • Advanced users should be able to retrieve file hash information or PGP signatures (depending on final implementation details).
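
The upload-and-check requirement above might be sketched as a lookup by file digest. This is an illustration under stated assumptions, not the project's design: `ReleaseRegistry` and its methods are hypothetical names, an in-memory dictionary stands in for the service's database, and SHA-256 is an assumed choice of hash.

```python
import hashlib
from typing import Dict, Optional


class ReleaseRegistry:
    """Hypothetical in-memory stand-in for the service's record of released files."""

    def __init__(self) -> None:
        # Maps the SHA-256 hex digest of each released file to its name.
        self._by_digest: Dict[str, str] = {}

    @staticmethod
    def digest(data: bytes) -> str:
        return hashlib.sha256(data).hexdigest()

    def publish(self, name: str, data: bytes) -> str:
        """Administrator action: record a released file and return its digest."""
        file_digest = self.digest(data)
        self._by_digest[file_digest] = name
        return file_digest

    def remove(self, data: bytes) -> None:
        """Administrator action: forget a previously released file."""
        self._by_digest.pop(self.digest(data), None)

    def check_upload(self, data: bytes) -> Optional[str]:
        """User action: return the release name if the bytes match a release, else None."""
        return self._by_digest.get(self.digest(data))


registry = ReleaseRegistry()
released = b"year,budget\n2014,100\n"
registry.publish("budget-2014.csv", released)

match = registry.check_upload(released)                       # unmodified copy
no_match = registry.check_upload(b"year,budget\n2014,999\n")  # tampered copy
```

A digest lookup like this only tells the user that the service knows the file. Whether that answer can be trusted still depends on authenticating the service or on a signature over the file.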

Architecture

The service is a basic Django application, configured and integrated with a fork of python-gnupg, a wrapper around the gnupg command-line client.

The Django framework has a well-documented login system that uses PBKDF2 with SHA256 for key stretching (deliberately slowing down the rate at which a password can be hashed, and therefore brute-forced).
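
To show what PBKDF2 with SHA256 does, here is a minimal sketch using Python's standard `hashlib.pbkdf2_hmac`. It is not Django's implementation, which handles password hashing internally. The iteration count, salt length, and helper names are assumptions chosen for illustration.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

# Illustrative work factor only; this is not Django's actual setting.
ITERATIONS = 100_000


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Derive a 32-byte key from a password; returns (salt, derived_key)."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt for each stored password
    derived = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, derived


def check_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-derive the key with the stored salt and compare in constant time."""
    _, derived = hash_password(password, salt)
    return hmac.compare_digest(derived, expected)


salt, stored = hash_password("correct horse battery staple")
```

Every guess an attacker makes costs the same number of iterations as a legitimate login check, which is what slows down brute forcing.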

It is recommended that a previously unused GPG key be used for installations of this project. (This project should eventually contain a "setup script" wrapper that shields all actual GnuPG complexity from an implementor of this project.)

GnuPG private key material (which the authentication site instance uses to authenticate data in addition to provide data integrity

Some basic groundwork for "Phase 1" of the project:


Timeframe / Scope

Phase 1 represents the initial buildout of a minimum viable prototype, from April 28 through mid-May.
