This repository has been archived by the owner on Jul 29, 2018. It is now read-only.
Waldo Jaquith edited this page Jan 9, 2015 · 15 revisions

Data Seal is a U.S. Open Data project that provides a system for end users to authenticate open data released by governments, whether or not they obtained their copy directly from the official source.

Government data releases must comply with local law (for example, the District of Columbia Official Code) and should also comply with the Uniform Electronic Legal Material Act (UELMA). One UELMA provision states that "legal material be…authenticated, by providing a method to determine that it is unaltered".

Data Seal gives agencies a web-based interface for providing this authentication.

Data integrity should not be mistaken for authentication. A user must be able not only to confirm that data has not been tampered with since its release, but also to authenticate that the data was provably distributed by a given agency at some point in time (rather than merely matching a well-constructed checksum collision).
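The difference can be illustrated with Python's standard library. This is a minimal sketch; the key and file contents are hypothetical. An unkeyed SHA-256 checksum only shows that a file matches some digest: whoever alters the file can publish a new digest that matches it. A keyed check cannot be recomputed without the key. HMAC is used here only as the simplest stand-in for a keyed check; it is symmetric, so unlike an OpenPGP signature it cannot be verified by the public without sharing the secret.

```python
import hashlib
import hmac

# Hypothetical secret held only by the publishing agency.
AGENCY_KEY = b"hypothetical-agency-secret"


def checksum(data: bytes) -> str:
    """Unkeyed SHA-256 digest: shows the file matches a digest, not who made it."""
    return hashlib.sha256(data).hexdigest()


def keyed_tag(data: bytes, key: bytes) -> str:
    """HMAC-SHA256 tag: cannot be produced without the key."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()


original = b"year,budget\n2014,1000\n"
tampered = b"year,budget\n2014,9000\n"

# An attacker who alters the file simply publishes a fresh checksum,
# and the altered file "verifies" against it.
forged_checksum = checksum(tampered)
checksum_ok = hmac.compare_digest(checksum(tampered), forged_checksum)

# The attacker cannot compute the agency's tag for the altered file,
# so checking the tampered file against the agency's published tag fails.
published_tag = keyed_tag(original, AGENCY_KEY)
tag_ok = hmac.compare_digest(keyed_tag(tampered, AGENCY_KEY), published_tag)

print(checksum_ok, tag_ok)  # True False
```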

In a nutshell

A user who has downloaded public data from a government agency or a third party (a news outlet, a library, or an open data portal) wants to verify that the data is legitimate: untampered and provably released by that agency.

Data Seal provides a "clearinghouse" through which a user can upload a suspect file and determine its legitimacy.

Technical users and other organizations could also download the verification data (PGP signatures) and host it elsewhere, creating a distributed system so that the service is not a single point of failure. The verification system uses OpenPGP, a well-studied open cryptographic standard.

Architecture

Data Seal is a basic Django application integrated with a fork of python-gnupg, a wrapper around the gnupg command-line client. The gist:

  1. The website is configured with its own PGP key at install time, with some identifier like "Department of X Authentication.io". (Or, we could do keys attached to the admin user's identity, but I think a website-specific key is somewhat easier to manage.)
  2. When an admin uploads a file, the server will create a detached signature for that file and store it in the database.
  3. When a user uploads a file to test its authenticity, the stored detached signature is checked against the user's uploaded copy, not against the version the admin uploaded.
  4. Advanced users could download the detached signature and public key and do their own checks, too.
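The four steps can be sketched in Python as follows. This is a toy stand-in, not Data Seal's implementation: it uses textbook RSA over a SHA-256 digest, with two publicly known Mersenne primes and no padding, purely to show the shape of a detached-signature workflow. The `signatures` dictionary, the `dataset_id` parameter and the function names are hypothetical; the real system would store OpenPGP detached signatures produced through python-gnupg.

```python
import hashlib

# TOY KEY, illustration only: textbook RSA built from two well-known
# Mersenne primes, no padding. Never use this in place of OpenPGP.
P = 2**127 - 1
Q = 2**521 - 1
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (Python 3.8+)

# Hypothetical stand-in for the database table of detached signatures.
signatures: dict[str, int] = {}


def _digest(data: bytes) -> int:
    """SHA-256 of the file as an integer (always smaller than N)."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


def admin_upload(dataset_id: str, data: bytes) -> None:
    """Step 2: sign the admin's file and keep only the detached signature."""
    signatures[dataset_id] = pow(_digest(data), D, N)


def verify(data: bytes, signature: int, public_key: tuple[int, int]) -> bool:
    """Anyone holding the public key (n, e) can run this check (step 4)."""
    n, e = public_key
    return pow(signature, e, n) == _digest(data)


def user_check(dataset_id: str, data: bytes) -> bool:
    """Step 3: check the user's own copy against the stored signature."""
    sig = signatures.get(dataset_id)
    return sig is not None and verify(data, sig, (N, E))
```

Because `verify` needs only the public pair `(N, E)`, the same check can run anywhere, which is what step 4 relies on. With real OpenPGP artifacts, an advanced user would instead import the agency's public key and run `gpg --verify <signature file> <data file>`.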

Security concerns

We won't write our own login system. Django has a well-documented login system that uses PBKDF2 with SHA256 for key stretching: it deliberately slows down the rate at which a password can be hashed, and therefore the rate at which it can be brute-forced if a database leak occurs.
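For reference, the key-stretching scheme can be sketched with `hashlib.pbkdf2_hmac` from the standard library. The encoded string follows the `algorithm$iterations$salt$hash` layout that Django uses for its PBKDF2 hasher, but the iteration count is an illustrative assumption (Django's default changes between releases), and the project itself would rely on `django.contrib.auth.hashers.make_password` and `check_password` rather than code like this.

```python
import base64
import hashlib
import hmac
import secrets

# Illustrative work factor only; higher values slow attackers (and logins) more.
ITERATIONS = 100_000


def hash_password(password: str, salt: str = "", iterations: int = ITERATIONS) -> str:
    """Derive a PBKDF2-HMAC-SHA256 key, encoded as algorithm$iterations$salt$hash."""
    salt = salt or secrets.token_hex(8)  # fresh random salt unless one is given
    dk = hashlib.pbkdf2_hmac("sha256", password.encode(), salt.encode(), iterations)
    return "pbkdf2_sha256$%d$%s$%s" % (iterations, salt, base64.b64encode(dk).decode())


def check_password(password: str, encoded: str) -> bool:
    """Re-derive with the stored salt and iteration count; compare in constant time."""
    parts = encoded.split("$", 3)
    if len(parts) != 4 or parts[0] != "pbkdf2_sha256":
        return False
    _, iterations, salt, _ = parts
    candidate = hash_password(password, salt, int(iterations))
    return hmac.compare_digest(candidate, encoded)
```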

A new PGP key will be generated for each installation of this project. Existing keychains and keys should not be used, since the key data (and private key passphrase) will not be strongly protected: the PGP key passphrase will be stored in the server's Django configuration, much as database passwords are handled in production. The PGP key should not have encryption capability; it should serve only as a signing key. Section 2.5 of RFC 4880 (OpenPGP), "Signature-Only Applications," explicitly allows for this, noting that "it is reasonable for there to be subset implementations that are non-conformant only in that they omit encryption."
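A signing-only key can be created non-interactively with GnuPG's unattended key generation. The helper below (a hypothetical name) only builds the parameter file; the key length, expiry and e-mail address are illustrative assumptions. `Key-Usage: sign` restricts the primary key to signing, and omitting any `Subkey-Type` line means no encryption subkey is generated.

```python
def signing_key_params(name: str, email: str, passphrase: str) -> str:
    """Build a GnuPG unattended key-generation parameter file for a sign-only key.

    No Subkey-Type line is emitted, so GnuPG creates no encryption subkey.
    """
    return "\n".join([
        "Key-Type: RSA",
        "Key-Length: 4096",        # illustrative choice
        "Key-Usage: sign",         # signing only, no encryption capability
        f"Name-Real: {name}",
        f"Name-Email: {email}",
        "Expire-Date: 0",          # illustrative: no expiry
        f"Passphrase: {passphrase}",
        "%commit",
        "",
    ])


params = signing_key_params(
    "Department of X Authentication.io",
    "dataseal@example.gov",        # hypothetical address
    "passphrase-from-django-settings",  # placeholder, not a real secret
)
```

The resulting text could be written to a file and passed to `gpg --batch --gen-key <file>`. Because that file contains the passphrase in plain text, it should be deleted once the key exists.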

Data Seal should be deployed with enforced SSL/TLS according to current best practices. (See here and here.) This is slightly outside the scope of project development, but it should be a requirement at install time.
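At the application layer, Django can enforce HTTPS through `django.middleware.security.SecurityMiddleware` (Django 1.8 and later), which must be listed in the project's middleware. The settings excerpt below is one plausible configuration, not a vetted recommendation; the HSTS duration is an assumption, and protocol and cipher choices still belong in the web server's TLS configuration.

```python
# settings.py (excerpt): force HTTPS and tell browsers to keep using it.
# These take effect only when SecurityMiddleware is enabled.
SECURE_SSL_REDIRECT = True            # redirect every http:// request to https://
SESSION_COOKIE_SECURE = True          # send the session cookie over HTTPS only
CSRF_COOKIE_SECURE = True             # likewise for the CSRF cookie
SECURE_HSTS_SECONDS = 31536000        # HSTS for one year (assumed duration)
SECURE_HSTS_INCLUDE_SUBDOMAINS = True

# Only if a reverse proxy terminates TLS and always sets this header itself:
# SECURE_PROXY_SSL_HEADER = ("HTTP_X_FORWARDED_PROTO", "https")
```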

Architecture documentation

Other notes

Project requirements

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this section are to be interpreted as described in RFC 2119.

"Administrator" is an individual at an agency with "superuser" access who can add or remove other administrator and employee accounts.
"Employee" is an ordinary agency user of the product who may manage the data files and documents available on the service.
"Users" are ordinary members of the public visiting from the internet.
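Because each role includes everything the role below it can do, the access model can be expressed as nested permission sets. The sketch below is hypothetical: the action names are invented labels for the requirements that follow, and a Django deployment would more likely map these roles onto the built-in `is_superuser` flag and permission groups.

```python
from enum import Enum


class Role(Enum):
    USER = "user"
    EMPLOYEE = "employee"
    ADMINISTRATOR = "administrator"


# Hypothetical action names; each role's set contains the one below it.
USER_ACTIONS = frozenset({"verify_file", "download_signature"})
EMPLOYEE_ACTIONS = USER_ACTIONS | {
    "upload_file", "edit_metadata", "upload_new_version", "remove_file",
}
ADMIN_ACTIONS = EMPLOYEE_ACTIONS | {"add_account", "remove_account", "edit_account"}

PERMISSIONS = {
    Role.USER: USER_ACTIONS,
    Role.EMPLOYEE: EMPLOYEE_ACTIONS,
    Role.ADMINISTRATOR: ADMIN_ACTIONS,
}


def can(role: Role, action: str) -> bool:
    """Return True if the given role is allowed to perform the action."""
    return action in PERMISSIONS[role]
```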

  • Employees MUST be able to log in to the service securely.
  • Employees MUST be able to upload data files and provide metadata about those files.
  • Employees MUST be able to edit file metadata and upload new versions of data files.
  • Employees MUST be able to remove files from the service.
  • Administrators MUST be able to add, remove, and edit employee accounts.
  • Administrators MUST have all the access that employees have.
  • Users MUST be able to upload a data file to verify that the file was released by the agency and has not been tampered with.
  • Advanced users SHOULD be able to retrieve file hash information or PGP signatures (depending on final implementation details).
  • User uploads SHOULD be rate-limited by IP address or another clustering mechanism, so that a flood of uploads cannot overload the service with excessive PGP signature checks or file hash/checksum checks.
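Per-IP rate limiting can be sketched as a sliding window: remember the timestamps of each client's recent uploads and refuse new ones once a limit is reached. The class name, limits and injectable clock below are assumptions. This version keeps its state in a single process's memory, so a multi-process deployment would need a shared store (such as Django's cache framework) instead.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class SlidingWindowLimiter:
    """Allow at most `limit` uploads per client key within `window` seconds."""

    def __init__(self, limit: int, window: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable so tests can control time
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, client_key: str) -> bool:
        """Record and permit an upload, or refuse it if the window is full."""
        now = self.clock()
        hits = self._hits[client_key]
        # Drop timestamps that have slid out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # refused uploads are not recorded
        hits.append(now)
        return True
```

For example, `SlidingWindowLimiter(limit=10, window=60.0)` would allow ten verification uploads per IP address per minute, with the upload view calling `allow(request.META["REMOTE_ADDR"])` before running any signature check.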

[TODO] Requirements during installation:

  • The installing administrator MUST have a way to set up the initial administrator account.