Nick Evans edited this page Jul 10, 2016 · 59 revisions

This page provides an overview of the purpose and use of a Dataset Site. Its aim is to help anyone in an organisation, not just developers, create a Dataset Site for their data.

Why do I need a Dataset Site?

To publish open data for anyone to freely access, use and share, you must create a webpage that describes the data you are publishing. It must include relevant licensing information and documentation. It must also specify how dataset users (innovators who want to use or build on your data) should attribute your data. This webpage is a Dataset Site.

If you are publishing data using the Openactive specifications, you need a Dataset Site.

What does a Dataset Site look like?

Take a look at examples from British Cycling and GoodGym.

What does a Dataset Site do?

The purpose of a Dataset Site is to provide:

  • A web page that can be referenced when discussing the dataset.
  • A human- and machine-readable license for the data (the Dataset Site contains invisible metadata that allows its details to be read automatically).
  • An accessible "single point of truth" that explains where the data can be found.
  • Details ("documentation") and a historical record ("changelog") of the data format, including the specifications it follows and the data fields it contains.
  • A place where the community can comment and raise issues.
  • A mailing list that data users can subscribe to for updates about changes to the data format, spec and fields.
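For readers curious about the "invisible metadata" mentioned above, here is a minimal sketch of how a program might read the license from a Dataset Site. It assumes, for illustration only, that the metadata is embedded as a schema.org `Dataset` object inside a `<script type="application/ld+json">` tag; the markup the Dataset Site Generator actually produces may differ. The sample page and the names `JsonLdExtractor` and `read_license` are hypothetical, and the sketch uses only Python's standard library.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects every <script type="application/ld+json"> block on a page."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script contents may arrive in several chunks, so buffer them
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append(json.loads("".join(self._buffer)))
            self._in_jsonld = False


def read_license(html):
    """Return the license of the first schema.org Dataset found, or None.

    Assumption: the page embeds schema.org JSON-LD metadata (hypothetical).
    """
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    for block in parser.blocks:
        if isinstance(block, dict) and block.get("@type") == "Dataset":
            return block.get("license")
    return None


# Hypothetical sample page; real generated markup may differ.
SAMPLE_PAGE = """<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org/", "@type": "Dataset",
 "name": "Example Sessions",
 "license": "https://creativecommons.org/licenses/by/4.0/"}
</script>
</head><body><h1>Example Sessions</h1></body></html>"""

print(read_license(SAMPLE_PAGE))  # prints https://creativecommons.org/licenses/by/4.0/
```

Because the license is stored as structured data rather than only as visible text, a tool can check it without a person having to read the page.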

What is the Dataset Site Generator?

The Dataset Site Generator and its associated guides create a minimal Dataset Site using freely available, open source tools. A generated site contains enough features to publish a single dataset, which in most cases is sufficient for an initial publication of data relating to Openactive.

Additional datasets can easily be added later; please raise an issue on this repository to request a guide for this.

Do I need to be really techy to do this?

Not at all. There are no risks associated with just having a go at using the guides in the next section. If it all goes wrong, you can just delete the repositories (defined in the next section) you've created and start again.

I am non-techy. What is GitHub?

GitHub is a place where the open source community can collaborate.

To make this easier, here are some common GitHub terms explained:

  • Repository: On GitHub, a repository is a collection of Code, Issues and a Wiki. The page you are looking at right now is inside a repository (this repository is called "dataset-site-generator"; see the "openactive / dataset-site-generator" title at the top of the page).
  • Organisation: This repository is called "dataset-site-generator", and it exists inside the organisation "openactive". See the "openactive / dataset-site-generator" title at the top of the page.
  • Wiki: You are currently looking at the Wiki inside this repository (see the "Wiki" tab at the top of the page). A wiki is a collection of pages that can be easily edited. Some wikis are unrestricted (like this one), so they can be edited by anyone on GitHub (and all existing editors are notified of changes). Others are restricted to be editable only by GitHub users who have been granted access.
  • Code: The code tab at the top of the page will show you the code in this repository, which can be edited.
  • Issues: The issues tab at the top of the page is a place where people can leave comments about the repository.
  • Fork: To "fork" a repository means to copy it, as in copy-and-paste. A "fork" is a copy of a repository, and the forked repository always links back to the original. You can fork this repository to make your own Dataset Site by following one of the guides below.

This all sounds great! How do I make a Dataset Site? What do I need to create?

  1. A GitHub account for you and a GitHub organisation for your organisation.
  2. A repository containing a Dataset Site, which can be "forked" (copied) from this repository.
  3. A Mailchimp mailing list.
      • This allows dataset users (innovators who want to use or build on your data) to be kept up to date with changes to your data's format, spec and fields. Follow the guide in this document to create one.
  4. A repository containing documentation, which you can create from scratch by following the examples of others.
      • This repository provides dataset users with documentation of the data format, spec and fields, and allows them to comment and raise issues. It also includes a historical record of changes to the data format, spec and fields (a "changelog"). Follow the guide in this document to create one.