statzhero/data-projects

A list of projects related to data

This is a selection of projects I've worked on in the past related to data. The images link to the project sites. Note that this only includes public projects. A good, general thought piece of mine is Data as culture: how will we live in a data driven society?

Table of Contents

  1. Show me the money, a data project
  2. How open data can help shape the way we analyse electoral behaviour, another data project
  3. What is a CSV? A case study of CSVs on data.gov.uk, an elaborate blog post
  4. A survey of the uses in quantified self, a survey and presentation
  5. How to prioritise open data to drive global development, a tool for global development
  6. The Anonymisation Decision-Making Framework, an online course
  7. Benchmarking open data automatically, a technical report
  8. The Open Rail Performance Index, a failed study
  9. Academic and technical stuff
  10. On data visualisation
  11. Some more random fun projects, with data

Show me the money, a data project

In Show me the money I analysed all existing transactions of the three biggest peer-to-peer (P2P) platforms in the UK: Zopa, RateSetter, and Funding Circle. The data contains almost 14 million loan parts. It provided the most comprehensive snapshot of the UK P2P market at the time of publication. We gained high-profile media coverage for this story and have direct evidence of change in the peer-to-peer sector. Nice cartogram!

2013, peer-to-peer, R, analysis, visualisation, project management

How open data can help shape the way we analyse electoral behaviour, another data project

A joint project between Deloitte and the Open Data Institute on how election data can give insights into voting behaviour. I enjoyed this project a lot: it was interesting, had a fast turnaround, and let me apply various models, among them random forests.

2014, R, analysis, visualisation, elections

What is a CSV? A case study of CSVs on data.gov.uk, an elaborate blog post

A short project during my time at the Open Data Institute, where I analysed more than 20,000 links to CSV files on data.gov.uk. The result: only around one third turned out to be machine-readable. A typical CSV is between 1 KB and 1 MB in size and has around eight columns. And I got to play around with Gephi.

2014, R, analysis, CSV, study
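
To illustrate what a machine-readability check on a CSV link might involve, here is a minimal sketch. It is not the method used in the study (which was done in R, and whose exact criteria are not given here); the function name `looks_machine_readable` and its rules (at least two non-empty rows, more than one column, and the same number of fields in every row) are assumptions made purely for illustration.

```python
import csv
import io


def looks_machine_readable(text, min_rows=2):
    """Hypothetical heuristic, not the study's actual criteria.

    Treats a file as machine-readable if it parses as CSV, has at least
    `min_rows` non-empty rows, has more than one column, and every row
    has the same number of fields as the first row.
    """
    try:
        # Blank lines parse to empty lists; drop them before checking.
        rows = [row for row in csv.reader(io.StringIO(text)) if row]
    except csv.Error:
        return False
    if len(rows) < min_rows:
        return False
    width = len(rows[0])
    return width > 1 and all(len(row) == width for row in rows)


# A tidy table passes; a file with a title line above the table, or an
# HTML error page served under a .csv link, does not.
tidy = "name,value\na,1\nb,2\n"
titled = "Quarterly report\n\nname,value,unit\na,1\n"
html = "<html>\n<body>Not found</body>\n</html>\n"
results = [looks_machine_readable(t) for t in (tidy, titled, html)]
```

A heuristic like this is deliberately strict: a title row or a ragged line is enough to fail a file, which is one plausible way a large share of published "CSVs" could end up classed as not machine-readable.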

A survey of the uses in quantified self, a survey and presentation

The presentation slides give a brief overview of the findings. For me it was interesting to play around with SPSS and R integration, that is, variable labels. There's also a nice integration between Google documents and R. I was particularly pleased when I reused code that was years old, and it worked.

2014, quantified self, R, analysis, survey, presentation

There is a lot more where this came from... Here is an example: a book chapter.

How to prioritise open data to drive global development, a tool for global development

I designed the methodology and enjoyed classifying case studies, applications, and anecdata. For each sector, we mapped out relevant datasets and examples of real-world open data applications. We then offered three goal options to help decision- and policy-makers select datasets to release as open data.

2014, open data, report, spreadsheet, recommendation

The Anonymisation Decision-Making Framework, an online course

Together with the UK Anonymisation Network and Purple Guerrilla, I managed and developed an online course as an introduction to anonymisation. The course aims to promote the decision-making framework and give data practitioners confidence when dealing with personal data.

2015, anonymisation, governance, learning, project management, spreadsheet

Benchmarking open data automatically, a technical report

A high-level overview of whether and how we can evaluate and rank countries, organisations and projects based on how well they use open data. As open data becomes more widespread and useful, the need for effective ways to analyse it grows too.

2014, open data, benchmarking, report, recommendations

The Open Rail Performance Index, a failed study

An example of a failed study, because we never managed to publish the results. Not everything is a success, and I hope others can relate to this. The study was titled How the UK could gain up to £387 million per year, and then it got political. The upshot: I learned a lot about R, the train industry and the value of travel time savings.

2014, open data, R, rail, benchmarking, report

Academic and technical stuff

On data visualisation

  • An introduction to the history of visualisation, with many examples. Sadly it's hidden away deep on page 30 of an OpenDataMonitor report. I plan to make this more accessible in the future.
  • I run an R course on visualisation, in particular ggplot2. I don't think I'm allowed to release the training material, so I'm erring on the side of caution.
  • A vain, self-referential visualisation of temperature levels of 10 000 days for my 10 000th day alive.
  • SCHEME_TUFTE: Stata module to provide a Tufte-inspired graphics scheme
  • Some templates to make SPSS charts look more appealing

Some more random fun projects, with data
