
Introduction

CSV on the Web (CoW) was developed for academic researchers, and in particular for historians. The goal of the tool is to encourage researchers to change their working practices: instead of compiling only tabular data, CoW allows anyone to transition to Linked Data.

The introduction briefly explains Linked Data, addresses the concept of FAIR data, explains the process of converting tabular data to Linked Data, and gives an overview of the wiki structure.

What is Linked Data?

Linked Data was conceived by Tim Berners-Lee in 2006 for what he called the Semantic Web.

The Semantic Web is a vision: the idea of having data on the Web defined and linked in a way that it can be used by machines not just for display purposes, but also for automation, integration and reuse of data across various applications.[1]

The Semantic Web is based on the Resource Description Framework (RDF), which consists of triples. A triple contains a subject (e.g. William Shakespeare), a predicate or property (e.g. is a) and an object or value (e.g. Playwright). Each part of the triple can be a Uniform Resource Identifier (URI), which may or may not resolve to a URL. In order to create Linked Data sets, you need several tools from the Semantic Web Stack visualized below.
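To make the triple idea concrete, here is a minimal sketch using rdflib, a general-purpose Python library for RDF. It is not part of CoW, and the URIs are made up for illustration only:

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF, RDFS

# Illustrative namespace; in practice you would reuse existing
# vocabularies and identifiers (e.g. schema.org or Wikidata) where possible.
EX = Namespace("https://example.org/")

g = Graph()
shakespeare = URIRef("https://example.org/person/WilliamShakespeare")

# Subject: Shakespeare, predicate: "is a", object: the class Playwright.
g.add((shakespeare, RDF.type, EX.Playwright))

# Objects can also be plain values (literals) rather than URIs.
g.add((shakespeare, RDFS.label, Literal("William Shakespeare")))

# Print the two triples in Turtle, a readable RDF syntax.
print(g.serialize(format="turtle"))
```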

Semantic Web Stack

CoW takes care of nearly all of the grey layers of the Semantic Web Stack shown above. There is no need to worry about data interchange, taxonomies, ontologies, or rules when you are just getting started. However, you still need to determine the identifiers or URIs, and you can adapt the metadata or JSON-schema file. You also need to query the data using other tools.

How can Linked Data also be FAIR Data?

FAIR data should be Findable, Accessible, Interoperable and Reusable, and those principles are by now widely shared. Less well shared are the arguments for why storing your data in proprietary formats (e.g. Excel, Access, SPSS) is not the best of ideas, which is why our starting point is a good old text file, a .csv.

CoW

To create Linked Data from your .csv file, we follow a three-step procedure:

  1. We ask CoW to generate a JSON-schema (a sort of recipe) based on our .csv file.
  2. We improve on this automatically derived JSON-schema by manually adding the specifics of our .csv, such as provenance and links to other data (a small sketch of this step follows the list).
  3. We ask CoW to generate Linked Data, based on our .csv file and bespoke JSON-schema.
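As an illustration of step 2, the sketch below opens the JSON-schema that CoW generated next to the .csv file and adds a title using Python's standard library. The file name (data.csv-metadata.json) and the exact keys present are assumptions that depend on your CoW version; most researchers will simply edit the file in a text editor instead.

```python
import json
from pathlib import Path

# Assumption: CoW (step 1) wrote its schema next to the source file
# under this name; adjust the path to whatever was actually generated.
schema_path = Path("data.csv-metadata.json")
schema = json.loads(schema_path.read_text())

# Inspect the columns CoW derived from the CSV header.
for column in schema.get("tableSchema", {}).get("columns", []):
    print(column.get("name"), "->", column.get("propertyUrl"))

# Add a human-readable title as a CSVW "common property" (just one
# example of the kind of metadata you might add by hand).
schema["dc:title"] = "1921 death duty registers, cleaned export"

schema_path.write_text(json.dumps(schema, indent=2))
```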

Converting a .csv file to Linked Data thus requires a piece of software called CoW. Installation instructions are provided, but for people not used to working with the command line the installation has proved to be a hassle. We therefore provide a Docker image that allows you to use CoW without installing it yourself. The image is hosted on Docker Hub and available here.

Docker

The Docker image follows exactly the three steps mentioned above:

  1. Select the .csv file you want to upload from your computer. You can then choose to manually modify the JSON-schema (which will be created automatically) using the “Upload” button, or to directly generate and download an RDF version of your .csv file using the “Convert” button.
  2. To modify your JSON-schema in ruminator, click on the red cow button (or the name of the .csv file). More information on the template format can be found here. After finishing your edits, click the “done” button (this will redirect you back to cattle).
  3. Click the “Convert” button to create the RDF file based on the JSON-schema you created in step 2 and the already uploaded .csv file. You can change the format of the downloaded RDF file using the “Advanced features” button (see the sketch below for inspecting the result outside the browser).
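If you want to inspect the converted file outside the browser, any RDF toolkit will do. The hedged sketch below assumes the download is an N-Quads file named data.nq (your chosen format and file name may differ) and uses rdflib to count the triples and re-serialise them as Turtle:

```python
from rdflib import ConjunctiveGraph

# Assumption: the file downloaded from the "Convert" step is N-Quads;
# adjust the file name and format to match what you actually received.
g = ConjunctiveGraph()
g.parse("data.nq", format="nquads")

print(f"{len(g)} triples loaded")

# Re-serialise as Turtle for easier reading, regardless of the
# format chosen under "Advanced features".
g.serialize(destination="data.ttl", format="turtle")
```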

From the next section onwards, we'll explain the various parts of the JSON-schema and how to augment it.

Contents

  1. Adapting the Metadata
    • Base URI
    • Prefixes
    • Datatypes
    • Column Titles and Descriptions
    • Triples
      • Subject (aboutURL)
      • Predicate (propertyURL)
      • Object (valueURL/CSVW: value)
  2. Enriching the Data
    • Adding Data using Virtual Columns
    • Adding Provenance
  3. Additional Features
    • Converting CSVs containing URIs
    • Choosing other RDF types
  4. Tutorial
    • Linking 1921 Death Duty Data to Civil Registry Data
  5. FAQ

[1] V. Kashyap, C. Bussler, and M. Moran. The Semantic Web: Semantics for Data and Services on the Web. Data-Centric Systems and Applications. Springer Berlin Heidelberg, 2008, p. 3.
