
Python script to import a DCAT dataset in rdf format.


I14Y-ch/import_rdf_datasets


DCAT Dataset Import Tool (RDF/XML or Turtle)

A Python-based tool for importing DCAT datasets in RDF/XML or Turtle (ttl) format into I14Y, the interoperability platform of the Swiss Federal Statistical Office (BFS).

Features

  • Import DCAT datasets from RDF/XML or Turtle files into I14Y via the I14Y API
  • Supported properties for dcat:Dataset:

    Property | Requirement level
    dct:title | mandatory
    dct:description | mandatory
    dct:accessRights (one of: PUBLIC, NON_PUBLIC, CONFIDENTIAL, RESTRICTED) | mandatory
    dct:publisher (set in config.py) | mandatory
    dct:identifier | mandatory
    dct:issued | optional
    dct:modified | optional
    dcat:landingPage | optional
    dcat:keyword | optional
    dct:language | optional
    dcat:contactPoint | optional
    documentation (foaf:page) | optional
    schema:image | optional
    temporal coverage (dct:temporal) | optional
    dcat:temporalResolution | optional
    frequency (dct:accrualPeriodicity) | optional
    dct:isReferencedBy | optional
    dct:relation | optional
    spatial/geographical coverage (dct:spatial) | optional
    dct:conformsTo | optional
    dcat:theme | optional
    dcat:version | optional
    adms:versionNotes | optional

prov:qualifiedAttribution and dcat:qualifiedRelation are not imported automatically; you can add this information manually on I14Y.
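Before a dataset can be sent to I14Y, its mandatory properties have to be found in the RDF file. The sketch below is not the repository's code: it uses only the standard-library `xml.etree.ElementTree`, assumes RDF/XML written in the typed-node form (`<dcat:Dataset>` elements rather than `rdf:Description` with an `rdf:type`), and checks the mandatory properties that come from the file itself, using the Dublin Core term `dct:accessRights`. The helper names `check_datasets` and `access_right_code`, and the rule that the access-rights code is the last path segment of the URI, are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace URIs of RDF, DCAT and Dublin Core Terms.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
}

# Access-rights codes accepted by the import (from the table above).
ACCESS_RIGHTS = {"PUBLIC", "NON_PUBLIC", "CONFIDENTIAL", "RESTRICTED"}

# Mandatory properties read from the RDF file itself
# (dct:publisher comes from config.py instead).
MANDATORY = ["dct:title", "dct:description", "dct:accessRights", "dct:identifier"]


def access_right_code(element):
    """Return an access-rights code such as 'PUBLIC' from a URI or a literal.

    Hypothetical convention: for a URI, the code is its last path segment.
    """
    uri = element.get(f"{{{NS['rdf']}}}resource")
    value = uri if uri else (element.text or "")
    return value.rstrip("/").rsplit("/", 1)[-1].strip().upper()


def check_datasets(rdf_xml):
    """Yield (identifier, problems) for each dcat:Dataset in an RDF/XML string."""
    root = ET.fromstring(rdf_xml)
    for ds in root.iter(f"{{{NS['dcat']}}}Dataset"):
        # Mandatory properties that have no element at all.
        problems = [p for p in MANDATORY if ds.find(p, NS) is None]
        rights = ds.find("dct:accessRights", NS)
        if rights is not None and access_right_code(rights) not in ACCESS_RIGHTS:
            problems.append("dct:accessRights (unknown value)")
        ident = ds.findtext("dct:identifier", default="?", namespaces=NS)
        yield ident, problems
```

A dedicated RDF library such as rdflib would also cover Turtle and the other RDF/XML serialisation styles; the standard library is used here only to keep the sketch dependency-free.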

  • Supported properties for dcat:Distribution:

    Property | Requirement level
    dct:title (set to 'Datenexport' if not stated) | mandatory
    dct:description (set to 'Export der Daten' if not stated) | mandatory
    dcat:accessURL | mandatory
    dcat:downloadURL | optional
    dct:license | optional
    dct:issued | optional
    dct:modified | optional
    dct:rights | optional
    dct:language | optional
    schema:image | optional
    dcat:spatialResolutionInMeters | optional
    dcat:temporalResolution | optional
    dct:conformsTo | optional
    dcat:mediaType | optional
    dct:format | optional
    dcat:packageFormat | optional
    spdx:checksum | optional
    dcat:byteSize | optional
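The default title and description rules for distributions fit in a few lines. This is a minimal sketch, assuming a distribution has already been read into a plain dict keyed by prefixed property names; `prepare_distribution` is a hypothetical helper, and treating an empty string like a missing value is a choice made here rather than something the README specifies.

```python
# Defaults used when a distribution states no title or description
# (from the table above).
DEFAULT_TITLE = "Datenexport"
DEFAULT_DESCRIPTION = "Export der Daten"


def prepare_distribution(props):
    """Return a copy of a distribution's properties with defaults filled in.

    `props` is a hypothetical dict such as
    {"dcat:accessURL": "https://example.org/data"}.
    Raises ValueError if the mandatory dcat:accessURL is missing.
    """
    if not props.get("dcat:accessURL"):
        raise ValueError("dcat:accessURL is mandatory for every distribution")
    prepared = dict(props)  # do not modify the caller's dict
    # Missing or empty title/description fall back to the German defaults.
    if not prepared.get("dct:title"):
        prepared["dct:title"] = DEFAULT_TITLE
    if not prepared.get("dct:description"):
        prepared["dct:description"] = DEFAULT_DESCRIPTION
    return prepared
```

Failing early on a missing `dcat:accessURL` keeps an incomplete distribution from being sent to the API at all.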

Prerequisites

  • Python 3.8+
  • pip package manager

Installation

  1. Clone this repository:
git clone [repository-url]
cd import_rdf_datasets
  2. (Optional but recommended) Create and activate a virtual environment:
# Create virtual environment
python -m venv venv

# Activate virtual environment
# On Windows:
venv\Scripts\activate
# On macOS/Linux:
source venv/bin/activate
  3. Install dependencies:
pip install -r requirements.txt
  4. Configure the application:
    • Edit src/config.py with your I14Y API token, your organization ID and the file format of your data ("xml" or "ttl").
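For orientation, src/config.py might look roughly like the sketch below. The variable names `API_TOKEN`, `ORGANIZATION_ID` and `FILE_FORMAT` and the `validate_config` check are assumptions for illustration; the actual file in the repository defines the names the script really reads.

```python
# src/config.py (hypothetical layout; real variable names may differ)

API_TOKEN = "paste-your-I14Y-token-here"  # token copied from your I14Y profile
ORGANIZATION_ID = "your-organization-id"  # identifier of your organisation
FILE_FORMAT = "xml"                       # "xml" for RDF/XML, "ttl" for Turtle


def validate_config(token, organization_id, file_format):
    """Raise ValueError if a setting is empty or the format is unsupported."""
    if not token or not organization_id:
        raise ValueError("API token and organization ID must both be set")
    if file_format not in ("xml", "ttl"):
        raise ValueError(f"FILE_FORMAT must be 'xml' or 'ttl', got {file_format!r}")


# Fail fast when the module is imported with an invalid configuration.
validate_config(API_TOKEN, ORGANIZATION_ID, FILE_FORMAT)
```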

Usage

Import Datasets

  1. Log in to the I14Y interoperability platform and copy your API token by clicking the profile icon. Enter the token in src/config.py, together with the identifier of your organisation.
  2. Place your RDF files in the data/ folder (.xml, .rdf or .ttl).
  3. Run the import script:
python src/import_datasets.py
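Picking up the input files from data/ by extension could look like the following sketch. The function `find_rdf_files` and the grouping that treats `.rdf` files like `.xml` when the format is set to "xml" are hypothetical; the actual script may select files differently.

```python
from pathlib import Path

# Accepted extensions per configured format (hypothetical grouping:
# .rdf is handled together with .xml as RDF/XML).
EXTENSIONS = {"xml": {".xml", ".rdf"}, "ttl": {".ttl"}}


def find_rdf_files(data_dir, file_format):
    """Return the files in data_dir that match the configured format, sorted by path."""
    wanted = EXTENSIONS[file_format]
    return sorted(
        p for p in Path(data_dir).iterdir()
        if p.is_file() and p.suffix.lower() in wanted
    )
```

Sorting the paths makes the import order predictable from one run to the next.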

The script processes each dataset found in the files and displays real-time progress and error messages in the terminal.
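Reporting progress per dataset while collecting errors is a common pattern for this kind of batch import. The sketch below assumes a hypothetical `upload` callable that sends one dataset to the I14Y API and raises an exception on failure; the I14Y endpoints and payload format are deliberately left out, and continuing after a failed dataset is a design choice of this sketch, not confirmed behaviour of the script.

```python
def import_all(datasets, upload):
    """Upload each dataset, printing progress and collecting failures.

    `datasets` is a list of (identifier, payload) pairs; `upload` is a
    hypothetical callable that raises on failure. A failing dataset is
    reported and the loop continues with the next one.
    """
    failures = {}
    total = len(datasets)
    for index, (identifier, payload) in enumerate(datasets, start=1):
        try:
            upload(payload)
            print(f"[{index}/{total}] imported {identifier}")
        except Exception as exc:  # report the error, then move on
            failures[identifier] = str(exc)
            print(f"[{index}/{total}] FAILED {identifier}: {exc}")
    return failures
```

Returning the failures as a dict lets a caller print a summary at the end or retry only the datasets that did not import.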

File Structure

import_rdf_datasets/
├── data/
│   └── datasets.xml
├── src/
│   ├── config.py
│   ├── dcat_properies_utils.py
│   ├── import_datasets.py
│   └── mappings.py
├── requirements.txt
└── README.md

Contributing

Please ensure any pull requests or contributions adhere to the following guidelines:

  • Keep the code simple and well-documented
  • Follow PEP 8 style guidelines
  • Include appropriate error handling
  • Test thoroughly before submitting
