This repository contains the code for analyzing the results from running BikeDNA, in a version adapted for large data sets, on nationwide data for Denmark, comparing data from OpenStreetMap (OSM) and GeoDanmark.
The analysis is exploratory and focuses on detecting spatial patterns in data quality, looking at, for example, the correlations between administrative divisions and differences in data completeness, correlations between OSM tag quality and population density, and identifying areas with large differences between the two data sources.
For a full reproducible setup with all input data, see .
The analysis is based on Jupyter notebooks. It therefore requires an installation of Python, including tools for Jupyter notebook.
The first step is to successfully run BikeDNA BIG, performing both the intrinsic and extrinsic analysis of the OSM and GeoDanmark data.
First clone this repository (recommended) to your local machine or download it.
To avoid cloning the history and larger branches with example data and plots, use:
git clone -b main --single-branch https://github.com/anerv/bikedna_dk_analysis --depth 1
To ensure that all packages needed for the analysis are installed, it is recommended to create and activate a new conda environment using the environment.yml:
conda env create --file=environment.yml
conda activate bikedna_analysis
If this fails, the environment can be created by running:
conda config --prepend channels conda-forge
conda create -n bikedna_analysis --strict-channel-priority geopandas pyarrow pandas folium pyyaml matplotlib contextily rasterio rioxarray jupyterlab ipykernel h3-py splot pysal plotly plotly_express
conda activate bikedna_analysis
This method does not control the library versions and should be used as a last resort.
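Whichever way the environment was created, a quick import check can confirm that the packages are available. The following is a minimal sketch, not part of the repository: the import names are assumptions derived from the conda package list above (for example, pyyaml is assumed to be imported as yaml and h3-py as h3), and the helper name missing_modules is hypothetical.

```python
from importlib.util import find_spec

# Import names assumed for the conda packages listed above; some differ from
# the package name (pyyaml -> yaml, h3-py -> h3).
REQUIRED_MODULES = [
    "geopandas", "pyarrow", "pandas", "folium", "yaml", "matplotlib",
    "contextily", "rasterio", "rioxarray", "jupyterlab", "ipykernel",
    "h3", "splot", "pysal", "plotly", "plotly_express",
]


def missing_modules(modules=REQUIRED_MODULES):
    """Return the top-level modules that cannot be found in this environment."""
    return [name for name in modules if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("Missing modules:", ", ".join(missing) if missing else "none")
```

Run it with the bikedna_analysis environment activated. The check only confirms that a module can be located; it does not verify library versions.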
The code for BikeDNA has been developed and tested using macOS 13.2.1.
The repository has been set up using the structure described in the Good Research Developer. Once the repository has been downloaded, navigate to the main folder in a terminal window and run the command
pip install -e .
Lastly, add the environment kernel to Jupyter via:
python -m ipykernel install --user --name=bikedna_analysis
Run Jupyter Lab or Notebook with the kernel bikedna_analysis (Kernel > Change Kernel > bikedna_analysis).
In order to run the code, the configuration file config.yml must be filled out. The config.yml on the main branch contains settings for, among other things, the CRS and the name of the study area, which is used for folder structure setup, plot naming, and result labelling. The configuration file also specifies where to find the data and results from running BikeDNA (step 0).
Plot settings can be changed in scripts/settings/plotting.py.
Next, to create the required folder structure and copy the results from running BikeDNA, navigate to the main folder in a terminal window and run the Python file setup_folders_input_data.py:
python setup_folders_input_data.py
This should return:
...
Successfully created folder results/compare_analysis/
Successfully created folder results/osm_analysis/
Successfully created folder results/ref_analysis/
...
To validate that the results and data were successfully copied to this directory, check that the results folder now contains the subfolders reference and osm, with content matching the output of BikeDNA.
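This check can also be scripted. The sketch below is illustrative and not part of the repository: it only tests that the two subfolders named above exist and are non-empty, and it does not compare their content with the BikeDNA output. The helper name missing_or_empty is hypothetical.

```python
from pathlib import Path

# Subfolder names taken from the text; adjust if your BikeDNA output differs.
EXPECTED_SUBFOLDERS = ("osm", "reference")


def missing_or_empty(results_dir, expected=EXPECTED_SUBFOLDERS):
    """Return the expected subfolders that are missing or contain no entries."""
    root = Path(results_dir)
    problems = []
    for name in expected:
        folder = root / name
        if not folder.is_dir() or not any(folder.iterdir()):
            problems.append(name)
    return problems


if __name__ == "__main__":
    problems = missing_or_empty("results")
    if problems:
        print("Missing or empty:", ", ".join(problems))
    else:
        print("All expected result folders are present and non-empty.")
```

Run it from the main folder of the repository so that the relative path results resolves correctly.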
In addition to the input data from BikeDNA, the analysis makes use of:
- A dataset with municipal boundaries: municipalities.gpkg
- A dataset with the total population in each municipality: muni_pop.csv
- Population rasters with the local population density
These data sets are already provided as part of this repository for an analysis covering all of Denmark, using the default study area settings in the config.yml. If other datasets are to be used, once the folders have been created:
- remove the existing data files
- place the files municipalities.gpkg and muni_pop.csv in the folder data > municipalities > 'study_area' > raw
- place the population rasters in the folder data > population > 'study_area' > raw
- specify the name of the population rasters in config.yml
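The folder layout described in the list above can be expressed as paths, which makes it easy to check that replacement files ended up in the right place. This is a sketch under assumptions: the helper input_locations and the placeholder study area name "denmark" are hypothetical, and the actual study area name should be taken from config.yml.

```python
from pathlib import Path


def input_locations(study_area, data_root="data"):
    """Build the input paths described above for a given study area name."""
    root = Path(data_root)
    muni_raw = root / "municipalities" / study_area / "raw"
    return {
        "municipal_boundaries": muni_raw / "municipalities.gpkg",
        "municipal_population": muni_raw / "muni_pop.csv",
        "population_rasters": root / "population" / study_area / "raw",
    }


if __name__ == "__main__":
    # "denmark" is a placeholder; use the study area name from config.yml.
    for label, path in input_locations("denmark").items():
        status = "found" if path.exists() else "missing"
        print(f"{label}: {path} ({status})")
```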
Warning: The notebooks that use the municipal and population input data are currently hardcoded to use the datasets provided in this repository, with municipal boundaries for Denmark from Dataforsyningen, municipal population data from Statistics Denmark, and population rasters from the Global Human Settlement Layer (GHSL).
All analysis notebooks are in the scripts folder.
- prepare_population_grid.ipynb: This notebook processes the population rasters and converts the data into H3 hexagons at the chosen resolutions.
- municipal_analysis_OSM.ipynb: The notebook indexes the results of the intrinsic analysis of OSM by municipality and examines correlations between municipality and high/low data quality.
- analyze_OSM_tags.ipynb: The notebook runs an analysis of spatial patterns in existing and missing tags in the OSM data.
- municipal_analysis_reference.ipynb: The notebook indexes the results of the intrinsic analysis of the GeoDanmark data by municipality and examines correlations between municipality and high/low data quality.
- extrinsic_analysis.ipynb: Looks at spatial patterns in differences between the two data sets, and contrasts the findings with areas of high and low population density.
- municipal_comparison.ipynb: Compares the outcome of the notebooks looking at the quality and completeness at the municipal level.
Additionally, the scripts folder contains the notebook explore_spatia_weights_sensitivity.ipynb, used to explore the sensitivity of the analysis of spatial patterns in infrastructure density differences to the definition of spatial weights.
Warning: Most notebooks can be run independently, but both municipal_analysis_OSM.ipynb and municipal_analysis_reference.ipynb must be run before municipal_comparison.ipynb, and pop_grid.ipynb must be run before extrinsic_analysis.ipynb and analyze_OSM_tags.ipynb.
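The run-order constraints in this warning form a small dependency graph. As an illustration only (not part of the repository), the sketch below uses Python's standard graphlib module (Python 3.9+) to derive one valid execution order. The notebook names are copied verbatim from the warning, and the helper name run_order is hypothetical.

```python
from graphlib import TopologicalSorter

# Each notebook maps to the notebooks that must run before it,
# as stated in the warning above.
PREREQUISITES = {
    "municipal_comparison.ipynb": {
        "municipal_analysis_OSM.ipynb",
        "municipal_analysis_reference.ipynb",
    },
    "extrinsic_analysis.ipynb": {"pop_grid.ipynb"},
    "analyze_OSM_tags.ipynb": {"pop_grid.ipynb"},
}


def run_order(prerequisites=PREREQUISITES):
    """Return one valid execution order that respects the prerequisites."""
    return list(TopologicalSorter(prerequisites).static_order())


if __name__ == "__main__":
    for step, notebook in enumerate(run_order(), start=1):
        print(f"{step}. {notebook}")
```

Several orders satisfy the constraints, so the exact sequence printed may differ between runs; any of them is acceptable as long as each notebook appears after its prerequisites.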
The results folder contains the results from running BikeDNA, used as inputs in this analysis (in the folders results/compare, results/osm, and results/reference), and the outputs from running the analysis notebooks.
Output data and plots from the analysis of the BikeDNA outputs are stored in the _analysis folders:
- municipal_analysis_OSM.ipynb & analyze_OSM_tags.ipynb → results/osm_analysis/'study_area'/
- municipal_analysis_reference.ipynb → results/reference_analysis/'study_area'/
- extrinsic_analysis.ipynb & municipal_comparison.ipynb → results/compare_analysis/'study_area'/
Since this is an exploratory analysis producing a high number of maps and figures, only selected plots are automatically saved.
Most of the plots from the accompanying paper have been prepared in QGIS.
To recreate the plots, run the Python script export_plot_data.py and open the QGIS project file illustrations.qgz.
A few subsets of the data used in illustrations have been selected and exported manually and are specific to the analysis of the OSM and GeoDanmark data sets in Denmark. These data can be found in the qgis/data_manual folder.
Do you have any questions or feedback? Reach us at [email protected] (Ane Rahbek Vierø) or [email protected] (Anastassia Vybornova).
Our code is free to use and repurpose under the AGPL 3.0 license.
The repository includes data from the following sources:
© OpenStreetMap contributors
License: Open Data Commons Open Database License
Downloaded spring 2023.
Contains data from GeoDanmark (retrieved spring 2022)
© SDFI (Styrelsen for Dataforsyning og Infrastruktur)
License: GeoDanmark
Downloaded spring 2023.
© SDFI (Styrelsen for Dataforsyning og Infrastruktur)
License: Vilkår for brug af frie geografiske data
Downloaded spring 2023.
Contains data from Statistics Denmark - https://statistikbanken.dk/folk1a
Downloaded spring 2023.
Contains data from the European Commission's GHSL (Global Human Settlement Layer)
Schiavina M., Freire S., Carioli A., MacManus K. (2023): GHS-POP R2023A - GHS population grid multitemporal (1975-2030). European Commission, Joint Research Centre (JRC).
Downloaded fall 2022.
Supported by the Danish Road Directorate.