
ENSAE project supervised by Sarah J. Berkemer and Paula Tubaro. We analyse the composition of big cities (for now Paris) using OSM data and network analysis.







Student Project on analysing the morphology of cities (here Paris and its closest suburbs) using OSM data and meso-level data from INSEE.

By Simon GENET, Léopold MAURICE, Marie-Olive THAURY

Supervised by Paula Tubaro and Sarah J. Berkemer.

Paper submitted to CCS France.

Made during studies at ENSAE.

Organization of the repository

All requirements are listed in environment.yml for conda, or requirements.txt for pip. Sadly, there is no Docker image.

  • data contains the data for the Paris analysis.
  • examples_archive and first-example-code contain several explorations.
  • helpers contains most of the home-made functions.
  • kmean_interp is a library to interpret KMeans clusters through classifiers trained on a dummy variable for each cluster. Not really used.
  • extract_filosofi_data.ipynb explains how to extract Filosofi data and how to merge them with OSM data.
  • paris_local_composition explains the analysis and how to use the functions on Paris data.
  • pc_local_composition does the same, in a simpler form, on petite couronne data; it is a good place to start to understand the analysis.


  • Data scraping
    • At Paris level
    • At the Petite Couronne level
    • Merging OSM data on INSEE's INSPIRE Squares.
  • Analysis
    • Restaurants accessibility, Gini inequality
    • Descriptive data
    • Regressions
    • Clustering
    • Dimensionality reduction
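
As an illustration of the Gini inequality item listed above, here is a minimal sketch of a Gini coefficient computed over non-negative values, for instance an accessibility score per grid square. It uses the standard rank-based formula and is not taken from the helpers folder; the function name `gini` and the sample values are assumptions made for the example.

```python
def gini(values):
    """Gini coefficient of non-negative values: 0 = perfect equality, (n-1)/n = maximal concentration."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0  # convention chosen here for empty or all-zero input
    if xs[0] < 0:
        raise ValueError("the Gini coefficient is only defined here for non-negative values")
    # Rank-weighted sum over the sorted values (ranks start at 1).
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


# Hypothetical accessibility scores for four grid squares.
equal_access = gini([10, 10, 10, 10])    # every square has the same score
concentrated = gini([0, 0, 0, 40])       # all accessibility sits in one square
```

With equal scores the coefficient is 0; when one square holds everything it reaches (n-1)/n, which is 0.75 for four squares.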


INSEE socio-economic data :

OpenStreetMap scraping

There are multiple ways to get OSM data:

  • use the OSMnx library: very complete and easy to use

    • the near_eiffel_tower notebook explores what OSMnx can do
    • it relies on the Overpass API but is simpler to use
    • it is the option we chose
  • use Geofabrik: download zipped extracts already in .shp format, which work well with geopandas

    • see the second part of the near_eiffel_tower notebook
    • drawback: only a selection of towns/regions/countries is available, which is at least enough to start with
    • advantage: files come as OSM data but also as .shp files, which are easier to open in geopandas
    • see the files: Europe, IdF
    • see the documentation
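  • use OSM data files (.osm = up-to-date data, .osh = history) with the osmium Python library. The .pbf suffix marks the compressed binary encoding of the same data: .osm.pbf holds the current data, .osh.pbf holds every version of every OSM element through time.

    • see the example-code folder
    • main drawback: extraction is not easy, even with osmium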
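
As a rough illustration of the OSMnx route described above, the sketch below downloads restaurant points of interest and the walkable street network for a place. It is not taken from the repository's notebooks: the function name `fetch_restaurants_and_walk_graph`, the constant `RESTAURANT_TAGS` and the tag choice are assumptions made for the example, and calling the function needs OSMnx installed and network access.

```python
# Hypothetical tag filter: OSM elements tagged amenity=restaurant.
RESTAURANT_TAGS = {"amenity": "restaurant"}


def fetch_restaurants_and_walk_graph(place="Paris, France"):
    """Return (POI GeoDataFrame, walkable street graph) for a place name."""
    # Imported lazily so that this module can be loaded without OSMnx installed.
    import osmnx as ox

    # Recent OSMnx releases expose features_from_place;
    # older releases name the same call geometries_from_place.
    pois = ox.features_from_place(place, tags=RESTAURANT_TAGS)

    # Street network restricted to ways that pedestrians can use.
    graph = ox.graph_from_place(place, network_type="walk")
    return pois, graph
```

A call such as `pois, graph = fetch_restaurants_and_walk_graph()` sends requests to the Overpass API behind the scenes, so it can take a while for a whole city.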

The OSM files can be downloaded in different ways:

  • through the OSM API: limited, for instance we cannot download the whole of Paris (which seems reasonable)
  • through the Overpass API: mirrors OSM data, without that limitation. Hard to use on its own, so we access it through OSMnx.
  • through Planet OSM: regular copies.
  • through Geofabrik: regular copies, but only for a selection of towns/regions/countries. (Geofabrik already removes user data, but the rest of the metadata is the same.) Copies are available in .osm and .shp.
  • through Ohsome: for historical data analysis. Comes with an API and a Python library.
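
To give an idea of why the Overpass API is hard to use on its own, here is a minimal sketch that builds a raw Overpass QL query for tagged nodes inside a bounding box, together with the form-encoded body to POST. Nothing is sent over the network. The helper names and the bounding box (a small area of western Paris) are assumptions made for the example; the endpoint shown is the main public Overpass instance.

```python
from urllib.parse import urlencode

# Main public Overpass instance; other mirrors exist.
OVERPASS_URL = "https://overpass-api.de/api/interpreter"


def build_overpass_query(key, value, south, west, north, east, timeout=60):
    """Overpass QL query for nodes tagged key=value inside a bounding box."""
    # Overpass QL expects the bounding box as (south, west, north, east).
    bbox = f"{south},{west},{north},{east}"
    return (
        f"[out:json][timeout:{timeout}];"
        f'node["{key}"="{value}"]({bbox});'
        "out body;"
    )


def build_request_body(query):
    """Form-encoded body to POST to OVERPASS_URL."""
    return urlencode({"data": query})


# Hypothetical bounding box, used only for illustration.
query = build_overpass_query("amenity", "restaurant", 48.85, 2.28, 48.87, 2.31)
body = build_request_body(query)
```

OSMnx generates and sends queries of this kind itself, which is why it is the more comfortable entry point.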


Tutorials and Inspirations

OSM parser with python

Urban walkability using OSMnx

Course on OSMnx by G. Boeing (creator of OSMnx)

Sergio J. Rey, Dani Arribas-Bel, Levi J. Wolf's book on Geographic Data Science with Python

See also:

Ltd, Gispo. « Analysing urban walkability using OpenStreetMap and Python ». Medium (blog), 22 February 2022.

Administrations (INSEE and APUR)

Mixité sociale et ségrégation dans la Métropole du Grand Paris

Commerces de proximité par l'INSEE

Cartographie du logement social à Paris par l'APUR

Scientific articles (selection)

Berkemer, Sarah J., and Peter F. Stadler. « Street Name Data as a Reflection of Migration and Settlement History ». Urban Science 4, no. 4 (11 December 2020): 74.

Boeing, Geoff. « OSMnx: New Methods for Acquiring, Constructing, Analyzing, and Visualizing Complex Street Networks ». Accessed 12 November 2022.

Knap, Elizabeth, Mehmet Baran Ulak, Karst T. Geurs, Alex Mulders, and Sander van der Drift. « A Composite X-Minute City Cycling Accessibility Metric and Its Role in Assessing Spatial and Socioeconomic Inequalities – A Case Study in Utrecht, the Netherlands ». Journal of Urban Mobility 3 (1 December 2023): 100043.

Girres, Jean-François, and Guillaume Touya. « Quality Assessment of the French OpenStreetMap Dataset ». Transactions in GIS 14, no. 4 (2010): 435‑59.

PySAL: A Python Library of Spatial Analytical Methods, Rey, S.J. and L. Anselin, Review of Regional Studies 37, 5-27 2007.

Python packages and tools

Geographic/spatial packages

Geopandas obviously

PySal and in particular PySal.lib

An excellent tool that implements many different spatial statistics functions!

r5py for transport time calculations

Cartiflette: for working with French geographic data sets

Cartiflette, git repo and examples from the ENSAE data science class

Directly related to OpenStreetMap:

osmium: a tool to parse OSM files, with Python bindings in pyosmium

osmium website, documentation

OSMnx: a library that makes it easy to extract both graph and POI data

OSMnx git repo and the associated examples

OSMnx is developed by G. Boeing from USC. It gathers OSM data through the Overpass API, which it largely encapsulates. OSMnx also incorporates algorithms to simplify the OSM graphs into more realistic networks and to analyse the network itself.

For now, OSMnx is probably the best way to access OSM data.

Ohsome: another library, by Heidelberg University, for historical data

It can be found on the git repo for ohsome-py, created by the Heidelberg Institute for Geoinformation Technology (HeiGIT).

Ohsome-py is a Python wrapper around Ohsome, an API by HeiGIT that gives access to their database to explore OSM history (meaning the evolution of voluntary contributions to OSM). It is closer to the response library. The API is strongly oriented towards the evolution of OSM data and is therefore less useful than OSMnx for graph and POI analysis.

The graphical interface of the ohsomeHeX website is really well made and useful to see, for instance, where there is enough data.








