
OSF documentation

Direct search

Web form: https://osf.io/search/. Example with .mdp files: https://osf.io/search/?q=mdp&filter=file&page=1

Queries follow the Lucene query syntax (see OSF's Lucene Search Query Help).

Note
OSF does not allow searching by file extension. The extension can only be used as a plain search keyword (as in q=mdp above).
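OSF cannot filter by extension, so one workaround is to search for the extension as a keyword and then filter the returned file names locally. The sketch below rebuilds the web search URL from the example above and applies that filter. The helper names `build_web_search_url` and `keep_extension` are hypothetical.

```python
from urllib.parse import urlencode

OSF_SEARCH_URL = "https://osf.io/search/"


def build_web_search_url(keyword: str, page: int = 1) -> str:
    """Build an OSF web search URL restricted to files (same parameters as the example)."""
    params = {"q": keyword, "filter": "file", "page": page}
    return f"{OSF_SEARCH_URL}?{urlencode(params)}"


def keep_extension(file_names, extension: str) -> list:
    """Keep only names that really end with the extension (case-insensitive).

    A keyword search for 'mdp' may also return files that only mention 'mdp'
    in their name, so this client-side filter is needed.
    """
    suffix = "." + extension.lower().lstrip(".")
    return [name for name in file_names if name.lower().endswith(suffix)]


print(build_web_search_url("mdp"))
# https://osf.io/search/?q=mdp&filter=file&page=1
print(keep_extension(["nvt.mdp", "mdp_notes.txt", "EM.MDP"], "mdp"))
# ['nvt.mdp', 'EM.MDP']
```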

API

API to search for files: https://api.osf.io/v2/search/files/. Example with .mdp files: https://api.osf.io/v2/search/files/?q=mdp&page=1

See the OSF API documentation for details.
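Below is a minimal standard-library sketch of one call to the file search endpoint. It assumes the response follows the JSON:API layout used by the OSF API v2, with a `data` list whose items carry `attributes.name`. It also assumes the personal access token is sent as a Bearer token. The helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_SEARCH_FILES_URL = "https://api.osf.io/v2/search/files/"


def build_api_search_url(keyword: str, page: int = 1) -> str:
    """Build the file search URL for one keyword and one result page."""
    return f"{API_SEARCH_FILES_URL}?{urlencode({'q': keyword, 'page': page})}"


def fetch_json(url: str, token: str) -> dict:
    """GET one page of results, authenticated with a personal access token."""
    request = Request(url, headers={"Authorization": f"Bearer {token}"})
    with urlopen(request, timeout=30) as response:
        return json.load(response)


def file_names(page: dict) -> list:
    """Extract file names from one result page (assumes data[i].attributes.name)."""
    return [item["attributes"]["name"] for item in page.get("data", [])]
```

Usage: `file_names(fetch_json(build_api_search_url("mdp"), token))`, where `token` is the OSF token described below.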

A token is required to use the API programmatically. Create one from your OSF user settings, select the osf.full_read scope, and save the token in a .env file:

OSF_TOKEN=<YOUR OSF TOKEN HERE>
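Here is one way to read the token back. This is a minimal standard-library sketch that handles only simple `KEY=VALUE` lines. The `load_dotenv` function from the python-dotenv package is a more complete alternative. A value already set in the environment takes precedence over the .env file. The helper names `parse_env` and `get_osf_token` are hypothetical.

```python
import os
from pathlib import Path


def parse_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines; blank lines and '#' comments are ignored."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Drop surrounding quotes, e.g. OSF_TOKEN="abc" -> abc
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_osf_token(env_path: str = ".env") -> str:
    """Return OSF_TOKEN from the environment, falling back to the .env file."""
    token = os.environ.get("OSF_TOKEN")
    if not token:
        env_file = Path(env_path)
        if env_file.exists():
            token = parse_env(env_file.read_text(encoding="utf-8")).get("OSF_TOKEN")
    if not token:
        raise RuntimeError("OSF_TOKEN not found in the environment or in the .env file")
    return token
```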

Scraping strategy

  1. Search for relevant files by looping over file extensions used as keywords. Results are paginated. Extract the set of unique datasets.
  2. For each dataset, retrieve its information and its file list.
  3. Retrieve information for each file. Results are paginated.
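The three steps can be sketched as a loop that follows pagination links. This sketch makes several assumptions:

- Each page is a JSON:API document whose `links.next` holds the URL of the next page, or is missing or null on the last page.
- `fetch` is any callable that returns the decoded JSON for a URL, for instance an authenticated GET.
- The URL builders and `dataset_id_of` are hypothetical placeholders, because the exact endpoints and the field linking a file to its dataset are not specified here.

Injecting `fetch` keeps the pagination logic independent of HTTP and easy to test.

```python
from typing import Callable, Iterator


def iter_pages(first_url: str, fetch: Callable[[str], dict]) -> Iterator[dict]:
    """Yield every page of a paginated listing by following links.next."""
    url = first_url
    while url:
        page = fetch(url)
        yield page
        url = page.get("links", {}).get("next")


def collect_dataset_ids(keywords, build_url, fetch, dataset_id_of) -> set:
    """Step 1: search each extension keyword, walk all pages, keep unique dataset ids."""
    dataset_ids = set()
    for keyword in keywords:
        for page in iter_pages(build_url(keyword), fetch):
            for item in page.get("data", []):
                dataset_id = dataset_id_of(item)
                if dataset_id:
                    dataset_ids.add(dataset_id)
    return dataset_ids


def scrape_datasets(dataset_ids, dataset_url, files_url, fetch) -> dict:
    """Steps 2 and 3: fetch each dataset's information, then every page of its files."""
    results = {}
    for dataset_id in sorted(dataset_ids):
        info = fetch(dataset_url(dataset_id))
        files = [
            item
            for page in iter_pages(files_url(dataset_id), fetch)
            for item in page.get("data", [])
        ]
        results[dataset_id] = {"info": info, "files": files}
    return results
```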

Dataset examples

Dataset with folders:

Dataset with components:

Dataset with zip files:

Note: we cannot easily retrieve the content of zip files as displayed by OSF, since the archive listing is rendered with JavaScript. See, for instance, the source of the page showing the content of AllModel.zip. More advanced tools such as Selenium might help.
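A different workaround from rendering the JavaScript view is to download the archive itself and list its members locally with Python's standard `zipfile` module. This does not reproduce the OSF page, but it gives the same file names and sizes. The sketch assumes the archive bytes were already downloaded, for example from the file's download link in the API response. That link field is an assumption, not shown here. The helper name `list_zip_content` is hypothetical.

```python
import io
import zipfile


def list_zip_content(zip_bytes: bytes) -> list:
    """Return (name, uncompressed size) for each file in an in-memory zip archive."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return [
            (info.filename, info.file_size)
            for info in archive.infolist()
            if not info.is_dir()  # skip directory entries such as "model/"
        ]
```

The trade-off is bandwidth: the whole archive is downloaded only to read its table of contents, which may be costly for large datasets.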