Merge pull request #15 from xcube-dev/konstntokas-013-change_data_id_url_part

Change data IDs and make the STAC data store searchable
forman authored Jul 10, 2024
2 parents b39c4b6 + 24288cc commit bdaaeb9
Showing 29 changed files with 76,086 additions and 5,896 deletions.
339 changes: 339 additions & 0 deletions examples/nonsearchable_stac_catalog.ipynb
@@ -0,0 +1,339 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Non-searchable STAC catalog\n",
"\n",
"This notebook shows an example of how to access items in a non-searchable STAC catalog, i.e. a catalog that does not implement the [STAC API - Item Search](https://github.com/radiantearth/stac-api-spec/tree/release/v1.0.0/item-search) conformance class. To search such a catalog, the client has to crawl through it and match each item's properties against the search parameters. This process can therefore be slow, especially for large catalogs.\n",
"\n",
"### Setup\n",
"In order to run this notebook, you need to install [`xcube`](https://xcube.readthedocs.io/en/latest/) and the [`xcube_stac`](https://github.com/xcube-dev/xcube-stac) plugin. You may install [`xcube_stac`](https://github.com/xcube-dev/xcube-stac) directly from the git repository by cloning it, changing into the `xcube-stac` directory, and running the following commands:\n",
"\n",
"```bash\n",
"conda env create -f environment.yml\n",
"conda activate xcube-stac\n",
"pip install .\n",
"```\n",
"\n",
"Note that [`xcube`](https://xcube.readthedocs.io/en/latest/) is already included in `environment.yml`.\n",
"\n",
"Now, we first import everything we need:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"from xcube.core.store import new_data_store, get_data_store_params_schema\n",
"import itertools"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The parameters needed to initialize a STAC [data store](https://xcube.readthedocs.io/en/latest/dataaccess.html#data-store-framework) can be inspected with `get_data_store_params_schema(\"stac\")`, which is imported above."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We set the URL of the [EcoDataCube.eu](https://stac.ecodatacube.eu/) STAC catalog and create a STAC [data store](https://xcube.readthedocs.io/en/latest/dataaccess.html#data-store-framework) by passing `\"stac\"` as the first argument to `new_data_store`, which selects the `xcube-stac` plugin. The `NoConformsTo` warning emitted by `pystac_client` is expected here, since this static catalog does not advertise any conformance classes."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/home/konstantin/micromamba/envs/xcube-stac/lib/python3.12/site-packages/pystac_client/client.py:190: NoConformsTo: Server does not advertise any conformance classes.\n",
" warnings.warn(NoConformsTo())\n"
]
}
],
"source": [
"url = \"https://s3.eu-central-1.wasabisys.com/stac/odse/catalog.json\"\n",
"store = new_data_store(\"stac\", url=url)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Each data ID points to a [STAC item's JSON](https://github.com/radiantearth/stac-spec/blob/master/item-spec/item-spec.md) and is given by the path of the item's URL relative to the catalog's URL. The data IDs can be streamed; the following code shows the first 10 of them as an example."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['lcv_land.mask_eumap/lcv_land.mask_eumap_2014.01.01..2016.12.31/lcv_land.mask_eumap_2014.01.01..2016.12.31.json',\n",
" 'lcv_blue_landsat.glad.ard/lcv_blue_landsat.glad.ard_1999.12.02..2000.03.20/lcv_blue_landsat.glad.ard_1999.12.02..2000.03.20.json',\n",
" 'lcv_blue_landsat.glad.ard/lcv_blue_landsat.glad.ard_2000.03.21..2000.06.24/lcv_blue_landsat.glad.ard_2000.03.21..2000.06.24.json',\n",
" 'lcv_blue_landsat.glad.ard/lcv_blue_landsat.glad.ard_2000.06.25..2000.09.12/lcv_blue_landsat.glad.ard_2000.06.25..2000.09.12.json',\n",
" 'lcv_blue_landsat.glad.ard/lcv_blue_landsat.glad.ard_2000.09.13..2000.12.01/lcv_blue_landsat.glad.ard_2000.09.13..2000.12.01.json',\n",
" 'lcv_blue_landsat.glad.ard/lcv_blue_landsat.glad.ard_2000.12.02..2001.03.20/lcv_blue_landsat.glad.ard_2000.12.02..2001.03.20.json',\n",
" 'lcv_blue_landsat.glad.ard/lcv_blue_landsat.glad.ard_2001.03.21..2001.06.24/lcv_blue_landsat.glad.ard_2001.03.21..2001.06.24.json',\n",
" 'lcv_blue_landsat.glad.ard/lcv_blue_landsat.glad.ard_2001.06.25..2001.09.12/lcv_blue_landsat.glad.ard_2001.06.25..2001.09.12.json',\n",
" 'lcv_blue_landsat.glad.ard/lcv_blue_landsat.glad.ard_2001.09.13..2001.12.01/lcv_blue_landsat.glad.ard_2001.09.13..2001.12.01.json',\n",
" 'lcv_blue_landsat.glad.ard/lcv_blue_landsat.glad.ard_2001.12.02..2002.03.20/lcv_blue_landsat.glad.ard_2001.12.02..2002.03.20.json']"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data_ids = store.get_data_ids()\n",
"list(itertools.islice(data_ids, 10))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In the next step, we can search for items using search parameters. The following code shows which search parameters are available."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"application/json": {
"additionalProperties": false,
"properties": {
"bbox": {
"items": [
{
"type": "number"
},
{
"type": "number"
},
{
"type": "number"
},
{
"type": "number"
}
],
"title": "Bounding box [x1,y1,x2,y2] in geographical coordinates",
"type": "array"
},
"collections": {
"description": "Collection IDs to be included in the search request.",
"items": {
"minLength": 0,
"type": "string"
},
"title": "Collection IDs",
"type": "array",
"uniqueItems": true
},
"time_range": {
"description": "Time range given as pair of start and stop dates. Dates must be given using format 'YYYY-MM-DD'. Start and stop are inclusive.",
"items": [
{
"format": "date",
"type": [
"string",
"null"
]
},
{
"format": "date",
"type": [
"string",
"null"
]
}
],
"title": "Time Range",
"type": "array"
}
},
"type": "object"
},
"text/plain": [
"<xcube.util.jsonschema.JsonObjectSchema at 0x71d3deb4b2f0>"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"search_params = store.get_search_params_schema()\n",
"search_params"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
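"Now, let's search the Landsat (GLAD ARD) blue-band collection `lcv_blue_landsat.glad.ard` for items covering a European bounding box in the period from 2000-01-01 to 2000-04-01, i.e. roughly the first quarter of 2000. Items whose time range only partly overlaps the requested one are returned as well."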
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[{'data_id': 'lcv_blue_landsat.glad.ard/lcv_blue_landsat.glad.ard_1999.12.02..2000.03.20/lcv_blue_landsat.glad.ard_1999.12.02..2000.03.20.json',\n",
" 'data_type': 'dataset',\n",
" 'bbox': [-23.550818268711048,\n",
" 24.399543432891665,\n",
" 63.352379098951936,\n",
" 77.69295185585888],\n",
" 'time_range': ['1999-12-02', '2000-03-20']},\n",
" {'data_id': 'lcv_blue_landsat.glad.ard/lcv_blue_landsat.glad.ard_2000.03.21..2000.06.24/lcv_blue_landsat.glad.ard_2000.03.21..2000.06.24.json',\n",
" 'data_type': 'dataset',\n",
" 'bbox': [-23.550818268711048,\n",
" 24.399543432891665,\n",
" 63.352379098951936,\n",
" 77.69295185585888],\n",
" 'time_range': ['2000-03-21', '2000-06-24']}]"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"descriptors = list(store.search_data(\n",
" collections=[\"lcv_blue_landsat.glad.ard\"],\n",
" bbox=[-10, 40, 40, 70],\n",
" time_range=[\"2000-01-01\", \"2000-04-01\"]\n",
"))\n",
"[d.to_dict() for d in descriptors]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Next, we can open the data for each data ID. Note that this is not fully implemented yet: so far, `open_data` returns the item's assets, each of which provides the href of a data resource. The following code shows which parameters are available for opening the data."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"application/json": {
"additionalProperties": false,
"properties": {
"asset_names": {
"description": "Names of assets which will be included in the data cube.",
"items": {
"minLength": 0,
"type": "string"
},
"title": "Names of assets",
"type": "array",
"uniqueItems": true
}
},
"type": "object"
},
"text/plain": [
"<xcube.util.jsonschema.JsonObjectSchema at 0x71d31f6732f0>"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"open_params = store.get_open_data_params_schema()\n",
"open_params"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We select the blue band (Band 1) via the asset name `blue_p50` and, for each search result, retrieve the corresponding asset and the href pointing to its data resource:"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['https://s3.eu-central-1.wasabisys.com/eumap/lcv/lcv_blue_landsat.glad.ard_p50_30m_0..0cm_1999.12.02..2000.03.20_eumap_epsg3035_v1.1.tif',\n",
" 'https://s3.eu-central-1.wasabisys.com/eumap/lcv/lcv_blue_landsat.glad.ard_p50_30m_0..0cm_2000.03.21..2000.06.24_eumap_epsg3035_v1.1.tif']"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"asset_collection = []\n",
"for descriptor in descriptors:\n",
" assets = store.open_data(descriptor.data_id, asset_names=[\"blue_p50\"])\n",
" assert len(assets) == 1\n",
" asset_collection.append(assets[0])\n",
"[asset.href for asset in asset_collection]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This notebook will be continued once the data access is implemented."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
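The description of the data IDs in the notebook can be made concrete with a small sketch. It assumes, as a hypothesis rather than documented `xcube-stac` behavior, that an item's URL is obtained by resolving its data ID relative to the catalog URL used in the notebook, so that `catalog.json` is replaced by the data ID. The helpers `item_url_from_data_id` and `data_id_from_item_url` are hypothetical names used only for illustration; only the Python standard library is needed.

```python
from urllib.parse import urljoin

# Catalog URL used in the notebook above.
CATALOG_URL = "https://s3.eu-central-1.wasabisys.com/stac/odse/catalog.json"


def item_url_from_data_id(catalog_url: str, data_id: str) -> str:
    """Hypothetical helper: resolve a data ID relative to the catalog URL.

    urljoin() drops the last path segment of the base URL ("catalog.json")
    because the data ID is a relative path without a leading "/". Segments
    such as "2014.01.01..2016.12.31" are not ".." dot-segments, so they are
    kept unchanged.
    """
    return urljoin(catalog_url, data_id)


def data_id_from_item_url(catalog_url: str, item_url: str) -> str:
    """Hypothetical helper: the part of the item URL after the catalog's directory."""
    base = catalog_url.rsplit("/", 1)[0] + "/"
    if not item_url.startswith(base):
        raise ValueError(f"{item_url!r} is not located below {base!r}")
    return item_url[len(base):]


# First data ID listed in the notebook output.
data_id = (
    "lcv_land.mask_eumap/lcv_land.mask_eumap_2014.01.01..2016.12.31/"
    "lcv_land.mask_eumap_2014.01.01..2016.12.31.json"
)
item_url = item_url_from_data_id(CATALOG_URL, data_id)
print(item_url)

# Under the stated assumption, the mapping round-trips.
assert data_id_from_item_url(CATALOG_URL, item_url) == data_id
```

The sketch only covers relative item paths like the ones listed in the notebook; a data ID with a leading `/` or a full URL would be resolved differently by `urljoin`.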
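For a non-searchable catalog, the notebook's introduction explains that the catalog has to be crawled and each item's properties matched against the search parameters. The following plain-Python sketch illustrates only that matching step; it is not the `xcube-stac` implementation. The `Item` class is a hypothetical, simplified stand-in for a STAC item, and we assume overlap semantics: an item matches if its bounding box intersects the requested one and its time range overlaps the requested range, which is inclusive as stated in the search parameter schema. The example items reuse the collection, bounding box, and time ranges shown in the notebook outputs; the bounding box of the third item (2000-06-25 to 2000-09-12) is assumed to equal that of the other two.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Iterator, Optional, Sequence, Tuple


@dataclass
class Item:
    """Hypothetical, simplified stand-in for a STAC item."""

    data_id: str
    collection: str
    bbox: Tuple[float, float, float, float]  # [x1, y1, x2, y2] in degrees
    start: date
    end: date


def bbox_intersects(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if two [x1, y1, x2, y2] boxes overlap; touching edges count."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def time_overlaps(
    start: date, end: date, time_range: Sequence[Optional[str]]
) -> bool:
    """Inclusive overlap test; a None bound leaves the range open on that side."""
    lo = date.fromisoformat(time_range[0]) if time_range[0] else date.min
    hi = date.fromisoformat(time_range[1]) if time_range[1] else date.max
    return start <= hi and lo <= end


def search_items(
    items: Iterable[Item],
    collections: Optional[Sequence[str]] = None,
    bbox: Optional[Sequence[float]] = None,
    time_range: Optional[Sequence[Optional[str]]] = None,
) -> Iterator[Item]:
    """Visit every item ("crawl") and lazily yield those matching all parameters."""
    for item in items:
        if collections is not None and item.collection not in collections:
            continue
        if bbox is not None and not bbox_intersects(item.bbox, bbox):
            continue
        if time_range is not None and not time_overlaps(
            item.start, item.end, time_range
        ):
            continue
        yield item


COLLECTION = "lcv_blue_landsat.glad.ard"
ITEM_BBOX = (
    -23.550818268711048, 24.399543432891665, 63.352379098951936, 77.69295185585888
)


def make_data_id(period: str) -> str:
    """Rebuild a data ID following the pattern seen in the notebook output."""
    name = f"{COLLECTION}_{period}"
    return f"{COLLECTION}/{name}/{name}.json"


catalog = [
    Item(make_data_id("1999.12.02..2000.03.20"), COLLECTION, ITEM_BBOX,
         date(1999, 12, 2), date(2000, 3, 20)),
    Item(make_data_id("2000.03.21..2000.06.24"), COLLECTION, ITEM_BBOX,
         date(2000, 3, 21), date(2000, 6, 24)),
    # The bbox of this item is assumed, not taken from the notebook output.
    Item(make_data_id("2000.06.25..2000.09.12"), COLLECTION, ITEM_BBOX,
         date(2000, 6, 25), date(2000, 9, 12)),
]

matches = [
    item.data_id
    for item in search_items(
        catalog,
        collections=[COLLECTION],
        bbox=[-10, 40, 40, 70],
        time_range=["2000-01-01", "2000-04-01"],
    )
]
print(matches)
```

Under these assumptions the sketch returns the same two data IDs as `store.search_data` in the notebook, while the third item is excluded because it starts after 2000-04-01. The plain intersection test does not handle bounding boxes that cross the antimeridian.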