Feature/grid tiles #393

Merged
merged 83 commits into from
Oct 31, 2023
Commits
e0fab1e
Stashing - GridTiles
May 9, 2023
f6cb725
Add CRS method to Index Systems.
May 30, 2023
fc4f415
Major changes:
Jun 4, 2023
f0eef62
Update gitignore.
Jun 4, 2023
94effd2
Major changes:
Jun 4, 2023
abf3f24
Major changes:
Jun 5, 2023
964cfbc
Properly name the raster column in the schema.
Jun 5, 2023
f772460
Minor fixes for UUID.
Jun 5, 2023
35e0fef
Merge branch 'main' into feature/grid_tiles
Jun 11, 2023
cfe0a04
Add BalancedSubdivision for dividing rasters into pieces that maintai…
Jun 24, 2023
789b170
Update the way GDAL is installed.
Jul 24, 2023
4ee50f9
Add new installable and new example notebook to prototypes.
Jul 26, 2023
23fb47f
Fix aux.xml issues with rasters in mosaic.
Jul 27, 2023
bb85cb3
Fix aux.xml issues with rasters in mosaic.
Jul 27, 2023
dbaebc2
Fixing unnecessary IO.
Jul 28, 2023
ee80140
Fixing unnecessary IO.
Jul 28, 2023
e4588ea
Add examples for stacking bands into a single raster.
Aug 10, 2023
f220d75
Improve data prep scripts.
Aug 11, 2023
cd47656
Fix missing spatial reference when generating index geom.
Sep 8, 2023
076bdc2
fix H3 geometry constructor (#374)
Aug 1, 2023
ed9d80e
Cleanup notebooks.
Sep 11, 2023
c3025bc
Upload the correct wheel.
Sep 13, 2023
ddab8cc
Fix NDVI and missing pixel issues.
Sep 26, 2023
43e483f
Revise all RST_ expressions.
Oct 4, 2023
3d6b648
Add the concept of raster-tile to the schema and types.
Oct 11, 2023
e2455b1
Remove a comma.
Oct 11, 2023
8e18317
Merge branch 'main' into feature/grid_tiles
Oct 11, 2023
cfacf5f
Use null for non-specified cell ids.
Oct 11, 2023
30f8b31
Merge remote-tracking branch 'origin/feature/grid_tiles' into feature…
Oct 11, 2023
2a7c34c
Add python signatures.
Oct 12, 2023
35526c0
applied python formatter
sllynn Oct 12, 2023
509cf31
fixed python `enable_gdal`
sllynn Oct 12, 2023
bef9f04
Fix clip.
Oct 12, 2023
1f886ab
added docstring in MosaicRasterBand
sllynn Oct 12, 2023
ee1a4ac
moved `MosaicRasterBandGDAL`
sllynn Oct 12, 2023
41b9dae
restructed core.raster
sllynn Oct 12, 2023
6d4067c
restructed core.raster
sllynn Oct 12, 2023
9b79122
restructed core.raster
sllynn Oct 12, 2023
7170697
Remove redundant UUID definitions.
Oct 13, 2023
19773fe
Merge remote-tracking branch 'origin/feature/grid_tiles' into feature…
Oct 13, 2023
27e5870
Add BNG examples.
Oct 17, 2023
b9b1856
Merge remote-tracking branch 'origin/feature/grid_tiles' into feature…
Oct 17, 2023
b7f856d
Fix merge conflicts.
Oct 17, 2023
742045a
Re-enable coverage.
Oct 19, 2023
5e1a47b
Fix PR comments.
Oct 20, 2023
5cb0a9e
Fix github test install of GDAL.
Oct 20, 2023
9f462eb
Add GDAL jni path to LD_LIBRARY_PATH.
Oct 20, 2023
b47a731
Add GDAL jni path to LD_LIBRARY_PATH.
Oct 20, 2023
4747771
Add GDAL jni path to LD_LIBRARY_PATH.
Oct 20, 2023
00063ce
Add GDAL jni path to LD_LIBRARY_PATH.
Oct 20, 2023
624dba6
Add GDAL jni path to LD_LIBRARY_PATH.
Oct 20, 2023
58e6160
Add GDAL jni path to LD_LIBRARY_PATH.
Oct 20, 2023
3a66513
Add GDAL jni path to LD_LIBRARY_PATH.
Oct 20, 2023
7b3c084
Add GDAL jni path to LD_LIBRARY_PATH.
Oct 20, 2023
1b0cb9d
Add GDAL jni path to LD_LIBRARY_PATH.
Oct 20, 2023
529c95f
Add GDAL jni path to LD_LIBRARY_PATH.
Oct 20, 2023
76b8044
Add GDAL jni path to LD_LIBRARY_PATH.
Oct 20, 2023
366f430
Add GDAL jni path to LD_LIBRARY_PATH.
Oct 20, 2023
b3141d5
Add GDAL jni path to LD_LIBRARY_PATH.
Oct 20, 2023
a50399b
Add GDAL jni path to LD_LIBRARY_PATH.
Oct 20, 2023
093f005
Add GDAL jni path to LD_LIBRARY_PATH.
Oct 20, 2023
9625eb0
Add GDAL jni path to LD_LIBRARY_PATH.
Oct 20, 2023
8b53c22
Remove copying of shared objects.
Oct 20, 2023
c1a6e6b
Set python to 3.9 in actions.
Oct 20, 2023
c235b34
Set env variable in actions for DATABRICKS_ROOT_VIRTUALENV_ENV
Oct 20, 2023
b7e0144
Set env variable in actions for DATABRICKS_ROOT_VIRTUALENV_ENV
Oct 20, 2023
1c36512
Set env variable in actions for DATABRICKS_ROOT_VIRTUALENV_ENV
Oct 20, 2023
9df305c
Set env variable in actions for DATABRICKS_ROOT_VIRTUALENV_ENV
Oct 20, 2023
b7eaba5
Set env variable in actions for DATABRICKS_ROOT_VIRTUALENV_ENV
Oct 20, 2023
3e7d421
Set env variable in actions for DATABRICKS_ROOT_VIRTUALENV_ENV
Oct 20, 2023
e20c0d8
Set env variable in actions for DATABRICKS_ROOT_VIRTUALENV_ENV
Oct 20, 2023
2355885
Set env variable in actions for DATABRICKS_ROOT_VIRTUALENV_ENV
Oct 20, 2023
9c2d6e5
Set env variable in actions for DATABRICKS_ROOT_VIRTUALENV_ENV
Oct 20, 2023
af15929
Set env variable in actions for DATABRICKS_ROOT_VIRTUALENV_ENV
Oct 20, 2023
525afd5
Set env variable in actions for DATABRICKS_ROOT_VIRTUALENV_ENV
Oct 20, 2023
79213a1
Remove scala side script installation.
Oct 20, 2023
cdd9b0d
Add _agg aliases for _aggregate functions.
Oct 20, 2023
ccf39bb
Remove duplicate python bindings.
Oct 20, 2023
b7dedf7
R/fix automation (#441)
sllynn Oct 26, 2023
3aabd8f
Add CombineAVG expressions.
Oct 29, 2023
74195ce
Merge remote-tracking branch 'origin/feature/grid_tiles' into feature…
Oct 29, 2023
aa47d9c
Fix temp file data leak.
Oct 31, 2023
d9c3b5c
Remove the wheel from the examples.
Oct 31, 2023
2 changes: 2 additions & 0 deletions .gitignore
@@ -1,6 +1,7 @@
#IntelliJ files
.idea
*.iml
tmp_

#VSCode files
.vscode
@@ -65,6 +66,7 @@ coverage.xml
.hypothesis/
.pytest_cache/
/python/test/.run/
spatial_knn

# Translations
*.mo
281 changes: 281 additions & 0 deletions notebooks/prototypes/grid_tiles/00 Download STACs.py
@@ -0,0 +1,281 @@
# Databricks notebook source
# MAGIC %md
# MAGIC ## Install the libraries and prepare the environment
Contributor comment: I am missing an introduction that describes the use case or goal of this analysis. It would also be helpful to have a bullet list of the high-level steps involved here.


# COMMAND ----------

# MAGIC %md
# MAGIC For this demo we need a few spatial libraries that can be installed with pip. We will use GDAL, rasterio, pystac (with pystac_client) and databricks-mosaic for data download and manipulation, and Microsoft's Planetary Computer as the source of the raster data for the analysis.

# COMMAND ----------

# MAGIC %pip install databricks-mosaic rasterio==1.3.5 --quiet gdal==3.4.3 pystac pystac_client planetary_computer tenacity rich

# COMMAND ----------

import library
import pystac_client
import planetary_computer
import mosaic as mos

from pyspark.sql import functions as F

mos.enable_mosaic(spark, dbutils)
mos.enable_gdal(spark)

# COMMAND ----------

# MAGIC %reload_ext autoreload
# MAGIC %autoreload 2
# MAGIC %reload_ext library

# COMMAND ----------

spark.conf.set("spark.sql.adaptive.coalescePartitions.enabled", "false")

# COMMAND ----------

# MAGIC %md
# MAGIC For this demo we will download US county boundaries from the Census Bureau's TIGER/Line feed. The data can be downloaded as a zip archive to DBFS (or to managed volumes).

# COMMAND ----------

dbutils.fs.rm("/FileStore/geospatial/odin/census/", True)
dbutils.fs.mkdirs("/FileStore/geospatial/odin/census/")

# COMMAND ----------

import urllib.request
urllib.request.urlretrieve(
"https://www2.census.gov/geo/tiger/TIGER2021/COUNTY/tl_2021_us_county.zip",
"/dbfs/FileStore/geospatial/odin/census/data.zip"
)
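Before handing the archive to Mosaic, it can be worth checking that it really contains a complete shapefile (the `.shp`, `.shx` and `.dbf` components are mandatory). A minimal stdlib sketch, assuming the archive is small enough to read into memory; `shapefile_layers` and `complete_layers` are hypothetical helpers, not part of Mosaic:

```python
import io
import zipfile

# Mandatory shapefile components; .prj (projection) is optional but usually present.
REQUIRED_EXTS = {".shp", ".shx", ".dbf"}


def shapefile_layers(zip_bytes: bytes) -> dict:
    """Map each file base name inside the archive to the set of extensions found for it."""
    layers = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for name in zf.namelist():
            base, dot, ext = name.rpartition(".")
            if not dot:
                continue  # skip entries without an extension
            layers.setdefault(base, set()).add("." + ext.lower())
    return layers


def complete_layers(zip_bytes: bytes) -> list:
    """Return the base names that have every mandatory shapefile component."""
    return sorted(
        base for base, exts in shapefile_layers(zip_bytes).items()
        if REQUIRED_EXTS <= exts
    )
```

On the cluster this could be called on the bytes of `/dbfs/FileStore/geospatial/odin/census/data.zip`, opened in binary mode.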

# COMMAND ----------

# MAGIC %sh ls -al /dbfs/FileStore/geospatial/odin/census/

# COMMAND ----------

# MAGIC %md
# MAGIC Mosaic has specialised readers for shapefiles and other GDAL-supported formats. We don't need to unzip the archive; we just need to pass the "vsizip" option to the reader.

# COMMAND ----------

census_df = mos.read().format("multi_read_ogr")\
.option("vsizip", "true")\
.option("chunkSize", "50")\
.load("dbfs:/FileStore/geospatial/odin/census/data.zip")\
.cache() # We will cache the loaded data to avoid schema inference being done repeatedly for each query
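The "vsizip" option relies on GDAL's `/vsizip/` virtual file system, which reads files inside a zip archive without extracting it. A minimal sketch of how such a path is formed, assuming GDAL's convention of prefixing the archive path with `/vsizip/` and the `dbfs:/` to `/dbfs/` mapping used elsewhere in this notebook; the helper names are hypothetical, and we have not checked how Mosaic builds this path internally:

```python
def to_fuse_path(uri: str) -> str:
    """Translate a dbfs:/ URI into the /dbfs/ FUSE path used elsewhere in this notebook."""
    prefix = "dbfs:/"
    if uri.startswith(prefix):
        return "/dbfs/" + uri[len(prefix):].lstrip("/")
    return uri  # already a local path


def vsizip_path(archive: str, member: str = "") -> str:
    """Build a GDAL /vsizip/ path: "/vsizip/" + <archive path> [+ "/" + <member>].

    An absolute archive path therefore yields a double slash after "vsizip".
    """
    path = "/vsizip/" + to_fuse_path(archive)
    return f"{path}/{member.lstrip('/')}" if member else path
```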

# COMMAND ----------

# MAGIC %md
# MAGIC For this example we will focus on Alaska counties. Alaska's state FIPS code is 02, so we will filter the ingested data on STATEFP.

# COMMAND ----------

census_df.where("STATEFP == 2").display()
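In the TIGER/Line county attributes STATEFP is a two-character, zero-padded string ("02" for Alaska), while the filter above compares it with the integer 2 and so relies on Spark coercing the types. A plain-Python sketch of making the comparison explicit on the string form; `state_fips` and `filter_state` are hypothetical helpers, and the records are assumed to be dicts with a STATEFP key:

```python
def state_fips(code) -> str:
    """Normalise a state FIPS code given as int or str to TIGER's two-character form."""
    return str(int(code)).zfill(2)


def filter_state(records, code):
    """Keep attribute records whose STATEFP equals the requested state code."""
    target = state_fips(code)
    return [r for r in records if r["STATEFP"] == target]
```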

# COMMAND ----------

to_display = census_df\
.where("STATEFP == 2")\
.withColumn(
"geom_0",
mos.st_updatesrid("geom_0", "geom_0_srid", F.lit(4326))
)\
.select("geom_0")

# COMMAND ----------

# MAGIC %%mosaic_kepler
# MAGIC to_display geom_0 geometry 50

# COMMAND ----------

cells = census_df\
    .where("STATEFP == 2")\
    .withColumn(
        "geom_0",
        mos.st_updatesrid("geom_0", "geom_0_srid", F.lit(4326))
    )\
    .withColumn("geom_0_srid", F.lit(4326))\
    .withColumn(
        "grid",
        mos.grid_tessellateexplode("geom_0", F.lit(3))
    )

Contributor comment (on `cells = census_df`): It would be great to explain why we are doing tessellation here.

Contributor comment (on the `grid` column): It is good practice to drop the original geometry after a `grid_tessellateexplode`.
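Tessellation splits each county polygon into chips aligned with grid cells, so that later steps can group and join by cell id (as the `groupBy("h3")` further down does) instead of evaluating geometry predicates; cells that lie entirely inside a polygon ("core" cells) need no further geometry work. The sketch below illustrates the idea with a plain square grid and an axis-aligned rectangle standing in for a county. It is not H3 and not Mosaic's implementation; `Chip` and `tessellate_explode_box` are hypothetical, and their fields only loosely mirror the `grid.index_id` and `grid.wkb` fields used in this notebook:

```python
import math
from typing import NamedTuple


class Chip(NamedTuple):
    cell_id: tuple  # (column, row) of the square grid cell
    is_core: bool   # True when the cell lies entirely inside the input box
    bounds: tuple   # (min_x, min_y, max_x, max_y) of box ∩ cell


def tessellate_explode_box(box, cell_size):
    """Split an axis-aligned box into one chip per overlapping square grid cell."""
    min_x, min_y, max_x, max_y = box
    chips = []
    for col in range(math.floor(min_x / cell_size), math.ceil(max_x / cell_size)):
        for row in range(math.floor(min_y / cell_size), math.ceil(max_y / cell_size)):
            cx0, cy0 = col * cell_size, row * cell_size
            cx1, cy1 = cx0 + cell_size, cy0 + cell_size
            # Intersection of the box with this cell.
            bounds = (max(min_x, cx0), max(min_y, cy0), min(max_x, cx1), min(max_y, cy1))
            is_core = bounds == (cx0, cy0, cx1, cy1)  # the box covers the whole cell
            chips.append(Chip((col, row), is_core, bounds))
    return chips
```

The chips partition the input exactly, so per-cell aggregates can be computed independently and recombined.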

# COMMAND ----------

cells.display()

# COMMAND ----------

to_display = cells.select(mos.st_simplify("grid.wkb", F.lit(0.1)).alias("wkb"))

# COMMAND ----------

# MAGIC %%mosaic_kepler
# MAGIC to_display wkb geometry 100000

# COMMAND ----------

# MAGIC %md
# MAGIC Interfacing with pystac_client and remote raster data catalogs is straightforward: we can browse resource collections and individual assets.

# COMMAND ----------

time_range = "2021-06-01/2021-06-30"
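`time_range` uses the `start/end` interval form accepted by the STAC API `datetime` parameter. A minimal sketch of validating such a closed date interval before issuing requests; `parse_interval` is a hypothetical helper, and it does not handle the open-ended `..` form that the STAC API also allows:

```python
from datetime import date


def parse_interval(interval: str):
    """Split a closed 'start/end' date interval (as used for time_range) into dates."""
    start, sep, end = interval.partition("/")
    if not sep:
        raise ValueError(f"expected 'start/end', got {interval!r}")
    start_d, end_d = date.fromisoformat(start), date.fromisoformat(end)
    if end_d < start_d:
        raise ValueError("interval end precedes start")
    return start_d, end_d
```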

# COMMAND ----------

cell_jsons = cells\
    .withColumn("area_id", F.hash("geom_0"))\
    .withColumn("h3", F.col("grid.index_id"))\
    .groupBy("h3")\
    .agg(
        mos.st_union_agg("grid.wkb").alias("geom_1")
    )\
    .withColumn("geojson", mos.st_asgeojson(mos.grid_boundaryaswkb("h3")))\
    .drop("count", "geom_1")

Contributor comment (on the `area_id` column): area_id seems to be unused.

# COMMAND ----------

# MAGIC %md
# MAGIC STAC catalogs support searching for assets that intersect an area of interest provided as GeoJSON. With this in mind, we will convert all our H3 cells of interest into GeoJSON and prepare STAC requests.
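A STAC API Item Search request combines `collections`, an `intersects` GeoJSON geometry and a `datetime` interval, the same arguments passed to `pystac_client`'s `Client.search` later in this notebook. A minimal sketch of the per-cell request this implies, assuming each cell boundary is available as a list of (lon, lat) vertices; `ring_to_geojson` and `search_body` are hypothetical helpers, not how `library` prepares its requests:

```python
import json


def ring_to_geojson(ring):
    """Wrap (lon, lat) vertices as a GeoJSON Polygon, closing the ring if needed."""
    coords = [[float(x), float(y)] for x, y in ring]
    if coords[0] != coords[-1]:
        coords.append(list(coords[0]))  # GeoJSON linear rings must start and end at the same point
    return {"type": "Polygon", "coordinates": [coords]}


def search_body(geometry, time_range, collection="sentinel-2-l2a"):
    """Build a STAC API Item Search request body for one area of interest."""
    return {
        "collections": [collection],
        "intersects": geometry,
        "datetime": time_range,
    }
```

`json.dumps(search_body(...))` gives a body that could be POSTed to a catalog's `/search` endpoint; ring orientation (counterclockwise exteriors per RFC 7946) is not enforced here.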

# COMMAND ----------

cell_jsons.display()

# COMMAND ----------

cell_jsons.count()

# COMMAND ----------

# MAGIC %%mosaic_kepler
# MAGIC cell_jsons h3 h3

# COMMAND ----------

# MAGIC %md
# MAGIC Our framework prepares the STAC requests with a single line of code. The result is Delta-ready at this point and can easily be stored for lineage purposes.

# COMMAND ----------

eod_items = library.get_assets_for_cells(cell_jsons.repartition(200), time_range, "sentinel-2-l2a").cache()
eod_items.display()
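The schema returned by `library.get_assets_for_cells` is not shown in this PR; the rest of the notebook only implies an `item_id`, `item_properties`, and an `asset` struct with `name` and `href`. A minimal sketch of flattening one search response into one row per (item, asset), assuming the response is a plain-JSON STAC ItemCollection (a GeoJSON FeatureCollection whose features carry `id`, `properties.datetime` and an `assets` mapping); `item_asset_rows` and its row keys are hypothetical:

```python
def item_asset_rows(item_collection: dict, cell_id):
    """Flatten a STAC ItemCollection into one row per (item, asset) for a grid cell."""
    rows = []
    for item in item_collection.get("features", []):
        for name, asset in item.get("assets", {}).items():
            rows.append({
                "cell_id": cell_id,
                "item_id": item["id"],
                "datetime": item.get("properties", {}).get("datetime"),
                "asset_name": name,
                "href": asset["href"],
            })
    return rows
```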

# COMMAND ----------

# MAGIC %md
# MAGIC From this point we can easily extract the download links for items of interest.

# COMMAND ----------

dbutils.fs.rm("/FileStore/geospatial/odin/alaska/", True)
dbutils.fs.mkdirs("/FileStore/geospatial/odin/alaska/")

# COMMAND ----------

# MAGIC %sql
# MAGIC DROP DATABASE IF EXISTS odin_alaska CASCADE;
# MAGIC CREATE DATABASE IF NOT EXISTS odin_alaska;

# COMMAND ----------

# MAGIC %sql
# MAGIC USE odin_alaska;

# COMMAND ----------

def download_band(eod_items, band_name):
    # Filter to the requested band *before* de-duplicating: an item can be returned
    # for several H3 cells, and F.first() would otherwise keep an arbitrary asset.
    to_download = eod_items\
        .where(f"asset.name == '{band_name}'")\
        .withColumn("timestamp", F.col("item_properties.datetime"))\
        .groupBy("item_id", "timestamp")\
        .agg(
            *[F.first(cn).alias(cn) for cn in eod_items.columns if cn not in ["item_id"]]
        )\
        .withColumn("date", F.to_date("timestamp"))\
        .withColumn("href", F.col("asset.href"))

    # Start from a clean table and output directory for this band.
    spark.sql(f"DROP TABLE IF EXISTS alaska_{band_name}")
    dbutils.fs.rm(f"/FileStore/geospatial/odin/alaska/{band_name}", True)
    dbutils.fs.mkdirs(f"/FileStore/geospatial/odin/alaska/{band_name}")

    # Download each asset into the band directory under a random numeric .tif name.
    catalog_df = to_download\
        .withColumn(
            "outputfile",
            library.download_asset(
                "href",
                F.lit(f"/dbfs/FileStore/geospatial/odin/alaska/{band_name}"),
                F.concat(F.hash(F.rand()), F.lit(".tif"))
            )
        )

    catalog_df.write\
        .mode("overwrite")\
        .option("overwriteSchema", "true")\
        .format("delta")\
        .saveAsTable(f"alaska_{band_name}")
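The selection logic of `download_band` is: keep the rows for one band, keep one row per STAC item, derive the acquisition date, and choose an output file. A plain-Python sketch of that selection step (it downloads nothing), assuming rows are dicts with `item_id`, `datetime`, `asset_name` and `href` keys, a hypothetical flattened shape rather than the actual schema returned by `library`. Unlike the notebook, which names files with a random hash, this sketch derives the file name from the item id so that re-runs produce the same names; that is a design choice of the sketch, not of the PR:

```python
from datetime import datetime


def band_downloads(rows, band_name, out_dir):
    """Pick one href per STAC item for a band and assign each an output .tif path."""
    first_per_item = {}
    for row in rows:
        if row["asset_name"] != band_name:
            continue  # filter on the band before de-duplicating by item
        first_per_item.setdefault(row["item_id"], row)  # keep the first row per item
    plan = []
    for item_id, row in sorted(first_per_item.items()):
        # STAC datetimes end in "Z"; fromisoformat needs an explicit UTC offset before 3.11.
        acquired = datetime.fromisoformat(row["datetime"].replace("Z", "+00:00"))
        plan.append({
            "item_id": item_id,
            "date": acquired.date(),
            "href": row["href"],
            "outputfile": f"{out_dir}/{band_name}/{item_id}.tif",
        })
    return plan
```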


# COMMAND ----------

import rich.table

region = census_df.where("STATEFP == 2").select(mos.st_asgeojson("geom_0").alias("geojson")).limit(1).collect()[0]["geojson"]

catalog = pystac_client.Client.open(
"https://planetarycomputer.microsoft.com/api/stac/v1",
modifier=planetary_computer.sign_inplace,
)

search = catalog.search(
collections=["sentinel-2-l2a"],
intersects=region,
datetime=time_range
)

items = search.item_collection()

table = rich.table.Table("Asset Key", "Description")
for asset_key, asset in items[0].assets.items():
    table.add_row(asset_key, asset.title)

table

# COMMAND ----------

# Keep only the individual raster assets; drop the visual composite, previews,
# tile JSON and the metadata/manifest files.
non_band_assets = {
    "visual", "preview", "safe-manifest", "tilejson", "rendered_preview",
    "granule-metadata", "inspire-metadata", "product-metadata", "datastrip-metadata",
}
bands = [asset_key for asset_key in items[0].assets if asset_key not in non_band_assets]
bands

# COMMAND ----------

for band in bands:
    download_band(eod_items, band)

# COMMAND ----------

# MAGIC %fs ls /FileStore/geospatial/odin/alaska/B08

# COMMAND ----------

import rasterio
from matplotlib import pyplot
from rasterio.plot import show

fig, ax = pyplot.subplots(1, figsize=(12, 12))
raster = rasterio.open("""/dbfs/FileStore/geospatial/odin/alaska/B08/2764922.tif""")
show(raster, ax=ax, cmap='Greens')
pyplot.show()

# COMMAND ----------

