
Commit 5f5bc44
minor docs fix, split out tolerance into two params
sllynn committed Sep 23, 2024
1 parent 6965ae1 commit 5f5bc44
Showing 26 changed files with 195 additions and 144 deletions.
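
In short, the triangulation-backed functions (st_triangulate, st_interpolateelevation, rst_dtmfromgeoms) now take two tolerance arguments where they previously took one. A minimal SparkR sketch of the new call shape, lifted from the updated tests below (the tolerance names in the comments are assumptions; only the argument positions and values are confirmed by this diff):

.. code-block:: r

    # New signature: two tolerances instead of one.
    sdf <- withColumn(sdf, "tile", rst_dtmfromgeoms(
      column("masspoints"), column("breaklines"),
      lit(0.0),    # new first tolerance (assumed: snapping tolerance)
      lit(0.01),   # second tolerance (previously the only one)
      column("origin"), column("xWidth"), column("yWidth"),
      column("xSize"), column("ySize")
    ))
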
@@ -163,7 +163,7 @@ sdf <- withColumn(sdf, "yWidth", lit(6L))
 sdf <- withColumn(sdf, "xSize", lit(0.1))
 sdf <- withColumn(sdf, "ySize", lit(0.1))
 sdf <- withColumn(sdf, "tile", rst_dtmfromgeoms(
-  column("masspoints"), column("breaklines"), lit(0.01),
+  column("masspoints"), column("breaklines"), lit(0.0), lit(0.01),
   column("origin"), column("xWidth"), column("yWidth"), column("xSize"), column("ySize"))
 )
 expect_equal(SparkR::count(sdf), 1)

@@ -109,7 +109,7 @@ sdf <- createDataFrame(

 sdf <- agg(groupBy(sdf), masspoints = collect_list(column("wkt")))
 sdf <- withColumn(sdf, "breaklines", expr("array('LINESTRING EMPTY')"))
-triangulation_sdf <- withColumn(sdf, "triangles", st_triangulate(column("masspoints"), column("breaklines"), lit(0.01)))
+triangulation_sdf <- withColumn(sdf, "triangles", st_triangulate(column("masspoints"), column("breaklines"), lit(0.0), lit(0.01)))
 cache(triangulation_sdf)
 expect_equal(SparkR::count(triangulation_sdf), 2)
 expected <- c("POLYGON Z((0 2 2, 2 1 0, 1 3 3, 0 2 2))", "POLYGON Z((1 3 3, 2 1 0, 3 2 1, 1 3 3))")
@@ -124,6 +124,7 @@ interpolation_sdf <- withColumn(interpolation_sdf, "ySize", lit(0.1))
 interpolation_sdf <- withColumn(interpolation_sdf, "interpolated", st_interpolateelevation(
   column("masspoints"),
   column("breaklines"),
+  lit(0.0),
   lit(0.01),
   column("origin"),
   column("xWidth"),

@@ -206,6 +206,7 @@ test_that ("a terrain model can be produced from point geometries", {
     tile = rst_dtmfromgeoms(
       masspoints,
       breaklines,
+      as.double(0.0),
       as.double(0.01),
       origin,
       xWidth,

@@ -134,7 +134,7 @@ test_that ("triangulation and interpolation functions behave as intended", {
   mutate(breaklines = array("LINESTRING EMPTY"))

 triangulation_sdf <- sdf %>%
-  mutate(triangles = st_triangulate(masspoints, breaklines, as.double(0.01)))
+  mutate(triangles = st_triangulate(masspoints, breaklines, as.double(0.00), as.double(0.01)))

 expect_equal(sdf_nrow(triangulation_sdf), 2)

@@ -152,6 +152,7 @@ test_that ("triangulation and interpolation functions behave as intended", {
     interpolated = st_interpolateelevation(
       masspoints,
       breaklines,
+      as.double(0.0),
       as.double(0.01),
       origin,
       xWidth,

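The same split applies in the sparklyr bindings; a sketch mirroring the updated sparklyr tests:

.. code-block:: r

    library(sparklyr)
    library(dplyr)

    triangulation_sdf <- sdf %>%
      mutate(triangles = st_triangulate(
        masspoints, breaklines,
        as.double(0.0),   # new first tolerance
        as.double(0.01)   # previously the sole tolerance argument
      ))
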
1 change: 1 addition & 0 deletions docs/source/api/raster-functions.rst
@@ -2,6 +2,7 @@
 Raster functions
 =================

+#####
 Intro
 #####
 Raster functions are available in mosaic if you have installed the optional dependency `GDAL`.

4 changes: 2 additions & 2 deletions docs/source/api/spatial-indexing.rst
@@ -850,7 +850,7 @@ grid_cellkringexplode
     </div>

 grid_cell_intersection
-**************
+**********************

 .. function:: grid_cell_intersection(left_chip, right_chip)

@@ -906,7 +906,7 @@ grid_cell_intersection
    +--------------------------------------------------------+

 grid_cell_union
-**************
+***************

 .. function:: grid_cell_union(left_chip, right_chip)

23 changes: 14 additions & 9 deletions docs/source/api/vector-format-readers.rst
@@ -2,9 +2,9 @@
 Vector Format Readers
 =====================

-
+#####
 Intro
-################
+#####
 Mosaic provides spark readers for vector files supported by GDAL OGR drivers.
 Only the drivers that are built by default are supported.
 Here are some common useful file formats:

@@ -35,7 +35,7 @@ Additionally, for convenience, Mosaic provides specific readers for Shapefile an
 * :code:`spark.read.format("shapefile")` reader for Shapefiles natively in Spark.

 spark.read.format("ogr")
-*************************
+************************
 A base Spark SQL data source for reading GDAL vector data sources.
 The output of the reader is a DataFrame with inferred schema.
 The schema is inferred from both features and fields in the vector file.

@@ -55,7 +55,8 @@ The reader supports the following options:
 * layerNumber - number of the layer to read (IntegerType), zero-indexed


-.. function:: spark.read.format("ogr").load(path)
+.. function:: load(path)
+   :module: spark.read.format("ogr")

     Loads a vector file and returns the result as a :class:`DataFrame`.

@@ -128,7 +129,8 @@ and parsed into expected types on execution. The reader supports the following o
 * layerNumber - number of the layer to read (IntegerType), zero-indexed [pass as String]


-.. function:: mos.read().format("multi_read_ogr").load(path)
+.. function:: load(path)
+   :module: mos.read().format("multi_read_ogr")

     Loads a vector file and returns the result as a :class:`DataFrame`.

@@ -175,7 +177,7 @@


 spark.read.format("geo_db")
-*****************************
+***************************
 Mosaic provides a reader for GeoDB files natively in Spark.
 The output of the reader is a DataFrame with inferred schema.
 Only 1 file per task is read. For parallel reading of large files use the multi_read_ogr reader.

@@ -186,7 +188,8 @@ The reader supports the following options:
 * layerNumber - number of the layer to read (IntegerType), zero-indexed
 * vsizip - if the vector files are zipped files, set this to true (BooleanType)

-.. function:: spark.read.format("geo_db").load(path)
+.. function:: load(path)
+   :module: spark.read.format("geo_db")

     Loads a GeoDB file and returns the result as a :class:`DataFrame`.

@@ -234,7 +237,7 @@ The reader supports the following options:


 spark.read.format("shapefile")
-********************************
+******************************
 Mosaic provides a reader for Shapefiles natively in Spark.
 The output of the reader is a DataFrame with inferred schema.
 Only 1 file per task is read. For parallel reading of large files use the multi_read_ogr reader.

@@ -245,7 +248,8 @@ The reader supports the following options:
 * layerNumber - number of the layer to read (IntegerType), zero-indexed
 * vsizip - if the vector files are zipped files, set this to true (BooleanType)

-.. function:: spark.read.format("shapefile").load(path)
+.. function:: load(path)
+   :module: spark.read.format("shapefile")

     Loads a Shapefile and returns the result as a :class:`DataFrame`.

@@ -291,6 +295,7 @@ The reader supports the following options:
     These must be supplied as a :code:`String`.
     Also, you can supply function signature values as :code:`String`.

+################
 Vector File UDFs
 ################

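A usage note on the readers documented above: the options are passed as plain reader options, so the same call can be made from R. A sketch, assuming the "shapefile" source resolves for SparkR just as it does for the Python/Scala readers shown in the docs (the path and option values are hypothetical):

.. code-block:: r

    library(SparkR)

    df <- read.df(
      path = "/tmp/vectors/parcels.zip",  # hypothetical input
      source = "shapefile",
      layerNumber = "0",                  # zero-indexed layer
      vsizip = "true"                     # input is zipped
    )
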
21 changes: 11 additions & 10 deletions docs/source/conf.py
@@ -52,6 +52,7 @@
 napoleon_use_admonition_for_notes = True
 sphinx_tabs_disable_tab_closing = True
 todo_include_todos = True
+suppress_warnings = ["autosectionlabel.*"]

 # -- Options for HTML output -------------------------------------------------

@@ -64,27 +65,27 @@
 html_theme_options = {

     # Set the name of the project to appear in the navigation.
-    'nav_title': f'Mosaic {release}',
+    # 'nav_title': f'Mosaic {release}',

     # Specify a base_url used to generate sitemap.xml. If not
     # specified, then no sitemap will be built.
     # 'base_url': 'https://project.github.io/project',

     # Set the color and the accent color
-    'color_primary': 'green',
-    'color_accent': 'green',
+    # 'color_primary': 'green',
+    # 'color_accent': 'green',

     # Set the repo location to get a badge with stats
-    'repo_url': 'https://github.com/databrickslabs/mosaic/',
-    'repo_name': 'Mosaic',
+    # 'repo_url': 'https://github.com/databrickslabs/mosaic/',
+    # 'repo_name': 'Mosaic',

-    'globaltoc_depth': 3,
+    # 'globaltoc_depth': 3,
     'globaltoc_collapse': False,
     'globaltoc_includehidden': True,
-    'heroes': {'index': 'Simple, scalable geospatial analytics on Databricks',
-               'examples/index': 'examples and tutorials to get started with '
-                                 'Mosaic'},
-    "version_dropdown": True,
+    # 'heroes': {'index': 'Simple, scalable geospatial analytics on Databricks',
+    #            'examples/index': 'examples and tutorials to get started with '
+    #                              'Mosaic'},
+    # "version_dropdown": True,
     # "version_json": "../versions-v2.json",

 }

10 changes: 5 additions & 5 deletions docs/source/usage/automatic-sql-registration.rst
@@ -12,21 +12,21 @@ with a geospatial middleware component such as [Geoserver](https://geoserver.org

 .. warning::
     Mosaic 0.4.x SQL bindings for DBR 13 can register with Assigned clusters (as Spark Expressions), but not Shared Access due
-    to `Unity Catalog <https://www.databricks.com/product/unity-catalog>`_ API changes, more `here <https://docs.databricks.com/en/udf/index.html>`_.
+    to `Unity Catalog <https://www.databricks.com/product/unity-catalog>`__ API changes, more `here <https://docs.databricks.com/en/udf/index.html>`__.

 Pre-requisites
 **************

 In order to use Mosaic, you must have access to a Databricks cluster running
 Databricks Runtime 13. If you have cluster creation permissions in your Databricks
 workspace, you can create a cluster using the instructions
-`here <https://docs.databricks.com/clusters/create.html#use-the-cluster-ui>`_.
+`here <https://docs.databricks.com/clusters/create.html#use-the-cluster-ui>`__.

 You will also need "Can Manage" permissions on this cluster in order to attach init script
 to your cluster. A workspace administrator will be able to grant
 these permissions and more information about cluster permissions can be found
 in our documentation
-`here <https://docs.databricks.com/security/access-control/cluster-acl.html#cluster-level-permissions>`_.
+`here <https://docs.databricks.com/security/access-control/cluster-acl.html#cluster-level-permissions>`__.

 Installation
 ************
@@ -59,9 +59,9 @@ To install Mosaic on your Databricks cluster, take the following steps:
        EOF

-#. Configure the init script for the cluster following the instructions `here <https://docs.databricks.com/clusters/init-scripts.html#configure-a-cluster-scoped-init-script>`_.
+#. Configure the init script for the cluster following the instructions `here <https://docs.databricks.com/clusters/init-scripts.html#configure-a-cluster-scoped-init-script>`__.

-#. Add the following spark configuration values for your cluster following the instructions `here <https://docs.databricks.com/clusters/configure.html#spark-configuration>`_.
+#. Add the following spark configuration values for your cluster following the instructions `here <https://docs.databricks.com/clusters/configure.html#spark-configuration>`__.

    .. code-block:: bash

8 changes: 4 additions & 4 deletions docs/source/usage/install-gdal.rst
@@ -8,17 +8,17 @@ In order to use Mosaic 0.4 series, you must have access to a Databricks cluster
 Databricks Runtime 13.3 LTS.
 If you have cluster creation permissions in your Databricks
 workspace, you can create a cluster using the instructions
-`here <https://docs.databricks.com/clusters/create.html#use-the-cluster-ui>`_.
+`here <https://docs.databricks.com/clusters/create.html#use-the-cluster-ui>`__.

 You will also need "Can Manage" permissions on this cluster in order to attach the
 Mosaic library to your cluster. A workspace administrator will be able to grant
 these permissions and more information about cluster permissions can be found
 in our documentation
-`here <https://docs.databricks.com/security/access-control/cluster-acl.html#cluster-level-permissions>`_.
+`here <https://docs.databricks.com/security/access-control/cluster-acl.html#cluster-level-permissions>`__.

 .. warning::
     These instructions assume an Assigned cluster is being used (vs a Shared Access cluster),
-    more on access modes `here <https://docs.databricks.com/en/compute/configure.html#access-modes>`_.
+    more on access modes `here <https://docs.databricks.com/en/compute/configure.html#access-modes>`__.

 GDAL Installation
 ####################
@@ -131,7 +131,7 @@ GDAL is configured as follows in `MosaicGDAL <https://github.com/databrickslabs/
    * - GDAL_VRT_ENABLE_PYTHON
      - "YES"
    * - GDAL_DISABLE_READDIR_ON_OPEN
-     - "TRUE"
+     - "EMPTY_DIR"
    * - CPL_TMPDIR
      - "<CPL_TMPDIR>"
    * - GDAL_PAM_PROXY_DIR

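The last row above changes GDAL_DISABLE_READDIR_ON_OPEN from "TRUE" to "EMPTY_DIR". Mosaic applies this setting through MosaicGDAL, so no user action is needed; if you wanted to mirror it in a standalone R session, one hedged option is the environment variable that GDAL reads as a configuration option:

.. code-block:: r

    # Must be set before GDAL initialises in the session.
    Sys.setenv(GDAL_DISABLE_READDIR_ON_OPEN = "EMPTY_DIR")
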
38 changes: 19 additions & 19 deletions docs/source/usage/installation.rst
@@ -16,49 +16,49 @@ Mosaic 0.4.x series only supports DBR 13.x DBRs. If running on a different DBR i
     DEPRECATION ERROR: Mosaic v0.4.x series only supports Databricks Runtime 13.
     You can specify :code:`%pip install 'databricks-mosaic<0.4,>=0.3'` for DBR < 13.

-Mosaic 0.4.x series issues an ERROR on standard, non-Photon clusters `ADB <https://learn.microsoft.com/en-us/azure/databricks/runtime/>`_ |
-`AWS <https://docs.databricks.com/runtime/index.html/>`_ |
-`GCP <https://docs.gcp.databricks.com/runtime/index.html/>`_:
+Mosaic 0.4.x series issues an ERROR on standard, non-Photon clusters `ADB <https://learn.microsoft.com/en-us/azure/databricks/runtime/>`__ |
+`AWS <https://docs.databricks.com/runtime/index.html/>`__ |
+`GCP <https://docs.gcp.databricks.com/runtime/index.html/>`__:

     DEPRECATION ERROR: Please use a Databricks Photon-enabled Runtime for performance benefits or Runtime ML for
     spatial AI benefits; Mosaic 0.4.x series restricts executing this cluster.

 As of Mosaic 0.4.0 / DBR 13.3 LTS (subject to change in follow-on releases):

-* `Assigned Clusters <https://docs.databricks.com/en/compute/configure.html#access-modes>`_
+* `Assigned Clusters <https://docs.databricks.com/en/compute/configure.html#access-modes>`__
   * Mosaic Python, SQL, R, and Scala APIs.
-* `Shared Access Clusters <https://docs.databricks.com/en/compute/configure.html#access-modes>`_
-  * Mosaic Scala API (JVM) with Admin `allowlisting <https://docs.databricks.com/en/data-governance/unity-catalog/manage-privileges/allowlist.html>`_.
+* `Shared Access Clusters <https://docs.databricks.com/en/compute/configure.html#access-modes>`__
+  * Mosaic Scala API (JVM) with Admin `allowlisting <https://docs.databricks.com/en/data-governance/unity-catalog/manage-privileges/allowlist.html>`__.
   * Mosaic Python bindings (to Mosaic Scala APIs) are blocked by Py4J Security on Shared Access Clusters.
-  * Mosaic SQL expressions cannot yet be registered due to `Unity Catalog <https://www.databricks.com/product/unity-catalog>`_.
-    API changes, more `here <https://docs.databricks.com/en/udf/index.html>`_.
+  * Mosaic SQL expressions cannot yet be registered due to `Unity Catalog <https://www.databricks.com/product/unity-catalog>`__.
+    API changes, more `here <https://docs.databricks.com/en/udf/index.html>`__.

 .. note::
    Mosaic is a custom JVM library that extends spark, which has the following implications in DBR 13.3 LTS:

-   * `Unity Catalog <https://www.databricks.com/product/unity-catalog>`_ enforces process isolation which is difficult
+   * `Unity Catalog <https://www.databricks.com/product/unity-catalog>`__ enforces process isolation which is difficult
      to accomplish with custom JVM libraries; as such only built-in (aka platform provided) JVM APIs can be invoked from
      other supported languages in Shared Access Clusters.
-   * Clusters can read `Volumes <https://docs.databricks.com/en/connect/unity-catalog/volumes.html>`_ via relevant
+   * Clusters can read `Volumes <https://docs.databricks.com/en/connect/unity-catalog/volumes.html>`__ via relevant
      built-in (aka platform provided) readers and writers or via custom python calls which do not involve any custom JVM code.

 If you have cluster creation permissions in your Databricks
 workspace, you can create a cluster using the instructions
-`here <https://docs.databricks.com/clusters/create.html#use-the-cluster-ui>`_.
+`here <https://docs.databricks.com/clusters/create.html#use-the-cluster-ui>`__.

 You will also need "Can Manage" permissions on this cluster in order to attach the
 Mosaic library to your cluster. A workspace administrator will be able to grant
 these permissions and more information about cluster permissions can be found
 in our documentation
-`here <https://docs.databricks.com/security/access-control/cluster-acl.html#cluster-level-permissions>`_.
+`here <https://docs.databricks.com/security/access-control/cluster-acl.html#cluster-level-permissions>`__.

 Package installation
 ####################

 Installation from PyPI
 **********************
-Python users can install the library directly from `PyPI <https://pypi.org/project/databricks-mosaic/>`_
-using the instructions `here <https://docs.databricks.com/libraries/cluster-libraries.html>`_
+Python users can install the library directly from `PyPI <https://pypi.org/project/databricks-mosaic/>`__
+using the instructions `here <https://docs.databricks.com/libraries/cluster-libraries.html>`__
 or from within a Databricks notebook using the :code:`%pip` magic command, e.g.

 .. code-block:: bash
@@ -72,11 +72,11 @@ if you need to install Mosaic 0.3 series for DBR 12.2 LTS, e.g.
     %pip install "databricks-mosaic<0.4,>=0.3"

-For Mosaic versions < 0.4 please use the `0.3 docs <https://databrickslabs.github.io/mosaic/v0.3.x/index.html>`_.
+For Mosaic versions < 0.4 please use the `0.3 docs <https://databrickslabs.github.io/mosaic/v0.3.x/index.html>`__.

 Installation from release artifacts
 ***********************************
-Alternatively, you can access the latest release artifacts `here <https://github.com/databrickslabs/mosaic/releases>`_
+Alternatively, you can access the latest release artifacts `here <https://github.com/databrickslabs/mosaic/releases>`__
 and manually attach the appropriate library to your cluster.

 Which artifact you choose to attach will depend on the language API you intend to use.
@@ -85,13 +85,13 @@ Which artifact you choose to attach will depend on the language API you intend t
 * For Scala users, take the Scala JAR (packaged with all necessary dependencies).
 * For R users, download the Scala JAR and the R bindings library [see the sparkR readme](R/sparkR-mosaic/README.md).

-Instructions for how to attach libraries to a Databricks cluster can be found `here <https://docs.databricks.com/libraries/cluster-libraries.html>`_.
+Instructions for how to attach libraries to a Databricks cluster can be found `here <https://docs.databricks.com/libraries/cluster-libraries.html>`__.

 Automated SQL registration
 **************************
 If you would like to use Mosaic's functions in pure SQL (in a SQL notebook, from a business intelligence tool,
 or via a middleware layer such as Geoserver, perhaps) then you can configure
-"Automatic SQL Registration" using the instructions `here <https://databrickslabs.github.io/mosaic/usage/automatic-sql-registration.html>`_.
+"Automatic SQL Registration" using the instructions `here <https://databrickslabs.github.io/mosaic/usage/automatic-sql-registration.html>`__.

 Enabling the Mosaic functions
 #############################
@@ -184,4 +184,4 @@ register the Mosaic SQL functions in your SparkSession from a Scala notebook cel

 .. warning::
     Mosaic 0.4.x SQL bindings for DBR 13 can register with Assigned clusters (as Spark Expressions), but not Shared Access due
-    to `Unity Catalog <https://www.databricks.com/product/unity-catalog>`_ API changes, more `here <https://docs.databricks.com/en/udf/index.html>`_.
+    to `Unity Catalog <https://www.databricks.com/product/unity-catalog>`__ API changes, more `here <https://docs.databricks.com/en/udf/index.html>`__.
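
Since most of the files in this commit touch the R bindings, here is the R-side setup the installation docs lead up to, as a sketch (package and function names follow the sparkR-mosaic README; treat them as assumptions if your build differs):

.. code-block:: r

    # After attaching the Scala JAR and installing the R bindings package:
    library(SparkR)
    library(sparkrMosaic)

    enableMosaic()   # registers Mosaic's functions for use from SparkR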
