feat(bq,sf|h3,quadbin): add H3/QUADBIN_POLYFILL_TABLE (#447)
Jesus89 authored Oct 11, 2023
1 parent 66a203f commit f3c4cdd
Showing 13 changed files with 604 additions and 1 deletion.
2 changes: 1 addition & 1 deletion clouds/bigquery/modules/doc/h3/H3_POLYFILL_MODE.md
@@ -6,7 +6,7 @@ H3_POLYFILL_MODE(geog, resolution, mode)

**Description**

-Returns an array of quadbin cell indexes contained in the given geography at a given level of detail. Containment is determined by the mode: center, intersects, contains.
+Returns an array of H3 cell indexes contained in the given geography at a given level of detail. Containment is determined by the mode: center, intersects, contains.

* `geog`: `GEOGRAPHY` representing the shape to cover.
* `resolution`: `INT64` level of detail. The value must be between 0 and 15 ([H3 resolution table](https://h3geo.org/docs/core-library/restable)).
58 changes: 58 additions & 0 deletions clouds/bigquery/modules/doc/h3/H3_POLYFILL_TABLE.md
@@ -0,0 +1,58 @@
## H3_POLYFILL_TABLE (BETA)

```sql:signature
H3_POLYFILL_TABLE(input_query, resolution, mode, output_table)
```

**Description**

Creates a table with the H3 cell indexes contained in the given geography at a given level of detail. Containment is determined by the mode: center, intersects, contains. All the attributes except the geography are included in the output table, which is clustered by the `h3` column.

* `input_query`: `STRING` input data to polyfill. It must contain a column `geom` with the shape to cover. Additionally, other columns can be included.
* `resolution`: `INT64` level of detail. The value must be between 0 and 15 ([H3 resolution table](https://h3geo.org/docs/core-library/restable)).
* `mode`: `STRING`
    * `center` returns the indexes of the H3 cells whose centers intersect the input geography (polygon). The resulting H3 set does not fully cover the input geography; however, this mode is **significantly faster** than the other modes. This mode is not compatible with points or lines. Equivalent to [`H3_POLYFILL`](h3#h3_polyfill).
    * `intersects` returns the indexes of the H3 cells that intersect the input geography. The resulting H3 set will completely cover the input geography (point, line, polygon).
    * `contains` returns the indexes of the H3 cells that are entirely contained inside the input geography (polygon). This mode is not compatible with points or lines.
* `output_table`: `STRING` name of the output table to store the results of the polyfill.

Mode `center`:

![](h3_polyfill_mode_center.png)

Mode `intersects`:

![](h3_polyfill_mode_intersects.png)

Mode `contains`:

![](h3_polyfill_mode_contains.png)
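
As a rough illustration of how the three modes differ, the sketch below mirrors the mode handling of `__H3_POLYFILL_QUERY` from this commit: each mode selects a spatial predicate and the cell geometry the predicate is applied to. The helper name `modeToPredicate` is hypothetical and exists only for this example.

```javascript
// Sketch only: maps a polyfill mode to the predicate and cell geometry
// that the generated query uses. `modeToPredicate` is a hypothetical name.
function modeToPredicate(mode) {
  if (!['center', 'intersects', 'contains'].includes(mode)) {
    throw new Error('Invalid mode, should be center, intersects, or contains.');
  }
  return {
    // 'contains' requires the whole cell to lie inside the geography;
    // the other two modes only require an intersection.
    predicate: mode === 'contains' ? 'ST_CONTAINS' : 'ST_INTERSECTS',
    // 'center' tests the cell center point; the other modes test the
    // cell boundary polygon.
    cellGeometry: mode === 'center' ? 'H3_CENTER' : 'H3_BOUNDARY'
  };
}

for (const mode of ['center', 'intersects', 'contains']) {
  console.log(mode, modeToPredicate(mode));
}
```

Because `center` compares the geography against a single point per cell, it is consistent with the documentation's note that this mode is the fastest one and that it does not fully cover the input geography.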

**Output**

The results are stored in the table named `<output_table>`, which contains the following columns:

* `h3`: `STRING` the index of the H3 cell.
* The rest of the columns included in `input_query`, except `geom`.

**Examples**

```sql
CALL carto.H3_POLYFILL_TABLE(
  "SELECT ST_GEOGFROMTEXT('POLYGON ((-3.71219873428345 40.413365349070865, -3.7144088745117 40.40965661286395, -3.70659828186035 40.409525904775634, -3.71219873428345 40.413365349070865))') AS geom",
  9, 'intersects',
  '<project>.<dataset>.<output_table>'
);
-- The table `<project>.<dataset>.<output_table>` will be created
-- with column: h3
```

```sql
CALL carto.H3_POLYFILL_TABLE(
  'SELECT geom, name, value FROM `<project>.<dataset>.<table>`',
  9, 'center',
  '<project>.<dataset>.<output_table>'
);
-- The table `<project>.<dataset>.<output_table>` will be created
-- with columns: h3, name, value
```
58 changes: 58 additions & 0 deletions clouds/bigquery/modules/doc/quadbin/QUADBIN_POLYFILL_TABLE.md
@@ -0,0 +1,58 @@
## QUADBIN_POLYFILL_TABLE (BETA)

```sql:signature
QUADBIN_POLYFILL_TABLE(input_query, resolution, mode, output_table)
```

**Description**

Creates a table with the quadbin cell indexes contained in the given geography at a given level of detail. Containment is determined by the mode: center, intersects, contains. All the attributes except the geography are included in the output table, which is clustered by the `quadbin` column.

* `input_query`: `STRING` input data to polyfill. It must contain a column `geom` with the shape to cover. Additionally, other columns can be included.
* `resolution`: `INT64` level of detail. The value must be between 0 and 26.
* `mode`: `STRING`
    * `center` returns the indexes of the quadbin cells whose centers intersect the input geography (polygon). The resulting quadbin set does not fully cover the input geography; however, this mode is **significantly faster** than the other modes. This mode is not compatible with points or lines. Equivalent to [`QUADBIN_POLYFILL`](quadbin#quadbin_polyfill).
    * `intersects` returns the indexes of the quadbin cells that intersect the input geography. The resulting quadbin set will completely cover the input geography (point, line, polygon).
    * `contains` returns the indexes of the quadbin cells that are entirely contained inside the input geography (polygon). This mode is not compatible with points or lines.
* `output_table`: `STRING` name of the output table to store the results of the polyfill.

Mode `center`:

![](quadbin_polyfill_mode_center.png)

Mode `intersects`:

![](quadbin_polyfill_mode_intersects.png)

Mode `contains`:

![](quadbin_polyfill_mode_contains.png)

**Output**

The results are stored in the table named `<output_table>`, which contains the following columns:

* `quadbin`: `INT64` the index of the quadbin cell.
* The rest of the columns included in `input_query`, except `geom`.

**Examples**

```sql
CALL carto.QUADBIN_POLYFILL_TABLE(
  "SELECT ST_GEOGFROMTEXT('POLYGON ((-3.71219873428345 40.413365349070865, -3.7144088745117 40.40965661286395, -3.70659828186035 40.409525904775634, -3.71219873428345 40.413365349070865))') AS geom",
  12, 'intersects',
  '<project>.<dataset>.<output_table>'
);
-- The table `<project>.<dataset>.<output_table>` will be created
-- with column: quadbin
```

```sql
CALL carto.QUADBIN_POLYFILL_TABLE(
  'SELECT geom, name, value FROM `<project>.<dataset>.<table>`',
  12, 'center',
  '<project>.<dataset>.<output_table>'
);
-- The table `<project>.<dataset>.<output_table>` will be created
-- with columns: quadbin, name, value
```
59 changes: 59 additions & 0 deletions clouds/bigquery/modules/sql/h3/H3_POLYFILL_TABLE.sql
@@ -0,0 +1,59 @@
----------------------------
-- Copyright (C) 2023 CARTO
----------------------------

CREATE OR REPLACE FUNCTION `@@BQ_DATASET@@.__H3_POLYFILL_QUERY`
(
    input_query STRING,
    resolution INT64,
    mode STRING,
    output_table STRING
)
RETURNS STRING
DETERMINISTIC
LANGUAGE js
AS """
    if (!['center', 'intersects', 'contains'].includes(mode)) {
        throw Error('Invalid mode, should be center, intersects, or contains.')
    }
    if (resolution < 0 || resolution > 15) {
        throw Error('Invalid resolution, should be between 0 and 15.')
    }
    // Strip backticks so the table name can be quoted once below
    output_table = output_table.replace(/`/g, '')
    // 'contains' needs full containment; 'center' tests the cell center, the other modes the cell boundary
    const containmentFunction = (mode === 'contains') ? 'ST_CONTAINS' : 'ST_INTERSECTS'
    const cellFunction = (mode === 'center') ? '@@BQ_DATASET@@.H3_CENTER' : '@@BQ_DATASET@@.H3_BOUNDARY'
    return 'CREATE TABLE `' + output_table + '` CLUSTER BY (h3) AS\\n' +
        'WITH __input AS (' + input_query + '),\\n' +
        '__cells AS (SELECT h3, i.* FROM __input AS i,\\n' +
        'UNNEST(`@@BQ_DATASET@@.__H3_POLYFILL_INIT`(geom,`@@BQ_DATASET@@.__H3_POLYFILL_INIT_Z`(geom,' + resolution + '))) AS parent,\\n' +
        'UNNEST(`@@BQ_DATASET@@.H3_TOCHILDREN`(parent,' + resolution + ')) AS h3)\\n' +
        'SELECT * EXCEPT (geom) FROM __cells\\n' +
        'WHERE ' + containmentFunction + '(geom, `' + cellFunction + '`(h3));'
""";

CREATE OR REPLACE PROCEDURE `@@BQ_DATASET@@.H3_POLYFILL_TABLE`
(
    input_query STRING,
    resolution INT64,
    mode STRING,
    output_table STRING
)
BEGIN
    DECLARE polyfill_query STRING;

    -- Check that the destination table does not already exist
    CALL `@@BQ_DATASET@@.__CHECK_TABLE`(output_table);

    SET polyfill_query = `@@BQ_DATASET@@.__H3_POLYFILL_QUERY`(
        input_query,
        resolution,
        mode,
        output_table
    );

    EXECUTE IMMEDIATE polyfill_query;
END;
59 changes: 59 additions & 0 deletions clouds/bigquery/modules/sql/quadbin/QUADBIN_POLYFILL_TABLE.sql
@@ -0,0 +1,59 @@
----------------------------
-- Copyright (C) 2023 CARTO
----------------------------

CREATE OR REPLACE FUNCTION `@@BQ_DATASET@@.__QUADBIN_POLYFILL_QUERY`
(
    input_query STRING,
    resolution INT64,
    mode STRING,
    output_table STRING
)
RETURNS STRING
DETERMINISTIC
LANGUAGE js
AS """
    if (!['center', 'intersects', 'contains'].includes(mode)) {
        throw Error('Invalid mode, should be center, intersects, or contains.')
    }
    if (resolution < 0 || resolution > 26) {
        throw Error('Invalid resolution, should be between 0 and 26.')
    }
    // Strip backticks so the table name can be quoted once below
    output_table = output_table.replace(/`/g, '')
    // 'contains' needs full containment; 'center' tests the cell center, the other modes the cell boundary
    const containmentFunction = (mode === 'contains') ? 'ST_CONTAINS' : 'ST_INTERSECTS'
    const cellFunction = (mode === 'center') ? '@@BQ_DATASET@@.QUADBIN_CENTER' : '@@BQ_DATASET@@.QUADBIN_BOUNDARY'
    return 'CREATE TABLE `' + output_table + '` CLUSTER BY (quadbin) AS\\n' +
        'WITH __input AS (' + input_query + '),\\n' +
        '__cells AS (SELECT quadbin, i.* FROM __input AS i,\\n' +
        'UNNEST(`@@BQ_DATASET@@.__QUADBIN_POLYFILL_INIT`(geom,`@@BQ_DATASET@@.__QUADBIN_POLYFILL_INIT_Z`(geom,' + resolution + '))) AS parent,\\n' +
        'UNNEST(`@@BQ_DATASET@@.QUADBIN_TOCHILDREN`(parent,' + resolution + ')) AS quadbin)\\n' +
        'SELECT * EXCEPT (geom) FROM __cells\\n' +
        'WHERE ' + containmentFunction + '(geom, `' + cellFunction + '`(quadbin));'
""";

CREATE OR REPLACE PROCEDURE `@@BQ_DATASET@@.QUADBIN_POLYFILL_TABLE`
(
    input_query STRING,
    resolution INT64,
    mode STRING,
    output_table STRING
)
BEGIN
    DECLARE polyfill_query STRING;

    -- Check that the destination table does not already exist
    CALL `@@BQ_DATASET@@.__CHECK_TABLE`(output_table);

    SET polyfill_query = `@@BQ_DATASET@@.__QUADBIN_POLYFILL_QUERY`(
        input_query,
        resolution,
        mode,
        output_table
    );

    EXECUTE IMMEDIATE polyfill_query;
END;
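
The H3 and quadbin query builders in this commit differ only in their function names, the clustering column, and the maximum resolution. The following Node.js sketch makes that shared template explicit and can be used to preview the generated SQL outside BigQuery. It is not part of the commit: the function name `buildPolyfillQuery` and the `index` and `dataset` parameters are assumptions made for illustration, with `dataset` standing in for the `@@BQ_DATASET@@` placeholder.

```javascript
// Hypothetical standalone builder for previewing the generated SQL.
// `index` is 'h3' or 'quadbin'; `dataset` stands in for @@BQ_DATASET@@.
function buildPolyfillQuery(index, dataset, inputQuery, resolution, mode, outputTable) {
  if (index !== 'h3' && index !== 'quadbin') {
    throw new Error('Unknown index type: ' + index);
  }
  const maxResolution = index === 'h3' ? 15 : 26;
  if (!['center', 'intersects', 'contains'].includes(mode)) {
    throw new Error('Invalid mode, should be center, intersects, or contains.');
  }
  if (resolution < 0 || resolution > maxResolution) {
    throw new Error('Invalid resolution, should be between 0 and ' + maxResolution + '.');
  }
  const prefix = index.toUpperCase();
  // Quote a function name with backticks, qualified by the dataset.
  const fn = (name) => '`' + dataset + '.' + name + '`';
  const predicate = mode === 'contains' ? 'ST_CONTAINS' : 'ST_INTERSECTS';
  const cellFunction = fn(prefix + (mode === 'center' ? '_CENTER' : '_BOUNDARY'));
  return [
    'CREATE TABLE `' + outputTable.replace(/`/g, '') + '` CLUSTER BY (' + index + ') AS',
    'WITH __input AS (' + inputQuery + '),',
    '__cells AS (SELECT ' + index + ', i.* FROM __input AS i,',
    'UNNEST(' + fn('__' + prefix + '_POLYFILL_INIT') + '(geom,' +
      fn('__' + prefix + '_POLYFILL_INIT_Z') + '(geom,' + resolution + '))) AS parent,',
    'UNNEST(' + fn(prefix + '_TOCHILDREN') + '(parent,' + resolution + ')) AS ' + index + ')',
    'SELECT * EXCEPT (geom) FROM __cells',
    'WHERE ' + predicate + '(geom, ' + cellFunction + '(' + index + '));'
  ].join('\n');
}

console.log(buildPolyfillQuery('quadbin', 'carto', 'SELECT geom FROM `p.d.t`', 12, 'contains', 'p.d.out'));
```

Note that, as in the SQL functions of this commit, `inputQuery` is concatenated into the statement verbatim, so it should only come from a trusted caller.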
33 changes: 33 additions & 0 deletions clouds/bigquery/modules/sql/utils/__CHECK_TABLE.sql
@@ -0,0 +1,33 @@
---------------------------------
-- Copyright (C) 2020-2021 CARTO
---------------------------------

CREATE OR REPLACE PROCEDURE `@@BQ_DATASET@@.__CHECK_TABLE`
(destination_table STRING)
BEGIN
    DECLARE destination_parts DEFAULT (SELECT `@@BQ_DATASET@@.__TABLENAME_SPLIT`(destination_table));
    DECLARE tables_metadata STRING;
    DECLARE table_name STRING;
    DECLARE num_tables INT64;

    IF destination_parts IS NULL OR destination_parts.table IS NULL OR destination_parts.dataset IS NULL THEN
        SELECT ERROR("The output table does not have a correct format, i.e. [projectID].dataset.tablename. Please, use a different output table name and try again.");
    END IF;

    SET table_name = destination_parts.table;
    SET tables_metadata = `@@BQ_DATASET@@.__TABLENAME_JOIN`((destination_parts.project, destination_parts.dataset, '__TABLES__'));

    EXECUTE IMMEDIATE FORMAT(
        '''
        SELECT COUNT(size_bytes)
        FROM %s
        WHERE table_id='%s'
        ''',
        tables_metadata,
        table_name
    ) INTO num_tables;

    IF num_tables > 0 THEN
        SELECT ERROR("The output table to store the tileset already exists. Please, use a different output table name and try again.");
    END IF;
END;
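
To make the existence check easier to follow, here is a minimal sketch of the metadata lookup that `__CHECK_TABLE` assembles before running it with `EXECUTE IMMEDIATE`. The function name `buildTableExistsQuery` is hypothetical, the input is assumed to be the already split table name, and the whitespace of the statement is simplified.

```javascript
// Sketch only: builds the lookup against the dataset's __TABLES__ metadata.
// `parts` is assumed to have the shape { project, dataset, table }.
function buildTableExistsQuery(parts) {
  if (!parts || parts.table == null || parts.dataset == null) {
    throw new Error('The output table does not have a correct format, i.e. [projectID].dataset.tablename.');
  }
  // The project is optional; without it the dataset is resolved in the default project.
  const qualifier = parts.project == null
    ? '`' + parts.dataset + '`'
    : '`' + parts.project + '`.`' + parts.dataset + '`';
  return 'SELECT COUNT(size_bytes)\n' +
    'FROM ' + qualifier + '.`__TABLES__`\n' +
    "WHERE table_id='" + parts.table + "'";
}

console.log(buildTableExistsQuery({ project: 'my-project', dataset: 'my_dataset', table: 'my_table' }));
```

A count greater than zero means the table already exists, in which case the procedure raises an error instead of overwriting it.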
14 changes: 14 additions & 0 deletions clouds/bigquery/modules/sql/utils/__TABLENAME_JOIN.sql
@@ -0,0 +1,14 @@
----------------------------
-- Copyright (C) 2021 CARTO
----------------------------

CREATE OR REPLACE FUNCTION `@@BQ_DATASET@@.__TABLENAME_JOIN`
(split_name STRUCT<project STRING, dataset STRING, table STRING>)
RETURNS STRING
AS (
    IF(
        split_name.project IS NULL,
        FORMAT('`%s`.`%s`', split_name.dataset, split_name.table),
        FORMAT('`%s`.`%s`.`%s`', split_name.project, split_name.dataset, split_name.table)
    )
);
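
For readers less familiar with BigQuery `STRUCT` arguments, the joining rule can be sketched in plain JavaScript. The name `tablenameJoin` and the object shape are assumptions for illustration only; each part is quoted separately with backticks, and the project is omitted when it is null.

```javascript
// Sketch only: JavaScript equivalent of the joining rule in __TABLENAME_JOIN.
function tablenameJoin({ project, dataset, table }) {
  return project == null
    ? '`' + dataset + '`.`' + table + '`'
    : '`' + project + '`.`' + dataset + '`.`' + table + '`';
}

console.log(tablenameJoin({ project: 'my-project', dataset: 'my_dataset', table: '__TABLES__' }));
console.log(tablenameJoin({ project: null, dataset: 'my_dataset', table: '__TABLES__' }));
```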
16 changes: 16 additions & 0 deletions clouds/bigquery/modules/sql/utils/__TABLENAME_SPLIT.sql
@@ -0,0 +1,16 @@
----------------------------
-- Copyright (C) 2021 CARTO
----------------------------

CREATE OR REPLACE FUNCTION `@@BQ_DATASET@@.__TABLENAME_SPLIT`
(qualified_name STRING)
RETURNS STRUCT<project STRING, dataset STRING, table STRING>
AS ((
    WITH unquoted AS (SELECT REPLACE(qualified_name, "`", "") AS name)

    SELECT AS STRUCT
        REGEXP_EXTRACT(name, r"^(.+)\..+\..+$") AS project,
        COALESCE(REGEXP_EXTRACT(name, r"^.+\.(.+)\..+$"), REGEXP_EXTRACT(name, r"^(.+)\..+$")) AS dataset,
        REGEXP_EXTRACT(name, r"^.+\.(.+)$") AS table
    FROM unquoted
));
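
The behaviour of the three regular expressions is easier to see with concrete inputs. The sketch below reuses the same patterns in JavaScript; `tablenameSplit` is a hypothetical name, and it assumes the patterns behave the same way in JavaScript as in BigQuery's regular expression engine for inputs like these.

```javascript
// Sketch only: JavaScript port of the patterns used by __TABLENAME_SPLIT.
function tablenameSplit(qualifiedName) {
  const name = qualifiedName.replace(/`/g, '');
  // Return the first capture group, or null when the pattern does not match.
  const extract = (re) => {
    const match = name.match(re);
    return match ? match[1] : null;
  };
  return {
    project: extract(/^(.+)\..+\..+$/),
    // Try the three-part form first, then fall back to dataset.table.
    dataset: extract(/^.+\.(.+)\..+$/) ?? extract(/^(.+)\..+$/),
    table: extract(/^.+\.(.+)$/)
  };
}

console.log(tablenameSplit('`my-project.my_dataset.my_table`'));
console.log(tablenameSplit('my_dataset.my_table'));
console.log(tablenameSplit('my_table'));
```

An unqualified name such as `my_table` yields null for every field, because even the table pattern requires at least one dot. This is the case that `__CHECK_TABLE` rejects with its format error.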
20 changes: 20 additions & 0 deletions clouds/bigquery/modules/test/h3/H3_POLYFILL_TABLE.test.js
@@ -0,0 +1,20 @@
const { runQuery } = require('../../../common/test-utils');

const BQ_DATASET = process.env.BQ_DATASET;

test('H3_POLYFILL_TABLE should generate the correct query', async () => {
    const query = `SELECT \`@@BQ_DATASET@@.__H3_POLYFILL_QUERY\`(
        'SELECT geom, name, value FROM \`<project>.<dataset>.<table>\`',
        12, 'center',
        '<project>.<dataset>.<output_table>'
    ) AS output`;
    const rows = await runQuery(query);
    expect(rows.length).toEqual(1);
    expect(rows[0].output).toEqual(`CREATE TABLE \`<project>.<dataset>.<output_table>\` CLUSTER BY (h3) AS
WITH __input AS (SELECT geom, name, value FROM \`<project>.<dataset>.<table>\`),
__cells AS (SELECT h3, i.* FROM __input AS i,
UNNEST(\`@@BQ_DATASET@@.__H3_POLYFILL_INIT\`(geom,\`@@BQ_DATASET@@.__H3_POLYFILL_INIT_Z\`(geom,12))) AS parent,
UNNEST(\`@@BQ_DATASET@@.H3_TOCHILDREN\`(parent,12)) AS h3)
SELECT * EXCEPT (geom) FROM __cells
WHERE ST_INTERSECTS(geom, \`@@BQ_DATASET@@.H3_CENTER\`(h3));`.replace(/@@BQ_DATASET@@/g, BQ_DATASET));
});
@@ -0,0 +1,20 @@
const { runQuery } = require('../../../common/test-utils');

const BQ_DATASET = process.env.BQ_DATASET;

test('QUADBIN_POLYFILL_TABLE should generate the correct query', async () => {
    const query = `SELECT \`@@BQ_DATASET@@.__QUADBIN_POLYFILL_QUERY\`(
        'SELECT geom, name, value FROM \`<project>.<dataset>.<table>\`',
        12, 'center',
        '<project>.<dataset>.<output_table>'
    ) AS output`;
    const rows = await runQuery(query);
    expect(rows.length).toEqual(1);
    expect(rows[0].output).toEqual(`CREATE TABLE \`<project>.<dataset>.<output_table>\` CLUSTER BY (quadbin) AS
WITH __input AS (SELECT geom, name, value FROM \`<project>.<dataset>.<table>\`),
__cells AS (SELECT quadbin, i.* FROM __input AS i,
UNNEST(\`@@BQ_DATASET@@.__QUADBIN_POLYFILL_INIT\`(geom,\`@@BQ_DATASET@@.__QUADBIN_POLYFILL_INIT_Z\`(geom,12))) AS parent,
UNNEST(\`@@BQ_DATASET@@.QUADBIN_TOCHILDREN\`(parent,12)) AS quadbin)
SELECT * EXCEPT (geom) FROM __cells
WHERE ST_INTERSECTS(geom, \`@@BQ_DATASET@@.QUADBIN_CENTER\`(quadbin));`.replace(/@@BQ_DATASET@@/g, BQ_DATASET));
});