read_parquet() browser support #18

Open
jwass opened this issue Mar 14, 2024 · 2 comments

Comments

jwass commented Mar 14, 2024

This is a great tool.

I was eager to try it out on the latest Overture release data from yesterday.

I tried this query in the Overture Maps Downloader tool:

select *
from read_parquet('s3://overturemaps-us-west-2/release/2024-03-12-alpha.0/theme=buildings/type=building/*')
where bbox.minx > -73.510779 and bbox.miny > 41.171381 and bbox.maxx < -69.471306 and bbox.maxy < 42.98083
limit 100;

It works fine in native DuckDB, but here it returns Error: Error: Invalid Error: [object WebAssembly.Exception]. I see a similar error in shell.duckdb.org, so I'm guessing this is a lack of Wasm support in the underlying AWS libraries.

Owner

Youssef-Harby commented Mar 14, 2024

@jwass, thank you so much! Yes, I believe the browser's Content Security Policy doesn't allow wildcard/glob patterns (https://*) here, because the entire app runs client-side in the browser with no servers involved (ref: duckdb/duckdb-wasm#1040). You can try querying a single file instead:

SELECT 
    id,
    geometry
FROM 
    's3://overturemaps-us-west-2/release/2024-03-12-alpha.0/theme=buildings/type=building/part-00150-4dfc75cd-2680-4d52-b5e0-f4cc9f36b267-c000.zstd.parquet'
LIMIT 
    100;

Or, I believe you can pass a list of all the Parquet files you want to query, like:

SELECT 
    id,
    geometry
FROM 
    read_parquet(['s3://overturemaps-us-west-2/release/2024-03-12-alpha.0/theme=buildings/type=building/part-00150-4dfc75cd-2680-4d52-b5e0-f4cc9f36b267-c000.zstd.parquet','s3://overturemaps-us-west-2/release/2024-03-12-alpha.0/theme=buildings/type=building/part-00151-4dfc75cd-2680-4d52-b5e0-f4cc9f36b267-c000.zstd.parquet'])
WHERE
    bbox.minx > 78.53608051444496 
AND bbox.maxx < 78.56248512255797 
AND bbox.miny > 25.483406547011967 
AND bbox.maxy < 25.521748210549873
LIMIT 
    2000;
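
For longer file lists, the query text could be generated rather than typed by hand. The following is only a minimal sketch: `buildReadParquetQuery` and the `BBox` type are hypothetical names, not part of this app or of DuckDB, and the column names (`id`, `geometry`, `bbox.*`) are taken from the query above.

```typescript
// Bounding box using the same field names as the WHERE clause above.
interface BBox {
  minx: number;
  miny: number;
  maxx: number;
  maxy: number;
}

// Hypothetical helper (not part of the app or of DuckDB): builds a
// read_parquet() query over an explicit list of files instead of a glob.
function buildReadParquetQuery(urls: string[], bbox: BBox, limit: number): string {
  if (urls.length === 0) {
    throw new Error("at least one Parquet URL is required");
  }
  // Double any single quotes so each URL is a valid SQL string literal.
  const fileList = urls.map((u) => `'${u.replace(/'/g, "''")}'`).join(", ");
  return [
    "SELECT id, geometry",
    `FROM read_parquet([${fileList}])`,
    `WHERE bbox.minx > ${bbox.minx}`,
    `  AND bbox.maxx < ${bbox.maxx}`,
    `  AND bbox.miny > ${bbox.miny}`,
    `  AND bbox.maxy < ${bbox.maxy}`,
    `LIMIT ${Math.floor(limit)};`,
  ].join("\n");
}
```

The resulting string can then be run the same way as the hand-written queries above.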

We can implement a workaround in this app by using a STAC catalog for the GeoParquet files stored in S3. When a user zooms to a specific bounding box (bbox), we can intersect that bbox with the ones defined in the STAC catalog to list the GeoParquet files that contain data for the region.
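
A rough sketch of that intersection step, assuming each catalog entry exposes a bbox in STAC order ([west, south, east, north]) and a link to its GeoParquet file. `CatalogItem`, `intersects`, and `filesForViewport` are hypothetical names, and the item shape is simplified (a real STAC item keeps file links under `assets`). Boxes that cross the antimeridian are not handled.

```typescript
// STAC bbox order: [west, south, east, north], i.e. [minx, miny, maxx, maxy].
type StacBBox = [number, number, number, number];

// Simplified, hypothetical view of a catalog entry: only the fields used here.
interface CatalogItem {
  bbox: StacBBox;
  href: string; // URL of the GeoParquet file this entry describes
}

// True when the two boxes overlap or touch.
function intersects(a: StacBBox, b: StacBBox): boolean {
  return a[0] <= b[2] && a[2] >= b[0] && a[1] <= b[3] && a[3] >= b[1];
}

// Returns the file URLs whose bbox intersects the current map viewport.
function filesForViewport(items: CatalogItem[], viewport: StacBBox): string[] {
  return items
    .filter((item) => intersects(item.bbox, viewport))
    .map((item) => item.href);
}
```

The returned URLs are what would go into the read_parquet([...]) list, so only the files covering the visible region are requested.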

Author

jwass commented Mar 14, 2024

Thanks! I literally just discovered this about 10 minutes ago and came to say that it works for individual files. This is pretty cool!

2 participants