GeoParquet reader revamp #660

kylebarron · 2024-07-02T03:34:55Z

The difficulty here is that the output schema has to be known at the start of the iterator, before accessing any data:

- If the geometry type is known in the Parquet metadata, parse to that type.
- If the geometry type is not known, parse to a MixedArray.
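
As a rough illustration of that rule, here is a minimal sketch that picks the output geometry type from the GeoParquet `geo` metadata JSON alone, without touching any row data. The function name `planned_geometry_type` and the `"Mixed"` label are hypothetical and are not part of the geoarrow-rs API; the sketch only assumes the `columns` / `geometry_types` layout from the GeoParquet metadata specification, where an empty list means the types are unknown.

```python
import json


def planned_geometry_type(geo_metadata_json: str, column: str) -> str:
    """Decide the output geometry type from file metadata only (hypothetical helper)."""
    geo = json.loads(geo_metadata_json)
    declared = geo["columns"][column].get("geometry_types", [])
    # Exactly one declared type: the reader can commit to it up front.
    if len(declared) == 1:
        return declared[0]
    # Empty (unknown) or several declared types: fall back to a mixed array.
    return "Mixed"
```

With this shape, the schema decision happens once per file, so every batch produced by the iterator can share it.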

Also, we should look more closely into the parquet `ArrowReaderBuilder`. That has a lot of functionality to cover both sync and async readers. Can we put all the geospatial functionality in a `GeoParquetReader<ArrowReaderBuilder<T>>`, and then do the same batch transforms for each GeoParquet batch for both async and sync readers?
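
The idea of sharing one batch transform between sync and async readers can be sketched outside Rust as well. The Python below is only an analogy under stated assumptions: `GeoBatchReader`, `read`, `read_async`, and `collect_async` are hypothetical names, batches are stand-in dictionaries, and nothing here reflects the actual geoarrow-rs or parquet crate API.

```python
import asyncio
from typing import Any, AsyncIterator, Callable, Iterable, Iterator


class GeoBatchReader:
    """Hypothetical wrapper: one transform, applied by both the sync and async paths."""

    def __init__(self, transform: Callable[[Any], Any]) -> None:
        self._transform = transform

    def read(self, batches: Iterable[Any]) -> Iterator[Any]:
        # Sync path: transform each batch as it is pulled from the source.
        for batch in batches:
            yield self._transform(batch)

    async def read_async(self, batches: AsyncIterator[Any]) -> AsyncIterator[Any]:
        # Async path: same transform, only the iteration protocol differs.
        async for batch in batches:
            yield self._transform(batch)


async def _from_list(items):
    # Stand-in for an async batch source such as a remote reader.
    for item in items:
        yield item


def collect_async(reader: GeoBatchReader, items: list) -> list:
    """Drive the async path to completion and return the transformed batches."""
    async def run():
        return [batch async for batch in reader.read_async(_from_list(items))]
    return asyncio.run(run())
```

The point of the sketch is that the geospatial logic lives in a single transform, so the two readers cannot drift apart.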

H-Plus-Time · 2024-07-27T08:19:23Z

One thing before I dive too deeply into tweaking the request flow - this doesn't happen to cover HEAD request elimination or the metadata size guess stuff, right?

(On the latter point: for every file in the Overture Maps dataset the guess undershoots by about 320kB - well, it's either that or there's something immediately preceding the FileMetaData region that's always read.)
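
For context on the size guess, a Parquet file ends with the serialized FileMetaData, then a 4-byte little-endian metadata length, then the magic bytes `PAR1`. A reader that speculatively fetches the last N bytes can therefore tell from those bytes alone whether the guess was large enough. The sketch below shows that check; `metadata_shortfall` is a hypothetical helper, it assumes an unencrypted footer, and it is not a description of how geoarrow-rs or its dependencies issue requests.

```python
import struct

PARQUET_MAGIC = b"PAR1"
FOOTER_TAIL = 8  # 4-byte little-endian metadata length + 4-byte magic


def metadata_shortfall(suffix: bytes) -> int:
    """Return how many more bytes before `suffix` are needed to cover FileMetaData.

    `suffix` is the tail of the file from a speculative read; 0 means the guess sufficed.
    """
    if len(suffix) < FOOTER_TAIL or suffix[-4:] != PARQUET_MAGIC:
        raise ValueError("not the tail of a Parquet file")
    (metadata_len,) = struct.unpack("<I", suffix[-8:-4])
    needed = metadata_len + FOOTER_TAIL
    return max(0, needed - len(suffix))
```

A non-zero result means a second request is required, which is the cost an undersized guess incurs on every file.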

kylebarron · 2024-07-29T03:22:25Z

> this doesn't happen to cover HEAD request elimination or the metadata size guess stuff, right?

No it doesn't

kylebarron · 2024-07-29T03:23:14Z

> something immediately preceding the FileMetaData region that's always read

That might be the PageIndex

If it's ok, I'd be stoked to get a v0.3.0 release of **geoarrow** - some of my **stac-geoparquet** work is getting close to being release-able. Here's a checklist of things that have been mentioned as part of a v0.3 (including #628 (comment) and https://github.com/geoarrow/geoarrow-rs/milestone/3):

- #660 is done ✅
- Some (but not all) of the doc updates are done in #696, and I've got a tracking issue for the rest in #689
- "Broader support for 3d geometries" isn't done as far as I know, but I haven't really been touching that at all yet
- #539 is a Python thing, not a Rust crate thing

As a part of this release PR I've updated our deps when possible (`sqlx` will require code changes to support an update, so I haven't done that one).

kylebarron and others added 11 commits July 1, 2024 23:28

Parquet record batch reader

b41fc33

progress

f146e8b

Progress

051ac88

Implement parsing of record batch

e9d6210

Flesh out parsing

e79dbf5

compiles

b9d04b6

respect coord type in options

d0b79fe

Only pass down coord_type

ee92388

Re-expose ParquetFile to Python

3cd727c

Update JS sync binding

854cd37

Update Python ParquetDataset

2b0fd98

kylebarron changed the title ~~Parquet record batch reader~~ GeoParquet reader revamp Jul 4, 2024

kylebarron added 4 commits July 22, 2024 11:53

Merge branch 'main' into kyle/parquet-record-batch-reader

a2c1502

improved parquet docs

e0058c0

Update for 2d generic

6506831

progress on dataset api

99d5d32

Bonkles mentioned this pull request Jul 25, 2024

Manifest driven downloads OvertureMaps/io-site#117

Merged

2 tasks

kylebarron mentioned this pull request Aug 10, 2024

Release 0.3 #628

Closed

kylebarron added 9 commits August 11, 2024 10:46

Merge branch 'main' into kyle/parquet-record-batch-reader

ab631e4

lint

f4b0af8

docstrings

b57eef4

Update parquet doc

217f8ae

Update Python dataset bindings

182701b

Update js bindings

4c260e7

comment

43108d8

Merge branch 'main' into kyle/parquet-record-batch-reader

d031731

conditional compilation

f0098c4

kylebarron enabled auto-merge (squash) August 11, 2024 16:35

kylebarron merged commit 2a7f150 into main Aug 11, 2024
22 checks passed

kylebarron deleted the kyle/parquet-record-batch-reader branch August 11, 2024 16:43

gadomski mentioned this pull request Aug 23, 2024

Release geoarrow v0.3.0-beta.1 #705

Merged

kylebarron mentioned this pull request Aug 26, 2024

ParquetDataset from consolidated metadata #655

Closed
