Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SEDONA-637] Show spatial filters pushed to GeoParquet scans in the query plan; allow disabling spatial filter pushdown #1540

Merged

Conversation

Kontinuation
Copy link
Member

Did you read the Contributor Guide?

Is this PR related to a JIRA ticket?

What changes were proposed in this PR?

Spatial filters pushed down to the GeoParquet scan node are visible in the query plan. For example, the following query

df.where("ST_Intersects(geometry, ST_Point(1, 1))").explain()

Produces the following query plan

== Physical Plan ==
Filter (isnotnull(geometry#218) AND  **org.apache.spark.sql.sedona_sql.expressions.ST_Intersects**  )
+- FileScan geoparquet [id#217L,geometry#218,bbox#219] Batched: false, DataFilters: [isnotnull(geometry#218),  **org.apache.spark.sql.sedona_sql.expressions.ST_Intersects**  ], Format: GeoParquet with spatial filter [geometry INTERSECTS POINT (1 1)], Location: InMemoryFileIndex(1 paths).., PartitionFilters: [], PushedFilters: [IsNotNull(geometry)], ReadSchema: struct<id:bigint,geometry:binary,bbox:struct<xmin:double,ymin:double,xmax:double,ymax:double>>

The spatial filters pushed down to GeoParquet scan is shown in Format: GeoParquet with spatial filter [...].

Spatial filter push-down can be manually disabled by configuring the Spark configuration spark.sedona.geoparquet.spatialFilterPushDown to false:

spark.conf.set("spark.sedona.geoparquet.spatialFilterPushDown", "false")
df.where("ST_Intersects(geometry, ST_Point(1, 1))").explain()
== Physical Plan ==
Filter (isnotnull(geometry#218) AND  **org.apache.spark.sql.sedona_sql.expressions.ST_Intersects**  )
+- FileScan geoparquet [id#217L,geometry#218,bbox#219] Batched: false, DataFilters: [isnotnull(geometry#218),  **org.apache.spark.sql.sedona_sql.expressions.ST_Intersects**  ], Format: GeoParquet, Location: InMemoryFileIndex(1 paths).., PartitionFilters: [], PushedFilters: [IsNotNull(geometry)], ReadSchema: struct<id:bigint,geometry:binary,bbox:struct<xmin:double,ymin:double,xmax:double,ymax:double>>

How was this patch tested?

Pass newly added tests

Did this PR include necessary documentation updates?

  • Yes, I have updated the documentation.

@Kontinuation Kontinuation marked this pull request as ready for review August 5, 2024 14:13
@jiayuasu jiayuasu added this to the sedona-1.6.1 milestone Aug 5, 2024
@jiayuasu jiayuasu merged commit 68abdcd into apache:master Aug 5, 2024
51 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants