Commit 633419a

[SPARK-51989][PYTHON] Add missing Filter subclasses to __all__ list in datasource
### What changes were proposed in this pull request?

This PR adds the missing Filter subclasses to the __all__ list in pyspark.sql.datasource.

### Why are the changes needed?

To improve Python data source filter pushdown.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

### Was this patch authored or co-authored using generative AI tooling?

Closes #50782 from allisonwang-db/spark-51989-missing-filter.

Authored-by: Allison Wang <[email protected]>
Signed-off-by: Allison Wang <[email protected]>
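As background, __all__ controls which names a wildcard import exposes, which is why the missing entries mattered. The sketch below (standard library only, using a hypothetical stand-in module rather than the real pyspark.sql.datasource) shows that a class omitted from __all__ is invisible to `import *`:

```python
# Minimal sketch of why __all__ matters for these filter classes:
# names omitted from __all__ are skipped by wildcard imports, so
# "from pyspark.sql.datasource import *" would previously miss
# subclasses such as GreaterThan. "fake_datasource" is a stand-in.
import sys
import types

mod = types.ModuleType("fake_datasource")
exec(
    "class EqualTo: pass\n"
    "class GreaterThan: pass\n"
    "__all__ = ['EqualTo']  # GreaterThan accidentally omitted\n",
    mod.__dict__,
)
sys.modules["fake_datasource"] = mod  # register so import machinery finds it

ns: dict = {}
exec("from fake_datasource import *", ns)
print("EqualTo" in ns)      # True: listed in __all__
print("GreaterThan" in ns)  # False: omitted, hidden from wildcard import
```

Explicit imports (`from pyspark.sql.datasource import GreaterThan`) worked even before this fix; only star imports and tools that consult __all__ were affected.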
1 parent 30cf5b4 commit 633419a

File tree

1 file changed: +13 −1 lines


python/pyspark/sql/datasource.py

Lines changed: 13 additions & 1 deletion
@@ -53,6 +53,18 @@
     "WriterCommitMessage",
     "Filter",
     "EqualTo",
+    "EqualNullSafe",
+    "GreaterThan",
+    "GreaterThanOrEqual",
+    "LessThan",
+    "LessThanOrEqual",
+    "In",
+    "IsNull",
+    "IsNotNull",
+    "Not",
+    "StringStartsWith",
+    "StringEndsWith",
+    "StringContains",
 ]

@@ -966,7 +978,7 @@ def abort(self, messages: List[Optional["WriterCommitMessage"]]) -> None:

 class DataSourceArrowWriter(DataSourceWriter):
     """
-    A base class for data source writers that process data using PyArrows `RecordBatch`.
+    A base class for data source writers that process data using PyArrow's `RecordBatch`.

     Unlike :class:`DataSourceWriter`, which works with an iterator of Spark Rows, this class
     is optimized for using the Arrow format when writing data. It can offer better performance
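To illustrate the filter-pushdown handshake these exported classes participate in, here is a hedged, self-contained sketch. The dataclasses and the `TinyReader.push_filters` method below are simplified stand-ins, not the real pyspark definitions; the sketch assumes the common pushdown contract in which a reader returns the filters it cannot handle so the engine can re-apply them:

```python
# Sketch of a filter-pushdown negotiation, under the assumption that a
# reader accepts a list of filter objects and returns the unsupported
# ones. EqualTo/IsNotNull here are illustrative stand-ins for the
# classes exported from pyspark.sql.datasource.
from dataclasses import dataclass
from typing import Any, Iterable, List, Tuple


@dataclass(frozen=True)
class EqualTo:
    attribute: Tuple[str, ...]
    value: Any


@dataclass(frozen=True)
class IsNotNull:
    attribute: Tuple[str, ...]


class TinyReader:
    """Hypothetical reader that can only push down EqualTo on 'id'."""

    def __init__(self) -> None:
        self.pushed: List[Any] = []

    def push_filters(self, filters: Iterable[Any]) -> List[Any]:
        unsupported = []
        for f in filters:
            if isinstance(f, EqualTo) and f.attribute == ("id",):
                self.pushed.append(f)  # evaluated at the source
            else:
                unsupported.append(f)  # engine re-applies these
        return unsupported


reader = TinyReader()
leftover = reader.push_filters([EqualTo(("id",), 1), IsNotNull(("name",))])
print(len(reader.pushed))  # 1: the EqualTo on 'id' was pushed down
print(len(leftover))       # 1: IsNotNull is left for the engine
```

Exporting the concrete subclasses matters for this pattern because user code matches on them (e.g. `isinstance(f, EqualTo)`), so they need to be first-class, discoverable names in the module's public API.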
