feat: Implement partial "lazy" support for DuckDB (even with this PR, DuckDB support is work-in-progress!) #1725

Merged
merged 116 commits into from
Jan 6, 2025
Changes from 108 commits
fc950bd
duckdb with_columns
MarcoGorelli Dec 24, 2024
d06a45a
polars 1.18 compat
MarcoGorelli Dec 24, 2024
9190c5c
wip
MarcoGorelli Dec 24, 2024
21a3a9f
more wip
MarcoGorelli Dec 24, 2024
da43745
wip
MarcoGorelli Dec 24, 2024
f9e27ce
wip
MarcoGorelli Dec 24, 2024
9699671
simplify
MarcoGorelli Dec 24, 2024
2130494
simplify
MarcoGorelli Dec 24, 2024
a20236c
fix
MarcoGorelli Dec 24, 2024
f515e66
Merge remote-tracking branch 'upstream-token/main' into duckdb-relati…
MarcoGorelli Dec 24, 2024
d33c82a
implement sort
MarcoGorelli Dec 25, 2024
b781550
getting there!
MarcoGorelli Dec 25, 2024
8fb44dc
wip
MarcoGorelli Dec 25, 2024
3d27066
wip
MarcoGorelli Dec 26, 2024
105982a
wip
MarcoGorelli Dec 26, 2024
92593c1
wip
MarcoGorelli Dec 26, 2024
4be7ae5
simplify
MarcoGorelli Dec 26, 2024
9dab9a5
with renaming
MarcoGorelli Dec 26, 2024
3ecaf9b
groupby tests passing
MarcoGorelli Dec 26, 2024
7f8c82d
wip
MarcoGorelli Dec 26, 2024
2a6fa99
hey we can do all of q1!
MarcoGorelli Dec 27, 2024
ac5e827
inner join
MarcoGorelli Dec 27, 2024
dc1392e
wip
MarcoGorelli Dec 27, 2024
763583e
wip
MarcoGorelli Dec 27, 2024
f0229f9
wip
MarcoGorelli Dec 28, 2024
62743b4
max horizontal and min horizontal working!!!
MarcoGorelli Dec 28, 2024
7e62298
got clip too!
MarcoGorelli Dec 28, 2024
b27a60a
str.startswith
MarcoGorelli Dec 28, 2024
9570e1b
is_between
MarcoGorelli Dec 28, 2024
5e87db1
add unique
MarcoGorelli Dec 28, 2024
0a8f890
refactor
MarcoGorelli Dec 28, 2024
f4b7c9f
lint
MarcoGorelli Dec 28, 2024
f064bf7
fixup
MarcoGorelli Dec 28, 2024
3c2e409
wip join
MarcoGorelli Dec 28, 2024
34af8f4
wip
MarcoGorelli Dec 28, 2024
876c247
lets do this
MarcoGorelli Dec 28, 2024
88f851e
add slice for duckdb
raisadz Dec 28, 2024
83cdfcf
simplify slice
raisadz Dec 28, 2024
f811661
Merge pull request #2 from raisadz/duckdb-relational
MarcoGorelli Dec 28, 2024
f63e3b7
Merge branch 'duckdb-relational' of github.com:MarcoGorelli/narwhals …
MarcoGorelli Dec 28, 2024
fbd3a10
Merge remote-tracking branch 'upstream/main' into duckdb-relational
MarcoGorelli Dec 28, 2024
1955f6b
concat
MarcoGorelli Dec 28, 2024
4b9ae7b
concat
MarcoGorelli Dec 28, 2024
c4e61c0
q7 runs!
MarcoGorelli Dec 28, 2024
f261b64
q9 runs
MarcoGorelli Dec 28, 2024
0c0df97
contains test
MarcoGorelli Dec 28, 2024
4c23549
add round
MarcoGorelli Dec 28, 2024
7d149d8
invert
MarcoGorelli Dec 28, 2024
1e285fc
unique
MarcoGorelli Dec 28, 2024
6b6c3ef
expressify is_between
MarcoGorelli Dec 28, 2024
e4f2dfc
Merge remote-tracking branch 'upstream/main' into duckdb-relational
MarcoGorelli Dec 29, 2024
fbb4ce2
feat: add `to_lowercase` and `to_uppercase` to duckdb
raisadz Dec 29, 2024
a8d303d
Merge pull request #3 from raisadz/duckdb-relational
MarcoGorelli Dec 29, 2024
cf8bd98
add `strip_chars` to duckdb
raisadz Dec 29, 2024
2139ad0
Merge remote-tracking branch 'marcogorelli/duckdb-relational' into du…
raisadz Dec 29, 2024
b1b9230
add replace_all
raisadz Dec 29, 2024
e3bca42
strip all white spaces
raisadz Dec 29, 2024
b122af7
Merge pull request #4 from raisadz/duckdb-relational-strip-chars
MarcoGorelli Dec 29, 2024
42ce7b2
Merge branch 'duckdb-relational' of github.com:MarcoGorelli/narwhals …
MarcoGorelli Dec 29, 2024
81a9241
Merge remote-tracking branch 'upstream/main' into duckdb-relational
MarcoGorelli Dec 29, 2024
07a5232
add notimplemented for `replace` and literal is only True for `replac…
raisadz Dec 29, 2024
bd262d0
Merge remote-tracking branch 'marcogorelli/duckdb-relational' into du…
raisadz Dec 29, 2024
fc5303f
som exfails
MarcoGorelli Dec 29, 2024
364ae0d
raise notimplemetederror instead of typeerror
raisadz Dec 29, 2024
a34f936
unique
MarcoGorelli Dec 29, 2024
2abf875
wip
MarcoGorelli Dec 29, 2024
ca1c643
Merge pull request #5 from raisadz/duckdb-relational-replace-all
MarcoGorelli Dec 30, 2024
899eb89
yay xfail less
MarcoGorelli Dec 30, 2024
d97270d
Merge remote-tracking branch 'upstream/main' into duckdb-relational
MarcoGorelli Dec 30, 2024
e80c568
duckdb unique
MarcoGorelli Dec 31, 2024
c45767c
sort out unique
MarcoGorelli Dec 31, 2024
00a74dd
Merge remote-tracking branch 'upstream/main' into duckdb-relational
MarcoGorelli Jan 4, 2025
bbace71
fixup duckdb_test
MarcoGorelli Jan 4, 2025
6e9c598
datetime attributes
MarcoGorelli Jan 4, 2025
0af5293
var std working
MarcoGorelli Jan 4, 2025
16b2627
Merge remote-tracking branch 'upstream/main' into duckdb-relational
MarcoGorelli Jan 4, 2025
9994ed1
wip
MarcoGorelli Jan 4, 2025
431ab4f
wip
MarcoGorelli Jan 4, 2025
2a6a9f5
Merge remote-tracking branch 'upstream/main' into duckdb-relational
MarcoGorelli Jan 5, 2025
0e30dd8
Merge remote-tracking branch 'upstream/main' into duckdb-relational
MarcoGorelli Jan 5, 2025
c48aa78
wip
MarcoGorelli Jan 5, 2025
4aab7ca
fixup scalars
MarcoGorelli Jan 5, 2025
1bdc16a
keep going
MarcoGorelli Jan 5, 2025
ef61772
reductions test
MarcoGorelli Jan 5, 2025
be617a3
expressified clip
MarcoGorelli Jan 5, 2025
69d0255
go further
MarcoGorelli Jan 5, 2025
30df0b1
all expr tests passing
MarcoGorelli Jan 5, 2025
0a8fcc0
more tests
MarcoGorelli Jan 5, 2025
9bb16c6
more
MarcoGorelli Jan 5, 2025
88d228a
get all tests green :broccoli:
MarcoGorelli Jan 5, 2025
2c73c9a
get all tests green :broccoli:
MarcoGorelli Jan 5, 2025
feca043
document
MarcoGorelli Jan 5, 2025
5ca717a
update docs
MarcoGorelli Jan 5, 2025
60c1897
fixup conftest
MarcoGorelli Jan 5, 2025
b57c7f8
fixup conftest
MarcoGorelli Jan 5, 2025
d1fad9f
importorskip
MarcoGorelli Jan 5, 2025
a25f07d
fix docs
MarcoGorelli Jan 5, 2025
9c8fcfd
fixup test
MarcoGorelli Jan 5, 2025
fbccfc8
docs
MarcoGorelli Jan 5, 2025
f36dece
remove maintain_order for duckdb
MarcoGorelli Jan 5, 2025
eba33b7
fixup docs and tpch
MarcoGorelli Jan 5, 2025
e749a98
unique test
MarcoGorelli Jan 5, 2025
89e58d5
coverage
MarcoGorelli Jan 5, 2025
bf373af
coverage
MarcoGorelli Jan 5, 2025
adb3db8
coverage
MarcoGorelli Jan 5, 2025
a2578d8
dask
MarcoGorelli Jan 5, 2025
a8cfa91
cov
MarcoGorelli Jan 5, 2025
8b6ef7c
Merge remote-tracking branch 'upstream/main' into duckdb-relational
MarcoGorelli Jan 5, 2025
5c70cc6
Merge remote-tracking branch 'upstream/main' into duckdb-relational
MarcoGorelli Jan 5, 2025
2c28b6f
reduce diff
MarcoGorelli Jan 5, 2025
f3733d2
Merge remote-tracking branch 'upstream/main' into duckdb-relational
MarcoGorelli Jan 5, 2025
32016e4
simplify
MarcoGorelli Jan 5, 2025
d8c8919
catch missing pyarrow in collect
MarcoGorelli Jan 5, 2025
60979a4
simplify
MarcoGorelli Jan 5, 2025
5342308
Merge remote-tracking branch 'upstream/main' into duckdb-relational
MarcoGorelli Jan 6, 2025
d8247ac
fix returns_scalar in abs
MarcoGorelli Jan 6, 2025
3 changes: 1 addition & 2 deletions README.md
@@ -14,8 +14,7 @@
 Extremely lightweight and extensible compatibility layer between dataframe libraries!

 - **Full API support**: cuDF, Modin, pandas, Polars, PyArrow
-- **Lazy-only support**: Dask
-- **Interchange-level support**: DuckDB, Ibis, Vaex, anything which implements the DataFrame Interchange Protocol
+- **Lazy-only support**: Dask. Work in progress: DuckDB, Ibis, PySpark.

 Seamlessly support all, without depending on any!

4 changes: 4 additions & 0 deletions docs/backcompat.md
@@ -111,6 +111,10 @@ before making any change.

 ### After `stable.v1`

+
+- Since Narwhals 1.21, passing a `DuckDBPyRelation` to `from_native` returns a `LazyFrame`. In
+  `narwhals.stable.v1`, it returns a `DataFrame` with `level='interchange'`.
+
 - Since Narwhals 1.15, `Series` is generic in the native Series, meaning that you can
   write:
   ```python
16 changes: 11 additions & 5 deletions docs/basics/dataframe_conversion.md
@@ -14,6 +14,7 @@ To illustrate, we create dataframes in various formats:
 ```python exec="1" source="above" session="conversion"
 import narwhals as nw
 from narwhals.typing import IntoDataFrame
+from typing import Any

 import duckdb
 import polars as pl
@@ -45,11 +46,15 @@ print(df_to_pandas(df_polars))

 ### Via PyCapsule Interface

-Similarly, if your library uses Polars internally, you can convert any user-supplied dataframe to Polars format using Narwhals.
+Similarly, if your library uses Polars internally, you can convert any user-supplied dataframe
+which implements `__arrow_c_stream__`:

 ```python exec="1" source="above" session="conversion" result="python"
-def df_to_polars(df: IntoDataFrame) -> pl.DataFrame:
-    return nw.from_arrow(nw.from_native(df), native_namespace=pl).to_native()
+def df_to_polars(df_native: Any) -> pl.DataFrame:
+    if hasattr(df_native, "__arrow_c_stream__"):
+        return nw.from_arrow(df_native, native_namespace=pl).to_native()
+    msg = f"Expected object which implements '__arrow_c_stream__' got: {type(df_native)}"
+    raise TypeError(msg)


 print(df_to_polars(df_duckdb))  # You can only execute this line of code once.
@@ -66,8 +71,9 @@ If you need to ingest the same dataframe multiple times, then you may want to go
 This may be less efficient than the PyCapsule approach above (and always requires PyArrow!), but is more forgiving:

 ```python exec="1" source="above" session="conversion" result="python"
-def df_to_polars(df: IntoDataFrame) -> pl.DataFrame:
-    return pl.DataFrame(nw.from_native(df).to_arrow())
+def df_to_polars(df_native: IntoDataFrame) -> pl.DataFrame:
+    df = nw.from_native(df_native).lazy().collect()
+    return pl.DataFrame(nw.from_native(df, eager_only=True).to_arrow())


 df_duckdb = duckdb.sql("SELECT * FROM df_polars")
7 changes: 3 additions & 4 deletions docs/extending.md
@@ -15,17 +15,16 @@ Currently, Narwhals has **full API** support for the following libraries:
 It also has **lazy-only** support for [Dask](https://github.com/dask/dask), and **interchange** support
 for [DuckDB](https://github.com/duckdb/duckdb) and [Ibis](https://github.com/ibis-project/ibis).

+We are working towards full "lazy-only" support for DuckDB, Ibis, and PySpark.
+
 ### Levels of support

 Narwhals comes with three levels of support:

 - **Full API support**: cuDF, Modin, pandas, Polars, PyArrow
-- **Lazy-only support**: Dask
+- **Lazy-only support**: Dask. Work in progress: DuckDB, Ibis, PySpark.
 - **Interchange-level support**: DuckDB, Ibis, Vaex, anything which implements the DataFrame Interchange Protocol

-The lazy-only layer is a major item on our 2025 roadmap, and hope to be able to bring libraries currently in
-the "interchange" level into that one.
-
 Libraries for which we have full support can benefit from the whole
 [Narwhals API](./api-reference/index.md).

4 changes: 4 additions & 0 deletions narwhals/_arrow/dataframe.py
@@ -16,6 +16,7 @@
 from narwhals._arrow.utils import validate_dataframe_comparand
 from narwhals._expression_parsing import evaluate_into_exprs
 from narwhals.dependencies import is_numpy_array
+from narwhals.exceptions import ColumnNotFoundError
 from narwhals.utils import Implementation
 from narwhals.utils import flatten
 from narwhals.utils import generate_temporary_column_name
@@ -669,6 +670,9 @@ def unique(
         import pyarrow.compute as pc

         df = self._native_frame
+        if subset is not None and any(x not in self.columns for x in subset):
+            msg = f"Column(s) {subset} not found in {self.columns}"
+            raise ColumnNotFoundError(msg)
Comment on lines +673 to +675
Collaborator

Should we create a `check_columns_exist` function in `narwhals.utils` so we can reuse it everywhere else? :)

Member Author

yeah sure!

         subset = subset or self.columns

         if keep in {"any", "first", "last"}:
4 changes: 4 additions & 0 deletions narwhals/_dask/dataframe.py
@@ -11,6 +11,7 @@
 from narwhals._dask.utils import parse_exprs_and_named_exprs
 from narwhals._pandas_like.utils import native_to_narwhals_dtype
 from narwhals._pandas_like.utils import select_columns_by_name
+from narwhals.exceptions import ColumnNotFoundError
 from narwhals.typing import CompliantLazyFrame
 from narwhals.utils import Implementation
 from narwhals.utils import flatten
@@ -197,6 +198,9 @@ def unique(
         *,
         keep: Literal["any", "none"] = "any",
     ) -> Self:
+        if subset is not None and any(x not in self.columns for x in subset):
+            msg = f"Column(s) {subset} not found in {self.columns}"
+            raise ColumnNotFoundError(msg)
         native_frame = self._native_frame
         if keep == "none":
             subset = subset or self.columns
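
The same three-line column check is added to both the PyArrow and Dask `unique` methods, which is what prompted the review suggestion of a shared `check_columns_exist` helper. A minimal sketch of what such a helper might look like follows. The signature is an assumption, not code from this PR, and `ColumnNotFoundError` is redefined locally as a stand-in for `narwhals.exceptions.ColumnNotFoundError` so the snippet runs on its own. Unlike the inlined check, this sketch reports only the missing names rather than the whole `subset`.

```python
from __future__ import annotations

from typing import Sequence


class ColumnNotFoundError(Exception):
    """Local stand-in for `narwhals.exceptions.ColumnNotFoundError` (assumption)."""


def check_columns_exist(subset: Sequence[str] | None, available: Sequence[str]) -> None:
    """Raise `ColumnNotFoundError` if any name in `subset` is absent from `available`.

    Hypothetical helper: the name comes from the review comment, the signature does not.
    """
    if subset is None:
        # No subset means "use all columns", so there is nothing to validate.
        return
    missing = [name for name in subset if name not in available]
    if missing:
        msg = f"Column(s) {missing} not found in {list(available)}"
        raise ColumnNotFoundError(msg)
```

With a helper along these lines, each backend's `unique` could replace its inlined check with a single call such as `check_columns_exist(subset, self.columns)`.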