Added support for Polars DataFrame and LazyFrame (#1614)
Polars (https://pola.rs) is an open-source library for data
manipulation, known for being one of the fastest data processing
solutions on a single machine. It features a well-structured, typed API
that is both expressive and easy to use.
This change is a simple 'to_polars' addition to the table API.
iceberg_table = catalog.load_table('data.data_points')
pdf = iceberg_table.scan().to_polars()
print(pdf)
---------
Co-authored-by: yigal.rozenberg <[email protected]>
Co-authored-by: Kevin Liu <[email protected]>
Co-authored-by: Kevin Liu <[email protected]>
mkdocs/docs/api.md: 136 additions & 0 deletions
@@ -1546,3 +1546,139 @@ df.show(2)
```
(Showing first 2 rows)
```

### Polars

PyIceberg interfaces closely with the Polars DataFrame and LazyFrame APIs; the latter provides a fully lazy, optimized query engine interface on top of PyIceberg tables.

<!-- prettier-ignore-start -->

!!! note "Requirements"
    This requires [`polars` to be installed](index.md).

    ```shell
    pip install "pyiceberg[polars]"
    ```

<!-- prettier-ignore-end -->

PyIceberg data can be accessed and analyzed in Polars as either a DataFrame or a LazyFrame.
If your code uses the Apache Iceberg scan API to select and retrieve data and then analyzes the resulting DataFrame in Polars, use the `table.scan().to_polars()` API.
If you instead want to rely on Polars' high-performance filtering and retrieval, use the LazyFrame exported from the Iceberg table with the `table.to_polars()` API.

```python
# Get a LazyFrame; the query runs only when it is collected
lazy_df = iceberg_table.to_polars()

# Get a DataFrame; the scan runs and the result is materialized in memory
df = iceberg_table.scan().to_polars()
```

#### Working with Polars DataFrame

PyIceberg makes it easy to filter data from a huge table and pull the result into a Polars DataFrame locally. Only the Parquet files relevant to the query are fetched, and the filter is applied to them. This reduces I/O, which improves performance and lowers cost.
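
One way to picture why only some files are fetched: Iceberg metadata records per-file column bounds, so a file whose value range cannot satisfy the filter can be skipped without being opened. The toy model below illustrates that idea only. It is not PyIceberg's planner, and `DataFile`, `files_to_read`, and the file names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFile:
    """Hypothetical summary of one Parquet file: path plus min/max of a column."""

    path: str
    min_value: int
    max_value: int


def files_to_read(files: list[DataFile], lower_bound: int) -> list[str]:
    """Keep only files that could contain rows with value >= lower_bound."""
    # A file whose maximum is below the bound cannot match, so it is skipped.
    return [f.path for f in files if f.max_value >= lower_bound]


files = [
    DataFile("part-0.parquet", min_value=0, max_value=99),
    DataFile("part-1.parquet", min_value=100, max_value=199),
    DataFile("part-2.parquet", min_value=200, max_value=299),
]

selected = files_to_read(files, lower_bound=150)
print(selected)  # ['part-1.parquet', 'part-2.parquet']
```

The filter still has to be applied to the rows of the selected files, since `part-1.parquet` also holds values below 150.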