Commit 497d490

Updating documentation
1 parent 6ce4cfe commit 497d490

3 files changed: 44 additions and 8 deletions

docs/source/user-guide/common-operations/expressions.rst

Lines changed: 37 additions & 0 deletions
@@ -60,6 +60,43 @@ examples for the and, or, and not operations.
     heavy_red_units = (col("color") == lit("red")) & (col("weight") > lit(42))
     not_red_units = ~(col("color") == lit("red"))
 
+Arrays
+------
+
+For columns that contain arrays of values, you can access individual elements of the array by index
+using bracket indexing. This is similar to calling the function
+:py:func:`datafusion.functions.array_element`, except that bracket indexing is 0-based, like
+Python lists, whereas ``array_element`` is 1-based for compatibility with other SQL
+approaches.
+
+.. ipython:: python
+
+    from datafusion import SessionContext, col
+
+    ctx = SessionContext()
+    df = ctx.from_pydict({"a": [[1, 2, 3], [4, 5, 6]]})
+    df.select(col("a")[0].alias("a0"))
+
+
+.. warning::
+
+    Indexing an element of an array via ``[]`` starts at index 0 whereas
+    :py:func:`~datafusion.functions.array_element` starts at index 1.
+
+Structs
+-------
+
+Fields of columns that contain structs can be accessed using bracket notation, as if the values
+were Python dictionaries. The key passed must be a string.
+
+.. ipython:: python
+
+    ctx = SessionContext()
+    data = {"a": [{"size": 15, "color": "green"}, {"size": 10, "color": "blue"}]}
+    df = ctx.from_pydict(data)
+    df.select(col("a")["size"].alias("a_size"))
+
+
 Functions
 ---------
 
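
Note on the indexing offset described in the new Arrays section: the sketch below models the two conventions in plain Python so the off-by-one relationship is easy to check. It does not call DataFusion; ``bracket_get`` and ``array_element_get`` are hypothetical stand-ins for ``col("a")[i]`` and ``array_element``, and only positions of 1 or greater are modelled.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


def bracket_get(values: Sequence[T], index: int) -> T:
    """Hypothetical model of col("a")[index]: 0-based, like a Python list."""
    return values[index]


def array_element_get(values: Sequence[T], position: int) -> T:
    """Hypothetical model of array_element: 1-based position."""
    if position < 1:
        raise ValueError("this sketch only models positions >= 1")
    # Shift the 1-based position down to a 0-based Python index.
    return values[position - 1]


rows = [[1, 2, 3], [4, 5, 6]]  # same data as the documentation example

# Both conventions pick the first element of every row.
first_by_bracket = [bracket_get(row, 0) for row in rows]
first_by_function = [array_element_get(row, 1) for row in rows]
print(first_by_bracket, first_by_function)
```

Under this model, bracket index ``i`` and ``array_element`` position ``i + 1`` always refer to the same element, which is the relationship the warning in the docs spells out.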

python/datafusion/dataframe.py

Lines changed: 6 additions & 8 deletions
@@ -548,17 +548,15 @@ def __arrow_c_stream__(self, requested_schema: pa.Schema) -> Any:
     def transform(self, func: Callable[..., DataFrame], *args: Any) -> DataFrame:
         """Apply a function to the current DataFrame which returns another DataFrame.
 
-        This is useful for chaining together multiple functions. For example
+        This is useful for chaining together multiple functions. For example::
 
-        ```python
-        def add_3(df: DataFrame) -> DataFrame:
-            return df.with_column("modified", lit(3))
+            def add_3(df: DataFrame) -> DataFrame:
+                return df.with_column("modified", lit(3))
 
-        def within_limit(df: DataFrame, limit: int) -> DataFrame:
-            return df.filter(col("a") < lit(limit)).distinct()
+            def within_limit(df: DataFrame, limit: int) -> DataFrame:
+                return df.filter(col("a") < lit(limit)).distinct()
 
-        df = df.transform(modify_df).transform(within_limit, 4)
-        ```
+            df = df.transform(add_3).transform(within_limit, 4)
 
         Args:
             func: A callable function that takes a DataFrame as its first argument
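
Note on the chaining pattern in this docstring: the sketch below shows how a ``transform(func, *args)`` method can forward the frame and extra arguments to each function. It is a minimal stand-in, not DataFusion's implementation; ``ToyFrame`` is a hypothetical class holding a list of row dictionaries, and the ``distinct()`` step from the docstring is left out.

```python
from typing import Any, Callable, Dict, List


class ToyFrame:
    """Hypothetical stand-in for a DataFrame: a list of row dicts."""

    def __init__(self, rows: List[Dict[str, Any]]) -> None:
        self.rows = rows

    def transform(self, func: Callable[..., "ToyFrame"], *args: Any) -> "ToyFrame":
        # Pass this frame as the first argument, then any extra arguments.
        return func(self, *args)


def add_3(frame: ToyFrame) -> ToyFrame:
    # Add a constant column named "modified" to every row.
    return ToyFrame([{**row, "modified": 3} for row in frame.rows])


def within_limit(frame: ToyFrame, limit: int) -> ToyFrame:
    # Keep only rows whose "a" value is below the limit.
    return ToyFrame([row for row in frame.rows if row["a"] < limit])


result = ToyFrame([{"a": 1}, {"a": 5}]).transform(add_3).transform(within_limit, 4)
print(result.rows)
```

Because each function returns a new frame, the calls compose left to right, and extra positional arguments such as the limit ride along through ``*args``.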

python/datafusion/tests/test_dataframe.py

Lines changed: 1 addition & 0 deletions
@@ -876,6 +876,7 @@ def test_dataframe_export(df) -> None:
         failed_convert = True
     assert failed_convert
 
+
 def test_dataframe_transform(df):
     def add_string_col(df_internal) -> DataFrame:
         return df_internal.with_column("string_col", literal("string data"))
