feat: Add flatten array function #562
Hello @andygrove, do you mind giving me a hand with this PR? I exposed
|
Hi @mobley-trent

# build and install package
maturin develop

Also, don't forget to activate the venv before running this command. |
Hey @ongchi I tested the

from datafusion import SessionContext, column
from datafusion import functions as f
import numpy as np
import pyarrow as pa

def py_flatten(arr):
    # Testing helper function
    result = []
    for elem in arr:
        if isinstance(elem, list):
            result.extend(py_flatten(elem))
        else:
            result.append(elem)
    return result

ctx = SessionContext()
data = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]
batch = pa.RecordBatch.from_arrays(
    [np.array(data, dtype=object)], names=["arr"]
)
df = ctx.create_dataframe([[batch]])
col = column("arr")
stmt = f.flatten(col)
py_expr = lambda: [py_flatten(data)]
result = df.select(stmt).collect()[0].column(0).tolist()
print(f"flatten query: {result}")
print(f"py_expr: {py_expr()}")

Results:
I expected the flatten query to be identical to the |
Using a regular

ctx = SessionContext()
ctx.sql("select flatten([[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]);")

Result:
|
Hi @mobley-trent
It consists of multiple rows of one-dimensional array values. For the |
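To illustrate the point about rows, here is a minimal pure-Python sketch. It assumes that flatten is applied to each row's value independently; that assumption is a simplified model for this discussion, not DataFusion's actual implementation. `flatten_per_row` is a hypothetical helper introduced only for this example, and `py_flatten` is the testing helper from the snippet above.

```python
def py_flatten(arr):
    # Recursively flatten nested Python lists (testing helper from the thread).
    result = []
    for elem in arr:
        if isinstance(elem, list):
            result.extend(py_flatten(elem))
        else:
            result.append(elem)
    return result


def flatten_per_row(rows):
    # Hypothetical model: flatten each row's value on its own,
    # never merging values across rows.
    return [py_flatten(row) for row in rows]


data = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]

# Three rows, each already one-dimensional: there is no nesting inside a row.
three_rows = flatten_per_row(data)

# One row holding the whole nested list: the nesting lives inside that row.
one_row = flatten_per_row([data])

print(three_rows)
print(one_row)
```

Under this model, the three-row input comes back unchanged, while the single-row input collapses into one flat list. That would explain why a test built from three separate rows does not match `py_flatten(data)`, whereas the SQL literal, which is a single nested value, does.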
Thanks @mobley-trent
Fixed the merge conflicts |
Which issue does this PR close?
Refer to issue #463
Rationale for this change
What changes are included in this PR?
Are there any user-facing changes?