Commit 08b7856

enhancement(docs): Add user guide (#432)
1 parent 1fde8e4 commit 08b7856

21 files changed: +988 −225 lines changed

docs/requirements.txt

Lines changed: 2 additions & 1 deletion

@@ -19,4 +19,5 @@ sphinx==5.3.0
 pydata-sphinx-theme==0.8.0
 myst-parser
 maturin
-jinja2
+jinja2
+ipython

docs/source/api.rst

Lines changed: 0 additions & 1 deletion

@@ -24,7 +24,6 @@ API Reference
 .. toctree::
    :maxdepth: 2

-   api/config
    api/dataframe
    api/execution_context
    api/expression

docs/source/conf.py

Lines changed: 1 addition & 0 deletions

@@ -52,6 +52,7 @@
     "sphinx.ext.viewcode",
     "sphinx.ext.napoleon",
     "myst_parser",
+    "IPython.sphinxext.ipython_directive",
 ]

 source_suffix = {
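For context, the `IPython.sphinxext.ipython_directive` extension registered in this change lets Sphinx execute Python snippets at build time and render their output inline. A minimal sketch of how the directive is used in an ``.rst`` page (the snippet body here is illustrative, not taken from this commit):

```rst
.. ipython:: python

   import datafusion
   ctx = datafusion.SessionContext()
   ctx
```

At build time, the statements run in a persistent IPython session and the evaluated result of the last expression is rendered below the code, which is why docs pages can end a block with a bare expression such as ``df``.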
Lines changed: 85 additions & 0 deletions

@@ -0,0 +1,85 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements. See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership. The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License. You may obtain a copy of the License at
+
+.. http://www.apache.org/licenses/LICENSE-2.0
+
+.. Unless required by applicable law or agreed to in writing,
+.. software distributed under the License is distributed on an
+.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+.. KIND, either express or implied. See the License for the
+.. specific language governing permissions and limitations
+.. under the License.
+
+Introduction
+============
+We welcome and encourage contributions of all kinds, such as:
+
+1. Tickets with issue reports or feature requests
+2. Documentation improvements
+3. Code, both PRs and (especially) PR reviews
+
+In addition to submitting new PRs, we have a healthy tradition of community members reviewing each other’s PRs.
+Doing so is a great way to help the community as well as to get more familiar with Rust and the relevant codebases.
+
+How to develop
+--------------
+
+This assumes that you have Rust and Cargo installed. We use the workflow recommended by `pyo3 <https://github.com/PyO3/pyo3>`_ and `maturin <https://github.com/PyO3/maturin>`_.
+
+Bootstrap:
+
+.. code-block:: shell
+
+   # fetch this repo
+   git clone [email protected]:apache/arrow-datafusion-python.git
+   # prepare development environment (used to build wheel / install in development)
+   python3 -m venv venv
+   # activate the venv
+   source venv/bin/activate
+   # update pip itself if necessary
+   python -m pip install -U pip
+   # install dependencies (for Python 3.8+)
+   python -m pip install -r requirements-310.txt
+
+The tests rely on test data in git submodules.
+
+.. code-block:: shell
+
+   git submodule init
+   git submodule update
+
+
+Whenever Rust code changes (your changes or via ``git pull``):
+
+.. code-block:: shell
+
+   # make sure you activate the venv using "source venv/bin/activate" first
+   maturin develop
+   python -m pytest
+
+
+Update Dependencies
+-------------------
+
+To change test dependencies, edit ``requirements.in`` and run:
+
+.. code-block:: shell
+
+   # install pip-tools (this can be done only once); also consider running in venv
+   python -m pip install pip-tools
+   python -m piptools compile --generate-hashes -o requirements-310.txt
+
+To update dependencies, run the compile command with ``-U``:
+
+.. code-block:: shell
+
+   python -m piptools compile -U --generate-hashes -o requirements-310.txt
+
+More details about pip-tools are available `here <https://github.com/jazzband/pip-tools>`_.
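For orientation on the pip-tools workflow described above: ``requirements.in`` lists only top-level, unpinned dependencies, and ``pip-tools`` compiles it into the fully pinned, hashed ``requirements-310.txt``. A hypothetical sketch of such an input file (the package names here are illustrative, not the project's actual file):

```text
# requirements.in -- illustrative sketch; the compiled, hashed pins
# land in requirements-310.txt via `piptools compile`
maturin
pytest
pyarrow
```

Because only the ``.in`` file is edited by hand, pin and hash updates stay reproducible and reviewable in a single generated file.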

docs/source/index.rst

Lines changed: 38 additions & 223 deletions

@@ -31,12 +31,17 @@ Its query engine, DataFusion, is written in `Rust <https://www.rust-lang.org>`_,
 
 Technically, zero-copy is achieved via the `c data interface <https://arrow.apache.org/docs/format/CDataInterface.html>`_.
 
-How to use it
-=============
+Install
+-------
 
-Simple usage:
+.. code-block:: shell
+
+   pip install datafusion
 
-.. code-block:: python
+Example
+-------
+
+.. ipython:: python
 
    import datafusion
    from datafusion import col
@@ -58,234 +63,44 @@ Simple usage:
        col("a") - col("b"),
    )
-   # execute and collect the first (and only) batch
-   result = df.collect()[0]
-
-   assert result.column(0) == pyarrow.array([5, 7, 9])
-   assert result.column(1) == pyarrow.array([-3, -3, -3])
-
-
-We can also execute a query against data stored in CSV
-
-.. code-block:: bash
-
-   echo "a,b\n1,4\n2,5\n3,6" > example.csv
-
-
-.. code-block:: python
-
-   import datafusion
-   from datafusion import col
-   import pyarrow
-
-   # create a context
-   ctx = datafusion.SessionContext()
-
-   # register a CSV
-   ctx.register_csv('example', 'example.csv')
-
-   # create a new statement
-   df = ctx.table('example').select(
-       col("a") + col("b"),
-       col("a") - col("b"),
-   )
-
-   # execute and collect the first (and only) batch
-   result = df.collect()[0]
-
-   assert result.column(0) == pyarrow.array([5, 7, 9])
-   assert result.column(1) == pyarrow.array([-3, -3, -3])
-
-
-And how to execute a query against a CSV using SQL:
-
-
-.. code-block:: python
-
-   import datafusion
-   from datafusion import col
-   import pyarrow
-
-   # create a context
-   ctx = datafusion.SessionContext()
-
-   # register a CSV
-   ctx.register_csv('example', 'example.csv')
-
-   # create a new statement via SQL
-   df = ctx.sql("SELECT a+b, a-b FROM example")
-
-   # execute and collect the first (and only) batch
-   result = df.collect()[0]
-
-   assert result.column(0) == pyarrow.array([5, 7, 9])
-   assert result.column(1) == pyarrow.array([-3, -3, -3])
-
-
-
-UDFs
-----
-
-.. code-block:: python
-
-   import pyarrow
-   from datafusion import udf
-
-   def is_null(array: pyarrow.Array) -> pyarrow.Array:
-       return array.is_null()
-
-   is_null_arr = udf(is_null, [pyarrow.int64()], pyarrow.bool_(), 'stable')
-
-   # create a context
-   ctx = datafusion.SessionContext()
-
-   # create a RecordBatch and a new DataFrame from it
-   batch = pyarrow.RecordBatch.from_arrays(
-       [pyarrow.array([1, 2, 3]), pyarrow.array([4, 5, 6])],
-       names=["a", "b"],
-   )
-   df = ctx.create_dataframe([[batch]])
-
-   df = df.select(is_null_arr(col("a")))
-
-   result = df.collect()[0]
-
-   assert result.column(0) == pyarrow.array([False] * 3)
-
+   df
 
-UDAF
-----
-
-.. code-block:: python
-
-   import pyarrow
-   import pyarrow.compute
-   import datafusion
-   from datafusion import udaf, Accumulator
-   from datafusion import col
-
-
-   class MyAccumulator(Accumulator):
-       """
-       Interface of a user-defined accumulation.
-       """
-       def __init__(self):
-           self._sum = pyarrow.scalar(0.0)
-
-       def update(self, values: pyarrow.Array) -> None:
-           # not nice since pyarrow scalars can't be summed yet. This breaks on `None`
-           self._sum = pyarrow.scalar(self._sum.as_py() + pyarrow.compute.sum(values).as_py())
-
-       def merge(self, states: pyarrow.Array) -> None:
-           # not nice since pyarrow scalars can't be summed yet. This breaks on `None`
-           self._sum = pyarrow.scalar(self._sum.as_py() + pyarrow.compute.sum(states).as_py())
-
-       def state(self) -> pyarrow.Array:
-           return pyarrow.array([self._sum.as_py()])
-
-       def evaluate(self) -> pyarrow.Scalar:
-           return self._sum
-
-   # create a context
-   ctx = datafusion.SessionContext()
-
-   # create a RecordBatch and a new DataFrame from it
-   batch = pyarrow.RecordBatch.from_arrays(
-       [pyarrow.array([1, 2, 3]), pyarrow.array([4, 5, 6])],
-       names=["a", "b"],
-   )
-   df = ctx.create_dataframe([[batch]])
-
-   my_udaf = udaf(MyAccumulator, pyarrow.float64(), pyarrow.float64(), [pyarrow.float64()], 'stable')
-
-   df = df.aggregate(
-       [],
-       [my_udaf(col("a"))]
-   )
-
-   result = df.collect()[0]
-
-   assert result.column(0) == pyarrow.array([6.0])
-
-How to install (from pip)
-=========================
-
-.. code-block:: shell
-
-   pip install datafusion
-
-You can verify the installation by running:
-
-.. code-block:: python
-
-   >>> import datafusion
-   >>> datafusion.__version__
-   '0.6.0'
-
-
-How to develop
-==============
-
-This assumes that you have rust and cargo installed. We use the workflow recommended by `pyo3 <https://github.com/PyO3/pyo3>`_ and `maturin <https://github.com/PyO3/maturin>`_.
-
-Bootstrap:
-
-.. code-block:: shell
-
-   # fetch this repo
-   git clone [email protected]:apache/arrow-datafusion-python.git
-   # prepare development environment (used to build wheel / install in development)
-   python3 -m venv venv
-   # activate the venv
-   source venv/bin/activate
-   # update pip itself if necessary
-   python -m pip install -U pip
-   # install dependencies (for Python 3.8+)
-   python -m pip install -r requirements-310.txt
-
-The tests rely on test data in git submodules.
-
-.. code-block:: shell
-
-   git submodule init
-   git submodule update
-
-
-Whenever rust code changes (your changes or via `git pull`):
-
-.. code-block:: shell
-
-   # make sure you activate the venv using "source venv/bin/activate" first
-   maturin develop
-   python -m pytest
-
-
-How to update dependencies
-==========================
-
-To change test dependencies, change the `requirements.in` and run
-
-.. code-block:: shell
-
-   # install pip-tools (this can be done only once), also consider running in venv
-   python -m pip install pip-tools
-   python -m piptools compile --generate-hashes -o requirements-310.txt
 
+.. _toc.links:
+.. toctree::
+   :hidden:
+   :maxdepth: 1
+   :caption: LINKS
 
-To update dependencies, run with `-U`
+   Github and Issue Tracker <https://github.com/apache/arrow-datafusion-python>
+   Rust's API Docs <https://docs.rs/datafusion/latest/datafusion/>
+   Code of conduct <https://github.com/apache/arrow-datafusion/blob/main/CODE_OF_CONDUCT.md>
 
-.. code-block:: shell
-
-   python -m piptools compile -U --generate-hashes -o requirements-310.txt
+.. _toc.guide:
+.. toctree::
+   :hidden:
+   :maxdepth: 1
+   :caption: USER GUIDE
 
+   user-guide/introduction
+   user-guide/basics
+   user-guide/common-operations/index
+   user-guide/io/index
+   user-guide/sql
 
-More details about pip-tools `here <https://github.com/jazzband/pip-tools>`_
 
+.. _toc.contributor_guide:
+.. toctree::
+   :hidden:
+   :maxdepth: 1
+   :caption: CONTRIBUTOR GUIDE
 
-API reference
-=============
+   contributor-guide/introduction
 
+.. _toc.api:
 .. toctree::
-   :maxdepth: 2
+   :hidden:
+   :maxdepth: 1
+   :caption: API
 
    api
