docs(python): Overview of available SQL functions #16268

Merged
merged 29 commits into from
Jun 1, 2024
Changes from 28 commits
d7c09c3
initial push to create SQL functions overview
r-brink May 16, 2024
980f18d
add tables to page, reformat functions on page
r-brink May 28, 2024
9b85f21
add sql operations overview page
r-brink May 28, 2024
f0c771a
rework page hierarchy
alexander-beedie May 28, 2024
6c86a9c
update index pages
alexander-beedie May 28, 2024
db90903
tabs/spaces
alexander-beedie May 28, 2024
0c9327a
apply grid layout to index pages, break out set operations
alexander-beedie May 28, 2024
86c37b2
add ref for `extract`
alexander-beedie May 28, 2024
c9fefdb
shuffle to set alphabetic order
r-brink May 29, 2024
ecf7979
add more extensive examples and output for sql clauses
r-brink May 29, 2024
a07c2f4
examples for aggregate
r-brink May 29, 2024
4773ded
adding array_upper and array_lower
r-brink May 29, 2024
0d9e01e
add conditional examples
r-brink May 29, 2024
e0d051e
add string and temporal
r-brink May 29, 2024
f00abef
math examples
r-brink May 29, 2024
126edab
set operations
r-brink May 29, 2024
9df0b1b
update union examples
alexander-beedie May 29, 2024
8572370
update a few other examples
alexander-beedie May 29, 2024
0643e2d
add example column aliases
alexander-beedie May 29, 2024
bb122cf
update string examples
alexander-beedie May 29, 2024
75a876a
update temporal examples
alexander-beedie May 29, 2024
2bcdaee
update clause examples
alexander-beedie May 29, 2024
62abc68
update string length example (contrast with octet_length)
alexander-beedie May 29, 2024
6db6aca
tweak example rendering (comment output)
alexander-beedie May 29, 2024
386550e
trigonometry examples
r-brink May 30, 2024
0c56e4c
fix small layout issue
r-brink May 30, 2024
77fca84
minor update
alexander-beedie May 30, 2024
208a10d
unnest table function
alexander-beedie May 31, 2024
e0e3009
de-dupe `unnest` label
alexander-beedie Jun 1, 2024
2 changes: 1 addition & 1 deletion py-polars/docs/source/reference/index.rst
@@ -91,7 +91,7 @@ methods. All classes and functions exposed in the ``polars.*`` namespace are pub
.. toctree::
:maxdepth: 2

- sql
+ sql/index

.. grid-item-card::

311 changes: 311 additions & 0 deletions py-polars/docs/source/reference/sql/clauses.rst
@@ -0,0 +1,311 @@
SQL Clauses
===========

.. list-table::
   :header-rows: 1
   :widths: 20 60

   * - Clause
     - Description
   * - :ref:`SELECT <select>`
     - Retrieve specific column data from one or more tables.
   * - :ref:`FROM <from>`
     - Specify the table(s) from which to retrieve or delete data.
   * - :ref:`JOIN <join>`
     - Combine rows from two or more tables based on a related column.
   * - :ref:`WHERE <where>`
     - Filter rows returned from the query based on specific condition(s).
   * - :ref:`GROUP BY <group_by>`
     - Aggregate row values based on one or more key columns.
   * - :ref:`HAVING <having>`
     - Filter groups in a `GROUP BY` based on specific condition(s).
   * - :ref:`ORDER BY <order_by>`
     - Sort the query result based on one or more specified columns.
   * - :ref:`LIMIT <limit>`
     - Specify the number of rows returned.
   * - :ref:`OFFSET <offset>`
     - Skip a specified number of rows.


.. _select:

SELECT
------
Select the columns to be returned by the query.

**Example:**

.. code-block:: python

    df = pl.DataFrame(
        {
            "a": [1, 2, 3],
            "b": ["zz", "yy", "xx"],
        }
    )
    df.sql("""
        SELECT a, b FROM self
    """)
    # shape: (3, 2)
    # ┌─────┬─────┐
    # │ a   ┆ b   │
    # │ --- ┆ --- │
    # │ i64 ┆ str │
    # ╞═════╪═════╡
    # │ 1   ┆ zz  │
    # │ 2   ┆ yy  │
    # │ 3   ┆ xx  │
    # └─────┴─────┘

.. _from:

FROM
----
Specifies the table(s) from which to retrieve or delete data.

**Example:**

.. code-block:: python

    df = pl.DataFrame(
        {
            "a": [1, 2, 3],
            "b": ["zz", "yy", "xx"],
        }
    )
    df.sql("""
        SELECT * FROM self
    """)
    # shape: (3, 2)
    # ┌─────┬─────┐
    # │ a   ┆ b   │
    # │ --- ┆ --- │
    # │ i64 ┆ str │
    # ╞═════╪═════╡
    # │ 1   ┆ zz  │
    # │ 2   ┆ yy  │
    # │ 3   ┆ xx  │
    # └─────┴─────┘

.. _join:

JOIN
----
Combines rows from two or more tables based on a related column.

**Join Types**

* `CROSS JOIN`
* `FULL JOIN`
* `INNER JOIN`
* `LEFT JOIN`
* `[LEFT] ANTI JOIN`
* `[LEFT] SEMI JOIN`
* `RIGHT ANTI JOIN`
* `RIGHT SEMI JOIN`

**Example:**

.. code-block:: python

    df1 = pl.DataFrame(
        {
            "foo": [1, 2, 3],
            "ham": ["a", "b", "c"],
        }
    )
    df2 = pl.DataFrame(
        {
            "apple": ["x", "y", "z"],
            "ham": ["a", "b", "d"],
        }
    )
    pl.sql("""
        SELECT foo, apple, COALESCE(df1.ham, df2.ham) AS ham
        FROM df1 FULL JOIN df2
        USING (ham)
    """).collect()

    # shape: (4, 3)
    # ┌──────┬───────┬─────┐
    # │ foo  ┆ apple ┆ ham │
    # │ ---  ┆ ---   ┆ --- │
    # │ i64  ┆ str   ┆ str │
    # ╞══════╪═══════╪═════╡
    # │ 1    ┆ x     ┆ a   │
    # │ 2    ┆ y     ┆ b   │
    # │ null ┆ z     ┆ d   │
    # │ 3    ┆ null  ┆ c   │
    # └──────┴───────┴─────┘

.. _where:

WHERE
-----

Filter rows returned from the query based on specific condition(s).

**Example:**

.. code-block:: python

    df = pl.DataFrame(
        {
            "foo": [30, 40, 50],
            "ham": ["a", "b", "c"],
        }
    )
    df.sql("""
        SELECT * FROM self WHERE foo > 42
    """)
    # shape: (1, 2)
    # ┌─────┬─────┐
    # │ foo ┆ ham │
    # │ --- ┆ --- │
    # │ i64 ┆ str │
    # ╞═════╪═════╡
    # │ 50  ┆ c   │
    # └─────┴─────┘

.. _group_by:

GROUP BY
--------
Group rows that have the same values in specified columns into summary rows.

**Example:**

.. code-block:: python

    df = pl.DataFrame(
        {
            "foo": ["a", "b", "b"],
            "bar": [10, 20, 30],
        }
    )
    df.sql("""
        SELECT foo, SUM(bar) FROM self GROUP BY foo
    """)
    # shape: (2, 2)
    # ┌─────┬─────┐
    # │ foo ┆ bar │
    # │ --- ┆ --- │
    # │ str ┆ i64 │
    # ╞═════╪═════╡
    # │ b   ┆ 50  │
    # │ a   ┆ 10  │
    # └─────┴─────┘

.. _having:

HAVING
------
Filter groups in a `GROUP BY` based on specific condition(s).

**Example:**

.. code-block:: python

    df = pl.DataFrame(
        {
            "foo": ["a", "b", "b", "c"],
            "bar": [10, 20, 30, 40],
        }
    )
    df.sql("""
        SELECT foo, SUM(bar) FROM self GROUP BY foo HAVING bar >= 40
    """)
    # shape: (2, 2)
    # ┌─────┬─────┐
    # │ foo ┆ bar │
    # │ --- ┆ --- │
    # │ str ┆ i64 │
    # ╞═════╪═════╡
    # │ c   ┆ 40  │
    # │ b   ┆ 50  │
    # └─────┴─────┘

.. _order_by:

ORDER BY
--------
Sort the query result based on one or more specified columns.

**Example:**

.. code-block:: python

    df = pl.DataFrame(
        {
            "foo": ["b", "a", "c", "b"],
            "bar": [20, 10, 40, 30],
        }
    )
    df.sql("""
        SELECT foo, bar FROM self ORDER BY bar DESC
    """)
    # shape: (4, 2)
    # ┌─────┬─────┐
    # │ foo ┆ bar │
    # │ --- ┆ --- │
    # │ str ┆ i64 │
    # ╞═════╪═════╡
    # │ c   ┆ 40  │
    # │ b   ┆ 30  │
    # │ b   ┆ 20  │
    # │ a   ┆ 10  │
    # └─────┴─────┘

.. _limit:

LIMIT
-----
Limit the number of rows returned by the query.

**Example:**

.. code-block:: python

    df = pl.DataFrame(
        {
            "foo": ["b", "a", "c", "b"],
            "bar": [20, 10, 40, 30],
        }
    )
    df.sql("""
        SELECT foo, bar FROM self LIMIT 2
    """)
    # shape: (2, 2)
    # ┌─────┬─────┐
    # │ foo ┆ bar │
    # │ --- ┆ --- │
    # │ str ┆ i64 │
    # ╞═════╪═════╡
    # │ b   ┆ 20  │
    # │ a   ┆ 10  │
    # └─────┴─────┘

.. _offset:

OFFSET
------
Skip a number of rows before starting to return rows from the query.

**Example:**

.. code-block:: python

    df = pl.DataFrame(
        {
            "foo": ["b", "a", "c", "b"],
            "bar": [20, 10, 40, 30],
        }
    )
    df.sql("""
        SELECT foo, bar FROM self LIMIT 2 OFFSET 2
    """)
    # shape: (2, 2)
    # ┌─────┬─────┐
    # │ foo ┆ bar │
    # │ --- ┆ --- │
    # │ str ┆ i64 │
    # ╞═════╪═════╡
    # │ c   ┆ 40  │
    # │ b   ┆ 30  │
    # └─────┴─────┘