Add histogram statistic documentation
ZacBlanco committed Oct 4, 2024
1 parent 1987924 commit eae161e
Showing 3 changed files with 30 additions and 14 deletions.
12 changes: 12 additions & 0 deletions presto-docs/src/main/sphinx/admin/properties.rst
@@ -863,6 +863,18 @@ on a per-query basis using the ``treat-low-confidence-zero-estimation-as-unknown
Enable retry for failed queries that can potentially be helped by HBO. This can also be specified
on a per-query basis using the ``retry-query-with-history-based-optimization`` session property.

+``optimizer.use-histograms``
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+* **Type:** ``boolean``
+* **Default Value:** ``false``
+
+Enables the optimizer to use histograms when available to perform cost estimate calculations
+during query optimization. When set to ``false``, this parameter does not prevent histograms
+from being collected by ``ANALYZE``, but prevents them from being used during query
+optimization. This behavior can be controlled on a per-query basis using the
+``optimizer_use_histograms`` session property.
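
To illustrate why histograms can change cost estimates, the following Python sketch compares two estimates for a range predicate such as ``x < 10``. The first assumes values are spread uniformly between the low and high values. The second uses per-bucket row counts. This is a simplified illustration, not Presto's implementation: the ``ColumnStats`` class, the bucket layout, and both function names are hypothetical, and the histogram structure a connector actually provides may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ColumnStats:
    """Hypothetical container for the statistics used in this sketch."""
    low: float
    high: float
    # Assumed histogram layout: sorted bucket upper bounds and rows per bucket.
    bucket_upper_bounds: List[float]
    bucket_counts: List[int]


def uniform_selectivity(stats: ColumnStats, value: float) -> float:
    """Estimated fraction of rows with column < value, assuming a uniform spread."""
    if value <= stats.low:
        return 0.0
    if value >= stats.high:
        return 1.0
    return (value - stats.low) / (stats.high - stats.low)


def histogram_selectivity(stats: ColumnStats, value: float) -> float:
    """Same estimate from bucket counts, interpolating inside the partial bucket."""
    total = sum(stats.bucket_counts)
    if total == 0 or value <= stats.low:
        return 0.0
    if value >= stats.high:
        return 1.0
    covered = 0.0
    lower = stats.low
    for upper, count in zip(stats.bucket_upper_bounds, stats.bucket_counts):
        if value >= upper:
            # The whole bucket lies below the predicate value.
            covered += count
        else:
            # Only part of this bucket qualifies; assume uniformity within it.
            covered += count * (value - lower) / (upper - lower)
            break
        lower = upper
    return covered / total


if __name__ == "__main__":
    # Skewed sample data: most rows fall in the first bucket.
    skewed = ColumnStats(
        low=0.0,
        high=100.0,
        bucket_upper_bounds=[10.0, 50.0, 100.0],
        bucket_counts=[900, 50, 50],
    )
    print(uniform_selectivity(skewed, 10.0))
    print(histogram_selectivity(skewed, 10.0))
```

With this skewed sample data, the uniform assumption estimates that about 10% of rows satisfy ``x < 10``, while the bucket counts put the figure at about 90%. Differences of this kind are what the optimizer can pick up on when histogram use is enabled.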

Planner Properties
------------------

29 changes: 15 additions & 14 deletions presto-docs/src/main/sphinx/optimizer/statistics.rst
@@ -6,8 +6,9 @@ Presto supports statistics based optimizations for queries. For a query to take
advantage of these optimizations, Presto must have statistical information for
the tables in that query.

-Table statistics are provided to the query planner by connectors. Currently, the
-only connector that supports statistics is the :doc:`/connector/hive`.
+Table statistics are provided to the query planner by connectors. Implementing
+support for table statistics is optional. The decision is left to the authors
+of the connector.

Table Layouts
-------------
@@ -30,23 +31,23 @@ Available Statistics

The following statistics are available in Presto:

-* For a table:
+* For a table:

-* **row count**: the total number of rows in the table layout
+* **row count**: the total number of rows in the table layout

-* For each column in a table:
+* For each column in a table:

-* **data size**: the size of the data that needs to be read
-* **nulls fraction**: the fraction of null values
-* **distinct value count**: the number of distinct values
-* **low value**: the smallest value in the column
-* **high value**: the largest value in the column
+* **data size**: the size of the data that needs to be read
+* **nulls fraction**: the fraction of null values
+* **distinct value count**: the number of distinct values
+* **low value**: the smallest value in the column
+* **high value**: the largest value in the column
+* **histogram**: a connector-dependent histogram data structure

The set of statistics available for a particular query depends on the connector
being used and can also vary by table or even by table layout. For example, the
Hive connector does not currently provide statistics on data size.
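
Because any of these statistics may be missing for a given connector, table, or layout, code that consumes them generally has to treat each one as optional. The Python sketch below shows a common textbook estimate for an equality predicate, which returns no estimate when a required statistic is unknown. The ``ColumnStatistics`` class and the ``estimate_equality_rows`` function are hypothetical names used for illustration, and the formula is not taken from the Presto source.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ColumnStatistics:
    """Hypothetical per-column statistics; None means the connector gave no value."""
    nulls_fraction: Optional[float] = None
    distinct_value_count: Optional[float] = None


def estimate_equality_rows(
    row_count: Optional[float], column: ColumnStatistics
) -> Optional[float]:
    """Estimate rows matching `col = constant`, or None if statistics are missing.

    Assumes non-null rows are spread evenly across the distinct values.
    """
    if (
        row_count is None
        or column.nulls_fraction is None
        or not column.distinct_value_count  # covers both None and zero
    ):
        return None
    non_null_rows = row_count * (1.0 - column.nulls_fraction)
    return non_null_rows / column.distinct_value_count


if __name__ == "__main__":
    complete = ColumnStatistics(nulls_fraction=0.25, distinct_value_count=75)
    partial = ColumnStatistics(nulls_fraction=0.25)  # distinct count unknown
    print(estimate_equality_rows(1000, complete))
    print(estimate_equality_rows(1000, partial))
```

Returning ``None`` rather than a guessed number keeps an unknown statistic distinguishable from a genuine estimate of zero.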

-Table statistics can be displayed via the Presto SQL interface using the
-:doc:`/sql/show-stats` command. For the Hive connector, refer to the
-:ref:`Hive connector <hive_analyze>` documentation to learn how to update table
-statistics.
+Table statistics can be fetched using the :doc:`/sql/show-stats` query.
+For the Hive connector, refer to the :ref:`Hive connector <hive_analyze>`
+documentation to learn how to update table statistics.
3 changes: 3 additions & 0 deletions presto-docs/src/main/sphinx/sql/show-stats.rst
@@ -60,3 +60,6 @@ The following table lists the returned columns and what statistics they represen
- The highest value found in this column
- ``NULL`` in the table summary row. Available for columns of DATE, integer, floating-point, and fixed-precision
data types.
+* - ``histogram``
+- The histogram for this column
+- A summary of the underlying histogram is displayed in a human-readable format. ``NULL`` in the table summary row.
