
Comparing changes

This is a direct comparison between two commits made in this repository or its related repositories.

base repository: crate/cratedb-guide
base: afc9a977bddfde74e9c77235b2df202ce919a7ff
..
head repository: crate/cratedb-guide
compare: b297e5f35de56b08087c891aafacf8211855d1c1
Showing with 5,241 additions and 158 deletions.
  1. +1 −0 .gitignore
  2. +1 −0 backlog.md
  3. +28 −0 docs/_include/card/timeseries-datashader.md
  4. +27 −0 docs/_include/card/timeseries-explore.md
  5. +24 −0 docs/_include/links.md
  6. +49 −0 docs/_include/styles.html
  7. +1 −1 docs/admin/clustering/index.rst
  8. +9 −9 docs/admin/sharding-partitioning.rst
  9. +6 −0 docs/admin/troubleshooting/index.md
  10. +1 −0 docs/admin/upgrade/index.md
  11. +21 −0 docs/conf.py
  12. +97 −0 docs/domain/analytics/index.md
  13. +0 −23 docs/domain/document/index.md
  14. +127 −4 docs/domain/index.md
  15. +332 −15 docs/domain/industrial/index.md
  16. +41 −9 docs/domain/ml/index.md
  17. +0 −24 docs/domain/search/index.md
  18. +93 −0 docs/domain/telemetry/index.md
  19. +289 −0 docs/domain/timeseries/advanced.md
  20. +49 −0 docs/domain/timeseries/basics.md
  21. +1 −1 docs/domain/timeseries/generate/index.rst
  22. +91 −14 docs/domain/timeseries/index.md
  23. +140 −0 docs/domain/timeseries/longterm.md
  24. +2 −1 docs/domain/timeseries/timeseries-querying.md
  25. +178 −0 docs/domain/timeseries/video.md
  26. +67 −0 docs/feature/blob/index.md
  27. +128 −0 docs/feature/ccr/index.md
  28. +112 −0 docs/feature/cloud/index.md
  29. +191 −0 docs/feature/cluster/index.md
  30. +85 −0 docs/feature/connectivity/index.md
  31. +28 −0 docs/feature/cursor/index.md
  32. +432 −0 docs/feature/document/index.md
  33. 0 docs/{domain → feature}/document/objects-hands-on.md
  34. +85 −0 docs/feature/fdw/index.md
  35. +51 −0 docs/feature/generated/index.md
  36. +211 −0 docs/feature/geospatial/index.md
  37. +166 −0 docs/feature/index.md
  38. +168 −0 docs/feature/index/index.md
  39. +230 −0 docs/feature/query/index.md
  40. +218 −0 docs/feature/relational/index.md
  41. +273 −0 docs/feature/search/analyzer.md
  42. +310 −0 docs/feature/search/index.md
  43. +107 −0 docs/feature/search/options.md
  44. +1 −0 docs/{domain → feature}/search/search-hands-on.md
  45. +131 −0 docs/feature/snapshot/index.md
  46. +120 −0 docs/feature/sql/index.md
  47. +97 −0 docs/feature/storage/index.md
  48. +74 −0 docs/feature/udf/index.md
  49. +274 −0 docs/feature/vector/index.md
  50. +3 −19 docs/getting-started.md
  51. +22 −0 docs/index.md
  52. +20 −0 docs/integrate/bi/index.md
  53. +2 −0 docs/integrate/df.md
  54. +2 −0 docs/integrate/etl/index.md
  55. +1 −1 docs/integrate/etl/kafka-connect.rst
  56. +3 −8 docs/integrate/metrics/index.md
  57. +14 −23 docs/integrate/visualize/index.md
  58. +1 −1 docs/performance/index.rst
  59. +2 −2 docs/performance/selects.rst
  60. +4 −3 docs/performance/sharding.rst
1 change: 1 addition & 0 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -1,3 +1,4 @@
__pycache__
_out/
.build
.clone
1 change: 1 addition & 0 deletions backlog.md
@@ -7,6 +7,7 @@
- Rework container/index
- Update HTTP links, use Sphinx references instead
- Update remaining links from crate.io to cratedb.com
- Gallery: https://python.arviz.org/en/stable/examples/

## Iteration +2
- Render Jupyter Notebooks?
28 changes: 28 additions & 0 deletions docs/_include/card/timeseries-datashader.md
@@ -0,0 +1,28 @@
::::{info-card}

:::{grid-item}
:columns: auto 9 9 9
**Display millions of data points using hvPlot, Datashader, and CrateDB**

[HoloViews] and [Datashader] frameworks enable channeling millions of data
points from your backend systems to the browser's glass.

This notebook plots the venerable NYC Taxi dataset after importing it
into a CrateDB Cloud database cluster.

🚧 _Please note this notebook is a work in progress._ 🚧

{{ '{}[cloud-datashader-github]'.format(nb_github) }} {{ '{}[cloud-datashader-colab]'.format(nb_colab) }}
:::

:::{grid-item}
:columns: 3
{tags-primary}`Time series visualization`

{tags-secondary}`Python`
{tags-secondary}`HoloViews`
{tags-secondary}`hvPlot`
{tags-secondary}`Datashader`
:::

::::
27 changes: 27 additions & 0 deletions docs/_include/card/timeseries-explore.md
@@ -0,0 +1,27 @@
::::{info-card}

:::{grid-item}
:columns: auto 9 9 9
**CrateDB for Time Series Modeling, Exploration, and Visualization**

Access time series data from CrateDB via SQL, load it into pandas DataFrames,
and visualize it using Plotly.

About advanced time series operations in SQL, like aggregations, window
functions, interpolation of missing data, common table expressions, moving
averages, relational JOINs, and the handling of JSON data.

{{ '{}[timeseries-queries-and-visualization-github]'.format(nb_github) }} {{ '{}[timeseries-queries-and-visualization-colab]'.format(nb_colab) }}
:::

:::{grid-item}
:columns: 3
{tags-primary}`Time series visualization`

{tags-secondary}`Python`
{tags-secondary}`pandas`
{tags-secondary}`Plotly`
{tags-secondary}`Dash`
:::

::::
24 changes: 24 additions & 0 deletions docs/_include/links.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,24 @@
[cloud-datashader-colab]: https://colab.research.google.com/github/crate/cratedb-examples/blob/amo/cloud-datashader/topic/timeseries/explore/cloud-datashader.ipynb
[cloud-datashader-github]: https://github.com/crate/cratedb-examples/blob/amo/cloud-datashader/topic/timeseries/explore/cloud-datashader.ipynb
[Datashader]: https://datashader.org/
[Dynamic Database Schemas]: https://cratedb.com/product/features/dynamic-schemas
[Geospatial Data Model]: https://cratedb.com/data-model/geospatial
[Geospatial Database]: https://cratedb.com/geospatial-spatial-database
[HoloViews]: https://www.holoviews.org/
[Indexing, Columnar Storage, and Aggregations]: https://cratedb.com/product/features/indexing-columnar-storage-aggregations
[JSON Database]: https://cratedb.com/solutions/json-database
[LangChain and CrateDB: Code Examples]: https://github.com/crate/cratedb-examples/tree/main/topic/machine-learning/llm-langchain
[langchain-similarity-binder]: https://mybinder.org/v2/gh/crate/cratedb-examples/main?labpath=topic%2Fmachine-learning%2Fllm-langchain%2Fvector_search.ipynb
[langchain-similarity-colab]: https://colab.research.google.com/github/crate/cratedb-examples/blob/main/topic/machine-learning/llm-langchain/vector_search.ipynb
[langchain-similarity-github]: https://github.com/crate/cratedb-examples/blob/main/topic/machine-learning/llm-langchain/vector_search.ipynb
[langchain-rag-sql-binder]: https://mybinder.org/v2/gh/crate/cratedb-examples/main?labpath=topic%2Fmachine-learning%2Fllm-langchain%2Fcratedb-vectorstore-rag-openai-sql.ipynb
[langchain-rag-sql-colab]: https://colab.research.google.com/github/crate/cratedb-examples/blob/main/topic/machine-learning/llm-langchain/cratedb-vectorstore-rag-openai-sql.ipynb
[langchain-rag-sql-github]: https://github.com/crate/cratedb-examples/blob/main/topic/machine-learning/llm-langchain/cratedb-vectorstore-rag-openai-sql.ipynb
[Multi-model Database]: https://cratedb.com/solutions/multi-model-database
[Nested Data Structure]: https://cratedb.com/product/features/nested-data-structure
[query DSL based on JSON]: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl.html
[Relational Database]: https://cratedb.com/solutions/relational-database
[timeseries-queries-and-visualization-colab]: https://colab.research.google.com/github/crate/cratedb-examples/blob/main/topic/timeseries/timeseries-queries-and-visualization.ipynb
[timeseries-queries-and-visualization-github]: https://github.com/crate/cratedb-examples/blob/main/topic/timeseries/timeseries-queries-and-visualization.ipynb
[Vector Database (Product)]: https://cratedb.com/solutions/vector-database
[Vector Database]: https://en.wikipedia.org/wiki/Vector_database
49 changes: 49 additions & 0 deletions docs/_include/styles.html
@@ -0,0 +1,49 @@
<!--
Custom styles for efficient tile layouts using card elements.
TODO: Upstream to crate-docs-theme.
-->
<style>
/* General */
/*
.sd-card-body {
line-height: 1.1em;
}
*/
.sd-card-footer {
font-size: small;
}

/* No margins for images in tight layouts, e.g. badges */
/* Needed for domain/timeseries/advanced.md */
.wrapper-content-right img {
margin-bottom: 0 !important;
}


/* Document Store */
.wrapper-content-right ul {
margin-left: 0;
}
.rubric-slimmer p.rubric {
margin-bottom: 0.25em;
}
.rubric-slim p.rubric {
margin-bottom: 0;
}
.title-slim .sd-col > * {
margin-top: 0;
margin-bottom: 0;
}
.no-margin > * {
margin-top: 0 !important;
  margin-bottom: 0 !important;
}


/* Cards with Links */
.sd-hide-link-text {
height: 0;
}

</style>
2 changes: 1 addition & 1 deletion docs/admin/clustering/index.rst
@@ -1,4 +1,4 @@
.. _clustering:
.. _admin-clustering:

==========
Clustering
18 changes: 9 additions & 9 deletions docs/admin/sharding-partitioning.rst
@@ -64,7 +64,7 @@ partition as a set of shards. For each partition, the number of shards defined
by ``CLUSTERED INTO x SHARDS`` are created, when a first record with a specific
``partition key`` is inserted.

In the following example - which represents a very simple time-series use-case
In the following example - which represents a very simple time series use-case
- we added another column ``part`` that automatically generates the current
month upon insertion from the ``ts`` column. The ``part`` column is further used
as the ``partition key``.
@@ -111,7 +111,7 @@ cluster.
Over-sharding and over-partitioning are common flaws leading to an overall
poor performance.

**As a rule of thumb, a single shard should hold somewhere between 5 - 100
**As a rule of thumb, a single shard should hold somewhere between 5 - 50
GB of data.**

To avoid oversharding, CrateDB by default limits the number of shards per
@@ -129,15 +129,15 @@ benchmarks across various strategies. The following steps provide a general guid
- Calculate the throughput

Then, to calculate the number of shards, you should consider that the size of each
shard should roughly be between 5 - 100 GB, and that each node can only manage
shard should roughly be between 5 - 50 GB, and that each node can only manage
up to 1000 shards.

Time-series example
Time series example
-------------------

To illustrate the steps above, let's use them on behalf of an example. Imagine
you want to create a *partitioned table* on a *three-node cluster* to store
time-series data with the following assumptions:
time series data with the following assumptions:

- Inserts: 1,000 records/s
- Record size: 128 byte/record
@@ -146,12 +146,12 @@ time-series data with the following assumptions:
Given the daily throughput is around 10 GB/day, the monthly throughput is 30 times
that (~ 300 GB). The partition column can be day, week, month, quarter, etc. So,
assuming a monthly partition, the next step is to calculate the number of shards
with the **shard size recommendation** (5 - 100 GB) and the **number of nodes** in
with the **shard size recommendation** (5 - 50 GB) and the **number of nodes** in
the cluster in mind.

With three shards, each shard will hold 100 GB (300 GB / 3 shards), which is too
close to the upper limit. With six shards, each shard will manage 50 GB
(300 GB / 6 shards) of data, which is closer to the recommended size range (5 - 100 GB).
With three shards, each shard would hold 100 GB (300 GB / 3 shards), which is above
the upper limit. With six shards, each shard will manage 50 GB (300 GB / 6 shards)
of data, which is right within the recommended range.

.. code-block:: psql
6 changes: 6 additions & 0 deletions docs/admin/troubleshooting/index.md
@@ -72,6 +72,12 @@ infrastructure operations. For example:
- Clean up stale node data.
:::

:::{card} {material-outlined}`wysiwyg;1.6em` About `crash`
:link: https://cratedb.com/docs/crate/crash/en/latest/troubleshooting.html
Troubleshooting the CLI program `crash`.
:::


:::{note}
You can find a lot of troubleshooting guides that explain how to perform
diagnostics on Java applications.
1 change: 1 addition & 0 deletions docs/admin/upgrade/index.md
@@ -1,3 +1,4 @@
(upgrading)=
# Upgrading

Guidelines about upgrading CrateDB clusters.
21 changes: 21 additions & 0 deletions docs/conf.py
@@ -36,9 +36,30 @@
]

# Configure intersphinx.
if "sphinx.ext.intersphinx" not in extensions:
extensions += ["sphinx.ext.intersphinx"]

if "intersphinx_mapping" not in globals():
intersphinx_mapping = {}

intersphinx_mapping.update({
'ctk': ('https://cratedb-toolkit.readthedocs.io/', None),
'matplotlib': ('https://matplotlib.org/stable/', None),
'pandas': ('https://pandas.pydata.org/pandas-docs/stable/', None),
'numpy': ('https://numpy.org/doc/stable/', None),
})


# Configure substitutions.
if "myst_substitutions" not in globals():
myst_substitutions = {}

myst_substitutions.update({
"nb_colab": "[![Notebook on Colab](https://img.shields.io/badge/Open-Notebook%20on%20Colab-blue?logo=Google%20Colab)]",
"nb_binder": "[![Notebook on Binder](https://img.shields.io/badge/Open-Notebook%20on%20Binder-lightblue?logo=binder)]",
"nb_github": "[![Notebook on GitHub](https://img.shields.io/badge/Open-Notebook%20on%20GitHub-darkgreen?logo=GitHub)]",
"readme_github": "[![README](https://img.shields.io/badge/Open-README-darkblue?logo=GitHub)]",
"blog": "[![Blog](https://img.shields.io/badge/Open-Blog-darkblue?logo=Markdown)]",
"tutorial": "[![Navigate to Tutorial](https://img.shields.io/badge/Navigate%20to-Tutorial-darkcyan?logo=Markdown)]",
"readmore": "[![Read More](https://img.shields.io/badge/Read-More-darkyellow?logo=Markdown)]",
})
97 changes: 97 additions & 0 deletions docs/domain/analytics/index.md
@@ -0,0 +1,97 @@
(analytics)=
# Raw-Data Analytics

**CrateDB provides real-time analytics on raw data stored for the long term**

This applies to all domains of real-time analytics where you absolutely must have
access to all the records and can't live with down-sampled variants, because
records are unique and need to be accounted for within your analytics queries.

If you find yourself in such a situation, you need a storage system that keeps
all the high-volume data in its hot zone, available at your fingertips for
live querying. Batch jobs that roll up raw data into analytical results are
not an option, because users' queries are highly individual, so you need to
run them on real data in real time.

With CrateDB, which is compatible with PostgreSQL, you can do all of that using
plain SQL. Besides integrating well with commodity systems through standard
database access interfaces like ODBC or JDBC, it also provides a proprietary
HTTP interface on top.

:Tags:
{tags-primary}`Analytics`
{tags-primary}`Long Term Storage`

:Related:
[](#timeseries)
[](#timeseries-longterm)
[](#machine-learning)

:Product:
[Real-time Analytics Database]


(bitmovin)=
## Bitmovin Insights

Multi-tenant data analytics on top of billions of records.

> CrateDB enables use cases we couldn't satisfy with other
database systems, not even with databases that are more strongly
focused on the time series domain.
>
> CrateDB is not your normal database!
>
> <small>-- Daniel Hölbling-Inzko, Director of Engineering Analytics, Bitmovin</small>
:Industry:
{tags-secondary}`Broadcasting`
{tags-secondary}`Media Transcoding`
{tags-secondary}`Streaming Media`

:Tags:
{tags-primary}`Event Tracking`
{tags-primary}`Real-Time Analytics`
{tags-primary}`Multi Tenancy`
{tags-primary}`SaaS`

:Related:
[CrateDB provides the backbone of Bitmovin's real-time video analytics platform] \
[How Bitmovin uses CrateDB to monitor the biggest live video events]


::::{info-card}

:::{grid-item}
:columns: 8

{material-outlined}`analytics;2em` &nbsp; **Bitmovin: Real-Time Analytics**

Bitmovin, as a leader in video codec algorithms and as a web-based video
stream broadcasting provider, produces billions of rows of data and stores
them in CrateDB, allowing their customers to do analytics on it.

One of their product's subsystems, a video analytics component, needed to
serve real-time analytics on very large and fast-moving data, so they had
to find a performant database at the right cost.

- [Bitmovin: Improving the Streaming Experience with Real-Time Analytics]

Bitmovin's use case illustrates why traditional databases weren't capable
of dealing with so many data records while keeping them all available for
querying in real time.
:::

:::{grid-item}
:columns: 4

<iframe width="240" src="https://www.youtube-nocookie.com/embed/4BPApD0Piyc?si=J0w5yG56Ld4fIXfm" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>
:::

::::


[Bitmovin: Improving the Streaming Experience with Real-Time Analytics]: https://youtu.be/4BPApD0Piyc?feature=shared
[CrateDB provides the backbone of Bitmovin's real-time video analytics platform]: https://cratedb.com/customers/bitmovin
[How Bitmovin uses CrateDB to monitor the biggest live video events]: https://youtu.be/IR6hokaYv5g?feature=shared
[Real-time Analytics Database]: https://cratedb.com/solutions/real-time-analytics-database
23 changes: 0 additions & 23 deletions docs/domain/document/index.md

This file was deleted.
