TSML Primer: Article series about "Machine Learning for Time Series Data" #54

Draft
wants to merge 9 commits into
base: main
from
2 changes: 2 additions & 0 deletions .lycheeignore
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
# https://lychee.cli.rs/recipes/excluding-paths/
https://www.youtube-nocookie.com/
6 changes: 3 additions & 3 deletions docs/admin/sharding-partitioning.rst
@@ -64,7 +64,7 @@ partition as a set of shards. For each partition, the number of shards defined
by ``CLUSTERED INTO x SHARDS`` are created, when a first record with a specific
``partition key`` is inserted.

In the following example - which represents a very simple time-series use-case
In the following example - which represents a very simple time series use-case
- we added another column ``part`` that automatically generates the current
month upon insertion from the ``ts`` column. The ``part`` column is further used
as the ``partition key``.
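
The idea of deriving a month-based partition key from a timestamp can be sketched in Python. This only illustrates the concept and is not how CrateDB implements generated columns. The helper names `month_partition_key` and `group_by_partition` and the sample timestamps are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timezone


def month_partition_key(ts: datetime) -> datetime:
    """Truncate a timestamp to the first instant of its month."""
    return ts.replace(day=1, hour=0, minute=0, second=0, microsecond=0)


def group_by_partition(timestamps):
    """Group timestamps by their month-based partition key."""
    partitions = defaultdict(list)
    for ts in timestamps:
        partitions[month_partition_key(ts)].append(ts)
    return dict(partitions)


# Hypothetical sample records: two in January 2023, one in February 2023.
records = [
    datetime(2023, 1, 15, 8, 30, tzinfo=timezone.utc),
    datetime(2023, 1, 31, 23, 59, tzinfo=timezone.utc),
    datetime(2023, 2, 1, 0, 0, tzinfo=timezone.utc),
]

partitions = group_by_partition(records)
for key, rows in sorted(partitions.items()):
    print(key.date(), len(rows))
```

Every record whose timestamp falls in the same month maps to the same key, so a new partition only comes into existence when the first record of a new month arrives.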
@@ -132,12 +132,12 @@ Then, to calculate the number of shards, you should consider that the size of each
shard should roughly be between 5 and 100 GB, and that each node can only manage
up to 1000 shards.

Time-series example
Time series example
-------------------

To illustrate the steps above, let's apply them to an example. Imagine
you want to create a *partitioned table* on a *three-node cluster* to store
time-series data with the following assumptions:
time series data with the following assumptions:

- Inserts: 1,000 records/s
- Record size: 128 byte/record
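
As a rough sketch, the raw data volume implied by the two assumptions visible here can be worked out in a few lines of Python. The 30-day month, the use of decimal gigabytes, and the decision to ignore replicas, compression, and index overhead are simplifying assumptions of this sketch. They are not taken from the documentation, which may list further assumptions not shown in this hunk.

```python
INSERTS_PER_SECOND = 1_000   # from the assumptions above
RECORD_SIZE_BYTES = 128      # from the assumptions above
SECONDS_PER_DAY = 24 * 60 * 60
DAYS_PER_MONTH = 30          # assumption of this sketch
GB = 10**9                   # decimal gigabyte, assumption of this sketch

bytes_per_second = INSERTS_PER_SECOND * RECORD_SIZE_BYTES
bytes_per_day = bytes_per_second * SECONDS_PER_DAY
bytes_per_month = bytes_per_day * DAYS_PER_MONTH

# Shard size guideline stated earlier: roughly 5 to 100 GB per shard.
MIN_SHARD_BYTES = 5 * GB
MAX_SHARD_BYTES = 100 * GB

# Fewest shards so that no shard exceeds 100 GB (ceiling division).
min_shards = -(-bytes_per_month // MAX_SHARD_BYTES)
# Most shards so that no shard falls below 5 GB (floor division).
max_shards = bytes_per_month // MIN_SHARD_BYTES

print(f"{bytes_per_day / GB:.2f} GB/day, {bytes_per_month / GB:.2f} GB/month")
print(f"shards per monthly partition: {min_shards} to {max_shards}")
```

Under these assumptions, the result is a range of plausible shard counts per monthly partition rather than a single recommendation. The number of nodes and the limit of 1000 shards per node narrow the choice further.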
2 changes: 1 addition & 1 deletion docs/build.json
@@ -1,5 +1,5 @@
{
"schemaVersion": 1,
"label": "docs build",
"message": "2.1.1"
"message": "2.1.2"
}
2 changes: 2 additions & 0 deletions docs/conf.py
@@ -15,6 +15,8 @@
r"https://cratedb.com/wp-content/uploads/2018/11/copy_from_population_data.zip",
# Forbidden by Stack Overflow.
r"https://stackoverflow.com/.*",
# HTTPSConnectionPool(host='aka.ms', port=443): Read timed out.
r"https://aka.ms/vs/.*",
]

if "sphinx.ext.intersphinx" not in extensions:
3 changes: 3 additions & 0 deletions docs/domain/document/index.md
@@ -9,8 +9,11 @@ Storing documents in CrateDB provides the same development convenience like the
document-oriented storage layer of Lotus Notes / Domino, CouchDB, MongoDB, and
PostgreSQL's `JSON(B)` types.

- [](inv:crate-reference#type-object)
- [](inv:cloud#object)
- [CrateDB Objects]
- [Unleashing the Power of Nested Data: Ingesting and Querying JSON Documents with SQL]


[CrateDB Objects]: https://youtu.be/aQi9MXs2irU?feature=shared
[Unleashing the Power of Nested Data: Ingesting and Querying JSON Documents with SQL]: https://youtu.be/S_RHmdz2IQM?feature=shared
137 changes: 124 additions & 13 deletions docs/domain/industrial/index.md
@@ -5,7 +5,7 @@
# Industrial Data

Learn how to use CrateDB in industrial / IIoT / Industry 4.0 scenarios within
engineering, manufacturing, and other operational domains.
engineering, manufacturing, production, and other operational domains.

In the realm of Industrial IoT, dealing with diverse data, ranging from
slow-moving structured data, to high-frequency measurements, presents unique
@@ -15,24 +15,110 @@ The complexities of industrial big data are characterized by its high variety,
unstructured features, different data sampling rates, and how these attributes
influence data storage, retention, and integration.

Today's warehouses are complex systems with a very high degree of automation.
The key to the successful operation of these warehouses lies in having a
holistic view on the entire system based on data from various components like
sensors, PLCs, embedded controllers and software systems.

(rauch)=
## Rauch Insights

::::{info-card}

:::{grid-item}
:columns: 8

{material-outlined}`data_exploration;2em`   **Rauch: High-Speed Production Lines**

_Scaling a high-speed production environment with CrateDB._

Rauch fills 33 cans per second, which adds up to 400 data records per second
that are processed, stored, and analyzed. In total, between one and ten
billion records are persisted in CrateDB.

- [Rauch: High-Speed Production Lines]

The Rauch use case demonstrates why traditional databases were not capable of
dealing with so many data records and with unstructured data. Rauch chose
CrateDB over other databases for benefits such as PostgreSQL compatibility,
support for unstructured data, and excellent customer support.

:Industry: {tags-secondary}`Food` {tags-secondary}`Packaging` {tags-secondary}`Production`
:Tags: {tags-primary}`SCADA` {tags-primary}`MDE` {tags-primary}`Data Historian` {tags-primary}`Industrial IoT` {tags-primary}`PLC`
:::

:::{grid-item}  
:columns: 4

<iframe width="240" src="https://www.youtube-nocookie.com/embed/gJPmJ0uXeVs?si=J0w5yG56Ld4fIXfm" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>

**Date:** 28 Jun 2022 \
**Speaker:** Arno Breuss
:::

::::


(tgw)=
## TGW Insights


::::{info-card}

:::{grid-item}
:columns: 8

{material-outlined}`inventory;2em` &nbsp; **TGW: Data acquisition in high-speed logistics**

_Storing, querying, and analyzing industrial IoT data and metadata without
much hassle._

Today's warehouses are complex systems with a very high degree of automation.

For TGW Logistics Group, the key to operating these warehouses successfully
is a holistic view of the entire system, built by acquiring data from
various components like sensors, PLCs, embedded controllers, and software
systems.

- [TGW: Fixing data silos in a high-speed logistics environment]

TGW states that all these components can be seen as "data silos",
distributed across the entire site, each of them storing only some pieces of
information, in various data structures and with different ways to access it.

After trying multiple database systems, TGW Logistics moved to CrateDB for
its ability to aggregate different data formats and ability to query this
information without much hassle.

its ability to aggregate different data formats and the ability to query this
information without further ado.

:Industry: {tags-secondary}`Logistics` {tags-secondary}`Shipping`
:Tags: {tags-primary}`SCADA` {tags-primary}`MDE` {tags-primary}`Data Historian` {tags-primary}`Industrial IoT` {tags-primary}`PLC`
:::

:::{grid-item} &nbsp;
:columns: 4

<iframe width="240" src="https://www.youtube-nocookie.com/embed/6dgjVQJtSKI?si=J0w5yG56Ld4fIXfm" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>

**Date:** 22 Jun 2022 \
**Speakers:** Alexander Mann, Jan Weber
:::

::::



::::{info-card}

:::{grid-item}
:columns: 8

{material-outlined}`dashboard;2em` &nbsp; **TGW: Challenges in storing and analyzing industrial data**

_Not All Time-Series Are Equal: Challenges in Storing and Analyzing Industrial Data._

In the second presentation, you will learn how TGW leverages CrateDB to build
digital twins of physical warehouses around the world.
digital twins of physical warehouses around the world, by using its unique set
of features suitable for storing and querying complex industrial big data with
high variety, unstructured features, and at different data frequencies.

- [Fixing data silos in a high-speed logistics environment]
- [Challenges of Storing and Analyzing Industrial Data]
- [CrateDB: Challenges in industrial data]
- [TGW: Storing and analyzing real-world industrial data]

**What's inside**

@@ -47,6 +133,31 @@ digital twins of physical warehouses around the world.
- Real-World Applications: Exploration of actual customer use cases to
illustrate how CrateDB can be applied in various industrial scenarios.

:Industry: {tags-secondary}`Logistics` {tags-secondary}`Shipping`
:Tags: {tags-primary}`Data Historian` {tags-primary}`Industrial IoT` {tags-primary}`Digital Twin`
:::

:::{grid-item} &nbsp;
:columns: 4

<iframe width="240" class="speakerdeck-iframe" style="border: 0px; background: rgba(0, 0, 0, 0.1) padding-box; margin: 0px; padding: 0px; border-radius: 6px; box-shadow: rgba(0, 0, 0, 0.2) 0px 5px 40px; width: 100%; height: auto; aspect-ratio: 560 / 315;" frameborder="0" src="https://speakerdeck.com/player/acb78531a07e4238ac662539b0c23609" title=" Not all time-series are equal ​ Challenges of storing and analyzing industrial data" allowfullscreen="true" data-ratio="1.7777777777777777"></iframe>

**Date:** 23 Nov 2022 \
**Speaker:** Marija Selakovic


<iframe width="240" src="https://www.youtube-nocookie.com/embed/ugQvihToY0k?si=J0w5yG56Ld4fIXfm" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>

**Date:** 5 Oct 2023 \
**Speakers:** Alexander Mann, Georg Traar
:::

::::




[Challenges of Storing and Analyzing Industrial Data]: https://youtu.be/ugQvihToY0k?feature=shared
[Fixing data silos in a high-speed logistics environment]: https://youtu.be/6dgjVQJtSKI?feature=shared
[CrateDB: Challenges in industrial data]: https://speakerdeck.com/cratedb/not-all-time-series-are-equal-challenges-of-storing-and-analyzing-industrial-data
[Rauch: High-Speed Production Lines]: https://youtu.be/gJPmJ0uXeVs?feature=shared
[TGW: Fixing data silos in a high-speed logistics environment]: https://youtu.be/6dgjVQJtSKI?feature=shared
[TGW: Storing and analyzing real-world industrial data]: https://youtu.be/ugQvihToY0k?feature=shared