diff --git a/.lycheeignore b/.lycheeignore
new file mode 100644
index 00000000..d79351b6
--- /dev/null
+++ b/.lycheeignore
@@ -0,0 +1,2 @@
+# https://lychee.cli.rs/recipes/excluding-paths/
+https://www.youtube-nocookie.com/
diff --git a/docs/_assets/img/ml-timeseries-primer/cratedb-admin-ui-data-imported.png b/docs/_assets/img/ml-timeseries-primer/cratedb-admin-ui-data-imported.png
new file mode 100644
index 00000000..4924feb6
Binary files /dev/null and b/docs/_assets/img/ml-timeseries-primer/cratedb-admin-ui-data-imported.png differ
diff --git a/docs/_assets/img/ml-timeseries-primer/cratedb-cloud-import-ready.png b/docs/_assets/img/ml-timeseries-primer/cratedb-cloud-import-ready.png
new file mode 100644
index 00000000..aedcf006
Binary files /dev/null and b/docs/_assets/img/ml-timeseries-primer/cratedb-cloud-import-ready.png differ
diff --git a/docs/_assets/img/ml-timeseries-primer/cratedb-cloud-import-url.png b/docs/_assets/img/ml-timeseries-primer/cratedb-cloud-import-url.png
new file mode 100644
index 00000000..42635f27
Binary files /dev/null and b/docs/_assets/img/ml-timeseries-primer/cratedb-cloud-import-url.png differ
diff --git a/docs/_assets/img/ml-timeseries-primer/cratedb-missing-values.png b/docs/_assets/img/ml-timeseries-primer/cratedb-missing-values.png
new file mode 100644
index 00000000..b4008c6b
Binary files /dev/null and b/docs/_assets/img/ml-timeseries-primer/cratedb-missing-values.png differ
diff --git a/docs/_assets/img/ml-timeseries-primer/cratedb-mlops.png b/docs/_assets/img/ml-timeseries-primer/cratedb-mlops.png
new file mode 100644
index 00000000..494eebf6
Binary files /dev/null and b/docs/_assets/img/ml-timeseries-primer/cratedb-mlops.png differ
diff --git a/docs/_assets/img/ml-timeseries-primer/cratedb-model-configuration.png b/docs/_assets/img/ml-timeseries-primer/cratedb-model-configuration.png
new file mode 100644
index 00000000..0d556e00
Binary files /dev/null and b/docs/_assets/img/ml-timeseries-primer/cratedb-model-configuration.png differ
diff --git a/docs/_assets/img/ml-timeseries-primer/cratedb-model-monitoring.png b/docs/_assets/img/ml-timeseries-primer/cratedb-model-monitoring.png
new file mode 100644
index 00000000..ea2bcad2
Binary files /dev/null and b/docs/_assets/img/ml-timeseries-primer/cratedb-model-monitoring.png differ
diff --git a/docs/_assets/img/ml-timeseries-primer/cratedb-schema-object.png b/docs/_assets/img/ml-timeseries-primer/cratedb-schema-object.png
new file mode 100644
index 00000000..8e807e1f
Binary files /dev/null and b/docs/_assets/img/ml-timeseries-primer/cratedb-schema-object.png differ
diff --git a/docs/_assets/img/ml-timeseries-primer/cratedb-sensor-record.png b/docs/_assets/img/ml-timeseries-primer/cratedb-sensor-record.png
new file mode 100644
index 00000000..396e3306
Binary files /dev/null and b/docs/_assets/img/ml-timeseries-primer/cratedb-sensor-record.png differ
diff --git a/docs/_assets/img/ml-timeseries-primer/mlflow-experiment.png b/docs/_assets/img/ml-timeseries-primer/mlflow-experiment.png
new file mode 100644
index 00000000..b485d30a
Binary files /dev/null and b/docs/_assets/img/ml-timeseries-primer/mlflow-experiment.png differ
diff --git a/docs/_assets/img/ml-timeseries-primer/mlflow-model.png b/docs/_assets/img/ml-timeseries-primer/mlflow-model.png
new file mode 100644
index 00000000..da0ecc84
Binary files /dev/null and b/docs/_assets/img/ml-timeseries-primer/mlflow-model.png differ
diff --git a/docs/_assets/img/ml-timeseries-primer/mlflow-tracks.png b/docs/_assets/img/ml-timeseries-primer/mlflow-tracks.png
new file mode 100644
index 00000000..7d482a51
Binary files /dev/null and b/docs/_assets/img/ml-timeseries-primer/mlflow-tracks.png differ
diff --git a/docs/_assets/img/ml-timeseries-primer/temperature-anomaly-detected.png b/docs/_assets/img/ml-timeseries-primer/temperature-anomaly-detected.png
new file mode 100644
index 00000000..3bbb76b0
Binary files /dev/null and b/docs/_assets/img/ml-timeseries-primer/temperature-anomaly-detected.png differ
diff --git a/docs/_assets/img/ml-timeseries-primer/temperature-anomaly-score.png b/docs/_assets/img/ml-timeseries-primer/temperature-anomaly-score.png
new file mode 100644
index 00000000..511d99fd
Binary files /dev/null and b/docs/_assets/img/ml-timeseries-primer/temperature-anomaly-score.png differ
diff --git a/docs/_assets/img/ml-timeseries-primer/temperature-train-test.png b/docs/_assets/img/ml-timeseries-primer/temperature-train-test.png
new file mode 100644
index 00000000..082ac5b4
Binary files /dev/null and b/docs/_assets/img/ml-timeseries-primer/temperature-train-test.png differ
diff --git a/docs/admin/sharding-partitioning.rst b/docs/admin/sharding-partitioning.rst
index f525650e..57d0855d 100644
--- a/docs/admin/sharding-partitioning.rst
+++ b/docs/admin/sharding-partitioning.rst
@@ -64,7 +64,7 @@
 partition as a set of shards. For each partition, the number of shards defined
 by ``CLUSTERED INTO x SHARDS`` are created, when a first record with a specific
 ``partition key`` is inserted.
 
-In the following example - which represents a very simple time-series use-case
+In the following example - which represents a very simple time series use-case
 - we added another column ``part`` that automatically generates the current
 month upon insertion from the ``ts`` column. The ``part`` column is further
 used as the ``partition key``.
@@ -132,12 +132,12 @@
 Then, to calculate the number of shards, you should consider that the size of
 each shard should roughly be between 5 - 100 GB, and that each node can only
 manage up to 1000 shards.
 
-Time-series example
+Time series example
 -------------------
 
 To illustrate the steps above, let's use them on behalf of an example. Imagine
 you want to create a *partitioned table* on a *three-node cluster* to store
-time-series data with the following assumptions:
+time series data with the following assumptions:
 
 - Inserts: 1.000 records/s
 - Record size: 128 byte/record
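+
+For illustration, a back-of-the-envelope calculation based on these numbers
+could look like the following sketch. The monthly partitioning scheme and the
+50 GB target shard size are assumptions made for this example, not fixed
+recommendations.
+
+.. code-block:: python
+
+    import math
+
+    # Assumptions: 1,000 records/s at 128 bytes each, monthly partitions,
+    # a ~50 GB target shard size, and a three-node cluster.
+    records_per_second = 1_000
+    bytes_per_record = 128
+    seconds_per_month = 60 * 60 * 24 * 30
+
+    partition_size_gib = (records_per_second * bytes_per_record
+                          * seconds_per_month) / 1024**3
+    target_shard_size_gib = 50
+    nodes = 3
+
+    # Choose a shard count per partition that spreads evenly across all nodes.
+    shards = math.ceil(partition_size_gib / target_shard_size_gib)
+    shards += (-shards) % nodes
+
+    print(f"~{partition_size_gib:.0f} GiB per monthly partition "
+          f"-> {shards} shards per partition")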
diff --git a/docs/build.json b/docs/build.json
index 5647caf4..5de7837b 100644
--- a/docs/build.json
+++ b/docs/build.json
@@ -1,5 +1,5 @@
 {
   "schemaVersion": 1,
   "label": "docs build",
-  "message": "2.1.1"
+  "message": "2.1.2"
 }
diff --git a/docs/conf.py b/docs/conf.py
index b036c7d5..c7c6b20e 100644
--- a/docs/conf.py
+++ b/docs/conf.py
@@ -15,6 +15,8 @@
     r"https://cratedb.com/wp-content/uploads/2018/11/copy_from_population_data.zip",
     # Forbidden by Stack Overflow.
     r"https://stackoverflow.com/.*",
+    # HTTPSConnectionPool(host='aka.ms', port=443): Read timed out.
+    r"https://aka.ms/vs/.*",
 ]
 
 if "sphinx.ext.intersphinx" not in extensions:
diff --git a/docs/domain/document/index.md b/docs/domain/document/index.md
index a56a1861..51b0dab5 100644
--- a/docs/domain/document/index.md
+++ b/docs/domain/document/index.md
@@ -9,8 +9,11 @@
 Storing documents in CrateDB provides the same development convenience like the
 document-oriented storage layer of Lotus Notes / Domino, CouchDB, MongoDB, and
 PostgreSQL's `JSON(B)` types.
 
+- [](inv:crate-reference#type-object)
 - [](inv:cloud#object)
+- [CrateDB Objects]
 - [Unleashing the Power of Nested Data: Ingesting and Querying JSON Documents with SQL]
 
+[CrateDB Objects]: https://youtu.be/aQi9MXs2irU?feature=shared
 [Unleashing the Power of Nested Data: Ingesting and Querying JSON Documents with SQL]: https://youtu.be/S_RHmdz2IQM?feature=shared
diff --git a/docs/domain/industrial/index.md b/docs/domain/industrial/index.md
index 49a5ea26..67daca7a 100644
--- a/docs/domain/industrial/index.md
+++ b/docs/domain/industrial/index.md
@@ -5,7 +5,7 @@
 # Industrial Data
 
 Learn how to use CrateDB in industrial / IIoT / Industry 4.0 scenarios within
-engineering, manufacturing, and other operational domains.
+engineering, manufacturing, production, and other operational domains.
 
 In the realm of Industrial IoT, dealing with diverse data, ranging from
 slow-moving structured data, to high-frequency measurements, presents unique
@@ -15,24 +15,110 @@
 The complexities of industrial big data are characterized by its high variety,
 unstructured features, different data sampling rates, and how these attributes
 influence data storage, retention, and integration.
 
-Today's warehouses are complex systems with a very high degree of automation.
-The key to the successful operation of these warehouses lies in having a
-holistic view on the entire system based on data from various components like
-sensors, PLCs, embedded controllers and software systems.
+(rauch)=
+## Rauch Insights
+
+::::{info-card}
+
+:::{grid-item}
+:columns: 8
+
+{material-outlined}`data_exploration;2em`  **Rauch: High-Speed Production Lines**
+
+_Scaling a high-speed production environment with CrateDB._
+
+Rauch fills 33 cans per second, which adds up to 400 data records per second
+being processed, stored, and analyzed. In total, between one and ten billion
+records are persisted in CrateDB.
+
+- [Rauch: High-Speed Production Lines]
+
+Rauch's use case demonstrates why traditional databases weren't capable of
+dealing with that many data records and with unstructured data. Rauch chose
+CrateDB over other databases for its PostgreSQL compatibility, its support
+for unstructured data, and its excellent customer support.
+
+:Industry: {tags-secondary}`Food` {tags-secondary}`Packaging` {tags-secondary}`Production`
+:Tags: {tags-primary}`SCADA` {tags-primary}`MDE` {tags-primary}`Data Historian` {tags-primary}`Industrial IoT` {tags-primary}`PLC`
+:::
+
+:::{grid-item}
+:columns: 4
+
+
+**Date:** 28 Jun 2022 \
+**Speaker:** Arno Breuss
+:::
+::::
+
+(tgw)=
 ## TGW Insights
+
+::::{info-card}
+
+:::{grid-item}
+:columns: 8
+
+{material-outlined}`inventory;2em`  **TGW: Data acquisition in high-speed logistics**
+
+_Storing, querying, and analyzing industrial IoT data and metadata without
+much hassle._
+
+Today's warehouses are complex systems with a very high degree of automation.
+
+TGW Logistics Group implements the key factors for successfully operating
+these warehouses by maintaining a holistic view of the entire system,
+acquiring data from various components like sensors, PLCs, embedded
+controllers, and software systems.
+
+- [TGW: Fixing data silos in a high-speed logistics environment]
+
+TGW states that all these components can be seen as "data silos",
+distributed across the entire site, each of them storing just some pieces of
+information in various data structures, with different ways to access it.
+
 After trying multiple database systems, TGW Logistics moved to CrateDB for
-its ability to aggregate different data formats and ability to query this
-information without much hassle.
-
+its ability to aggregate different data formats and the ability to query this
+information without much effort.
+
+:Industry: {tags-secondary}`Logistics` {tags-secondary}`Shipping`
+:Tags: {tags-primary}`SCADA` {tags-primary}`MDE` {tags-primary}`Data Historian` {tags-primary}`Industrial IoT` {tags-primary}`PLC`
+:::
+
+:::{grid-item}
+:columns: 4
+
+
+**Date:** 22 Jun 2022 \
+**Speakers:** Alexander Mann, Jan Weber
+:::
+
+::::
+
+
+::::{info-card}
+
+:::{grid-item}
+:columns: 8
+
+{material-outlined}`dashboard;2em`  **TGW: Challenges in storing and analyzing industrial data**
+
+_Not All Time-Series Are Equal: Challenges in Storing and Analyzing Industrial Data._
+
 In the second presentation, you will learn how TGW leverages CrateDB to build
-digital twins of physical warehouses around the world.
+digital twins of physical warehouses around the world, by using its unique set
+of features suitable for storing and querying complex industrial big data with
+high variety, unstructured features, and different sampling frequencies.
 
-- [Fixing data silos in a high-speed logistics environment]
-- [Challenges of Storing and Analyzing Industrial Data]
+- [CrateDB: Challenges in industrial data]
+- [TGW: Storing and analyzing real-world industrial data]
 
 **What's inside**
 
@@ -47,6 +133,31 @@
 - Real-World Applications: Exploration of actual customer use cases to
   illustrate how CrateDB can be applied in various industrial scenarios.
 
+:Industry: {tags-secondary}`Logistics` {tags-secondary}`Shipping`
+:Tags: {tags-primary}`Data Historian` {tags-primary}`Industrial IoT` {tags-primary}`Digital Twin`
+:::
+
+:::{grid-item}
+:columns: 4
+
+
+**Date:** 23 Nov 2022 \
+**Speaker:** Marija Selakovic
+
+
+**Date:** 5 Oct 2023 \
+**Speakers:** Alexander Mann, Georg Traar
+:::
+
+::::
+
+
-[Challenges of Storing and Analyzing Industrial Data]: https://youtu.be/ugQvihToY0k?feature=shared
-[Fixing data silos in a high-speed logistics environment]: https://youtu.be/6dgjVQJtSKI?feature=shared
+[CrateDB: Challenges in industrial data]: https://speakerdeck.com/cratedb/not-all-time-series-are-equal-challenges-of-storing-and-analyzing-industrial-data
+[Rauch: High-Speed Production Lines]: https://youtu.be/gJPmJ0uXeVs?feature=shared
+[TGW: Fixing data silos in a high-speed logistics environment]: https://youtu.be/6dgjVQJtSKI?feature=shared
+[TGW: Storing and analyzing real-world industrial data]: https://youtu.be/ugQvihToY0k?feature=shared
diff --git a/docs/domain/timeseries/advanced.md b/docs/domain/timeseries/advanced.md
new file mode 100644
index 00000000..a8104592
--- /dev/null
+++ b/docs/domain/timeseries/advanced.md
@@ -0,0 +1,275 @@
+(timeseries-advanced)=
+(timeseries-analysis)=
+
+# Advanced Time Series Analysis
+
+Learn how to conduct advanced data analysis on large time series datasets
+with CrateDB.
+
+{tags-primary}`Exploratory data analysis`
+{tags-primary}`Time series decomposition`
+{tags-primary}`Anomaly detection`
+{tags-primary}`Forecasting / Prediction`
+{tags-primary}`Metadata integration`
+
+
+(timeseries-anomaly-forecasting)=
+## Anomaly Detection and Forecasting
+
+To gain insights from your data in a one-shot or recurring way, based on
+machine learning techniques, you may want to look into applying [anomaly]
+detection and/or [forecasting] methods.
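+
+As a quick baseline before reaching for a full framework, a rolling z-score
+already surfaces many outliers. The following sketch is illustrative only:
+it assumes the `sqlalchemy-cratedb` dialect is installed, and the connection
+string, table, and column names are placeholders.
+
+```python
+import pandas as pd
+import sqlalchemy as sa
+
+# Placeholder connection and query -- adjust to your cluster and schema.
+engine = sa.create_engine("crate://localhost:4200")
+df = pd.read_sql("SELECT ts, value FROM doc.sensor_readings ORDER BY ts",
+                 engine, parse_dates=["ts"], index_col="ts")
+
+# Rolling z-score: how far does each reading deviate from its recent history?
+window = 288  # e.g. one day of 5-minute readings
+rolling = df["value"].rolling(window)
+df["zscore"] = (df["value"] - rolling.mean()) / rolling.std()
+
+# Flag crude anomalies; a threshold of |z| > 3 is a common starting point.
+print(df[df["zscore"].abs() > 3].head())
+```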
+
+**Examples**
+
+::::{info-card}
+
+:::{grid-item} **Use MLflow for time series anomaly detection and time series forecasting**
+:columns: 9
+
+Guidelines and runnable code to get started with [MLflow] and CrateDB, exercising
+time series anomaly detection and time series forecasting / prediction using
+NumPy, Merlion, and Matplotlib.
+:::
+
+:::{grid-item}
+:columns: 3
+
+[![Open on GitHub](https://img.shields.io/badge/Open%20on-GitHub-lightgray?logo=GitHub)](https://github.com/crate/cratedb-examples/blob/main/topic/machine-learning/mlops-mlflow/tracking_merlion.ipynb)
+[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/crate/cratedb-examples/blob/main/topic/machine-learning/mlops-mlflow/tracking_merlion.ipynb)
+
+{tags-primary}`Anomaly Detection`
+{tags-primary}`Forecasting / Prediction`
+
+{tags-secondary}`Python`
+{tags-secondary}`MLflow`
+:::
+
+::::
+
+
+::::{info-card}
+
+:::{grid-item} **Use PyCaret to train time series forecasting models**
+:columns: 9
+
+This notebook explores the [PyCaret] framework and shows how to use it
+to train various time series forecasting models.
+:::
+
+:::{grid-item}
+:columns: 3
+
+[![Open on GitHub](https://img.shields.io/badge/Open%20on-GitHub-lightgray?logo=GitHub)](https://github.com/crate/cratedb-examples/blob/main/topic/machine-learning/automl/automl_timeseries_forecasting_with_pycaret.ipynb)
+[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/crate/cratedb-examples/blob/main/topic/machine-learning/automl/automl_timeseries_forecasting_with_pycaret.ipynb)
+
+{tags-primary}`Forecasting / Prediction`
+
+{tags-secondary}`Python`
+{tags-secondary}`PyCaret`
+:::
+
+::::
+
+
+:::{tip}
+The primer about [](#tsml-primer) will introduce you to the concept of time
+series modeling, experiment tracking, and corresponding ML Ops paradigms,
+which you can use to apply machine learning procedures to your time series
+data.
+:::
+
+
+(timeseries-decomposition)=
+## Decomposition
+
+[Decomposition of time series] is a statistical task that deconstructs a [time
+series] into several components, each representing one of the underlying
+categories of patterns.
+
+There are two principal types of decomposition, one based on rates of change,
+the other based on predictability.
+
+You can use this method to dissect a time series into multiple components,
+typically a trend, a seasonal, and a random (or irregular) component.
+
+This process helps in understanding the underlying patterns of the time series
+data, such as identifying any long-term direction (trend), recurring patterns
+at fixed intervals (seasonality), and randomness (irregular fluctuations) in
+the data.
+
+Decomposition is crucial for analyzing how these components change over time,
+improving forecasts, and developing strategies for addressing each element
+effectively.
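+
+As a minimal sketch of the idea outside of PyCaret, the classical
+decomposition from `statsmodels` splits a series into these components.
+The query, the hourly resampling, and the daily period are assumptions made
+for illustration.
+
+```python
+import pandas as pd
+import sqlalchemy as sa
+from statsmodels.tsa.seasonal import seasonal_decompose
+
+# Placeholder connection and query -- assumes the sqlalchemy-cratedb dialect.
+engine = sa.create_engine("crate://localhost:4200")
+series = pd.read_sql("SELECT ts, value FROM doc.sensor_readings ORDER BY ts",
+                     engine, parse_dates=["ts"], index_col="ts")["value"]
+
+# Classical additive decomposition, assuming hourly data with a daily cycle.
+result = seasonal_decompose(series.asfreq("1h").interpolate(),
+                            model="additive", period=24)
+result.plot()  # trend, seasonal, and residual components as separate panels
+```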
+
+**Examples**
+
+::::{info-card}
+
+:::{grid-item} **Analyze trend, seasonality, and fluctuations with PyCaret and CrateDB**
+:columns: 9
+
+Learn how to extract data from CrateDB for analysis in PyCaret, how to
+further preprocess it, and how to use PyCaret to plot a time series
+decomposition, breaking it down into its basic components: trend,
+seasonality, and residual (or irregular) fluctuations.
+:::
+
+:::{grid-item}
+:columns: 3
+
+[![Open on GitHub](https://img.shields.io/badge/Open%20on-GitHub-lightgray?logo=GitHub)](https://github.com/crate/cratedb-examples/blob/main/topic/timeseries/time-series-decomposition.ipynb)
+[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/crate/cratedb-examples/blob/main/topic/timeseries/time-series-decomposition.ipynb)
+
+{tags-primary}`Time series decomposition`
+
+{tags-secondary}`Python`
+{tags-secondary}`PyCaret`
+:::
+
+::::
+
+
+(timeseries-eda)=
+## EDA
+
+[Exploratory data analysis (EDA)] is an approach to analyzing data sets to
+summarize their main characteristics, often using statistical graphics and
+other data visualization methods.
+
+EDA involves visualizing, summarizing, and analyzing data to uncover
+patterns, anomalies, or relationships within the dataset.
+
+The objective of this step is to gain an understanding and intuition of the
+data, identify potential issues, and, in machine learning, guide feature
+engineering and model building.
+
+**Examples**
+
+::::{info-card}
+
+:::{grid-item} **Exploratory data analysis (EDA) with PyCaret and CrateDB**
+:columns: 9
+
+Learn how to access time series data from CrateDB using SQL, and how to apply
+exploratory data analysis (EDA) with PyCaret.
+
+The notebook shows how to generate various plots and charts for EDA, helping
+you to understand data distributions and relationships between variables, and
+to identify patterns.
+:::
+
+:::{grid-item}
+:columns: 3
+
+[![Open on GitHub](https://img.shields.io/badge/Open%20on-GitHub-lightgray?logo=GitHub)](https://github.com/crate/cratedb-examples/blob/main/topic/timeseries/exploratory_data_analysis.ipynb)
+[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/crate/cratedb-examples/blob/main/topic/timeseries/exploratory_data_analysis.ipynb)
+
+{tags-primary}`EDA on time series`
+
+{tags-secondary}`Python`
+{tags-secondary}`PyCaret`
+:::
+
+::::
+
+
+(timeseries-analysis-metadata)=
+## Metadata Integration
+
+CrateDB is particularly effective when you need to combine time series data
+with metadata, for instance in scenarios where data like sensor readings
+or log entries needs to be augmented with additional context for more
+insightful analysis. See also [](#document).
+
+CrateDB supports effective time series analysis with fast aggregations, a
+rich set of built-in functions, and [JOIN](inv:crate-reference#sql_joins)
+operations.
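+
+A minimal sketch of the pattern: enrich raw readings with device metadata
+through a JOIN and aggregate per device model. The table and column names
+below are placeholders, not a fixed schema.
+
+```python
+import sqlalchemy as sa
+
+engine = sa.create_engine("crate://localhost:4200")  # placeholder connection
+
+# Join fast-moving readings with slowly changing device metadata.
+query = sa.text("""
+    SELECT d.model,
+           date_trunc('hour', r.ts) AS hour,
+           avg(r.battery_level) AS avg_battery
+    FROM doc.device_readings AS r
+    JOIN doc.device_info AS d ON r.device_id = d.device_id
+    GROUP BY d.model, date_trunc('hour', r.ts)
+    ORDER BY hour DESC
+    LIMIT 10
+""")
+
+with engine.connect() as conn:
+    for row in conn.execute(query):
+        print(row)
+```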
+
+**Examples**
+
+::::{info-card}
+
+:::{grid-item} **Analyzing Device Readings with Metadata Integration**
+:columns: 9
+
+This tutorial illustrates how to augment time series data with metadata, in
+order to enable more comprehensive analysis. It uses a time series dataset that
+captures various device readings, such as battery, CPU, and memory information.
+:::
+
+:::{grid-item}
+:columns: 3
+
+[![Navigate to Tutorial](https://img.shields.io/badge/Navigate%20to-Tutorial-lightgray?logo=Markdown)](inv:cloud#time-series-advanced)
+
+{tags-primary}`Rich time series`
+{tags-primary}`Metadata`
+
+{tags-secondary}`SQL`
+:::
+
+::::
+
+
+(timeseries-visualization)=
+## Visualization
+
+Similar to EDA, simply applying [data and information visualization] can yield
+significant insights into the characteristics of your data. Initial data
+exploration with best-of-breed data visualization tools is often your first
+encounter with the data.
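+
+As a small sketch of the idea: with hvPlot, enabling `rasterize=True` hands
+the aggregation of large point clouds to Datashader, so the browser only
+receives a rendered image. Connection details and column names are
+placeholders.
+
+```python
+import pandas as pd
+import sqlalchemy as sa
+import hvplot.pandas  # noqa: F401 -- registers the .hvplot plotting accessor
+
+engine = sa.create_engine("crate://localhost:4200")  # placeholder connection
+df = pd.read_sql("SELECT ts, value FROM doc.sensor_readings", engine,
+                 parse_dates=["ts"])
+
+# Datashader rasterizes millions of points; the browser only gets an image.
+df.hvplot.scatter(x="ts", y="value", rasterize=True, width=900, height=300)
+```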
+::: + + +:::{toctree} +:hidden: + +generate/index +normalize-intervals +::: + + + +[CrateDB partitioned table vs. TimescaleDB Hypertable]: https://community.cratedb.com/t/cratedb-partitioned-table-vs-timescaledb-hypertable/1713 +[Financial data collection and processing using pandas]: https://community.cratedb.com/t/automating-financial-data-collection-and-storage-in-cratedb-with-python-and-pandas-2-0-0/916 +[How to Build Time Series Applications with CrateDB]: https://github.com/crate/cratedb-examples/blob/main/topic/timeseries/dask-weather-data-import.ipynb +[Interpolating missing time series values]: https://community.cratedb.com/t/interpolating-missing-time-series-values/1010 diff --git a/docs/domain/timeseries/connect.md b/docs/domain/timeseries/connect.md new file mode 100644 index 00000000..8abc67c1 --- /dev/null +++ b/docs/domain/timeseries/connect.md @@ -0,0 +1,64 @@ +(timeseries-connect)= +(timeseries-io)= +(timeseries-import-export)= + +# Database / Time Series Connectivity + +CrateDB connectivity options for working with time series data. + +{tags-primary}`Connect` +{tags-primary}`Import` +{tags-primary}`Export` +{tags-primary}`Extract` +{tags-primary}`Load` +{tags-primary}`ETL` + + +## Interfaces and Protocols + +CrateDB supports both the [HTTP protocol] and the [PostgreSQL wire protocol], +which ensures that many clients that work with PostgreSQL, will also work with +CrateDB. Through corresponding drivers, CrateDB is compatible with [ODBC], +[JDBC], and other database API specifications. + +By supporting [SQL], CrateDB is compatible with many standard database +environments out of the box. + +- [CrateDB HTTP interface] +- [CrateDB PostgreSQL interface] +- [CrateDB SQL protocol] + +## Drivers and Integrations + +CrateDB provides plenty of connectivity options with database drivers, +applications, and frameworks, in order to get time series data in and +out of CrateDB, and to connect to other applications. + +- [](inv:crate-clients-tools#connect) +- [](inv:crate-clients-tools#df) +- [](inv:crate-clients-tools#etl) +- [](inv:crate-clients-tools#metrics) + +## Tutorials + +Hands-on tutorials about CrateDB fundamentals about data I/O, as well as about +properly configuring and connecting relevant 3rd-party software components to +work optimally with CrateDB. + +- [Fundamentals of the COPY FROM statement] +- [](#etl) +- [](#metrics) +- [](#performance) +- [Import weather data using Dask] + + +[CrateDB HTTP interface]: inv:crate-reference:*:label#interface-http +[CrateDB PostgreSQL interface]: inv:crate-reference:*:label#interface-postgresql +[CrateDB SQL protocol]: inv:crate-reference:*:label#sql +[Fundamentals of the COPY FROM statement]: https://community.cratedb.com/t/fundamentals-of-the-copy-from-statement/1178 +[HTTP protocol]: https://en.wikipedia.org/wiki/HTTP +[Import weather data using Dask]: https://github.com/crate/cratedb-examples/blob/main/topic/timeseries/dask-weather-data-import.ipynb +[JDBC]: https://en.wikipedia.org/wiki/Java_Database_Connectivity +[ODBC]: https://en.wikipedia.org/wiki/Open_Database_Connectivity +[PostgreSQL wire protocol]: https://www.postgresql.org/docs/current/protocol.html +[SQL]: https://en.wikipedia.org/wiki/Sql diff --git a/docs/domain/timeseries/generate/index.rst b/docs/domain/timeseries/generate/index.rst index 2dda29ef..1064abe0 100644 --- a/docs/domain/timeseries/generate/index.rst +++ b/docs/domain/timeseries/generate/index.rst @@ -1,4 +1,4 @@ -.. _timeseries-basics: +.. _timeseries-generate: .. 
+
+## Drivers and Integrations
+
+CrateDB provides plenty of connectivity options through database drivers,
+applications, and frameworks, in order to get time series data into and
+out of CrateDB, and to connect it with other applications.
+
+- [](inv:crate-clients-tools#connect)
+- [](inv:crate-clients-tools#df)
+- [](inv:crate-clients-tools#etl)
+- [](inv:crate-clients-tools#metrics)
+
+## Tutorials
+
+Hands-on tutorials covering CrateDB data I/O fundamentals, as well as
+properly configuring and connecting relevant 3rd-party software components to
+work optimally with CrateDB.
+
+- [Fundamentals of the COPY FROM statement]
+- [](#etl)
+- [](#metrics)
+- [](#performance)
+- [Import weather data using Dask]
+
+
+[CrateDB HTTP interface]: inv:crate-reference:*:label#interface-http
+[CrateDB PostgreSQL interface]: inv:crate-reference:*:label#interface-postgresql
+[CrateDB SQL protocol]: inv:crate-reference:*:label#sql
+[Fundamentals of the COPY FROM statement]: https://community.cratedb.com/t/fundamentals-of-the-copy-from-statement/1178
+[HTTP protocol]: https://en.wikipedia.org/wiki/HTTP
+[Import weather data using Dask]: https://github.com/crate/cratedb-examples/blob/main/topic/timeseries/dask-weather-data-import.ipynb
+[JDBC]: https://en.wikipedia.org/wiki/Java_Database_Connectivity
+[ODBC]: https://en.wikipedia.org/wiki/Open_Database_Connectivity
+[PostgreSQL wire protocol]: https://www.postgresql.org/docs/current/protocol.html
+[SQL]: https://en.wikipedia.org/wiki/Sql
diff --git a/docs/domain/timeseries/generate/index.rst b/docs/domain/timeseries/generate/index.rst
index 2dda29ef..1064abe0 100644
--- a/docs/domain/timeseries/generate/index.rst
+++ b/docs/domain/timeseries/generate/index.rst
@@ -1,4 +1,4 @@
-.. _timeseries-basics:
+.. _timeseries-generate:
 .. _gen-ts:
 
 =========================
diff --git a/docs/domain/timeseries/index.md b/docs/domain/timeseries/index.md
index f464ecd8..114e249f 100644
--- a/docs/domain/timeseries/index.md
+++ b/docs/domain/timeseries/index.md
@@ -3,19 +3,96 @@
 Learn how to optimally use CrateDB for time series use-cases.
 
-- [](#timeseries-basics)
-- [](#timeseries-normalize)
-- [Financial data collection and processing using pandas]
-- [](inv:cloud#time-series)
-- [](inv:cloud#time-series-advanced)
-- [Time-series data: From raw data to fast analysis in only three steps]
+CrateDB is a distributed and scalable SQL database for storing and analyzing
+massive amounts of data in near real-time, even with complex queries. It is
+PostgreSQL-compatible and based on Lucene.
+
+
+::::{grid} 1 2 2 2
+:margin: 4 4 0 0
+:padding: 0
+:gutter: 2
+
+
+:::{grid-item-card} {material-outlined}`show_chart;2em` Basics
+:link: timeseries-basics
+:link-type: ref
+:link-alt: Time series basics with CrateDB
+
+Basic introductory tutorials about using CrateDB with time series data.
+
+What's inside:
+Getting Started, Downsampling and Interpolation,
+Operations: Sharding and Partitioning.
+:::
+
+
+:::{grid-item-card} {material-outlined}`analytics;2em` Advanced
+:link: timeseries-analysis
+:link-type: ref
+:link-alt: About time series analysis
+
+Advanced time series data analysis with CrateDB.
+
+What's inside:
+Exploratory data analysis (EDA), time series decomposition,
+anomaly detection, forecasting.
+:::
+
+
+:::{grid-item-card} {material-outlined}`sync;2em` Import and Export
+:link: timeseries-io
+:link-type: ref
+:link-alt: About time series data import and export
+
+Import data into and export data from your CrateDB cluster.
+
+What's inside:
+Connectivity and integration options with database drivers
+and applications, libraries, and frameworks.
+:::
+
+
+:::{grid-item-card} {material-outlined}`smart_display;2em` Video
+:link: timeseries-video
+:link-type: ref
+:link-alt: Video tutorials about time series with CrateDB
+
+Video tutorials about time series data and CrateDB.
+
+What's inside:
+Time series introduction. Importing, exporting,
+and analyzing. Industrial applications.
+:::
+
+::::
+
 
 :::{toctree}
 :hidden:
 
-generate/index
-normalize-intervals
+Basics
+Advanced
+Connectivity
+Video Tutorials