diff --git a/bin/find-notebooks-to-test.sh b/bin/find-notebooks-to-test.sh index 94604f34..64a1a83c 100755 --- a/bin/find-notebooks-to-test.sh +++ b/bin/find-notebooks-to-test.sh @@ -1,6 +1,7 @@ #!/bin/bash # add any notebooks that are currently not testable to the exempt list EXEMPT_NOTEBOOKS=( + "notebooks/esql/esql-getting-started.ipynb" "notebooks/search/07-inference.ipynb" "notebooks/search/08-learning-to-rank.ipynb" "notebooks/langchain/langchain-vector-store.ipynb" diff --git a/notebooks/esql/esql-getting-started.ipynb b/notebooks/esql/esql-getting-started.ipynb new file mode 100644 index 00000000..9a412678 --- /dev/null +++ b/notebooks/esql/esql-getting-started.ipynb @@ -0,0 +1,1402 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "yfVx1CGq7dZU" + }, + "source": [ + "\"Open" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Getting started with ES|QL\n", + "\n", + "In this notebook you'll learn the basics of the Elasticsearch Query Language (ES|QL).\n", + "You'll be using the official [Elasticsearch Python client](https://www.elastic.co/guide/en/elasticsearch/client/python-api/current/connecting.html).\n", + "\n", + "You'll learn how to:\n", + "\n", + "- Run an ES|QL query\n", + "- Use processing commands\n", + "- Sort a table\n", + "- Query data\n", + "- Chain processing commands\n", + "- Compute values\n", + "- Calculate statistics\n", + "- Access columns\n", + "- Create a histogram\n", + "- Enrich data\n", + "- Process data\n", + "\n", + "> ⚠️ Do not use ES|QL on production environments. This functionality is in **technical preview** and may be changed or removed in a future release. Elastic will work to fix any issues, but features in technical preview are not subject to the support SLA of official GA features." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Create Elastic Cloud deployment\n", + "\n", + "If you don't have an Elastic Cloud deployment, sign up [here](https://cloud.elastic.co/registration?utm_source=github&utm_content=elasticsearch-labs-notebook) for a free trial.\n", + "\n", + "Once logged in to your Elastic Cloud account, go to the [Create deployment](https://cloud.elastic.co/deployments/create) page and select **Create deployment**. Leave all settings with their default values." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Install packages and import modules\n", + "\n", + "To get started, we'll need to connect to our Elastic deployment using the Python client. Because we're using an Elastic Cloud deployment, we'll use the **Cloud ID** to identify our deployment.\n", + "\n", + "First we need to install the `elasticsearch` Python client." + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "vG5NxAmpa-ZF", + "outputId": "878bb1fb-7a1c-401b-b145-53b72622d64b" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Collecting elasticsearch\n", + " Downloading elasticsearch-8.12.1-py3-none-any.whl (432 kB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m432.1/432.1 kB\u001b[0m \u001b[31m4.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hCollecting elastic-transport<9,>=8 (from elasticsearch)\n", + " Downloading elastic_transport-8.12.0-py3-none-any.whl (59 kB)\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m59.9/59.9 kB\u001b[0m \u001b[31m6.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25hRequirement already satisfied: urllib3<3,>=1.26.2 in /usr/local/lib/python3.10/dist-packages (from elastic-transport<9,>=8->elasticsearch) (2.0.7)\n", + "Requirement already satisfied: certifi in /usr/local/lib/python3.10/dist-packages (from elastic-transport<9,>=8->elasticsearch) (2024.2.2)\n", + "Installing collected packages: elastic-transport, elasticsearch\n", + "Successfully installed elastic-transport-8.12.0 elasticsearch-8.12.1\n" + ] + } + ], + "source": [ + "# Install packages\n", + "\n", + "!pip install elasticsearch" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "b2vOwH2lbFFF", + "outputId": "80290b0d-e540-4236-d0b6-49c6c4ed5334" + }, + "outputs": [], + "source": [ + "# Import packages\n", + "\n", + "from elasticsearch import Elasticsearch\n", + "from elasticsearch.helpers import bulk\n", + "from getpass import getpass # standard library module for secure credential input" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Initialize the Elasticsearch client\n", + "\n", + "Now we can instantiate the Elasticsearch Python client, providing the CLOUD ID and [API key](https://www.elastic.co/guide/en/kibana/current/api-keys.html#create-api-key) for your deployment.\n", + "\n", + "> ℹ️ If you're running Elasticsearch locally or on self-managed infrastructure, you'll need to pass in the Elasticsearch host instead. [Read the docs](https://www.elastic.co/guide/en/elasticsearch/client/python-api/current/connecting.html#_verifying_https_with_certificate_fingerprints_python_3_10_or_later) about how to connect to Elasticsearch locally." + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "rBbdkhAIbtym", + "outputId": "12060d90-111f-4b04-9c18-a02a1a017d9e" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Elastic Cloud ID: ··········\n", + "Elastic API Key: ··········\n" + ] + } + ], + "source": [ + "# https://www.elastic.co/search-labs/tutorials/install-elasticsearch/elastic-cloud#finding-your-cloud-id\n", + "ELASTIC_CLOUD_ID = getpass(\"Elastic Cloud ID: \")\n", + "\n", + "# https://www.elastic.co/search-labs/tutorials/install-elasticsearch/elastic-cloud#creating-an-api-key\n", + "ELASTIC_API_KEY = getpass(\"Elastic API Key: \")\n", + "\n", + "# Create the client instance\n", + "client = Elasticsearch(\n", + " # For local development\n", + " # hosts=[\"http://localhost:9200\"]\n", + " cloud_id=ELASTIC_CLOUD_ID,\n", + " api_key=ELASTIC_API_KEY,\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "PMU9fyT07j3C" + }, + "source": [ + "## Add sample data to Elasticsearch\n", + "\n", + "Before we index our sample dataset, let's create an index named `sample_data` with the correct [mappings](https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping.html)." + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": { + "id": "VZlVzRNP8IXo" + }, + "outputs": [], + "source": [ + "index_name = \"sample_data\"\n", + "\n", + "mappings = {\n", + " \"mappings\": {\n", + " \"properties\": {\"client_ip\": {\"type\": \"ip\"}, \"message\": {\"type\": \"keyword\"}}\n", + " }\n", + "}\n", + "\n", + "# Create the index\n", + "if not client.indices.exists(index=index_name):\n", + " client.indices.create(index=index_name, body=mappings)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "QjuRVtqc8esR" + }, + "source": [ + "Now we can index the data using the Elasticsearch Python client's [bulk helpers](https://elasticsearch-py.readthedocs.io/en/latest/helpers.html#bulk-helpers)." + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "gKYu3MHQ8te5", + "outputId": "bc69f677-aec1-4ea5-9166-e51444f4e9ee" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Successfully indexed 7 documents.\n" + ] + } + ], + "source": [ + "# Documents to be indexed\n", + "documents = [\n", + " {\n", + " \"@timestamp\": \"2023-10-23T12:15:03.360Z\",\n", + " \"client_ip\": \"172.21.2.162\",\n", + " \"message\": \"Connected to 10.1.0.3\",\n", + " \"event_duration\": 3450233,\n", + " },\n", + " {\n", + " \"@timestamp\": \"2023-10-23T12:27:28.948Z\",\n", + " \"client_ip\": \"172.21.2.113\",\n", + " \"message\": \"Connected to 10.1.0.2\",\n", + " \"event_duration\": 2764889,\n", + " },\n", + " {\n", + " \"@timestamp\": \"2023-10-23T13:33:34.937Z\",\n", + " \"client_ip\": \"172.21.0.5\",\n", + " \"message\": \"Disconnected\",\n", + " \"event_duration\": 1232382,\n", + " },\n", + " {\n", + " \"@timestamp\": \"2023-10-23T13:51:54.732Z\",\n", + " \"client_ip\": \"172.21.3.15\",\n", + " \"message\": \"Connection error\",\n", + " \"event_duration\": 725448,\n", + " },\n", + " {\n", + " \"@timestamp\": \"2023-10-23T13:52:55.015Z\",\n", + " \"client_ip\": \"172.21.3.15\",\n", + " \"message\": \"Connection error\",\n", + " \"event_duration\": 8268153,\n", + " },\n", + " {\n", + " \"@timestamp\": \"2023-10-23T13:53:55.832Z\",\n", + " \"client_ip\": \"172.21.3.15\",\n", + " \"message\": \"Connection error\",\n", + " \"event_duration\": 5033755,\n", + " },\n", + " {\n", + " \"@timestamp\": \"2023-10-23T13:55:01.543Z\",\n", + " \"client_ip\": \"172.21.3.15\",\n", + " \"message\": \"Connected to 10.1.0.1\",\n", + " \"event_duration\": 1756467,\n", + " },\n", + "]\n", + "\n", + "# Prepare the actions for the bulk API using list comprehension\n", + "actions = [{\"_index\": index_name, \"_source\": doc} for doc in documents]\n", + "\n", + "# Perform the bulk index operation and capture the response\n", + "success, failed = bulk(client, actions)\n", + "\n", + "if failed:\n", + " print(f\"Some documents failed to index: {failed}\")\n", + "else:\n", + " print(f\"Successfully indexed {success} documents.\")" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": { + "id": "nkFLjdQzcPjQ" + }, + "outputs": [], + "source": [ + "# Suppress specific Elasticsearch warnings about default limit of [500] that pollute responses\n", + "\n", + "import warnings\n", + "from elasticsearch import ElasticsearchWarning\n", + "\n", + "warnings.filterwarnings(\"ignore\", category=ElasticsearchWarning)" + ] + }, + { + "cell_type": "code", + "execution_count": 36, + "metadata": { + "id": "UC4KZFRYjzkc" + }, + "outputs": [], + "source": [ + "# Format response to return human-readable tables\n", + "\n", + "\n", + "def format_response(response_data):\n", + " column_names = [col[\"name\"] for col in response_data[\"columns\"]]\n", + " column_widths = [\n", + " max(\n", + " len(name),\n", + " max(\n", + " (\n", + " len(str(row[i]) if row[i] is not None else \"None\")\n", + " for row in response_data[\"values\"]\n", + " ),\n", + " default=0,\n", + " ),\n", + " )\n", + " for i, name in enumerate(column_names)\n", + " ]\n", + " row_format = \" | \".join([\"{:<\" + str(width) + \"}\" for width in column_widths])\n", + " print(row_format.format(*column_names))\n", + " print(\"-\" * sum(column_widths) + \"-\" * (len(column_widths) - 1) * 3)\n", + " for row in response_data[\"values\"]:\n", + " # Convert None values in the row to \"None\" before formatting\n", + " formatted_row = [(str(cell) if cell is not None else \"None\") for cell in row]\n", + " print(row_format.format(*formatted_row))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Your first ES|QL query\n", + "\n", + "Each ES|QL query starts with a [source command](https://www.elastic.co/guide/en/elasticsearch/reference/current/esql-commands.html#esql-source-commands \"Source commands\"). A source command produces a table, typically with data from Elasticsearch.\n", + "\n", + "The [`FROM`](https://www.elastic.co/guide/en/elasticsearch/reference/current/esql-commands.html#esql-from \"FROM\") source command returns a table with documents from a data stream, index, or alias. Each row in the resulting table represents a document. This query returns up to 500 documents from the `sample_data` index:" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "Jyb40LkJh6sN", + "outputId": "c6ede279-3d1e-4b82-ffb7-92a438744d40" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "@timestamp | client_ip | event_duration | message \n", + "--------------------------------------------------------------------------------\n", + "2023-10-23T12:15:03.360Z | 172.21.2.162 | 3450233 | Connected to 10.1.0.3\n", + "2023-10-23T12:27:28.948Z | 172.21.2.113 | 2764889 | Connected to 10.1.0.2\n", + "2023-10-23T13:33:34.937Z | 172.21.0.5 | 1232382 | Disconnected \n", + "2023-10-23T13:51:54.732Z | 172.21.3.15 | 725448 | Connection error \n", + "2023-10-23T13:52:55.015Z | 172.21.3.15 | 8268153 | Connection error \n", + "2023-10-23T13:53:55.832Z | 172.21.3.15 | 5033755 | Connection error \n", + "2023-10-23T13:55:01.543Z | 172.21.3.15 | 1756467 | Connected to 10.1.0.1\n" + ] + } + ], + "source": [ + "esql_query = \"FROM sample_data\"\n", + "\n", + "response = client.esql.query(query=esql_query)\n", + "format_response(response)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Each column corresponds to a field, and can be accessed by the name of that field.\n", + "\n", + "ℹ️ ES|QL keywords are case-insensitive. `FROM sample_data` is identical to `from sample_data`." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "B1P44sOfmehL" + }, + "source": [ + "## Processing commands\n", + "\n", + "A source command can be followed by one or more [processing commands](https://www.elastic.co/guide/en/elasticsearch/reference/current/esql-commands.html#esql-processing-commands), separated by a pipe character: `|`. Processing commands change an input table by adding, removing, or changing rows and columns. Processing commands can perform filtering, projection, aggregation, and more.\n", + "\n", + "For example, you can use the `LIMIT` command to limit the number of rows that are returned, up to a maximum of 10,000 rows:" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "VGhogxc4mEUW", + "outputId": "3c766189-a4f5-4737-edea-949c1e323c7b" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "@timestamp | client_ip | event_duration | message \n", + "--------------------------------------------------------------------------------\n", + "2023-10-23T12:15:03.360Z | 172.21.2.162 | 3450233 | Connected to 10.1.0.3\n", + "2023-10-23T12:27:28.948Z | 172.21.2.113 | 2764889 | Connected to 10.1.0.2\n", + "2023-10-23T13:33:34.937Z | 172.21.0.5 | 1232382 | Disconnected \n" + ] + } + ], + "source": [ + "esql_query = \"\"\"\n", + "FROM sample_data\n", + "| LIMIT 3\n", + "\"\"\"\n", + "\n", + "response = client.esql.query(query=esql_query)\n", + "format_response(response)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "RLPlQtYem4fc" + }, + "source": [ + "### Sort a table\n", + "\n", + "Another processing command is the [`SORT`](https://www.elastic.co/guide/en/elasticsearch/reference/current/esql-commands.html#esql-sort) command. By default, the rows returned by `FROM` don’t have a defined sort order. Use the `SORT` command to sort rows on one or more columns:" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "m5LOqw1kmGkb", + "outputId": "350d5934-8470-421d-8417-dfa3aba76d55" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "@timestamp | client_ip | event_duration | message \n", + "--------------------------------------------------------------------------------\n", + "2023-10-23T13:55:01.543Z | 172.21.3.15 | 1756467 | Connected to 10.1.0.1\n", + "2023-10-23T13:53:55.832Z | 172.21.3.15 | 5033755 | Connection error \n", + "2023-10-23T13:52:55.015Z | 172.21.3.15 | 8268153 | Connection error \n", + "2023-10-23T13:51:54.732Z | 172.21.3.15 | 725448 | Connection error \n", + "2023-10-23T13:33:34.937Z | 172.21.0.5 | 1232382 | Disconnected \n", + "2023-10-23T12:27:28.948Z | 172.21.2.113 | 2764889 | Connected to 10.1.0.2\n", + "2023-10-23T12:15:03.360Z | 172.21.2.162 | 3450233 | Connected to 10.1.0.3\n" + ] + } + ], + "source": [ + "esql_query = \"\"\"\n", + "FROM sample_data\n", + "| SORT @timestamp DESC\n", + "\"\"\"\n", + "\n", + "response = client.esql.query(query=esql_query)\n", + "format_response(response)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "9YBqc9ftoNkb" + }, + "source": [ + "### Query the data\n", + "\n", + "Use the [`WHERE`](https://www.elastic.co/guide/en/elasticsearch/reference/current/esql-commands.html#esql-where) command to query the data. For example, to find all events with a duration longer than 5ms:" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "wR4Pt165oZ6i", + "outputId": "0c0be7c2-8f88-46ab-c8a6-c0de44a20ecc" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "@timestamp | client_ip | event_duration | message \n", + "--------------------------------------------------------------------------\n", + "2023-10-23T13:52:55.015Z | 172.21.3.15 | 8268153 | Connection error\n", + "2023-10-23T13:53:55.832Z | 172.21.3.15 | 5033755 | Connection error\n" + ] + } + ], + "source": [ + "esql_query = \"\"\"\n", + "FROM sample_data\n", + "| WHERE event_duration > 5000000\n", + "\"\"\"\n", + "\n", + "response = client.esql.query(query=esql_query)\n", + "format_response(response)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "C_6-0a4ZopKn" + }, + "source": [ + "`WHERE` supports several [operators](https://www.elastic.co/guide/en/elasticsearch/reference/current/esql-functions-operators.html#esql-operators).\n", + "\n", + "For example, you can use [`LIKE`](https://www.elastic.co/guide/en/elasticsearch/reference/current/esql-functions-operators.html#esql-like-operator) to run a wildcard query against the message column:" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "SsU1Nsh1ouFs", + "outputId": "f2f26efd-5f78-4f0a-bd26-619ab7bfb41d" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "@timestamp | client_ip | event_duration | message \n", + "--------------------------------------------------------------------------------\n", + "2023-10-23T12:15:03.360Z | 172.21.2.162 | 3450233 | Connected to 10.1.0.3\n", + "2023-10-23T12:27:28.948Z | 172.21.2.113 | 2764889 | Connected to 10.1.0.2\n", + "2023-10-23T13:55:01.543Z | 172.21.3.15 | 1756467 | Connected to 10.1.0.1\n" + ] + } + ], + "source": [ + "esql_query = \"\"\"\n", + "FROM sample_data\n", + "| WHERE message LIKE \"Connected*\"\n", + "\"\"\"\n", + "\n", + "response = client.esql.query(query=esql_query)\n", + "format_response(response)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "HjuCP76BEsDz" + }, + "source": [ + "### More processing commands\n", + "\n", + "There are many other processing commands, like [`KEEP`](https://www.elastic.co/guide/en/elasticsearch/reference/current/esql-commands.html#esql-keep \"KEEP\") and [`DROP`](https://www.elastic.co/guide/en/elasticsearch/reference/current/esql-commands.html#esql-drop \"DROP\") to keep or drop columns, [`ENRICH`](https://www.elastic.co/guide/en/elasticsearch/reference/current/esql-commands.html#esql-enrich \"ENRICH\") to enrich a table with data from indices in Elasticsearch, and [`DISSECT`](https://www.elastic.co/guide/en/elasticsearch/reference/current/esql-commands.html#esql-dissect \"DISSECT\") and [`GROK`](https://www.elastic.co/guide/en/elasticsearch/reference/current/esql-commands.html#esql-grok \"GROK\") to process data. Refer to [Processing commands](https://www.elastic.co/guide/en/elasticsearch/reference/current/esql-commands.html#esql-processing-commands \"Processing commands\") for an overview." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ESaKzB8HFMXj" + }, + "source": [ + "## Chain processing commands\n", + "\n", + "You can chain processing commands, separated by a pipe character: `|`. Each\n", + "processing command works on the output table of the previous command. The result\n", + "of a query is the table produced by the final processing command.\n", + "\n", + "The following example first sorts the table on `@timestamp`, and next limits the\n", + "result set to 3 rows:" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "ztm9lRiIFTnS", + "outputId": "79aa1c17-b3c8-4fd8-f4c9-6f3bc06f987b" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "@timestamp | client_ip | event_duration | message \n", + "-------------------------------------------------------------------------------\n", + "2023-10-23T13:55:01.543Z | 172.21.3.15 | 1756467 | Connected to 10.1.0.1\n", + "2023-10-23T13:53:55.832Z | 172.21.3.15 | 5033755 | Connection error \n", + "2023-10-23T13:52:55.015Z | 172.21.3.15 | 8268153 | Connection error \n" + ] + } + ], + "source": [ + "esql_query = \"\"\"\n", + "FROM sample_data\n", + "| SORT @timestamp DESC\n", + "| LIMIT 3\n", + "\"\"\"\n", + "\n", + "response = client.esql.query(query=esql_query)\n", + "format_response(response)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "cogdcusMFgB3" + }, + "source": [ + "> ℹ️ The order of processing commands is important. First limiting the result set to 3 rows before sorting those 3 rows would most likely return a result that is different than this example, where the sorting comes before the limit." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "vsJ5wFitFlow" + }, + "source": [ + "## Compute values\n", + "\n", + "Use the [`EVAL`](https://www.elastic.co/guide/en/elasticsearch/reference/current/esql-commands.html#esql-eval \"EVAL\") command to append columns to a table, with calculated values. For example, the following query appends a `duration_ms` column. The values in the column are computed by dividing `event_duration` by 1,000,000. In other words: `event_duration` converted from nanoseconds to milliseconds." + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "bxCjCNwxFx4e", + "outputId": "caefb72f-5f9e-4541-d688-0ee51fda87c5" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "@timestamp | client_ip | event_duration | message | duration_ms\n", + "----------------------------------------------------------------------------------------------\n", + "2023-10-23T12:15:03.360Z | 172.21.2.162 | 3450233 | Connected to 10.1.0.3 | 3.450233 \n", + "2023-10-23T12:27:28.948Z | 172.21.2.113 | 2764889 | Connected to 10.1.0.2 | 2.764889 \n", + "2023-10-23T13:33:34.937Z | 172.21.0.5 | 1232382 | Disconnected | 1.232382 \n", + "2023-10-23T13:51:54.732Z | 172.21.3.15 | 725448 | Connection error | 0.725448 \n", + "2023-10-23T13:52:55.015Z | 172.21.3.15 | 8268153 | Connection error | 8.268153 \n", + "2023-10-23T13:53:55.832Z | 172.21.3.15 | 5033755 | Connection error | 5.033755 \n", + "2023-10-23T13:55:01.543Z | 172.21.3.15 | 1756467 | Connected to 10.1.0.1 | 1.756467 \n" + ] + } + ], + "source": [ + "esql_query = \"\"\"\n", + "FROM sample_data\n", + "| EVAL duration_ms = event_duration/1000000.0\n", + "\"\"\"\n", + "\n", + "response = client.esql.query(query=esql_query)\n", + "format_response(response)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "OY6HN6PtF8zy" + }, + "source": [ + "`EVAL` supports several [functions](https://www.elastic.co/guide/en/elasticsearch/reference/current/esql-functions-operators.html#esql-functions). For example, to round a number to the closest number with the specified number of digits, use the [`ROUND`](https://www.elastic.co/guide/en/elasticsearch/reference/current/esql-functions-operators.html#esql-round \"ROUND\") function:" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "RTx1xw2ZF__8", + "outputId": "72293fb7-8e2f-4042-d9a3-db6ff7d53d19" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "@timestamp | client_ip | event_duration | message | duration_ms\n", + "----------------------------------------------------------------------------------------------\n", + "2023-10-23T12:15:03.360Z | 172.21.2.162 | 3450233 | Connected to 10.1.0.3 | 3.5 \n", + "2023-10-23T12:27:28.948Z | 172.21.2.113 | 2764889 | Connected to 10.1.0.2 | 2.8 \n", + "2023-10-23T13:33:34.937Z | 172.21.0.5 | 1232382 | Disconnected | 1.2 \n", + "2023-10-23T13:51:54.732Z | 172.21.3.15 | 725448 | Connection error | 0.7 \n", + "2023-10-23T13:52:55.015Z | 172.21.3.15 | 8268153 | Connection error | 8.3 \n", + "2023-10-23T13:53:55.832Z | 172.21.3.15 | 5033755 | Connection error | 5.0 \n", + "2023-10-23T13:55:01.543Z | 172.21.3.15 | 1756467 | Connected to 10.1.0.1 | 1.8 \n" + ] + } + ], + "source": [ + "esql_query = \"\"\"\n", + "FROM sample_data\n", + "| EVAL duration_ms = ROUND(event_duration/1000000.0, 1)\n", + "\"\"\"\n", + "\n", + "response = client.esql.query(query=esql_query)\n", + "format_response(response)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "NW3NYqMLGI_O" + }, + "source": [ + "## Calculate statistics\n", + "\n", + "You can also use ES|QL to aggregate your data. Use the [`STATS ... BY`](https://www.elastic.co/guide/en/elasticsearch/reference/current/esql-commands.html#esql-stats-by \"STATS ... BY\") command to calculate statistics.\n", + "\n", + "For example, to calculate the median duration:" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "v-vamaNVGTwS", + "outputId": "6bf99372-dc6f-4f37-ef51-71b9da34d3e7" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "median_duration\n", + "---------------\n", + "2764889.0 \n" + ] + } + ], + "source": [ + "esql_query = \"\"\"\n", + "FROM sample_data\n", + "| STATS median_duration = MEDIAN(event_duration)\n", + "\"\"\"\n", + "\n", + "response = client.esql.query(query=esql_query)\n", + "format_response(response)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Tyr8TURjGXZm" + }, + "source": [ + "You can calculate multiple stats with one command:" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "QLGT8GTvGZIg", + "outputId": "90e0a7a8-4294-4d81-81bf-55c0296e18db" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "median_duration | max_duration\n", + "------------------------------\n", + "2764889.0 | 8268153 \n" + ] + } + ], + "source": [ + "esql_query = \"\"\"\n", + "FROM sample_data\n", + "| STATS median_duration = MEDIAN(event_duration), max_duration = MAX(event_duration)\n", + "\"\"\"\n", + "\n", + "response = client.esql.query(query=esql_query)\n", + "format_response(response)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ZeLti0ByGc9S" + }, + "source": [ + "Use BY to group calculated stats by one or more columns. For example, to calculate the median duration per client IP:" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "EWGMRhZrHH-c", + "outputId": "4d106e79-20bd-4979-a653-51f403f086e5" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "median_duration | client_ip \n", + "------------------------------\n", + "1232382.0 | 172.21.0.5 \n", + "2764889.0 | 172.21.2.113\n", + "3450233.0 | 172.21.2.162\n", + "3395111.0 | 172.21.3.15 \n" + ] + } + ], + "source": [ + "esql_query = \"\"\"\n", + "FROM sample_data\n", + "| STATS median_duration = MEDIAN(event_duration) BY client_ip\n", + "\"\"\"\n", + "\n", + "response = client.esql.query(query=esql_query)\n", + "format_response(response)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "y3zS4-zLHO5I" + }, + "source": [ + "## Access columns\n", + "\n", + "You can access columns by their name. If a name contains special characters, [it needs to be quoted](https://www.elastic.co/guide/en/elasticsearch/reference/current/esql-syntax.html#esql-identifiers \"Identifiers\") with backticks (`` ` ``).\n", + "\n", + "Assigning an explicit name to a column created by `EVAL` or `STATS` is optional. If you don’t provide a name, the new column name is equal to the function expression. For example:" + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "V62P4GEgHojF", + "outputId": "6beaf6de-6e5c-49bc-f865-3fb1518cda9a" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "@timestamp | client_ip | event_duration | message | event_duration/1000000.0\n", + "-----------------------------------------------------------------------------------------------------------\n", + "2023-10-23T12:15:03.360Z | 172.21.2.162 | 3450233 | Connected to 10.1.0.3 | 3.450233 \n", + "2023-10-23T12:27:28.948Z | 172.21.2.113 | 2764889 | Connected to 10.1.0.2 | 2.764889 \n", + "2023-10-23T13:33:34.937Z | 172.21.0.5 | 1232382 | Disconnected | 1.232382 \n", + "2023-10-23T13:51:54.732Z | 172.21.3.15 | 725448 | Connection error | 0.725448 \n", + "2023-10-23T13:52:55.015Z | 172.21.3.15 | 8268153 | Connection error | 8.268153 \n", + "2023-10-23T13:53:55.832Z | 172.21.3.15 | 5033755 | Connection error | 5.033755 \n", + "2023-10-23T13:55:01.543Z | 172.21.3.15 | 1756467 | Connected to 10.1.0.1 | 1.756467 \n" + ] + } + ], + "source": [ + "esql_query = \"\"\"\n", + "FROM sample_data\n", + "| EVAL event_duration/1000000.0\n", + "\"\"\"\n", + "\n", + "response = client.esql.query(query=esql_query)\n", + "format_response(response)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Al_cDCIIHpsR" + }, + "source": [ + "In this query, `EVAL` adds a new column named `event_duration/1000000.0`. Because its name contains special characters, to access this column, quote it with backticks:" + ] + }, + { + "cell_type": "code", + "execution_count": 26, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "Jv6y2SjLHtBA", + "outputId": "242f2ced-f313-4a4a-bab8-f6ccad917993" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "MEDIAN(`event_duration/1000000.0`)\n", + "----------------------------------\n", + "2.764889 \n" + ] + } + ], + "source": [ + "esql_query = \"\"\"\n", + "FROM sample_data\n", + "| EVAL event_duration/1000000.0\n", + "| STATS MEDIAN(`event_duration/1000000.0`)\n", + "\"\"\"\n", + "response = client.esql.query(query=esql_query)\n", + "format_response(response)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ZKB3miTcH1St" + }, + "source": [ + "## Create a histogram\n", + "\n", + "To track statistics over time, ES|QL enables you to create histograms using the [`AUTO_BUCKET`](https://www.elastic.co/guide/en/elasticsearch/reference/current/esql-functions-operators.html#esql-auto_bucket \"AUTO_BUCKET\") function. `AUTO_BUCKET` creates human-friendly bucket sizes and returns a value for each row that corresponds to the resulting bucket the row falls into.\n", + "\n", + "For example, to create hourly buckets for the data on October 23rd:" + ] + }, + { + "cell_type": "code", + "execution_count": 27, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "WtKXFOfyH4zt", + "outputId": "c85f9f3a-fb18-4c15-8f51-25cc9662ea96" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "@timestamp | bucket \n", + "---------------------------------------------------\n", + "2023-10-23T12:15:03.360Z | 2023-10-23T12:00:00.000Z\n", + "2023-10-23T12:27:28.948Z | 2023-10-23T12:00:00.000Z\n", + "2023-10-23T13:33:34.937Z | 2023-10-23T13:00:00.000Z\n", + "2023-10-23T13:51:54.732Z | 2023-10-23T13:00:00.000Z\n", + "2023-10-23T13:52:55.015Z | 2023-10-23T13:00:00.000Z\n", + "2023-10-23T13:53:55.832Z | 2023-10-23T13:00:00.000Z\n", + "2023-10-23T13:55:01.543Z | 2023-10-23T13:00:00.000Z\n" + ] + } + ], + "source": [ + "esql_query = \"\"\"\n", + "FROM sample_data\n", + "| KEEP @timestamp\n", + "| EVAL bucket = AUTO_BUCKET (@timestamp, 24, \"2023-10-23T00:00:00Z\", \"2023-10-23T23:59:59Z\")\n", + "\"\"\"\n", + "response = client.esql.query(query=esql_query)\n", + "format_response(response)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "4lXgoIGDH8ZK" + }, + "source": [ + "Combine `AUTO_BUCKET` with [`STATS ... BY`](https://www.elastic.co/guide/en/elasticsearch/reference/current/esql-commands.html#esql-stats-by \"STATS ... BY\") to create a histogram. For example, to count the number of events per hour:" + ] + }, + { + "cell_type": "code", + "execution_count": 28, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "FgJHT0BpIEqj", + "outputId": "da6dacb1-5a29-4618-d974-60d1134af984" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "COUNT(*) | bucket \n", + "-----------------------------------\n", + "2 | 2023-10-23T12:00:00.000Z\n", + "5 | 2023-10-23T13:00:00.000Z\n" + ] + } + ], + "source": [ + "esql_query = \"\"\"\n", + "FROM sample_data\n", + "| KEEP @timestamp, event_duration\n", + "| EVAL bucket = AUTO_BUCKET (@timestamp, 24, \"2023-10-23T00:00:00Z\", \"2023-10-23T23:59:59Z\")\n", + "| STATS COUNT(*) BY bucket\n", + "\"\"\"\n", + "response = client.esql.query(query=esql_query)\n", + "format_response(response)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "LKmuoDfRIH67" + }, + "source": [ + "Or the median duration per hour:" + ] + }, + { + "cell_type": "code", + "execution_count": 29, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "gDHT5BHNIHQM", + "outputId": "9eceda80-1fb7-4834-c8c0-c047fdb14c0d" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "median_duration | bucket \n", + "------------------------------------------\n", + "3107561.0 | 2023-10-23T12:00:00.000Z\n", + "1756467.0 | 2023-10-23T13:00:00.000Z\n" + ] + } + ], + "source": [ + "esql_query = \"\"\"\n", + "FROM sample_data\n", + "| KEEP @timestamp, event_duration\n", + "| EVAL bucket = AUTO_BUCKET (@timestamp, 24, \"2023-10-23T00:00:00Z\", \"2023-10-23T23:59:59Z\")\n", + "| STATS median_duration = MEDIAN(event_duration) BY bucket\n", + "\"\"\"\n", + "response = client.esql.query(query=esql_query)\n", + "format_response(response)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ETZb10tIIKyO" + }, + "source": [ + "## Enrich data\n", + "\n", + "ES|QL enables you to [enrich](https://www.elastic.co/guide/en/elasticsearch/reference/current/esql-enrich-data.html \"Data enrichment\") a table with data from indices in Elasticsearch, using the [`ENRICH`](https://www.elastic.co/guide/en/elasticsearch/reference/current/esql-commands.html#esql-enrich \"ENRICH\") command.\n", + "\n", + "> ℹ️ Before you can use `ENRICH`, you first need to [create](https://www.elastic.co/guide/en/elasticsearch/reference/current/esql-enrich-data.html#esql-create-enrich-policy \"Create an enrich policy\") and [execute](https://www.elastic.co/guide/en/elasticsearch/reference/current/esql-enrich-data.html#esql-execute-enrich-policy \"Execute the enrich policy\") an [enrich policy](https://www.elastic.co/guide/en/elasticsearch/reference/current/esql-enrich-data.html#esql-enrich-policy). We provide a demo environment at [ela.st/ql](https://ela.st/ql/), where an enrich policy called `clientip_policy` has already been created and executed, if you just want to see how it works.\n", + "\n", + "The following requests create and execute a policy called `clientip_policy`. The policy links an IP address to an environment (\"Development\", \"QA\", or \"Production\").\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 30, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "2QNKgM--IQtJ", + "outputId": "2288d306-4a78-4d89-c275-a32d0fdb8b79" + }, + "outputs": [ + { + "data": { + "text/plain": [ + "ObjectApiResponse({'task': 'dGhch02RRVuQ2Xb57064RQ:3175562'})" + ] + }, + "execution_count": 30, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Define the mapping\n", + "mapping = {\n", + " \"mappings\": {\n", + " \"properties\": {\"client_ip\": {\"type\": \"keyword\"}, \"env\": {\"type\": \"keyword\"}}\n", + " }\n", + "}\n", + "\n", + "# Create the index with the mapping\n", + "client.indices.create(index=\"clientips\", body=mapping)\n", + "\n", + "# Prepare bulk data\n", + "bulk_data = [\n", + " {\"index\": {}},\n", + " {\"client_ip\": \"172.21.0.5\", \"env\": \"Development\"},\n", + " {\"index\": {}},\n", + " {\"client_ip\": \"172.21.2.113\", \"env\": \"QA\"},\n", + " {\"index\": {}},\n", + " {\"client_ip\": \"172.21.2.162\", \"env\": \"QA\"},\n", + " {\"index\": {}},\n", + " {\"client_ip\": \"172.21.3.15\", \"env\": \"Production\"},\n", + " {\"index\": {}},\n", + " {\"client_ip\": \"172.21.3.16\", \"env\": \"Production\"},\n", + "]\n", + "\n", + "# Bulk index the data\n", + "client.bulk(index=\"clientips\", body=bulk_data)\n", + "\n", + "# Define the enrich policy\n", + "policy = {\n", + " \"match\": {\n", + " \"indices\": \"clientips\",\n", + " \"match_field\": \"client_ip\",\n", + " \"enrich_fields\": [\"env\"],\n", + " }\n", + "}\n", + "\n", + "# Put the enrich policy\n", + "client.enrich.put_policy(name=\"clientip_policy\", body=policy)\n", + "\n", + "# Execute the enrich policy without waiting for completion\n", + "client.enrich.execute_policy(name=\"clientip_policy\", wait_for_completion=False)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "GtauD2gHJ683" + }, + "source": [ + "After creating and executing a policy, you can use it with the `ENRICH` command:" + ] + }, + { + "cell_type": "code", + "execution_count": 31, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "IHER1VM2KsqV", + "outputId": "31523029-d7a5-4624-f743-da3d3c18d1d9" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "@timestamp | event_duration | client_ip | env \n", + "----------------------------------------------------------------------\n", + "2023-10-23T12:15:03.360Z | 3450233 | 172.21.2.162 | QA \n", + "2023-10-23T12:27:28.948Z | 2764889 | 172.21.2.113 | QA \n", + "2023-10-23T13:33:34.937Z | 1232382 | 172.21.0.5 | Development\n", + "2023-10-23T13:51:54.732Z | 725448 | 172.21.3.15 | Production \n", + "2023-10-23T13:52:55.015Z | 8268153 | 172.21.3.15 | Production \n", + "2023-10-23T13:53:55.832Z | 5033755 | 172.21.3.15 | Production \n", + "2023-10-23T13:55:01.543Z | 1756467 | 172.21.3.15 | Production \n" + ] + } + ], + "source": [ + "esql_query = \"\"\"\n", + "FROM sample_data\n", + "| KEEP @timestamp, client_ip, event_duration\n", + "| EVAL client_ip = TO_STRING(client_ip)\n", + "| ENRICH clientip_policy ON client_ip WITH env\n", + "\"\"\"\n", + "response = client.esql.query(query=esql_query)\n", + "format_response(response)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "rWrJ99ReKvRT" + }, + "source": [ + "You can use the new `env` column that’s added by the `ENRICH` command in subsequent commands. For example, to calculate the median duration per environment:" + ] + }, + { + "cell_type": "code", + "execution_count": 32, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "9Fyz2JkZK47P", + "outputId": "041a5eb3-ec36-498f-ee78-2d65c0d7450c" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "median_duration | env \n", + "-----------------------------\n", + "3107561.0 | QA \n", + "1232382.0 | Development\n", + "3395111.0 | Production \n" + ] + } + ], + "source": [ + "esql_query = \"\"\"\n", + "FROM sample_data\n", + "| KEEP @timestamp, client_ip, event_duration\n", + "| EVAL client_ip = TO_STRING(client_ip)\n", + "| ENRICH clientip_policy ON client_ip WITH env\n", + "| STATS median_duration = MEDIAN(event_duration) BY env\n", + "\"\"\"\n", + "response = client.esql.query(query=esql_query)\n", + "format_response(response)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "YonYg0UhK8Rk" + }, + "source": [ + "For more about data enrichment with ES|QL, refer to [Data enrichment](https://www.elastic.co/guide/en/elasticsearch/reference/current/esql-enrich-data.html \"Data enrichment\")." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Fsjoqo-eK9qX" + }, + "source": [ + "## Process data\n", + "\n", + "Your data may contain unstructured strings that you want to [structure](https://www.elastic.co/guide/en/elasticsearch/reference/current/esql-process-data-with-dissect-and-grok.html \"Data processing with DISSECT and GROK\") to make it easier to analyze the data. For example, the sample data contains log messages like:\n", + "\n", + "```\n", + "\"Connected to 10.1.0.3\"\n", + "```\n", + "\n", + "By extracting the IP address from these messages, you can determine which IP has accepted the most client connections.\n", + "\n", + "To structure unstructured strings at query time, you can use the ES|QL [`DISSECT`](https://www.elastic.co/guide/en/elasticsearch/reference/current/esql-commands.html#esql-dissect \"DISSECT\") and [`GROK`](https://www.elastic.co/guide/en/elasticsearch/reference/current/esql-commands.html#esql-grok \"GROK\") commands. `DISSECT` works by breaking up a string using a delimiter-based pattern. `GROK` works similarly, but uses regular expressions. This makes `GROK` more powerful, but generally also slower.\n", + "\n", + "In this case, no regular expressions are needed, as the `message` is straightforward: \"Connected to \", followed by the server IP. To match this string, you can use the following `DISSECT` command:" + ] + }, + { + "cell_type": "code", + "execution_count": 38, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "kwX_AFgeLEvQ", + "outputId": "58b08ea5-cdd5-4ce0-cc2f-3d1db39b8ddb" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "@timestamp | client_ip | event_duration | message | server_ip\n", + "--------------------------------------------------------------------------------------------\n", + "2023-10-23T12:15:03.360Z | 172.21.2.162 | 3450233 | Connected to 10.1.0.3 | 10.1.0.3 \n", + "2023-10-23T12:27:28.948Z | 172.21.2.113 | 2764889 | Connected to 10.1.0.2 | 10.1.0.2 \n", + "2023-10-23T13:33:34.937Z | 172.21.0.5 | 1232382 | Disconnected | None \n", + "2023-10-23T13:51:54.732Z | 172.21.3.15 | 725448 | Connection error | None \n", + "2023-10-23T13:52:55.015Z | 172.21.3.15 | 8268153 | Connection error | None \n", + "2023-10-23T13:53:55.832Z | 172.21.3.15 | 5033755 | Connection error | None \n", + "2023-10-23T13:55:01.543Z | 172.21.3.15 | 1756467 | Connected to 10.1.0.1 | 10.1.0.1 \n" + ] + } + ], + "source": [ + "esql_query = \"\"\"\n", + "FROM sample_data\n", + "| DISSECT message \"Connected to %{server_ip}\"\n", + "\"\"\"\n", + "response = client.esql.query(query=esql_query)\n", + "format_response(response)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "1llLaCrWLTk_" + }, + "source": [ + "This adds a `server_ip` column to those rows that have a `message` that matches this pattern. For other rows, the value of `server_ip` is `null`.\n", + "\n", + "You can use the new `server_ip` column that’s added by the `DISSECT` command in subsequent commands. For example, to determine how many connections each server has accepted:" + ] + }, + { + "cell_type": "code", + "execution_count": 39, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "8_uEN-H5LaXH", + "outputId": "db674a50-6094-427f-8cee-d3e28a7260d0" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "COUNT(*) | server_ip\n", + "--------------------\n", + "1 | 10.1.0.3 \n", + "1 | 10.1.0.2 \n", + "1 | 10.1.0.1 \n" + ] + } + ], + "source": [ + "esql_query = \"\"\"\n", + "FROM sample_data\n", + "| WHERE STARTS_WITH(message, \"Connected to\")\n", + "| DISSECT message \"Connected to %{server_ip}\"\n", + "| STATS COUNT(*) BY server_ip\n", + "\"\"\"\n", + "response = client.esql.query(query=esql_query)\n", + "format_response(response)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "MtyvTCjyL6aC" + }, + "source": [ + "> ℹ️ To learn more about data processing with ES|QL, refer to [Data processing with DISSECT and GROK](https://www.elastic.co/guide/en/elasticsearch/reference/current/esql-process-data-with-dissect-and-grok.html \"Data processing with DISSECT and GROK\")." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "HRE5HF_SLdqY" + }, + "source": [ + "## Learn more\n", + "\n", + "To learn more about ES|QL, refer to:\n", + "- [_Learning ES|QL_](https://www.elastic.co/guide/en/elasticsearch/reference/current/esql-language.html \"Learning ES|QL\")\n", + "- [_Using ES|QL_](https://www.elastic.co/guide/en/elasticsearch/reference/current/esql-using.html \"Using ES|QL\")" + ] + } + ], + "metadata": { + "colab": { + "provenance": [] + }, + "kernelspec": { + "display_name": "Python 3", + "name": "python3" + }, + "language_info": { + "name": "python", + "version": "3.11.5" + } + }, + "nbformat": 4, + "nbformat_minor": 0 +}