Chunking notebooks: mention semantic_text (#280)

* Chunking notebooks: mention semantic_text
* refer to 8.15
* add link to notebook
elastic · Sep 18, 2024 · b59f3c7
1 parent 83da04a
commit b59f3c7

Showing 3 changed files with 24 additions and 3 deletions.
diff --git a/notebooks/document-chunking/tokenization.ipynb b/notebooks/document-chunking/tokenization.ipynb
@@ -15,7 +15,14 @@
     "\n",
     "For users of Elasticsearch it is important to know how texts are broken up into tokens because currently only the [first 512 tokens per field](https://www.elastic.co/guide/en/machine-learning/8.12/ml-nlp-limitations.html#ml-nlp-elser-v1-limit-512) are considered. This means that when you index longer texts, all tokens after the 512th are ignored in your semantic search. Hence it is valuable to know the number of tokens for your input texts before choosing the right model and indexing method.\n",
     "\n",
-    "Currently it is not possible to get the token count information via the API, so here we share the code for calculating token counts. This notebook also shows how to break longer text up into chunks of the right size so that no information is lost during indexing. Currently (as of version 8.12) this has to be done by the user. Future versions will remove this necessity and Elasticsearch will automatically create chunks behind the scenes."
+    "Currently it is not possible to get the token count information via the API, so here we share the code for calculating token counts. This notebook also shows how to break longer text up into chunks of the right size so that no information is lost during indexing.\n",
+    "\n",
+    "# Prefer the `semantic_text` field type\n",
+    "\n",
+    "**Elasticsearch version 8.15 introduced the [`semantic_text`](https://www.elastic.co/guide/en/elasticsearch/reference/master/semantic-text.html) field type which handles the chunking process behind the scenes. Before continuing with this notebook, we highly recommend looking into this:**\n",
+    "\n",
+    "- **<https://www.elastic.co/search-labs/blog/semantic-search-simplified-semantic-text>**\n",
+    "- **<https://github.com/elastic/elasticsearch-labs/blob/main/notebooks/search/09-semantic-text.ipynb>**"
    ]
   },
   {
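
Note on the tokenization notebook text above: the 512-token limit and the manual chunking it describes can be illustrated with a minimal sketch. This is not the notebook's code. It assumes a whitespace tokenizer as a stand-in for the model's real subword tokenizer, and the names `whitespace_tokenize`, `chunk_tokens`, and the `overlap` parameter are hypothetical, chosen only for illustration.

```python
from typing import Iterator, List

MAX_TOKENS = 512  # per-field limit quoted in the notebook text


def whitespace_tokenize(text: str) -> List[str]:
    # Stand-in tokenizer (assumption): a real model uses subword tokens,
    # so real token counts are usually higher than a whitespace split.
    return text.split()


def chunk_tokens(
    tokens: List[str], max_tokens: int = MAX_TOKENS, overlap: int = 0
) -> Iterator[List[str]]:
    """Yield consecutive windows of at most `max_tokens` tokens.

    `overlap` tokens are repeated between neighbouring windows so that
    context at a chunk boundary is not lost entirely.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must be in [0, max_tokens)")
    step = max_tokens - overlap
    for start in range(0, len(tokens), step):
        yield tokens[start:start + max_tokens]
        # Stop once the current window has reached the end of the input.
        if start + max_tokens >= len(tokens):
            break


if __name__ == "__main__":
    tokens = whitespace_tokenize("word " * 1200)
    print(len(tokens), [len(chunk) for chunk in chunk_tokens(tokens)])
```

With a real tokenizer, the same windowing would be applied to the model's token IDs instead of whitespace-separated words, and each window decoded back to text before indexing.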

diff --git a/notebooks/document-chunking/with-index-pipelines.ipynb b/notebooks/document-chunking/with-index-pipelines.ipynb
@@ -13,7 +13,14 @@
     "This interactive notebook will:\n",
     "- load the model \"sentence-transformers__all-minilm-l6-v2\" from Hugging Face and into Elasticsearch ML Node\n",
     "- create an index and ingest pipeline that will chunk large fields into smaller passages and vectorize them using the model\n",
-    "- perform a search and return docs with the most relevant passages"
+    "- perform a search and return docs with the most relevant passages\n",
+    "\n",
+    "# Prefer the `semantic_text` field type\n",
+    "\n",
+    "**Elasticsearch version 8.15 introduced the [`semantic_text`](https://www.elastic.co/guide/en/elasticsearch/reference/master/semantic-text.html) field type which handles the chunking process behind the scenes. Before continuing with this notebook, we highly recommend looking into this:**\n",
+    "\n",
+    "- **<https://www.elastic.co/search-labs/blog/semantic-search-simplified-semantic-text>**\n",
+    "- **<https://github.com/elastic/elasticsearch-labs/blob/main/notebooks/search/09-semantic-text.ipynb>**"
    ]
   },
   {

diff --git a/notebooks/document-chunking/with-langchain-splitters.ipynb b/notebooks/document-chunking/with-langchain-splitters.ipynb
@@ -12,7 +12,14 @@
     "This interactive notebook will:\n",
     "- load the model \"sentence-transformers__all-minilm-l6-v2\" from Hugging Face and into Elasticsearch ML Node\n",
     "- Use LangChain splitters to chunk the passages into sentences and index them into Elasticsearch with nested dense vector\n",
-    "- perform a search and return docs with the most relevant passages"
+    "- perform a search and return docs with the most relevant passages\n",
+    "\n",
+    "# Prefer the `semantic_text` field type\n",
+    "\n",
+    "**Elasticsearch version 8.15 introduced the [`semantic_text`](https://www.elastic.co/guide/en/elasticsearch/reference/master/semantic-text.html) field type which handles the chunking process behind the scenes. Before continuing with this notebook, we highly recommend looking into this:**\n",
+    "\n",
+    "- **<https://www.elastic.co/search-labs/blog/semantic-search-simplified-semantic-text>**\n",
+    "- **<https://github.com/elastic/elasticsearch-labs/blob/main/notebooks/search/09-semantic-text.ipynb>**"
    ]
   },
   {
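
Note on the `semantic_text` recommendation added to all three notebooks: a minimal sketch of what the request bodies might look like is given here for orientation. It only builds plain dictionaries and sends nothing to a cluster. The index field name `content` and the inference endpoint ID `my-elser-endpoint` are hypothetical placeholders, and the helper function names are illustrative rather than part of any client library. The linked reference page and notebook remain the authoritative source.

```python
import json


def semantic_text_mapping(field: str, inference_id: str) -> dict:
    # Index mapping body with a single semantic_text field.
    # `inference_id` must name an inference endpoint that already exists
    # in the cluster (assumption: it was created beforehand).
    return {
        "mappings": {
            "properties": {
                field: {
                    "type": "semantic_text",
                    "inference_id": inference_id,
                }
            }
        }
    }


def semantic_query(field: str, query_text: str) -> dict:
    # Search body using the `semantic` query against a semantic_text field.
    return {"query": {"semantic": {"field": field, "query": query_text}}}


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    mapping = semantic_text_mapping("content", "my-elser-endpoint")
    query = semantic_query("content", "how are long documents chunked?")
    print(json.dumps(mapping, indent=2))
    print(json.dumps(query, indent=2))
```

With this field type, documents are indexed with the full text in the field and chunking happens at ingest time, so the manual pipeline or splitter steps in these notebooks are not needed for the common case.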