
Update App Search Migration Notebook to Correct Some Errors and Warnings, and Add More Descriptive Text #348

Merged
merged 5 commits into main on Nov 8, 2024

Conversation

@markjhoy (Contributor) commented Nov 1, 2024

Updates the App Search migration notebook for the following:

  • Prints the synonym set created after it is created
  • Uses put_ruleset instead of put when adding the query rules (was throwing an error with the latest Elasticsearch client)
  • Breaks the index creation piece up into smaller code chunks, and adds descriptive text for what each part does
  • Adds a bit more clarity around the ELSER model name
  • Adds more description around the App Search query; how it's built and what the sections do.


gitnotebooks bot commented Nov 1, 2024

Found 1 changed notebook. Review the changes at https://app.gitnotebooks.com/elastic/elasticsearch-labs/pull/348

@kderusso (Member) left a comment

The changes look good!

The high level feedback I have though, outside of some of the specific feedback that I gave, is that we should consider pointing people to retrievers in these examples. This will be especially important for the rule query which will not work with rerankers (but the rule retriever will work with rerank retrievers such as RRF or semantic reranking). It wouldn't be too hard to migrate these from the query DSL to retriever syntax and it has the added advantage that it will be consistent with what we do in Playground. Thanks!
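For a concrete picture of that suggestion, here is a rough sketch of a rule retriever wrapping a standard retriever (the index, field, and criteria values below are illustrative assumptions, not the notebook's actual ones; rule retrievers require a recent Elasticsearch version):

    # Hedged sketch: rule retriever wrapping a standard retriever.
    # DEST_INDEX, the "title" field, and the criteria values are assumptions.
    response = elasticsearch.search(
        index=DEST_INDEX,
        retriever={
            "rule": {
                "retriever": {
                    "standard": {"query": {"match": {"title": "sample user search"}}}
                },
                "ruleset_ids": [ENGINE_NAME],
                "match_criteria": {"query": "sample user search"},
            }
        },
    )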

"metadata": {},
"outputs": [],
"source": [
"print(json.dumps(elasticsearch.synonyms.get_synonym(id=ENGINE_NAME).body, indent=2))"
Member:

I think this is a client bug, because GET synonyms is the appropriate call to use here; I would think this should be plural. (See documentation here)

Contributor Author:

I would tend to agree - but the client has it as get_synonym ... :/

[screenshot showing the client's get_synonym method]

Member:

Yeah, I ran through the notebook to verify; I would definitely say it's a bug in the specification.

@@ -186,8 +202,7 @@
" }\n",
" )\n",
"\n",
"\n",
"elasticsearch.query_ruleset.put(ruleset_id=ENGINE_NAME, rules=query_rules)"
"elasticsearch.query_rules.put_ruleset(ruleset_id=ENGINE_NAME, rules=query_rules)\n"
Member:

Thanks for updating - this was refactored when we added the CRUD API for individual rules.
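For reference, a minimal sketch of the updated client call (the rule definition here is illustrative, not the notebook's exact rules):

    # Hedged sketch of query_rules.put_ruleset; the rule contents are assumptions.
    query_rules = [
        {
            "rule_id": "rule-1",
            "type": "pinned",
            "criteria": [
                {"type": "exact", "metadata": "query", "values": ["sample user search"]}
            ],
            "actions": {"ids": ["doc-1"]},
        }
    ]
    elasticsearch.query_rules.put_ruleset(ruleset_id=ENGINE_NAME, rules=query_rules)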

"cell_type": "markdown",
"metadata": {},
"source": [
"First, we'll start by defining our source and destination indices. We'll also ensure that if the destination index exists, to delete it first so we start fresh."
Member:

Suggested change
"First, we'll start by defining our source and destination indices. We'll also ensure that if the destination index exists, to delete it first so we start fresh."
"First, we'll start by defining our source and destination indices. We'll also ensure the destination index is deleted if it exists, so that we start fresh."

"cell_type": "markdown",
"metadata": {},
"source": [
"Next, we'll create our settings which includes filters and and analyzers to use for our text fields.\n",
Member:

Out of the scope of this PR, but I feel like this could be a good documentation guide outside of a notebook, that we could link to here. Similar to some of the other startup guides that @leemthompo has been creating.

Contributor Author:

Yep - I have a forthcoming blog post that I was planning on diving into this a bit more as well.
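In the meantime, the kind of analysis settings that cell builds looks roughly like this (a hedged sketch; the filter and analyzer shown are modeled on App Search's defaults, not necessarily the notebook's exact definitions):

    # Hedged sketch of App Search-style analysis settings.
    index_settings = {
        "analysis": {
            "filter": {
                "front_ngram": {"type": "edge_ngram", "min_gram": 1, "max_gram": 12}
            },
            "analyzer": {
                "i_prefix": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["cjk_width", "lowercase", "asciifolding", "front_ngram"],
                }
            },
        }
    }
    elasticsearch.indices.create(index=DEST_INDEX, settings=index_settings)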

@@ -711,7 +882,9 @@
"\n",
"If you [enabled and reindexed your data with ELSER](#add-sparse_vector-fields-for-semantic-search-optional), we can now use this to do semantic search.\n",
"For each `spare_vector` we will generate a `text_expansion` query. These `text_expansion` queries will be added as `should` clauses to a top-level `bool` query.\n",
"We also use `min_score` because we want to exclude less relevant results. "
"We also use `min_score` because we want to exclude less relevant results. \n",
Member:

We shouldn't be generating text_expansion here if we can help it, we should use sparse_vector or better yet semantic queries instead.

Reply:

It's all good on my side if @markjhoy wants to follow up on this in a follow-up PR or do it here.
Changing from text_expansion to the semantic query will require changing the section on how we index these fields as well; right now the notebook has a guide on using ingest pipelines to run ELSER with text_expansion.
Using semantic_text instead would be much better; we can also point users to E5 when dealing with a non-English dataset.
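For comparison, the semantic query being suggested reduces the search side to something like this (a hedged sketch; the field name is an assumption):

    # Hedged sketch: querying a semantic_text field with the `semantic` query.
    response = elasticsearch.search(
        index=DEST_INDEX,
        query={
            "semantic": {
                "field": "description_semantic",  # assumed field name
                "query": "sample user search",
            }
        },
    )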

@ioanatia left a comment

I think the changes look good, provided we also address the feedback from Kathleen.
We can make the changes to move from ingest pipelines with ELSER to semantic_text in a follow-up PR or here; it's up to you.

I have not looked into whether the CI failures are related.

@markjhoy (Contributor Author) commented Nov 5, 2024

I have not looked into whether the CI failures are related.

Ugh - it looks like the CI failures are version-specific issues... I'll have to see if I can either (a) lock the notebook to 8.16+, or (b) make the code version-specific for backwards compatibility...

@markjhoy (Contributor Author) commented Nov 5, 2024
Oh wait - never mind this... the errors we are seeing in the CI are actually from other notebooks, namely:

  • notebooks/document-chunking/with-index-pipelines.ipynb... Failed
  • notebooks/document-chunking/with-langchain-splitters.ipynb... Failed
  • notebooks/integrations/hugging-face/loading-model-from-hugging-face.ipynb... Failed
  • notebooks/langchain/langchain-using-own-model.ipynb... Failed

It appears that our migration notebook is fine...

@markjhoy (Contributor Author) commented Nov 7, 2024

I've updated the notebook to use semantic_text search. That new field type makes it so easy to do semantic search now.

@kderusso and @ioanatia

@kderusso (Member) left a comment

Looks good, thanks for iterating!

I left some feedback, but it is non-blocking. I do think we should consider being retrievers-first here, but I defer to you and the ent search deprecation team on the final decision.

"\n",
"One of the advantages of having our exported index directly in Elasticsearch is that we can easily take advantage of doing semantic search with ELSER. To do this, we'll need to add a `sparse_vector` field to our index, set up an ingest pipeline, and reindex our data.\n",
"One of the advantages of having our exported index directly in Elasticsearch is that we can easily take advantage of doing semantic search with ELSER. To do this, we'll need to add an inference endpoint using ELSER, and a `semantic_text` field to our index to use it.\n",
Member:

Nit: Trying to avoid two uses of advantage in the same sentence

Suggested change
"One of the advantages of having our exported index directly in Elasticsearch is that we can easily take advantage of doing semantic search with ELSER. To do this, we'll need to add an inference endpoint using ELSER, and a `semantic_text` field to our index to use it.\n",
"One of the advantages of exporting our index directly to Elasticsearch is that we can easily perform semantic search with ELSER. To do this, we'll need to add an inference endpoint using ELSER, and a `semantic_text` field to our index to use it.\n",

" inference_config={\n",
" \"service\": \"elasticsearch\",\n",
" \"service_settings\": {\n",
" \"model_id\": \".elser_model_2_linux-x86_64\",\n",
Member:

Should we add a note that this will not work on aarch64? Or is that moot because we're using cloud for the examples?

Contributor Author:

Yeah - since it's cloud - it's a moot point.
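Putting the pieces from this diff together, the setup looks roughly like the following (a hedged sketch; the inference endpoint id and field name are assumptions):

    # Hedged sketch: create an ELSER inference endpoint...
    elasticsearch.inference.put(
        task_type="sparse_embedding",
        inference_id="my-elser-endpoint",  # assumed id
        inference_config={
            "service": "elasticsearch",
            "service_settings": {
                "model_id": ".elser_model_2_linux-x86_64",
                "num_allocations": 1,
                "num_threads": 1,
            },
        },
    )

    # ...and map a semantic_text field that uses it.
    elasticsearch.indices.put_mapping(
        index=DEST_INDEX,
        properties={
            "description_semantic": {  # assumed field name
                "type": "semantic_text",
                "inference_id": "my-elser-endpoint",
            }
        },
    )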

]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Reindex the data\n",
"Now that we have created the Elasticsearch index and the ingest pipeline, it's time to reindex our data in the new index. The pipeline definition we created above will create a field for each of the `SPARSE_VECTOR_FIELDS` we defined with a `_semantic` suffix, and then infer the sparse vector values from ELSER as the reindex takes place."
"Now that we have created the Elasticsearch index, it's time to reindex our data in the new index. If you are using the `semantic_text` fields as defined above with a `_semantic` suffix, and then the reindexing process with automatically infer the sparse vector values from ELSER and use those for the vectors as the reindex takes place."
Member:

Suggested change
"Now that we have created the Elasticsearch index, it's time to reindex our data in the new index. If you are using the `semantic_text` fields as defined above with a `_semantic` suffix, and then the reindexing process with automatically infer the sparse vector values from ELSER and use those for the vectors as the reindex takes place."
"Now that we have created the Elasticsearch index, it's time to reindex our data in the new index. If you are using the `semantic_text` fields as defined above with a `_semantic` suffix, and then the reindexing process will automatically infer the sparse vector values from ELSER and use those for the vectors as the reindex takes place."

@@ -658,7 +816,7 @@
" ]\n",
" }\n",
" },\n",
" \"ruleset_id\": ENGINE_NAME,\n",
" \"ruleset_ids\": ENGINE_NAME,\n",
Member:

ruleset_ids needs to be an array of strings
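In other words, something like this (the criteria and organic query are illustrative):

    # Hedged sketch: ruleset_ids takes a list of strings, not a bare string.
    rule_query = {
        "rule": {
            "ruleset_ids": [ENGINE_NAME],
            "match_criteria": {"query": "sample user search"},  # assumed criteria
            "organic": {"match": {"title": "sample user search"}},  # assumed organic query
        }
    }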

"print(f\"Query results:\\n{json.dumps(results.body, indent=2)}\\n\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### How to combine App Search queries with ELSER\n",
"### How to combine App Search queries with Semantic Text\n",
Member:

Do we want this to be capitalized?

"\n",
"app_search_query_payload = {\n",
" \"query\": {\n",
" \"rule_query\": {\n",
" \"rule\": {\n",
Member:

It would have been nice to use retrievers here, perhaps we could wrap this query under a standard retriever (or a rule retriever!) or even use RRF?
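One possible shape for that, sketched under the assumption of a semantic_text field named description_semantic (none of these names come from the notebook itself):

    # Hedged sketch: a rule retriever wrapping an RRF retriever that combines
    # the lexical (organic) query with a semantic query.
    response = elasticsearch.search(
        index=DEST_INDEX,
        retriever={
            "rule": {
                "retriever": {
                    "rrf": {
                        "retrievers": [
                            {
                                "standard": {
                                    "query": {"match": {"title": "sample user search"}}
                                }
                            },
                            {
                                "standard": {
                                    "query": {
                                        "semantic": {
                                            "field": "description_semantic",
                                            "query": "sample user search",
                                        }
                                    }
                                }
                            },
                        ]
                    }
                },
                "ruleset_ids": [ENGINE_NAME],
                "match_criteria": {"query": "sample user search"},
            }
        },
    )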

@markjhoy (Contributor Author) commented Nov 7, 2024

I left some feedback, but it is non-blocking. I do think we should consider being retrievers-first here, but I defer to you and the ent search deprecation team on the final decision.

Thanks @kderusso! Yeah - for the time being, I'd like to get this version out now and come back to do a pass using retrievers. I still need to brush up on them a bit myself, but at least this version with the semantic_text fields is a solid upgrade.

@markjhoy (Contributor Author) commented Nov 8, 2024

The failures in the CI are from other notebooks and not this (isolated) one. Going to merge this.

@markjhoy merged commit 90891b7 into main on Nov 8, 2024
3 of 5 checks passed