Update tutorial text.
egoebelbecker committed Oct 26, 2023

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.
1 parent 689f73f commit 0716ec4
Showing 1 changed file with 37 additions and 9 deletions.
46 changes: 37 additions & 9 deletions solutions/nlp/recommender_system/recommender_system.ipynb
@@ -237,24 +237,54 @@
"source": [
"So, with the metadata stored in Redis, it's time to calculate the embeddings and add them to Milvus.\n",
"\n",
"First, you need a collection to store them in. Create a simple one that stores the movie ID and embeddings in the **Movies** field.\n",
"First, you need a collection to store them in. Create a simple one that stores each film's title and embedding and also allows dynamic fields. You'll use the dynamic fields for metadata.\n",
"\n",
"Then, you'll index that field to make searches more efficient."
"Then, you'll index the embedding field to make searches more efficient."
]
},
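To show how dynamic fields can carry the metadata, here is a minimal sketch, assuming the schema in this commit (a VARCHAR `title` primary key, a 384-dimension `embedding` vector, and `enable_dynamic_field=True`). The helper `to_row` and the `INDEX_PARAMS` values are hypothetical; the diff does not show the notebook's insert code or its actual index settings.

```python
from typing import Any, Dict, Sequence

# Matches dim=384 in the commit's FieldSchema for 'embedding'.
EMBEDDING_DIM = 384

# Hypothetical index settings for the 'embedding' field; the notebook's real
# index type and metric are not shown in this diff.
INDEX_PARAMS: Dict[str, Any] = {
    "index_type": "IVF_FLAT",
    "metric_type": "L2",
    "params": {"nlist": 128},
}


def to_row(movie: Dict[str, Any], embedding: Sequence[float]) -> Dict[str, Any]:
    """Build one row-style record (hypothetical helper).

    'title' and 'embedding' are declared schema fields. The remaining keys are
    not declared, so they are only accepted because the collection schema
    enables dynamic fields.
    """
    if len(embedding) != EMBEDDING_DIM:
        raise ValueError(f"expected {EMBEDDING_DIM} dims, got {len(embedding)}")
    return {
        "title": movie["title"],
        "embedding": list(embedding),
        # Metadata carried as dynamic fields (missing values default to "").
        "overview": movie.get("overview", ""),
        "release_date": movie.get("release_date", ""),
        "genres": movie.get("genres", ""),
    }
```

Against a live collection in recent pymilvus 2.x releases, a list of rows like these could be passed to `collection.insert(...)`, and the index built with `collection.create_index(field_name='embedding', index_params=INDEX_PARAMS)`. Keys that the schema does not declare (`overview`, `release_date`, `genres`) end up in Milvus's reserved dynamic field (`$meta`) instead of being rejected.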
{
"cell_type": "code",
"execution_count": null,
"execution_count": 70,
"id": "aa7ab317-a6f9-48bf-9c80-1792537c99ab",
"metadata": {},
"outputs": [],
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Collection created.\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"alloc_timestamp unimplemented, ignore it\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Collection indexed!\n"
]
}
],
"source": [
"COLLECTION_NAME = 'film_vectors'\n",
"PARTITION_NAME = 'Movie'\n",
"\n",
"# Here's our record schema\n",
"\"\"\"\n",
"\"title\": Film title,\n",
"\"overview\": description,\n",
"\"release_date\": film release date,\n",
"\"genres\": film genres,\n",
"\"embedding\": embedding\n",
"\"\"\"\n",
"\n",
"id = FieldSchema(name='title', dtype=DataType.VARCHAR, max_length=500, is_primary=True)\n",
"field = FieldSchema(name='embedding', dtype=DataType.FLOAT_VECTOR, dim=384)\n",
"#meta = FieldSchema(name='Meta', dtype=DataType.JSON)\n",
"\n",
"schema = CollectionSchema(fields=[id, field], description=\"movie recommender: film vectors\", enable_dynamic_field=True)\n",
"\n",
@@ -329,9 +359,7 @@
"id": "447355dd-b82b-4660-b192-f614918901fa",
"metadata": {},
"source": [
"Now, you can create the embeddings. This dataset is too large to send to Milvus in a single insert statement, but sending them one at a time would create unnecessary network traffic and add too much time. So, this code uses batches. You can play with the batch size to suit your individual needs and preferences.\n",
"\n",
"A few movies will fail for ids that cannot be cast to integers. You could fix this above with a schema change or by verifying their format. "
"Now, you can create the embeddings. This dataset is too large to send to Milvus in a single insert request, but sending records one at a time would add a network round trip per movie and slow the load considerably. So, this code inserts in batches. You can adjust the batch size to suit your environment."
]
},
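The batching idea can be sketched with a small generic helper. This is an illustration rather than the notebook's code: `batched` is a hypothetical name, and any batch size used with it is your choice.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield lists of at most batch_size items; the last batch may be shorter."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch  # a full batch is ready to insert
            batch = []
    if batch:
        yield batch  # leftover items that did not fill a whole batch
```

With a connected collection, the loop would look like `for rows in batched(all_rows, 500): collection.insert(rows)`, where `all_rows` and the size 500 are hypothetical. Batching 1,050 rows with a batch size of 500 produces three inserts of 500, 500 and 50 rows. Larger batches mean fewer round trips, but each request still has to fit within the server's message size limit. Python 3.12 adds `itertools.batched`, which yields tuples instead of lists.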
{
@@ -374,7 +402,7 @@
"\n",
"First, you need a transformer to convert the user's search string to an embedding. For this, **embed_search** takes their criteria and passes it to the same transformer you used to populate Milvus.\n",
"\n",
"Milvus will return a set of movie ids. You need to use them to retrieve data about those ids from Redis. This happens in **collate_results**.\n",
"By requesting the title and overview fields in the result set, you can print the results directly for the user.\n",
"\n",
"Finally, **search_for_movies** performs the actual vector search, using the other two functions for support."
]
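The search flow described above can be illustrated without a Milvus server. The sketch below swaps Milvus's indexed search for a brute-force L2 scan over in-memory records, purely to show the shape of the results: only the requested output fields come back, so they can be printed directly. `search_sketch`, `format_results`, the L2 metric, and the toy vectors are assumptions; in the notebook the query vector comes from **embed_search** and the records live in Milvus.

```python
import math
from typing import Any, Dict, List, Sequence


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def search_sketch(
    query: Sequence[float],
    records: List[Dict[str, Any]],
    limit: int,
    output_fields: List[str],
) -> List[Dict[str, Any]]:
    """Brute-force stand-in for a vector search (hypothetical helper).

    Ranks every record by L2 distance to the query and returns only the
    requested output fields plus the distance, closest first.
    """
    scored = sorted(
        ((l2_distance(query, r["embedding"]), r) for r in records),
        key=lambda pair: pair[0],  # sort on distance only
    )
    hits = []
    for distance, record in scored[:limit]:
        hit = {field: record.get(field) for field in output_fields}
        hit["distance"] = distance
        hits.append(hit)
    return hits


def format_results(hits: List[Dict[str, Any]]) -> str:
    """One line per hit: 'title: overview'."""
    return "\n".join(f"{h['title']}: {h['overview']}" for h in hits)
```

In pymilvus, the corresponding call is `Collection.search`, which takes the query vectors, `anns_field`, the search `param`, `limit`, and an `output_fields` list such as `['title', 'overview']`; each returned hit exposes those values through `hit.entity.get(...)`.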
