Commit
Milvus-doc-bot authored and Milvus-doc-bot committed Sep 23, 2024
1 parent c4eaf3b, commit 21c1839
Showing 85 changed files with 3,432 additions and 12 deletions.
1 change: 1 addition & 0 deletions
localization/v2.4.x/site/de/embeddings/embed-with-instructor.json
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
{"codeList":["pip install --upgrade pymilvus\npip install \"pymilvus[model]\"\n","from pymilvus.model.dense import InstructorEmbeddingFunction\n\nef = InstructorEmbeddingFunction(\n model_name=\"hkunlp/instructor-xl\", # Defaults to `hkunlp/instructor-xl`\n query_instruction=\"Represent the question for retrieval:\",\n doc_instruction=\"Represent the document for retrieval:\"\n)\n","docs = [\n \"Artificial intelligence was founded as an academic discipline in 1956.\",\n \"Alan Turing was the first person to conduct substantial research in AI.\",\n \"Born in Maida Vale, London, Turing was raised in southern England.\",\n]\n\ndocs_embeddings = ef.encode_documents(docs)\n\n# Print embeddings\nprint(\"Embeddings:\", docs_embeddings)\n# Print dimension and shape of embeddings\nprint(\"Dim:\", ef.dim, docs_embeddings[0].shape)\n","Embeddings: [array([ 1.08575663e-02, 3.87877878e-03, 3.18090729e-02, -8.12458917e-02,\n -4.68971021e-02, -5.85585833e-02, -5.95418774e-02, -8.55880603e-03,\n -5.54775111e-02, -6.08020350e-02, 1.76202394e-02, 1.06648318e-02,\n -5.89960292e-02, -7.46861771e-02, 6.60329172e-03, -4.25189249e-02,\n ...\n -1.26921125e-02, 3.01475357e-02, 8.25323071e-03, -1.88470203e-02,\n 6.04814291e-03, -2.81618331e-02, 5.91602828e-03, 7.13866428e-02],\n dtype=float32)]\nDim: 768 (768,)\n","queries = [\"When was artificial intelligence founded\",\n \"Where was Alan Turing born?\"]\n\nquery_embeddings = ef.encode_queries(queries)\n\nprint(\"Embeddings:\", query_embeddings)\nprint(\"Dim\", ef.dim, query_embeddings[0].shape)\n","Embeddings: [array([ 1.21721877e-02, 1.88485277e-03, 3.01732980e-02, -8.10302645e-02,\n -6.13401756e-02, -3.98149453e-02, -5.18723316e-02, -6.76784338e-03,\n -6.59285188e-02, -5.38365729e-02, -5.13435388e-03, -2.49210224e-02,\n -5.74403182e-02, -7.03031123e-02, 6.63730130e-03, -3.42259370e-02,\n ...\n 7.36595877e-03, 2.85532661e-02, -1.55952033e-02, 2.13342719e-02,\n 1.51187545e-02, -2.82798670e-02, 2.69396193e-02, 6.16136603e-02],\n 
dtype=float32)]\nDim 768 (768,)\n"],"headingContent":"Instructor","anchorList":[{"label":"Instructor","href":"Instructor","type":1,"isActive":false}]} |
92 changes: 92 additions & 0 deletions
localization/v2.4.x/site/de/embeddings/embed-with-instructor.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,92 @@ | ||
--- | ||
id: embed-with-instructor.md | ||
order: 10 | ||
summary: >- | ||
This article describes how to use the InstructorEmbeddingFunction to encode | ||
documents and queries using the Instructor embedding model. | ||
title: Instructor | ||
--- | ||
<h1 id="Instructor" class="common-anchor-header">Instructor<button data-href="#Instructor" class="anchor-icon" translate="no"> | ||
<svg translate="no" | ||
aria-hidden="true" | ||
focusable="false" | ||
height="20" | ||
version="1.1" | ||
viewBox="0 0 16 16" | ||
width="16" | ||
> | ||
<path | ||
fill="#0092E4" | ||
fill-rule="evenodd" | ||
d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z" | ||
></path> | ||
</svg> | ||
</button></h1><p><a href="https://instructor-embedding.github.io/">Instructor</a> is an instruction-finetuned text embedding model that can generate text embeddings tailored to any task (e.g., classification, retrieval, clustering, text evaluation) and domain (e.g., science, finance) when given only the task instruction, without any fine-tuning.</p> | ||
<p>Milvus integrates with Instructor's embedding models through the InstructorEmbeddingFunction class. This class provides methods for encoding documents and queries with the Instructor embedding models and returns the embeddings as dense vectors compatible with Milvus indexing.</p> | ||
<p>To use this feature, install the necessary dependencies:</p> | ||
<pre><code translate="no" class="language-python">pip install --upgrade pymilvus | ||
pip install <span class="hljs-string">"pymilvus[model]"</span> | ||
<button class="copy-code-btn"></button></code></pre> | ||
<p>Then, instantiate the InstructorEmbeddingFunction:</p> | ||
<pre><code translate="no" class="language-python"><span class="hljs-keyword">from</span> pymilvus.model.dense <span class="hljs-keyword">import</span> InstructorEmbeddingFunction | ||
|
||
ef = InstructorEmbeddingFunction( | ||
model_name=<span class="hljs-string">"hkunlp/instructor-xl"</span>, <span class="hljs-comment"># Defaults to `hkunlp/instructor-xl`</span> | ||
query_instruction=<span class="hljs-string">"Represent the question for retrieval:"</span>, | ||
doc_instruction=<span class="hljs-string">"Represent the document for retrieval:"</span> | ||
) | ||
<button class="copy-code-btn"></button></code></pre> | ||
<p><strong>Parameters</strong>:</p> | ||
<ul> | ||
<li><p><code translate="no">model_name</code> <em>(string)</em></p> | ||
<p>The name of the Instructor embedding model to use for encoding. The value defaults to <code translate="no">hkunlp/instructor-xl</code>. For more information, refer to the <a href="https://github.com/xlang-ai/instructor-embedding?tab=readme-ov-file#model-list">Model List</a>.</p></li> | ||
<li><p><code translate="no">query_instruction</code> <em>(string)</em></p> | ||
<p>A task-specific instruction that guides the model on how to generate an embedding for a query or question.</p></li> | ||
<li><p><code translate="no">doc_instruction</code> <em>(string)</em></p> | ||
<p>A task-specific instruction that guides the model on how to generate an embedding for a document.</p></li> | ||
</ul> | ||
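As a rough illustration of what the two instruction parameters do: the underlying Instructor models take each input as an instruction/text pair, so queries and documents can be embedded under different task descriptions. The sketch below only builds those pairs in plain Python. `pair_with_instruction` is a hypothetical helper that is not part of pymilvus, and how InstructorEmbeddingFunction prepares its inputs internally is an assumption here.

```python
from typing import List

# Instruction strings taken from the instantiation example above.
QUERY_INSTRUCTION = "Represent the question for retrieval:"
DOC_INSTRUCTION = "Represent the document for retrieval:"


def pair_with_instruction(instruction: str, texts: List[str]) -> List[List[str]]:
    """Hypothetical helper: attach one task instruction to every input text."""
    return [[instruction, text] for text in texts]


query_pairs = pair_with_instruction(
    QUERY_INSTRUCTION, ["Where was Alan Turing born?"]
)
doc_pairs = pair_with_instruction(
    DOC_INSTRUCTION,
    ["Born in Maida Vale, London, Turing was raised in southern England."],
)

# Each entry is [instruction, text]; the same text paired with a different
# instruction would be embedded differently by an instruction-tuned model.
print(query_pairs[0])
print(doc_pairs[0])
```

Changing either instruction string is the intended way to adapt the embeddings to another task or domain without retraining the model.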
<p>To create embeddings for documents, use the <code translate="no">encode_documents()</code> method:</p> | ||
<pre><code translate="no" class="language-python">docs = [ | ||
<span class="hljs-string">"Artificial intelligence was founded as an academic discipline in 1956."</span>, | ||
<span class="hljs-string">"Alan Turing was the first person to conduct substantial research in AI."</span>, | ||
<span class="hljs-string">"Born in Maida Vale, London, Turing was raised in southern England."</span>, | ||
] | ||
|
||
docs_embeddings = ef.encode_documents(docs) | ||
|
||
<span class="hljs-comment"># Print embeddings</span> | ||
<span class="hljs-built_in">print</span>(<span class="hljs-string">"Embeddings:"</span>, docs_embeddings) | ||
<span class="hljs-comment"># Print dimension and shape of embeddings</span> | ||
<span class="hljs-built_in">print</span>(<span class="hljs-string">"Dim:"</span>, ef.dim, docs_embeddings[<span class="hljs-number">0</span>].shape) | ||
<button class="copy-code-btn"></button></code></pre> | ||
<p>The expected output is similar to the following:</p> | ||
<pre><code translate="no" class="language-python">Embeddings: [array([ <span class="hljs-number">1.08575663e-02</span>, <span class="hljs-number">3.87877878e-03</span>, <span class="hljs-number">3.18090729e-02</span>, <span class="hljs-number">-8.12458917e-02</span>, | ||
<span class="hljs-number">-4.68971021e-02</span>, <span class="hljs-number">-5.85585833e-02</span>, <span class="hljs-number">-5.95418774e-02</span>, <span class="hljs-number">-8.55880603e-03</span>, | ||
<span class="hljs-number">-5.54775111e-02</span>, <span class="hljs-number">-6.08020350e-02</span>, <span class="hljs-number">1.76202394e-02</span>, <span class="hljs-number">1.06648318e-02</span>, | ||
<span class="hljs-number">-5.89960292e-02</span>, <span class="hljs-number">-7.46861771e-02</span>, <span class="hljs-number">6.60329172e-03</span>, <span class="hljs-number">-4.25189249e-02</span>, | ||
... | ||
<span class="hljs-number">-1.26921125e-02</span>, <span class="hljs-number">3.01475357e-02</span>, <span class="hljs-number">8.25323071e-03</span>, <span class="hljs-number">-1.88470203e-02</span>, | ||
<span class="hljs-number">6.04814291e-03</span>, <span class="hljs-number">-2.81618331e-02</span>, <span class="hljs-number">5.91602828e-03</span>, <span class="hljs-number">7.13866428e-02</span>], | ||
dtype=<span class="hljs-type">float32</span>)] | ||
Dim: <span class="hljs-number">768</span> (<span class="hljs-number">768</span>,) | ||
<button class="copy-code-btn"></button></code></pre> | ||
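The output above shows one 768-dimensional vector per document. Because a Milvus vector field is declared with a fixed dimension, it can help to verify the vector lengths before inserting anything. The following is a minimal sketch of such a check, using stand-in lists instead of real model output; `check_dims` is a hypothetical helper, not a pymilvus function.

```python
def check_dims(embeddings, expected_dim):
    """Hypothetical helper: ensure every vector has the expected length.

    Works with any sequence of sized vectors (lists, tuples, 1-D arrays).
    Returns the number of vectors checked, or raises ValueError.
    """
    for index, vector in enumerate(embeddings):
        if len(vector) != expected_dim:
            raise ValueError(
                f"embedding {index} has length {len(vector)}, "
                f"expected {expected_dim}"
            )
    return len(embeddings)


# Stand-in data shaped like the output above: three vectors of length 768.
stand_in_embeddings = [[0.0] * 768 for _ in range(3)]
count = check_dims(stand_in_embeddings, 768)
print(count, "vectors passed the dimension check")
```

With real output, the same call would presumably be `check_dims(docs_embeddings, ef.dim)`, assuming `ef.dim` is the integer shown in the example output.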
<p>To create embeddings for queries, use the <code translate="no">encode_queries()</code> method:</p> | ||
<pre><code translate="no" class="language-python">queries = [<span class="hljs-string">"When was artificial intelligence founded"</span>, | ||
<span class="hljs-string">"Where was Alan Turing born?"</span>] | ||
|
||
query_embeddings = ef.encode_queries(queries) | ||
|
||
<span class="hljs-built_in">print</span>(<span class="hljs-string">"Embeddings:"</span>, query_embeddings) | ||
<span class="hljs-built_in">print</span>(<span class="hljs-string">"Dim"</span>, ef.dim, query_embeddings[<span class="hljs-number">0</span>].shape) | ||
<button class="copy-code-btn"></button></code></pre> | ||
<p>The expected output is similar to the following:</p> | ||
<pre><code translate="no" class="language-python">Embeddings: [array([ <span class="hljs-number">1.21721877e-02</span>, <span class="hljs-number">1.88485277e-03</span>, <span class="hljs-number">3.01732980e-02</span>, <span class="hljs-number">-8.10302645e-02</span>, | ||
<span class="hljs-number">-6.13401756e-02</span>, <span class="hljs-number">-3.98149453e-02</span>, <span class="hljs-number">-5.18723316e-02</span>, <span class="hljs-number">-6.76784338e-03</span>, | ||
<span class="hljs-number">-6.59285188e-02</span>, <span class="hljs-number">-5.38365729e-02</span>, <span class="hljs-number">-5.13435388e-03</span>, <span class="hljs-number">-2.49210224e-02</span>, | ||
<span class="hljs-number">-5.74403182e-02</span>, <span class="hljs-number">-7.03031123e-02</span>, <span class="hljs-number">6.63730130e-03</span>, <span class="hljs-number">-3.42259370e-02</span>, | ||
... | ||
<span class="hljs-number">7.36595877e-03</span>, <span class="hljs-number">2.85532661e-02</span>, <span class="hljs-number">-1.55952033e-02</span>, <span class="hljs-number">2.13342719e-02</span>, | ||
<span class="hljs-number">1.51187545e-02</span>, <span class="hljs-number">-2.82798670e-02</span>, <span class="hljs-number">2.69396193e-02</span>, <span class="hljs-number">6.16136603e-02</span>], | ||
dtype=<span class="hljs-type">float32</span>)] | ||
Dim <span class="hljs-number">768</span> (<span class="hljs-number">768</span>,) | ||
<button class="copy-code-btn"></button></code></pre> |
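Query and document embeddings are meant to be compared with each other. In a deployment, Milvus performs that comparison on the server using the metric configured for the index. The sketch below shows the idea locally with cosine similarity, using tiny 3-dimensional stand-in vectors rather than real 768-dimensional output. `cosine_similarity` and `rank_documents` are illustrative helpers, not part of pymilvus.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_documents(query_vec, doc_vecs):
    """Return (doc_index, score) pairs sorted from most to least similar."""
    scores = [
        (index, cosine_similarity(query_vec, doc_vec))
        for index, doc_vec in enumerate(doc_vecs)
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Stand-in vectors; real embeddings would come from encode_documents()
# and encode_queries().
doc_vecs = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.6, 0.8, 0.0],
]
query_vec = [0.0, 1.0, 0.0]

ranking = rank_documents(query_vec, doc_vecs)
print([index for index, _ in ranking])  # [1, 2, 0]
```

The second document points in the same direction as the query, so it ranks first; the third overlaps partially, and the first is orthogonal to the query.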
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
{"codeList":["pip install --upgrade pymilvus\npip install \"pymilvus[model]\"\n","from pymilvus.model.hybrid import MGTEEmbeddingFunction\n\nef = MGTEEmbeddingFunction(\n model_name=\"Alibaba-NLP/gte-multilingual-base\", # Defaults to `Alibaba-NLP/gte-multilingual-base`\n)\n","docs = [\n \"Artificial intelligence was founded as an academic discipline in 1956.\",\n \"Alan Turing was the first person to conduct substantial research in AI.\",\n \"Born in Maida Vale, London, Turing was raised in southern England.\",\n]\n\ndocs_embeddings = ef.encode_documents(docs)\n\n# Print embeddings\nprint(\"Embeddings:\", docs_embeddings)\n# Print dimension of embeddings\nprint(ef.dim)\n","Embeddings: {'dense': [tensor([-4.9149e-03, 1.6553e-02, -9.5524e-03, -2.1800e-02, 1.2075e-02,\n 1.8500e-02, -3.0632e-02, 5.5909e-02, 8.7365e-02, 1.8763e-02,\n 2.1708e-03, -2.7530e-02, -1.1523e-01, 6.5810e-03, -6.4674e-02,\n 6.7966e-02, 1.3005e-01, 1.1942e-01, -1.2174e-02, -4.0426e-02,\n ...\n 2.0129e-02, -2.3657e-02, 2.2626e-02, 2.1858e-02, -1.9181e-02,\n 6.0706e-02, -2.0558e-02, -4.2050e-02], device='mps:0')], \n 'sparse': <Compressed Sparse Row sparse array of dtype 'float64'\n with 41 stored elements and shape (3, 250002)>}\n\n{'dense': 768, 'sparse': 250002}\n","queries = [\"When was artificial intelligence founded\",\n \"Where was Alan Turing born?\"]\n\nquery_embeddings = ef.encode_queries(queries)\n\nprint(\"Embeddings:\", query_embeddings)\nprint(ef.dim)\n","Embeddings: {'dense': [tensor([ 6.5883e-03, -7.9415e-03, -3.3669e-02, -2.6450e-02, 1.4345e-02,\n 1.9612e-02, -8.1679e-02, 5.6361e-02, 6.9020e-02, 1.9827e-02,\n -9.2933e-03, -1.9995e-02, -1.0055e-01, -5.4053e-02, -8.5991e-02,\n 8.3004e-02, 1.0870e-01, 1.1565e-01, 2.1268e-02, -1.3782e-02,\n ...\n 3.2847e-02, -2.3751e-02, 3.4475e-02, 5.3623e-02, -3.3894e-02,\n 7.9408e-02, 8.2720e-03, -2.3459e-02], device='mps:0')], \n 'sparse': <Compressed Sparse Row sparse array of dtype 'float64'\n with 13 stored elements and shape (2, 
250002)>}\n\n{'dense': 768, 'sparse': 250002}\n"],"headingContent":"mGTE","anchorList":[{"label":"mGTE","href":"mGTE","type":1,"isActive":false}]} |