Commit: Generate en docs
Milvus-doc-bot authored and committed Sep 23, 2024
1 parent c4eaf3b commit 21c1839
Showing 85 changed files with 3,432 additions and 12 deletions.
92 changes: 92 additions & 0 deletions localization/v2.4.x/site/de/embeddings/embed-with-instructor.md
---
id: embed-with-instructor.md
order: 10
summary: >-
  This article describes how to use the InstructorEmbeddingFunction to encode
  documents and queries with the Instructor embedding model.
title: Instructor
---
<h1 id="Instructor" class="common-anchor-header">Instructor<button data-href="#Instructor" class="anchor-icon" translate="no">
<svg translate="no"
aria-hidden="true"
focusable="false"
height="20"
version="1.1"
viewBox="0 0 16 16"
width="16"
>
<path
fill="#0092E4"
fill-rule="evenodd"
d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"
></path>
</svg>
</button></h1><p><a href="https://instructor-embedding.github.io/">Instructor</a> ist ein anweisungsgesteuertes Texteinbettungsmodell, das Texteinbettungen für beliebige Aufgaben (z. B. Klassifizierung, Retrieval, Clustering, Textbewertung usw.) und Domänen (z. B. Wissenschaft, Finanzen usw.) generieren kann, indem es einfach die Aufgabenanweisung bereitstellt, ohne jegliche Feinabstimmung.</p>
<p>Milvus lässt sich über die Klasse InstructorEmbeddingFunction mit den Einbettungsmodellen von Instructor integrieren. Diese Klasse bietet Methoden zur Kodierung von Dokumenten und Abfragen unter Verwendung der Instructor-Einbettungsmodelle und gibt die Einbettungen als dichte Vektoren zurück, die mit der Milvus-Indizierung kompatibel sind.</p>
<p>Um diese Funktion zu nutzen, installieren Sie die notwendigen Abhängigkeiten:</p>
<pre><code translate="no" class="language-python">pip install --upgrade pymilvus
pip install <span class="hljs-string">&quot;pymilvus[model]&quot;</span>
<button class="copy-code-btn"></button></code></pre>
<p>Then, instantiate the InstructorEmbeddingFunction:</p>
<pre><code translate="no" class="language-python"><span class="hljs-keyword">from</span> pymilvus.model.dense <span class="hljs-keyword">import</span> InstructorEmbeddingFunction

ef = InstructorEmbeddingFunction(
model_name=<span class="hljs-string">&quot;hkunlp/instructor-xl&quot;</span>, <span class="hljs-comment"># Defaults to `hkunlp/instructor-xl`</span>
query_instruction=<span class="hljs-string">&quot;Represent the question for retrieval:&quot;</span>,
doc_instruction=<span class="hljs-string">&quot;Represent the document for retrieval:&quot;</span>
)
<button class="copy-code-btn"></button></code></pre>
<p><strong>Parameters</strong>:</p>
<ul>
<li><p><code translate="no">model_name</code> <em>(string)</em></p>
<p>The name of the Instructor embedding model to use for encoding. The value defaults to <code translate="no">hkunlp/instructor-xl</code>. For more information, refer to <a href="https://github.com/xlang-ai/instructor-embedding?tab=readme-ov-file#model-list">Model List</a>.</p></li>
<li><p><code translate="no">query_instruction</code> <em>(string)</em></p>
<p>Task-specific instruction that guides the model on how to generate an embedding for a query or question.</p></li>
<li><p><code translate="no">doc_instruction</code> <em>(string)</em></p>
<p>Task-specific instruction that guides the model on how to generate an embedding for a document.</p></li>
</ul>
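<p>Because Instructor is instruction-driven, the same model can serve different tasks just by swapping the instruction strings. A minimal sketch of keeping per-task instruction pairs follows; only the retrieval pair comes from the example above, and the clustering strings are illustrative assumptions rather than values taken from the Instructor documentation:</p>

```python
# Hypothetical per-task instruction pairs. Only the "retrieval" pair
# appears in this guide; the "clustering" strings are assumptions.
TASK_INSTRUCTIONS = {
    "retrieval": {
        "query_instruction": "Represent the question for retrieval:",
        "doc_instruction": "Represent the document for retrieval:",
    },
    "clustering": {
        "query_instruction": "Represent the sentence for clustering:",
        "doc_instruction": "Represent the sentence for clustering:",
    },
}

task = "retrieval"
params = TASK_INSTRUCTIONS[task]
print(params["query_instruction"])
# These keyword arguments could then be unpacked into the constructor:
# InstructorEmbeddingFunction(model_name="hkunlp/instructor-xl", **params)
```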
<p>To create embeddings for documents, use the <code translate="no">encode_documents()</code> method:</p>
<pre><code translate="no" class="language-python">docs = [
<span class="hljs-string">&quot;Artificial intelligence was founded as an academic discipline in 1956.&quot;</span>,
<span class="hljs-string">&quot;Alan Turing was the first person to conduct substantial research in AI.&quot;</span>,
<span class="hljs-string">&quot;Born in Maida Vale, London, Turing was raised in southern England.&quot;</span>,
]

docs_embeddings = ef.encode_documents(docs)

<span class="hljs-comment"># Print embeddings</span>
<span class="hljs-built_in">print</span>(<span class="hljs-string">&quot;Embeddings:&quot;</span>, docs_embeddings)
<span class="hljs-comment"># Print dimension and shape of embeddings</span>
<span class="hljs-built_in">print</span>(<span class="hljs-string">&quot;Dim:&quot;</span>, ef.dim, docs_embeddings[<span class="hljs-number">0</span>].shape)
<button class="copy-code-btn"></button></code></pre>
<p>The expected output is similar to the following:</p>
<pre><code translate="no" class="language-python">Embeddings: [array([ <span class="hljs-number">1.08575663e-02</span>, <span class="hljs-number">3.87877878e-03</span>, <span class="hljs-number">3.18090729e-02</span>, <span class="hljs-number">-8.12458917e-02</span>,
<span class="hljs-number">-4.68971021e-02</span>, <span class="hljs-number">-5.85585833e-02</span>, <span class="hljs-number">-5.95418774e-02</span>, <span class="hljs-number">-8.55880603e-03</span>,
<span class="hljs-number">-5.54775111e-02</span>, <span class="hljs-number">-6.08020350e-02</span>, <span class="hljs-number">1.76202394e-02</span>, <span class="hljs-number">1.06648318e-02</span>,
<span class="hljs-number">-5.89960292e-02</span>, <span class="hljs-number">-7.46861771e-02</span>, <span class="hljs-number">6.60329172e-03</span>, <span class="hljs-number">-4.25189249e-02</span>,
...
<span class="hljs-number">-1.26921125e-02</span>, <span class="hljs-number">3.01475357e-02</span>, <span class="hljs-number">8.25323071e-03</span>, <span class="hljs-number">-1.88470203e-02</span>,
<span class="hljs-number">6.04814291e-03</span>, <span class="hljs-number">-2.81618331e-02</span>, <span class="hljs-number">5.91602828e-03</span>, <span class="hljs-number">7.13866428e-02</span>],
dtype=<span class="hljs-type">float32</span>)]
Dim: <span class="hljs-number">768</span> (<span class="hljs-number">768</span>,)
<button class="copy-code-btn"></button></code></pre>
<p>To create embeddings for queries, use the <code translate="no">encode_queries()</code> method:</p>
<pre><code translate="no" class="language-python">queries = [<span class="hljs-string">&quot;When was artificial intelligence founded&quot;</span>,
<span class="hljs-string">&quot;Where was Alan Turing born?&quot;</span>]

query_embeddings = ef.encode_queries(queries)

<span class="hljs-built_in">print</span>(<span class="hljs-string">&quot;Embeddings:&quot;</span>, query_embeddings)
<span class="hljs-built_in">print</span>(<span class="hljs-string">&quot;Dim&quot;</span>, ef.dim, query_embeddings[<span class="hljs-number">0</span>].shape)
<button class="copy-code-btn"></button></code></pre>
<p>The expected output is similar to the following:</p>
<pre><code translate="no" class="language-python">Embeddings: [array([ <span class="hljs-number">1.21721877e-02</span>, <span class="hljs-number">1.88485277e-03</span>, <span class="hljs-number">3.01732980e-02</span>, <span class="hljs-number">-8.10302645e-02</span>,
<span class="hljs-number">-6.13401756e-02</span>, <span class="hljs-number">-3.98149453e-02</span>, <span class="hljs-number">-5.18723316e-02</span>, <span class="hljs-number">-6.76784338e-03</span>,
<span class="hljs-number">-6.59285188e-02</span>, <span class="hljs-number">-5.38365729e-02</span>, <span class="hljs-number">-5.13435388e-03</span>, <span class="hljs-number">-2.49210224e-02</span>,
<span class="hljs-number">-5.74403182e-02</span>, <span class="hljs-number">-7.03031123e-02</span>, <span class="hljs-number">6.63730130e-03</span>, <span class="hljs-number">-3.42259370e-02</span>,
...
<span class="hljs-number">7.36595877e-03</span>, <span class="hljs-number">2.85532661e-02</span>, <span class="hljs-number">-1.55952033e-02</span>, <span class="hljs-number">2.13342719e-02</span>,
<span class="hljs-number">1.51187545e-02</span>, <span class="hljs-number">-2.82798670e-02</span>, <span class="hljs-number">2.69396193e-02</span>, <span class="hljs-number">6.16136603e-02</span>],
dtype=<span class="hljs-type">float32</span>)]
Dim <span class="hljs-number">768</span> (<span class="hljs-number">768</span>,)
<button class="copy-code-btn"></button></code></pre>
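<p>Once you have the dense vectors, the typical next step is similarity search, which Milvus performs at scale after the vectors are indexed. The underlying idea can be sketched locally with NumPy cosine similarity; the toy 3-dimensional vectors below are stand-ins for the 768-dimensional arrays returned by <code translate="no">encode_documents()</code> and <code translate="no">encode_queries()</code>, used here purely for illustration:</p>

```python
import numpy as np

def cosine_similarity(queries: np.ndarray, docs: np.ndarray) -> np.ndarray:
    # L2-normalize the rows, then take dot products: entry [i, j] is the
    # cosine similarity between query i and document j.
    q = queries / np.linalg.norm(queries, axis=-1, keepdims=True)
    d = docs / np.linalg.norm(docs, axis=-1, keepdims=True)
    return q @ d.T

# Toy stand-ins for ef.encode_documents(docs) and ef.encode_queries(queries)
doc_vecs = np.array([[0.9, 0.1, 0.0],
                     [0.1, 0.8, 0.2],
                     [0.0, 0.2, 0.9]], dtype=np.float32)
query_vec = np.array([[0.85, 0.15, 0.05]], dtype=np.float32)

scores = cosine_similarity(query_vec, doc_vecs)[0]
best = int(np.argmax(scores))
print("Best matching document index:", best)  # index of the first toy vector
```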