diff --git a/content/posts/2023-10-27-haystack-series-rag.md b/content/posts/2023-10-27-haystack-series-rag.md
index 8e3a1e22..44d1f162 100644
--- a/content/posts/2023-10-27-haystack-series-rag.md
+++ b/content/posts/2023-10-27-haystack-series-rag.md
@@ -7,19 +7,19 @@ series: ["Haystack 2.0 Series"]
featuredImage: "/posts/2023-10-27-haystack-series-rag/cover.png"
---
-Since the start of this series, one use case that I constantly brought up is Retrieval Augmented Generation, or RAG for short.
+*Last updated: 21/11/2023*
-RAG is quickly becoming an essential technique to make LLMs more reliable and effective at answering any question, regardless of how specific. To stay relevant in today's NLP landscape, Haystack must enable it.
+Retrieval Augmented Generation (RAG) is quickly becoming an essential technique to make LLMs more reliable and effective at answering any question, no matter how specific it is. To stay relevant in today's NLP landscape, Haystack must enable it.
Let's see how to build such applications with Haystack 2.0, from a direct call to an LLM to a fully-fledged, production-ready RAG pipeline that scales. By the end of this post, we will have an application that can answer questions about world countries based on data stored in a private database. At that point, the LLM's knowledge will be limited only by the content of our data store, and all of this can be accomplished without fine-tuning language models.
{{< notice info >}}
-💡 *I recently gave a talk about RAG applications in Haystack 2.0, so if you prefer videos to blog posts, you can find the recording [here](http://zansara.dev/talks/2023-10-12-office-hours-rag-pipelines/). Keep in mind that the code might be slightly outdated.*
+💡 *I recently gave a talk about RAG applications in Haystack 2.0, so if you prefer videos to blog posts, you can find the recording [here](https://zansara.dev/talks/2023-10-12-office-hours-rag-pipelines/). Keep in mind that the code might be slightly outdated.*
{{< /notice >}}
-# What is RAG?
+## What is RAG?
The idea of Retrieval Augmented Generation was first defined in a [paper](https://arxiv.org/abs/2005.11401) by Meta in 2020. It was designed to solve a few of the inherent limitations of seq2seq models (language models that, given a sentence, can finish writing it for you), such as:
@@ -30,7 +30,7 @@ The idea of Retrieval Augmented Generation was first defined in a [paper](https:
RAG addresses these issues by "grounding" the LLM in reality: together with the question, it provides the model with some relevant, up-to-date, and trusted information. This way, the LLM doesn't need to draw the answer from its internal knowledge, but can base its replies on the snippets provided by the user.
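To make this concrete, here is a sketch of what a grounded prompt could look like. The wording mirrors the prompt template we will build later in this post; the context sentence is just an illustration:

```
Given the following information, answer the question.

Context:
Esperanto has been adopted as official language for some microstates, such as the Republic of Rose Island, a short-lived microstate built on a sea platform in the Adriatic Sea.

Question: What's the official language of the Republic of Rose Island?
```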
-![RAG Paper diagram](/posts/2023-10-27-haystack-series-rag/rag-paper-image.png)
+![RAG Paper diagram](/posts/2023-10-27-haystack-series-rag/rag-paper-image.png "A visual representation of RAG from the original paper")
As you can see in the image above (taken directly from the original paper), a system such as RAG is made of two parts: a retrieval component that finds text snippets relevant to the user's question, and a generative model, usually an LLM, that rephrases those snippets into a coherent answer to the question.
@@ -38,25 +38,24 @@ Let's build one of these with Haystack 2.0!
{{< notice info >}}
-💡 *Do you want to see this code in action? Check out the Colab notebook [here](https://colab.research.google.com/drive/1vX_2WIRuqsXmoPMsJbqE45SYn21yuDjf?usp=drive_link) or the [gist](https://gist.github.com/ZanSara/cad6f772d3a894058db34f566e2c4042).*
+💡 *Do you want to see this code in action? Check out the Colab notebook [here](https://colab.research.google.com/drive/1FkDNS3hTO4oPXHFbXQcldls0kf-KTq-r?usp=sharing) or the gist [here](https://gist.github.com/ZanSara/0af1c2ac6c71d0a723c179cc6ec1ac41)*.
{{< /notice >}}
-
{{< notice warning >}}
-⚠️ **Warning:** *This code was tested on `haystack-ai==0.88.0`. Haystack 2.0 is still unstable, so later versions might introduce breaking changes without notice until Haystack 2.0 is officially released. The concepts and components however stay the same.*
+⚠️ **Warning:** *This code was tested on `haystack-ai==0.149.0`. Haystack 2.0 is still unstable, so later versions might introduce breaking changes without notice until Haystack 2.0 is officially released. The concepts and components however stay the same.*
{{< /notice >}}
-# Generators: Haystack's LLM components
+## Generators: Haystack's LLM components
Like every NLP framework worthy of the name, Haystack supports LLMs in several ways. The easiest way to query an LLM in Haystack 2.0 is through a Generator component: depending on which LLM you're using and how you intend to query it (chat, text completion, etc.), you should pick the appropriate class.
-We're going to use ChatGPT for these examples, so the component we need is [`GPTGenerator`](https://github.com/deepset-ai/haystack/blob/main/haystack/preview/components/generators/openai/gpt.py). Here is all the code required to use it to query OpenAI's ChatGPT:
+We're going to use `gpt-3.5-turbo` (the model behind ChatGPT) for these examples, so the component we need is [`GPTGenerator`](https://github.com/deepset-ai/haystack/blob/main/haystack/preview/components/generators/openai.py). Here is all the code required to use it to query OpenAI's `gpt-3.5-turbo`:
```python
-from haystack.preview.components.generators.openai.gpt import GPTGenerator
+from haystack.preview.components.generators import GPTGenerator
generator = GPTGenerator(api_key=api_key)
generator.run(prompt="What's the official language of France?")
@@ -66,12 +65,12 @@ You can select your favorite OpenAI model by specifying a `model_name` at initia
Note that in this case, we're passing the API key to the component's constructor. This is unnecessary: `GPTGenerator` can read the value from the `OPENAI_API_KEY` environment variable and also from the `api_key` module variable of [`openai`'s SDK](https://github.com/openai/openai-python#usage).
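For example, here is a minimal sketch of the same query relying on the environment variable instead (the placeholder key value is, of course, just an illustration):

```python
import os

from haystack.preview.components.generators import GPTGenerator

# Illustrative placeholder: in practice you would export OPENAI_API_KEY in your shell
# or load it from a secrets manager rather than hardcode it.
os.environ["OPENAI_API_KEY"] = "<your-openai-api-key>"

# No api_key argument needed: GPTGenerator reads it from the environment.
generator = GPTGenerator()
generator.run(prompt="What's the official language of France?")
```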
-Right now, Haystack supports HuggingFace models through the [`HuggingFaceLocalGenerator`](https://github.com/deepset-ai/haystack/blob/f76fc04ed05df7b941c658ba85adbf1f87723153/haystack/preview/components/generators/hugging_face/hugging_face_local.py#L65) component, and many more LLMs are coming soon.
+Right now, Haystack supports HuggingFace models through the [`HuggingFaceLocalGenerator`](https://github.com/deepset-ai/haystack/blob/main/haystack/preview/components/generators/hugging_face_local.py) and [`HuggingFaceTGIGenerator`](https://github.com/deepset-ai/haystack/blob/main/haystack/preview/components/generators/hugging_face_tgi.py) components, and many more LLMs are coming soon.
-# PromptBuilder: structured prompts from templates
+## PromptBuilder: structured prompts from templates
-Let's imagine that our LLM-powered chatbot also comes with some pre-defined questions that the user can select instead of typing in full. For example, instead of asking them to type `What's the official language of France?`, we let them select `Tell me the official languages` from a list, and they simply need to type "France" (or "Wakanda" for a change - our chatbot needs some challenges too).
+Let's imagine that our LLM-powered application also comes with some pre-defined questions that the user can select instead of typing in full. For example, instead of asking them to type `What's the official language of France?`, we let them select `Tell me the official languages` from a list, and they simply need to type "France" (or "Wakanda" for a change - our application needs some challenges too).
In this scenario, we have two pieces of the prompt: a variable (the country name, like "France") and a prompt template, which in this case is `"What's the official language of {{ country }}?"`
@@ -89,13 +88,13 @@ Note how we defined a variable, `country`, by wrapping its name in double curly
This syntax comes from [Jinja2](https://jinja.palletsprojects.com/en/3.0.x/intro/), a popular templating library for Python. If you have ever used Flask, Django, or Ansible, you will feel at home with `PromptBuilder`. If, instead, you have never heard of any of these libraries, you can check out the [syntax](https://jinja.palletsprojects.com/en/3.0.x/templates/) in Jinja's documentation. Jinja has a powerful templating language and offers far more features than you'll ever need in prompt templates, ranging from simple if statements and for loops to object access through dot notation, template nesting, variable manipulation, macros, full-fledged import and encapsulation of templates, and more.
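Before wiring `PromptBuilder` into a pipeline, here is a minimal sketch of the builder used in isolation. It assumes a `template` init parameter and that template variables are passed as keyword arguments to `run()`, which returns the rendered prompt under the `prompt` key:

```python
from haystack.preview.components.builders.prompt_builder import PromptBuilder

# Build the template once, then render it with a concrete value for `country`.
prompt_builder = PromptBuilder(template="What's the official language of {{ country }}?")
prompt_builder.run(country="France")
# returns {'prompt': "What's the official language of France?"}
```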
-# A Simple Generative Pipeline
+## A Simple Generative Pipeline
With these two components, we can assemble a minimal pipeline to see how they work together. Connecting them is trivial: `PromptBuilder` generates a `prompt` output, and `GPTGenerator` expects an input with the same name and type.
```python
from haystack.preview import Pipeline
-from haystack.preview.components.generators.openai.gpt import GPTGenerator
+from haystack.preview.components.generators import GPTGenerator
from haystack.preview.components.builders.prompt_builder import PromptBuilder
pipe = Pipeline()
@@ -111,7 +110,7 @@ Here is the pipeline graph:
![Simple LLM pipeline](/posts/2023-10-27-haystack-series-rag/simple-llm-pipeline.png)
-# Make the LLM cheat
+## Make the LLM cheat
Building the generative part of a RAG application was very simple! So far, though, we have only provided the question to the LLM, with no information to base its answers on. Nowadays, LLMs possess a lot of general knowledge, so they can easily answer questions about famous countries such as France or Germany correctly. However, when using an app about world countries, some users may want to know more about obscure or defunct microstates. In this case, ChatGPT is unlikely to provide the correct answer without any help.
@@ -214,21 +213,15 @@ pipe.run({
![PromptBuilder with two inputs pipeline](/posts/2023-10-27-haystack-series-rag/double-variable-promptbuilder-pipeline.png)
-# Retrieving the context
+## Retrieving the context
So far, we've been playing with prompts, but the fundamental question remains unanswered: where do we get the correct text snippet for the question the user is asking? We can't expect such information to be part of the input: we need our system to be able to fetch it on its own, based solely on the query.
-Thankfully, retrieving relevant information from large [corpora](https://en.wikipedia.org/wiki/Text_corpus) (a technical term for extensive collections of data, usually text) is a task that Haystack excels at since its inception: the components that perform this task are called [Retrievers](https://docs.haystack.deepset.ai/docs/retriever).
-
-{{< notice warning >}}
-
-*At the time of writing, the [documentation](https://docs.haystack.deepset.ai/docs/retriever) still refers to the Haystack 1.x component. The high-level concepts are unchanged, but the code is very different.*
-
-{{< /notice >}}
+Thankfully, retrieving relevant information from large [corpora](https://en.wikipedia.org/wiki/Text_corpus) (a technical term for extensive collections of data, usually text) is a task that Haystack has excelled at since its inception: the components that perform this task are called [Retrievers](https://docs.haystack.deepset.ai/v2.0/docs/retrievers).
Retrieval can be performed on different data sources: to begin, let's assume we're searching for data in a local database, which is the use case that most Retrievers are geared towards.
-Let's create a small local database to store information about some European countries. Haystack offers a neat object for these small-scale demos: `InMemoryDocumentStore`. This document store is little more than a Python dictionary under the hood but provides the same exact API as much more powerful data stores and vector stores, such as [Elasticsearch](https://github.com/deepset-ai/haystack-core-integrations/pull/41) or [ChromaDB](https://haystack.deepset.ai/integrations/chroma-documentstore). Keep in mind that the object is called "Document Store" and not simply "datastore" because what it stores is Haystack's Document objects: a small dataclass that helps other components make sense of the data that they receive.
+Let's create a small local database to store information about some European countries. Haystack offers a neat object for these small-scale demos: `InMemoryDocumentStore`. This document store is little more than a Python dictionary under the hood, but it provides the exact same API as much more powerful data stores and vector stores, such as [Elasticsearch](https://github.com/deepset-ai/haystack-core-integrations/tree/main/document_stores/elasticsearch) or [ChromaDB](https://haystack.deepset.ai/integrations/chroma-documentstore). Keep in mind that the object is called "Document Store" and not simply "datastore" because what it stores is Haystack's `Document` objects: instances of a small dataclass that helps other components make sense of the data they receive.
So, let's initialize an `InMemoryDocumentStore` and write some `Documents` into it.
@@ -237,20 +230,20 @@ from haystack.preview.dataclasses import Document
from haystack.preview.document_stores import InMemoryDocumentStore
documents = [
- Document(text="German is the the official language of Germany."),
- Document(text="The capital of France is Paris, and its official language is French."),
- Document(text="Italy recognizes a few official languages, but the most widespread one is Italian."),
- Document(text="Esperanto has been adopted as official language for some microstates as well, such as the Republic of Rose Island, a short-lived microstate built on a sea platform in the Adriatic Sea.")
+ Document(content="German is the the official language of Germany."),
+ Document(content="The capital of France is Paris, and its official language is French."),
+ Document(content="Italy recognizes a few official languages, but the most widespread one is Italian."),
+ Document(content="Esperanto has been adopted as official language for some microstates as well, such as the Republic of Rose Island, a short-lived microstate built on a sea platform in the Adriatic Sea.")
]
docstore = InMemoryDocumentStore()
docstore.write_documents(documents=documents)
docstore.filter_documents()
# returns [
-# Document(text="German is the the official language of Germany."),
-# Document(text="The capital of France is Paris, and its official language is French."),
-# Document(text="Esperanto has been adopted as official language for some microstates as well, such as the Republic of Rose Island, a short-lived microstate built on a sea platform in the Adriatic Sea."),
-# Document(text="Italy recognizes a few official languages, but the most widespread one is Italian."),
+# Document(content="German is the the official language of Germany."),
+# Document(content="The capital of France is Paris, and its official language is French."),
+# Document(content="Esperanto has been adopted as official language for some microstates as well, such as the Republic of Rose Island, a short-lived microstate built on a sea platform in the Adriatic Sea."),
+# Document(content="Italy recognizes a few official languages, but the most widespread one is Italian."),
# ]
```
@@ -259,23 +252,23 @@ Once the document store is set up, we can initialize a retriever. In Haystack 2.
Let's start with the BM25-based retriever, which is slightly easier to set up. First, let's use it in isolation to see how it behaves.
```python
-from haystack.preview.components.retrievers.memory_bm25_retriever import MemoryBM25Retriever
+from haystack.preview.components.retrievers import InMemoryBM25Retriever
retriever = InMemoryBM25Retriever(document_store=docstore)
retriever.run(query="Rose Island", top_k=1)
# returns [
-# Document(text="Esperanto has been adopted as official language for some microstates as well, such as the Republic of Rose Island, a short-lived microstate built on a sea platform in the Adriatic Sea.")
+# Document(content="Esperanto has been adopted as official language for some microstates as well, such as the Republic of Rose Island, a short-lived microstate built on a sea platform in the Adriatic Sea.")
# ]
retriever.run(query="Rose Island", top_k=3)
# returns [
-# Document(text="Esperanto has been adopted as official language for some microstates as well, such as the Republic of Rose Island, a short-lived microstate built on a sea platform in the Adriatic Sea.")
-# Document(text="Italy recognizes a few official languages, but the most widespread one is Italian."),
-# Document(text="The capital of France is Paris, and its official language is French."),
+# Document(content="Esperanto has been adopted as official language for some microstates as well, such as the Republic of Rose Island, a short-lived microstate built on a sea platform in the Adriatic Sea.")
+# Document(content="Italy recognizes a few official languages, but the most widespread one is Italian."),
+# Document(content="The capital of France is Paris, and its official language is French."),
# ]
```
-We see that `InMemoryBM25Retriever` accepts a few parameters. `query` is the question we want to find relevant documents for. In the case of BM25, the algorithm only searches for exact word matches. The resulting retriever is very fast, but it doesn't fail gracefully: it can't handle spelling mistakes, synonyms, or descriptions of an entity. For example, documents containing the word "cat" would be considered irrelevant against a query such as "felines".
+We see that [`InMemoryBM25Retriever`](https://docs.haystack.deepset.ai/v2.0/reference/retriever-api#inmemorybm25retriever) accepts a few parameters. `query` is the question we want to find relevant documents for. In the case of BM25, the algorithm only searches for exact word matches. The resulting retriever is very fast, but it doesn't fail gracefully: it can't handle spelling mistakes, synonyms, or descriptions of an entity. For example, documents containing the word "cat" would be considered irrelevant against a query such as "felines".
`top_k` controls the number of documents returned. We can see that in the first example, only one document is returned, the correct one. In the second, where `top_k = 3`, the retriever is forced to return three documents even if just one is relevant, so it picks the other two randomly. Although the behavior is not optimal, BM25 guarantees that if there is a document that is relevant to the query, it will be in the first position, so for now, we can use it with `top_k=1`.
@@ -283,7 +276,7 @@ Retrievers also accepts a `filters` parameter, which lets you pre-filter the doc
Let's now make use of this new component in our Pipeline.
-# Our first RAG Pipeline
+## Our first RAG Pipeline
The retriever does not return a single string but a list of Documents. How do we put the content of these objects into our prompt template?
@@ -294,13 +287,13 @@ Given the following information, answer the question.
Context:
{% for document in documents %}
- {{ document.text }}
+ {{ document.content }}
{% endfor %}
Question: What's the official language of {{ country }}?
```
-Notice how, despite the slightly alien syntax for a Python programmer, what the template does is reasonably evident: it iterates over the documents and, for each of them, renders their `text` field.
+Notice how, despite the slightly alien syntax for a Python programmer, what the template does is reasonably evident: it iterates over the documents and, for each of them, renders their `content` field.
With all these pieces set up, we can finally put them all together.
@@ -310,7 +303,7 @@ Given the following information, answer the question.
Context:
{% for document in documents %}
- {{ document.text }}
+ {{ document.content }}
{% endfor %}
Question: What's the official language of {{ country }}?
@@ -324,6 +317,7 @@ pipe.connect("retriever", "prompt_builder.documents")
pipe.connect("prompt_builder", "llm")
pipe.run({
+ "retriever": {"query": country},
"prompt_builder": {
"country": "the Republic of Rose Island"
}
@@ -342,7 +336,7 @@ pipe.run({
Congratulations! We've just built our first, true-to-its-name RAG Pipeline.
-# Scaling up: Elasticsearch
+## Scaling up: Elasticsearch
So, we now have our running prototype. What does it take to scale this system up for production workloads?
@@ -350,11 +344,6 @@ Of course, scaling up a system to production readiness is no simple task that ca
`InMemoryDocumentStore` is clearly a toy implementation: Haystack supports much more performant document stores such as [Elasticsearch](https://haystack.deepset.ai/integrations/elasticsearch-document-store), [ChromaDB](https://haystack.deepset.ai/integrations/chroma-documentstore) and [Marqo](https://haystack.deepset.ai/integrations/marqo-document-store). Since we have built our app with a BM25 retriever, let's select Elasticsearch as our production-ready document store of choice.
-{{< notice warning >}}
-
-⚠️ **Warning:** *at the time of writing, Elasticsearch support for Haystack 2.0 is still [unstable](https://github.com/deepset-ai/haystack-core-integrations/pull/41). Keep an eye on the [integrations repository](https://github.com/deepset-ai/haystack-core-integrations) for updates about its upcoming release. To know how to make it work today, check out [the Colab notebook](https://colab.research.google.com/drive/1vX_2WIRuqsXmoPMsJbqE45SYn21yuDjf?usp=drive_link) or the [gist](https://gist.github.com/ZanSara/cad6f772d3a894058db34f566e2c4042).*
-
-{{< /notice >}}
How do we use Elasticsearch in our pipeline? All it takes is to swap out `InMemoryDocumentStore` and `InMemoryBM25Retriever` for their Elasticsearch counterparts, which offer nearly identical APIs.
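As a rough sketch, the swap could look like the snippet below. Note that the package name, import paths, and constructor arguments here are assumptions based on the early `elasticsearch-haystack` integration and may differ in the version you install:

```python
# Assumptions: the integration is installed (e.g. `pip install elasticsearch-haystack`)
# and an Elasticsearch instance is reachable at http://localhost:9200.
from elasticsearch_haystack.document_store import ElasticsearchDocumentStore
from elasticsearch_haystack.bm25_retriever import ElasticsearchBM25Retriever

docstore = ElasticsearchDocumentStore(hosts="http://localhost:9200")
retriever = ElasticsearchBM25Retriever(document_store=docstore)
```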
@@ -379,10 +368,10 @@ Now, let's write again our four documents into the store. In this case, we speci
```python
from haystack.preview.document_stores import DuplicatePolicy
documents = [
- Document(text="German is the the official language of Germany."),
- Document(text="The capital of France is Paris, and its official language is French."),
- Document(text="Italy recognizes a few official languages, but the most widespread one is Italian."),
- Document(text="Esperanto has been adopted as official language for some microstates as well, such as the Republic of Rose Island, a short-lived microstate built on a sea platform in the Adriatic Sea.")
+ Document(content="German is the the official language of Germany."),
+ Document(content="The capital of France is Paris, and its official language is French."),
+ Document(content="Italy recognizes a few official languages, but the most widespread one is Italian."),
+ Document(content="Esperanto has been adopted as official language for some microstates as well, such as the Republic of Rose Island, a short-lived microstate built on a sea platform in the Adriatic Sea.")
]
docstore.write_documents(documents=documents, policy=DuplicatePolicy.OVERWRITE)
```
@@ -397,7 +386,7 @@ Given the following information, answer the question.
Context:
{% for document in documents %}
- {{ document.text }}
+ {{ document.content }}
{% endfor %}
Question: What's the official language of {{ country }}?
@@ -430,7 +419,7 @@ pipe.run({
That's it! We're now running the same pipeline over a production-ready Elasticsearch instance.
-# Wrapping up
+## Wrapping up
In this post, we've detailed some fundamental components that make RAG applications possible with Haystack: Generators, the PromptBuilder, and Retrievers. We've seen how they can all be used in isolation and how you can build Pipelines out of them to achieve the same goal. Lastly, we've experimented with some of the (very early!) features that make Haystack 2.0 production-ready and easy to scale up from a simple demo with minimal changes.