I am a Python and Gen AI software engineer based in Portugal, currently working as Lead AI Engineer for
+Kwal on voice agents and conversation analysis with LLMs for the recruitment industry.
+I am also open to collaborations on short-term projects and to give talks on the topic of AI, LLMs, and
+related fields.
+
Until recently I worked for deepset, a German startup working on NLP
+since “before it was cool”, where I was the
+main contributor of
+Haystack, their open-source framework for building highly customizable,
+production-ready NLP and LLM applications.
+
Previously I worked for a few years at CERN, where I began my software engineering career.
+During my time there I had the privilege of driving a major decision to migrate the graphical
+interface software of the accelerator’s control systems from Java to PyQt, and then of helping a client department
+migrate to this stack. I also worked on other infrastructure and data pipelines, some of
+which resulted in a publication.
+
Outside of work I have more pet projects than free time to dedicate to them.
+I love science fiction and space exploration, and I enjoy challenging hikes in nature and learning languages, as much
+as such a process can be enjoyed.
+
I speak native Italian and fluent English; I also learned French during my time at CERN. I’m studying Hungarian
+for family reasons and Portuguese because I currently live there. I can still understand some Russian and have a
+very basic understanding of Chinese, both from my teenage and university years.
+
You can find my latest CV here. Also check out my projects,
+my publications and my talks. If you prefer newsletters, you can also find my posts on
+Substack.
+
The best way to get in touch with me is through email or
+LinkedIn.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
diff --git a/archetypes/default.md b/archetypes/default.md
deleted file mode 100644
index 00e77bd7..00000000
--- a/archetypes/default.md
+++ /dev/null
@@ -1,6 +0,0 @@
----
-title: "{{ replace .Name "-" " " | title }}"
-date: {{ .Date }}
-draft: true
----
-
diff --git a/config.toml b/config.toml
deleted file mode 100644
index 5984ac32..00000000
--- a/config.toml
+++ /dev/null
@@ -1,124 +0,0 @@
-baseurl = "https://www.zansara.dev"
-title = "Sara Zan"
-theme = "hugo-coder"
-languagecode = "en"
-defaultcontentlanguage = "en"
-buildFuture = true
-
-paginate = 20
-
-[markup.highlight]
-style = "github-dark"
-
-[markup]
- [markup.goldmark]
- [markup.goldmark.renderer]
- unsafe = true
-
-[params]
- navbar = "Home"
- author = "Sara Zan"
- info = """
- Lead AI Engineer at [Kwal](https://www.kwal.ai/),
- [main contributor](https://github.com/deepset-ai/haystack/graphs/contributors) of [Haystack](https://haystack.deepset.ai/),
- former [CERN](https://home.cern/) employee.
- I'm also an opinionated sci-fi reader, hiker, tinkerer and somewhat polyglot. Currently busy trying to learn Portuguese and Hungarian at the same time.
- """
- description = "Sara Zan's Personal Website"
- keywords = "blog,developer,personal,python,llm,nlp,swe,software-engineering,open-source"
- avatarurl = "/me/avatar-color.svg"
-
- #gravatar = "john.doe@example.com"
-
- faviconSVG = "/favicon.svg"
- favicon_32 = "/favicon.png"
- touchicon = "/favicon.png"
- # favicon_16 = "/img/favicon-16x16.png"
-
- since = 2023
-
- enableTwemoji = true
-
- colorScheme = "auto"
- hidecolorschemetoggle = false
-
-# Social links
-[[params.social]]
- name = "Github"
- icon = "fa fa-github fa-2x"
- weight = 1
- url = "https://github.com/ZanSara/"
-[[params.social]]
- name = "Linkedin"
- icon = "fa fa-linkedin fa-2x"
- weight = 1
- url = "https://www.linkedin.com/in/sarazanzottera"
-[[params.social]]
- name = "Twitter"
- icon = "fa fa-twitter fa-2x"
- weight = 1
- url = "https://twitter.com/zansara_dev"
-[[params.social]]
- name = "Mastodon"
- icon = "fa fa-brands fa-mastodon fa-2x"
- weight = 1
- url = "https://mastodon.social/@zansara"
-[[params.social]]
- name = "Y Combinator"
- icon = "fa fa-y-combinator-square fa-2x"
- weight = 1
- url = "https://news.ycombinator.com/user?id=zansara"
-[[params.social]]
- name = "Stackoverflow"
- icon = "fa fa-stack-overflow fa-2x"
- weight = 1
- url = "https://stackoverflow.com/users/19108168/zansara"
-[[params.social]]
- name = "Google Scholar"
- icon = "fa fa-graduation-cap fa-2x"
- weight = 1
- url = "https://scholar.google.com/citations?hl=en&user=IsXR9HAAAAAJ"
-[[params.social]]
- name = "Email"
- icon = "fa fa-envelope fa-2x"
- weight = 1
- url = "mailto:blog@zansara.dev"
-[[params.social]]
- name = "RSS"
- icon = "fa fa-rss fa-2x"
- weight = 1
- url = "/index.xml"
-[[params.social]]
- name = "Substack"
- icon = "fa fa-bookmark fa-2x"
- weight = 1
- url = "https://zansara.substack.com/"
-
-[taxonomies]
- series = "series"
-
-# Menu links
-[[menu.main]]
- name = "About"
- weight = 1
- url = "about"
-[[menu.main]]
- name = "Posts"
- weight = 2
- url = "posts/"
-[[menu.main]]
- name = "Projects"
- weight = 3
- url = "projects/"
-[[menu.main]]
- name = "Publications"
- weight = 4
- url = "publications/"
-[[menu.main]]
- name = "Talks"
- weight = 5
- url = "talks/"
-[[menu.main]]
- name = "Stats"
- weight = 6
- url = "https://zansaradev.goatcounter.com"
diff --git a/contact/index.html b/contact/index.html
new file mode 100644
index 00000000..79e9573f
--- /dev/null
+++ b/contact/index.html
@@ -0,0 +1,10 @@
+
+
+
+ https://www.zansara.dev/about/
+
+
+
+
+
+
diff --git a/content/about.md b/content/about.md
deleted file mode 100644
index 9be5d6ba..00000000
--- a/content/about.md
+++ /dev/null
@@ -1,38 +0,0 @@
----
-title: "About"
-description: "A short introduction"
-aliases: ["about-me", "zansara", "contact"]
-author: "ZanSara"
----
-
-I am a Python and Gen AI software engineer based in Portugal, currently working as Lead AI Engineer for
-[Kwal](https://www.kwal.ai/) on voice agents and conversation analysis with LLMs for the recruitment industry.
-I am also open to collaborations on short-term projects and to give [talks](/talks) on the topic of AI, LLMs, and
-related fields.
-
-Until recently I worked for [deepset](https://www.deepset.ai/), a German startup working on NLP
-[since "before it was cool"](https://www.deepset.ai/about), where I was the
-[main contributor](https://github.com/deepset-ai/haystack/graphs/contributors) of
-[Haystack](https://haystack.deepset.ai/), their open-source framework for building highly customizable,
-production-ready NLP and LLM applications.
-
-Previously I worked for a few years at [CERN](https://home.cern/), where I began my software engineering career.
-During my time there I had the privilege of driving one [major decision](/publications/tucpr03/) to migrate the graphical
-interface's software of the accelerator's control systems from Java to PyQt, and then of helping a client department
-[migrate](/publications/thpv014/) to this stack. I have also worked on other infrastructure and data pipelines, some of
-which resulted in [publication](/publications/thpv042/).
-
-Outside of work I have too many [pet projects](/projects) to follow up with than the free time I can dedicate to them.
-I love science fiction and space exploration, I enjoy challenging hikes in nature and learning languages, as much as
-such process can be enjoyed.
-
-I speak native Italian and fluent English, but I've also learned French during my time at CERN, I'm studying Hungarian
-for family reasons, and Portuguese because I currently live there. I still can understand some Russian and I have a
-very basic understanding of Chinese, both from my teenage and university years.
-
-You can find my latest CV [here](/me/sara_zanzottera_cv.pdf). Check out also my [projects](/projects),
-my [publications](/publications) and my [talks](/talks). If you prefer newsletters you can find my posts also on
-[Substack](https://zansara.substack.com/).
-
-The best way to get in touch with me is through [email](mailto:blog@zansara.dev) or
-[LinkedIn](https://www.linkedin.com/in/sarazanzottera).
diff --git a/content/posts/2021-12-11-dotfiles.md b/content/posts/2021-12-11-dotfiles.md
deleted file mode 100644
index 034658fa..00000000
--- a/content/posts/2021-12-11-dotfiles.md
+++ /dev/null
@@ -1,24 +0,0 @@
----
-title: "My Dotfiles"
-date: 2021-12-11
-author: "ZanSara"
-featuredImage: "/posts/2021-12-11-dotfiles/cover.png"
----
-
-GitHub Repo: https://github.com/ZanSara/dotfiles
-
----
-
-What Linux developer would I be if I didn't also have my very own dotfiles repo?
-
-After many years of iterations I finally found a combination that lasted quite a while, so I figured it's time to treat them as a real project. It was originally optimized for my laptop, but then I realized it works quite well on my three-monitor desk setup as well without major issues.
-
-It sports:
-- [i3-wm](https://github.com/Airblader/i3) as window manager (of course, with gaps),
-- The typical trio of [polybar](https://github.com/polybar/polybar), [rofi](https://github.com/davatorium/rofi) and [dunst](https://github.com/dunst-project/dunst) to handle top bar, start menu and notifications respectively,
-- The odd choice of [Ly](https://github.com/nullgemm/ly) as my display manager. I just love the minimal, TUI aesthetics of it. Don't forget to enable Doom's flames!
-- A minimalistic animated background from [xscreensaver](https://www.jwz.org/xscreensaver/screenshots/), [Grav](https://www.youtube.com/watch?v=spQRFDmDMeg). It's configured to leave no trails and stay black and white. An odd choice, and yet it manages to use no resources, stay very minimal, and bring a very (in my opinion) futuristic look to the entire setup.
-- [OhMyBash](https://github.com/ohmybash/oh-my-bash/tree/master/themes/font) with the [font](https://github.com/ohmybash/oh-my-bash/tree/master/themes/font) theme,
-- Other small amenities, like [nmtui](https://docs.rockylinux.org/gemstones/nmtui/) for network management, Japanese numerals as workspace indicators, etc.
-
-Feel free to take what you like. If you end up using any of these, make sure to share the outcomes!
diff --git a/content/posts/2023-09-10-python-verbix-sdk.md b/content/posts/2023-09-10-python-verbix-sdk.md
deleted file mode 100644
index 13fd233c..00000000
--- a/content/posts/2023-09-10-python-verbix-sdk.md
+++ /dev/null
@@ -1,30 +0,0 @@
----
-title: "An (unofficial) Python SDK for Verbix"
-date: 2023-09-10
-author: "ZanSara"
-featuredImage: "/posts/2023-09-10-python-verbix-sdk/cover.png"
----
-
-PyPI package: https://pypi.org/project/verbix-sdk/
-
-GitHub Repo: https://github.com/ZanSara/verbix-sdk
-
-Minimal Docs: https://github.com/ZanSara/verbix-sdk/blob/main/README.md
-
----
-
-As part of a larger side project which is still in the works ([Ebisu Flashcards](https://github.com/ebisu-flashcards)), these days I found myself looking for some decent API for verbs conjugations in different languages. My requirements were "simple":
-
-- Supports many languages, including Italian, Portuguese and Hungarian
-- Conjugates irregulars properly
-- Offers an API access to the conjugation tables
-- Refuses to conjugate anything except for known verbs
-- (Optional) Highlights the irregularities in some way
-
-Surprisingly, there seems to be a shortage of good alternatives in this field. None of the websites that host polished conjugation data seem to offer API access (looking at you, [Reverso](https://conjugator.reverso.net) -- you'll get your own post one day), and most of the simpler ones use heuristics to conjugate, which makes them very prone to errors. So for now I ended up choosing [Verbix](https://verbix.com) as a starting point.
-
-Unfortunately the website doesn't inspire much confidence. I attempted to email the creator only to see them [close their email account](https://verbix.com/contact.html) a while later, an [update to their API](https://api.verbix.com/) seems to have stalled halfway, and the [blog seems dead](https://verb-blog.verbix.com/). I often have the feeling this site might go under any minute, as soon as their domain registration expires.
-
-But there are pros to it, as long as it lasts. Verbix offers verb conjugation and noun declension tables for some [very niche languages, dialects and conlangs](https://verbix.com/languages/), to a degree that many more popular websites don't even come close to. To support such variety they use heuristics to create the conjugation tables, which is not ideal: for Hungarian, for example, I could easily get it to conjugate [verbs that don't exist](https://verbix.com/webverbix/go.php?T1=meegy&Submit=Go&D1=121&H1=221) or that have spelling mistakes. On the other hand, their API does have a field that says whether the verb is known or not, which is a great way to filter out false positives.
-
-So I decided to go the extra mile and I wrote a small Python SDK for their API: [verbix-sdk](https://pypi.org/project/verbix-sdk/). Enjoy it while it lasts...
diff --git a/content/posts/2023-10-10-haystack-series-intro.md b/content/posts/2023-10-10-haystack-series-intro.md
deleted file mode 100644
index dfccc550..00000000
--- a/content/posts/2023-10-10-haystack-series-intro.md
+++ /dev/null
@@ -1,29 +0,0 @@
----
-title: "Haystack 2.0: What is it?"
-date: 2023-10-10
-author: "ZanSara"
-series: ["Haystack 2.0 Series"]
-featuredImage: "/posts/2023-10-10-haystack-series-intro/cover.png"
----
-
-December is finally approaching, and with it the release of [Haystack](https://github.com/deepset-ai/haystack) 2.0. At [deepset](https://www.deepset.ai/), we’ve been talking about it for months, we’ve iterated on the core concepts what feels like a million times, and it looks like we’re finally getting ready for the approaching deadline.
-
-But what is it that makes this release so special?
-
-In short, Haystack 2.0 is a complete rewrite. A huge, big-bang-style change. Almost no code survived the migration unmodified: we’ve gone through the entire 100,000+ lines of the codebase and redone everything in under a year. For our small team, this is a huge accomplishment.
-
-In this series, I want to explain what Haystack 2.0 is from the perspective of the team that developed it. I'm going to talk about what makes the new Pipeline so different from the old one, how to use new components and features, how these compare with their equivalents in Haystack 1 (when possible), and the principles that guided the redesign. I had the pleasure (and sometimes the burden) of being involved in nearly all aspects of this process, from the requirements definition to the release, and I drove many of them through several iterations. In these posts, you can expect a mix of technical details and some diversions on the history and rationale behind each decision, as I’ve seen and understood them.
-
-For the curious readers, we have already released a lot of information about Haystack 2.0: check out [this Github Discussion](https://github.com/deepset-ai/haystack/discussions/5568), or join us on [Haystack's Discord server](https://discord.com/invite/VBpFzsgRVF) and peek into the `haystack-2.0` channel for regular updates. We are also slowly building [brand new documentation](https://docs.haystack.deepset.ai/v2.0/docs) for everything, and don’t worry: we’ll make sure it is as outstanding as the Haystack 1.x documentation.
-
-We also regularly feature 2.0 features in our Office Hours on Discord. Follow [@Haystack_AI](https://twitter.com/Haystack_AI) or [@deepset_ai](https://twitter.com/deepset_ai) on Twitter to stay up-to-date, or [deepset](https://www.linkedin.com/company/deepset-ai) on Linkedin. And you’ll find me and the rest of the team on [GitHub](https://github.com/deepset-ai/haystack) frantically (re)writing code and filing down the rough edges before the big release.
-
-Stay tuned!
-
----
-
-*Next: [Why rewriting Haystack?!](/posts/2023-10-11-haystack-series-why)*
-
-*See the entire series here: [Haystack 2.0 series](/series/haystack-2.0-series/)*
-
-
diff --git a/content/posts/2023-10-11-haystack-series-why.md b/content/posts/2023-10-11-haystack-series-why.md
deleted file mode 100644
index 923280b1..00000000
--- a/content/posts/2023-10-11-haystack-series-why.md
+++ /dev/null
@@ -1,78 +0,0 @@
----
-title: "Why rewriting Haystack?!"
-date: 2023-10-11
-author: "ZanSara"
-series: ["Haystack 2.0 Series"]
-featuredImage: "/posts/2023-10-11-haystack-series-why/cover.png"
----
-
-Before even diving into what Haystack 2.0 is, how it was built, and how it works, let's spend a few words about the whats and the whys.
-
-First of all, *what is* Haystack?
-
-And next, why on Earth did we decide to rewrite it from the ground up?
-
-### A Pioneer Framework
-
-Haystack is a relatively young framework, its initial release dating back to [November 28th, 2019](https://github.com/deepset-ai/haystack/releases/tag/0.1.0). Back then, Natural Language Processing was a field that had just started taking its first steps outside of research labs, and Haystack was one of the first libraries that promised enterprise-grade, production-ready NLP features. We were proud to enable use cases such as [semantic search](https://medium.com/deepset-ai/what-semantic-search-can-do-for-you-ea5b1e8dfa7f), [FAQ matching](https://medium.com/deepset-ai/semantic-faq-search-with-haystack-6a03b1e13053), document similarity, document summarization, machine translation, language-agnostic search, and so on.
-
-The field was niche but constantly moving, and research was lively. [The BERT paper](https://arxiv.org/abs/1810.04805) had been published a few months before Haystack's first release, unlocking a small revolution. In the shade of much larger research labs, [deepset](https://www.deepset.ai/), then just a pre-seed stage startup, was also pouring effort into [research](https://arxiv.org/abs/2104.12741) and [model training](https://huggingface.co/deepset).
-
-In those times, competition was close to non-existent. The field was still quite technical, and most people didn't fully understand its potential. We were free to explore features and use cases at our own pace and set the direction for our product. This allowed us to decide what to work on, what to double down on, and what to deprioritize, postpone, or ignore. Haystack was nurturing its own garden in what was fundamentally a green field.
-
-
-### ChatGPT
-
-This rather idyllic situation came to an end all too abruptly at the end of November 2022, when [ChatGPT was released](https://openai.com/blog/chatgpt).
-
-For us in the NLP field, everything seemed to change overnight. Day by day. For *months*.
-
-The speed of progress went from lively to faster-than-light all at once. Every company with the budget to train an LLM seemed to be doing so, and researchers kept releasing new models just as quickly. Open-source contributors pushed to reduce the hardware requirements for inference lower and lower. My best memory of those times is the drama of [LLaMA's first "release"](https://github.com/facebookresearch/llama/pull/73): I remember betting on March 2nd that within a week I would be running LLaMA models on my laptop, and I wasn't even surprised when my prediction [turned out to be true](https://news.ycombinator.com/item?id=35100086) with the release of [llama.cpp](https://github.com/ggerganov/llama.cpp) on March 10th.
-
-Of course, keeping up with this situation was far beyond us. Competitors started to spawn like mushrooms, and our space was quickly crowded with new startups, far more agile and aggressive than us. We suddenly needed to compete and realized we weren't used to it.
-
-### PromptNode vs FARMReader
-
-Luckily, Haystack seemed capable of keeping up, at least for a while. Thanks to the efforts of [Vladimir Blagojevic](https://twitter.com/vladblagoje), a few weeks after ChatGPT became a sensation, we added some decent support for LLMs in the form of [PromptNode](https://github.com/deepset-ai/haystack/pull/3665). Our SaaS team could soon bring new LLM-powered features to our customers. We even managed to add support for [Agents](https://github.com/deepset-ai/haystack/pull/3925), another hot topic in the wake of ChatGPT.
-
-However, in the minds of most developers, the go-to library for LLMs was not Haystack. It was [LangChain](https://docs.langchain.com/docs/), and for a long time, it seemed like we would never be able to challenge its status and popularity. Everyone was talking about it, everyone was building demos, products, and startups on it, its development speed was unbelievable and, in the day-to-day discourse of the newly born LLM community, Haystack was nowhere to be found.
-
-Why?
-
-That's because no one even realized that Haystack, the semantic search framework from 2019, also supported LLMs. All our documentation, tutorials, blog posts, research efforts, models on HuggingFace, *everything* was pointing towards semantic search. LLMs were nowhere to be seen.
-
-And semantic search was going down *fast*.
-
-![Reader Models downloads graph](/posts/2023-10-11-haystack-series-why/reader-model-downloads.png)
-
-The image above shows today's monthly downloads for one of deepset's most successful models on HuggingFace,
-[deepset/roberta-base-squad2](https://huggingface.co/deepset/roberta-base-squad2). This model performs [extractive Question Answering](https://huggingface.co/tasks/question-answering), our former primary use case before the release of ChatGPT. Even with more than one and a half million downloads monthly, this model is experiencing a disastrous collapse in popularity, and in the current landscape, it is unlikely to ever recover.
-
-
-### A (Sort Of) Pivot
-
-In this context, around February 2023, we decided to bet on the rise of LLMs and committed to focusing all our efforts on becoming the #1 framework powering production-grade LLM applications.
-
-As we quickly realized, this was far from an easy proposition. Extractive QA was deeply ingrained not only in our public image but in our codebase as well: implementing and maintaining PromptNode was proving more and more painful by the day, and when we tried to fit the concept of Agents into Haystack, it felt uncomfortably like trying to force a square peg into a round hole.
-
-Haystack pipelines made extractive QA straightforward for the users and were highly optimized for this use case. But supporting LLMs was nothing like enabling extractive QA. Using Haystack for LLMs was quite a painful experience, and at the same time, modifying the Pipeline class to accommodate them seemed like the best way to mess with all the users that relied on the current Pipeline for their existing, value-generating applications. Making mistakes with Pipeline could ruin us.
-
-With this realization in mind, we took what seemed the best option for the future of Haystack: a rewrite. The knowledge and experience we gained while working on Haystack 1 could fuel the design of Haystack 2 and act as a reference frame for it. Unlike our competitors, we already knew a lot about how to make NLP work at scale. We made many mistakes we would avoid in our next iteration. We knew that focusing on the best possible developer experience fueled the growth of Haystack 1 in the early days, and we were committed to doing the same for the next version of it.
-
-So, the redesign of Haystack started, and it started from the concept of Pipeline.
-
-### Fast-forward
-
-Haystack 2.0 hasn't been released yet, but for now, it seems that we have made the right decision at the start of the year.
-
-Haystack's name is starting to appear more often in discussions around LLMs. The general tone of the community is steadily shifting, and scaling up, rather than experimenting, is now the focus. Competitors are re-orienting themselves toward production-readiness, something we're visibly more experienced with. At the same time, LangChain is becoming a victim of its own success, collecting more and more criticism for its lack of documentation, leaky abstractions, and confusing architecture. Other competitors are gaining steam, but the overall landscape no longer feels as hostile.
-
-In the next post, I will explore the technical side of Haystack 2.0 and delve deeper into the concept of Pipelines: what they are, how to use them, how they evolved from Haystack 1 to Haystack 2, and why.
-
----
-
-*Next: [Haystack's Pipeline - A Deep Dive](/posts/2023-10-15-haystack-series-pipeline)*
-
-*Previous: [Haystack 2.0: What is it?](/posts/2023-10-10-haystack-series-intro)*
-
-*See the entire series here: [Haystack 2.0 series](/series/haystack-2.0-series/)*
\ No newline at end of file
diff --git a/content/posts/2023-10-15-haystack-series-pipeline.md b/content/posts/2023-10-15-haystack-series-pipeline.md
deleted file mode 100644
index 9dad446b..00000000
--- a/content/posts/2023-10-15-haystack-series-pipeline.md
+++ /dev/null
@@ -1,436 +0,0 @@
----
-title: "Haystack's Pipeline - A Deep Dive"
-date: 2023-10-15
-author: "ZanSara"
-series: ["Haystack 2.0 Series"]
-featuredImage: "/posts/2023-10-15-haystack-series-pipeline/cover.png"
----
-If you've ever looked at Haystack before, you must have come across the [Pipeline](https://docs.haystack.deepset.ai/docs/pipelines), one of the most prominent concepts of the framework. However, this abstraction is by no means an obvious choice when it comes to NLP libraries. Why did we adopt this concept, and what does it bring us?
-
-In this post, I go into all the details of how the Pipeline abstraction works in Haystack now, why it works this way, and its strengths and weaknesses. This deep dive into the current state of the framework is also a premise for the next episode, where I will explain how Haystack 2.0 addresses this version's shortcomings.
-
-If you think you already know how Haystack Pipelines work, give this post a chance: I might manage to change your mind.
-
-## A Bit Of History
-
-Interestingly, in the very first releases of Haystack, Pipelines were not a thing. Version 0.1.0 was released with a simpler object, the [Finder](https://github.com/deepset-ai/haystack/blob/d2c77f307788899eb562d3cb6e42c69b968b9f2a/haystack/__init__.py#L16), that did little more than glue together a [Retriever](https://docs.haystack.deepset.ai/docs/retriever) and a [Reader](https://docs.haystack.deepset.ai/docs/reader), the two fundamental building blocks of a [semantic search](https://docs.haystack.deepset.ai/docs/glossary#semantic-search) application.
-
-In the next few months, however, the capabilities of language models expanded to enable many more use cases. One hot topic was [hybrid retrieval](https://haystack.deepset.ai/blog/hybrid-retrieval): a system composed of two different Retrievers, an optional [Ranker](https://docs.haystack.deepset.ai/docs/ranker), and an optional Reader. This kind of application clearly didn't fit the Finder's design, so in [version 0.6.0](https://github.com/deepset-ai/haystack/releases/tag/v0.6.0) the [Pipeline](https://docs.haystack.deepset.ai/docs/pipelines) object was introduced: a new abstraction that helped users build applications as a graph of components.
-
-Pipeline's API was a huge step forward from Finder. It instantly enabled seemingly endless combinations of components, unlocked almost all conceivable use cases, and became a foundational Haystack concept meant to stay for a very long time. In fact, the API offered by the first version of Pipeline has changed very little since its initial release.
-
-This is the snippet included in the release notes of version 0.6.0 to showcase hybrid retrieval. Does it look familiar?
-
-```python
-p = Pipeline()
-p.add_node(component=es_retriever, name="ESRetriever", inputs=["Query"])
-p.add_node(component=dpr_retriever, name="DPRRetriever", inputs=["Query"])
-p.add_node(component=JoinDocuments(join_mode="concatenate"), name="JoinResults", inputs=["ESRetriever", "DPRRetriever"])
-p.add_node(component=reader, name="QAReader", inputs=["JoinResults"])
-res = p.run(query="What did Einstein work on?", top_k_retriever=1)
-```
-
-## A Powerful Abstraction
-
-One fascinating aspect of this Pipeline model is the simplicity of its user-facing API. In almost all examples, you see only two or three methods used:
-
-- `add_node`: to add a component to the graph and connect it to the others.
-- `run`: to run the Pipeline from start to finish.
-- `draw`: to draw the graph of the Pipeline to an image.
-
-At this level, users don't need to know what kind of data the components need to function, what they produce, or even what the components *do*: all they need to know is the place each component must occupy in the graph for the system to work.
-
-For example, as long as the users know that their hybrid retrieval pipeline should look more or less like this (note: this is the output of `Pipeline.draw()`), translating it into a Haystack Pipeline object using a few `add_node` calls is mostly straightforward.
-
-![Hybrid Retrieval](/posts/2023-10-15-haystack-series-pipeline/hybrid-retrieval.png)
-
-This fact is reflected by the documentation of the various components as well. For example, this is how the documentation page for Ranker opens:
-
-![Ranker Documentation](/posts/2023-10-15-haystack-series-pipeline/ranker-docs.png)
-
-Note how the first information about this component is *where to place it*. Right after, it specifies its inputs and outputs, even though it's not immediately clear why we need this information, and then lists which specific classes can cover the role of a Ranker.
-
-The message is clear: all Ranker classes are functionally interchangeable, and as long as you place them correctly in the Pipeline, they will fulfill the function of Ranker as you expect them to. Users don't need to understand what distinguishes `CohereRanker` from `RecentnessReranker` unless they want to: the documentation promises that you can swap them safely, and thanks to the Pipeline abstraction, this statement mostly holds true.
-
-## Ready-made Pipelines
-
-But how can the users know which sort of graph they have to build?
-
-Most NLP applications are made of a relatively limited number of high-level components: Retrievers, Readers, Rankers, plus the occasional Classifier, Translator, or Summarizer. Systems requiring anything more than these components used to be really rare, at least when talking about "query" pipelines (more on this later).
-
-Therefore, at this level of abstraction, there are just a few graph topologies possible. Better yet, they could each be mapped to high-level use cases such as semantic search, language-agnostic document search, hybrid retrieval, and so on.
-
-But the crucial point is that, in most cases, tailoring the application did not require any changes to the graph's shape. Users only need to identify their use case, find an example or a tutorial defining the shape of the Pipeline they need, and then swap the single components with other instances from the same category until they find the best combination for their exact requirements.
-
-This workflow was evident and encouraged: it was the philosophy behind Finder as well, and from version 0.6.0, Haystack immediately provided what are called "[Ready-made Pipelines](https://docs.haystack.deepset.ai/docs/ready_made_pipelines)": objects that initialized the graph on the user's behalf and expected as input the components to place at each point of the graph, for example a Reader and a Retriever in the case of simple Extractive QA.
-
-With this further abstraction on top of Pipeline, creating an NLP application became an action that doesn't even require the user to be aware of the existence of the graph. In fact:
-
-```python
-pipeline = ExtractiveQAPipeline(reader, retriever)
-```
-
-is enough to get your Extractive QA application ready to answer your questions. And you can ask them with just one more line.
-
-```python
-answers = pipeline.run(query="What did Einstein work on?")
-```
-
-## "Flexibility powered by DAGs"
-
-This abstraction is extremely powerful for the use cases that it was designed for. There are a few layers of ease of use vs. customization the user can choose from depending on their expertise, which help them progress from a simple ready-made Pipeline to fully custom graphs.
-
-However, the focus was oriented so much on the initial stages of the user's journey that power-users' needs were sometimes forgotten. Such issues didn't show immediately, but quickly added friction as soon as the users tried to customize their system beyond the examples from the tutorials and the documentation.
-
-For an example of these issues, let's talk about pipelines with branches. Here are two small, apparently very similar pipelines.
-
-![Query Classification vs Hybrid Retrieval](/posts/2023-10-15-haystack-series-pipeline/branching-query-pipelines.png)
-
-The first Pipeline represents the Hybrid Retrieval use case we've already met. Here, the Query node sends its output to both retrievers, and they both produce some output. For the Reader to make sense of this data, we need a Join node that merges the two lists into one and a Ranker that takes the merged list and sorts it again by similarity to the query. The Ranker then sends the rearranged list to the Reader.
-
-The second Pipeline instead performs a simpler form of Hybrid Retrieval. Here, the Query node sends its outputs to a Query Classifier, which then triggers only one of the two retrievers, the one that is expected to perform better on it. The triggered Retriever then sends its output directly to the Reader, which doesn't need to know which Retriever the data comes from. So, in this case, we don't need the Join node.
-
-The two pipelines are built as you would expect, with a bunch of `add_node` calls. You can even run them with the same identical code, which is the same code needed for every other Pipeline we've seen so far.
-
-```python
-pipeline_1 = Pipeline()
-pipeline_1.add_node(component=sparse_retriever, name="SparseRetriever", inputs=["Query"])
-pipeline_1.add_node(component=dense_retriever, name="DenseRetriever", inputs=["Query"])
-pipeline_1.add_node(component=join_documents, name="JoinDocuments", inputs=["SparseRetriever", "DenseRetriever"])
-pipeline_1.add_node(component=rerank, name="Ranker", inputs=["JoinDocuments"])
-pipeline_1.add_node(component=reader, name="Reader", inputs=["Ranker"])
-
-answers = pipeline_1.run(query="What did Einstein work on?")
-```
-```python
-pipeline_2 = Pipeline()
-pipeline_2.add_node(component=query_classifier, name="QueryClassifier", inputs=["Query"])
-pipeline_2.add_node(component=sparse_retriever, name="SparseRetriever", inputs=["QueryClassifier.output_1"])
-pipeline_2.add_node(component=dense_retriever, name="DenseRetriever", inputs=["QueryClassifier.output_2"])
-pipeline_2.add_node(component=reader, name="Reader", inputs=["SparseRetriever", "DenseRetriever"])
-
-answers = pipeline_2.run(query="What did Einstein work on?")
-```
-
-Both pipelines run as you would expect them to. Hooray! Pipelines can branch and join!
-
-Now, let's take the first Pipeline and customize it further.
-
-For example, imagine we want to expand language support to include French. The dense Retriever has no issues handling several languages as long as we select a multilingual model; however, the sparse Retriever needs the keywords to match, so we must translate the queries to English to find some relevant documents in our English-only knowledge base.
-
-Here is what the Pipeline ends up looking like. Language Classifier sends all French queries over `output_1` and all English queries over `output_2`. In this way, the query passes through the Translator node only if it is written in French.
-
-![Multilingual Hybrid Retrieval](/posts/2023-10-15-haystack-series-pipeline/multilingual-hybrid-retrieval.png)
-
-```python
-pipeline = Pipeline()
-pipeline.add_node(component=language_classifier, name="LanguageClassifier", inputs=["Query"])
-pipeline.add_node(component=translator, name="Translator", inputs=["LanguageClassifier.output_1"])
-pipeline.add_node(component=sparse_retriever, name="SparseRetriever", inputs=["Translator", "LanguageClassifier.output_2"])
-pipeline.add_node(component=dense_retriever, name="DenseRetriever", inputs=["LanguageClassifier.output_1", "LanguageClassifier.output_2"])
-pipeline.add_node(component=join_documents, name="JoinDocuments", inputs=["SparseRetriever", "DenseRetriever"])
-pipeline.add_node(component=rerank, name="Ranker", inputs=["JoinDocuments"])
-pipeline.add_node(component=reader, name="Reader", inputs=["Ranker"])
-```
-
-But... wait. Let's look again at the graph and at the code. DenseRetriever should receive *two* inputs from Language Classifier: both `output_1` and `output_2`, because it can handle both languages. What's going on? Is this a bug in `draw()`?
-
-Thanks to the `debug=True` parameter of `Pipeline.run()`, we start inspecting what each node saw during the execution, and we realize quickly that our worst fears are true: this is a bug in the Pipeline implementation. The underlying library powering the Pipeline's graphs takes the definition of Directed Acyclic Graphs very seriously and does not allow two nodes to be connected by more than one edge. There are, of course, other graph classes supporting this case, but Haystack happens to use the wrong one.
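-
-To see the limitation in isolation, here is a minimal standalone sketch of the same behavior with `networkx` directed graphs (illustrative code only, not Haystack internals): a plain `DiGraph` silently collapses parallel edges between the same two nodes, while a `MultiDiGraph` would keep them.
-
-```python
-import networkx as nx
-
-# A plain DiGraph allows at most one edge between any two nodes:
-# the second add_edge() call silently merges into the first one.
-graph = nx.DiGraph()
-graph.add_edge("LanguageClassifier", "DenseRetriever", label="output_1")
-graph.add_edge("LanguageClassifier", "DenseRetriever", label="output_2")
-print(graph.number_of_edges())  # 1 -- the second connection is gone
-
-# A MultiDiGraph keeps both parallel edges instead.
-multigraph = nx.MultiDiGraph()
-multigraph.add_edge("LanguageClassifier", "DenseRetriever", label="output_1")
-multigraph.add_edge("LanguageClassifier", "DenseRetriever", label="output_2")
-print(multigraph.number_of_edges())  # 2
-```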
-
-Interestingly, Pipeline doesn't even notice the problem and does not fail. It runs as the drawing suggests: when the query happens to be in French, only the sparse Retriever will process it.
-
-Clearly, this is not good for us.
-
-Well, let's look for a workaround. Given that we're Haystack power users by now, we realize that we can use a Join node with a single input as a "no-op" node. If we put it along one of the edges, that edge won't directly connect Language Classifier and Dense Retriever, so the bug should be solved.
-
-So here is our current Pipeline:
-
-![Multilingual Hybrid Retrieval with No-Op Joiner](/posts/2023-10-15-haystack-series-pipeline/multilingual-hybrid-retrieval-with-noop.png)
-
-```python
-pipeline = Pipeline()
-pipeline.add_node(component=language_classifier, name="LanguageClassifier", inputs=["Query"])
-pipeline.add_node(component=translator, name="Translator", inputs=["LanguageClassifier.output_1"])
-pipeline.add_node(component=sparse_retriever, name="SparseRetriever", inputs=["Translator", "LanguageClassifier.output_2"])
-pipeline.add_node(component=no_op_join, name="NoOpJoin", inputs=["LanguageClassifier.output_1"])
-pipeline.add_node(component=dense_retriever, name="DenseRetriever", inputs=["NoOpJoin", "LanguageClassifier.output_2"])
-pipeline.add_node(component=join_documents, name="JoinDocuments", inputs=["SparseRetriever", "DenseRetriever"])
-pipeline.add_node(component=rerank, name="Ranker", inputs=["JoinDocuments"])
-pipeline.add_node(component=reader, name="Reader", inputs=["Ranker"])
-```
-
-Great news: the Pipeline now runs as we expect! However, when we run a French query, the results are better but still surprisingly bad.
-
-What now? Is the dense Retriever still not running? Is the Translation node doing a poor job?
-
-Some debugging later, we realize that the Translator is amazingly good and the Retrievers are both running. But we forgot another piece of the puzzle: Ranker needs the query to be in the same language as the documents. It requires the English version of the query, just like the sparse Retriever does. However, right now, it receives the original French query, and that's the reason for the lack of performance. We soon realize that this is also very important for the Reader.
-
-So... how does the Pipeline pass the query down to the Ranker?
-
-Until this point, we didn't need to know how exactly values are passed from one component to the next. We didn't need to care about their inputs and outputs at all: Pipeline was doing all this dirty work for us. Suddenly, we need to tell the Pipeline which query to pass to the Ranker and we have no idea how to do that.
-
-Worse yet. There is *no way* to reliably do that. The documentation seems to blissfully ignore the topic, docstrings give us no pointers, and looking at [the routing code of Pipeline](https://github.com/deepset-ai/haystack/blob/aaee03aee87e96acd8791b9eff999055a8203237/haystack/pipelines/base.py#L483) we quickly get dizzy and give up the chase. We dig through the Pipeline API several times until we're confident that there's nothing that can help.
-
-Well, there must be at least some workaround. Maybe we can forget about this issue by rearranging the nodes.
-
-One easy way out is to translate the query for both retrievers instead of only for the sparse one. This solution also eliminates the NoOpJoin node we introduced earlier, so it doesn't sound too bad.
-
-The Pipeline looks like this now.
-
-![Multilingual Hybrid Retrieval with two Translators](/posts/2023-10-15-haystack-series-pipeline/multilingual-hybrid-retrieval-two-translators.png)
-
-```python
-pipeline = Pipeline()
-pipeline.add_node(component=language_classifier, name="LanguageClassifier", inputs=["Query"])
-pipeline.add_node(component=translator, name="Translator", inputs=["LanguageClassifier.output_1"])
-pipeline.add_node(component=sparse_retriever, name="SparseRetriever", inputs=["Translator", "LanguageClassifier.output_2"])
-pipeline.add_node(component=translator_2, name="Translator2", inputs=["LanguageClassifier.output_1"])
-pipeline.add_node(component=dense_retriever, name="DenseRetriever", inputs=["Translator2", "LanguageClassifier.output_2"])
-pipeline.add_node(component=join_documents, name="JoinDocuments", inputs=["SparseRetriever", "DenseRetriever"])
-pipeline.add_node(component=rerank, name="Ranker", inputs=["JoinDocuments"])
-pipeline.add_node(component=reader, name="Reader", inputs=["Ranker"])
-```
-
-We now have two nodes that contain identical translator components. Given that they are stateless, we can surely place the same instance in both places, with different names, and avoid doubling its memory footprint just to work around a couple of Pipeline bugs. After all, Translator nodes use relatively heavy models for machine translation.
-
-This is what Pipeline replies as soon as we try.
-
-```
-PipelineConfigError: Cannot add node 'Translator2'. You have already added the same
-instance to the Pipeline under the name 'Translator'.
-```
-
-Okay, so it seems like we can't re-use components in two places: there is an explicit check against this, for some reason. Alright, let's rearrange this Pipeline *again* with this new constraint in mind.
-
-How about we first translate the query and then distribute it?
-
-![Multilingual Hybrid Retrieval, translate-and-distribute](/posts/2023-10-15-haystack-series-pipeline/multilingual-hybrid-retrieval-translate-and-distribute.png)
-
-```python
-pipeline = Pipeline()
-pipeline.add_node(component=language_classifier, name="LanguageClassifier", inputs=["Query"])
-pipeline.add_node(component=translator, name="Translator", inputs=["LanguageClassifier.output_1"])
-pipeline.add_node(component=sparse_retriever, name="SparseRetriever", inputs=["Translator", "LanguageClassifier.output_2"])
-pipeline.add_node(component=dense_retriever, name="DenseRetriever", inputs=["Translator", "LanguageClassifier.output_2"])
-pipeline.add_node(component=join_documents, name="JoinDocuments", inputs=["SparseRetriever", "DenseRetriever"])
-pipeline.add_node(component=rerank, name="Ranker", inputs=["JoinDocuments"])
-pipeline.add_node(component=reader, name="Reader", inputs=["Ranker"])
-```
-
-Looks neat: there is now no way for the original French query to reach Ranker. Right?
-
-We run the pipeline again and soon realize that nothing has changed. The query received by Ranker is still in French, untranslated. Shuffling the order of the `add_node` calls and the names of the components in the `inputs` parameters seems to have no effect on the graph. We even try to connect Translator directly with Ranker in a desperate attempt to forward the correct value, but Pipeline now starts throwing obscure, apparently meaningless error messages like:
-
-```
-BaseRanker.run() missing 1 required positional argument: 'documents'
-```
-
-Isn't Ranker receiving the documents from JoinDocuments? Where did they go?
-
-Having wasted far too much time on this relatively simple Pipeline, we throw in the towel, go to Haystack's Discord server, and ask for help.
-
-Soon enough, one of the maintainers shows up and promises a workaround ASAP. We're skeptical at this point, but the workaround, in fact, exists.
-
-It's just not very pretty.
-
-![Multilingual Hybrid Retrieval, working version](/posts/2023-10-15-haystack-series-pipeline/multilingual-hybrid-retrieval-workaround.png)
-
-```python
-pipeline = Pipeline()
-pipeline.add_node(component=language_classifier, name="LanguageClassifier", inputs=["Query"])
-pipeline.add_node(component=translator_workaround, name="TranslatorWorkaround", inputs=["LanguageClassifier.output_2"])
-pipeline.add_node(component=sparse_retriever, name="SparseRetriever", inputs=["LanguageClassifier.output_1", "TranslatorWorkaround"])
-pipeline.add_node(component=dense_retriever, name="DenseRetriever", inputs=["LanguageClassifier.output_1", "TranslatorWorkaround"])
-pipeline.add_node(component=join_documents, name="JoinDocuments", inputs=["SparseRetriever", "DenseRetriever"])
-pipeline.add_node(component=join_query_workaround, name="JoinQueryWorkaround", inputs=["TranslatorWorkaround", "JoinDocuments"])
-pipeline.add_node(component=rerank, name="Ranker", inputs=["JoinQueryWorkaround"])
-pipeline.add_node(component=reader, name="Reader", inputs=["Ranker"])
-```
-
-Note that you need two custom nodes: a wrapper for the Translator and a brand-new Join node.
-
-```python
-class TranslatorWorkaround(TransformersTranslator):
-
- outgoing_edges = 1
-
- def run(self, query):
- results, edge = super().run(query=query)
- return {**results, "documents": [] }, "output_1"
-
- def run_batch(self, queries):
- pass
-
-
-class JoinQueryWorkaround(JoinNode):
-
- def run_accumulated(self, inputs, *args, **kwargs):
- return {"query": inputs[0].get("query", None), "documents": inputs[1].get("documents", None)}, "output_1"
-
- def run_batch_accumulated(self, inputs):
- pass
-
-```
-
-Along with this beautiful code, we also receive an explanation about how the `JoinQueryWorkaround` node works only for this specific Pipeline and is pretty hard to generalize, which is why it's not present in Haystack right now. I'll spare you the details: you will have an idea why by the end of this journey.
-
-Wanna play with this Pipeline yourself and try to make it work in another way? Check out the [Colab](https://drive.google.com/file/d/18Gqfd0O828T71Gc-IHeU4v7OXwaPk7Fc/view?usp=sharing) or the [gist](https://gist.github.com/ZanSara/33020a980f2f535e2529df4ca4e8f08a) and have fun.
-
-Having learned only that it's better not to implement unusual branching patterns with Haystack unless you're ready for a fight, let's now turn to the indexing side of your application. We'll stick to the basics this time.
-
-## Indexing Pipelines
-
-Indexing pipelines' main goal is to transform files into Documents from which a query pipeline can later retrieve information. They mostly look like the following.
-
-![Indexing Pipeline](/posts/2023-10-15-haystack-series-pipeline/indexing-pipeline.png)
-
-And the code looks just like how you would expect it.
-
-```python
-pipeline = Pipeline()
-pipeline.add_node(component=file_type_classifier, name="FileTypeClassifier", inputs=["File"])
-pipeline.add_node(component=text_converter, name="TextConverter", inputs=["FileTypeClassifier.output_1"])
-pipeline.add_node(component=pdf_converter, name="PdfConverter", inputs=["FileTypeClassifier.output_2"])
-pipeline.add_node(component=docx_converter, name="DocxConverter", inputs=["FileTypeClassifier.output_4"])
-pipeline.add_node(component=join_documents, name="JoinDocuments", inputs=["TextConverter", "PdfConverter", "DocxConverter"])
-pipeline.add_node(component=preprocessor, name="Preprocessor", inputs=["JoinDocuments"])
-pipeline.add_node(component=document_store, name="DocumentStore", inputs=["Preprocessor"])
-
-pipeline.run(file_paths=paths)
-```
-There is nothing surprising here. The starting node is File instead of Query, which seems logical given that this Pipeline expects a list of files, not a query. There is a document store at the end, which we didn't use in query pipelines so far, but it doesn't look too strange. It's all quite intuitive.
-
-Indexing pipelines are run by giving them the paths of the files to convert. In this scenario, more than one Converter may run, so we place a Join node before the PreProcessor to make sense of the merge. We make sure that the directory contains only files that we can convert, in this case, .txt, .pdf, and .docx, and then we run the code above.
-
-The code, however, fails.
-
-```
-ValueError: Multiple non-default file types are not allowed at once.
-```
-
-The more we look at the error, the less it makes sense. What are non-default file types? Why are they not allowed at once, and what can I do to fix that?
-
-We head for the documentation, where we find a lead.
-
-![`FileTypeClassifier documentation`](/posts/2023-10-15-haystack-series-pipeline/filetypeclassifier-docs.png)
-
-So it seems like the File Classifier can only process the files if they're all of the same type.
-
-After all we've been through with the Hybrid Retrieval pipelines, this sounds wrong. We know that Pipeline can run two branches at the same time. We were doing it just a moment ago. Why can't FileTypeClassifier send data to two converters just like LanguageClassifier sends data to two retrievers?
-
-Turns out, this is *not* the same thing.
-
-Let's compare the three pipelines and try to spot the difference.
-
-![All branching pipelines, side by side](/posts/2023-10-15-haystack-series-pipeline/all-branching-pipelines.png)
-
-In the first case, Query sends the same identical value to both Retrievers. So, from the component's perspective, there's a single output being produced: the Pipeline takes care of copying it for all nodes connected to it.
-
-In the second case, QueryClassifier can send the query to either Retriever but never to both. So, the component can produce two different outputs, but at every run, it will always return just one.
-
-In the third case, FileTypeClassifier may need to produce two different outputs simultaneously: for example, one with a list of text files and one with a list of PDFs. And it turns out this can't be done. This is, unfortunately, a well-known limitation of the Pipeline/BaseComponent API design.
-The output of a component is defined as a tuple, `(output_values, output_edge)`, and nodes can't produce a list of these tuples to send different values to different nodes.
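-
-To make the constraint concrete, here is a rough sketch of that contract (illustrative pseudocode, not the actual FileTypeClassifier; `split_by_type` is a made-up helper): a node's `run()` returns exactly one `(values, edge)` tuple per invocation, while this use case would need several at once.
-
-```python
-# What a Haystack 1.x node can return: ONE payload routed to ONE outgoing edge.
-def run(self, file_paths):
-    txt_files, pdf_files = split_by_type(file_paths)  # hypothetical helper
-    return {"file_paths": txt_files}, "output_1"      # a single (values, edge) tuple
-
-# What the FileTypeClassifier would need here: different payloads sent over
-# different edges within the same run, e.g. a list of tuples -- which the
-# Pipeline/BaseComponent contract does not allow:
-#
-#   return [({"file_paths": txt_files}, "output_1"),
-#           ({"file_paths": pdf_files}, "output_2")]
-```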
-
-That's the end of the story. This time, there is no workaround. You must pass the files individually or forget about using a Pipeline for this task.
-
-## Validation
-
-On top of these challenges, other tradeoffs had to be made for the API to look so simple at first glance. One of these is connection validation.
-
-Let's imagine we quickly skimmed through a tutorial and got one bit of information wrong: we mistakenly believe that in an Extractive QA Pipeline, you need to place a Reader in front of a Retriever. So we sit down and write this.
-
-```python
-p = Pipeline()
-p.add_node(component=reader, name="Reader", inputs=["Query"])
-p.add_node(component=retriever, name="Retriever", inputs=["Reader"])
-```
-
-Up to this point, running the script raises no error. Haystack is happy to connect these two components in this order. You can even `draw()` this Pipeline just fine.
-
-![Swapped Retriever/Reader Pipeline](/posts/2023-10-15-haystack-series-pipeline/swapped-retriever-reader.png)
-
-Alright, so what happens when we run it?
-
-```python
-res = p.run(query="What did Einstein work on?")
-```
-```
-BaseReader.run() missing 1 required positional argument: 'documents'
-```
-
-This is the same error we've seen in the translating hybrid retrieval pipeline earlier, but fear not! Here, we can follow the suggestion of the error message by doing:
-
-```python
-res = p.run(query="What did Einstein work on?", documents=document_store.get_all_documents())
-```
-
-And to our surprise, this Pipeline doesn't crash. It just hangs there, showing an insanely slow progress bar, telling us that some inference is in progress. A few hours later, we kill the process and consider switching to another framework because this one is clearly very slow.
-
-What happened?
-
-The cause of this issue is the same that makes connecting Haystack components in a Pipeline so effortless, and it's related to the way components and Pipeline communicate. If you check `Pipeline.run()`'s signature, you'll see that it looks like this:
-
-
-```python
-def run(
- self,
- query: Optional[str] = None,
- file_paths: Optional[List[str]] = None,
- labels: Optional[MultiLabel] = None,
- documents: Optional[List[Document]] = None,
- meta: Optional[Union[dict, List[dict]]] = None,
- params: Optional[dict] = None,
- debug: Optional[bool] = None,
-):
-```
-
-which mirrors the signature of `BaseComponent.run()`, the method of the base class that all nodes have to inherit from.
-
-```python
-@abstractmethod
-def run(
- self,
- query: Optional[str] = None,
- file_paths: Optional[List[str]] = None,
- labels: Optional[MultiLabel] = None,
- documents: Optional[List[Document]] = None,
- meta: Optional[dict] = None,
-) -> Tuple[Dict, str]:
-```
-
-This match means a few things:
-
-- Every component can be connected to every other because their inputs are identical.
-
-- Every component can only output the same variables received as input.
-
-- It's impossible to tell if it makes sense to connect two components because their inputs and outputs always match.
-
-Take this with a grain of salt: the actual implementation is far more nuanced than what I just showed you, but the problem is fundamentally this: components are trying to be as compatible as possible with all others and they have no way to signal, to the Pipeline or to the users, that they're meant to be connected only to some nodes and not to others.
-
-In addition to this problem, to respect the shared signature, components often take inputs that they don't use. A Ranker only needs documents, so all the other inputs required by the run method signature go unused. What do components do with the values? It depends:
-
-- Some have them in the signature and forward them unchanged.
-- Some have them in the signature and don't forward them.
-- Some don't have them in the signature, breaking the inheritance pattern, and Pipeline reacts by assuming that they should be added unchanged to the output dictionary.
-
-If you look closely at the two workaround nodes for the Hybrid Retrieval pipeline we tried to build before, you'll notice the fix focuses entirely on altering the routing of the unused parameters `query` and `documents` to make the Pipeline behave the way the user expects. However, this behavior does not generalize: a different pipeline would require another behavior, which is why the components behave differently in the first place.
-
-
-## Wrapping up
-
-I could go on for ages talking about the shortcomings of complex Pipelines, but I'd rather stop here.
-
-Along this journey into the guts of Haystack Pipelines, we've seen at the same time some beautiful APIs and the ugly consequences of their implementation. As always, there's no free lunch: trying to over-simplify the interface will bite back as soon as the use cases become nontrivial.
-
-However, we believe that this concept has a huge potential and that this version of Pipeline can be improved a lot before the impact on the API becomes too heavy. In Haystack 2.0, armed with the experience we gained working with this implementation of Pipeline, we reimplemented it in a fundamentally different way, which will prevent many of these issues.
-
-In the next post, we're going to see how.
-
----
-
-*Next: [Canals: a new concept of Pipeline](/posts/2023-10-26-haystack-series-canals)*
-
-*Previous: [Why rewriting Haystack?!](/posts/2023-10-11-haystack-series-why)*
-
-*See the entire series here: [Haystack 2.0 series](/series/haystack-2.0-series/)*
\ No newline at end of file
diff --git a/content/posts/2023-10-26-haystack-series-canals.md b/content/posts/2023-10-26-haystack-series-canals.md
deleted file mode 100644
index cdc4c726..00000000
--- a/content/posts/2023-10-26-haystack-series-canals.md
+++ /dev/null
@@ -1,417 +0,0 @@
----
-title: "A New Approach to Haystack Pipelines"
-date: 2023-10-26
-author: "ZanSara"
-series: ["Haystack 2.0 Series"]
-featuredImage: "/posts/2023-10-26-haystack-series-canals/cover-updated.png"
----
-
-_Updated on 21/12/2023_
-
----
-
-As we have seen in [the previous episode of this series](https://www.zansara.dev/posts/2023-10-15-haystack-series-pipeline/), Haystack's Pipeline is a powerful concept that comes with its set of benefits and shortcomings. In Haystack 2.0, the pipeline was one of the first items that we focused our attention on, and it was the starting point of the entire rewrite.
-
-What does this mean in practice? Let's look at what Haystack Pipelines in 2.0 will be like, how they differ from their 1.x counterparts, and the pros and cons of this new paradigm.
-
-## New Use Cases
-
-I've already written [at length](https://www.zansara.dev/posts/2023-10-15-haystack-series-pipeline/) about what made the original Pipeline concept so powerful and its weaknesses. Pipelines were extremely effective for the use cases we could conceive of while developing them, but they didn't generalize well to unforeseen situations.
-
-For a long time, Haystack could afford not to focus on use cases that didn't fit its architecture, as I mentioned in my [previous post](https://www.zansara.dev/posts/2023-10-11-haystack-series-why/) about the reasons for the rewrite. Back then, the pipeline was more than sufficient for its purposes.
-
-However, the situation flipped when LLMs and Generative AI abruptly "entered" the scene at the end of 2022 (although they had certainly been around for longer). Our `Pipeline`, although usable and still quite powerful for many LLM use cases, seemed to overfit the original use cases it was designed for.
-
-Let's take one of these use cases and see where it leads us.
-
-## RAG Pipelines
-
-Let's take one typical example: [retrieval augmented generation](https://www.deepset.ai/blog/llms-retrieval-augmentation), or RAG for short. This technique has been used since the very early days of the Generative AI boom as an easy way to strongly [reduce hallucinations](https://haystack.deepset.ai/blog/generative-vs-extractive-models) and improve the alignment of LLMs. The basic idea is: instead of asking the model a question directly, such as `"What's the capital of France?"`, we send it a more complex prompt that includes both the question and a piece of text containing the answer. Such a prompt might be:
-
-```text
-Given the following paragraph, answer the question.
-
-Paragraph: France is a unitary semi-presidential republic with its capital in Paris,
-the country's largest city and main cultural and commercial centre; other major urban
-areas include Marseille, Lyon, Toulouse, Lille, Bordeaux, Strasbourg and Nice.
-
-Question: What's the capital of France?
-
-Answer:
-```
-
-In this situation, the task of the LLM becomes far easier: instead of drawing facts from its internal knowledge, which might be lacking, inaccurate, or out-of-date, the model can use the paragraph's content to answer the question, improving the model's performance significantly.
-
-We now have a new problem, though. How can we provide the correct snippets of text to the LLM? This is where the "retrieval" keyword comes up.
-
-One of Haystack's primary use cases had been [Extractive Question Answering](https://huggingface.co/tasks/question-answering): a system where a Retriever component searches a Document Store (such as a vector or SQL database) for snippets of text that are the most relevant to a given question. It then sends such snippets to a Reader (an extractive model), which highlights the keywords that answer the original question.
-
-By replacing a Reader model with an LLM, we get a Retrieval Augmented Generation Pipeline. Easy!
-
-![Generative vs Extractive QA Pipeline Graph](/posts/2023-10-26-haystack-series-canals/gen-vs-ext-qa-pipeline.png)
-
-So far, everything checks out. Supporting RAG with Haystack feels not only possible but natural. Let's take this simple example one step forward: what if, instead of getting the data from a document store, I want to retrieve data from the Internet?
-
-## Web RAG
-
-At first glance, the task may not seem daunting. We surely need a special Retriever that, instead of searching through a DB, searches through the Internet using a search engine. But the core concepts stay the same, and so, we assume, should the pipeline's graph. The end result should be something like this:
-
-![Initial Web RAG Pipeline Graph](/posts/2023-10-26-haystack-series-canals/initial-web-rag-pipeline.png)
-
-However, the problem doesn't end there. Search engines return links, which need to be accessed, and the content of the webpage downloaded. Such pages may be extensive and contain artifacts, so the resulting text needs to be cleaned, reduced into paragraphs, potentially embedded by a retrieval model, ranked against the original query, and only the top few resulting pieces of text need to be passed over to the LLM. Just by including these minimal requirements, our pipeline already looks like this:
-
-![Linear Web RAG Pipeline Graph](/posts/2023-10-26-haystack-series-canals/linear-web-rag-pipeline.png)
-
-And we still need to consider that URLs may reference not HTML pages but PDFs, videos, zip files, and so on. We need file converters, zip extractors, audio transcribers, and so on.
-
-![Multiple File Type Web RAG Pipeline Graph](/posts/2023-10-26-haystack-series-canals/multifile-web-rag-pipeline.png)
-
-You may notice how this use case quickly moved from looking like a simple query pipeline to a strange overlap of a query and an indexing pipeline. As we've learned in the previous post, indexing pipelines have their own set of quirks, one of which is that they can't simultaneously process files of different types. And we can only expect the search engine to return exclusively HTML pages or exclusively PDFs if we deliberately filter everything else out, which makes the pipeline less effective. In fact, a pipeline that can read content from different file types, such as the one above, can't really be made to work.
-
-And what if, on top of this, we need to cache the resulting documents to reduce latency? What if I wanted to get the results from Google's page 2, but only if the content of page 1 did not answer our question? At this point, the pipeline is hard to imagine, let alone draw.
-
-Although Web RAG is somewhat possible in Haystack, it stretches far beyond what the pipeline was designed to handle. Can we do better?
-
-## Pinpointing the issue
-
-When we went back to the drawing board to address these concerns, the first step was pinpointing the issue.
-
-The root problem, as we realized, is that Haystack Pipelines treat each component as a locomotive treats its wagons. They all look the same from the pipeline's perspective, they can all be connected in any order, and they all go from A to B rolling over the same pair of rails, all passing through the same stations.
-
-![Cargo Train](/posts/2023-10-26-haystack-series-canals/train.png)
-
-In Haystack 1, components are designed to serve the pipeline's needs first. A good component is identical to all the others, provides the exact interface the pipeline requires, and can be connected to any other in any order. The components are awkward to use outside of a pipeline due to the same `run()` method that makes the pipeline so ergonomic. Why does the Ranker, which needs only a query and a list of Documents to operate, also accept `file_paths` and `meta` in its `run()` method? It does so uniquely to satisfy the pipeline's requirements, which in turn only exist to make all components forcefully compatible with each other.
-
-Just like a locomotive, the pipeline pushes the components over the input data one by one. When seen in this light, it's painfully obvious why the indexing pipeline we've seen earlier can't work: the "pipeline train" can only go on one branch at a time. Component trains can't split mid-execution. They are designed to all see the same data all the time. Even when branching happens, all branches always see the same data. Sending different wagons onto different rails is not possible by design.
-
-## Breaking it down
-
-The issue's core is more evident when seen in this light. The pipeline is the only object that drives the execution, while components tend to be as passive and uniform as possible. This approach doesn't scale: components are fundamentally different, and asking them to all appear equal forces them to hide their differences, making bugs and odd behavior more likely. As the number of components grows, so does their variety, so the pipeline must be aware of every possibility in order to manage them, progressively accumulating edge cases that rapidly increase its complexity.
-
-Therefore, the pipeline rewrite for Haystack 2.0 focused on one core principle: the components will define and drive the execution process. There is no locomotive anymore: every component needs to find its own way, grabbing the data it needs from its producers and sending its results to whoever needs them by declaring the proper connections. In the railway metaphor, it's like adding a steering wheel to each container: the result is a truck, and the resulting system now looks like a highway.
-
-![Highway](/posts/2023-10-26-haystack-series-canals/highway.png)
-
-Just as railways are excellent at going from A to B when you only need to take a few well-known routes and never any other, highways are unbeatable at reaching every possible destination with the same effort, even though they need a driver for each wagon. A "highway" Pipeline requires more work from the Components' side, but it frees them to go wherever they need to with a precision that a "railway" pipeline cannot accomplish.
-
-## The Structure of Haystack 2.0
-
-By design, the pipeline in Haystack 2.0 is not geared toward specific NLP use cases: it's a minimal, generic [ETL](https://en.wikipedia.org/wiki/Extract,_transform,_load)-like class.
-
-At its core, Haystack 2.0 builds upon these two fundamental concepts:
-
-- The `Component` protocol, a well-defined API that Python classes need to respect to be understood by the pipeline.
-
-- The `Pipeline` object, the graph resolver and execution engine that also performs validation and provides a few utilities on top.
-
-Let's explore these two concepts one by one.
-
-### The Pipeline API
-
-The new `Pipeline` object may vaguely remind you of Haystack's original pipeline, and using one should feel very familiar. For example, this is how you assemble a simple Pipeline that performs two additions in Haystack 2.0.
-
-
-```python
-from canals import Pipeline
-from sample_components import AddFixedValue
-
-# Create the Pipeline object
-pipeline = Pipeline()
-
-# Add the components - note the missing `inputs` parameter
-pipeline.add_component("add_one", AddFixedValue(add=1))
-pipeline.add_component("add_two", AddFixedValue(add=2))
-
-# Connect them together
-pipeline.connect("add_one.result", "add_two.value")
-
-# Draw the pipeline
-pipeline.draw("two_additions_pipeline.png")
-
-# Run the pipeline
-results = pipeline.run({"add_one": {"value": 1}})
-
-print(results)
-# prints '{"add_two": {"result": 4}}'
-```
-
-Creating the pipeline requires no special attention. However, you can now pass a `max_loops_allowed` parameter to limit looping when it's a risk; by contrast, old Haystack 1.x Pipelines did not support loops at all.
-
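-For pipelines that may loop, a minimal sketch of capping the number of iterations (the value here is illustrative):
-
-```python
-pipeline = Pipeline(max_loops_allowed=10)
-```
-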
-Next, components are added by calling the `Pipeline.add_component(name, component)` method. This is also subject to very similar requirements to the previous `pipeline.add_node`:
-- Every component needs a unique name.
-- Some are reserved (for now, only `_debug`).
-- Instances are not reusable.
-- The object needs to be a component.
-
-However, we no longer connect the components to each other using this function because, although it is possible to implement in principle, it feels more awkward to use in the case of loops.
-
-Consequently, we introduced a new method, `Pipeline.connect()`. This method follows the syntax `("producer_component.output_name", "consumer_component.input_name")`: so we don't simply line up two components one after the other, but we connect one of their outputs to one of their inputs in an explicit manner.
-
-This change allows pipelines to perform a much more careful validation of such connections. As we will discover soon, pipeline components in Haystack 2.0 must declare the type of their inputs and outputs. In this way, pipelines can not only make sure that the inputs and outputs exist for the given component, but also check whether their types match, and they can explain connection failures in great detail. For example, if there is a type mismatch, `Pipeline.connect()` will return an error such as:
-
-```markdown
-Cannot connect 'greeter.greeting' with 'add_two.value': their declared input and output
-types do not match.
-
-greeter:
-- greeting: str
-add_two:
-- value: int (available)
-- add: Optional[int] (available)
-```
-
-Once the components are connected together, the resulting pipeline can be drawn. Pipeline drawings in Haystack 2.0 show far more details than their predecessors because the components are forced to share much more information about what they need to run, the types of these variables, and so on. The pipeline above draws the following image:
-
-![A Pipeline making two additions](/posts/2023-10-26-haystack-series-canals/two_additions_pipeline.png)
-
-You can see how the component classes, their inputs and outputs, and all the connections are named and typed.
-
-So, how do you run such a pipeline? By just providing a dictionary of input values. Each starting component should have a small dictionary with all the necessary inputs. In the example above, we pass `1` to the `value` input of `add_one`. The results mirror the input's structure: `add_two` is at the end of the pipeline, so the pipeline will return a dictionary where under the `add_two` key there is a dictionary: `{"result": 4}`.
-
-By looking at the diagram, you may have noticed that these two components have optional inputs. They're not necessary for the pipeline to run, but they can be used to dynamically control the behavior of these components. In this case, `add` controls the "fixed value" this component adds to its primary input. For example:
-
-```python
-pipeline.run({"add_one": {"value": 1, "add": 2}})
-# returns '{"add_two": {"result": 5}}'
-```
-
-```python
-pipeline.run({"add_one": {"value": 1}, "add_two": {"add": 10}})
-# returns '{"add_two": {"result": 12}}'
-```
-
-One evident difficulty of this API is that it might be challenging to understand what to provide to the run method for each component. This issue has also been considered: the pipeline offers a `Pipeline.inputs()` method that returns a structured representation of all the expected inputs. For our pipeline, it looks like:
-
-```python
-{
- "add_one": {
- "value": {
- "type": int,
- "is_optional": False
- },
- "add": {
- "type": typing.Optional[int],
- "is_optional": True
- }
- },
- "add_two": {
- "add": {
- "type": typing.Optional[int],
- "is_optional": True
- }
- }
-}
-```
-
-## The Component API
-
-Now that we covered the Pipeline's API, let's have a look at what it takes for a Python class to be treated as a pipeline component.
-
-You are going to need:
-
-- **A `@component` decorator**. All component classes must be decorated with the `@component` decorator. This allows a pipeline to discover and validate them.
-
-- **A `run()` method**. This is the method where the main functionality of the component should be carried out. It's invoked by `Pipeline.run()` and has a few constraints, which we will describe later.
-
-- **A `@component.output_types()` decorator for the `run()` method**. This allows the pipeline to validate the connections between components.
-
-- Optionally, **a `warm_up()` method**. It can be used to defer the loading of a heavy resource (think a local LLM or an embedding model) to the warm-up stage that occurs right before the first execution of the pipeline. Components that use `warm_up()` can be added to a Pipeline and connected before the heavy operations are carried out. In this way, the validation that a `Pipeline` performs can happen before resources are wasted.
-
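-For example, a component using `warm_up()` might look like this. This is just a toy sketch: the class and its "heavy resource" are made up for illustration.
-
-```python
-from canals import component
-
-@component
-class Greeter:
-
-    def __init__(self):
-        # The heavy resource is *not* loaded here...
-        self.greetings = None
-
-    def warm_up(self):
-        # ...but here, right before the first execution of the pipeline.
-        # Imagine this being an expensive step, like loading a local LLM.
-        self.greetings = {"en": "Hello", "eo": "Saluton"}
-
-    @component.output_types(greeting=str)
-    def run(self, name: str, language: str = "en"):
-        return {"greeting": f"{self.greetings[language]}, {name}!"}
-```
-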
-To summarize, a minimal component can look like this:
-
-```python
-from canals import component
-
-@component
-class Double:
-
- @component.output_types(result=int)
- def run(self, value: int):
- return {"result": value * 2}
-```
-
-### Pipeline Validation
-Note how the `run()` method has a few peculiar features. One is that all the method parameters need to be typed: if `value` were not declared as `value: int`, the pipeline would raise an exception demanding proper typing.
-
-This is how components declare to the pipeline which inputs they expect and of which type: it's the first half of the information needed to perform the validation that `Pipeline.connect()` carries out.
-
-The other half of the information comes from the `@component.output_types` decorator. Pipelines demand that components declare how many outputs they will produce and of what type. One may ask why we don't rely on typing for the outputs, just as we've done for the inputs, and simply declare components as:
-
-
-```python
-@component
-class Double:
-
- def run(self, value: int) -> int:
- return value * 2
-```
-
-For `Double`, this is a legitimate solution. However, let's see an example with another component called `CheckParity`: if a component's input value is even, it sends it unchanged over the `even` output, while if it's odd, it will send it over the `odd` output. The following clearly doesn't work: we're not communicating anywhere to Canals which output is even and which one is odd.
-
-```python
-@component
-class CheckParity:
-
- def run(self, value: int) -> int:
- if value % 2 == 0:
- return value
- return value
-```
-
-How about this instead?
-
-```python
-@component
-class CheckParity:
-
- def run(self, value: int) -> Dict[str, int]:
- if value % 2 == 0:
- return {"even": value}
- return {"odd": value}
-```
-
-This approach carries all the information required. However, such information is only available after the `run()` method is called. Unless we parse the method to discover all return statements and their keys (which is not always possible), pipelines cannot know all the keys the return dictionary may have. So, they can't validate the connections when `Pipeline.connect()` is called.
-
-The decorator bridges the gap by allowing the class to declare in advance what outputs it will produce and of which type. Pipeline trusts this information to be correct and validates the connections accordingly.
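-
-For example, `CheckParity` can declare its two outputs explicitly through the decorator, along these lines:
-
-```python
-@component
-class CheckParity:
-
-    @component.output_types(even=int, odd=int)
-    def run(self, value: int):
-        if value % 2 == 0:
-            return {"even": value}
-        return {"odd": value}
-```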
-
-Okay, but what if the component is very dynamic? The output type may depend on the input type. Perhaps the number of inputs depends on some initialization parameter. In these cases, pipelines allow components to declare the input and output types in their `__init__` method, as follows:
-
-```python
-@component
-class HighlyDynamicComponent:
-
- def __init__(self, ...):
- component.set_input_types(self, input_name=input_type, ...)
- component.set_output_types(self, output_name=output_type, ...)
-
- def run(self, **kwargs):
- ...
-```
-
-Note that there's no more typing on `run()`, and the decorator is gone. The information provided in the init method is sufficient for the pipeline to validate the connections.
-
-One more feature of the input and output declarations relates to optional and variadic values. Pipelines in Haystack 2.0 support both through a mix of type checking and signature inspection. For example, let's have a look at what the `AddFixedValue` component we've seen earlier looks like:
-
-```python
-from typing import Optional
-from canals import component
-
-
-@component
-class AddFixedValue:
- """
- Adds two values together.
- """
-
- def __init__(self, add: int = 1):
- self.add = add
-
- @component.output_types(result=int)
- def run(self, value: int, add: Optional[int] = None):
- """
- Adds two values together.
- """
- if add is None:
- add = self.add
- return {"result": value + add}
-```
-
-You can see that `add`, the optional parameter we met before, has a default value. Adding a default value to a parameter in the `run()` signature tells the pipeline that the parameter itself is optional, so the component can run even if that specific input doesn't receive any value from the pipeline's input or other components.
-
-Another component that generalizes the sum operation is `Sum`, which instead looks like this:
-
-```python
-from canals import component
-from canals.component.types import Variadic
-
-@component
-class Sum:
- """
- Adds all its inputs together.
- """
-
- @component.output_types(total=int)
- def run(self, values: Variadic[int]):
- """
- :param values: the values to sum
- """
- return {"total": sum(v for v in values if v is not None)}
-```
-
-In this case, we used the special type `Variadic` to tell the pipeline that the `values` input can receive data from multiple producers, instead of just one. Therefore, `values` is going to be a list type, but it can be connected to single `int` outputs, making it a valuable aggregator.
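-
-As a sketch of how this can be used in practice (reusing the components from the earlier examples), two different producers can feed the same `Variadic` input:
-
-```python
-from canals import Pipeline
-from sample_components import AddFixedValue
-
-pipeline = Pipeline()
-pipeline.add_component("add_one", AddFixedValue(add=1))
-pipeline.add_component("add_two", AddFixedValue(add=2))
-pipeline.add_component("sum", Sum())
-
-# Two different producers connected to the same Variadic input
-pipeline.connect("add_one.result", "sum.values")
-pipeline.connect("add_two.result", "sum.values")
-
-results = pipeline.run({"add_one": {"value": 1}, "add_two": {"value": 1}})
-print(results)
-# expected to print '{"sum": {"total": 5}}'
-```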
-
-## Serialization
-
-Just like old Haystack Pipelines, the new pipelines can be serialized. However, this feature suffered from problems similar to those plaguing the execution model, so it was changed radically.
-
-The original pipeline intrusively gathered information about each of its components when initialized, leveraging the shared `BaseComponent` class. Conversely, the new `Pipeline` delegates the serialization process entirely to its components.
-
-If a component wishes to be serializable, it must provide two additional methods, `to_dict` and `from_dict`, which perform serialization and deserialization to a dictionary. The pipeline limits itself to calling these methods on each of its components, collecting their output, grouping it together with some limited extra information (such as the connections between them), and returning the result.
-
-For example, if `AddFixedValue` were serializable, its serialized version could look like this:
-
-```python
-{
- "type": "AddFixedValue",
- "init_parameters": {
- "add": 1
- }
-}
-```
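-
-A sketch of what the two methods could look like for `AddFixedValue` (the bodies below are an assumption; the only real requirement is the output format shown above):
-
-```python
-from typing import Optional
-from canals import component
-
-@component
-class AddFixedValue:
-
-    def __init__(self, add: int = 1):
-        self.add = add
-
-    def to_dict(self):
-        # Expose the constructor arguments under the two expected top-level keys
-        return {"type": "AddFixedValue", "init_parameters": {"add": self.add}}
-
-    @classmethod
-    def from_dict(cls, data):
-        # Rebuild the instance from the same dictionary
-        return cls(**data["init_parameters"])
-
-    @component.output_types(result=int)
-    def run(self, value: int, add: Optional[int] = None):
-        return {"result": value + (add if add is not None else self.add)}
-```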
-
-The entire pipeline we used above would end up as follows:
-
-```python
-{
- "max_loops_allowed": 100,
- "components": {
- "add_one": {
- "type": "AddFixedValue",
- "init_parameters": {
- "add": 1
- }
- },
- "add_two": {
- "type": "AddFixedValue",
- "init_parameters": {
- "add": 2
- }
- }
- },
- "connections": [
- {
- "sender": "add_one.result",
- "receiver": "add_two.value",
- }
- ]
-}
-```
-
-Notice how the components are free to perform serialization in the way they see fit. The only requirement imposed by the `Pipeline` is the presence of two top-level keys, `type` and `init_parameters`, which are necessary for the pipeline to deserialize each component into the correct class.
-
-This is useful, especially if the component's state includes some non-trivial values, such as objects, API keys, or other special values. Pipeline no longer needs to know how to serialize everything the Components may contain: the task is fully delegated to them, and they always know best what needs to be done.
-
-## But... do we need any of this?
-
-Having done a tour of the new `Pipeline` features, you might have noticed one detail. There's a bit more work involved in using a Pipeline than there was before: you can't just chain every component after every other. There are connections to be made, validation to perform, graphs to assemble, and so on.
-
-In exchange, the pipeline is now way more powerful than before. Sure, but so is a plain Python script. Do we *really* need the Pipeline object? And what do we need it for?
-
-- **Validation**. While components normally validate their inputs and outputs, the pipeline does all the validation before the components run, even before loading heavy resources. This makes the whole system far less likely to fail at runtime for a simple input/output mismatch, which can be priceless for complex applications.
-
-- **Serialization**. Redistributing code is always tricky: redistributing a JSON file is much safer. Pipelines make it possible to represent complex systems in a readable JSON file that can be edited, shared, stored, deployed, and re-deployed on different backends as needed.
-
-- **Drawing**: The new Pipeline offers a way to see your system clearly and automatically, which is often very handy for debugging, inspecting the system, and collaborating on the pipeline's design.
-
-- On top of this, the pipeline abstraction promotes flatter API surfaces by discouraging components from nesting within one another and by providing easy-to-use, single-responsibility components that are easy to reason about.
-
-Having said all of this, however, we don't believe that the pipeline design makes Haystack win or lose. Pipelines are just a bonus on top of what provides the real value: a broad set of components that reliably perform well-defined tasks. That's why the Component API does not make the `run()` method awkward to use outside of a Pipeline: calling `Sum.run(values=[1, 2, 3])` feels Pythonic outside of a pipeline and always will.
-
-In the following posts, I will explore the world of Haystack components, starting from our now familiar use cases: RAG Pipelines.
-
----
-
-*Next: [RAG Pipelines from scratch](/posts/2023-10-27-haystack-series-rag)*
-
-*Previous: [Haystack's Pipeline](/posts/2023-10-15-haystack-series-pipeline)*
-
-*See the entire series here: [Haystack 2.0 series](/series/haystack-2.0-series/)*
\ No newline at end of file
diff --git a/content/posts/2023-10-27-haystack-series-rag.md b/content/posts/2023-10-27-haystack-series-rag.md
deleted file mode 100644
index 2df2db1f..00000000
--- a/content/posts/2023-10-27-haystack-series-rag.md
+++ /dev/null
@@ -1,442 +0,0 @@
----
-title: "RAG Pipelines from scratch"
-date: 2023-10-27
-author: "ZanSara"
-series: ["Haystack 2.0 Series"]
-featuredImage: "/posts/2023-10-27-haystack-series-rag/cover.png"
----
-
-*Last updated: 18/01/2024 - Read it on the [Haystack Blog](https://haystack.deepset.ai/blog/rag-pipelines-from-scratch).*
-
----
-
-Retrieval Augmented Generation (RAG) is quickly becoming an essential technique to make LLMs more reliable and effective at answering any question, regardless of how specific. To stay relevant in today's NLP landscape, Haystack must enable it.
-
-Let's see how to build such applications with Haystack 2.0, from a direct call to an LLM to a fully-fledged, production-ready RAG pipeline that scales. At the end of this post, we will have an application that can answer questions about world countries based on data stored in a private database. At that point, the knowledge of the LLM will be limited only by the content of our data store, and all of this can be accomplished without fine-tuning language models.
-
-{{< notice info >}}
-
-💡 *I recently gave a talk about RAG applications in Haystack 2.0, so if you prefer videos to blog posts, you can find the recording [here](https://zansara.dev/talks/2023-10-12-office-hours-rag-pipelines/). Keep in mind that the code shown might be outdated.*
-
-{{< /notice >}}
-
-## What is RAG?
-
-The idea of Retrieval Augmented Generation was first defined in a [paper](https://arxiv.org/abs/2005.11401) by Meta in 2020. It was designed to solve a few of the inherent limitations of seq2seq models (language models that, given a sentence, can finish writing it for you), such as:
-
-- Their internal knowledge, as vast as it may be, will always be limited and at least slightly out of date.
-- They work best on generic topics rather than niche and specific areas unless they're fine-tuned on purpose, which is a costly and slow process.
-- All models, even those with subject-matter expertise, tend to "hallucinate": they confidently produce false statements backed by apparently solid reasoning.
-- They cannot reliably cite their sources or tell where their knowledge comes from, which makes fact-checking their replies nontrivial.
-
-RAG addresses these issues by "grounding" the LLM in reality: it provides some relevant, up-to-date, and trusted information to the model together with the question. In this way, the LLM doesn't need to draw information from its internal knowledge; it can base its replies on the snippets provided by the user.
-
-![RAG Paper diagram](/posts/2023-10-27-haystack-series-rag/rag-paper-image.png "A visual representation of RAG from the original paper")
-
-As you can see in the image above (taken directly from the original paper), a system such as RAG is made of two parts: one that finds text snippets that are relevant to the question asked by the user and a generative model, usually an LLM, that rephrases the snippets into a coherent answer for the question.
-
-Let's build one of these with Haystack 2.0!
-
-{{< notice info >}}
-
-💡 *Do you want to see this code in action? Check out the Colab notebook [here](https://colab.research.google.com/drive/1FkDNS3hTO4oPXHFbXQcldls0kf-KTq-r?usp=sharing) or the gist [here](https://gist.github.com/ZanSara/0af1c2ac6c71d0a723c179cc6ec1ac41)*.
-
-{{< /notice >}}
-
-{{< notice warning >}}
-
-⚠️ **Warning:** *This code was tested on `haystack-ai==2.0.0b5`. Haystack 2.0 is still unstable, so later versions might introduce breaking changes without notice until Haystack 2.0 is officially released. The concepts and components however stay the same.*
-
-{{< /notice >}}
-
-## Generators: Haystack's LLM components
-
-Like every NLP framework worthy of the name, Haystack supports LLMs in different ways. The easiest way to query an LLM in Haystack 2.0 is through a Generator component: depending on which LLM you use and how you intend to query it (chat, text completion, etc.), you should pick the appropriate class.
-
-We're going to use `gpt-3.5-turbo` (the model behind ChatGPT) for these examples, so the component we need is [`OpenAIGenerator`](https://docs.haystack.deepset.ai/v2.0/docs/openaigenerator). Here is all the code required to use it to query OpenAI's `gpt-3.5-turbo`:
-
-```python
-from haystack.components.generators import OpenAIGenerator
-
-generator = OpenAIGenerator(api_key=api_key)
-generator.run(prompt="What's the official language of France?")
-# returns {"replies": ['The official language of France is French.']}
-```
-You can select your favorite OpenAI model by specifying a `model_name` at initialization, for example, `gpt-4`. It also supports setting an `api_base_url` for private deployments, a `streaming_callback` if you want to see the output generated live in the terminal, and optional `kwargs` to let you pass whatever other parameter the model understands, such as the number of answers (`n`), the temperature (`temperature`), etc.
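-
-For example, a quick sketch of such a customization (the values are illustrative):
-
-```python
-generator = OpenAIGenerator(api_key=api_key, model_name="gpt-4")
-generator.run(prompt="What's the official language of France?")
-```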
-
-Note that in this case, we're passing the API key to the component's constructor. This is not strictly necessary: `OpenAIGenerator` can also read the value from the `OPENAI_API_KEY` environment variable or from the `api_key` module variable of [`openai`'s SDK](https://github.com/openai/openai-python#usage).
-
-Right now, Haystack supports HuggingFace models through the [`HuggingFaceLocalGenerator`](https://docs.haystack.deepset.ai/v2.0/docs/huggingfacelocalgenerator) and [`HuggingFaceTGIGenerator`](https://docs.haystack.deepset.ai/v2.0/docs/huggingfacetgigenerator) components, and many more LLMs are coming soon.
-
-
-## PromptBuilder: structured prompts from templates
-
-Let's imagine that our LLM-powered application also comes with some pre-defined questions that the user can select instead of typing in full. For example, instead of asking them to type `What's the official language of France?`, we let them select `Tell me the official languages` from a list, and they simply need to type "France" (or "Wakanda" for a change - our chatbot needs some challenges too).
-
-In this scenario, we have two pieces of the prompt: a variable (the country name, like "France") and a prompt template, which in this case is `"What's the official language of {{ country }}?"`
-
-Haystack offers a component that can render variables into prompt templates: it's called [`PromptBuilder`](https://docs.haystack.deepset.ai/v2.0/docs/promptbuilder). Like the generators we've seen before, `PromptBuilder` is nearly trivial to initialize and use.
-
-```python
-from haystack.components.builders import PromptBuilder
-
-prompt_builder = PromptBuilder(template="What's the official language of {{ country }}?")
-prompt_builder.run(country="France")
-# returns {'prompt': "What's the official language of France?"}
-```
-
-Note how we defined a variable, `country`, by wrapping its name in double curly brackets. PromptBuilder lets you define any input variable that way: if the prompt template were `"What's the official language of {{ nation }}?"`, the `run()` method of `PromptBuilder` would expect a `nation` input.
-
-This syntax comes from [Jinja2](https://jinja.palletsprojects.com/en/3.0.x/intro/), a popular templating library for Python. If you have ever used Flask, Django, or Ansible, you will feel at home with `PromptBuilder`. If you have never heard of any of these libraries, you can check out the [syntax](https://jinja.palletsprojects.com/en/3.0.x/templates/) in Jinja's documentation. Jinja has a powerful templating language and offers far more features than you'll ever need in prompt templates, ranging from simple if statements and for loops to object access through dot notation, template nesting, variable manipulation, macros, full-fledged import and encapsulation of templates, and more.
-
-## A Simple Generative Pipeline
-
-With these two components, we can assemble a minimal pipeline to see how they work together. Connecting them is trivial: `PromptBuilder` generates a `prompt` output, and `OpenAIGenerator` expects an input with the same name and type.
-
-```python
-from haystack import Pipeline
-from haystack.components.generators import OpenAIGenerator
-from haystack.components.builders import PromptBuilder
-
-pipe = Pipeline()
-pipe.add_component("prompt_builder", PromptBuilder(template="What's the official language of {{ country }}?"))
-pipe.add_component("llm", OpenAIGenerator(api_key=api_key))
-pipe.connect("prompt_builder", "llm")
-
-pipe.run({"prompt_builder": {"country": "France"}})
-# returns {"llm": {"replies": ['The official language of France is French.'] }}
-```
-
-Here is the pipeline graph:
-
-![Simple LLM pipeline](/posts/2023-10-27-haystack-series-rag/simple-llm-pipeline.png)
-
-## Make the LLM cheat
-
-Building the generative part of a RAG application was very simple! So far, we've only provided the question to the LLM, with no information to base its answers on. Nowadays, LLMs possess a lot of general knowledge, so questions about famous countries such as France or Germany are easy for them to answer correctly. However, in an app about world countries, some users may be interested in obscure or defunct microstates that no longer exist. In this case, ChatGPT is unlikely to provide the correct answer without any help.
-
-For example, let's ask our pipeline something *really* obscure.
-
-```python
-pipe.run({"prompt_builder": {"country": "the Republic of Rose Island"}})
-# returns {
-# "llm": {
-# "replies": [
-# 'The official language of the Republic of Rose Island was Italian.'
-# ]
-# }
-# }
-```
-
-The answer is an educated guess but is not accurate: although it was located just outside of Italy's territorial waters, according to [Wikipedia](https://en.wikipedia.org/wiki/Republic_of_Rose_Island) the official language of this short-lived micronation was Esperanto.
-
-How can we get ChatGPT to reply to such a question correctly? One way is to make it "cheat" by providing the answer as part of the question. In fact, `PromptBuilder` is designed to serve precisely this use case.
-
-Here is our new, more advanced prompt:
-
-```text
-Given the following information, answer the question.
-Context: {{ context }}
-Question: {{ question }}
-```
-
-Let's build a new pipeline using this prompt!
-
-```python
-context_template = """
-Given the following information, answer the question.
-Context: {{ context }}
-Question: {{ question }}
-"""
-language_template = "What's the official language of {{ country }}?"
-
-pipe = Pipeline()
-pipe.add_component("context_prompt", PromptBuilder(template=context_template))
-pipe.add_component("language_prompt", PromptBuilder(template=language_template))
-pipe.add_component("llm", OpenAIGenerator(api_key=api_key))
-pipe.connect("language_prompt", "context_prompt.question")
-pipe.connect("context_prompt", "llm")
-
-pipe.run({
- "context_prompt": {"context": "Rose Island had its own government, currency, post office, and commercial establishments, and the official language was Esperanto."}
- "language_prompt": {"country": "the Republic of Rose Island"}
-})
-# returns {
-# "llm": {
-# "replies": [
-# 'The official language of the Republic of Rose Island is Esperanto.'
-# ]
-# }
-# }
-```
-Let's look at the graph of our Pipeline:
-
-![Double PromptBuilder pipeline](/posts/2023-10-27-haystack-series-rag/double-promptbuilder-pipeline.png)
-
-The beauty of `PromptBuilder` lies in its flexibility. It allows users to chain instances together to assemble complex prompts from simpler templates: for example, we used the output of the first `PromptBuilder` as the value of `question` in the second prompt.
-
-However, in this specific scenario, we can build a simpler system by merging the two prompts into one.
-
-```text
-Given the following information, answer the question.
-Context: {{ context }}
-Question: What's the official language of {{ country }}?
-```
-
-Using this new prompt, the resulting pipeline becomes again very similar to our first.
-
-```python
-template = """
-Given the following information, answer the question.
-Context: {{ context }}
-Question: What's the official language of {{ country }}?
-"""
-pipe = Pipeline()
-pipe.add_component("prompt_builder", PromptBuilder(template=template))
-pipe.add_component("llm", OpenAIGenerator(api_key=api_key))
-pipe.connect("prompt_builder", "llm")
-
-pipe.run({
- "prompt_builder": {
- "context": "Rose Island had its own government, currency, post office, and commercial establishments, and the official language was Esperanto.",
- "country": "the Republic of Rose Island"
- }
-})
-# returns {
-# "llm": {
-# "replies": [
-# 'The official language of the Republic of Rose Island is Esperanto.'
-# ]
-# }
-# }
-```
-
-![PromptBuilder with two inputs pipeline](/posts/2023-10-27-haystack-series-rag/double-variable-promptbuilder-pipeline.png)
-
-
-## Retrieving the context
-
-For now, we've been playing with prompts, but the fundamental question remains unanswered: where do we get the correct text snippet for the question the user is asking? We can't expect such information to be part of the input: we need our system to be able to fetch it on its own, based solely on the query.
-
-Thankfully, retrieving relevant information from large [corpora](https://en.wikipedia.org/wiki/Text_corpus) (a technical term for extensive collections of data, usually text) is a task that Haystack has excelled at since its inception: the components that perform this task are called [Retrievers](https://docs.haystack.deepset.ai/v2.0/docs/retrievers).
-
-Retrieval can be performed on different data sources: to begin, let's assume we're searching for data in a local database, which is the use case that most Retrievers are geared towards.
-
-Let's create a small local database to store information about some European countries. Haystack offers a neat object for these small-scale demos: `InMemoryDocumentStore`. This document store is little more than a Python dictionary under the hood but provides exactly the same API as much more powerful data stores and vector stores, such as [Elasticsearch](https://github.com/deepset-ai/haystack-core-integrations/tree/main/document_stores/elasticsearch) or [ChromaDB](https://haystack.deepset.ai/integrations/chroma-documentstore). Keep in mind that the object is called "Document Store" and not simply "datastore" because what it stores are Haystack's Document objects: small dataclasses that help other components make sense of the data they receive.
-
-So, let's initialize an `InMemoryDocumentStore` and write some `Documents` into it.
-
-```python
-from haystack.dataclasses import Document
-from haystack.document_stores.in_memory import InMemoryDocumentStore
-
-documents = [
- Document(content="German is the the official language of Germany."),
- Document(content="The capital of France is Paris, and its official language is French."),
- Document(content="Italy recognizes a few official languages, but the most widespread one is Italian."),
- Document(content="Esperanto has been adopted as official language for some microstates as well, such as the Republic of Rose Island, a short-lived microstate built on a sea platform in the Adriatic Sea.")
-]
-docstore = InMemoryDocumentStore()
-docstore.write_documents(documents=documents)
-
-docstore.filter_documents()
-# returns [
-# Document(content="German is the the official language of Germany."),
-# Document(content="The capital of France is Paris, and its official language is French."),
-# Document(content="Esperanto has been adopted as official language for some microstates as well, such as the Republic of Rose Island, a short-lived microstate built on a sea platform in the Adriatic Sea."),
-# Document(content="Italy recognizes a few official languages, but the most widespread one is Italian."),
-# ]
-```
-
-Once the document store is set up, we can initialize a retriever. In Haystack 2.0, each document store comes with its own set of highly optimized retrievers: `InMemoryDocumentStore` offers two, one based on BM25 ranking and one based on embedding similarity.
-
-Let's start with the BM25-based retriever, which is slightly easier to set up. Let's first use it in isolation to see how it behaves.
-
-```python
-from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
-
-retriever = InMemoryBM25Retriever(document_store=docstore)
-retriever.run(query="Rose Island", top_k=1)
-# returns [
-# Document(content="Esperanto has been adopted as official language for some microstates as well, such as the Republic of Rose Island, a short-lived microstate built on a sea platform in the Adriatic Sea.")
-# ]
-
-retriever.run(query="Rose Island", top_k=3)
-# returns [
-# Document(content="Esperanto has been adopted as official language for some microstates as well, such as the Republic of Rose Island, a short-lived microstate built on a sea platform in the Adriatic Sea.")
-# Document(content="Italy recognizes a few official languages, but the most widespread one is Italian."),
-# Document(content="The capital of France is Paris, and its official language is French."),
-# ]
-```
-
-We see that [`InMemoryBM25Retriever`](https://docs.haystack.deepset.ai/v2.0/reference/retriever-api#inmemorybm25retriever) accepts a few parameters. `query` is the question we want to find relevant documents for. In the case of BM25, the algorithm only searches for exact word matches. The resulting retriever is very fast, but it doesn't degrade gracefully: it can't handle spelling mistakes, synonyms, or descriptions of an entity. For example, documents containing the word "cat" would be considered irrelevant against a query such as "felines".
-
-`top_k` controls the number of documents returned. We can see that in the first example, only one document is returned, the correct one. In the second, where `top_k = 3`, the retriever is forced to return three documents even if just one is relevant, so it fills the list with the next-best matches. Although the behavior is not optimal, BM25 will normally rank the truly relevant document first, so for now, we can use it with `top_k=1`.
-
-Retrievers also accept a `filters` parameter, which lets you pre-filter the documents before retrieval. This is a powerful technique that comes in handy in complex applications, but for now we have no use for it. I will talk about this topic, called metadata filtering, in more detail in a later post.
-
-Let's now make use of this new component in our Pipeline.
-
-## Our first RAG Pipeline
-
-The retriever does not return a single string but a list of Documents. How do we put the content of these objects into our prompt template?
-
-It's time to use Jinja's powerful syntax to do some unpacking on our behalf.
-
-```text
-Given the following information, answer the question.
-
-Context:
-{% for document in documents %}
- {{ document.content }}
-{% endfor %}
-
-Question: What's the official language of {{ country }}?
-```
-
-Notice how, despite the slightly alien syntax for a Python programmer, what the template does is reasonably evident: it iterates over the documents and, for each of them, renders their `content` field.
-
-With all these pieces set up, we can finally put them all together.
-
-```python
-template = """
-Given the following information, answer the question.
-
-Context:
-{% for document in documents %}
- {{ document.content }}
-{% endfor %}
-
-Question: What's the official language of {{ country }}?
-"""
-pipe = Pipeline()
-
-pipe.add_component("retriever", InMemoryBM25Retriever(document_store=docstore))
-pipe.add_component("prompt_builder", PromptBuilder(template=template))
-pipe.add_component("llm", OpenAIGenerator(api_key=api_key))
-pipe.connect("retriever", "prompt_builder.documents")
-pipe.connect("prompt_builder", "llm")
-
-country = "the Republic of Rose Island"
-
-pipe.run({
-    "retriever": {"query": country},
-    "prompt_builder": {
-        "country": country
-    }
-})
-# returns {
-# "llm": {
-# "replies": [
-# 'The official language of the Republic of Rose Island is Esperanto.'
-# ]
-# }
-# }
-```
-
-![BM25 RAG Pipeline](/posts/2023-10-27-haystack-series-rag/bm25-rag-pipeline.png)
-
-Congratulations! We've just built our first, true-to-its-name RAG Pipeline.
-
-
-## Scaling up: Elasticsearch
-
-So, we now have our running prototype. What does it take to scale this system up for production workloads?
-
-Of course, scaling a system up to production readiness is not a simple task that can be addressed in a paragraph. Still, we can start this journey with one component that can readily be improved: the document store.
-
-`InMemoryDocumentStore` is clearly a toy implementation: Haystack supports much more performant document stores that make more sense to use in a production scenario. Since we have built our app with a BM25 retriever, let's select [Elasticsearch](https://haystack.deepset.ai/integrations/elasticsearch-document-store) as our production-ready document store of choice.
-
-{{< notice warning >}}
-
-⚠️ **Warning:** *While ES is a valid document store to use in this scenario, nowadays it often makes more sense to choose a more specialized document store such as [Weaviate](https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/weaviate), [Qdrant](https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/qdrant), and so on. Check [this page](https://github.com/deepset-ai/haystack-core-integrations/tree/main) to see which document stores are currently supported for Haystack 2.0.*
-
-{{< /notice >}}
-
-How do we use Elasticsearch in our pipeline? All it takes is to swap out `InMemoryDocumentStore` and `InMemoryBM25Retriever` with their Elasticsearch counterparts, which offer nearly identical APIs.
-
-First, let's create the document store: we will need a slightly more complex setup to connect to the Elasticsearch backend. In this example, we use Elasticsearch version 8.8.0, but every Elasticsearch 8 version should work.
-
-```python
-import os
-
-from elasticsearch_haystack.document_store import ElasticsearchDocumentStore
-
-host = os.environ.get("ELASTICSEARCH_HOST", "https://localhost:9200")
-user = "elastic"
-pwd = os.environ["ELASTICSEARCH_PASSWORD"] # You need to provide this value
-
-docstore = ElasticsearchDocumentStore(
- hosts=[host],
- basic_auth=(user, pwd),
- ca_certs="/content/elasticsearch-8.8.0/config/certs/http_ca.crt"
-)
-```
-
-Now, let's write again our four documents into the store. In this case, we specify the duplicate policy, so if the documents were already present, they would be overwritten. All Haystack document stores offer three policies to handle duplicates: `FAIL` (the default), `SKIP`, and `OVERWRITE`.
-
-```python
-from haystack.document_stores import DuplicatePolicy
-documents = [
- Document(content="German is the the official language of Germany."),
- Document(content="The capital of France is Paris, and its official language is French."),
- Document(content="Italy recognizes a few official languages, but the most widespread one is Italian."),
- Document(content="Esperanto has been adopted as official language for some microstates as well, such as the Republic of Rose Island, a short-lived microstate built on a sea platform in the Adriatic Sea.")
-]
-docstore.write_documents(documents=documents, policy=DuplicatePolicy.OVERWRITE)
-```
-
-Once this is done, we are ready to build the same pipeline as before, but using `ElasticsearchBM25Retriever`.
-
-```python
-from elasticsearch_haystack.bm25_retriever import ElasticsearchBM25Retriever
-
-template = """
-Given the following information, answer the question.
-
-Context:
-{% for document in documents %}
- {{ document.content }}
-{% endfor %}
-
-Question: What's the official language of {{ country }}?
-"""
-
-pipe = Pipeline()
-pipe.add_component("retriever", ElasticsearchBM25Retriever(document_store=docstore))
-pipe.add_component("prompt_builder", PromptBuilder(template=template))
-pipe.add_component("llm", OpenAIGenerator(api_key=api_key))
-pipe.connect("retriever", "prompt_builder.documents")
-pipe.connect("prompt_builder", "llm")
-
-pipe.draw("elasticsearch-rag-pipeline.png")
-
-country = "the Republic of Rose Island"
-pipe.run({
- "retriever": {"query": country},
- "prompt_builder": {"country": country}
-})
-# returns {
-# "llm": {
-# "replies": [
-# 'The official language of the Republic of Rose Island is Esperanto.'
-# ]
-# }
-# }
-```
-
-![Elasticsearch RAG Pipeline](/posts/2023-10-27-haystack-series-rag/elasticsearch-rag-pipeline.png)
-
-That's it! We're now running the same pipeline over a production-ready Elasticsearch instance.
-
-## Wrapping up
-
-In this post, we've detailed some fundamental components that make RAG applications possible with Haystack: Generators, the PromptBuilder, and Retrievers. We've seen how they can all be used in isolation and how you can make Pipelines out of them to achieve the same goal. Lastly, we've experimented with some of the (very early!) features that make Haystack 2.0 production-ready and easy to scale up from a simple demo with minimal changes.
-
-However, this is just the start of our journey into RAG. Stay tuned!
-
----
-
-*Next: [Indexing data for RAG applications](/posts/2023-11-05-haystack-series-minimal-indexing)*
-
-*Previous: [Canals: a new concept of Pipeline](/posts/2023-10-26-haystack-series-canals)*
-
-*See the entire series here: [Haystack 2.0 series](/series/haystack-2.0-series/)*
-
-*Cover image from [Wikipedia](https://it.wikipedia.org/wiki/File:Isoladellerose.jpg)*
diff --git a/content/posts/2023-11-05-haystack-series-minimal-indexing.md b/content/posts/2023-11-05-haystack-series-minimal-indexing.md
deleted file mode 100644
index e24665fe..00000000
--- a/content/posts/2023-11-05-haystack-series-minimal-indexing.md
+++ /dev/null
@@ -1,247 +0,0 @@
----
-title: "Indexing data for RAG applications"
-date: 2023-11-05
-author: "ZanSara"
-series: ["Haystack 2.0 Series"]
-featuredImage: "/posts/2023-11-05-haystack-series-minimal-indexing/cover.png"
----
-
-*Last updated: 18/01/2024*
-
----
-
-In the [previous post](/posts/2023-10-27-haystack-series-rag) of the Haystack 2.0 series, we saw how to build RAG pipelines using a generator, a prompt builder, and a retriever with its document store. However, the content of our document store wasn't extensive, and populating one with clean, properly formatted data is not an easy task. How can we approach this problem?
-
-In this post, I will show you how to use Haystack 2.0 to create large amounts of Documents from a few web pages and write them into a document store that you can then use for retrieval.
-
-{{< notice info >}}
-
-💡 *Do you want to see the code in action? Check out the [Colab notebook](https://colab.research.google.com/drive/155CtcumiK5w3wX6FWyM1dG3OqnhwnCqy?usp=sharing) or the [gist](https://gist.github.com/ZanSara/ba7efd241c61ccfd12ed48195e23bb34).*
-
-{{< /notice >}}
-
-{{< notice warning >}}
-
-⚠️ **Warning:** *This code was tested on `haystack-ai==2.0.0b5`. Haystack 2.0 is still unstable, so later versions might introduce breaking changes without notice until Haystack 2.0 is officially released. The concepts and components, however, stay the same.*
-
-{{< /notice >}}
-
-
-# The task
-
-In Haystack's terminology, the process of extracting information from a group of files and storing the data in a document store is called "indexing". The process includes, at the very minimum, reading the content of a file, generating a Document object containing all its text, and then storing it in a document store.
-
-However, indexing pipelines often do more than this. They can process more than one file type, like .txt, .pdf, .docx, .html, audio, video, and images. With many file types to convert, they route each file to the proper converter based on its type. Files tend to contain far more text than a normal LLM can chew, so the pipeline needs to split those huge Documents into smaller chunks. Also, the converters are not perfect at reading text from the files, so the data needs to be cleaned of artifacts such as page numbers, headers, footers, and so on. On top of all of this, if you plan to use a retriever based on embedding similarity, your indexing pipeline will also need to embed all documents before writing them into the store.
-
-Sounds like a lot of work!
-
-In this post, we will focus on the preprocessing part of the pipeline: cleaning, splitting, and writing documents. I will talk about the other functionalities of indexing pipelines, such as document embedding and multiple file types routing, in later posts.
-
-# Converting files
-
-As we've just seen, the most important task of this pipeline is to convert files into Documents. Haystack provides several converters for this task: at the time of writing, it supports:
-
-- Raw text files (`TextFileToDocument`)
-- HTML files, so web pages in general (`HTMLToDocument`)
-- PDF files, by extracting text natively (`PyPDFToDocument`)
-- Image files, PDFs with images, and Office files with images, by OCR (`AzureOCRDocumentConverter`)
-- Audio files, doing transcription with Whisper either locally (`LocalWhisperTranscriber`) or remotely using OpenAI's hosted models (`RemoteWhisperTranscriber`)
-- A ton of [other formats](https://tika.apache.org/2.9.1/formats.html), such as Microsoft's Office formats, thanks to [Apache Tika](https://tika.apache.org/) (`TikaDocumentConverter`)
-
-For this example, let's assume we have a collection of web pages downloaded from the Internet. These pages are our only source of information and contain all we want our RAG application to know about.
-
-In this case, our converter of choice is `HTMLToDocument`. `HTMLToDocument` is a Haystack component that understands HTML and can filter all the markup away, leaving only meaningful text. Remember that this is a file converter, not a URL fetcher: it can only process local files, such as a website crawl. Haystack provides some components to fetch web pages, but we will see them later.
-
-Here is how you can use this converter:
-
-```python
-from haystack.components.converters import HTMLToDocument
-
-path = "Republic_of_Rose_Island.html"
-
-converter = HTMLToDocument()
-converter.run(sources=[path])
-
-# returns {"documents": [Document(content="The Republic of Rose Isla...")]}
-```
-
-`HTMLToDocument` is a straightforward component that offers close to no parameters to customize its behavior. One notable feature of its API is its input type: this converter can take paths to local files in the form of strings or `Path` objects, but it also accepts `ByteStream` objects.
-
-`ByteStream` is a handy Haystack abstraction that makes handling binary streams easier. If a component accepts `ByteStream` as input, you don't necessarily have to save your web pages to file before passing them to this converter. This allows components that retrieve large files from the Internet to pipe their output directly into this component without saving the data to disk first, which can save a lot of time.
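-
-Here is a sketch of that second usage (assuming `ByteStream` can be imported from `haystack.dataclasses` and built directly from raw bytes, such as an HTTP response body):
-
-```python
-from haystack.dataclasses import ByteStream
-from haystack.components.converters import HTMLToDocument
-
-html_bytes = b"<html><body><p>The Republic of Rose Island was a short-lived micronation.</p></body></html>"
-
-converter = HTMLToDocument()
-converter.run(sources=[ByteStream(data=html_bytes)])
-# should return a Document containing roughly "The Republic of Rose Island was a short-lived micronation."
-```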
-
-# Cleaning the text
-
-With `HTMLToDocument`, we can convert whole web pages into large Document objects. The converter typically does a decent job of filtering out the markup. Still, it's not always perfect. To compensate for these occasional issues, Haystack offers a component called `DocumentCleaner` that can remove noise from the text of the documents.
-
-Just like any other component, `DocumentCleaner` is straightforward to use:
-
-```python
-from haystack.components.preprocessors import DocumentCleaner
-
-cleaner = DocumentCleaner()
-cleaner.run(documents=documents)
-# returns {"documents": [Document(content=...), Document(content=...), ...]}
-```
-
-The effectiveness of `DocumentCleaner` depends a lot on the type of converter you use. Some flags, such as `remove_empty_lines` and `remove_extra_whitespace`, are minor fixes that can come in handy but usually have little impact on the quality of the results when used in a RAG pipeline. They can, however, make a vast difference for Extractive QA pipelines.
-
-Other parameters, like `remove_substrings` or `remove_regex`, work very well but need manual inspection and iteration from a human to get right. For example, for Wikipedia pages, we could use these parameters to remove all instances of the word `"Wikipedia"`, which are undoubtedly many and irrelevant.
-
-Finally, `remove_repeated_substrings` is a convenient option that removes headers and footers from long text, for example, books and articles. However, it works only for PDFs and, to a limited degree, for text files, because it relies on the presence of form feed characters (`\f`), which are rarely present in web pages.
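-
-To make this more concrete, here is a sketch of how a more aggressive cleaner could be configured for our Wikipedia-style pages. The values are only illustrative, and the exact parameter names may vary slightly across Haystack versions, so double-check them against the documentation of the version you're using:
-
-```python
-from haystack.components.preprocessors import DocumentCleaner
-
-cleaner = DocumentCleaner(
-    remove_empty_lines=True,
-    remove_extra_whitespaces=True,
-    remove_substrings=["Wikipedia"],  # drop a word we know is frequent and irrelevant
-)
-cleaner.run(documents=documents)
-```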
-
-# Splitting the text
-
-Now that the text is cleaned up, we can move on to a more exciting step: text splitting.
-
-So far, each Document stored the content of an entire file. If a file was a whole book with hundreds of pages, a single Document would contain hundreds of thousands of words, which is clearly too much for an LLM to make sense of. Such a large Document is also challenging for Retrievers to understand because it contains so much text that it looks relevant to every possible question. To populate our document store with data that can be used effectively by a RAG pipeline, we need to chunk this data into much smaller Documents.
-
-That's where `DocumentSplitter` comes into play.
-
-{{< notice info >}}
-
-💡 *With LLMs in a race to offer the [largest context window](https://magic.dev/blog/ltm-1) and research showing that such a chase is [counterproductive](https://arxiv.org/abs/2307.03172), there is no general consensus about how splitting Documents for RAG impacts the LLM's performance.*
-
-*What you need to keep in mind is that splitting implies a tradeoff. Huge documents will always be slightly relevant for every question, but they will bring a lot of context, which may or may not confuse the model. On the other hand, tiny Documents are much more likely to be retrieved only for questions they're highly relevant for, but they might provide too little context for the LLM to really understand their meaning.*
-
-*Tweaking the size of your Documents for the specific LLM you're using and the topic of your documents is one way to optimize your RAG pipeline, so be ready to experiment with different Document sizes before committing to one.*
-
-{{< /notice >}}
-
-How is it used?
-
-```python
-from haystack.components.preprocessors.text_document_splitter import DocumentSplitter
-
-text_splitter = DocumentSplitter(split_by="sentence", split_length=5)
-text_splitter.run(documents=documents)
-
-# returns {"documents": [Document(content=...), Document(content=...), ...]}
-```
-
-`DocumentSplitter` lets you configure the approximate size of the chunks you want to generate with three parameters: `split_by`, `split_length`, and `split_overlap`.
-
-`split_by` defines the unit to use when splitting some text. For now, the options are `word`, `sentence`, and `passage` (paragraph), but we will soon add other options.
-
-`split_length` is the number of those units that each Document should include. For example, if the unit is `sentence`, `split_length=10` means that all your Documents will contain 10 sentences' worth of text (except usually for the last document, which may have fewer). If the unit was `word`, each Document would instead contain 10 words.
-
-`split_overlap` is the number of units that should be repeated from the previous Document. For example, if the unit is `sentence` and the length is `10`, setting `split_overlap=2` means that the last two sentences of the first Document will also be present at the start of the second, which will include only 8 new sentences for a total of 10. Such repetition carries over to the end of the text to split.
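-
-For example, a splitter configured as below would produce ten-sentence Documents where consecutive chunks share two sentences:
-
-```python
-from haystack.components.preprocessors.text_document_splitter import DocumentSplitter
-
-overlapping_splitter = DocumentSplitter(split_by="sentence", split_length=10, split_overlap=2)
-overlapping_splitter.run(documents=documents)
-
-# returns {"documents": [Document(content=...), Document(content=...), ...]}
-```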
-
-# Writing to the store
-
-Once all of this is done, we can finally move on to the last step of our journey: writing the Documents into our document store. We first create the document store:
-
-```python
-from haystack.document_stores.in_memory import InMemoryDocumentStore
-
-document_store = InMemoryDocumentStore()
-```
-
-and then use `DocumentWriter` to actually write the documents in:
-
-
-```python
-from haystack.components.writers import DocumentWriter
-
-writer = DocumentWriter(document_store=document_store)
-writer.run(documents=documents)
-# returns {"documents_written": 120}
-```
-
-If you've read my [previous post](/posts/2023-10-27-haystack-series-rag) about RAG pipelines, you may wonder: why use `DocumentWriter` when we could call the `.write_documents()` method of our document store?
-
-In fact, the two methods are fully equivalent: `DocumentWriter` does nothing more than calling the `.write_documents()` method of the document store. The difference is that `DocumentWriter` is the way to go if you are using a Pipeline, which is what we're going to do next.
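-
-In other words, outside of a Pipeline the two snippets below do the same thing:
-
-```python
-from haystack.components.writers import DocumentWriter
-
-# Using the component...
-writer = DocumentWriter(document_store=document_store)
-writer.run(documents=documents)
-
-# ...is equivalent to calling the document store directly:
-document_store.write_documents(documents)
-```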
-
-# Putting it all together
-
-We finally have all the components we need to go from a list of web pages to a document store populated with clean and short Document objects. Let's build a Pipeline to sum up this process:
-
-```python
-from haystack import Pipeline
-
-document_store = InMemoryDocumentStore()
-
-pipeline = Pipeline()
-pipeline.add_component("converter", HTMLToDocument())
-pipeline.add_component("cleaner", DocumentCleaner())
-pipeline.add_component("splitter", DocumentSplitter(split_by="sentence", split_length=5))
-pipeline.add_component("writer", DocumentWriter(document_store=document_store))
-pipeline.connect("converter", "cleaner")
-pipeline.connect("cleaner", "splitter")
-pipeline.connect("splitter", "writer")
-
-pipeline.draw("simple-indexing-pipeline.png")
-
-pipeline.run({"converter": {"sources": file_names}})
-```
-
-![Indexing Pipeline](/posts/2023-11-05-haystack-series-minimal-indexing/simple-indexing-pipeline.png)
-
-That's it! We now have a fully functional indexing pipeline that can take a list of web pages and convert them into Documents that our RAG pipeline can use. As long as the RAG pipeline reads from the same store we are writing the Documents to, we can add as many Documents as we need to keep the chatbot's answers up to date without having to touch the RAG pipeline.
-
-To try it out, we only need to take the RAG pipeline we built in [my previous post](/posts/2023-10-27-haystack-series-rag) and connect it to the same document store we just populated:
-
-```python
-from haystack.components.generators import OpenAIGenerator
-from haystack.components.builders.prompt_builder import PromptBuilder
-from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
-
-template = """
-Given the following information, answer the question: {{ question }}
-
-{% for document in documents %}
- {{ document.content }}
-{% endfor %}
-"""
-pipe = Pipeline()
-
-pipe.add_component("retriever", InMemoryBM25Retriever(document_store=document_store))
-pipe.add_component("prompt_builder", PromptBuilder(template=template))
-pipe.add_component("llm", OpenAIGenerator(api_key=api_key))
-pipe.connect("retriever", "prompt_builder.documents")
-pipe.connect("prompt_builder", "llm")
-
-question = "Is there any documentary about the story of Rose Island? Can you tell me something about that?"
-pipe.run({
- "retriever": {"query": question},
- "prompt_builder": {"question": question}
-})
-
-# returns {
-# 'llm': {
-# 'replies': [
-# 'Yes, there is a documentary about the story of Rose Island. It is
-# called "Rose Island" and was released on Netflix on 8 December 2020.
-# The documentary follows the true story of Giorgio Rosa, an Italian
-# engineer who built his own island in the Adriatic sea in the late
-# 1960s. The island housed a restaurant, bar, souvenir shop, and even
-# a post office. Rosa\'s goal was to have his self-made structure
-# recognized as an independent state, leading to a battle with the
-# Italian authorities. The film depicts the construction of the island
-# and Rosa\'s refusal to dismantle it despite government demands. The
-# story of Rose Island was relatively unknown until the release of the
-# documentary. The film showcases the technology Rosa invented to build
-# the island and explores themes of freedom and resilience.'
-# ],
-# 'metadata': [...]
-# }
-# }
-```
-
-And suddenly, our chatbot knows everything about Rose Island without us having to feed the data to the document store by hand.
-
-# Wrapping up
-
-Indexing pipelines can be powerful tools, even in their simplest form, like the one we just built. However, it doesn't end here: Haystack offers many more facilities to extend what's possible with indexing pipelines, like doing web searches, downloading files from the web, processing many other file types, and so on.
-
-We will see how soon, so stay tuned!
-
----
-
-*Next: [The World of Web RAG](/posts/2023-11-09-haystack-series-simple-web-rag)*
-
-*Previous: [RAG Pipelines from scratch](/posts/2023-10-27-haystack-series-rag)*
-
-*See the entire series here: [Haystack 2.0 series](/series/haystack-2.0-series/)*
-
-
-*Cover image from [this website.](https://bertolamifineart.bidinside.com/en/lot/126352/1968-insula-de-la-rozoj-o-isola-delle-/)*
diff --git a/content/posts/2023-11-09-haystack-series-simple-web-rag.md b/content/posts/2023-11-09-haystack-series-simple-web-rag.md
deleted file mode 100644
index 0cac1d3e..00000000
--- a/content/posts/2023-11-09-haystack-series-simple-web-rag.md
+++ /dev/null
@@ -1,372 +0,0 @@
----
-title: "The World of Web RAG"
-date: 2023-11-09
-author: "ZanSara"
-series: ["Haystack 2.0 Series"]
-featuredImage: "/posts/2023-11-09-haystack-series-simple-web-rag/cover.jpeg"
----
-
-*Last updated: 18/01/2024*
-
----
-
-In an earlier post of the Haystack 2.0 series, we've seen how to build RAG and indexing pipelines. An application that uses these two pipelines is practical if you have an extensive, private collection of documents and need to perform RAG on such data only. However, in many cases, you may want to get data from the Internet: from news outlets, documentation pages, and so on.
-
-In this post, we will see how to build a Web RAG application: a RAG pipeline that can search the Web for the information needed to answer your questions.
-
-{{< notice info >}}
-
-💡 *Do you want to see the code in action? Check out the [Colab notebook](https://colab.research.google.com/drive/1dGMPxReo730j7_zQDZOu-0SGf-pk4XDL?usp=sharing) or the [gist](https://gist.github.com/ZanSara/0907a8f3ae19f62998cc061ed6e8ce53).*
-
-{{< /notice >}}
-
-{{< notice warning >}}
-
-⚠️ **Warning:** *This code was tested on `haystack-ai==2.0.0b5`. Haystack 2.0 is still unstable, so later versions might introduce breaking changes without notice until Haystack 2.0 is officially released. The concepts and components, however, stay the same.*
-
-{{< /notice >}}
-
-
-# Searching the Web
-
-As we've seen [earlier](/posts/2023-10-27-haystack-series-rag), a Haystack RAG Pipeline is made of three components: a Retriever, a PromptBuilder, and a Generator, and looks like this:
-
-![BM25 RAG Pipeline](/posts/2023-11-09-haystack-series-simple-web-rag/bm25-rag-pipeline.png)
-
-To make this pipeline use the Web as its data source, we need to replace the retriever with a component that does not look into a local document store for information but can search the web.
-
-Haystack 2.0 already provides a search engine component called `SerperDevWebSearch`. It uses [SerperDev's API](https://serper.dev/) to query popular search engines and return two types of data: a list of text snippets coming from the search engine's preview boxes and a list of links, which point to the top search results.
-
-To begin, let's see how to use this component in isolation.
-
-```python
-from haystack.components.websearch import SerperDevWebSearch
-
-question = "What's the official language of the Republic of Rose Island?"
-
-search = SerperDevWebSearch(api_key=serperdev_api_key)
-results = search.run(query=question)
-# returns {
-# "documents": [
-# Document(content='Esperanto', meta={'title': 'Republic of Rose Island - Wikipedia', 'link': 'https://en.wikipedia.org/wiki/Republic_of_Rose_Island'}),
-#         Document(content="The Republic of Rose Island was a short-lived micronation on a man-made platform in the Adriatic Sea. It's a story that few people knew of until recently, ...", meta={'title': 'Rose Island - The story of a micronation', 'link': 'https://www.rose-island.co/', 'imageUrl': 'https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcQiRCfTO6OwFS32SX37S-7OadDZCNK6Fy_NZVGsci2gcIS-zcinhOcGhgU&s', 'position': 1}),
-# ...
-# ],
-# "links": [
-# 'https://www.rose-island.co/',
-# 'https://www.defactoborders.org/places/rose-island',
-# ...
-# ]
-# }
-```
-
-`SerperDevWebSearch` is a component with a simple interface. Starting from its output, we can see that it returns not one but two different values in the returned dictionary: `documents` and `links`.
-
-`links` is the most straightforward and represents the top results that Google found relevant for the input query. It's a list of strings, each containing a URL. You can configure the number of links to return with the `top_k` init parameter.
-
-`documents` instead is a list of already fully formed Document objects. The content of these objects corresponds to the "answer boxes" that Google often returns together with its search results. Given that these snippets are usually clean and short pieces of text, they're perfect to be fed directly to an LLM without further processing.
-
-Other than expecting an API key as an init parameter and `top_k` to control the number of results, `SerperDevWebSearch` also accepts an `allowed_domains` parameter, which lets you configure the domains Google is allowed to look into during search, and `search_params`, a more generic dictionary input that lets you pass any additional search parameter SerperDev's API understands.
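-
-As an illustration, a more constrained search could be set up as in the sketch below. The values are made up, and the content of `search_params` is an assumption about what SerperDev's API accepts, so treat it as a placeholder:
-
-```python
-from haystack.components.websearch import SerperDevWebSearch
-
-search = SerperDevWebSearch(
-    api_key=serperdev_api_key,
-    top_k=5,                               # return at most 5 links and answer boxes
-    allowed_domains=["en.wikipedia.org"],  # only search within Wikipedia
-    search_params={"gl": "us"},            # assumed extra SerperDev parameter (country)
-)
-results = search.run(query="What's the official language of the Republic of Rose Island?")
-```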
-
-# A Minimal Web RAG Pipeline
-
-`SerperDevWebSearch` is actually the bare minimum we need to be able to build our very first Web RAG Pipeline. All we need to do is replace our original example's Retriever with our search component.
-
-This is the result:
-
-```python
-from haystack import Pipeline
-from haystack.components.builders import PromptBuilder
-from haystack.components.generators import OpenAIGenerator
-
-template = """
-Question: {{ question }}
-
-Google Search Answer Boxes:
-{% for document in documents %}
- {{ document.content }}
-{% endfor %}
-
-Please reformulate the information above to
-answer the user's question.
-"""
-pipe = Pipeline()
-
-pipe.add_component("search", SerperDevWebSearch(api_key=serperdev_api_key))
-pipe.add_component("prompt_builder", PromptBuilder(template=template))
-pipe.add_component("llm", OpenAIGenerator(api_key=api_key))
-pipe.connect("search.documents", "prompt_builder.documents")
-pipe.connect("prompt_builder", "llm")
-
-question = "What's the official language of the Republic of Rose Island?"
-pipe.run({
- "search": {"query": question},
- "prompt_builder": {"question": question}
-})
-# returns {
-# 'llm': {
-# 'replies': [
-# "The official language of the Republic of Rose Island is Esperanto. This artificial language was chosen by the residents of Rose Island as their national language when they declared independence in 1968. However, it's important to note that despite having their own language, government, currency, and postal service, Rose Island was never officially recognized as an independent nation by any country."
-# ],
-# 'metadata': [...]
-# }
-# }
-```
-
-![Minimal Web RAG Pipeline](/posts/2023-11-09-haystack-series-simple-web-rag/minimal-web-rag-pipeline.png)
-
-This solution is already quite effective for simple questions because Google does most of the heavy lifting of reading the content of the top results, extracting the relevant snippets, and packaging them up in a way that is really easy to access and understand by the model.
-
-However, there are situations in which this approach is not sufficient. For example, for highly technical or nuanced questions, the answer box does not provide enough context for the LLM to elaborate and grasp the entire scope of the discussion. In these situations, we may need to turn to the second output of `SerperDevWebSearch`: the links.
-
-# Fetching URLs
-
-Haystack offers a component to read the content of a URL: `LinkContentFetcher`. Let's see this component in action.
-
-```python
-from haystack.components.fetchers import LinkContentFetcher
-
-fetcher = LinkContentFetcher()
-fetcher.run(urls=["https://en.wikipedia.org/wiki/Republic_of_Rose_Island"])
-# returns {
-# "streams": [
-# ByteStream(data=b"\n<...")
-# ]
-# }
-```
-
-First, let's notice that `LinkContentFetcher` outputs a list of `ByteStream` objects. `ByteStream` is a Haystack abstraction that makes handling binary streams and files equally easy. When a component produces `ByteStream` as output, you can directly pass these objects to a Converter component that can extract its textual content without saving such binary content to a file.
-
-These features come in handy to connect `LinkContentFetcher` to a component we've already met before: `HTMLToDocument`.
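-
-Before wiring them into a Pipeline, here is a quick sketch of how the two components can be chained by hand:
-
-```python
-from haystack.components.fetchers import LinkContentFetcher
-from haystack.components.converters import HTMLToDocument
-
-fetcher = LinkContentFetcher()
-streams = fetcher.run(urls=["https://en.wikipedia.org/wiki/Republic_of_Rose_Island"])["streams"]
-
-# The ByteStream objects go straight into the converter: no temporary files needed
-documents = HTMLToDocument().run(sources=streams)["documents"]
-```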
-
-# Processing the page
-
-In a [previous post](/posts/2023-11-05-haystack-series-minimal-indexing), we've seen how Haystack can convert web pages into clean Documents ready to be stored in a Document Store. We will reuse many of the components we have discussed there, so if you missed it, make sure to check it out.
-
-From the pipeline in question, we're interested in three of its components: `HTMLToDocument`, `DocumentCleaner`, and `DocumentSplitter`. Once the search component returns the links and `LinkContentFetcher` downloads their content, we can connect it to `HTMLToDocument` to extract the text, and to `DocumentCleaner` and `DocumentSplitter` to clean and chunk the content, respectively. These documents can then go to the `PromptBuilder`, resulting in a pipeline such as this:
-
-```python
-template = """
-Question: {{ question }}
-
-Context:
-{% for document in documents %}
- {{ document.content }}
-{% endfor %}
-
-Please reformulate the information above to answer the user's question.
-"""
-pipe = Pipeline()
-
-pipe.add_component("search", SerperDevWebSearch(api_key=serperdev_api_key))
-pipe.add_component("fetcher", LinkContentFetcher())
-pipe.add_component("converter", HTMLToDocument())
-pipe.add_component("cleaner", DocumentCleaner())
-pipe.add_component("splitter", DocumentSplitter(split_by="sentence", split_length=3))
-pipe.add_component("prompt_builder", PromptBuilder(template=template))
-pipe.add_component("llm", OpenAIGenerator(api_key=api_key))
-pipe.connect("search.links", "fetcher")
-pipe.connect("fetcher", "converter")
-pipe.connect("converter", "cleaner")
-pipe.connect("cleaner", "splitter")
-pipe.connect("splitter", "prompt_builder.documents")
-pipe.connect("prompt_builder", "llm")
-
-question = "What's the official language of the Republic of Rose Island?"
-pipe.run({
- "search": {"query": question},
- "prompt_builder": {"question": question}
-})
-```
-
-![Incorrect Web RAG Pipeline](/posts/2023-11-09-haystack-series-simple-web-rag/incorrect-web-rag-pipeline.png)
-
-However, running this pipeline results in a crash.
-
-```
-PipelineRuntimeError: llm raised 'InvalidRequestError: This model's maximum context
-length is 4097 tokens. However, your messages resulted in 4911 tokens. Please reduce
-the length of the messages.'
-```
-
-Reading the error message reveals the issue right away: the LLM received too much text. And that's to be expected because we just passed the entire content of several web pages to it.
-
-We need to find a way to filter only the most relevant documents from the long list that is generated by `DocumentSplitter`.
-
-# Ranking Documents on the fly
-
-Retrievers are optimized to use the efficient retrieval engines of document stores to sift quickly through vast collections of Documents. However, Haystack also provides smaller, standalone components that work very well on shorter lists and don't require a full-blown vector database engine to function.
-
-These components are called rankers. One example of such a component is `TransformersSimilarityRanker`: a ranker that uses a model from the `transformers` library to rank Documents by their similarity to a given query.
-
-Let's see how it works:
-
-```python
-from haystack.components.rankers import TransformersSimilarityRanker
-
-ranker = TransformersSimilarityRanker()
-ranker.warm_up()
-ranker.run(
- query="What's the official language of the Republic of Rose Island?",
- documents=documents,
- top_k=1
- )
-# returns {
-# 'documents': [
-# Document(content="Island under construction\nRepublic of Rose Island\nThe Republic of Rose Island ( Esperanto : Respubliko de la Insulo de la Rozoj; Italian : Repubblica dell'Isola delle Rose) was a short-lived micronation on a man-made platform in the Adriatic Sea , 11 kilometres (6.8\xa0mi) off the coast of the province of Rimini , Italy, built by Italian engineer Giorgio Rosa, who made himself its president and declared it an independent state on 1 May 1968. [1] [2] Rose Island had its own government, currency, post office, and commercial establishments, and the official language was Esperanto .", meta={'source_id': '03bfe5f7b7a7ec623e854d2bc5eb36ba3cdf06e1e2771b3a529eeb7e669431b6'}, score=7.594357490539551)
-# ]
-# }
-```
-
-This component has a feature we haven't encountered before: the `warm_up()` method.
-
-Components that need to set up heavy resources, such as a language model, perform this operation in the `warm_up()` method rather than at initialization time. When they are used in a Pipeline, `Pipeline.run()` takes care of calling `warm_up()` on all components before running; when used standalone, users need to call `warm_up()` explicitly to prepare the object to run.
-
-`TransformersSimilarityRanker` accepts a few parameters. When initialized, it accepts a `model_name_or_path` with the HuggingFace ID of the model to use for ranking: this value defaults to `cross-encoder/ms-marco-MiniLM-L-6-v2`. It also takes `token`, to allow users to download private models from the Models Hub, `device`, to let them leverage PyTorch's ability to select the hardware to run on, and `top_k`, the maximum number of documents to return. `top_k`, as we see above, can also be passed to `run()`, and the latter overrides the former if both are set. This value defaults to 10.
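-
-For reference, an explicit initialization with the parameters described above could look like the sketch below (parameter names follow the Haystack version this post was tested on and may change in later releases):
-
-```python
-from haystack.components.rankers import TransformersSimilarityRanker
-
-ranker = TransformersSimilarityRanker(
-    model_name_or_path="cross-encoder/ms-marco-MiniLM-L-6-v2",  # the default model
-    top_k=3,  # keep only the three most relevant chunks
-)
-ranker.warm_up()  # loads the model; Pipeline.run() would do this for us
-```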
-
-Let's also put this component in the pipeline: its place is between the splitter and the prompt builder.
-
-```python
-template = """
-Question: {{ question }}
-
-Context:
-{% for document in documents %}
- {{ document.content }}
-{% endfor %}
-
-Please reformulate the information above to answer the user's question.
-"""
-pipe = Pipeline()
-
-pipe.add_component("search", SerperDevWebSearch(api_key=serperdev_api_key))
-pipe.add_component("fetcher", LinkContentFetcher())
-pipe.add_component("converter", HTMLToDocument())
-pipe.add_component("cleaner", DocumentCleaner())
-pipe.add_component("splitter", DocumentSplitter(split_by="sentence", split_length=3))
-pipe.add_component("ranker", TransformersSimilarityRanker())
-pipe.add_component("prompt_builder", PromptBuilder(template=template))
-pipe.add_component("llm", OpenAIGenerator(api_key=api_key))
-pipe.connect("search.links", "fetcher")
-pipe.connect("fetcher", "converter")
-pipe.connect("converter", "cleaner")
-pipe.connect("cleaner", "splitter")
-pipe.connect("splitter", "ranker")
-pipe.connect("ranker", "prompt_builder.documents")
-pipe.connect("prompt_builder", "llm")
-
-question = "What's the official language of the Republic of Rose Island?"
-
-pipe.run({
- "search": {"query": question},
- "ranker": {"query": question},
- "prompt_builder": {"question": question}
-})
-# returns {
-# 'llm': {
-# 'replies': [
-# 'The official language of the Republic of Rose Island was Esperanto.'
-# ],
-# 'metadata': [...]
-# }
-# }
-```
-
-![Unfiltered Web RAG Pipeline](/posts/2023-11-09-haystack-series-simple-web-rag/unfiltered-web-rag-pipeline.png)
-
-
-Note how the ranker needs to know the question to compare the documents, just like the search and prompt builder components do. So, we need to pass the value to the pipeline's `run()` call.
-
-# Filtering file types
-
-The pipeline we just built works great in most cases. However, it may occasionally fail if the search component happens to return some URL that does not point to a web page but, for example, directly to a video, a PDF, or a PPTX.
-
-Haystack does offer some facilities to deal with these file types, but we will see these converters in another post. For now, let's only filter those links out to prevent `HTMLToDocument` from crashing.
-
-This task could be approached with Haystack in several ways, but the simplest in this scenario is to use a component that would typically be used for a slightly different purpose. This component is called `FileTypeRouter`.
-
-`FileTypeRouter` is designed to route different files to their appropriate converters by checking their mime type. It does so by inspecting the content or the extension of the files it receives in input and producing an output dictionary with a separate list for each identified type.
-
-However, we can also conveniently use this component as a filter. Let's see how!
-
-```python
-from haystack.components.routers import FileTypeRouter
-
-router = FileTypeRouter(mime_types=["text/html"])
-router.run(sources=["Republic_of_Rose_Island.txt", "Republic_of_Rose_Island.html"])
-# returns defaultdict(list,
-# {'unclassified': [PosixPath('Republic_of_Rose_Island.txt')],
-# 'text/html': [PosixPath('Republic_of_Rose_Island.html')]})
-```
-
-`FileTypeRouter` must always be initialized with the list of mime types it is supposed to handle. Not only that, but this component can also deal with files that do not match any of the expected mime types by putting them all under the `unclassified` category.
-
-By putting this component between `LinkContentFetcher` and `HTMLToDocument`, we can make it forward along the pipeline only the files that match the `text/html` mime type and silently discard all others.
-
-Notice how, in the pipeline below, I explicitly connect the `text/html` output only:
-
-```python
-template = """
-Question: {{ question }}
-
-Google Search Answer Boxes:
-{% for document in documents %}
- {{ document.content }}
-{% endfor %}
-
-Please reformulate the information above to answer the user's question.
-"""
-pipe = Pipeline()
-
-pipe.add_component("search", SerperDevWebSearch(api_key=serperdev_api_key))
-pipe.add_component("fetcher", LinkContentFetcher())
-pipe.add_component("filter", FileTypeRouter(mime_types=["text/html"]))
-pipe.add_component("converter", HTMLToDocument())
-pipe.add_component("cleaner", DocumentCleaner())
-pipe.add_component("splitter", DocumentSplitter(split_by="sentence", split_length=3))
-pipe.add_component("ranker", TransformersSimilarityRanker())
-pipe.add_component("prompt_builder", PromptBuilder(template=template))
-pipe.add_component("llm", OpenAIGenerator(api_key=api_key))
-pipe.connect("search.links", "fetcher")
-pipe.connect("fetcher", "filter")
-pipe.connect("filter.text/html", "converter")
-pipe.connect("converter", "cleaner")
-pipe.connect("cleaner", "splitter")
-pipe.connect("splitter", "ranker")
-pipe.connect("ranker", "prompt_builder.documents")
-pipe.connect("prompt_builder", "llm")
-
-question = "What's the official language of the Republic of Rose Island?"
-
-pipe.run({
- "search": {"query": question},
- "ranker": {"query": question},
- "prompt_builder": {"question": question}
-})
-# returns {
-# 'llm': {
-# 'replies': [
-# 'The official language of the Republic of Rose Island was Esperanto.'
-# ],
-# 'metadata': [...]
-# }
-# }
-```
-
-![HTML-only Web RAG Pipeline](/posts/2023-11-09-haystack-series-simple-web-rag/html-web-rag-pipeline.png)
-
-With this last change, we added quite a bit of robustness to our pipeline, making it less likely to fail.
-
-# Wrapping up
-
-Web RAG is a use case that can be expanded to cover many needs, resulting in very complex pipelines. Haystack helps make sense of their complexity with pipeline graphs and detailed error messages in case of mismatched connections. However, pipelines this large can become overwhelming, especially when more branches are added.
-
-In one of our next posts, we will see how to cover such use cases while keeping the resulting complexity as low as possible.
-
----
-
-*Previous: [Indexing data for RAG applications](/posts/2023-11-05-haystack-series-minimal-indexing)*
-
-*See the entire series here: [Haystack 2.0 series](/series/haystack-2.0-series/)*
-
-*Cover image from [Wikipedia](https://commons.wikimedia.org/wiki/File:Isola_delle_Rose_1968.jpg)*
diff --git a/content/posts/2024-01-06-raspberrypi-headless-bookworm-wifi-config.md b/content/posts/2024-01-06-raspberrypi-headless-bookworm-wifi-config.md
deleted file mode 100644
index 94e1268d..00000000
--- a/content/posts/2024-01-06-raspberrypi-headless-bookworm-wifi-config.md
+++ /dev/null
@@ -1,389 +0,0 @@
----
-title: "Headless WiFi setup on Raspberry Pi OS \"Bookworm\" without the Raspberry Pi Imager"
-date: 2024-01-06
-author: "ZanSara"
-featuredImage: "/posts/2024-01-06-raspberrypi-headless-bookworm-wifi-config/cover.png"
----
-
-Setting up a Raspberry Pi headless without the Raspberry Pi Imager used to be a fairly simple process for the average Linux user, to the point where a how-to and a few searches on the Raspberry Pi forums would sort the process out. After flashing the image with `dd`, creating `ssh` in the boot partition and populating `wpa_supplicant.conf` was normally enough to get started.
-
-However with the [recently released Raspberry Pi OS 12 "Bookworm"](https://www.raspberrypi.com/news/bookworm-the-new-version-of-raspberry-pi-os/) this second step [doesn't work anymore](https://www.raspberrypi.com/documentation/computers/configuration.html#connect-to-a-wireless-network) and the only recommendation that users receive is to "just use the Raspberry Pi Imager" (like [here](https://github.com/raspberrypi/bookworm-feedback/issues/72)).
-
-But what does the Imager really do to configure the OS? Is it really so complex that it requires downloading a dedicated installer?
-
-In this post I'm going to find out first how to get the OS to connect to the WiFi without the Imager, and then I'm going to dig a bit deeper to find out why such advice is given and how the Imager performs this configuration step.
-
-# Network Manager
-
-In the [announcement](https://www.raspberrypi.com/news/bookworm-the-new-version-of-raspberry-pi-os/) of the new OS release, one of the highlights is the move to [NetworkManager](https://networkmanager.dev/) as the default mechanism to deal with networking. While this move undoubtedly brings many advantages, it is the reason why the classic technique of dropping a `wpa_supplicant.conf` file under `/etc/wpa_supplicant/` no longer works.
-
-The good news is that NetworkManager can also be configured manually with a text file. The file needs to be called `SSID.nmconnection` (replace `SSID` with your network's SSID) and placed under `/etc/NetworkManager/system-connections/` in the Pi's `rootfs` partition.
-
-```toml
-[connection]
-
-id=SSID
-uuid= # random UUID in the format 11111111-1111-1111-1111-111111111111
-type=wifi
-autoconnect=true
-
-[wifi]
-mode=infrastructure
-ssid=SSID
-
-[wifi-security]
-auth-alg=open
-key-mgmt=wpa-psk
-psk=PASSWORD
-
-[ipv4]
-method=auto
-
-[ipv6]
-method=auto
-```
-
-(replace `SSID` and `PASSWORD` with your wifi network's SSID and password). [Here](https://developer-old.gnome.org/NetworkManager/stable/nm-settings-keyfile.html) you can find the full syntax for this file.
-
-You'll also need to configure its access rights as follows:
-
-```bash
-sudo chmod -R 600 /etc/NetworkManager/system-connections/SSID.nmconnection
-sudo chown -R root:root /etc/NetworkManager/system-connections/SSID.nmconnection
-```
-
-Once this is done, let's not forget to create an empty `ssh` file in the `bootfs` partition to enable the SSH server:
-
-```bash
-touch /ssh
-```
-
-and, as was already the case in Bullseye, to [configure the default user](https://www.raspberrypi.com/news/raspberry-pi-bullseye-update-april-2022/) with `userconfig.txt`:
-
-```bash
-echo 'mypassword' | openssl passwd -6 -stdin | awk '{print "myuser:"$1}' > /userconfig.txt
-```
-
-So far it doesn't seem too complicated. However, interestingly, this is **not** what the Raspberry Pi Imager does, because if you use it to flash the image and check the result, these files are nowhere to be found. Is there a better way to go about this?
-
-# Raspberry Pi Imager
-
-To find out what the Imager does, my first idea was to have a peek at its [source code](https://github.com/raspberrypi/rpi-imager). Being a Qt application, the source might be quite intimidating, but with some searching it's possible to locate this interesting [snippet](https://github.com/raspberrypi/rpi-imager/blob/6f6a90adbb88c135534d5f20cc2a10f167ea43a3/src/imagewriter.cpp#L1214):
-
-```cpp
-void ImageWriter::setImageCustomization(const QByteArray &config, const QByteArray &cmdline, const QByteArray &firstrun, const QByteArray &cloudinit, const QByteArray &cloudinitNetwork)
-{
- _config = config;
- _cmdline = cmdline;
- _firstrun = firstrun;
- _cloudinit = cloudinit;
- _cloudinitNetwork = cloudinitNetwork;
-
- qDebug() << "Custom config.txt entries:" << config;
- qDebug() << "Custom cmdline.txt entries:" << cmdline;
- qDebug() << "Custom firstuse.sh:" << firstrun;
- qDebug() << "Cloudinit:" << cloudinit;
-}
-```
-I'm no C++ expert, but this function tells me a few things:
-
-1. The Imager writes the configuration in these files: `config.txt`, `cmdline.txt`, `firstuse.sh` (we'll soon figure out this is a typo: the file is actually called `firstrun.sh`).
-2. It also prepares a "Cloudinit" configuration file, but it's unclear if it writes it and where.
-3. The content of these files is printed to the console as debug output.
-
-So let's enable the debug logs and see what they produce:
-
-```bash
-rpi-imager --debug
-```
-
-The console stays quiet until I configure the user, password, WiFi and so on in the Imager, at which point it prints out all the expected configuration files as debug output.
-
-
-Here is the full output:
-
-
-```
-Custom config.txt entries: ""
-Custom cmdline.txt entries: " cfg80211.ieee80211_regdom=PT"
-Custom firstuse.sh: "#!/bin/bash
-
-set +e
-
-CURRENT_HOSTNAME=`cat /etc/hostname | tr -d " \ \
-\\r"`
-if [ -f /usr/lib/raspberrypi-sys-mods/imager_custom ]; then
- /usr/lib/raspberrypi-sys-mods/imager_custom set_hostname raspberrypi
-else
- echo raspberrypi >/etc/hostname
- sed -i "s/127.0.1.1.*$CURRENT_HOSTNAME/127.0.1.1\ raspberrypi/g" /etc/hosts
-fi
-FIRSTUSER=`getent passwd 1000 | cut -d: -f1`
-FIRSTUSERHOME=`getent passwd 1000 | cut -d: -f6`
-if [ -f /usr/lib/raspberrypi-sys-mods/imager_custom ]; then
- /usr/lib/raspberrypi-sys-mods/imager_custom enable_ssh
-else
- systemctl enable ssh
-fi
-if [ -f /usr/lib/userconf-pi/userconf ]; then
- /usr/lib/userconf-pi/userconf 'myuser' ''
-else
- echo "$FIRSTUSER:"'' | chpasswd -e
- if [ "$FIRSTUSER" != "myuser" ]; then
- usermod -l "myuser" "$FIRSTUSER"
- usermod -m -d "/home/myuser" "myuser"
- groupmod -n "myuser" "$FIRSTUSER"
- if grep -q "^autologin-user=" /etc/lightdm/lightdm.conf ; then
- sed /etc/lightdm/lightdm.conf -i -e "s/^autologin-user=.*/autologin-user=myuser/"
- fi
- if [ -f /etc/systemd/system/getty@tty1.service.d/autologin.conf ]; then
- sed /etc/systemd/system/getty@tty1.service.d/autologin.conf -i -e "s/$FIRSTUSER/myuser/"
- fi
- if [ -f /etc/sudoers.d/010_pi-nopasswd ]; then
- sed -i "s/^$FIRSTUSER /myuser /" /etc/sudoers.d/010_pi-nopasswd
- fi
- fi
-fi
-if [ -f /usr/lib/raspberrypi-sys-mods/imager_custom ]; then
- /usr/lib/raspberrypi-sys-mods/imager_custom set_wlan 'MY-SSID' 'MY-PASSWORD' 'PT'
-else
-cat >/etc/wpa_supplicant/wpa_supplicant.conf <<'WPAEOF'
-country=PT
-ctrl_interface=DIR=/var/run/wpa_supplicant GROUP=netdev
-ap_scan=1
-
-update_config=1
-network={
- ssid="MY-SSID"
- psk=MY-PASSWORD
-}
-
-WPAEOF
- chmod 600 /etc/wpa_supplicant/wpa_supplicant.conf
- rfkill unblock wifi
- for filename in /var/lib/systemd/rfkill/*:wlan ; do
- echo 0 > $filename
- done
-fi
-if [ -f /usr/lib/raspberrypi-sys-mods/imager_custom ]; then
- /usr/lib/raspberrypi-sys-mods/imager_custom set_keymap 'us'
- /usr/lib/raspberrypi-sys-mods/imager_custom set_timezone 'Europe/Lisbon'
-else
- rm -f /etc/localtime
- echo "Europe/Lisbon" >/etc/timezone
- dpkg-reconfigure -f noninteractive tzdata
-cat >/etc/default/keyboard <<'KBEOF'
-XKBMODEL="pc105"
-XKBLAYOUT="us"
-XKBVARIANT=""
-XKBOPTIONS=""
-
-KBEOF
- dpkg-reconfigure -f noninteractive keyboard-configuration
-fi
-rm -f /boot/firstrun.sh
-sed -i 's| systemd.run.*||g' /boot/cmdline.txt
-exit 0
-"
-
-Cloudinit: "hostname: raspberrypi
-manage_etc_hosts: true
-packages:
-- avahi-daemon
-apt:
- conf: |
- Acquire {
- Check-Date "false";
- };
-
-users:
-- name: myuser
- groups: users,adm,dialout,audio,netdev,video,plugdev,cdrom,games,input,gpio,spi,i2c,render,sudo
- shell: /bin/bash
- lock_passwd: false
- passwd:
-
-ssh_pwauth: true
-
-timezone: Europe/Lisbon
-runcmd:
-- localectl set-x11-keymap "us" pc105
-- setupcon -k --force || true
-
-
-"
-```
-
-
-
-Among these the most interesting file is `firstrun.sh`, which we can quickly locate in the `bootfs` partition. Here is its content:
-
-```bash
-#!/bin/bash
-
-set +e
-
-CURRENT_HOSTNAME=`cat /etc/hostname | tr -d " \ \
-\\r"`
-if [ -f /usr/lib/raspberrypi-sys-mods/imager_custom ]; then
- /usr/lib/raspberrypi-sys-mods/imager_custom set_hostname raspberrypi
-else
- echo raspberrypi >/etc/hostname
- sed -i "s/127.0.1.1.*$CURRENT_HOSTNAME/127.0.1.1\ raspberrypi/g" /etc/hosts
-fi
-FIRSTUSER=`getent passwd 1000 | cut -d: -f1`
-FIRSTUSERHOME=`getent passwd 1000 | cut -d: -f6`
-if [ -f /usr/lib/raspberrypi-sys-mods/imager_custom ]; then
- /usr/lib/raspberrypi-sys-mods/imager_custom enable_ssh
-else
- systemctl enable ssh
-fi
-if [ -f /usr/lib/userconf-pi/userconf ]; then
- /usr/lib/userconf-pi/userconf 'myuser' ''
-else
- echo "$FIRSTUSER:"'' | chpasswd -e
- if [ "$FIRSTUSER" != "myuser" ]; then
- usermod -l "myuser" "$FIRSTUSER"
- usermod -m -d "/home/myuser" "myuser"
- groupmod -n "myuser" "$FIRSTUSER"
- if grep -q "^autologin-user=" /etc/lightdm/lightdm.conf ; then
- sed /etc/lightdm/lightdm.conf -i -e "s/^autologin-user=.*/autologin-user=myuser/"
- fi
- if [ -f /etc/systemd/system/getty@tty1.service.d/autologin.conf ]; then
- sed /etc/systemd/system/getty@tty1.service.d/autologin.conf -i -e "s/$FIRSTUSER/myuser/"
- fi
- if [ -f /etc/sudoers.d/010_pi-nopasswd ]; then
- sed -i "s/^$FIRSTUSER /myuser /" /etc/sudoers.d/010_pi-nopasswd
- fi
- fi
-fi
-if [ -f /usr/lib/raspberrypi-sys-mods/imager_custom ]; then
- /usr/lib/raspberrypi-sys-mods/imager_custom set_wlan 'MY-SSID' 'MY-PASSWORD' 'PT'
-else
-cat >/etc/wpa_supplicant/wpa_supplicant.conf <<'WPAEOF'
-country=PT
-ctrl_interface=DIR=/var/run/wpa_supplicant GROUP=netdev
-ap_scan=1
-
-update_config=1
-network={
- ssid="MY-SSID"
- psk=MY-PASSWORD
-}
-
-WPAEOF
- chmod 600 /etc/wpa_supplicant/wpa_supplicant.conf
- rfkill unblock wifi
- for filename in /var/lib/systemd/rfkill/*:wlan ; do
- echo 0 > $filename
- done
-fi
-if [ -f /usr/lib/raspberrypi-sys-mods/imager_custom ]; then
- /usr/lib/raspberrypi-sys-mods/imager_custom set_keymap 'us'
- /usr/lib/raspberrypi-sys-mods/imager_custom set_timezone 'Europe/Lisbon'
-else
- rm -f /etc/localtime
- echo "Europe/Lisbon" >/etc/timezone
- dpkg-reconfigure -f noninteractive tzdata
-cat >/etc/default/keyboard <<'KBEOF'
-XKBMODEL="pc105"
-XKBLAYOUT="us"
-XKBVARIANT=""
-XKBOPTIONS=""
-
-KBEOF
- dpkg-reconfigure -f noninteractive keyboard-configuration
-fi
-rm -f /boot/firstrun.sh
-sed -i 's| systemd.run.*||g' /boot/cmdline.txt
-exit 0
-```
-
-
-
-Side note: how does the OS know that it should run this file on its first boot?
-
-Imager also writes a file called `cmdline.txt` in the boot partition, which contains the following:
-
-```
-console=serial0,115200 console=tty1 root=PARTUUID=57c84f67-02 rootfstype=ext4 fsck.repair=yes rootwait quiet init=/usr/lib/raspberrypi-sys-mods/firstboot cfg80211.ieee80211_regdom=PT systemd.run=/boot/firstrun.sh systemd.run_success_action=reboot systemd.unit=kernel-command-line.target
-```
-
-Note the reference to `/boot/firstrun.sh`. If you plan to implement your own `firstrun.sh` file and want to change its name, don't forget to modify this line as well.
-
-
-
-That's a lot of Bash in one go, but upon inspection one can spot a recurring pattern. For example, when setting the hostname, it does this:
-
-```bash
-if [ -f /usr/lib/raspberrypi-sys-mods/imager_custom ]; then
- /usr/lib/raspberrypi-sys-mods/imager_custom set_hostname raspberrypi
-else
- echo raspberrypi >/etc/hostname
- sed -i "s/127.0.1.1.*$CURRENT_HOSTNAME/127.0.1.1\ raspberrypi/g" /etc/hosts
-fi
-```
-
-The script clearly signals that there is a "preferred" way to set the hostname: using `/usr/lib/raspberrypi-sys-mods/imager_custom set_hostname [NAME]`. Only if this executable is not available does it fall back to the "traditional" way of setting the hostname by editing `/etc/hosts`.
-
-The same patterns repeat a few times to perform the following operations:
-- set the hostname (`/usr/lib/raspberrypi-sys-mods/imager_custom set_hostname [NAME]`)
-- enable ssh (`/usr/lib/raspberrypi-sys-mods/imager_custom enable_ssh`)
-- configure the user (`/usr/lib/userconf-pi/userconf [USERNAME] [HASHED-PASSWORD]`)
-- configure the WiFi (`/usr/lib/raspberrypi-sys-mods/imager_custom set_wlan [MY-SSID] [MY-PASSWORD] [2-LETTER-COUNTRY-CODE]`)
-- set the keyboard layout (`/usr/lib/raspberrypi-sys-mods/imager_custom set_keymap [CODE]`)
-- set the timezone (`/usr/lib/raspberrypi-sys-mods/imager_custom set_timezone [TIMEZONE-NAME]`)
-
-It seems like using `raspberrypi-sys-mods` to configure the OS at first boot is the way to go in this RPi OS version, and it might be true in future versions as well. There are [hints](https://github.com/RPi-Distro/raspberrypi-sys-mods/issues/82#issuecomment-1779109991) that the Raspberry Pi OS team is going to move to [`cloud-init`](https://cloudinit.readthedocs.io/en/latest/index.html) in the near future, but for now this seems to be the way that the initial setup is done.
-
-# raspberrypi-sys-mods
-
-So let's check out what `raspberrypi-sys-mods` does! The source code can be found here: [raspberrypi-sys-mods](https://github.com/RPi-Distro/raspberrypi-sys-mods).
-
-Given that we're interested in the WiFi configuration, let's head straight to the `imager_custom` script ([here](https://github.com/RPi-Distro/raspberrypi-sys-mods/blob/2e256445b65995f62db80e6a267313275cad51e4/usr/lib/raspberrypi-sys-mods/imager_custom#L97)), where we discover that it's a Bash script which does this:
-
-```bash
-CONNFILE=/etc/NetworkManager/system-connections/preconfigured.nmconnection
- UUID=$(uuid -v4)
- cat <<- EOF >${CONNFILE}
- [connection]
- id=preconfigured
- uuid=${UUID}
- type=wifi
- [wifi]
- mode=infrastructure
- ssid=${SSID}
- hidden=${HIDDEN}
- [ipv4]
- method=auto
- [ipv6]
- addr-gen-mode=default
- method=auto
- [proxy]
- EOF
-
- if [ ! -z "${PASS}" ]; then
- cat <<- EOF >>${CONNFILE}
- [wifi-security]
- key-mgmt=wpa-psk
- psk=${PASS}
- EOF
- fi
-
- # NetworkManager will ignore nmconnection files with incorrect permissions,
- # to prevent Wi-Fi credentials accidentally being world-readable.
- chmod 600 ${CONNFILE}
-```
-
-So after all this searching, we're back to square one. This utility is doing exactly what we've done at the start: it writes a NetworkManager configuration file called `preconfigured.nmconnection` and it fills it in with the information that we've provided to the Imager, then changes the permissions to make sure NetworkManager can use it.
-
-# Conclusion
-
-It would be great if the Raspberry Pi OS team expanded their documentation to include this information, so that users aren't left wondering what makes the RPi Imager so special and whether their manual setup is the right way to go or rather a hack that is likely to break. For now there seems to be one solid approach to this problem, and we will see what changes in the next version of Raspberry Pi OS.
-
-On this note you should remember that doing a manual configuration of NetworkManager, using the Imager, or using `raspberrypi-sys-mods` may be nearly identical right now, but when choosing which approach to use for your project you should also keep in mind the maintenance burden that this decision brings.
-
-Doing a manual configuration is easier on many levels, but only if you don't intend to support other versions of RPi OS. If you do, or if you expect to migrate when a new version comes out, you should consider doing something similar to what the Imager does: use a `firstrun.sh` file that tries to use `raspberrypi-sys-mods` and falls back to a manual configuration only if that executable is missing. That is likely to make migrations easier if the Raspberry Pi OS team should choose once again to modify the way that headless setups work.
-
-
-
\ No newline at end of file
diff --git a/content/posts/2024-02-20-is-rag-all-you-need-odsc-east-2024-teaser.md b/content/posts/2024-02-20-is-rag-all-you-need-odsc-east-2024-teaser.md
deleted file mode 100644
index ff1b7cd9..00000000
--- a/content/posts/2024-02-20-is-rag-all-you-need-odsc-east-2024-teaser.md
+++ /dev/null
@@ -1,96 +0,0 @@
----
-title: "Is RAG all you need? A look at the limits of retrieval augmentation"
-date: 2024-02-21
-author: "ZanSara"
-featuredImage: "/posts/2024-02-20-is-rag-all-you-need-odsc-east-2024-teaser/cover.jpeg"
-canonicalUrl: https://opendatascience.com/is-rag-all-you-need-a-look-at-the-limits-of-retrieval-augmentation/
-aliases:
-- /posts/is-rag-all-you-need
----
-
-*This blogpost is a teaser for [my upcoming talk](https://odsc.com/speakers/rag-the-bad-parts-and-the-good-building-a-deeper-understanding-of-this-hot-llm-paradigms-weaknesses-strengths-and-limitations/) at ODSC East 2024 in Boston, April 23-25. It is published on the ODSC blog [at this link](https://opendatascience.com/is-rag-all-you-need-a-look-at-the-limits-of-retrieval-augmentation/).*
-
----
-
-Retrieval Augmented Generation (RAG) is by far one of the most popular and effective techniques to bring LLMs to production. Introduced by a Meta [paper](https://arxiv.org/abs/2005.11401) in 2020, it has since taken off and evolved into a field of its own, fueled by the immediate benefits that it provides: a lower risk of hallucinations, access to updated information, and so on. On top of this, RAG is relatively cheap to implement for the benefit it provides, especially when compared to costly techniques like LLM finetuning. This makes it a no-brainer for a lot of use cases, to the point that nowadays nearly every production system that uses LLMs seems to be implemented as some form of RAG.
-
-![](/posts/2024-02-20-is-rag-all-you-need-odsc-east-2024-teaser/rag_paper.png)
-
-*A diagram of a RAG system from the [original paper](https://arxiv.org/abs/2005.11401).*
-
-However, retrieval augmentation is not the silver bullet that many claim it is. Alongside all these obvious benefits, RAG brings its own set of weaknesses and limitations, which are good to be aware of when scale and accuracy need to be improved further.
-
-# How does a RAG application fail?
-
-At a high level, RAG introduces a retrieval step right before the LLM generation. This means that we can classify the failure modes of a RAG system into two main categories:
-
-* Retrieval failures: when the retriever returns only documents which are irrelevant to the query or misleading, which in turn gives the LLM wrong information to build the final answer from.
-
-* Generation failures: when the LLM generates a reply that is unrelated or directly contradicts the documents that were retrieved. This is a classic LLM hallucination.
-
-When developing a simple system or a PoC, these sorts of errors tend to have a limited impact on the results as long as you are using the best available tools. Powerful LLMs such as GPT-4 and Mixtral are not at all prone to hallucination when the provided documents are correct and relevant, and specialized systems such as vector databases, combined with specialized embedding models, can easily achieve high retrieval accuracy, precision and recall on most queries.
-
-However, as the system scales to larger corpora, lower quality documents, or niche and specialized fields, these errors end up amplifying each other and may degrade the overall system performance noticeably. Having a good grasp of the underlying causes of these issues, and an idea of how to minimize them, can make a huge difference.
-
-
-![](/posts/2024-02-20-is-rag-all-you-need-odsc-east-2024-teaser/rag_failures.png)
-
-*The difference between retrieval and generation failures. Identifying where your RAG system is more likely to fail is key to improve the quality of its answers.*
-
-# A case study: customer support chatbots
-
-This is one of the most common applications of LLMs: a chatbot that helps users by answering their questions about a product or a service. Apps like this can be used in situations that are more or less sensitive for the user and difficult for the LLM: from simple developer documentation search, customer support for airlines or banks, up to bots that provide legal or medical advice.
-
-These three systems are very similar from a high-level perspective: the LLM needs to use snippets retrieved from a corpus of documents to build a coherent answer for the user. In fact, RAG is a fitting architecture for all of them, so let's assume that all three systems are built more or less equally, with a retrieval step followed by a generation one.
-Let's see what challenges are involved in each of them.
-
-## Enhanced search for developer documentation
-
-For this use case, RAG is usually sufficient to achieve good results. A simple proof of concept may even exceed expectations.
-
-When present and done well, developer documentation is structured and easy for a chatbot to understand. Retrieval is usually easy and effective, and the LLM can reinterpret the retrieved snippets effectively. On top of that, hallucinations are easy to spot by the user or even by an automated system like a REPL, so they have a limited impact on the perceived quality of the results.
-
-As a bonus, the queries are very likely to always be in English, which happens to be the case for the documentation too and to be the language that LLMs are strongest at.
-
-![](/posts/2024-02-20-is-rag-all-you-need-odsc-east-2024-teaser/mongodb.png)
-
-*The MongoDB documentation provides a chatbot interface which is quite useful.*
-
-## Customer support bots for airlines and banks
-
-In this case, the small annoyances that are already present above have a [much stronger impact](https://www.theguardian.com/world/2024/feb/16/air-canada-chatbot-lawsuit).
-
-Even if your airline or bank’s customer support pages are top notch, hallucinations are not as easy to spot, because to make sure that the answers are accurate the user needs to check the sources that the LLM is quoting… which voids the whole point of the generation step. And what if the user cannot read such pages at all? Maybe they speak a minority language, so they can’t read them. Also, LLMs tend to perform worse on languages other than English and hallucinate more often, exacerbating the problem where it’s already more acute.
-
-![](/posts/2024-02-20-is-rag-all-you-need-odsc-east-2024-teaser/air_canada.png)
-
-*You are going to need a very good RAG system and a huge disclaimer to avoid [this scenario](https://www.theguardian.com/world/2024/feb/16/air-canada-chatbot-lawsuit).*
-
-## Bots that provide legal or medical advice
-
-The third case brings the exact same issues to a whole new level. In these scenarios, vanilla RAG is normally not enough.
-
-Laws and scientific articles are hard for the average person to read, require specialized knowledge to understand, and need to be read in context: asking the user to check the sources that the LLM is quoting is just not possible. And while retrieval on these types of documents is feasible, its accuracy is not as high as on simple, straightforward text.
-
-Even worse, LLMs often have no reliable background knowledge on these topics, so their replies need to be strongly grounded in relevant documents for the answers to be correct and dependable. While a simple RAG implementation is still better than a vanilla reply from GPT-4, the results can be problematic in entirely different ways.
-
-![](/posts/2024-02-20-is-rag-all-you-need-odsc-east-2024-teaser/medical_questions.png)
-
-*[Research is being done](https://www.sciencedirect.com/science/article/pii/S2949761223000366), but the results are not promising yet.*
-
-# Ok, but what can we do?
-
-Moving your simple PoC to real-world use cases without reducing the quality of the responses requires a deeper understanding of how retrieval and generation work together. You need to be able to measure your system's performance, analyze the causes of its failures, and plan experiments to improve those metrics. Often you will need to complement it with other techniques that can improve its retrieval and generation abilities to reach the quality threshold that makes such a system useful at all.
-
-In my upcoming talk at ODSC East “RAG, the bad parts (and the good!): building a deeper understanding of this hot LLM paradigm’s weaknesses, strengths, and limitations” we are going to cover all these topics:
-
-* how to **measure the performance** of your RAG applications, from simple metrics like F1 to more sophisticated approaches like Semantic Answer Similarity.
-
-* how to **identify if you’re dealing with a retrieval or a generation failure** and where to look for a solution: is the problem in your documents content, in their size, in the way you chunk them or embed them? Or is the LLM that is causing the most trouble, maybe due to the way you are prompting it?
-
-* what **techniques can help you raise the quality of the answers**, from simple prompt engineering tricks like few-shot prompting, all the way up to finetuning, self-correction loops and entailment checks.
-
-Make sure to attend the [talk](https://odsc.com/speakers/rag-the-bad-parts-and-the-good-building-a-deeper-understanding-of-this-hot-llm-paradigms-weaknesses-strengths-and-limitations/) to learn more about all these techniques and how to apply them in your projects.
-
-
-
\ No newline at end of file
diff --git a/content/posts/2024-02-28-create-anki-cloze-cards-with-custom-gpt.md b/content/posts/2024-02-28-create-anki-cloze-cards-with-custom-gpt.md
deleted file mode 100644
index 3ed460d0..00000000
--- a/content/posts/2024-02-28-create-anki-cloze-cards-with-custom-gpt.md
+++ /dev/null
@@ -1,129 +0,0 @@
----
-title: "ClozeGPT: Write Anki cloze cards with a custom GPT"
-date: 2024-02-28
-author: "ZanSara"
-featuredImage: "/posts/2024-02-28-create-anki-cloze-cards-with-custom-gpt/cover.png"
----
-
-As everyone who has been serious about studying with [Anki](https://apps.ankiweb.net/) knows, the first step of the journey is writing your own flashcards. Writing the cards yourself is often cited as the most straightforward way to make the review process more effective. However, this can become a big chore, and not having enough cards to study is a sure way not to learn anything.
-
-What can we do to make this process less tedious?
-
-# Write simple cards
-
-[A lot](https://www.reddit.com/r/Anki/) has been written about the best way to create Anki cards. However, as a [HackerNews commenter](https://news.ycombinator.com/item?id=39002138) once said:
-
-> One massively overlooked way to improve spaced repetition is to make easier cards.
-
-Cards can hardly be [too simple to be effective](https://www.supermemo.com/en/blog/twenty-rules-of-formulating-knowledge). You don't need to write complicated tricky questions to make sure you are making the most of your reviews. On the contrary, even a long sentence where the word you need to study is highlighted is often enough to make the review worthwhile.
-
-In the case of language learning, if you're an advanced learner one of the easiest ways to create such cards is to [copy-paste a sentence](https://www.supermemo.com/en/blog/learn-whole-phrases-supertip-4) with your target word into a card and write the translation of that word (or sentence) on the back. But if you're a beginner, even these cards can be complicated both to write and to review. What if the sentence where you found the new word is too complex? You'll need to write a brand new sentence. But what if you write an incorrect sentence? And so on.
-
-# Automating the process
-
-Automated card generation has often been compared to the usage of [pre-made decks](https://www.reddit.com/r/languagelearning/comments/6ysx7g/is_there_value_in_making_your_own_anki_deck_or/), because the students don't see the content of the cards they're adding to their decks before doing so. However, this depends a lot on how much the automation is hiding from the user.
-
-In my family we're currently learning Portuguese, so we end up creating a lot of cards with Portuguese vocabulary. Given that many useful words are hard to make sense of without context, having cards with sample sentences helps me massively to remember them. But our sample sentences often sound unnatural in Portuguese, even when they're correct. It would be great if we could have a "sample sentence generator" that creates such sample sentences for me in more colloquial Portuguese!
-
-This is when we got the idea of using an LLM to help with the task. GPT models are great sentence generators: can we get them to make some good sample sentence cards?
-
-A [quick experiment](https://chat.openai.com/share/89c821b8-6048-45f3-9fc1-c3875fdbe1c5) proves that there is potential to this concept.
-
-![](/posts/2024-02-28-create-anki-cloze-cards-with-custom-gpt/chatgpt-anki-card-creation.png)
-
-# Custom GPTs
-
-The natural next step is to store that set of instructions into a custom prompt, or as they're called now, a [custom GPT](https://help.openai.com/en/articles/8554407-gpts-faq#h_40756527ce). Making these small wrappers is [really easy](https://help.openai.com/en/articles/8554397-creating-a-gpt): it requires no coding, only a well-crafted prompt and a catchy name. So we called our new GPT "ClozeGPT" and started off with a prompt like this:
-
-
- Your job is to create Portuguese Anki cloze cards.
- I might give you a single word or a pair (word + translation).
-
- Front cards:
- - Use Anki's `{{c1::...}}` feature to template in cards.
- - You can create cards with multiple clozes.
- - Keep the verb focused, and don't rely too much on auxiliary verbs like
- "precisar", "gostar", etc...
- - Use the English translation as a cloze hint.
-
- Back cards:
- - The back card should contain the Portuguese word.
- - If the word could be mistaken (e.g. "levantar" vs. "acordar"),
- write a hint that can help me remember the difference.
- - The hint should be used sparingly.
-
- Examples:
-
- ---------
-
- Input: cozinhar
-
- # FRONT
- ```
- Eu {{c1::cozinho::cook}} todos os dias para minha família.
- ```
-
- # BACK
- ```
- cozinhar - to cook
- ```
- ---------
-
- Input: levantar
-
- # FRONT
- ```
- Eu preciso {{c1::levantar::get up}} cedo amanhã para ir ao trabalho.
- ```
-
- # BACK
- ```
- levantar - to get up, to raise (don't mistake this with "acordar", which is to wake up from sleep)
- ```
-
-This simple prompt already gives very nice results!
-
-![](/posts/2024-02-28-create-anki-cloze-cards-with-custom-gpt/beber-flashcard.png)
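-
-By the way, nothing forces you to go through the GPT builder UI: the same instructions can drive a small script. Here is a rough sketch with the OpenAI Python client (the model name and the way the prompt is wired in are my own assumptions, not part of the original setup):
-
-```python
-from openai import OpenAI
-
-# Hypothetical: paste the full ClozeGPT instructions shown above here.
-CLOZE_INSTRUCTIONS = "Your job is to create Portuguese Anki cloze cards. ..."
-
-client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
-
-def make_card(word: str) -> str:
-    response = client.chat.completions.create(
-        model="gpt-3.5-turbo",  # assumption: any chat model can be used here
-        messages=[
-            {"role": "system", "content": CLOZE_INSTRUCTIONS},
-            {"role": "user", "content": word},
-        ],
-    )
-    return response.choices[0].message.content
-
-print(make_card("cozinhar"))
-```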
-
-# Bells and whistles
-
-Naturally, once a tool works well it's hard to resist the urge to add some new features to it. So for our ClozeGPT we added a few more abilities:
-
- # Commands
-
- ## `+<>`
- Expands the back card with an extra word present in the sentence.
- Include all the previous words, plus the one given.
- In this case, only the back card needs to be printed; don't show the front card again.
-
- ## `R[: <>]`
- Regenerates the response based on the hint given.
- If the hint is absent, regenerate the sentence with a different context.
-    Do not change the target words; the hint is most often a different context I would like to have a sentence for.
-
- ## `Q: <>`
- This is an escape to a normal chat about a related question.
- Answer the question as usual, you don't need to generate anything.
-
-The `+` command is useful when the generated sentence contains some other interesting word you can take the occasion to learn as well:
-
-![](/posts/2024-02-28-create-anki-cloze-cards-with-custom-gpt/maca-flashcard.png)
-
-The `R` command can be used to direct the card generation a bit better than with a simple press on the "Regenerate" icon:
-
-![](/posts/2024-02-28-create-anki-cloze-cards-with-custom-gpt/morango-flashcard.png)
-
-And finally `Q` is a convenient escape hatch to make this GPT revert back to its usual helpful self, where it can engage in conversation.
-
-![](/posts/2024-02-28-create-anki-cloze-cards-with-custom-gpt/esquecer-flashcard.png)
-
-# Have fun
-
-Our small [ClozeGPT](https://chat.openai.com/g/g-wmHCaGcCZ-clozegpt) works only for Portuguese now, but feel free to play with it if you find it useful. And, of course, always keep in mind that LLMs are only [pretending to be humans](https://chat.openai.com/share/07295647-9f43-4346-97a5-b35f62251d55).
-
-![](/posts/2024-02-28-create-anki-cloze-cards-with-custom-gpt/laranja-flashcard.png)
-
-_Front: I like orange juice in my morning coffee._
-
-
-
\ No newline at end of file
diff --git a/content/posts/2024-04-14-eli5-llms.md b/content/posts/2024-04-14-eli5-llms.md
deleted file mode 100644
index 25292318..00000000
--- a/content/posts/2024-04-14-eli5-llms.md
+++ /dev/null
@@ -1,92 +0,0 @@
----
-title: "Explain me LLMs like I'm five: build a story to help anyone get the idea"
-date: 2024-04-14
-author: "ZanSara"
-featuredImage: "/posts/2024-04-14-eli5-llms/cover.png"
-aliases:
-- /posts/eli5-llms
----
-
-These days everyone's boss seems to want some form of GenAI in their products. That doesn't always make sense: however, understanding when it does and when it doesn't is not obvious even for us experts, and nearly impossible for everyone else.
-
-How can we help our colleagues understand the pros and cons of this tech, and figure out when and how it makes sense to use it?
-
-In this post I am going to outline a narrative that explains LLMs without technicalities and helps you frame some high-level technical decisions, such as RAG vs finetuning, or which specific model size to use, in a way that a non-technical audience can not only grasp but also reason about. We'll start by "translating" a few terms into their "human equivalent" and then use this metaphor to reason about the differences between RAG and finetuning.
-
-Let's dive in!
-
-# LLMs are high-school students
-
-Large Language Models are often described as "super-intelligent" entities that know far more than any human could possibly know. This makes a lot of people think that they are also extremely intelligent and able to reason about anything in a super-human way. The reality is very different: LLMs can memorize and repeat far more facts than humans do, but in their ability to reason they are often inferior to the average person.
-
-Rather than describing LLMs as all-knowing geniuses, it's much better to frame them as **an average high-school student**. They're not the smartest humans on the planet, but they can help a lot if you guide them through the process. And just as a normal person might, sometimes they forget things, and occasionally they remember them wrong.
-
-# Some LLMs are smarter than others
-
-Language models are not all born equal. Some are inherently able to do more complex reasoning, to learn more facts and to talk more smoothly in more languages.
-
-The **"IQ"** of an LLM can be approximated, more or less, to its **parameter count**. An LLM with 7 billion parameters is almost always less clever than a 40 billion parameter model, will have a harder time learning more facts, and will be harder to reason with.
-
-However, just like with real humans, there are exceptions. Recent "small" models can easily outperform older and larger models, due to improvements in the way they're built. Also, some small models are very good at a very specialized job and can outperform a large, general-purpose model on that task.
-
-# LLMs learn by "studying"
-
-Another similarity to human students is that LLMs also learn all the facts they know by **"going to school"** and studying a ton of general and unrelated facts. This is what **training** an LLM means. This implies that, just like with students, an LLM needs a lot of varied material to study from. This material is what is usually called "training data" or "training dataset".
-
-They can also learn more than what they currently know and **specialize** on a topic: all they need to do is to study further on it. This is what **finetuning** represents, and as you would expect, it also needs some study material. This is normally called "finetuning data/datasets".
-
-The distinction between training and finetuning is not much about how it's done, but mostly about **the size and contents of the dataset required**. The initial training usually takes a lot of time, computing power, and tons of very varied data, just like what's needed to bring a baby to the level of a high-schooler. Finetuning instead looks like preparing for a specific test or exam: the study material is a lot less and a lot more specific.
-
-Keep in mind that, just like for humans, studying more can make a student a bit smarter, but it won't make them a genius. In many cases, no amount of training and/or finetuning can close the gap between the 7 billion parameter version of an LLM and the 40 billion one.
-
-# Every chat is an exam
-
-One of the most common use cases for LLMs is question answering, an NLP task where users ask questions to the model and expect a correct answer back. The fact that the answer must be correct means that this interaction is very similar to an **exam**: the LLM is being tested by the user on its knowledge.
-
-This means that, just like a student, when the LLM is used directly it has to rely on its own knowledge to answer the question. If it studied the topic well it will answer accurately most of the time. However, if it didn't study the subject, it will do what students always do: **make up stuff that sounds legit**, hoping that the teacher will not notice how little it knows. This is what we call **hallucinations**.
-
-When the user knows the correct answer, the LLM's reply can be graded, just like in a real exam, to make the LLM improve. This process is called **evaluation**. Just like with humans, there are many ways in which the answer can be graded: the LLM can be graded on the accuracy of the facts it recalled, or the fluency with which it delivered its answer, or it can be scored on the correctness of a reasoning exercise, like a math problem. These ways of grading an LLM are called **metrics**.
-
-# Making the exams easier
-
-Hallucinations are very dangerous if the user doesn't know what the LLM was supposed to reply, so they need to be reduced, possibly eliminated entirely. It's like we really need the students to pass the exam with flying colors, no matter how much they studied.
-
-Luckily there are many ways to help our student succeed. One way to improve the score is, naturally, to make them study more and better. Giving them more time to study (**more finetuning**) and better material (**better finetuning datasets**) is one good way to make LLMs reply correctly more often. The issue is that this method is **expensive**, because it needs a lot of computing power and high quality data, and the student may still forget something during the exam.
-
-We can make the exams even easier by converting them into **open-book exams**. Instead of asking the students to memorize all the facts and recall them during the exam, we can let them bring the book and look up the information they need when the teacher asks the question. This method can be applied to LLMs too and is called **RAG**, which stands for "retrieval augmented generation".
-
-RAG has a few interesting properties. First of all, it can make it very easy even for "dumb", small LLMs to recall nearly all the important facts correctly and consistently. By letting your students carry their history books to the exam, all of them will be able to tell you the date of any historical event by just looking it up, regardless of how smart they are or how much they studied.
-
-RAG doesn't need a lot of data, but you need an **efficient way to access it**. In our metaphor, you need a well structured book with a good index to help the student find the correct facts when asked, or they might fail to find the information they need when they're quizzed.
-
-A trait that makes RAG unique is that it can be used to keep the LLM up-to-date with **information that can't be "studied"** because it changes too fast. Let's imagine a teacher that wants to quiz the students about today's stock prices. They can't expect the pupils to know them if they don't have access to the latest financial data. Even if they were to study the prices every hour, the result would be quite pointless, because all the knowledge they acquire becomes immediately irrelevant and might even confuse them.
-
-Last but not least, RAG can be used *together* with finetuning. Just as a teacher can make students study the topic and then also bring the book to the exam to make sure they will answer correctly, you can also use RAG and finetuning together.
-
-However, there are situations where RAG doesn't help. For example, it's pointless if the questions are asked in a language that the LLM doesn't know, or if the exam is made of tasks that require complex reasoning. This is true for human students too: books won't help them much to understand a foreign language to the point that they can take an exam in it, and won't be useful to crack a hard math problem. For this sort of exam the students just need to be smart and study more, which in LLM terms means that you should prefer a large model and you probably need to finetune it.
-
-# Telling a story
-
-Let's recap the terminology we used:
-
-- The **LLM** is a **student**
-- The **LLM's IQ** corresponds to its **parameter count**
-- **Training** an LLM is the same as making a student **go to school**
-- **Finetuning** it means to make it **specialize on a subject** by making it study only books and other material on the subject
-- A **training dataset** is the **books and material** the student needs to study on
-- **User interactions** are like **university exams**
-- **Evaluating** an LLM means to **score its answers** as if they were the responses to a test
-- A **metric** is a **type of evaluation** that focuses on a specific trait of the answer
-- A **hallucination** is a **wrong answer** that the LLM makes up, just like a student would, in order to try to pass an exam when it doesn't know the answer or can't recall it in that moment
-- **RAG (retrieval augmented generation)** is like an **open-book exam**: it gives the LLM access to some material on the question's topic, so it won't need to hallucinate an answer. It will help the LLM recall facts, but it won't make it smarter.
-
-By drawing a parallel with a human student it can be a lot easier to explain to a non-technical audience why some decisions were taken.
-
-For example, it might not be obvious why RAG is cheaper than finetuning, because both need domain-specific data. By explaining that RAG is like an open-book exam versus a closed-book one, the difference is clearer: the students need less time and effort to prepare, and they're less likely to make trivial mistakes if they can bring the book with them to the exam.
-
-Another example is hallucinations. It's difficult for many people to understand why LLMs don't like to say "I don't know", until they realise that from the LLM's perspective every question is like an exam: better to make up something than admit they're unprepared! And so on.
-
-Building a shared, simple intuition of how LLMs work is a very powerful tool. Next time you're asked to explain a technical decision related to LLMs, building a story around it may get the message across in a much more effective way and help everyone be on the same page. Give it a try!
-
-
-
\ No newline at end of file
diff --git a/content/posts/2024-04-29-odsc-east-rag.md b/content/posts/2024-04-29-odsc-east-rag.md
deleted file mode 100644
index 7159a19c..00000000
--- a/content/posts/2024-04-29-odsc-east-rag.md
+++ /dev/null
@@ -1,361 +0,0 @@
----
-title: "RAG, the bad parts (and the good!)"
-date: 2024-04-29
-author: "ZanSara"
-featuredImage: "/posts/2024-04-29-odsc-east-rag/cover.png"
-aliases:
- - 2024-04-29-odsc-east-rag-talk-summary
----
-
-*This is a writeup of my talk at [ODSC East 2024](/talks/2024-04-25-odsc-east-rag/) and [EuroPython 2024](/talks/2024-07-10-europython-rag/).*
-
----
-
-If you've been at any AI or Python conference this year, there's one acronym that you've probably heard in nearly every talk: it's RAG. RAG is one of the most used techniques to enhance LLMs in production, but why is it so? And what are its weak points?
-
-In this post, we will first describe what RAG is and how it works at a high level. We will then see what type of failures we may encounter, how they happen, and a few reasons that may trigger these issues. Next, we will look at a few tools to help us evaluate a RAG application in production. Last, we're going to list a few techniques to enhance your RAG app and make it more capable in a variety of scenarios.
-
-Let's dive in.
-
-# Outline
-
-- [What is RAG?](#what-is-rag)
-- [Why should I use it?](#why-should-i-use-it)
- - [A weather chatbot](#a-weather-chatbot)
- - [A real-world example](#a-real-world-example)
-- [Failure modes](#failure-modes)
- - [Retrieval failure](#retrieval-failure)
- - [Generation failure](#generation-failure)
-- [Evaluation strategies](#evaluation-strategies)
- - [Evaluating Retrieval](#evaluating-retrieval)
- - [Evaluating Generation](#evaluating-generation)
- - [End-to-end evaluation](#end-to-end-evaluation)
- - [Putting it all together](#putting-it-all-together)
-- [Advanced flavors of RAG](#advanced-flavors-of-rag)
- - [Use multiple retrievers](#use-multiple-retrievers)
- - [Self-correction](#self-correction)
- - [Agentic RAG](#agentic-rag)
- - [Multihop RAG](#multihop-rag)
-- [A word on finetuning](#a-word-on-finetuning)
-- [Conclusion](#conclusion)
-
-
-# What is RAG?
-
-RAG stands for **R**etrieval **A**ugmented **G**eneration, which can be explained as: "A technique to **augment** an LLM’s knowledge beyond its training data by **retrieving** contextual information before **generating** an answer."
-
-![](/posts/2024-04-29-odsc-east-rag/rag-diagram.png)
-
-RAG is a technique that works best for question-answering tasks, such as chatbots or similar knowledge extraction applications. This means that the user of a RAG app is a user who needs an answer to a question.
-
-The first step of RAG is to take the question and hand it over to a component called [**retriever**](https://docs.haystack.deepset.ai/docs/retrievers?utm_campaign=odsc-east). A retriever is any system that, given a question, can find data relevant to the question within a vast dataset, be it text, images, rows in a DB, or anything else.
-
-When implementing RAG, many developers think immediately that a vector database is necessary for retrieval. While vector databases such as [Qdrant](https://haystack.deepset.ai/integrations/qdrant-document-store?utm_campaign=odsc-east), [ChromaDB](https://haystack.deepset.ai/integrations/chroma-documentstore?utm_campaign=odsc-east), [Weaviate](https://haystack.deepset.ai/integrations/weaviate-document-store?utm_campaign=odsc-east) and so on, are great for retrieval in some applications, they're not the only option. Keyword-based algorithms such as [Elasticsearch BM25](https://haystack.deepset.ai/integrations/elasticsearch-document-store?utm_campaign=odsc-east) or TF-IDF can be used as retrievers in a RAG application, and you can even go as far as using a [web search engine API](https://docs.haystack.deepset.ai/docs/websearch?utm_campaign=odsc-east), such as Google or Bing. Anything that is given a question and can return information relevant to the question can be used here.
-
-Once our retriever sifted through all the data and returned a few relevant snippets of context, the question and the context are assembled into a **RAG prompt**. It looks like this:
-
-```markdown
-Read the text below and answer the question at the bottom.
-
-Text: [all the text found by the retriever]
-
-Question: [the user's question]
-```
-
-This prompt is then fed to the last component, called a [**generator**](https://docs.haystack.deepset.ai/docs/components_overview#generators?utm_campaign=odsc-east). A generator is any system that, given a prompt, can answer the question that it contains. In practice, "generator" is an umbrella term for any LLM, be it behind an API like GPT-3.5 or running locally, such as a Llama model. The generator receives the prompt, reads and understands it, and then writes down an answer that can be given back to the user, closing the loop.
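-
-To make the flow concrete, here is a minimal sketch of the whole loop in Python. The retriever is stubbed out and the model name is an assumption; any retriever and any instruction-following LLM can take their place:
-
-```python
-from openai import OpenAI
-
-client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
-
-def retrieve(question: str) -> list[str]:
-    # Placeholder retriever: swap in BM25, a vector DB, or a web search API.
-    return ["...snippets of context relevant to the question..."]
-
-def rag_answer(question: str) -> str:
-    context = "\n".join(retrieve(question))
-    prompt = (
-        "Read the text below and answer the question at the bottom.\n\n"
-        f"Text: {context}\n\n"
-        f"Question: {question}"
-    )
-    response = client.chat.completions.create(
-        model="gpt-3.5-turbo",  # assumption: any generator LLM works here
-        messages=[{"role": "user", "content": prompt}],
-    )
-    return response.choices[0].message.content
-```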
-
-# Why should I use it?
-
-There are three main benefits of using a RAG architecture for your LLM apps instead of querying the LLM directly.
-
-1. **Reduces hallucinations**. The RAG prompt contains the answer to the user's question together with the question, so the LLM doesn't need to *know* the answer, but it only needs to read the prompt and rephrase a bit of its content.
-
-2. **Allows access to fresh data**. RAG makes LLMs capable of reasoning about data that wasn't present in their training set, such as highly volatile figures, news, forecasts, and so on.
-
-3. **Increases transparency**. The retrieval step is much easier to inspect than the LLM's inference process, so it's far easier to spot and fact-check any answer the LLM provides.
-
-To understand these points better, let's see an example.
-
-## A weather chatbot
-
-We're making a chatbot for a weather forecast app. Suppose the user asks an LLM directly, "Is it going to rain in Lisbon tomorrow morning?". In that case, the LLM will make up a random answer because it obviously didn't have tomorrow's weather forecast for Lisbon in its training set and knows nothing about it.
-
-When an LLM is queried with a direct question, it will use its internal knowledge to answer it. LLMs have read the entire Internet during their training phase, so they learned that whenever they saw a line such as "What's the capital of France?", the string "Paris" always appeared among the following few words. So when a user asks the same question, the answer will likely be "Paris".
-
-This "recalling from memory" process works for well-known facts but is not always practical. For more nuanced questions or something that the LLM hasn't seen during training, it often fails: in an attempt to answer the question, the LLM will make up a response that is not based on any real source. This is called a **hallucination**, one of LLMs' most common and feared failure modes.
-
-RAG helps prevent hallucinations because, in the RAG prompt, the question and all the data needed to answer it are explicitly given to the LLM. For our weather chatbot, the retriever will first do a Google search and find some data. Then, we will put together the RAG prompt. The result will look like this:
-
-```markdown
-Read the text below and answer the question at the bottom.
-
-Text: According to the weather forecast, the weather in Lisbon tomorrow
-is expected to be mostly sunny, with a high of 18°C and a low of 11°C.
-There is a 25% chance of showers in the evening.
-
-Question: Is it going to rain in Lisbon tomorrow morning?
-```
-
-Now, it's clear that the LLM doesn't have to recall anything about the weather in Lisbon from its memory because the prompt already contains the answer. The LLM only needs to rephrase the context. This makes the task much simpler and drastically reduces the chances of hallucinations.
-
-In fact, RAG is the only way to build an LLM-powered system that can answer a question like this with any confidence at all. Retraining an LLM every morning with the forecast for the day would be a lot more wasteful, would require a ton of data, and wouldn't return consistent results. Imagine if we were making a chatbot that gives you figures from the stock market!
-
-In addition, a weather chatbot built with RAG **can be fact-checked**. If users have access to the web pages that the retriever found, they can check the pages directly when the results are not convincing, which helps build trust in the application.
-
-## A real-world example
-
-If you want to compare a well-implemented RAG system with a plain LLM, you can put [ChatGPT](https://chat.openai.com/) (the free version, powered by GPT-3.5) and [Perplexity](https://www.perplexity.ai/) to the test. ChatGPT does not implement RAG, while Perplexity is one of the most effective implementations existing today.
-
-Let's ask both: "Where does ODSC East 2024 take place?"
-
-ChatGPT says:
-
-![](/posts/2024-04-29-odsc-east-rag/chatgpt.png)
-
-While Perplexity says:
-
-![](/posts/2024-04-29-odsc-east-rag/perplexity-ai.png)
-
-Note how ChatGPT clearly says that it doesn't know: this is better than many other LLMs, which would just make up a place and date. On the contrary, Perplexity states some specific facts, and in case of doubt it's easy to verify that it's right by simply checking the sources above. Even just looking at the source's URL can give users a lot more confidence in whether the answer is grounded.
-
-# Failure modes
-
-Now that we understand how RAG works, let's see what can go wrong in the process.
-
-As we've just described, a RAG app works in two steps -- retrieval and generation. Therefore, we can classify RAG failures into two broad categories:
-
-1. **Retrieval failures**: The retriever component fails to find the correct context for the given question. The RAG prompt injects irrelevant noise into the prompt, which confuses the LLM and results in a wrong or unrelated answer.
-
-2. **Generation failures**: The LLM fails to produce a correct answer even with a proper RAG prompt containing a question and all the data needed to answer it.
-
-To understand them better, let's pretend an imaginary user poses our application the following question about a [little-known European microstate](https://en.wikipedia.org/wiki/Republic_of_Rose_Island):
-
-```markdown
-What was the official language of the Republic of Rose Island?
-```
-
-Here is what would happen in an ideal case:
-
-![](/posts/2024-04-29-odsc-east-rag/successful-query.png)
-
-First, the retriever searches the dataset (let's imagine, in this case, Wikipedia) and returns a few snippets. The retriever did a good job here, and the snippets contain clearly stated information about the official language of Rose Island. The LLM reads these snippets, understands them, and replies to the user (correctly):
-
-```markdown
-The official language of the Republic of Rose Island was Esperanto.
-```
-
-## Retrieval failure
-
-What would happen if the retrieval step didn't go as planned?
-
-![](/posts/2024-04-29-odsc-east-rag/retrieval-failure.png)
-
-Here, the retriever finds some information about Rose Island, but none of the snippets contain any information about the official language. They only say where it was located, what happened to it, and so on. So the LLM, which knows nothing about this nation except what the prompt says, takes an educated guess and replies:
-
-```markdown
-The official language of the Republic of Rose Island was Italian.
-```
-
-The wrong answer here is none of the LLM's fault: the retriever is the component to blame.
-
-When and why can retrieval fail? There are as many answers to this question as retrieval methods, so each should be inspected for its strengths and weaknesses. However there are a few reasons that are common to most of them.
-
-- **The relevant data does not exist in the database**. When the data does not exist, it's impossible to retrieve it. Many retrieval techniques, however, give a relevance score to each result that they return, so filtering out low-relevance snippets may help mitigate the issue.
-
-- **The retrieval algorithm is too naive to match a question with its relevant context**. This is a common issue for keyword-based retrieval methods such as TF-IDF or BM25 (Elasticsearch). These algorithms can't deal with synonyms or resolve acronyms, so if the question and the relevant context don't share the exact same words, the retrieval won't work.
-
-- **Embedding model (if used) is too small or unsuitable for the data**. The data must be embedded before being searchable when doing a vector-based search. "Embedded" means that every snippet of context is associated with a list of numbers called an **embedding**. The quality of the embedding then determines the quality of the retrieval. If you embed your documents with a naive embedding model, or if you are dealing with a very specific domain such as narrow medical and legal niches, the embedding of your data won't be able to represent their content precisely enough for the retrieval to be successful.
-
-- **The data is not chunked properly (too big or too small chunks)**. Retrievers thrive on data that is chunked properly. Huge blocks of text will be found relevant to almost any question and will drown the LLM with information. Sentences that are too short, or sentence fragments, won't carry enough context for the LLM to benefit from the retriever's output. Proper chunking can be a huge lever to improve the quality of your retrieval (see the short sketch after this list).
-
-- **The data and the question are in different languages**. Keyword-based retrieval algorithms suffer from this issue the most because keywords in different languages rarely match. If you expect questions to come in a different language than the data you are retrieving from, consider adding a translation step or performing retrieval with a multilingual embedder instead.
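-
-As a concrete reference for the chunking point above, here is a minimal sliding-window chunker. The window size and overlap are arbitrary assumptions: in practice you would tune both and often split on sentences or sections rather than raw words:
-
-```python
-def chunk_words(text: str, chunk_size: int = 150, overlap: int = 30) -> list[str]:
-    """Split a document into overlapping windows of words."""
-    words = text.split()
-    step = chunk_size - overlap
-    chunks = []
-    for start in range(0, len(words), step):
-        chunk = " ".join(words[start:start + chunk_size])
-        if chunk:
-            chunks.append(chunk)
-        if start + chunk_size >= len(words):
-            break
-    return chunks
-```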
-
-One caveat with retrieval failures is that if you're using a very powerful LLM such as GPT-4, sometimes your LLM is smart enough to understand that the retrieved context is incorrect and will discard it, **hiding the failure**. This means that it's even more important to make sure retrieval is working well in isolation, something we will see in a moment.
-
-## Generation failure
-
-Assuming that retrieval was successful, what would happen if the LLM still hallucinated?
-
-![](/posts/2024-04-29-odsc-east-rag/generation-failure.png)
-
-This is clearly an issue with our LLM: even when given all the correct data, the LLM still generated a wrong answer. Maybe our LLM doesn't know that Esperanto is even a language? Or perhaps we're using an LLM that doesn't understand English well?
-
-Naturally, each LLM will have different weak points that can trigger issues like these. Here are some common reasons why you may be getting generation failures.
-
-- **The model is too small and can’t follow instructions well**. When building in a resource-constrained environment (such as local smartphone apps or IoT), the choice of LLMs shrinks to just a few tiny models. However, the smaller the model, the less it will be able to understand natural language, and even when it does, its ability to follow instructions is limited. If you notice that your model consistently doesn't pay enough attention to the question when answering it, consider switching to a larger or newer LLM.
-
-- **The model knows too little about the domain to even understand the question**. This can happen if your domain is highly specific, uses specific terminology, or relies on uncommon acronyms. Models are trained on general-purpose text, so they might not understand some questions without finetuning, which helps specify the meaning of the most critical key terms and acronyms. When the answers given by your model somewhat address the question but miss the point entirely and stay generic or hand-wavy, this is likely the case.
-
-- **The model is not multilingual, but the questions and context may be**. It's essential that the model understands the question being asked in order to be able to answer it. The same is true for context: if the data found by the retriever is in a language that the LLM cannot understand, it won't help it answer and might even confuse it further. Always make sure that your LLM understands the languages your users use.
-
-- **The RAG prompt is not built correctly**. Some LLMs, especially older or smaller ones, may be very sensitive to how the prompt is built. If your model ignores part of the context or misses the question, the prompt might contain contradicting information, or it might be simply too large. LLMs are not always great at [finding a needle in the haystack](https://cs.stanford.edu/~nfliu/papers/lost-in-the-middle.arxiv2023.pdf): if you are consistently building huge RAG prompts and you observe generation issues, consider cutting it back to help the LLM focus on the data that actually contains the answer.
-
-![](/posts/2024-04-29-odsc-east-rag/lost-in-the-middle.png)
-
-# Evaluation strategies
-
-Once we put our RAG system in production, we should keep an eye on its performance at scale. This is where evaluation frameworks come into play.
-
-To properly evaluate the performance of RAG, it's best to perform two evaluation steps:
-
-1. **Isolated Evaluation**. Because RAG is a two-step process, failures at one stage can hide or mask failures at the other, so it's hard to understand where they originate from. To address this issue, evaluate the retrieval and generation separately: both must work well in isolation.
-
-2. **End-to-end evaluation**. To ensure the system works well from start to finish, it's best to evaluate it as a whole. End-to-end evaluation brings its own set of challenges, but it correlates more directly to the quality of the overall app.
-
-## Evaluating Retrieval
-
-Each retrieval method has its own state-of-the-art evaluation method and framework, so it's usually best to refer to those.
-
-For **keyword-based** retrieval algorithms such as TF-IDF, BM25, PageRank, and so on, evaluation is often done by checking that the keywords match well. For this, you can use [one of the many metrics](https://en.wikipedia.org/wiki/Evaluation_measures_(information_retrieval)) used for this purpose: [recall, precision](https://en.wikipedia.org/wiki/Precision_and_recall), [F1](https://en.wikipedia.org/wiki/F-score), [MRR](https://en.wikipedia.org/wiki/Mean_reciprocal_rank), [MAP](https://en.wikipedia.org/wiki/Evaluation_measures_(information_retrieval)#Mean_average_precision), …
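-
-Given the documents a retriever returned for one query and the documents labeled as relevant, these metrics are straightforward to compute by hand. A simplified sketch (real evaluation suites handle ranking cutoffs and aggregate over many queries):
-
-```python
-def retrieval_metrics(retrieved: list[str], relevant: set[str]) -> dict[str, float]:
-    hits = [doc for doc in retrieved if doc in relevant]
-    precision = len(hits) / len(retrieved) if retrieved else 0.0
-    recall = len(hits) / len(relevant) if relevant else 0.0
-    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
-    # Reciprocal rank of the first relevant result (the per-query part of MRR)
-    reciprocal_rank = 0.0
-    for rank, doc in enumerate(retrieved, start=1):
-        if doc in relevant:
-            reciprocal_rank = 1 / rank
-            break
-    return {"precision": precision, "recall": recall, "f1": f1, "rr": reciprocal_rank}
-```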
-
-For **vector-based** retrievers like vector DBs, the evaluation is trickier because checking for matching keywords is not sufficient: the semantics of the question and the answer must be evaluated for similarity. We are going to see some libraries that help with this when evaluating generation: in short, they use another LLM to judge the similarity or compute metrics like [semantic similarity](https://docs.ragas.io/en/latest/concepts/metrics/semantic_similarity.html).
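-
-A common way to approximate semantic similarity is to embed both texts and compare the embeddings with cosine similarity, for example with the `sentence-transformers` library (the model name below is just a placeholder; pick one suited to your domain and languages):
-
-```python
-from sentence_transformers import SentenceTransformer, util
-
-model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder embedding model
-
-def semantic_similarity(text_a: str, text_b: str) -> float:
-    # Embed both texts and return the cosine similarity of the two vectors.
-    emb_a, emb_b = model.encode([text_a, text_b], convert_to_tensor=True)
-    return util.cos_sim(emb_a, emb_b).item()
-```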
-
-## Evaluating Generation
-
-![](/posts/2024-04-29-odsc-east-rag/uptrain-logo.png)
-
-Evaluating an LLM's answers to a question is still a developing art, and several libraries can help with the task. One commonly used framework is [UpTrain](https://haystack.deepset.ai/integrations/uptrain?utm_campaign=odsc-east), which implements an "LLM-as-a-judge" approach. This means that the answers given by an LLM are then evaluated by another LLM, normally a larger and more powerful model.
-
-This approach has the benefit that responses are not simply checked strictly for the presence or absence of keywords but can be evaluated according to much more sophisticated criteria like [completeness](https://docs.uptrain.ai/predefined-evaluations/response-quality/response-completeness), [conciseness](https://docs.uptrain.ai/predefined-evaluations/response-quality/response-conciseness), [relevance](https://docs.uptrain.ai/predefined-evaluations/response-quality/response-relevance), [factual accuracy](https://docs.uptrain.ai/predefined-evaluations/context-awareness/factual-accuracy), [conversation quality](https://docs.uptrain.ai/predefined-evaluations/conversation-evals/user-satisfaction), and more.
-
-This approach leads to a far more detailed view of what the LLM is good at and what aspects of the generation could or should be improved. The criteria to select depend strongly on the application: for example, in medical or legal apps, factual accuracy should be the primary metric to optimize for, while in customer support, user satisfaction and conversation quality are also essential. For personal assistants, it's usually best to focus on conciseness, and so on.
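-
-Stripped of the framework machinery, LLM-as-a-judge boils down to asking a second (usually stronger) model to grade the first model's output against a criterion. A bare-bones sketch, where the judge prompt and model choice are my own assumptions rather than UpTrain's internals:
-
-```python
-from openai import OpenAI
-
-client = OpenAI()
-
-def judge_answer(question: str, answer: str, criterion: str = "factual accuracy") -> str:
-    prompt = (
-        f"You are grading an assistant's answer for {criterion}.\n"
-        f"Question: {question}\n"
-        f"Answer: {answer}\n"
-        "Reply with a score from 1 to 5 and a one-sentence justification."
-    )
-    response = client.chat.completions.create(
-        model="gpt-4",  # assumption: the judge is typically a larger model
-        messages=[{"role": "user", "content": prompt}],
-    )
-    return response.choices[0].message.content
-```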
-
-{{< notice info >}}
-
-💡 *UpTrain can also be used to evaluate RAG applications end-to-end. Check [its documentation](https://docs.uptrain.ai/getting-started/introduction) for details.*
-
-{{< /notice >}}
-
-## End-to-end evaluation
-
-![](/posts/2024-04-29-odsc-east-rag/ragas-logo.png)
-
-The evaluation of RAG systems end-to-end is also quite complex and can be implemented in many ways, depending on the aspect you wish to monitor. One of the simplest approaches is to focus on semantic similarity between the question and the final answer.
-
-A popular framework that can be used for such high-level evaluation is [RAGAS](https://haystack.deepset.ai/integrations/ragas?utm_campaign=odsc-east). In fact, RAGAS offers two interesting metrics:
-
-- [**Answer semantic similarity**](https://docs.ragas.io/en/stable/concepts/metrics/semantic_similarity.html). This is computed simply by taking the cosine similarity between the answer and the ground truth.
-
-- [**Answer correctness**](https://docs.ragas.io/en/stable/concepts/metrics/answer_correctness.html). Answer correctness is defined as a weighted average of the semantic similarity and the F1 score between the generated answer and the ground truth. This metric is more oriented towards fact-based answers, where F1 can help ensure that relevant facts such as dates, names, and so on are explicitly stated.
-
-On top of evaluation metrics, RAGAS also offers the capability to build [synthetic evaluation datasets](https://docs.ragas.io/en/stable/concepts/testset_generation.html) to evaluate your app against. Such datasets spare you the work-intensive process of building a real-world evaluation dataset with human-generated questions and answers but also trade high quality for volume and speed. If your domain is very specific or you need extreme quality, synthetic datasets might not be an option, but for most real-world apps, such datasets can save tons of labeling time and resources.
-
-{{< notice info >}}
-
-💡 *RAGAS can also be used to evaluate each step of a RAG application in isolation. Check [its documentation](https://docs.ragas.io/en/stable/getstarted/index.html) for details.*
-
-{{< /notice >}}
-
-
-{{< notice info >}}
-
-💡 *I recently discovered an even more comprehensive framework for end-to-end evaluation called [**continuous-eval**](https://docs.relari.ai/v0.3) from [Relari.ai](https://relari.ai/), which focuses on modular evaluation of RAG pipelines. Check it out if you're interested in this topic and RAGAS doesn't offer enough flexibility for your use case.*
-
-![](/posts/2024-04-29-odsc-east-rag/relari-logo.png)
-
-{{< /notice >}}
-
-## Putting it all together
-
-![](/posts/2024-04-29-odsc-east-rag/haystack-logo.png)
-
-Once you know how you want to evaluate your app, it's time to put it together. A convenient framework for this step is [Haystack](https://haystack.deepset.ai/?utm_campaign=odsc-east), a Python open-source LLM framework focused on building RAG applications. Haystack is an excellent choice because it can be used through all stages of the application lifecycle, from prototyping to production, including evaluation.
-
-Haystack supports several evaluation libraries including [UpTrain](https://haystack.deepset.ai/integrations/uptrain?utm_campaign=odsc-east), [RAGAS](https://haystack.deepset.ai/integrations/ragas?utm_campaign=odsc-east) and [DeepEval](https://haystack.deepset.ai/integrations/deepeval?utm_campaign=odsc-east). To understand more about how to implement and evaluate a RAG application with it, check out their tutorial about model evaluation [here](https://haystack.deepset.ai/tutorials/35_model_based_evaluation_of_rag_pipelines?utm_campaign=odsc-east).
-
-# Advanced flavors of RAG
-
-Once our RAG app is ready and deployed in production, the natural next step is to look for ways to improve it even further. RAG is a very versatile technique, and many different flavors of "advanced RAG" have been experimented with, many more than I can list here. Depending on the situation, you may focus on different aspects, so let's list some examples of tactics you can deploy to make your pipeline more powerful, context-aware, accurate, and so on.
-
-## Use multiple retrievers
-
-Sometimes, a RAG app needs access to vastly different types of data simultaneously. For example, a personal assistant might need access to the Internet, your Slack, your emails, your personal notes, and maybe even your pictures. Designing a single retriever that can handle data of so many different kinds is possible. Still, it can be a real challenge and require, in many cases, an entire data ingestion pipeline.
-
-Instead of going that way, you can use multiple retrievers, each specialized to a specific subset of your data: for example, one retriever that browses the web, one that searches on Slack and in your emails, one that checks for relevant pictures.
-
-When using many retrievers, however, it's often best to introduce another step called **reranking**. A reranker double-checks that all the results returned by each retriever are actually relevant and sorts them again before the RAG prompt is built. Rerankers are usually much more precise than retrievers in assessing the relative importance of various snippets of context, so they can dramatically improve the quality of the pipeline. In exceptional cases, they can be helpful even in RAG apps with a single retriever.
-
-Here is an [example](https://haystack.deepset.ai/tutorials/33_hybrid_retrieval?utm_campaign=odsc-east) of such a pipeline built with Haystack.
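-
-Conceptually, such a pipeline fans the question out to every retriever, pools the candidates, and lets the reranker sort the pool before the RAG prompt is built. A schematic sketch, where the retrievers and the scoring function are placeholders rather than any specific library's API:
-
-```python
-from typing import Callable
-
-def multi_retriever_search(
-    question: str,
-    retrievers: list[Callable[[str], list[str]]],
-    rerank_score: Callable[[str, str], float],
-    top_k: int = 5,
-) -> list[str]:
-    # Fan out: collect candidate snippets from every specialized retriever.
-    candidates = []
-    for retrieve in retrievers:
-        candidates.extend(retrieve(question))
-    # Rerank: score each candidate against the question and keep the best ones.
-    candidates.sort(key=lambda doc: rerank_score(question, doc), reverse=True)
-    return candidates[:top_k]
-```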
-
-## Self-correction
-
-We mentioned that one of the most common evaluation strategies for RAG output is "LLM-as-a-judge": the idea of using another LLM to evaluate the answer of the first. However, why use this technique only for evaluation?
-
-Self-correcting RAG apps add one extra step at the end of the pipeline: they take the answer, pass it to a second LLM, and ask it to assess whether the answer is likely to be correct. If the check fails, the second LLM will provide some feedback on why it believes the answer is wrong, and this feedback will be given back to the first LLM to try answering another time until an agreement is reached.
-
-Self-correcting LLMs can help improve the accuracy of the answers at the expense of more LLM calls per user question.
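-
-A sketch of that loop, with `generate_answer` and `critique_answer` passed in as hypothetical wrappers around the two LLM calls:
-
-```python
-from typing import Callable
-
-def self_correcting_answer(
-    question: str,
-    context: str,
-    generate_answer: Callable[[str, str, str], str],
-    critique_answer: Callable[[str, str, str], tuple[bool, str]],
-    max_rounds: int = 3,
-) -> str:
-    answer, feedback = "", ""
-    for _ in range(max_rounds):
-        # First LLM: answer the question, taking any earlier feedback into account.
-        answer = generate_answer(question, context, feedback)
-        # Second LLM: approve the answer, or explain why it looks wrong.
-        approved, feedback = critique_answer(question, context, answer)
-        if approved:
-            break
-    return answer
-```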
-
-## Agentic RAG
-
-In the LLMs field, the term "agent" or "agentic" is often used to identify systems that use LLMs to make decisions. In the case of a RAG application, this term refers to a system that does not always perform retrieval but decides whether to perform it by reading the question first.
-
-For example, imagine we're building a RAG app to help primary school children with their homework. When the question refers to topics like history or geography, RAG is very helpful to avoid hallucinations. However, if the question regards math, the retrieval step is entirely unnecessary, and it might even confuse the LLM by retrieving similar math problems with different answers.
-
-Making your RAG app agentic is as simple as giving the question to an LLM before retrieval in a prompt such as:
-
-```markdown
-Reply YES if the answer to this question should include facts and
-figures, NO otherwise.
-
-Question: What's the capital of France?
-```
-Then, retrieval is run or skipped depending on whether the answer is YES or NO.
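-
-In code, the routing decision is just one extra LLM call before the pipeline branches. A sketch with the OpenAI client, reusing the prompt above (the model name is an assumption):
-
-```python
-from openai import OpenAI
-
-client = OpenAI()
-
-ROUTER_PROMPT = (
-    "Reply YES if the answer to this question should include facts and figures, "
-    "NO otherwise.\n\nQuestion: {question}"
-)
-
-def needs_retrieval(question: str) -> bool:
-    response = client.chat.completions.create(
-        model="gpt-3.5-turbo",  # assumption
-        messages=[{"role": "user", "content": ROUTER_PROMPT.format(question=question)}],
-    )
-    return response.choices[0].message.content.strip().upper().startswith("YES")
-
-# Run the retriever only when needs_retrieval(question) is True,
-# otherwise send the question straight to the generator.
-```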
-
-This is the most basic version of agentic RAG. Some advanced LLMs can do better: they support so-called "function calling," which means that they can tell you exactly how to invoke the retriever and even provide specific parameters instead of simply answering YES or NO.
-
-For more information about function calling with LLMs, check out [OpenAI's documentation](https://platform.openai.com/docs/guides/function-calling) on the topic or the equivalent documentation of your LLM provider.
-
-## Multihop RAG
-
-Multihop RAG is an even more complex version of agentic RAG. Multihop pipelines often use **chain-of-thought prompts**, a type of prompt that looks like this:
-
-```markdown
-You are a helpful and knowledgeable agent.
-
-To answer questions, you'll need to go through multiple steps involving step-by-step
-thinking and using a search engine to do web searches. The browser will respond with
-snippets of text from web pages. When you are ready for a final answer, respond with
-`Final Answer:`.
-
-Use the following format:
-
-- Question: the question to be answered
-- Thought: Reason if you have the final answer. If yes, answer the question. If not,
- find out the missing information needed to answer it.
-- Search Query: the query for the search engine
-- Observation: the search engine will respond with the results
-- Final Answer: the final answer to the question, make it short (1-5 words)
-
-Thought, Search Query, and Observation steps can be repeated multiple times, but
-sometimes, we can find an answer in the first pass.
-
----
-
-- Question: "Was the capital of France founded earlier than the discovery of America?"
-- Thought:
-```
-
-This prompt is very complex, so let's break it down:
-
-1. The LLM reads the question and decides which information to retrieve.
-2. The LLM returns a query for the search engine (or a retriever of our choice).
-3. Retrieval is run with the query the LLM provided, and the resulting context is appended to the original prompt.
-4. The entire prompt is returned to the LLM, which reads it, follows all the reasoning it did in the previous steps, and decides whether to do another search or reply to the user.
-
-Multihop RAG is used for autonomous exploration of a topic, but it can be very expensive because many LLM calls are performed, and the prompts tend to become really long really quickly. The process can also take quite some time, so it's not suitable for low-latency applications. However, the idea is quite powerful, and it can be adapted into other forms.
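-
-The control loop around that prompt is simple, even if the prompt itself is not. A sketch, with the chain-of-thought prompt, the LLM call, and the search engine passed in as placeholders:
-
-```python
-from typing import Callable
-
-def multihop_answer(
-    question: str,
-    cot_prompt: str,
-    call_llm: Callable[[str], str],
-    web_search: Callable[[str], str],
-    max_hops: int = 5,
-) -> str:
-    prompt = cot_prompt + f'\n- Question: "{question}"\n- Thought:'
-    for _ in range(max_hops):
-        completion = call_llm(prompt)
-        prompt += completion
-        if "Final Answer:" in completion:
-            return completion.split("Final Answer:")[-1].strip()
-        if "Search Query:" in completion:
-            # Run the retrieval step the LLM asked for and append it as an Observation.
-            query = completion.split("Search Query:")[-1].strip()
-            prompt += f"\n- Observation: {web_search(query)}\n- Thought:"
-    return "No final answer within the hop limit."
-```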
-
-# A word on finetuning
-
-It's important to remember that finetuning is not an alternative to RAG. Finetuning can and should be used together with RAG on very complex domains, such as medical or legal.
-
-When people think about finetuning, they usually focus on finetuning the LLM. In RAG, though, it is not only the LLM that needs to understand the question: it's crucial that the retriever understands it well, too! This means **the embedding model needs finetuning as much as the LLM**. Finetuning your embedding models, and in some cases also your reranker, can improve the effectiveness of your RAG by orders of magnitude. Such a finetune often requires only a fraction of the training data, so it's well worth the investment.
-
-Finetuning the LLM is also necessary if you need to alter its behavior in production, such as making it more colloquial or more concise, or making it stick to a specific voice. Prompt engineering can also achieve these effects, but it's often more brittle and can be more easily worked around. Finetuning the LLM has a much more powerful and lasting effect.
-
-# Conclusion
-
-RAG is a vast topic that could fill books: this was only an overview of some of the most important concepts to remember when working on a RAG application. For more on this topic, check out my [other blog posts](/posts) and stay tuned for [future talks](/talks)!
-
-
-
-
diff --git a/content/posts/2024-05-06-teranoptia.md b/content/posts/2024-05-06-teranoptia.md
deleted file mode 100644
index e315c00e..00000000
--- a/content/posts/2024-05-06-teranoptia.md
+++ /dev/null
@@ -1,888 +0,0 @@
----
-title: "Generating creatures with Teranoptia"
-date: 2024-05-06
-author: "ZanSara"
-featuredImage: "/posts/2024-05-06-teranoptia/cover.png"
----
-
-{{< raw >}}
-
-{{< /raw >}}
-
-Having fun with fonts doesn't always mean obsessing over kerning and ligatures. Sometimes, writing text is not even the point!
-
-You don't believe it? Type something in here.
-
-{{< raw >}}
-
-
-
-
- Characters to generate:
-
-
-
-
-
-{{< /raw >}}
-
-[Teranoptia](https://www.tunera.xyz/fonts/teranoptia/) is a cool font that lets you build small creatures by mapping each letter (and a few other characters) to a piece of a creature like a head, a tail, a leg, a wing and so on. By typing words you can create strings of creatures.
-
-Here is the glyphset:
-
-{{< raw >}}
-
-A B C D E F G H I J K L M N O P Q R S T U V W X Ẋ Y Z Ź Ž Ż
-a b ḅ c d e f g h i j k l m n o p q r s t u v w x y z ź ž ż
-, * ( ) { } [ ] ‐ “ ” ‘ ’ « » ‹ › $ €
-
-You'll notice that there's a lot you can do with it, from assembling simple creatures:
-
-
vTN
-
-to more complex, multi-line designs:
-
-
{Ž}
-
F] [Z
-
-{{< /raw >}}
-
-Let's play with it a bit and see how we can put together a few "correct" looking creatures.
-
-{{< notice info >}}
-
-_As you're about to notice, I'm no JavaScript developer. Don't expect high-quality JS in this post._
-
-{{< /notice >}}
-
-## Mirroring animals
-
-To begin with, let's start with a simple function: animal mirroring. The glyphset includes a mirrored version of each non-symmetric glyph, but the mapping is rather arbitrary, so we are going to need a map.
-
-Here are the pairs:
-
-
By Ev Hs Kp Nm Ri Ve Za Żź Az Cx Fu Ir Lo Ol Sh Wd Źż vE Dw Gt Jq Mn Pk Qj Tg Uf Xc Ẋḅ Yb Žž bY cX () [] {}
-
-### Animal mirror
-
-{{< raw >}}
-
-
-
-
ST»K*abd
-
-
-
-{{< /raw >}}
-
-```javascript
-const mirrorPairs = {"B": "y", "y": "B", "E": "v", "v": "E", "H": "s", "s": "H", "K": "p", "p": "K", "N": "m", "m": "N", "R": "i", "i": "R", "V": "e", "e": "V", "Z": "a", "a": "Z", "Ż": "ź", "ź": "Ż", "A": "z", "z": "A", "C": "x", "x": "C", "F": "u", "u": "F", "I": "r", "r": "I", "L": "o", "o": "L", "O": "l", "l": "O", "S": "h", "h": "S", "W": "d", "d": "W", "Ź": "ż", "ż": "Ź", "v": "E", "E": "v", "D": "w", "w": "D", "G": "t", "t": "G", "J": "q", "q": "J", "M": "n", "n": "M", "P": "k", "k": "P", "Q": "j", "j": "Q", "T": "g", "g": "T", "U": "f", "f": "U", "X": "c", "c": "X", "Ẋ": "ḅ", "ḅ": "Ẋ", "Y": "b", "b": "Y", "Ž": "ž", "ž": "Ž", "b": "Y", "Y": "b", "c": "X", "X": "c", "(": ")", ")": "(", "[": "]", "]": "[", "{": "}", "}": "{"};
-
-function mirrorAnimal(original){
- var mirror = '';
-    // Walk the string backwards, swapping each glyph with its mirrored counterpart
-    for (let i = original.length-1; i >= 0; i--){
-        const newChar = mirrorPairs[original.charAt(i)];
- if (newChar){
- mirror += newChar;
- } else {
- mirror += original.charAt(i)
- }
- }
- return mirror;
-}
-```
-
-
-## Random animal generation
-
-While it's fun to build complicated animals this way, you'll notice something: it's pretty hard to make them come out right by simply typing something. Most of the time you need quite careful planning. In addition there's almost no meaningful (English) word that corresponds to a well-defined creature. Very often the characters don't match, creating a sequence of "chopped" creatures.
-
-For example, "Hello" becomes:
-
-
Hello
-
-This is a problem if we want to make a parametric or random creature generator, because most of the random strings won't look good.
-
-### Naive random generator
-
-{{< raw >}}
-
- Characters to generate:
-
-
-
-
-
n]Zgameź)‐
-
-
-{{< /raw >}}
-
-```javascript
-const validChars = "ABCDEFGHIJKLMNOPQRSTUVWXẊYZŹŽŻabḅcdefghijklmnopqrstuvwxyzźžż,*(){}[]‐“”«»$"; // ‘’‹›€ excluded because they're mostly vertical
-
-function randomFrom(list){
- return list[Math.floor(Math.random() * list.length)];
-}
-
-function generateNaive(value){
- var newAnimal = '';
- for (var i = 0; i < value; i++) {
- newAnimal += randomFrom(validChars);
- }
- return newAnimal;
-}
-```
-
-Can we do better than this?
-
-## Generating "good" animals
-
-There are many ways to define "good" or "well-formed" creatures. One of the first rules we can introduce is that we don't want chopped body parts to float alone.
-
-Translating it into a rule we can implement: a character that is "open" on the right must be followed by a character that is open on the left, and a character that is _not_ open on the right must be followed by another character that is _not_ open on the left.
-
-For example, A may be followed by B to make AB, but A cannot be followed by C to make AC.
-
-In the same way, Z may be followed by A to make ZA, but Z cannot be followed by ż to make Zż.
-
-This way we will get rid of all those "chopped" monsters that make up most of the randomly generated string.
-
-To summarize, the rules we have to implement are:
-
-- Any character that is open on the right must be followed by another character that is open on the left.
-- Any character that is closed on the right must be followed by another character that is closed on the left.
-- The first character must not be open on the left.
-- The last character must not be open on the right.
-
-### Non-chopped animals generator
-
-{{< raw >}}
-
- Characters to generate:
-
-
-
-
-
suSHebQ«EIl
-
-
-{{< /raw >}}
-
-```javascript
-const charsOpenOnTheRightOnly = "yvspmieaźACFILOSWŹ({[";
-const charsOpenOnTheLeftOnly = "BEHKNRVZŻzxurolhdż)]}";
-const charsOpenOnBothSides = "DGJMPQTUXẊYŽbcwtqnkjgfcḅbžYX«»";
-const charsOpenOnNoSides = ",*-“”";
-
-const charsOpenOnTheRight = charsOpenOnTheRightOnly + charsOpenOnBothSides;
-const charsOpenOnTheLeft = charsOpenOnTheLeftOnly + charsOpenOnBothSides;
-const validInitialChars = charsOpenOnTheRightOnly + charsOpenOnNoSides;
-
-function generateNoChop(value){
- var newAnimal = '' + randomFrom(validInitialChars);
- for (var i = 0; i < value-1; i++) {
- if (charsOpenOnTheRight.indexOf(newAnimal[i]) > -1){
- newAnimal += randomFrom(charsOpenOnTheLeft);
-
- } else if (charsOpenOnTheLeftOnly.indexOf(newAnimal[i]) > -1){
- newAnimal += randomFrom(charsOpenOnTheRightOnly);
-
- } else if (charsOpenOnNoSides.indexOf(newAnimal[i]) > -1){
- newAnimal += randomFrom(validInitialChars);
- }
- }
-    // Final character: close the animal if the last glyph is still open on the right
-    if (charsOpenOnTheRight.indexOf(newAnimal[newAnimal.length - 1]) > -1){
- newAnimal += randomFrom(charsOpenOnTheLeftOnly);
- } else {
- newAnimal += randomFrom(charsOpenOnNoSides);
- }
- return newAnimal;
-}
-```
-
-The resulting animals are already quite better!
-
-There are still a few things we may want to fix. For example, some animals end up being just a pair of heads (such as sN); others instead have their bodies oriented in the wrong direction (like IgV).
-
-Let's try to get rid of those too.
-
-The trick here is to separate the characters into a few groups: elements that are "facing left", elements that are "facing right", and symmetric ones. At this point, it's convenient to call them "heads", "bodies" and "tails" to make the code more understandable, like the following:
-
-- Right heads: BEHKNRVZŻ
-
-- Left heads: yvspmieaź
-
-- Right tails: ACFILOSWŹv
-
-- Left tails: zxurolhdżE
-
-- Right bodies: DGJMPQTUẊŽ
-
-- Left bodies: wtqnkjgfḅž
-
-- Entering hole: )]}
-
-- Exiting hole: ([{
-
-- Bounce & symmetric bodies: «»$bcXY
-
-- Singletons: ,*-
-
-Let's put this all together!
-
-### Oriented animals generator
-
-{{< raw >}}
-
- Characters to generate:
-
-
-
-
-
suSHebQ«EIl
-
-
-{{< /raw >}}
-
-```javascript
-const rightAnimalHeads = "BEHKNRVZŻ";
-const leftAnimalHeads = "yvspmieaź";
-const rightAnimalTails = "ACFILOSWŹv";
-const leftAnimalTails = "zxurolhdżE";
-const rightAnimalBodies = "DGJMPQTUẊŽ";
-const leftAnimalBodies = "wtqnkjgfḅž";
-const singletons = ",*‐";
-const exitingHole = "([{";
-const enteringHole = ")]}";
-const bounce = "«»$bcXY";
-
-const validStarts = leftAnimalHeads + rightAnimalTails + exitingHole;
-const validSuccessors = {
- [exitingHole + bounce]: rightAnimalHeads + rightAnimalBodies + leftAnimalBodies + leftAnimalTails + enteringHole + bounce,
- [enteringHole]: rightAnimalTails + leftAnimalHeads + exitingHole + singletons,
- [rightAnimalHeads + leftAnimalTails + singletons]: rightAnimalTails + leftAnimalHeads + exitingHole + singletons,
- [leftAnimalHeads]: leftAnimalBodies + leftAnimalBodies + leftAnimalBodies + leftAnimalTails + enteringHole + bounce,
- [rightAnimalTails]: rightAnimalBodies + rightAnimalBodies + rightAnimalBodies + rightAnimalHeads + enteringHole + bounce,
- [rightAnimalBodies]: rightAnimalBodies + rightAnimalBodies + rightAnimalBodies + rightAnimalHeads + enteringHole + bounce,
- [leftAnimalBodies]: leftAnimalBodies + leftAnimalBodies + leftAnimalBodies + leftAnimalTails + enteringHole + bounce,
-};
-const validEnds = {
- [exitingHole + bounce]: leftAnimalTails + rightAnimalHeads + enteringHole,
- [rightAnimalHeads + leftAnimalTails + enteringHole]: singletons,
- [leftAnimalHeads]: leftAnimalTails + enteringHole,
- [rightAnimalTails]: rightAnimalHeads + enteringHole,
- [rightAnimalBodies]: rightAnimalHeads,
- [leftAnimalBodies]: leftAnimalTails,
-};
-
-function generateOriented(value){
-
- var newAnimal = '' + randomFrom(validStarts);
- for (var i = 0; i < value-1; i++) {
- last_char = newAnimal[i-1];
- for (const [predecessor, successor] of Object.entries(validSuccessors)) {
- if (predecessor.indexOf(last_char) > -1){
- newAnimal += randomFrom(successor);
- break;
- }
- }
- }
- last_char = newAnimal[i-1];
- for (const [predecessor, successor] of Object.entries(validEnds)) {
- if (predecessor.indexOf(last_char) > -1){
- newAnimal += randomFrom(successor);
- break;
- }
- }
- return newAnimal;
-}
-```
-
-## A regular grammar
-
-Let's move up a level now.
-
-What we've defined up to this point is a set of rules that, given a string, determine what characters are allowed next. This is called a [**formal grammar**](https://en.wikipedia.org/wiki/Formal_grammar) in Computer Science.
-
-A grammar is defined primarily by:
-
-- an **alphabet** of symbols (our Teranoptia font).
-- a set of **starting characters**: all the characters that can be used at the start of the string (such as a or *).
-- a set of **terminating characters**: all the characters that can be used to terminate the string (such as d or -).
-- a set of **production rules**: the rules needed to generate valid strings in that grammar.
-
-In our case, we're looking for a grammar that defines "well formed" animals. For example, our production rules might look like this:
-
-- S (the start of the string) → a (a)
-- a (a) → ad (ad)
-- a (a) → ab (ab)
-- b (b) → bb (bb)
-- b (b) → bd (bd)
-- d (d) → E (the end of the string)
-- , (,) → E (the end of the string)
-
-and so on. Each combination would have its own rule.
-
-There are four types of grammars in Chomsky's hierarchy; the three most relevant here are:
-
-- **Regular grammars**: in all rules, the left-hand side is only a single nonterminal symbol and right-hand side may be the empty string, or a single terminal symbol, or a single terminal symbol followed by a nonterminal symbol, but nothing else.
-- **Context-free grammars**: in all rules, the left-hand side of each production rule consists of only a single nonterminal symbol, while the right-hand side may contain any number of terminal and non-terminal symbols.
-- **Context-sensitive grammars**: rules can contain many terminal and non-terminal characters on both sides.
-
-In our case, all the production rules look very much like the examples we defined above: one character on the left-hand side, at most two on the right-hand side. This means we're dealing with a regular grammar. And this is good news, because it means that this language can be encoded into a **regular expression**.
-
-## Building the regex
-
-Regular expressions are a very powerful tool, one that needs to be used with care. They're best used for string validation: given an arbitrary string, they are going to check whether it respects the grammar, i.e. whether the string could have been generated by applying the rules above.
-
-Having a regex for our Teranoptia animals will allow us to search for valid animals in long lists of strings, for example an English dictionary. Such a search would have been prohibitively expensive without a regular expression: using one, while still quite costly, is orders of magnitude more efficient.
-
-In order to build this complex regex, let's start with a very limited example: a regex that matches left-facing snakes.
-
-```regex
-^(a(b|c|X|Y)*d)+$
-```
-
-This regex is fairly straightforward: the string must start with a (a), can contain any number of b (b), c (c), X (X) and Y (Y), and must end with d (d). While we're at it, let's add a + to the end, meaning that this pattern can repeat multiple times: the string will simply contain many snakes.
-
-### Left-facing snakes regex
-
-{{< raw >}}
-<!-- Interactive regex tester: type a string and it is marked as Valid or not -->
-{{< /raw >}}
-
-What would it take to extend it to snakes that face either side? Luckily, snake bodies are symmetrical, so we can take advantage of that and write:
-
-```regex
-^((a|W)(b|c|X|Y)*(d|Z))+$
-```
-
-### Naive snakes
-
-{{< raw >}}
-<!-- Interactive regex tester: type a string and it is marked as Valid or not -->
-{{< /raw >}}
-
-That looks super-promising until we realize that there's a problem: this "snake" aZ also matches the regex. To generate well-formed animals we need to keep heads and tails separate. In the regex, it would look like:
-
-```regex
-^(
- (a)(b|c|X|Y)*(d) |
- (W)(b|c|X|Y)*(Z)
-)+$
-```
-
-### Correct snakes
-
-{{< raw >}}
-<!-- Interactive regex tester: type a string and it is marked as Valid or not -->
-{{< /raw >}}
-
-Once here, building the rest of the regex is simply a matter of adding the correct characters to each group. We're gonna trade some extra characters for an easier structure by duplicating the symmetric characters when needed.
-
-```regex
-^(
- // Left-facing animals
- (
- y|v|s|p|m|i|e|a|ź|(|[|{ // Left heads & exiting holes
- )(
- w|t|q|n|k|j|g|f|ḅ|ž|X|Y|b|c|$|«|» // Left & symmetric bodies
- )*(
- z|x|u|r|o|l|h|d|ż|E|)|]|} // Left tails & entering holes
- ) |
-
- // Right facing animals
- (
- A|C|F|I|L|O|S|W|Ź|v|(|[|{ // right tails & exiting holes
- )(
- D|G|J|M|P|Q|T|U|Ẋ|Ž|b|c|X|Y|$|«|» // right & symmetric bodies
- )*(
- B|E|H|K|N|R|V|Z|Ż|)|]|} // right heads & entering holes
- ) |
-
- // Singletons
- (,|-|*)
-)+$
-```
-
-### Well-formed animals regex
-
-{{< raw >}}
-<!-- Interactive regex tester: type a string and it is marked as Valid or not -->
-{{< /raw >}}
-
-If you play with the above regex, you'll notice a slight discrepancy with what our well-formed animal generator creates. The generator can create "double-headed" monsters where a symmetric body part is inserted, like a«Z. However, the regex does not allow it. Extending it to account for these scenarios would make it even more unreadable, so this is left as an exercise for the reader.
-
-## Searching for "monstrous" words
-
-Let's put the regex to use! There must be some English words that match the regex, right?
-
-Google helpfully compiled a text file with the 10,000 most frequent English words. Let's load it up and match every line with our brand-new regex. Unfortunately Teranoptia is case-sensitive and uses quite a few odd letters and special characters, so it's unlikely we're going to find many interesting creatures. Still worth an attempt.
-
-### Monster search
-
-{{< raw >}}
-<!-- Interactive monster search widget: loads the word list and highlights the matches -->
-{{< /raw >}}
-
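-Outside the browser, the same search takes a few lines of Python. This is a minimal sketch: it uses the simple left-facing snakes regex for readability (swap in the full well-formed animals regex for the real search), and the word list URL is an assumption pointing to a public copy of Google's 10,000 most frequent English words.
-
-```python
-import re
-import urllib.request
-
-# Public copy of the 10,000 most frequent English words (URL is an assumption).
-WORDS_URL = "https://raw.githubusercontent.com/first20hours/google-10000-english/master/google-10000-english.txt"
-
-# Left-facing snakes only, for readability; swap in the full animal regex for the real search.
-animal = re.compile(r"^(a[bcXY]*d)+$")
-
-with urllib.request.urlopen(WORDS_URL) as response:
-    words = response.read().decode("utf-8").splitlines()
-
-monsters = [word for word in words if animal.match(word)]
-print(len(monsters), "monstrous words found:", monsters)
-```
-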
-Go ahead and put your own vocabulary file to see if your language contains more animals!
-
-## Conclusion
-
-In this post I've just put together a few exercises for fun, but these tools can be great for teaching purposes: the output is very easy to validate visually, and the grammar involved, while not trivial, is not as complex as natural language or as dry as numerical sequences. If you need something to keep your students engaged, this might be a simple trick to help them visualize the concepts better.
-
-On my side, I think I'm going to use these neat little monsters as weird [fleurons](https://en.wikipedia.org/wiki/Fleuron_(typography)) :)
-
-
-
-
----
-
-_Download Teranoptia at this link: https://www.tunera.xyz/fonts/teranoptia/_
-
-
-
-
-
-
diff --git a/content/posts/2024-06-10-the-agent-compass.md b/content/posts/2024-06-10-the-agent-compass.md
deleted file mode 100644
index 09378ce1..00000000
--- a/content/posts/2024-06-10-the-agent-compass.md
+++ /dev/null
@@ -1,213 +0,0 @@
----
-title: "The Agent Compass"
-date: 2024-06-10
-author: "ZanSara"
-featuredImage: "/posts/2024-06-10-the-agent-compass/cover.png"
----
-
-The concept of Agent is one of the vaguest out there in the post-ChatGPT landscape. The word has been used to identify systems that seem to have nothing in common with one another, from complex autonomous research systems down to a simple sequence of two predefined LLM calls. Even the distinction between Agents and techniques such as RAG and prompt engineering seems blurry at best.
-
-Let's try to shed some light on the topic by understanding just how much the term "AI Agent" covers and set some landmarks to better navigate the space.
-
-## Defining "Agent"
-
-The problem starts with the definition of "agent". For example, [Wikipedia](https://en.wikipedia.org/wiki/Software_agent) reports that a software agent is
-
-> a computer program that acts for a user or another program in a relationship of agency.
-
-This definition is extremely high-level, to the point that it could be applied to systems ranging from ChatGPT to a thermostat. However, if we restrict our definition to "LLM-powered agents", then it starts to mean something: an Agent is an LLM-powered application that is given some **agency**, which means that it can take actions to accomplish the goals set by its user. Here we see the difference between an agent and a simple chatbot, because a chatbot can only talk to a user, but doesn't have the agency to take any action on their behalf. Instead, an Agent is a system you can effectively delegate tasks to.
-
-In short, an LLM-powered application can be called an Agent when
-
-> it can take decisions and choose to perform actions in order to achieve the goals set by the user.
-
-## Autonomous vs Conversational
-
-On top of this definition there's an additional distinction to take into account, normally brought up by the terms **autonomous** and **conversational** agents.
-
-Autonomous Agents are applications that **don't use conversation as a tool** to accomplish their goal. They can use several tools several times, but they won't produce an answer for the user until their goal is accomplished in full. These agents normally interact with a single user, the one that set their goal, and the whole result of their operations might be a simple notification that the task is done. The fact that they can understand language is rather a feature that lets them receive the user's task in natural language, understand it, and then navigate the material they need to use (emails, webpages, etc).
-
-An example of an autonomous agent is a **virtual personal assistant**: an app that can read through your emails and, for example, pay the bills for you when they're due. This is a system that the user sets up with a few credentials and then works autonomously, without the user's supervision, on the user's own behalf, possibly without bothering them at all.
-
-On the contrary, Conversational Agents **use conversation as a tool**, often their primary one. This doesn't have to be a conversation with the person that set them off: it's usually a conversation with another party, that may or may not be aware that they're talking to an autonomous system. Naturally, they behave like agents only from the perspective of the user that assigned them the task, while in many cases they have very limited or no agency from the perspective of the users that hold the conversation with them.
-
-An example of a conversational agent is a **virtual salesman**: an app that takes a list of potential clients and calls them one by one, trying to persuade them to buy. From the perspective of the clients receiving the call this bot is not an agent: it can perform no actions on their behalf, in fact it may not be able to perform actions at all other than talking to them. But from the perspective of the salesman the bots are agents, because they're calling people for them, saving a lot of their time.
-
-The distinction between these two categories is very blurry, and **some systems may behave like both** depending on the circumstances. For example, an autonomous agent might become a conversational one if it's configured to reschedule appointments for you by calling people, or to reply to your emails to automatically challenge parking fines, and so on. Alternatively, an LLM that asks you if it's appropriate to use a tool before using it is behaving a bit like a conversational agent, because it's using the chat to improve its odds of providing you with a better result.
-
-## Degrees of agency
-
-All the distinctions we made above are best understood as a continuous spectrum rather than hard categories. Various AI systems may have more or less agency and may be tuned towards a more "autonomous" or "conversational" behavior.
-
-In order to understand this difference in practice, let's try to categorize some well-known LLM techniques and apps to see how "agentic" they are. Having two axes to measure by, we can build a simple compass like this:
-
-![a compass with two axis: no agency (left) to full agency (right) on the horizontal axis, and autonomous (bottom) to conversational (top) on the vertical axis.](/posts/2024-06-10-the-agent-compass/empty-compass.png)
-
-_Our Agent compass._
-
-### Bare LLMs
-
-Many apps out there perform nothing more than direct calls to LLMs, such as ChatGPT's free app and other similarly simple assistants and chatbots. There are no components in this system other than the model itself, and its mode of operation is very straightforward: a user asks a question to an LLM, and the LLM replies directly.
-
-![Diagram of the operation of a direct LLM call: a user asks a question to an LLM and the LLM replies directly.](/posts/2024-06-10-the-agent-compass/direct-llm-call.png)
-
-These systems are not designed with the intent of accomplishing a goal, nor can they take any actions on the user's behalf. They focus on talking with a user in a reactive way and can do nothing other than talk back. An LLM on its own has **no agency at all**.
-
-At this level it also makes very little sense to distinguish between autonomous or conversational agent behavior, because the entire app shows no degrees of autonomy. So we can place them at the very center-left of the diagram.
-
-![the updated compass](/posts/2024-06-10-the-agent-compass/direct-llm-call-compass.png)
-
-### Basic RAG
-
-Together with direct LLM calls and simple chatbots, basic RAG is also an example of an application that does not need any agency or goals to pursue in order to function. Simple RAG apps work in two stages: first the user question is sent to a retriever system, which fetches some additional data relevant to the question. Then, the question and the additional data are sent to the LLM to formulate an answer.
-
-![Diagram of the operation of a RAG app: first the user question is sent to a retriever system, which fetches some additional data relevant to the question. Then, the question and the additional data is sent to the LLM to formulate an answer.](/posts/2024-06-10-the-agent-compass/basic-rag.png)
-
-This means that simple RAG is not an agent: the LLM has no role in the retrieval step and simply reacts to the RAG prompt, doing little more than what a direct LLM call does. **The LLM is given no agency**, takes no decisions in order to accomplish its goals, and has no tools it can decide to use, or actions it can decide to take. It's a fully pipelined, reactive system. However, we may rank basic RAG more on the autonomous side with respect to a direct LLM call, because there is one step that is done autonomously (the retrieval).
-
-![the updated compass](/posts/2024-06-10-the-agent-compass/basic-rag-compass.png)
-
-### Agentic RAG
-
-Agentic RAG is a slightly more advanced version of RAG that does not always perform the retrieval step. This helps the app produce better prompts for the LLM: for example, if the user is asking a question about trivia, retrieval is very important, while if they're quizzing the LLM with some mathematical problem, retrieval might confuse the LLM by giving it examples of solutions to different puzzles, and therefore make hallucinations more likely.
-
-This means that an agentic RAG app works as follows: when the user asks a question, before calling the retriever the app checks whether the retrieval step is necessary at all. Most of the time the preliminary check is done by an LLM as well, but in theory the same check could be done by a properly trained classifier model. Once the check is done, if retrieval was necessary it is run, otherwise the app skips directly to the LLM, which then replies to the user.
-
-![Diagram of the operation of an agentic RAG app: when the user asks a question, before calling the retriever the app checks whether the retrieval step is necessary at all. Once the check is done, if retrieval was necessary it is run, otherwise the app skips directly to the LLM, which then replies to the user.](/posts/2024-06-10-the-agent-compass/agentic-rag.png)
-
-You can see immediately that there's a fundamental difference between this type of RAG and the basic pipelined form: the app needs to **take a decision** in order to accomplish the goal of answering the user. The goal is very limited (giving a correct answer to the user), and the decision very simple (use or not use a single tool), but this little bit of agency given to the LLM makes us place an application like this definitely more towards the Agent side of the diagram.
-
-![the updated compass](/posts/2024-06-10-the-agent-compass/agentic-rag-compass.png)
-
-We keep Agentic RAG towards the Autonomous side because in the vast majority of cases the decision to invoke the retriever is kept hidden from the user.
-
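-To make the routing step concrete, here is a minimal sketch, assuming hypothetical `llm` and `retrieve` helpers standing in for the LLM call and the retriever query:
-
-```python
-# A minimal sketch of agentic RAG routing. `llm` and `retrieve` are hypothetical helpers
-# standing in for an LLM call and a retriever query respectively.
-def agentic_rag(question: str) -> str:
-    decision = llm(
-        "Does answering the question below require looking up external documents? "
-        f"Reply only 'yes' or 'no'.\n\nQuestion: {question}"
-    )
-    if decision.strip().lower().startswith("yes"):
-        documents = retrieve(question)
-        return llm(f"Answer the question using these documents:\n{documents}\n\nQuestion: {question}")
-    return llm(question)
-```
-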
-### LLMs with function calling
-
-Some LLM applications, such as ChatGPT with GPT4+ or Bing Chat, can make the LLM use some predefined tools: a web search, an image generator, and maybe a few more. The way they work is quite straightforward: when a user asks a question, the LLM first needs to decide whether it should use a tool to answer the question. If it decides that a tool is needed, it calls it, otherwise it skips directly to generating a reply, which is then sent back to the user.
-
-![Diagram of the operation of an LLM with function calling: when a user asks a question, the LLM first needs to decide whether it should use a tool to answer the question. If it decides that a tool is needed, it calls it, otherwise it skips directly to generating a reply, which is then sent back to the user.](/posts/2024-06-10-the-agent-compass/llm-with-function-calling.png)
-
-You can see how this diagram resembles agentic RAG's: before giving an answer to the user, the app needs to **take a decision**.
-
-With respect to Agentic RAG this decision is a lot more complex: it's not a simple yes/no decision, but it involves choosing which tool to use and also generating the input parameters that will make the selected tool provide the desired output. In many cases the tool's output will be given to the LLM to be re-elaborated (such as the output of a web search), while in some others it can go directly to the user (like in the case of image generators). This all implies that more agency is given to the system and, therefore, it can be placed more clearly towards the Agent end of the scale.
-
-![the updated compass](/posts/2024-06-10-the-agent-compass/llm-with-function-calling-compass.png)
-
-We place LLMs with function calling in the middle between Conversational and Autonomous because the degree to which the user is aware of this decision can vary greatly between apps. For example, Bing Chat and ChatGPT normally notify the user that they're going to use a tool when they do, and the user can instruct them to use them or not, so they're slightly more conversational.
-
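-A bare-bones sketch of this single-step tool selection could look like the following. All names (`llm`, `run_web_search`, `generate_image`) are placeholders, and real APIs such as OpenAI's function calling return structured tool calls rather than raw JSON text:
-
-```python
-# Hedged sketch of single-step function calling: the LLM picks a tool and its input,
-# the app executes it and optionally re-elaborates the output. All names are placeholders.
-import json
-
-TOOLS = {"web_search": run_web_search, "image_generator": generate_image}
-
-def answer_with_tools(question: str) -> str:
-    decision = json.loads(llm(
-        "Decide whether a tool is needed to answer the question below. Reply with JSON "
-        'like {"tool": "web_search" | "image_generator" | null, "input": "..."}.\n'
-        f"Question: {question}"
-    ))
-    if not decision["tool"]:
-        return llm(question)
-    result = TOOLS[decision["tool"]](decision["input"])
-    if decision["tool"] == "image_generator":
-        return result  # images can go straight to the user
-    return llm(f"Question: {question}\nSearch results: {result}\nAnswer the user.")
-```
-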
-### Self correcting RAG
-
-Self-correcting RAG is a technique that improves on simple RAG by making the LLM double-check its replies before returning them to the user. It comes from an LLM evaluation technique called "LLM-as-a-judge", because an LLM is used to judge the output of a different LLM or RAG pipeline.
-
-Self-correcting RAG starts as simple RAG: when the user asks a question, the retriever is called and the results are sent to the LLM to extract an answer from. However, before returning the answer to the user, another LLM is asked to judge whether, in its opinion, the answer is correct. If the second LLM agrees, the answer is sent to the user. If not, the second LLM generates a new question for the retriever and runs it again, or in other cases, it simply integrates its opinion in the prompt and runs the first LLM again.
-
-![Diagram of the operation of self correcting RAG: when the user asks a question, the retriever is called and the results are sent to the LLM to extract an answer from. However, before returning the answer to the user, another LLM is asked to judge whether in their opinion, the answer is correct. If the second LLM agrees, the answer is sent to the user. If not, the second LLM generates a new question for the retriever and runs it again, or in other cases, it simply integrates its opinion in the prompt and runs the first LLM again.](/posts/2024-06-10-the-agent-compass/self-correcting-rag.png)
-
-Self-correcting RAG can be seen as **one more step towards agentic behavior** because it unlocks a new possibility for the application: **the ability to try again**. A self-correcting RAG app has a chance to detect its own mistakes and has the agency to decide that it's better to try again, maybe with a slightly reworded question or different retrieval parameters, before answering the user. Given that this process is entirely autonomous, we'll place this technique quite towards the Autonomous end of the scale.
-
-![the updated compass](/posts/2024-06-10-the-agent-compass/self-correcting-rag-compass.png)
-
-### Chain-of-thought
-
-[Chain-of-thought](https://arxiv.org/abs/2201.11903) is a family of prompting techniques that makes the LLM "reason out loud". It's very useful when the model needs to process a very complicated question, such as a mathematical problem or a layered question like "When was the eldest sister of the current King of Sweden born?" Assuming that the LLM knows these facts, in order to not hallucinate it's best to ask the model to proceed "step-by-step" and find out, in order:
-
-1. Who the current King of Sweden is,
-2. Whether he has an elder sister,
-3. If yes, who she is,
-4. When the person identified above was born.
-
-The LLM might know the final fact in any case, but the probability of it giving the right answer increases noticeably if the LLM is prompted this way.
-
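-In practice the prompt only needs to ask for this step-by-step reasoning explicitly. A minimal sketch (the wording is just an illustration, and `llm` stands in for any LLM call):
-
-```python
-# A sketch of a chain-of-thought prompt for the example question above.
-# The wording is illustrative, and `llm` is a placeholder for any LLM call.
-COT_PROMPT = """Answer the question below. Proceed step by step:
-list the sub-questions you need to answer, answer them one by one,
-and only once you have all the facts state the final answer.
-
-Question: When was the eldest sister of the current King of Sweden born?"""
-
-print(llm(COT_PROMPT))
-```
-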
-Chain-of-thought prompts can also be seen as the LLM accomplishing the task of finding the correct answer in steps, which implies that there are two lines of thinking going on: on one side the LLM is answering the questions it's posing to itself, while on the other it's constantly re-assessing whether it has a final answer for the user.
-
-In the example above, the chain of thought might end at step 2 if the LLM realizes that the current King of Sweden has no elder sisters (he [doesn't](https://en.wikipedia.org/wiki/Carl_XVI_Gustaf#Early_life)): the LLM needs to keep an eye on its own thought process and decide whether it needs to continue or not.
-
-We can summarize an app using chain-of-thought prompting like this: when a user asks a question, first of all the LLM reacts to the chain-of-thought prompt to lay out the sub-questions it needs to answer. Then it answers its own questions one by one, asking itself each time whether the final answer has already been found. When the LLM believes it has the final answer, it rewrites it for the user and returns it.
-
-![Diagram of the operation of a chain-of-thought LLM app: when a user asks a question, first of all the LLM reacts to the chain-of-thought prompt to lay out the sub-questions it needs to answer. Then it answers its own questions one by one, asking itself each time whether the final answer has already been found. When the LLM believes it has the final answer, it rewrites it for the user and returns it ](/posts/2024-06-10-the-agent-compass/chain-of-thought.png)
-
-This new prompting technique makes a big step towards full agency: the ability for the LLM to **assess whether the goal has been achieved** before returning any answer to the user. While apps like Bing Chat iterate with the user and need their feedback to reach high-level goals, chain-of-thought gives the LLM the freedom to check its own answers before having the user judge them, which makes the loop much faster and can increase the output quality dramatically.
-
-This process is similar to what self-correcting RAG does, but has a wider scope, because the LLM does not only need to decide whether an answer is correct, it can also decide to continue reasoning in order to make it more complete, more detailed, to phrase it better, and so on.
-
-Another interesting trait of chain-of-thought apps is that they introduce the concept of **inner monologue**. The inner monologue is a conversation that the LLM has with itself, a conversation buffer where it keeps adding messages as the reasoning develops. This monologue is not visible to the user, but helps the LLM deconstruct a complex reasoning line into a more manageable format, like a researcher that takes notes instead of keeping all their earlier reasoning inside their head all the time.
-
-Due to the wider scope of the decision-making that chain-of-thought apps are able to do, they also sit in the middle of our compass. They can be seen as slightly more autonomous than conversational due to the fact that they hide their internal monologue from the user.
-
-![the updated compass](/posts/2024-06-10-the-agent-compass/chain-of-thought-compass.png)
-
-From here, the next step is straightforward: using tools.
-
-### Multi-hop RAG
-
-Multi-hop RAG applications are nothing more than simple RAG apps that use chain-of-thought prompting and are free to invoke the retriever as many times as needed and only when needed.
-
-This is how it works. When the user asks a question, a chain of thought prompt is generated and sent to the LLM. The LLM assesses whether it knows the answer to the question and if not, asks itself whether a retrieval is necessary. If it decides that retrieval is necessary it calls it, otherwise it skips it and generates an answer directly. It then checks again whether the question is answered. Exiting the loop, the LLM produces a complete answer by re-reading its own inner monologue and returns this reply to the user.
-
-![Diagram of the operation of multi-hop RAG: when the user makes a question, a chain of thought prompt is generated and sent to the LLM. The LLM assesses whether it knows the answer to the question and if not, asks itself whether a retrieval is necessary. If it decides that retrieval is necessary it calls it, otherwise it skips it and generates an answer directly. It then checks again whether the question is answered. Exiting the loop, the LLM produces a complete answer by re-reading its own inner monologue and returns this reply to the user.](/posts/2024-06-10-the-agent-compass/multi-hop-rag.png)
-
-An app like this is getting quite close to a proper autonomous agent, because it can **perform its own research autonomously**. The LLM calls are made in such a way that the system is able to assess whether it knows enough to answer or whether it should do more research by formulating more questions for the retriever and then reasoning over the new collected data.
-
-Multi-hop RAG is a very powerful technique that shows a lot of agency and autonomy, and therefore can be placed in the lower-right quadrant of our compass. However, it is still limited with respect to a "true" autonomous agent, because the only action it can take is to invoke the retriever.
-
-![the updated compass](/posts/2024-06-10-the-agent-compass/multi-hop-rag-compass.png)
-
-### ReAct Agents
-
-Let's now move on to apps that can be called proper "agents". One of the first flavors of agentic LLM apps, and still the most popular nowadays, is called "[ReAct](https://arxiv.org/abs/2210.03629)" Agents, which stands for "Reason + Act". ReAct is a prompting technique that belongs to the chain-of-thought extended family: it makes the LLM reason step by step, decide whether to perform any action, and then observe the result of the actions it took before moving further.
-
-A ReAct agent works more or less like this: when the user sets a goal, the app builds a ReAct prompt, which first of all asks the LLM whether the answer is already known. If the LLM says no, the prompt makes it select a tool. The tool returns some values which are added to the inner monologue of the application together with the invitation to re-assess whether the goal has been accomplished. The app loops over until the answer is found, and then the answer is returned to the user.
-
-![Diagram of the operation of a ReAct Agent: when the user sets a goal, the app builds a ReAct prompt, which first of all asks the LLM whether the answer is already known. If the LLM says no, the prompt makes it select a tool. The tool returns some values which are added to the inner monologue of the application together with the invitation to re-assess whether the goal has been accomplished. The app loops over until the answer is found, and then the answer is returned to the user.](/posts/2024-06-10-the-agent-compass/react-agent.png)
-
-As you can see, the structure is very similar to a multi-hop RAG, with an important difference: ReAct Agents normally have **many tools to choose from** rather than a single retriever. This gives them the agency to take much more complex decisions, and means they can finally be called "agents".
-
-![the updated compass](/posts/2024-06-10-the-agent-compass/react-agent-compass.png)
-
-ReAct Agents are very autonomous in their tasks and rely on an inner monologue rather than a conversation with a user to achieve their goals. Therefore we place them very much on the Autonomous end of the spectrum.
-
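-A minimal sketch of such a loop, assuming a hypothetical `llm` call and a `TOOLS` dictionary mapping tool names to functions, could look like this:
-
-```python
-# A minimal sketch of a ReAct-style loop. `llm` is a placeholder LLM call, `TOOLS` a
-# placeholder dict of tool functions, and the prompt wording is purely illustrative.
-def react_agent(goal: str, max_steps: int = 10) -> str:
-    monologue = [f"Goal: {goal}"]
-    for _ in range(max_steps):
-        thought = llm(
-            "\n".join(monologue)
-            + "\nIf you know the final answer, reply 'ANSWER: <answer>'. "
-              "Otherwise reply 'ACT: <tool name> <tool input>'."
-        )
-        monologue.append(thought)
-        if thought.startswith("ANSWER:"):
-            return thought.removeprefix("ANSWER:").strip()
-        tool_name, tool_input = thought.removeprefix("ACT:").strip().split(" ", 1)
-        monologue.append(f"Observation: {TOOLS[tool_name](tool_input)}")
-    return "Step budget exhausted without reaching the goal."
-```
-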
-### Conversational Agents
-
-Conversational Agents are a category of apps that can vary widely. As stated earlier, conversational agents focus on using the conversation itself as a tool to accomplish goals, so in order to understand them, one has to distinguish between the people that set the goal (let's call them _owners_) and those who talk with the bot (the _users_).
-
-Once this distinction is made, this is how the most basic conversational agents normally work. First, the owner sets a goal. The application then starts a conversation with a user and, right after the first message, starts asking itself if the given goal was accomplished. It then keeps talking to the target user until it believes the goal was attained and, once done, it returns back to its owner to report the outcome.
-
-![Diagram of the operation of a Conversational Agent: first, the owner sets a goal. The application then starts a conversation with a user and, right after the first message, starts asking itself if the given goal was accomplished. It then keeps talking to the target user until it believes the goal was attained and, once done, it returns back to its owner to report the outcome.](/posts/2024-06-10-the-agent-compass/basic-conversational-agent.png)
-
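-As a rough sketch, assuming a hypothetical `llm` call and a `channel` object that can send and receive messages with the target user, the loop could look like this:
-
-```python
-# A minimal sketch of a basic conversational agent: the owner sets the goal, the bot talks
-# to a target user until it believes the goal is reached, then reports back to the owner.
-# `llm` and `channel` are placeholders.
-def conversational_agent(goal: str, channel) -> str:
-    history = [f"Your goal: {goal}"]
-    opening = llm("\n".join(history) + "\nWrite the message that opens the conversation.")
-    history.append("You: " + opening)
-    channel.send(opening)
-    while True:
-        history.append("User: " + channel.receive())
-        done = llm("\n".join(history) + "\nHas the goal been accomplished? Reply yes or no.")
-        if done.strip().lower().startswith("yes"):
-            return llm("\n".join(history) + "\nWrite a short report of the outcome for the owner.")
-        reply = llm("\n".join(history) + "\nWrite the next message to send to the user.")
-        history.append("You: " + reply)
-        channel.send(reply)
-```
-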
-Basic conversational agents are very agentic in the sense that they can take a task off the hands of their owners and keep working on it until the goal is achieved. However, **they have varying degrees of agency** depending on how many tools they can use and how sophisticated their ability to talk to their target users is.
-
-For example, can the communication occur over one single channel, be it email, chat, voice, or something else? Can the agent choose among different channels to reach the user? Can it perform side tasks on behalf of either party to work towards its goal? There is a large variety of these agents available and no clear naming distinction between them, so depending on their abilities, their position on our compass might be very different. This is why we place them in the top center, spreading far out in both directions.
-
-![the updated compass](/posts/2024-06-10-the-agent-compass/conversational-agent-compass.png)
-
-### AI Crews
-
-By far the most advanced agent implementation available right now is called AI Crew, such as the ones provided by [CrewAI](https://www.crewai.com/). These apps take the concept of autonomous agent to the next level by making several different agents work together.
-
-The way these apps work is very flexible. For example, let's imagine we are making an AI application that can build a fully working mobile game from a simple description. This is an extremely complex task that, in real life, requires several developers. To achieve the same with an AI Crew, the crew needs to contain several agents, each one with their own special skills, tools, and background knowledge. There could be:
-
-- a Designer Agent, that has all the tools to generate artwork and assets;
-- a Writer Agent that writes the story, the copy, the dialogues, and most of the text;
-- a Frontend Developer Agent that designs and implements the user interface;
-- a Game Developer Agent that writes the code for the game itself;
-- a Manager Agent, that coordinates the work of all the other agents, keeps them on track and eventually reports the results of their work to the user.
-
-These agents interact with each other just like a team of humans would: by exchanging messages in a chat format, asking each other to perform actions for them, until their manager decides that the overall goal they were set to has been accomplished, and reports to the user.
-
-AI Crews are very advanced and dynamic systems that are still actively researched and explored. One thing that's clear though is that they show the highest level of agency of any LLM-based app, so we can place them right at the very bottom-right end of the scale.
-
-![the updated compass](/posts/2024-06-10-the-agent-compass/ai-crews-compass.png)
-
-## Conclusion
-
-What we've seen here are just a few examples of LLM-powered applications and how close or far they are to the concept of a "real" AI agent. AI agents are still a very active area of research, and they are becoming more and more effective as LLMs become cheaper and more powerful.
-
-As a matter of fact, with today's LLMs true AI agents are possible, but in many cases they're too brittle and expensive for real production use cases. Agentic systems today suffer from two main issues: they perform **huge and frequent LLM calls** and they **tolerate a very low error rate** in their decision making.
-
-Inner monologues can grow to an unbounded size during the agent's operation, making the context window size a potential limitation. A single bad decision can send a chain-of-thought reasoning train in a completely wrong direction and many LLM calls will be performed before the system realizes its mistake, if it does at all. However, as LLMs become faster, cheaper and smarter, the day when AI Agents will become reliable and cheap enough is nearer than many think.
-
-Let's be ready for it!
-
-
-
\ No newline at end of file
diff --git a/content/posts/2024-09-05-odsc-europe-voice-agents-part-1.md b/content/posts/2024-09-05-odsc-europe-voice-agents-part-1.md
deleted file mode 100644
index a30bcd85..00000000
--- a/content/posts/2024-09-05-odsc-europe-voice-agents-part-1.md
+++ /dev/null
@@ -1,221 +0,0 @@
----
-title: "Building Reliable Voice Bots with Open Source Tools - Part 1"
-date: 2024-09-18
-author: "ZanSara"
-featuredImage: "/posts/2024-09-05-odsc-europe-voice-agents/cover.png"
----
-
-*This is part one of the write-up of my talk at [ODSC Europe 2024](/talks/2024-09-05-odsc-europe-voice-agents/).*
-
----
-
-In the last few years, the world of voice agents saw dramatic leaps forward in the state of the art of all its most basic components. Thanks mostly to OpenAI, bots are now able to understand human speech almost like a human would, they're able to speak back with completely natural-sounding voices, and are able to hold a free conversation that feels extremely natural.
-
-But building voice bots is far from a solved problem. These improved capabilities are raising the bar, and even users accustomed to the simpler capabilities of old bots now expect a whole new level of quality when it comes to interacting with them.
-
-In this post we're going to focus mostly on **the challenges**: we'll discuss the basic structure of most voice bots today, their shortcomings and the main issues that you may face on your journey to improve the quality of the conversation.
-
-In [Part 2](/posts/2024-09-05-odsc-europe-voice-agents-part-2/) we are going to focus on **the solutions** that are available today, and we are going to build our own voice bot using [Pipecat](https://www.pipecat.ai), a recently released open-source library that makes building these bots a lot simpler.
-
-# Outline
-
-- [What is a voice agent?](#what-is-a-voice-agent)
- - [Speech-to-text (STT)](#speech-to-text-stt)
- - [Text-to-speech (TTS)](#text-to-speech-tts)
- - [Logic engine](#logic-engine)
- - [Tree-based](#tree-based)
- - [Intent-based](#intent-based)
- - [LLM-based](#llm-based)
-- [New challenges](#new-challenges)
- - [Real speech is not turn-based](#real-speech-is-not-turn-based)
- - [Real conversation flows are not predictable](#real-conversation-flows-are-not-predictable)
- - [LLMs bring their own problems](#llms-bring-their-own-problems)
- - [The context window](#the-context-window)
- - [Working in real time](#working-in-real-time)
-
-_Continues in [Part 2](/posts/2024-09-05-odsc-europe-voice-agents-part-2/)._
-
-
-# What is a voice agent?
-
-As the name says, voice agents are programs that are able to carry out a task and/or take actions and decisions on behalf of a user ("software agents") by using voice as their primary means of communication (as opposed to the much more common text chat format). Voice agents are inherently harder to build than their text-based counterparts: computers operate primarily with text, and the art of making machines understand human voices has been an elusive problem for decades.
-
-Today, the basic architecture of a modern voice agent can be decomposed into three fundamental building blocks:
-
-- a **speech-to-text (STT)** component, tasked to translate an audio stream into readable text,
-- the agent's **logic engine**, which works entirely with text only,
-- a **text-to-speech (TTS)** component, which converts the bot's text responses back into an audio stream of synthetic speech.
-
-![](/posts/2024-09-05-odsc-europe-voice-agents/structure-of-a-voice-bot.png)
-
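-A single user turn flows through the three blocks roughly like this; `stt`, `logic_engine` and `tts` are placeholders for the three components:
-
-```python
-# A minimal sketch of the three-stage loop above. `stt`, `logic_engine` and `tts` are
-# placeholders for a speech-to-text model, a text-only logic engine (rules, intents,
-# or an LLM) and a text-to-speech model.
-def handle_user_turn(audio_in: bytes) -> bytes:
-    text = stt.transcribe(audio_in)      # audio stream -> transcribed text
-    reply = logic_engine.respond(text)   # text in -> text out
-    return tts.synthesize(reply)         # reply text -> synthetic speech audio
-```
-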
-Let's see the details of each.
-
-## Speech to text (STT)
-
-Speech-to-text software is able to convert the audio stream of a person saying something and produce a transcription of what the person said. Speech-to-text engines have a [long history](https://en.wikipedia.org/wiki/Speech_recognition#History), but their limitations have always been quite severe: they used to require fine-tuning on each individual speaker, had a rather high word error rate (WER), and mainly worked only with native speakers of major languages, failing hard on foreign and uncommon accents and on speakers of less mainstream languages. These issues limited the adoption of this technology to niche software and research applications.
-
-With the [first release of OpenAI's Whisper models](https://openai.com/index/whisper/) in late 2022, the state of the art improved dramatically. Whisper enabled transcription (and even direct translation) of speech from many languages with an impressively low WER, finally comparable to the performance of a human, all with relatively low resources, higher-than-realtime speed, and no finetuning required. Not only that, but the model was free to use, as OpenAI [open-sourced it](https://huggingface.co/openai) together with a [Python SDK](https://github.com/openai/whisper), and the details of its architecture were [published](https://cdn.openai.com/papers/whisper.pdf), allowing the scientific community to improve on it.
-
-![](/posts/2024-09-05-odsc-europe-voice-agents/whisper-wer.png)
-
-_The WER (word error rate) of Whisper was extremely impressive at the time of its publication (see the full diagram [here](https://github.com/openai/whisper/assets/266841/f4619d66-1058-4005-8f67-a9d811b77c62))._
-
-
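-For reference, transcribing an audio file with the open-source `whisper` package takes only a few lines (model size and file name are just examples):
-
-```python
-# Transcribing a file with the open-source whisper package (pip install openai-whisper).
-import whisper
-
-model = whisper.load_model("base")            # model size is an example
-result = model.transcribe("recording.wav")    # file name is an example
-print(result["text"])
-```
-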
-Since then, speech-to-text models kept improving at a steady pace. Nowadays the Whisper family of models sees some competition for the title of best STT model from companies such as [Deepgram](https://deepgram.com/), but it's still one of the best options in terms of open-source models.
-
-## Text-to-speech (TTS)
-
-Text-to-speech models perform the exact opposite task to speech-to-text models: their goal is to convert written text into an audio stream of synthetic speech. Text-to-speech has [historically been an easier feat](https://en.wikipedia.org/wiki/Speech_synthesis#History) than speech-to-text, but it also recently saw drastic improvements in the quality of the synthetic voices, to the point that it could nearly be considered a solved problem in its most basic form.
-
-Today many companies (such as OpenAI, [Cartesia](https://cartesia.ai/sonic), [ElevenLabs](https://elevenlabs.io/), Azure and many others) offer TTS software with voices that sound nearly indistinguishable from a human. They also have the capability to clone a specific human voice with remarkably little training data (just a few seconds of speech) and to tune accents, inflections, tone and even emotion.
-
-{{< raw >}}
-<!-- Embedded audio sample -->
-{{< /raw >}}
-
-_[Cartesia's Sonic](https://cartesia.ai/sonic) TTS example of a gaming NPC. Note how the model subtly reproduces the breathing in between sentences._
-
-TTS is still improving in quality by the day, but due to the incredibly high quality of the output, competition now tends to focus on price and performance.
-
-## Logic engine
-
-Advancements in the agent's ability to talk to users go hand in hand with the progress of natural language understanding (NLU), another field with a [long and complicated history](https://en.wikipedia.org/wiki/Natural_language_understanding#History). Until recently, the bot's ability to understand the user's request has been severely limited and often available only for major languages.
-
-Based on the way their logic is implemented, the bots you may come across today fall into three different categories.
-
-### Tree-based
-
-Tree-based (or rule-based) logic is one of the earliest methods of implementing a chatbot's logic, still very popular today for its simplicity. Tree-based bots don't really try to understand what the user is saying, but listen to the user looking for a keyword or key sentence that will trigger the next step. For example, a customer support chatbot may look for the keyword "refund" to give the user any information about how to perform a refund, or the name of a discount campaign to explain to the user how to take advantage of it.
-
-Tree-based logic, while somewhat functional, doesn't really resemble a conversation and can become very frustrating to the user when the conversation tree was not designed with care, because it's difficult for the end user to understand which option or keyword they should use to achieve the desired outcome. It is also unsuitable for handling real questions and requests like a human would.
-
-One of its most effective use cases is as a first-line screening tool to triage incoming messages.
-
-![](/posts/2024-09-05-odsc-europe-voice-agents/tree-based-logic.png)
-
-_Example of a very simple decision tree for a chatbot. While rather minimal, this bot already has several flaws: there's no way to correct the information you entered at a previous step, and it has no ability to recognize synonyms ("I want to buy an item" would trigger the fallback route.)_
-
-### Intent-based
-
-In intent-based bots, **intents** are defined roughly as "actions the users may want to do". With respect to a strict, keyword-based tree structure, intent-based bots may switch from an intent to another much more easily (because they lack a strict tree-based routing) and may use advanced AI techniques to understand what the user is actually trying to accomplish and perform the required action.
-
-Advanced voice assistants such as Siri and Alexa use variations of this intent-based system. However, as their owners usually know all too well, interacting with an intent-based bot doesn't always feel natural, especially when the available intents don't match the user's expectation and the bot ends up triggering an unexpected action. In the long run, this ends with users carefully second-guessing which words and sentence structures activate the response they need, and eventually leads to a sort of "magical incantation" style of prompting the agent, where the user has to learn the "magical sentence" that the bot will recognize to perform a specific intent without misunderstandings.
-
-![](/posts/2024-09-05-odsc-europe-voice-agents/amazon-echo.webp)
-
-_Modern voice assistants like Alexa and Siri are often built on the concept of intent (image from Amazon)._
-
-### LLM-based
-
-The introduction of instruction-tuned GPT models like ChatGPT revolutionized the field of natural language understanding and, with it, the way bots can be built today. LLMs are naturally good at conversation and can formulate natural replies to any sort of question, making the conversation feel much more natural than with any technique that was ever available earlier.
-
-However, LLMs tend to be harder to control. Their very ability to generate natural-sounding responses to anything makes them behave in ways that are often unexpected to the developer of the chatbot: for example, users can get the LLM-based bot to promise them anything they ask for, or they can convince it to say something incorrect or even occasionally lie.
-
-The problem of controlling the conversation, one that traditionally was always on the user's side, is now entirely on the shoulders of the developers and can easily backfire.
-
-![](/posts/2024-09-05-odsc-europe-voice-agents/chatgpt-takesies-backsies.png)
-
-_In a rather [famous instance](https://x.com/ChrisJBakke/status/1736533308849443121), a user managed to convince a Chevrolet dealership chatbot to promise selling him a Chevy Tahoe for a single dollar._
-
-# New challenges
-
-Thanks to all these recent improvements, it would seem that making natural-sounding, smart bots is getting easier and easier. It is indeed much simpler to make a simple bot sound better, understand more and respond appropriately, but there's still a long way to go before users can interact with these new bots as they would with a human.
-
-The issue lies in the fact that users' expectations grow with the quality of the bot. It's not enough for the bot to have a voice that sounds human: users want to be able to interact with it in a way that feels human too, which is far richer and more interactive than what the rigid tech of earlier chatbots allowed so far.
-
-What does this mean in practice? What are the expectations that users might have from our bots?
-
-## Real speech is not turn-based
-
-Traditional bots can only handle turn-based conversations: the user talks, then the bot talks as well, then the user talks some more, and so on. A conversation with another human, however, has no such limitation: people may talk over each other, give audible feedback without interrupting, and more.
-
-Here are some examples of this richer interaction style:
-
-- **Interruptions**. Interruptions occur when a person is talking and another one starts talking at the same time. It is expected that the first person stops talking, at least for a few seconds, to understand what the interruption was about, while the second person continues to talk.
-
-- **Back-channeling**. Back-channeling is the practice of saying "ok", "sure", "right" while the other person is explaining something, to give them feedback and let them know we're paying attention to what is being said. The person that is talking is not supposed to stop: the aim of this sort of feedback is to let them know they are being heard.
-
-- **Pinging**. This is the natural reaction to a long silence, especially over a voice-only medium such as a phone call. When one of the two parties is supposed to speak but instead stays silent, the last one that talked might "ping" the silent party by asking "Are you there?", "Did you hear?", or even just "Hello?" to test whether they're being heard. This behavior is especially difficult to handle for voice agents that have a significant delay, because it may trigger an ugly vicious cycle of repetitions and delayed replies.
-
-- **Buying time**. When one of the parties knows that it will stay silent for a while, a natural reaction is to notify the other party in advance by saying something like "Hold on...", "Wait a second...", "Let me check..." and so on. This message has the benefit of preventing the "pinging" behavior we've seen before and can be very useful for voice bots that may need to carry out background work during the conversation, such as looking up information.
-
-- **Audible clues**. Not everything can be transcribed by a speech-to-text model, but audio carries a lot of nuance that is often used by humans to communicate. A simple example is pitch: humans can often tell if they're talking to a child, a woman or a man by the pitch of their voice, but STT engines don't transcribe that information. So if a child picks up the phone when your bot asks for their mother or father, the model won't pick up the obvious audible clue and will assume it is talking to the right person. Similar considerations should be made for tone (to detect mood, sarcasm, etc.) or other sounds like laughter, sobs, and more.
-
-## Real conversation flows are not predictable
-
-Tree-based bots, and to some degree intent-based too, work on the implicit assumption that conversation flows are largely predictable. Once the user said something and the bot replied accordingly, they can only follow up with a fixed set of replies and nothing else.
-
-This is often a flawed assumption and the primary reason why talking to chatbots tends to be so frustrating.
-
-In reality, natural conversations are largely unpredictable. For example, they may feature:
-
-- **Sudden changes of topic**. Maybe user and bot were talking about making a refund, but then the user changes their mind and decides to ask for assistance finding a repair center for the product. Well designed intent-based bots can deal with that, but most bots are in practice unable to do so in a way that feels natural to the user.
-
-- **Unexpected, erratic phrasing**. This is common when users are nervous or in a bad mood for any reason. Erratic, convoluted phrasing, long sentences, rambling, are all very natural ways of expressing themselves, but such outbursts very often confuse bots completely.
-
-- **Non native speakers**. Due to the nature of language learning, non native speakers may have trouble pronouncing words correctly, they may use highly unusual synonyms, or structure sentences in complicated ways. This is also difficult for bots to handle, because understanding the sentence is harder and transcription issues are far more likely.
-
-- _**Non sequitur**_. _Non sequitur_ is an umbrella term for a sequence of sentences that bear no relation to each other in a conversation. A simple example is the user asking the bot "What's the capital of France" and the bot replies "It's raining now". When done by the bot, this is often due to a severe transcription issue or a very flawed conversation design. When done by the user, it's often a malicious intent to break the bot's logic, so it should be handled with some care.
-
-## LLMs bring their own problems
-
-It may seem that some of these issues, especially the ones related to conversation flow, could be easily solved with an LLM. These models, however, bring their own set of issues too:
-
-- **Hallucinations**. This is a technical term to say that LLMs can occasionally mis-remember information, or straight up lie. The problem is that they're also very confident about their statements, sometimes to the point of trying to gaslight their users. Hallucinations are a major problem for all LLMs: although it may seem to get more manageable with larger and smarter models, the problem only gets more subtle and harder to spot.
-
-- **Misunderstandings**. While LLMs are great at understanding what the user is trying to say, they're not immune to misunderstandings. Unlike a human though, LLMs rarely suspect a misunderstanding and tend to make assumptions rather than ask for clarifications, resulting in surprising replies and behavior that are reminiscent of intent-based bots.
-
-- **Lack of assertiveness**. LLMs are trained to listen to the user and do their best to be helpful. This means that LLMs are also not very good at taking the lead of the conversation when we would need them to, and are easily misled and distracted by a motivated user. Preventing your model from giving your users a literary analysis of their unpublished poetry may sound silly, but it's a lot harder than many suspect.
-
-- **Prompt hacking**. Often done with malicious intent by experienced users, prompt hacking is the practice of convincing an LLM to reveal its initial instructions, ignore them and perform actions they are explicitly forbidden from. This is especially dangerous and, while a lot of work has gone into this field, this is far from a solved problem.
-
-## The context window
-
-LLMs need to keep track of the whole conversation, or at least most of it, to be effective. However, they often have a limit on the amount of text they can keep in mind at any given time: this limit is called the **context window** and for many models is still relatively low, at about 2000 tokens **(between 1500-1800 words)**.
-
-The problem is that this window also needs to include all the instructions your bot needs for the conversation. This initial set of instructions is called the **system prompt**, and it is kept slightly distinct from the other messages in the conversation to make the LLM understand that it's not part of it, but rather a set of instructions about how to handle the conversation.
-
-For example, a system prompt for a customer support bot may look like this:
-
-```
-You're a friendly customer support bot named VirtualAssistant.
-You are always kind to the customer and you must do your best
-to make them feel at ease and helped.
-
-You may receive a set of different requests. If the users asks
-you to do anything that is not in the list below, kindly refuse
-to do so.
-
-# Handle refunds
-
-If the user asks you to handle a refund, perform these actions:
-- Ask for their shipping code
-- Ask for their last name
-- Use the tool `get_shipping_info` to verify the shipping exists
-...
-```
-and so on.
-
-Although very effective, system prompts have a tendency to become huge in terms of tokens. Adding information to it makes the LLM behave much more like you expect (although it's not infallible), hallucinate less, and can even shape its personality to some degree. But if the system prompt becomes too long (more than 1000 words), this means that the bot will only be able to exchange about 800 words worth of messages with the user before it starts to **forget** either its instructions or the first messages of the conversation. For example, the bot will easily forget its own name and role, or it will forget the user's name and initial demands, which can make the conversation drift completely.
-
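-Keeping an eye on how many tokens the system prompt alone consumes is therefore a good habit. A quick sketch with the `tiktoken` library (the model name and file path are just examples):
-
-```python
-# A quick way to check how many tokens the system prompt alone consumes, using tiktoken.
-import tiktoken
-
-encoding = tiktoken.encoding_for_model("gpt-4")       # model name is an example
-system_prompt = open("system_prompt.txt").read()      # file path is an example
-print(f"System prompt size: {len(encoding.encode(system_prompt))} tokens")
-```
-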
-## Working in real time
-
-If all these issues weren't enough, there's also a fundamental issue related to voice interaction: **latency**. Voice bots interact with their users in real time: this means that the whole pipeline of transcription, understanding, formulating a reply and synthesizing it back must be very fast.
-
-How fast? On average, people expect a reply from another person to arrive within **300-500ms** to sound natural. They can normally wait for about 1-2 seconds. Any longer and they'll likely ping the bot, breaking the flow.
-
-This means that, even if we had some solutions to all of the above problems (and we do have some), these solutions need to operate at blazing fast speed. Considering that LLM inference alone can take the better part of a second to even start being generated, latency is often one of the major issues that voice bots face when deployed at scale.
-
-![](/posts/2024-09-05-odsc-europe-voice-agents/ttft.jpg)
-
-_Time to First Token (TTFT) stats for several LLM inference providers running Llama 2 70B chat. From [LLMPerf leaderboard](https://github.com/ray-project/llmperf-leaderboard). You can see how the time it takes for a reply to even start being produced is highly variable, going up to more than one second in some scenarios._
-
-
-# To be continued...
-
-_Interested? Stay tuned for Part 2!_
-
-
-
diff --git a/content/posts/2024-09-05-odsc-europe-voice-agents-part-2.md b/content/posts/2024-09-05-odsc-europe-voice-agents-part-2.md
deleted file mode 100644
index 85f12f45..00000000
--- a/content/posts/2024-09-05-odsc-europe-voice-agents-part-2.md
+++ /dev/null
@@ -1,118 +0,0 @@
----
-title: "Building Reliable Voice Bots with Open Source Tools - Part 2"
-date: 2024-09-20
-author: "ZanSara"
-featuredImage: "/posts/2024-09-05-odsc-europe-voice-agents/cover.png"
-draft: true
----
-
-*This is part two of the write-up of my talk at [ODSC Europe 2024](/talks/2024-09-05-odsc-europe-voice-agents/).*
-
----
-
-In the last few years, the world of voice agents saw dramatic leaps forward in the state of the art of all its most basic components. Thanks mostly to OpenAI, bots are now able to understand human speech almost like a human would, speak back with completely natural-sounding voices, and hold a free-flowing conversation that feels extremely natural.
-
-But building voice bots is far from a solved problem. These improvements are raising the bar, and even users accustomed to the simpler bots of the past now expect a whole new level of quality when it comes to interacting with them.
-
-In [Part 1](/posts/2024-09-05-odsc-europe-voice-agents-part-1/) we've seen mostly **the challenges** related to building such bots: we discussed the basic structure of most voice bots today, their shortcomings and the main issues that you may face on your journey to improve the quality of the conversation.
-
-In this post instead we will focus on **the solutions** that are available today, and we are going to build our own voice bot using [Pipecat](https://www.pipecat.ai), a recently released open-source library that makes building these bots a lot simpler.
-
-# Outline
-
-_Start from [Part 1](/posts/2024-09-05-odsc-europe-voice-agents-part-1/)._
-
-- [Let's build a voice bot](#lets-build-a-voice-bot)
- - [Voice Activity Detection](#voice-activity-detection-vad)
- - [Blend intent's control with LLM's fluency](#blend-intents-control-with-llms-fluency)
- - [Intent detection](#intent-detection)
- - [Prompt building](#prompt-building)
- - [Reply generation](#reply-generation)
- - [What about latency](#what-about-latency)
-- [The code](#the-code)
-- [Looking forward](#looking-forward)
-
-
-# Let's build a voice bot
-
-At this point we have a comprehensive view of the issues that we need to solve to create reliable, usable and natural-sounding voice agents. How can we actually build one?
-
-First of all, let's take a look at the structure we defined earlier and see how we can improve on it.
-
-![](/posts/2024-09-05-odsc-europe-voice-agents/structure-of-a-voice-bot.png)
-
-## Voice Activity Detection (VAD)
-
-One of the simplest improvements to this basic pipeline is the addition of a robust Voice Activity Detection (VAD) model. VAD gives the bot the ability to hear interruptions from the user and react to them accordingly, helping to break the classic, rigid turn-based interactions of old-style bots.
-
-![](/posts/2024-09-05-odsc-europe-voice-agents/structure-of-a-voice-bot-vad.png)
-
-However, a VAD model on its own is not enough. To make a bot truly interruptible we also need the rest of the pipeline to be aware of the possibility of an interruption and be ready to handle it: speech-to-text models need to start transcribing and the text-to-speech component needs to stop speaking as soon as the VAD picks up speech.
-
-The logic engine also needs to handle a half-spoken reply in a graceful way: it can't just assume that the whole reply it planned to deliver was received, nor can it drop the whole reply as if it had never been spoken. LLMs can handle this scenario, but implementing it in practice is often not straightforward.
-
-The quality of your VAD model matters a lot, and so does tuning its parameters appropriately. You don't want the bot to interrupt itself at every ambient sound it detects, but you also want the interruption to happen promptly.
-
-Some of the best and most used models out there are [Silero](https://github.com/snakers4/silero-vad)'s VAD models, or alternatively [Picovoice](https://picovoice.ai/)'s [Cobra](https://picovoice.ai/platform/cobra/) models.
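-
-As a rough sketch of what running a VAD model looks like in practice, here is how one could check a short audio clip for speech with Silero VAD loaded via `torch.hub` (assuming `torch` and `torchaudio` are installed; the file path is a hypothetical placeholder, and in a real voice bot the model runs on a live audio stream instead of a file):
-
-```python
-import torch
-
-# Load the Silero VAD model and its helper functions from torch.hub
-model, utils = torch.hub.load(repo_or_dir="snakers4/silero-vad", model="silero_vad")
-get_speech_timestamps, _, read_audio, _, _ = utils
-
-# Read a 16 kHz mono audio file (hypothetical path) and look for speech segments
-audio = read_audio("user_turn.wav", sampling_rate=16000)
-speech_timestamps = get_speech_timestamps(audio, model, sampling_rate=16000)
-
-if speech_timestamps:
-    print("Speech detected:", speech_timestamps)  # list of {'start': ..., 'end': ...} in samples
-else:
-    print("No speech detected")
-```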
-
-## Blend intent's control with LLM's fluency
-
-Despite the distinctions we made at the start, the logic of voice bots is often implemented as a blend of more than one style. Intent-based bots may contain small decision trees as well as LLM prompts. These blends often deliver the best results, because the strengths of each approach compensate for the weaknesses of the others.
-
-One of the most effective approaches is to use intent detection to help control the flow of an LLM conversation. Let's see how.
-
-![](/posts/2024-09-05-odsc-europe-voice-agents/structure-of-a-voice-bot-intent.png)
-
-Suppose we're building a general purpose customer support bot.
-
-A bot like this needs to be able to handle a huge variety of requests: helping the user renew subscriptions, buy or return items, updating them on the status of a shipment, telling them the opening hours of the certified repair shop closest to their home, explaining the advantages of a promotion, and more.
-
-If we decide to implement this chatbot based on intents, many intents may end up looking so similar that the bot will have trouble deciding which one suits a specific request best: for example, a user who wants to know if there's any repair shop within an hour's drive from their home, and otherwise will return the item.
-
-However, if we decide to implement this chatbot with an LLM, it becomes really hard to check its replies and make sure that the bot is not lying, because the amount of information it needs to handle is huge. The bot may also perform actions that it is not supposed to, like letting users return an item they have no warranty on anymore.
-
-There is an intermediate solution: **first try to detect intent, then leverage the LLM**.
-
-### Intent detection
-
-Step one is detecting the intention of the user. Given that this is a hybrid approach, we don't need to micromanage the model here and we can stick to macro-categories safely. No need to specify "opening hours of certified repair shops in New York": a broader "information on certified repair shops" will suffice.
-
-This first step drastically narrows down the information the LLM needs to handle, and it can be repeated at every message to make sure the user is still talking about the same topic and didn't change subject completely.
-
-Intent detection can be performed with several tools, but it can be done with an LLM as well. Large models like GPT-4o are especially good at this sort of classification even when queried with a simple prompt like the following:
-
-```
-Given the following conversation, select what the user is currently talking about,
-picking an item from the "topics" list. Output only the topic name.
-
-Topics:
-
-[list of all the expected topics + catchall topics]
-
-Conversation:
-
-[conversation history]
-```
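-
-In code, a minimal sketch of this classification step could look like the following (assuming the `openai` client; the topic list, model name and message format are hypothetical placeholders):
-
-```python
-from openai import OpenAI
-
-client = OpenAI()
-
-# Hypothetical list of expected topics, plus a catchall
-TOPICS = ["refunds", "certified repair shops", "promotions", "other"]
-
-def detect_intent(conversation: list[dict]) -> str:
-    """Ask the LLM which topic the user is currently talking about."""
-    transcript = "\n".join(f"{m['role']}: {m['content']}" for m in conversation)
-    prompt = (
-        "Given the following conversation, select what the user is currently talking about, "
-        "picking an item from the 'topics' list. Output only the topic name.\n\n"
-        f"Topics:\n{', '.join(TOPICS)}\n\n"
-        f"Conversation:\n{transcript}"
-    )
-    response = client.chat.completions.create(
-        model="gpt-4o",
-        messages=[{"role": "user", "content": prompt}],
-    )
-    return response.choices[0].message.content.strip()
-```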
-
-### Prompt building
-
-Once we know more or less what the request is about, it's time to build the real prompt that will give us a reply for the user.
-
-With the general intent identified, we can equip the LLM strictly with the tools and information that it needs to proceed. If the user is asking about repair shops in their area, we can provide the LLM with a tool to search repair shops by zip code, a tool that would be useless if the user was asking about a shipment or a promotional campaign. Same for the background information: we don't need to stop at telling the LLM that "you're a customer support bot": we can narrow down its personality and background knowledge to make it focus a lot more on the task at hand, which is to help the user locate a suitable repair shop. And so on.
-
-This can be done by mapping each expected intent to a specific system prompt, pre-compiled to match the intent. At the prompt building stage we simply pick from our library of prompts and **replace the system prompt** with the one that we just selected.
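-
-In its simplest form, this library can be a plain dictionary mapping each expected intent to its pre-compiled system prompt. A minimal sketch (the intent names and prompt texts are hypothetical placeholders):
-
-```python
-# Hypothetical library of pre-compiled system prompts, one per expected intent
-SYSTEM_PROMPTS = {
-    "refunds": "You are VirtualAssistant, a support bot specialized in handling refunds. ...",
-    "certified repair shops": "You are VirtualAssistant, a support bot that helps users find certified repair shops. ...",
-    "other": "You are VirtualAssistant, a friendly customer support bot. ...",
-}
-
-def swap_system_prompt(conversation: list[dict], intent: str) -> list[dict]:
-    """Replace the system prompt at the head of the conversation with the one matching the intent."""
-    system_prompt = {"role": "system", "content": SYSTEM_PROMPTS.get(intent, SYSTEM_PROMPTS["other"])}
-    return [system_prompt] + [m for m in conversation if m["role"] != "system"]
-```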
-
-### Reply generation
-
-With the new, more appropriate system prompt in place at the head of the conversation, we can finally prompt the LLM again to generate a reply for the user. At this point the LLM has an easier time following its instructions (because they're simpler and more focused) and generates better quality answers for both the users and the developers.
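-
-Putting the pieces together, the reply generation step simply calls the LLM again on the updated conversation. A minimal sketch, reusing the hypothetical `client`, `detect_intent` and `swap_system_prompt` from the sketches above:
-
-```python
-def generate_reply(conversation: list[dict]) -> str:
-    intent = detect_intent(conversation)                 # step 1: classify the current topic
-    messages = swap_system_prompt(conversation, intent)  # step 2: pick the matching system prompt
-    # step 3: generate the reply with the narrowed-down prompt
-    response = client.chat.completions.create(model="gpt-4o", messages=messages)
-    return response.choices[0].message.content
-```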
-
-## What about latency?
-
-# The code
-
-
-
-# Looking forward
-
-
-
diff --git a/content/posts/drafts/2024-01-xx-respeaker.md b/content/posts/drafts/2024-01-xx-respeaker.md
deleted file mode 100644
index 7f5f2391..00000000
--- a/content/posts/drafts/2024-01-xx-respeaker.md
+++ /dev/null
@@ -1,60 +0,0 @@
----
-title: "Making my own voice assistant: Setting up ReSpeaker"
-date: 2024-01-06
-author: "ZanSara"
-featuredImage: "/posts/2024-01-xx/cover.png"
-draft: true
----
-
-In the last year voice assistants have started to become popular again. With the improvements in natural language understanding provided by LLMs, these assistants are becoming smarter and more useful.
-
-
-Install the drivers:
-
-```
-sudo apt install git
-git clone --depth=1 https://github.com/HinTak/seeed-voicecard # Maintained fork
-cd seeed-voicecard
-sudo ./install.sh
-reboot
-```
-
-https://forum.seeedstudio.com/t/installing-respeaker-2-mic-hat-not-working-unable-to-locate-package-and-kernel-version-issues/269986
-
-```
-aplay -l
-```
-
-```
-**** List of PLAYBACK Hardware Devices ****
-card 0: Headphones [bcm2835 Headphones], device 0: bcm2835 Headphones [bcm2835 Headphones]
- Subdevices: 8/8
- Subdevice #0: subdevice #0
- Subdevice #1: subdevice #1
- Subdevice #2: subdevice #2
- Subdevice #3: subdevice #3
- Subdevice #4: subdevice #4
- Subdevice #5: subdevice #5
- Subdevice #6: subdevice #6
- Subdevice #7: subdevice #7
-card 1: vc4hdmi [vc4-hdmi], device 0: MAI PCM i2s-hifi-0 [MAI PCM i2s-hifi-0]
- Subdevices: 1/1
- Subdevice #0: subdevice #0
-card 2: seeed2micvoicec [seeed-2mic-voicecard], device 0: bcm2835-i2s-wm8960-hifi wm8960-hifi-0 [bcm2835-i2s-wm8960-hifi wm8960-hifi-0]
- Subdevices: 1/1
- Subdevice #0: subdevice #0
-```
-
-
-
-
-
-Links:
-
-https://www.seeedstudio.com/ReSpeaker-2-Mics-Pi-HAT.html
-
-https://www.hackster.io/idreams/build-your-own-amazon-echo-using-a-rpi-and-respeaker-hat-7f44a0
-
-Original drivers: https://github.com/respeaker/seeed-voicecard
-
-LEDs: https://learn.adafruit.com/scanning-i2c-addresses/raspberry-pi
\ No newline at end of file
diff --git a/content/projects/booking-system.md b/content/projects/booking-system.md
deleted file mode 100644
index d3b6e64f..00000000
--- a/content/projects/booking-system.md
+++ /dev/null
@@ -1,15 +0,0 @@
----
-title: "CAI Sovico's Website"
-description: Small website and reservations management system
-date: 2016-01-01
-author: "ZanSara"
-featuredImage: "/projects/camerini.png"
----
-
-Main website: https://www.caisovico.it
-
----
-
-Since my bachelor studies I have maintained the IT infrastructure of an alpine hut, [Rifugio M. Del Grande - R. Camerini](https://maps.app.goo.gl/PwdVC82VHwdPZJDE6). I count this as my first important project, one that people, mostly older and not very tech savvy, depended on to run a real business.
-
-The website went through several iterations as web technologies evolved, as well as the type of servers we could afford. Right now it features minimal HTML/CSS static pages, plus a reservations system written with a PHP 8 / MySQL backend and a vanilla JS frontend. It also includes an FTP server that supports a couple of [ZanzoCams](/projects/zanzocam/) and a [weather monitoring station](http://www.meteoproject.it/ftp/stazioni/caisovico/).
\ No newline at end of file
diff --git a/content/projects/brekeke.md b/content/projects/brekeke.md
deleted file mode 100644
index 80d53257..00000000
--- a/content/projects/brekeke.md
+++ /dev/null
@@ -1,11 +0,0 @@
----
-title: "Brekeke"
-description: A collection of LLM-powered home automation scripts
-date: 2023-12-01
-author: "ZanSara"
-featuredImage: "/projects/brekeke.jpeg"
----
-
-With the rise of more and more powerful LLMs, I am experimenting with different ways to interact with them in ways that don't necessarily involve a laptop, a keyboard or a screen.
-
-I codenamed all of these experiments "Brekeke", the sound frogs make in Hungarian (don't ask why). The focus of these experiments is mostly small home automation tasks, and they run on a swarm of Raspberry Pis.
diff --git a/content/projects/ebisu-flashcards.md b/content/projects/ebisu-flashcards.md
deleted file mode 100644
index 7c86ab1a..00000000
--- a/content/projects/ebisu-flashcards.md
+++ /dev/null
@@ -1,8 +0,0 @@
----
-title: "Ebisu Flashcards - In Progress!"
-description: Lean, word-inflection aware flashcard application based on the Ebisu algorithm
-date: 2021-06-01
-author: "ZanSara"
-featuredImage: "/projects/ebisu-flashcards.png"
-externalLink: https://github.com/ebisu-flashcards
----
diff --git a/content/projects/zanzocam.md b/content/projects/zanzocam.md
deleted file mode 100644
index dda8375e..00000000
--- a/content/projects/zanzocam.md
+++ /dev/null
@@ -1,33 +0,0 @@
----
-title: "ZanzoCam"
-description: Remote camera for autonomous operation in isolated locations, based on Raspberry Pi.
-date: 2020-01-01
-author: "ZanSara"
-featuredImage: "/projects/zanzocam.png"
----
-
-Main website: https://zanzocam.github.io/
-
----
-
-ZanzoCam is a low-power, low-frequency camera based on Raspberry Pi, designed to operate autonomously in remote locations and under harsh conditions. It was designed and developed between 2019 and 2021 for [CAI Lombardia](https://www.cai.it/gruppo_regionale/gr-lombardia/) by a team of two people, with me as the software developer and the other responsible for the hardware design. CAI later deployed several of these devices on their affiliate huts.
-
-ZanzoCams are designed to work reliably in the harsh conditions of alpine winters, be as power-efficient as possible, and tolerate unstable network connections: they feature a robust HTTP- or FTP-based picture upload strategy which is remotely configurable from a very simple, single-file web panel. The camera software also improves on the basic capabilities of picamera to take pictures in dark conditions, making ZanzoCams able to shoot good pictures for a few hours after sunset.
-
-The camera is highly configurable: photo size and frequency, server address and protocol, all the overlays (color, size, position, text and images) and several other parameters can be configured remotely without the need to expose any ports of the device to the internet. They work reliably without the need for a VPN and at the same time are quite secure by design.
-
-ZanzoCams mostly serve CAI and the hut managers for self-promotion, and help hikers and climbers assess the local conditions before attempting a hike. Pictures taken for this purpose are sent to [RifugiLombardia](https://www.rifugi.lombardia.it/), and you can see many of them [at this page](https://www.rifugi.lombardia.it/territorio-lombardo/webcam).
-
-However, it has also been used by glaciologists to monitor glacier conditions, outlook and extent over the years. [Here you can see their webcams](https://www.servizioglaciologicolombardo.it/webcam-3), some of which are ZanzoCams.
-
-Here is the latest picture from [Rifugio M. Del Grande - R. Camerini](https://maps.app.goo.gl/PwdVC82VHwdPZJDE6), the test location for the original prototype:
-
-![ZanzoCam of Rifugio M. Del Grande - R. Camerini](https://webcam.rifugi.lombardia.it/rifugio/00003157/pictures/image__0.jpg)
-
-And here is one of the cameras serving a local glaciology research group, [Servizio Glaciologico Lombardo](https://www.servizioglaciologicolombardo.it/):
-
-![ZanzoCam of M. Disgrazia](https://webcam.rifugi.lombardia.it/rifugio/90003157/pictures/image__0.jpg)
-
-Both of these cameras are fully solar-powered.
-
-ZanzoCam is fully open-source: check the [GitHub repo](https://github.com/ZanzoCam?view_as=public). Thanks to the decision to open-source the project, I was invited by [Università di Pavia](https://portale.unipv.it/it) to hold a lecture about it as part of their ["Hardware and Software Codesign"](http://hsw2021.gnudd.com/) course. Check out the slides of the lecture [here](talks/zanzocam-pavia/).
\ No newline at end of file
diff --git a/content/publications/msc-thesis.md b/content/publications/msc-thesis.md
deleted file mode 100644
index 78490f40..00000000
--- a/content/publications/msc-thesis.md
+++ /dev/null
@@ -1,15 +0,0 @@
----
-title: "Evaluation of Qt as GUI Framework for Accelerator Controls"
-date: 2018-12-20
-author: "ZanSara"
-featuredImage: "/publications/msc-thesis.png"
----
-
-This is the full-text of my MSc thesis, written in collaboration with
-[Politecnico di Milano](https://www.polimi.it/) and [CERN](https://home.cern/).
-
----
-
-Get the full text here: [Evaluation of Qt as GUI Framework for Accelerator Controls](/publications/msc-thesis.pdf)
-
-Publisher's entry: [10589/144860](https://hdl.handle.net/10589/144860).
\ No newline at end of file
diff --git a/content/publications/thpv014.md b/content/publications/thpv014.md
deleted file mode 100644
index 7e6043c4..00000000
--- a/content/publications/thpv014.md
+++ /dev/null
@@ -1,18 +0,0 @@
----
-title: "Adopting PyQt For Beam Instrumentation GUI Development At CERN"
-date: 2022-03-01
-author: "ZanSara"
-featuredImage: "/publications/thpv014.png"
----
-
-## Abstract
-
-As Java GUI toolkits become deprecated, the Beam Instrumentation (BI) group at CERN has investigated alternatives and selected PyQt as one of the suitable technologies for future GUIs, in accordance with the paper presented at ICALEPCS19. This paper presents tools created, or adapted, to seamlessly integrate future PyQt GUI development alongside current Java oriented workflows and the controls environment. This includes (a) creating a project template and a GUI management tool to ease and standardize our development process, (b) rewriting our previously Java-centric Expert GUI Launcher to be language-agnostic and (c) porting a selection of operational GUIs from Java to PyQt, to test the feasibility of the development process and identify bottlenecks. To conclude, the challenges we anticipate for the BI GUI developer community in adopting this new technology are also discussed.
-
----
-
-Get the full text here: [Adopting PyQt For Beam Instrumentation GUI Development At CERN](/publications/thpv014.pdf)
-
-Get the poster: [PDF](/publications/thpv014-poster.pdf)
-
-Publisher's entry: [THPV014](https://accelconf.web.cern.ch/icalepcs2021/doi/JACoW-ICALEPCS2021-THPV014.html)
\ No newline at end of file
diff --git a/content/publications/thpv042.md b/content/publications/thpv042.md
deleted file mode 100644
index 7c2646c4..00000000
--- a/content/publications/thpv042.md
+++ /dev/null
@@ -1,16 +0,0 @@
----
-title: "Evolution of the CERN Beam Instrumentation Offline Analysis Framework (OAF)"
-date: 2021-12-11
-author: "ZanSara"
-featuredImage: "/publications/thpv042.png"
----
-
-## Abstract
-
-The CERN accelerators require a large number of instruments, measuring different beam parameters like position, losses, current etc. The instruments’ associated electronics and software also produce information about their status. All these data are stored in a database for later analysis. The Beam Instrumentation group developed the Offline Analysis Framework some years ago to regularly and systematically analyze these data. The framework has been successfully used for nearly 100 different analyses that ran regularly by the end of the LHC run 2. Currently it is being updated for run 3 with modern and efficient tools to improve its usability and data analysis power. In particular, the architecture has been reviewed to have a modular design to facilitate the maintenance and the future evolution of the tool. A new web based application is being developed to facilitate the users’ access both to online configuration and to results. This paper will describe all these evolutions and outline possible lines of work for further improvements.
-
----
-
-Get the full text here: [Evolution of the CERN Beam Instrumentation Offline Analysis Framework (OAF)](/publications/thpv042.pdf)
-
-Publisher's entry: [THPV042](https://accelconf.web.cern.ch/icalepcs2021/doi/JACoW-ICALEPCS2021-THPV042.html).
diff --git a/content/publications/tucpr03.md b/content/publications/tucpr03.md
deleted file mode 100644
index 7535a2af..00000000
--- a/content/publications/tucpr03.md
+++ /dev/null
@@ -1,16 +0,0 @@
----
-title: "Our Journey From Java to PyQt and Web For CERN Accelerator Control GUIs"
-date: 2020-08-30
-author: "ZanSara"
-featuredImage: "/publications/tucpr03.png"
----
-
-## Abstract
-
-For more than 15 years, operational GUIs for accelerator controls and some lab applications for equipment experts have been developed in Java, first with Swing and more recently with JavaFX. In March 2018, Oracle announced that Java GUIs were not part of their strategy anymore*. They will not ship JavaFX after Java 8 and there are hints that they would like to get rid of Swing as well. This was a wakeup call for us. We took the opportunity to reconsider all technical options for developing operational GUIs. Our options ranged from sticking with JavaFX, over using the Qt framework (either using PyQt or developing our own Java Bindings to Qt), to using Web technology both in a browser and in native desktop applications. This article explains the reasons for moving away from Java as the main GUI technology and describes the analysis and hands-on evaluations that we went through before choosing the replacement.
-
----
-
-Get the full text here: [Our Journey From Java to PyQt and Web For CERN Accelerator Control GUIs](/publications/tucpr03.pdf)
-
-Publisher's entry: [TUCPR03](https://accelconf.web.cern.ch/icalepcs2019/doi/JACoW-ICALEPCS2019-TUCPR03.html).
diff --git a/content/talks/2021-05-24-zanzocam-pavia.md b/content/talks/2021-05-24-zanzocam-pavia.md
deleted file mode 100644
index 25524fab..00000000
--- a/content/talks/2021-05-24-zanzocam-pavia.md
+++ /dev/null
@@ -1,19 +0,0 @@
----
-title: "ZanzoCam: An open-source alpine web camera"
-date: 2021-05-24
-author: "ZanSara"
-featuredImage: "/talks/2021-05-24-zanzocam-pavia.png"
----
-
-Slides: [ZanzoCam: An open-source alpine web camera](/talks/2021-05-24-zanzocam-pavia.pdf)
-
----
-
-On May 24th 2021 I held a talk about the [ZanzoCam project](https://zanzocam.github.io/en)
-as invited speaker for the ["Hardware and Software Codesign"](http://hsw2021.gnudd.com/) course at
-[Università di Pavia](https://portale.unipv.it/it).
-
-The slides go through the entire lifecycle of the [ZanzoCam project](https://zanzocam.github.io/en),
-from the very inception of it, the market research, our decision process, earlier prototypes, and
-then go into a more detailed explanation of the design and implementation of the project from
-a hardware and software perspective, with some notes about our financial situation and project management.
diff --git a/content/talks/2022-12-01-open-nlp-meetup.md b/content/talks/2022-12-01-open-nlp-meetup.md
deleted file mode 100644
index 085650d4..00000000
--- a/content/talks/2022-12-01-open-nlp-meetup.md
+++ /dev/null
@@ -1,41 +0,0 @@
----
-title: "OpenNLP Meetup: A Practical Introduction to Image Retrieval"
-date: 2022-12-01
-author: "ZanSara"
-featuredImage: "/talks/2022-12-01-open-nlp-meetup.png"
----
-
-[Youtube link](https://www.youtube.com/watch?v=7Idjl3OR0FY),
-[slides](https://gist.github.com/ZanSara/dc4b22e7ffe2a56647e0afba7537c46b), [Colab](https://gist.github.com/ZanSara/9e8557830cc866fcf43a2c5623688c74) (live coding).
-All the material can also be found [here](https://drive.google.com/drive/folders/1_3b8PsvykHeM0jSHsMUWQ-4h_VADutcX?usp=drive_link).
-
----
-
-{{< raw >}}
-
-
-
-{{< /raw >}}
-
-## A Practical Introduction to Image Retrieval
-
-*by Sara Zanzottera from deepset*
-
-Search should not be limited to text only. Recently, Transformers-based NLP models started crossing the boundaries of text data and exploring the possibilities of other modalities, like tabular data, images, audio files, and more. Text-to-text generation models like GPT now have their counterparts in text-to-image models, like Stable Diffusion. But what about search? In this talk we're going to experiment with CLIP, a text-to-image search model, to look for animals matching specific characteristics in a dataset of pictures. Does CLIP know which one is "The fastest animal in the world"?
-
----
-
-For the 7th [OpenNLP meetup](https://www.meetup.com/open-nlp-meetup/) I presented the topic of Image Retrieval, a feature that I've recently added to Haystack in the form of a [MultiModal Retriever](https://docs.haystack.deepset.ai/docs/retriever#multimodal-retrieval) (see the [Tutorial](https://haystack.deepset.ai/tutorials/19_text_to_image_search_pipeline_with_multimodal_retriever)).
-
-The talk consists of 5 parts:
-
-- An introduction of the topic of Image Retrieval
-- A mention of the current SOTA model (CLIP)
-- An overview of Haystack
-- A step-by-step description of how image retrieval applications can be implemented with Haystack
-- A live coding session where I start from a blank Colab notebook and build a fully working image retrieval system from the ground up, to the point where I can run queries live.
-
-Towards the end I briefly mention an even more advanced version of this image retrieval system, which I had no time to implement live. However, I later built a notebook implementing such a system, which you can find here: [Cheetah.ipynb](https://gist.github.com/ZanSara/31ed3fc8252bb74b1952f2d0fe253ed0)
-
-The slides were generated from the linked Jupyter notebook with `jupyter nbconvert Dec_1st_OpenNLP_Meetup.ipynb --to slides --post serve`.
-
diff --git a/content/talks/2023-08-03-office-hours-haystack-2.0-status.md b/content/talks/2023-08-03-office-hours-haystack-2.0-status.md
deleted file mode 100644
index 75d0c7cf..00000000
--- a/content/talks/2023-08-03-office-hours-haystack-2.0-status.md
+++ /dev/null
@@ -1,22 +0,0 @@
----
-title: "Office Hours: Haystack 2.0"
-date: 2023-08-03
-author: "ZanSara"
-featuredImage: "/talks/2023-08-03-office-hours-haystack-2.0-status.png"
----
-
-[Recording](https://drive.google.com/file/d/1PyAlvJ22Z6o1bls07Do5kx2WMTdotsM7/view?usp=drive_link), [slides](https://drive.google.com/file/d/1QFNisUk2HzwRL_27bpr338maxLvDBr9D/preview). All the material can also be found [here](https://drive.google.com/drive/folders/1zmXwxsSgqDgvYf2ptjHocdtzOroqaudw?usp=drive_link).
-
----
-
-{{< raw >}}
-
-
-
-{{< /raw >}}
-
-In this [Office Hours](https://discord.com/invite/VBpFzsgRVF) I presented for the first time to our Discord community a preview of the upcoming 2.0 release of Haystack, which has been in the works since the start of the year. As rumors started to arise about the presence of a `preview` module in the latest Haystack 1.x releases, we took the opportunity to share this early draft of the project and collect early feedback.
-
-Haystack 2.0 is a total rewrite that rethinks many of the core concepts of the framework and makes LLM support its primary concern, while making sure to support all the use cases its predecessor enabled. The rewrite addresses some well-known, old issues about the pipeline's design, the relationship between the pipeline, its components, and the document stores, and aims at drastically improving the developer experience and the framework's extensibility.
-
-As the main designer of this rewrite, I walked the community through a slightly re-hashed version of the slide deck I had presented internally just a few days earlier in an All Hands on the same topic.
\ No newline at end of file
diff --git a/content/talks/2023-10-12-office-hours-rag-pipelines.md b/content/talks/2023-10-12-office-hours-rag-pipelines.md
deleted file mode 100644
index a286e917..00000000
--- a/content/talks/2023-10-12-office-hours-rag-pipelines.md
+++ /dev/null
@@ -1,22 +0,0 @@
----
-title: "Office Hours: RAG Pipelines"
-date: 2023-10-12
-author: "ZanSara"
-featuredImage: "/talks/2023-10-12-office-hours-rag-pipelines.png"
----
-
-[Recording](https://drive.google.com/file/d/1UXGi4raiCQmrxOfOexL-Qh0CVbtiSm89/view?usp=drive_link), [notebook](https://gist.github.com/ZanSara/5975901eea972c126f8e1c2341686dfb). All the material can also be found [here](https://drive.google.com/drive/folders/17CIfoy6c4INs0O_X6YCa3CYXkjRvWm7X?usp=drive_link).
-
----
-
-{{< raw >}}
-
-
-
-{{< /raw >}}
-
-In this [Office Hours](https://discord.com/invite/VBpFzsgRVF) I walk through the LLM support offered by Haystack 2.0 to date: Generator, PromptBuilder, and how to connect them to different types of Retrievers to build Retrieval Augmented Generation (RAG) applications.
-
-In under 40 minutes we start from a simple query to ChatGPT up to a full pipeline that retrieves documents from the Internet, splits them into chunks and feeds them to an LLM to ground its replies.
-
-The talk also indirectly shows how Pipelines can help users compose these systems quickly, visualize them, and connect their different parts together thanks to the verbose error messages they produce.
\ No newline at end of file
diff --git a/content/talks/2023-12-15-datahour-rag.md b/content/talks/2023-12-15-datahour-rag.md
deleted file mode 100644
index 28223e63..00000000
--- a/content/talks/2023-12-15-datahour-rag.md
+++ /dev/null
@@ -1,29 +0,0 @@
----
-title: "DataHour: Optimizing LLMs with Retrieval Augmented Generation and Haystack 2.0"
-date: 2023-12-15
-author: "ZanSara"
-featuredImage: "/talks/2023-12-15-datahour-rag.png"
----
-
-[Recording](https://drive.google.com/file/d/1OkFr4u9ZOraJRF406IQgQh4YC8GLHbzA/view?usp=drive_link), [slides](https://drive.google.com/file/d/1n1tbiUW2wZPGC49WK9pYEIZlZuCER-hu/view?usp=sharing), [Colab](https://drive.google.com/file/d/17FXuS7X70UF02IYmOr-yEDQYg_gp9cFv/view?usp=sharing), [gist](https://gist.github.com/ZanSara/6075d418c1494e780f7098db32bc6cf6). All the material can also be found on [Analytics Vidhya's community](https://community.analyticsvidhya.com/c/datahour/optimizing-llms-with-retrieval-augmented-generation-and-haystack-2-0) and on [my backup](https://drive.google.com/drive/folders/1KwCEDTCsm9hrRaFUPHpzdTpVsOJSnvGk?usp=drive_link).
-
----
-
-{{< raw >}}
-
-
-
-{{< /raw >}}
-
-In this hour-long workshop organized by [Analytics Vidhya](https://www.analyticsvidhya.com/) I give an overview of what RAG is, what problems it solves, and how it works.
-
-After a brief introduction to Haystack, I show in practice how to use Haystack 2.0 to assemble a Pipeline that performs RAG on a local database and then on the Web with a simple change.
-
-I also mention how to use and implement custom Haystack components, and share a lot of resources on the topic of RAG and Haystack 2.0.
-
-This was my most popular talk to date, with over a hundred attendees watching live and several questions.
-
-Other resources mentioned in the talk are:
-- [Blog post about custom components](https://haystack.deepset.ai/blog/customizing-rag-to-summarize-hacker-news-posts-with-haystack2?utm_campaign=developer-relations&utm_source=data-hour-event&utm_medium=webinar)
-- [LLM structured output example](https://haystack.deepset.ai/tutorials/28_structured_output_with_loop?utm_campaign=developer-relations&utm_source=data-hour-event&utm_medium=webinar)
-- [Advent of Haystack](https://haystack.deepset.ai/advent-of-haystack?utm_campaign=developer-relations&utm_source=data-hour-event&utm_medium=webinar)
diff --git a/content/talks/2023-12-15-pointerpodcast-haystack.md b/content/talks/2023-12-15-pointerpodcast-haystack.md
deleted file mode 100644
index 3af81362..00000000
--- a/content/talks/2023-12-15-pointerpodcast-haystack.md
+++ /dev/null
@@ -1,26 +0,0 @@
----
-title: "Pointer[183]: Haystack, creare LLM Applications in modo facile"
-date: 2023-12-15
-author: "ZanSara"
-featuredImage: "/talks/2023-12-15-pointerpodcast-haystack.png"
----
-
-[Episode link](https://pointerpodcast.it/p/pointer183-haystack-creare-llm-applications-in-modo-facile-con-stefano-fiorucci-e-sara-zanzottera). Backup recording [here](https://drive.google.com/file/d/1BOoAhfvWou_J4J7RstgKAHPs3Pre2YAw/view?usp=sharing).
-
----
-
-_The podcast was recorded in Italian for [PointerPodcast](https://pointerpodcast.it) with [Luca Corbucci](https://www.linkedin.com/in/luca-corbucci-b6156a123/), [Eugenio Paluello](https://www.linkedin.com/in/eugenio-paluello-851b3280/) and [Stefano Fiorucci](https://www.linkedin.com/in/stefano-fiorucci/)._
-
----
-
-{{< raw >}}
-
-
-
-{{< /raw >}}
-
-To close 2023 in style, in this episode we talk about LLMs, which have been a central topic of the tech scene of the year that is about to end. We invited two experts in the field, Sara Zanzottera and Stefano Fiorucci.
-
-Both our guests work for deepset as NLP Engineers. deepset is the company behind Haystack, one of the best-known open-source frameworks for LLMs, which recently reached version 2.0 beta. Haystack itself was one of the topics we covered with our guests, trying to understand its potential.
-
-But is it possible to work on a framework of this kind while also staying up to date with the news of a constantly evolving field? This is one of the many questions Sara and Stefano answered. Interested in the world of LLMs? Don't miss this episode!
\ No newline at end of file
diff --git a/content/talks/2024-04-25-odsc-east-rag.md b/content/talks/2024-04-25-odsc-east-rag.md
deleted file mode 100644
index acaec2ed..00000000
--- a/content/talks/2024-04-25-odsc-east-rag.md
+++ /dev/null
@@ -1,26 +0,0 @@
----
-title: "ODSC East: RAG, the bad parts (and the good!)"
-date: 2024-04-25
-author: "ZanSara"
-featuredImage: "/talks/2024-04-25-odsc-east-rag.png"
----
-
-[Announcement](https://odsc.com/speakers/rag-the-bad-parts-and-the-good-building-a-deeper-understanding-of-this-hot-llm-paradigms-weaknesses-strengths-and-limitations/),
-[slides](https://drive.google.com/file/d/19EDFCqOiAo9Cvx5fxx6Wq1Z-EoMKwxbs/view?usp=sharing).
-Did you miss the talk? Check out the [write-up](/posts/2024-04-29-odsc-east-rag).
-
----
-
-At [ODSC East 2024](https://odsc.com/boston/) I talked about RAG: how it works, how it fails, and how to evaluate its performance objectively. I gave an overview of some useful open-source tools for RAG evaluation and how to use them with [Haystack](https://haystack.deepset.ai/?utm_campaign=odsc-east), and then offered some ideas on how to expand your RAG architecture further than a simple two-step process.
-
-Some resources mentioned in the talk:
-
-- Haystack: open-source LLM framework for RAG and beyond: [https://haystack.deepset.ai/](https://haystack.deepset.ai/?utm_campaign=odsc-east)
-- Build and evaluate RAG with Haystack: [https://haystack.deepset.ai/tutorials/35_model_based_evaluation_of_rag_pipelines](https://haystack.deepset.ai/tutorials/35_model_based_evaluation_of_rag_pipelines/?utm_campaign=odsc-east)
-- Evaluating LLMs with UpTrain: https://docs.uptrain.ai/getting-started/introduction
-- Evaluating RAG end-to-end with RAGAS: https://docs.ragas.io/en/latest/
-- Semantic Answer Similarity (SAS) metric: https://docs.ragas.io/en/latest/concepts/metrics/semantic_similarity.html
-- Answer Correctness metric: https://docs.ragas.io/en/latest/concepts/metrics/answer_correctness.html
-- Perplexity.ai: https://www.perplexity.ai/
-
-Plus, shout-out to a very interesting LLM evaluation library I discovered at ODSC: [continuous-eval](https://docs.relari.ai/v0.3). Worth checking out especially if SAS or answer correctness are too vague and high level for your domain.
diff --git a/content/talks/2024-07-10-europython-rag.md b/content/talks/2024-07-10-europython-rag.md
deleted file mode 100644
index 860088d7..00000000
--- a/content/talks/2024-07-10-europython-rag.md
+++ /dev/null
@@ -1,30 +0,0 @@
----
-title: "EuroPython: Is RAG all you need? A look at the limits of retrieval augmented generation"
-date: 2024-07-10
-author: "ZanSara"
-featuredImage: "/talks/2024-07-10-europython-rag.png"
----
-
-[Announcement](https://ep2024.europython.eu/session/is-rag-all-you-need-a-look-at-the-limits-of-retrieval-augmented-generation),
-[slides](https://drive.google.com/file/d/13OXMLaBQr1I_za7sqVHJWxRj5xFAg7KV/view?usp=sharing).
-Did you miss the talk? Check out the recording on [Youtube](https://youtu.be/9wk7mGB_Gp4?feature=shared)
-or on my [backup](https://drive.google.com/file/d/1OkYQ7WMt63QkdJTU3GIpSxBZmnLfZti6/view?usp=sharing) (cut from the [original stream](https://www.youtube.com/watch?v=tcXmnCJIvFc)),
-or read the [write-up](/posts/2024-04-29-odsc-east-rag) of a previous edition of the same talk.
-
----
-
-{{< raw >}}
-
-
-
-{{< /raw >}}
-
-At [EuroPython 2024](https://ep2024.europython.eu/) I talked about RAG: how it works, how it fails, and how to evaluate its performance objectively. I gave an overview of some useful open-source tools for RAG evaluation such as [continuous-eval](https://docs.relari.ai/v0.3?utm_campaign=europython-2024) and how to use them with [Haystack](https://haystack.deepset.ai/?utm_campaign=europython-2024), and then offered some ideas on how to expand your RAG architecture further than a simple two-step process.
-
-Some resources mentioned in the talk:
-
-- [Haystack](https://haystack.deepset.ai/?utm_campaign=europython-2024): open-source LLM framework for RAG and beyond.
-- [continuous-eval](https://docs.relari.ai/v0.3?utm_campaign=europython-2024) by [Relari AI](https://www.relari.ai/?utm_campaign=europython-2024).
-- Build and evaluate RAG with Haystack: [https://haystack.deepset.ai/tutorials/35_model_based_evaluation_of_rag_pipelines](https://haystack.deepset.ai/tutorials/35_model_based_evaluation_of_rag_pipelines/?utm_campaign=europython-2024)
-- Use continuous-eval with Haystack:
-- Perplexity.ai: https://www.perplexity.ai/
diff --git a/content/talks/2024-09-05-odsc-europe-voice-agents.md b/content/talks/2024-09-05-odsc-europe-voice-agents.md
deleted file mode 100644
index 8c4643a7..00000000
--- a/content/talks/2024-09-05-odsc-europe-voice-agents.md
+++ /dev/null
@@ -1,32 +0,0 @@
----
-title: "ODSC Europe: Building Reliable Voice Agents with Open Source tools"
-date: 2024-09-06
-author: "ZanSara"
-featuredImage: "/talks/2024-09-06-odsc-europe-voice-agents.png"
----
-
-[Announcement](https://odsc.com/speakers/building-reliable-voice-agents-with-open-source-tools-2/),
-[slides](https://drive.google.com/file/d/1ubk7Q_l9C7epQgYrMttHMjW1AVfdm-LT/view?usp=sharing) and
-[notebook](https://colab.research.google.com/drive/1NCAAs8RB2FuqMChFKMIVWV0RiJr9O3IJ?usp=sharing).
-All resources can also be found on ODSC's website and in
-[my archive](https://drive.google.com/drive/folders/1rrXMTbfTZVuq9pMzneC8j-5GKdRQ6l2i?usp=sharing).
-Did you miss the talk? Check out the write-up [here](/posts/2024-09-05-odsc-europe-voice-agents-part-1/).
-
----
-
-_(Note: this is a recording of the notebook walkthrough only. The full recording will be shared soon)._
-
-{{< raw >}}
-
-
-
-{{< /raw >}}
-
-
-At [ODSC Europe 2024](https://odsc.com/europe/) I talked about building modern and reliable voice bots using Pipecat,
-a recently released open source tool. I gave an overview of the general structure of voice bots, of the improvements
-their underlying tech recently saw, and the new challenges that developers face when implementing one of these systems.
-
-The main highlight of the talk is the [notebook](https://colab.research.google.com/drive/1NCAAs8RB2FuqMChFKMIVWV0RiJr9O3IJ?usp=drive_link)
-where I implement first a simple Pipecat bot from scratch, and then I give an overview of how to blend intent detection
-and system prompt switching to improve our control of how LLM bots interact with users.
diff --git a/content/talks/2024-09-18-amta-2024-controlling-invariants-rag.md b/content/talks/2024-09-18-amta-2024-controlling-invariants-rag.md
deleted file mode 100644
index 3e726935..00000000
--- a/content/talks/2024-09-18-amta-2024-controlling-invariants-rag.md
+++ /dev/null
@@ -1,22 +0,0 @@
----
-title: "AMTA 2024 Virtual Tutorial Day: Controlling LLM Translations of Invariant Elements with RAG"
-date: 2024-09-18
-author: "ZanSara"
-featuredImage: "/talks/2024-09-18-amta-2024-controlling-invariants-rag.png"
----
-
-[Announcement](https://amtaweb.org/virtual-tutorial-day-program/),
-[notebook](https://colab.research.google.com/drive/1VMgK3DcVny_zTtAG_V3QSSdfSFBWAgmb?usp=sharing) and
-[glossary](https://docs.google.com/spreadsheets/d/1A1zk-u-RTSqBfE8LksZxihnp7KxWO7YK/edit?usp=sharing&ouid=102297935451395786183&rtpof=true&sd=true).
-All resources can also be found in
-[my archive](https://drive.google.com/drive/folders/1Tdq92P_E_77sErGjz7jSPfJ-or9UZXvn?usp=drive_link).
-
----
-
-_Recording coming soon._
-
----
-
-At the [AMTA 2024 Virtual Tutorial Day](https://amtaweb.org/virtual-tutorial-day-program/) I talked about controlling invariant translation elements with RAG. During the talk several speakers intervened on the topic, each bringing a different perspective on it.
-
-[Georg Kirchner](https://www.linkedin.com/in/georgkirchner/) introduced the concept of invariant translation elements, such as brand names, UI elements, and corporate slogans. [Christian Lang](https://www.linkedin.com/in/christian-lang-8942b0145/) gave a comprehensive overview of the challenges of handling invariant translation elements with existing tools and how LLMs can help at various stages of the translation, covering several approaches, including RAG. Building on his overview, I showed how to implement a simple RAG system to handle these invariants properly using [Haystack](https://haystack.deepset.ai/?utm_campaign=amta-2024): we ran a [Colab notebook](https://colab.research.google.com/drive/1VMgK3DcVny_zTtAG_V3QSSdfSFBWAgmb?usp=sharing) live and checked how the translation changes when context about the invariants is introduced to the LLM making the translation. Lastly, [Bruno Bitter](https://www.linkedin.com/in/brunobitter/) gave an overview of how you can use [Blackbird](https://www.blackbird.io/) to integrate a system like this with existing CAT tools and manage the whole lifecycle of content translation.
diff --git a/content/talks/2024-10-01-snail-opening-day-should-i-use-an-llm-framework.md b/content/talks/2024-10-01-snail-opening-day-should-i-use-an-llm-framework.md
deleted file mode 100644
index c23bfb45..00000000
--- a/content/talks/2024-10-01-snail-opening-day-should-i-use-an-llm-framework.md
+++ /dev/null
@@ -1,27 +0,0 @@
----
-title: "SNAIL Opening Day: Should I use an LLM Framework? (Private Event)"
-date: 2024-10-01
-author: "ZanSara"
-featuredImage: "/talks/2024-10-01-snail-opening-day-should-i-use-an-llm-framework.png"
----
-
-[Slides](https://drive.google.com/file/d/1GQJ1qEY2hXQ6EBF-rtqzJqZzidfS7HfI/view?usp=sharing),
-[notebook](https://colab.research.google.com/drive/11aOq-43wEWhSlxtkdXEAwPEarC0IQ3eN?usp=sharing), [RAG dataset](https://huggingface.co/datasets/ZanSara/seven-wonders) and [evaluation dataset](https://huggingface.co/datasets/ZanSara/seven-wonders-eval)
-All resources can also be found in
-[my archive](https://drive.google.com/drive/folders/1anl3adpxgbwq5nsFn8QXuofIWXX0jRKo?usp=sharing).
-
----
-
-{{< raw >}}
-
-
-
-{{< /raw >}}
-
-Find the transcript [here](https://drive.google.com/file/d/1wwnTFmGOANVmxUaVd1PC3cfztzIfSCEa/view?usp=sharing).
-
----
-
-For the [Springer Nature](https://group.springernature.com/gp/group) AI Lab Opening Day I talk about LLM frameworks: what they are, when they can be useful, and how to choose and compare one framework to the other.
-
-After an overview of six application frameworks ([LangChain](https://www.langchain.com/), [LlamaIndex](https://www.llamaindex.ai/), [Haystack](https://haystack.deepset.ai/), [txtai](https://neuml.github.io/txtai/), [DSPy](https://dspy-docs.vercel.app/) and [CrewAI](https://www.crewai.com/)), we run a notebook where we use [RAGAS](https://docs.ragas.io/en/latest/) to compare four small RAG applications and see which one performs better.
diff --git a/content/talks/2024-10-30-odsc-west-voice-agents.md b/content/talks/2024-10-30-odsc-west-voice-agents.md
deleted file mode 100644
index 6d6aa292..00000000
--- a/content/talks/2024-10-30-odsc-west-voice-agents.md
+++ /dev/null
@@ -1,6 +0,0 @@
----
-title: "[UPCOMING] ODSC West: Building Reliable Voice Agents with Open Source tools"
-date: 2024-10-30
-author: "ZanSara"
-externalLink: https://odsc.com/speakers/building-reliable-voice-agents-with-open-source-tools-2/
----
diff --git a/themes/hugo-coder/static/css/coder-dark.css b/css/coder-dark.css
similarity index 100%
rename from themes/hugo-coder/static/css/coder-dark.css
rename to css/coder-dark.css
diff --git a/themes/hugo-coder/static/css/coder.css b/css/coder.css
similarity index 100%
rename from themes/hugo-coder/static/css/coder.css
rename to css/coder.css
diff --git a/static/favicon.ico b/favicon.ico
similarity index 100%
rename from static/favicon.ico
rename to favicon.ico
diff --git a/static/favicon.png b/favicon.png
similarity index 100%
rename from static/favicon.png
rename to favicon.png
diff --git a/static/favicon.svg b/favicon.svg
similarity index 100%
rename from static/favicon.svg
rename to favicon.svg
diff --git a/themes/hugo-coder/static/fonts/forkawesome-webfont.eot b/fonts/forkawesome-webfont.eot
similarity index 100%
rename from themes/hugo-coder/static/fonts/forkawesome-webfont.eot
rename to fonts/forkawesome-webfont.eot
diff --git a/themes/hugo-coder/static/fonts/forkawesome-webfont.svg b/fonts/forkawesome-webfont.svg
similarity index 100%
rename from themes/hugo-coder/static/fonts/forkawesome-webfont.svg
rename to fonts/forkawesome-webfont.svg
diff --git a/themes/hugo-coder/static/fonts/forkawesome-webfont.ttf b/fonts/forkawesome-webfont.ttf
similarity index 100%
rename from themes/hugo-coder/static/fonts/forkawesome-webfont.ttf
rename to fonts/forkawesome-webfont.ttf
diff --git a/themes/hugo-coder/static/fonts/forkawesome-webfont.woff b/fonts/forkawesome-webfont.woff
similarity index 100%
rename from themes/hugo-coder/static/fonts/forkawesome-webfont.woff
rename to fonts/forkawesome-webfont.woff
diff --git a/themes/hugo-coder/static/fonts/forkawesome-webfont.woff2 b/fonts/forkawesome-webfont.woff2
similarity index 100%
rename from themes/hugo-coder/static/fonts/forkawesome-webfont.woff2
rename to fonts/forkawesome-webfont.woff2
diff --git a/static/posts/2024-05-06-teranoptia/teranoptia/COPYRIGHT.md b/fonts/teranoptia/COPYRIGHT.md
similarity index 100%
rename from static/posts/2024-05-06-teranoptia/teranoptia/COPYRIGHT.md
rename to fonts/teranoptia/COPYRIGHT.md
diff --git a/static/posts/2024-05-06-teranoptia/teranoptia/LICENSE.txt b/fonts/teranoptia/LICENSE.txt
similarity index 100%
rename from static/posts/2024-05-06-teranoptia/teranoptia/LICENSE.txt
rename to fonts/teranoptia/LICENSE.txt
diff --git a/static/posts/2024-05-06-teranoptia/teranoptia/METADATA.yml b/fonts/teranoptia/METADATA.yml
similarity index 100%
rename from static/posts/2024-05-06-teranoptia/teranoptia/METADATA.yml
rename to fonts/teranoptia/METADATA.yml
diff --git a/static/posts/2024-05-06-teranoptia/teranoptia/README.md b/fonts/teranoptia/README.md
similarity index 100%
rename from static/posts/2024-05-06-teranoptia/teranoptia/README.md
rename to fonts/teranoptia/README.md
diff --git a/static/posts/2024-05-06-teranoptia/teranoptia/TRADEMARKS.md b/fonts/teranoptia/TRADEMARKS.md
similarity index 100%
rename from static/posts/2024-05-06-teranoptia/teranoptia/TRADEMARKS.md
rename to fonts/teranoptia/TRADEMARKS.md
diff --git a/static/posts/2024-05-06-teranoptia/teranoptia/documentation/specimen/teranopia-specimen-01.png b/fonts/teranoptia/documentation/specimen/teranopia-specimen-01.png
similarity index 100%
rename from static/posts/2024-05-06-teranoptia/teranoptia/documentation/specimen/teranopia-specimen-01.png
rename to fonts/teranoptia/documentation/specimen/teranopia-specimen-01.png
diff --git a/static/posts/2024-05-06-teranoptia/teranoptia/documentation/specimen/teranopia-specimen-02.png b/fonts/teranoptia/documentation/specimen/teranopia-specimen-02.png
similarity index 100%
rename from static/posts/2024-05-06-teranoptia/teranoptia/documentation/specimen/teranopia-specimen-02.png
rename to fonts/teranoptia/documentation/specimen/teranopia-specimen-02.png
diff --git a/static/posts/2024-05-06-teranoptia/teranoptia/documentation/specimen/teranopia-specimen-03.png b/fonts/teranoptia/documentation/specimen/teranopia-specimen-03.png
similarity index 100%
rename from static/posts/2024-05-06-teranoptia/teranoptia/documentation/specimen/teranopia-specimen-03.png
rename to fonts/teranoptia/documentation/specimen/teranopia-specimen-03.png
diff --git a/static/posts/2024-05-06-teranoptia/teranoptia/documentation/specimen/teranopia-specimen-04.png b/fonts/teranoptia/documentation/specimen/teranopia-specimen-04.png
similarity index 100%
rename from static/posts/2024-05-06-teranoptia/teranoptia/documentation/specimen/teranopia-specimen-04.png
rename to fonts/teranoptia/documentation/specimen/teranopia-specimen-04.png
diff --git a/static/posts/2024-05-06-teranoptia/teranoptia/documentation/teranoptia-specimen-print.pdf b/fonts/teranoptia/documentation/teranoptia-specimen-print.pdf
similarity index 100%
rename from static/posts/2024-05-06-teranoptia/teranoptia/documentation/teranoptia-specimen-print.pdf
rename to fonts/teranoptia/documentation/teranoptia-specimen-print.pdf
diff --git a/static/posts/2024-05-06-teranoptia/teranoptia/documentation/teranoptia-specimen-web.pdf b/fonts/teranoptia/documentation/teranoptia-specimen-web.pdf
similarity index 100%
rename from static/posts/2024-05-06-teranoptia/teranoptia/documentation/teranoptia-specimen-web.pdf
rename to fonts/teranoptia/documentation/teranoptia-specimen-web.pdf
diff --git a/static/posts/2024-05-06-teranoptia/teranoptia/fonts/Teranoptia-Furiae.otf b/fonts/teranoptia/fonts/Teranoptia-Furiae.otf
similarity index 100%
rename from static/posts/2024-05-06-teranoptia/teranoptia/fonts/Teranoptia-Furiae.otf
rename to fonts/teranoptia/fonts/Teranoptia-Furiae.otf
diff --git a/static/posts/2024-05-06-teranoptia/teranoptia/fonts/Teranoptia-Furiae.ttf b/fonts/teranoptia/fonts/Teranoptia-Furiae.ttf
similarity index 100%
rename from static/posts/2024-05-06-teranoptia/teranoptia/fonts/Teranoptia-Furiae.ttf
rename to fonts/teranoptia/fonts/Teranoptia-Furiae.ttf
diff --git a/static/posts/2024-05-06-teranoptia/teranoptia/fonts/web/Teranoptia-Furiae.woff b/fonts/teranoptia/fonts/web/Teranoptia-Furiae.woff
similarity index 100%
rename from static/posts/2024-05-06-teranoptia/teranoptia/fonts/web/Teranoptia-Furiae.woff
rename to fonts/teranoptia/fonts/web/Teranoptia-Furiae.woff
diff --git a/static/posts/2024-05-06-teranoptia/teranoptia/fonts/web/Teranoptia-Furiae.woff2 b/fonts/teranoptia/fonts/web/Teranoptia-Furiae.woff2
similarity index 100%
rename from static/posts/2024-05-06-teranoptia/teranoptia/fonts/web/Teranoptia-Furiae.woff2
rename to fonts/teranoptia/fonts/web/Teranoptia-Furiae.woff2
diff --git a/static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/features.fea b/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/features.fea
similarity index 100%
rename from static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/features.fea
rename to fonts/teranoptia/sources/Teranoptia-Furiae.ufo/features.fea
diff --git a/static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/fontinfo.plist b/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/fontinfo.plist
similarity index 100%
rename from static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/fontinfo.plist
rename to fonts/teranoptia/sources/Teranoptia-Furiae.ufo/fontinfo.plist
diff --git a/static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/A_.glif b/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/A_.glif
similarity index 100%
rename from static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/A_.glif
rename to fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/A_.glif
diff --git a/static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/B_.glif b/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/B_.glif
similarity index 100%
rename from static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/B_.glif
rename to fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/B_.glif
diff --git a/static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/C_.glif b/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/C_.glif
similarity index 100%
rename from static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/C_.glif
rename to fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/C_.glif
diff --git a/static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/D_.glif b/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/D_.glif
similarity index 100%
rename from static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/D_.glif
rename to fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/D_.glif
diff --git a/static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/E_.glif b/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/E_.glif
similarity index 100%
rename from static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/E_.glif
rename to fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/E_.glif
diff --git a/static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/E_uro.glif b/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/E_uro.glif
similarity index 100%
rename from static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/E_uro.glif
rename to fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/E_uro.glif
diff --git a/static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/F_.glif b/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/F_.glif
similarity index 100%
rename from static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/F_.glif
rename to fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/F_.glif
diff --git a/static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/G_.glif b/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/G_.glif
similarity index 100%
rename from static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/G_.glif
rename to fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/G_.glif
diff --git a/static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/H_.glif b/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/H_.glif
similarity index 100%
rename from static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/H_.glif
rename to fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/H_.glif
diff --git a/static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/I_.glif b/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/I_.glif
similarity index 100%
rename from static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/I_.glif
rename to fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/I_.glif
diff --git a/static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/J_.glif b/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/J_.glif
similarity index 100%
rename from static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/J_.glif
rename to fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/J_.glif
diff --git a/static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/K_.glif b/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/K_.glif
similarity index 100%
rename from static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/K_.glif
rename to fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/K_.glif
diff --git a/static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/L_.glif b/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/L_.glif
similarity index 100%
rename from static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/L_.glif
rename to fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/L_.glif
diff --git a/static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/M_.glif b/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/M_.glif
similarity index 100%
rename from static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/M_.glif
rename to fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/M_.glif
diff --git a/static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/N_.glif b/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/N_.glif
similarity index 100%
rename from static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/N_.glif
rename to fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/N_.glif
diff --git a/static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/O_.glif b/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/O_.glif
similarity index 100%
rename from static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/O_.glif
rename to fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/O_.glif
diff --git a/static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/P_.glif b/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/P_.glif
similarity index 100%
rename from static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/P_.glif
rename to fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/P_.glif
diff --git a/static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/Q_.glif b/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/Q_.glif
similarity index 100%
rename from static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/Q_.glif
rename to fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/Q_.glif
diff --git a/static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/R_.glif b/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/R_.glif
similarity index 100%
rename from static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/R_.glif
rename to fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/R_.glif
diff --git a/static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/S_.glif b/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/S_.glif
similarity index 100%
rename from static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/S_.glif
rename to fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/S_.glif
diff --git a/static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/T_.glif b/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/T_.glif
similarity index 100%
rename from static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/T_.glif
rename to fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/T_.glif
diff --git a/static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/U_.glif b/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/U_.glif
similarity index 100%
rename from static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/U_.glif
rename to fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/U_.glif
diff --git a/static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/V_.glif b/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/V_.glif
similarity index 100%
rename from static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/V_.glif
rename to fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/V_.glif
diff --git a/static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/W_.glif b/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/W_.glif
similarity index 100%
rename from static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/W_.glif
rename to fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/W_.glif
diff --git a/static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/X_.glif b/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/X_.glif
similarity index 100%
rename from static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/X_.glif
rename to fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/X_.glif
diff --git a/static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/Y_.glif b/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/Y_.glif
similarity index 100%
rename from static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/Y_.glif
rename to fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/Y_.glif
diff --git a/static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/Z_.glif b/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/Z_.glif
similarity index 100%
rename from static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/Z_.glif
rename to fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/Z_.glif
diff --git a/static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/Z_acute.glif b/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/Z_acute.glif
similarity index 100%
rename from static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/Z_acute.glif
rename to fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/Z_acute.glif
diff --git a/static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/Z_caron.glif b/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/Z_caron.glif
similarity index 100%
rename from static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/Z_caron.glif
rename to fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/Z_caron.glif
diff --git a/static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/Z_dotaccent.glif b/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/Z_dotaccent.glif
similarity index 100%
rename from static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/Z_dotaccent.glif
rename to fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/Z_dotaccent.glif
diff --git a/static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/_notdef.glif b/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/_notdef.glif
similarity index 100%
rename from static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/_notdef.glif
rename to fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/_notdef.glif
diff --git a/static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/a.glif b/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/a.glif
similarity index 100%
rename from static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/a.glif
rename to fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/a.glif
diff --git a/static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/asterisk.glif b/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/asterisk.glif
similarity index 100%
rename from static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/asterisk.glif
rename to fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/asterisk.glif
diff --git a/static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/b.glif b/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/b.glif
similarity index 100%
rename from static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/b.glif
rename to fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/b.glif
diff --git a/static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/braceleft.glif b/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/braceleft.glif
similarity index 100%
rename from static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/braceleft.glif
rename to fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/braceleft.glif
diff --git a/static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/braceright.glif b/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/braceright.glif
similarity index 100%
rename from static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/braceright.glif
rename to fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/braceright.glif
diff --git a/static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/bracketleft.glif b/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/bracketleft.glif
similarity index 100%
rename from static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/bracketleft.glif
rename to fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/bracketleft.glif
diff --git a/static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/bracketright.glif b/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/bracketright.glif
similarity index 100%
rename from static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/bracketright.glif
rename to fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/bracketright.glif
diff --git a/static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/c.glif b/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/c.glif
similarity index 100%
rename from static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/c.glif
rename to fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/c.glif
diff --git a/static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/comma.glif b/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/comma.glif
similarity index 100%
rename from static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/comma.glif
rename to fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/comma.glif
diff --git a/static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/contents.plist b/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/contents.plist
similarity index 100%
rename from static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/contents.plist
rename to fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/contents.plist
diff --git a/static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/d.glif b/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/d.glif
similarity index 100%
rename from static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/d.glif
rename to fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/d.glif
diff --git a/static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/dollar.glif b/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/dollar.glif
similarity index 100%
rename from static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/dollar.glif
rename to fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/dollar.glif
diff --git a/static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/e.glif b/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/e.glif
similarity index 100%
rename from static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/e.glif
rename to fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/e.glif
diff --git a/static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/f.glif b/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/f.glif
similarity index 100%
rename from static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/f.glif
rename to fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/f.glif
diff --git a/static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/g.glif b/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/g.glif
similarity index 100%
rename from static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/g.glif
rename to fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/g.glif
diff --git a/static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/guillemotleft.glif b/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/guillemotleft.glif
similarity index 100%
rename from static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/guillemotleft.glif
rename to fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/guillemotleft.glif
diff --git a/static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/guillemotright.glif b/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/guillemotright.glif
similarity index 100%
rename from static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/guillemotright.glif
rename to fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/guillemotright.glif
diff --git a/static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/guilsinglleft.glif b/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/guilsinglleft.glif
similarity index 100%
rename from static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/guilsinglleft.glif
rename to fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/guilsinglleft.glif
diff --git a/static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/guilsinglright.glif b/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/guilsinglright.glif
similarity index 100%
rename from static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/guilsinglright.glif
rename to fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/guilsinglright.glif
diff --git a/static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/h.glif b/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/h.glif
similarity index 100%
rename from static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/h.glif
rename to fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/h.glif
diff --git a/static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/hyphen.glif b/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/hyphen.glif
similarity index 100%
rename from static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/hyphen.glif
rename to fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/hyphen.glif
diff --git a/static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/i.glif b/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/i.glif
similarity index 100%
rename from static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/i.glif
rename to fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/i.glif
diff --git a/static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/j.glif b/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/j.glif
similarity index 100%
rename from static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/j.glif
rename to fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/j.glif
diff --git a/static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/k.glif b/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/k.glif
similarity index 100%
rename from static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/k.glif
rename to fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/k.glif
diff --git a/static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/l.glif b/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/l.glif
similarity index 100%
rename from static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/l.glif
rename to fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/l.glif
diff --git a/static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/m.glif b/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/m.glif
similarity index 100%
rename from static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/m.glif
rename to fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/m.glif
diff --git a/static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/n.glif b/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/n.glif
similarity index 100%
rename from static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/n.glif
rename to fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/n.glif
diff --git a/static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/o.glif b/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/o.glif
similarity index 100%
rename from static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/o.glif
rename to fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/o.glif
diff --git a/static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/p.glif b/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/p.glif
similarity index 100%
rename from static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/p.glif
rename to fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/p.glif
diff --git a/static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/parenleft.glif b/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/parenleft.glif
similarity index 100%
rename from static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/parenleft.glif
rename to fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/parenleft.glif
diff --git a/static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/parenright.glif b/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/parenright.glif
similarity index 100%
rename from static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/parenright.glif
rename to fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/parenright.glif
diff --git a/static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/q.glif b/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/q.glif
similarity index 100%
rename from static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/q.glif
rename to fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/q.glif
diff --git a/static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/quotedblleft.glif b/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/quotedblleft.glif
similarity index 100%
rename from static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/quotedblleft.glif
rename to fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/quotedblleft.glif
diff --git a/static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/quotedblright.glif b/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/quotedblright.glif
similarity index 100%
rename from static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/quotedblright.glif
rename to fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/quotedblright.glif
diff --git a/static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/quoteleft.glif b/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/quoteleft.glif
similarity index 100%
rename from static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/quoteleft.glif
rename to fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/quoteleft.glif
diff --git a/static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/quoteright.glif b/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/quoteright.glif
similarity index 100%
rename from static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/quoteright.glif
rename to fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/quoteright.glif
diff --git a/static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/r.glif b/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/r.glif
similarity index 100%
rename from static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/r.glif
rename to fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/r.glif
diff --git a/static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/s.glif b/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/s.glif
similarity index 100%
rename from static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/s.glif
rename to fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/s.glif
diff --git a/static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/space.glif b/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/space.glif
similarity index 100%
rename from static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/space.glif
rename to fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/space.glif
diff --git a/static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/t.glif b/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/t.glif
similarity index 100%
rename from static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/t.glif
rename to fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/t.glif
diff --git a/static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/tainome.glif b/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/tainome.glif
similarity index 100%
rename from static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/tainome.glif
rename to fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/tainome.glif
diff --git a/static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/u.glif b/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/u.glif
similarity index 100%
rename from static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/u.glif
rename to fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/u.glif
diff --git a/static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/uni1E_05.glif b/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/uni1E_05.glif
similarity index 100%
rename from static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/uni1E_05.glif
rename to fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/uni1E_05.glif
diff --git a/static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/uni1E_8A_.glif b/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/uni1E_8A_.glif
similarity index 100%
rename from static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/uni1E_8A_.glif
rename to fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/uni1E_8A_.glif
diff --git a/static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/v.glif b/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/v.glif
similarity index 100%
rename from static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/v.glif
rename to fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/v.glif
diff --git a/static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/w.glif b/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/w.glif
similarity index 100%
rename from static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/w.glif
rename to fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/w.glif
diff --git a/static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/x.glif b/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/x.glif
similarity index 100%
rename from static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/x.glif
rename to fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/x.glif
diff --git a/static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/y.glif b/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/y.glif
similarity index 100%
rename from static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/y.glif
rename to fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/y.glif
diff --git a/static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/z.glif b/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/z.glif
similarity index 100%
rename from static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/z.glif
rename to fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/z.glif
diff --git a/static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/zacute.glif b/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/zacute.glif
similarity index 100%
rename from static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/zacute.glif
rename to fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/zacute.glif
diff --git a/static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/zcaron.glif b/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/zcaron.glif
similarity index 100%
rename from static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/zcaron.glif
rename to fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/zcaron.glif
diff --git a/static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/zdotaccent.glif b/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/zdotaccent.glif
similarity index 100%
rename from static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/zdotaccent.glif
rename to fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/zdotaccent.glif
diff --git a/static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/layercontents.plist b/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/layercontents.plist
similarity index 100%
rename from static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/layercontents.plist
rename to fonts/teranoptia/sources/Teranoptia-Furiae.ufo/layercontents.plist
diff --git a/static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/lib.plist b/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/lib.plist
similarity index 100%
rename from static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/lib.plist
rename to fonts/teranoptia/sources/Teranoptia-Furiae.ufo/lib.plist
diff --git a/static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/metainfo.plist b/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/metainfo.plist
similarity index 100%
rename from static/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/metainfo.plist
rename to fonts/teranoptia/sources/Teranoptia-Furiae.ufo/metainfo.plist
diff --git a/static/posts/2024-05-06-teranoptia/teranoptia/sources/teranoptia.glyphs b/fonts/teranoptia/sources/teranoptia.glyphs
similarity index 100%
rename from static/posts/2024-05-06-teranoptia/teranoptia/sources/teranoptia.glyphs
rename to fonts/teranoptia/sources/teranoptia.glyphs
diff --git a/index.html b/index.html
new file mode 100644
index 00000000..51338c18
--- /dev/null
+++ b/index.html
@@ -0,0 +1,286 @@
+ Sara Zan
+
+ Lead AI Engineer at Kwal,
+ main contributor of Haystack,
+ former CERN employee.
+ I’m also an opinionated sci-fi reader, hiker, tinkerer and somewhat polyglot. Currently busy trying to learn Portuguese and Hungarian at the same time.
+
diff --git a/index.xml b/index.xml
new file mode 100644
index 00000000..d64162d3
--- /dev/null
+++ b/index.xml
@@ -0,0 +1,4642 @@
+
+
+
+ Sara Zan
+ https://www.zansara.dev/
+ Recent content on Sara Zan
+ Hugo -- gohugo.io
+ en
+ Wed, 30 Oct 2024 00:00:00 +0000
+
+ [UPCOMING] ODSC West: Building Reliable Voice Agents with Open Source tools
+ https://www.zansara.dev/talks/2024-10-30-odsc-west-voice-agents/
+ Wed, 30 Oct 2024 00:00:00 +0000
+
+ https://www.zansara.dev/talks/2024-10-30-odsc-west-voice-agents/
+
+
+
+
+ SNAIL Opening Day: Should I use an LLM Framework? (Private Event)
+ https://www.zansara.dev/talks/2024-10-01-snail-opening-day-should-i-use-an-llm-framework/
+ Tue, 01 Oct 2024 00:00:00 +0000
+
+ https://www.zansara.dev/talks/2024-10-01-snail-opening-day-should-i-use-an-llm-framework/
+ <p><a href="https://drive.google.com/file/d/1GQJ1qEY2hXQ6EBF-rtqzJqZzidfS7HfI/view?usp=sharing" class="external-link" target="_blank" rel="noopener">Slides</a>,
+<a href="https://colab.research.google.com/drive/11aOq-43wEWhSlxtkdXEAwPEarC0IQ3eN?usp=sharing" class="external-link" target="_blank" rel="noopener">notebook</a>, <a href="https://huggingface.co/datasets/ZanSara/seven-wonders" class="external-link" target="_blank" rel="noopener">RAG dataset</a> and <a href="https://huggingface.co/datasets/ZanSara/seven-wonders-eval" class="external-link" target="_blank" rel="noopener">evaluation dataset</a>
+All resources can also be found in
+<a href="https://drive.google.com/drive/folders/1anl3adpxgbwq5nsFn8QXuofIWXX0jRKo?usp=sharing" class="external-link" target="_blank" rel="noopener">my archive</a>.</p>
+<hr>
+
+
+<div class='iframe-wrapper'>
+<iframe src="https://drive.google.com/file/d/1AORVusaHVBqNvJ5OtctyB5TWQZSadoqT/preview" width=100% height=100% allow="autoplay"></iframe>
+</div>
+
+
+<p>Find the transcript <a href="https://drive.google.com/file/d/1wwnTFmGOANVmxUaVd1PC3cfztzIfSCEa/view?usp=sharing" class="external-link" target="_blank" rel="noopener">here</a>.</p>
+<hr>
+<p>For the <a href="https://group.springernature.com/gp/group" class="external-link" target="_blank" rel="noopener">Springer Nature</a> AI Lab Opening Day I talked about LLM frameworks: what they are, when they can be useful, and how to choose one framework and compare it to the others.</p>
+<p>After an overview of six application frameworks (<a href="https://www.langchain.com/" class="external-link" target="_blank" rel="noopener">LangChain</a>, <a href="https://www.llamaindex.ai/" class="external-link" target="_blank" rel="noopener">LlamaIndex</a>, <a href="https://haystack.deepset.ai/" class="external-link" target="_blank" rel="noopener">Haystack</a>, <a href="https://neuml.github.io/txtai/" class="external-link" target="_blank" rel="noopener">txtai</a>, <a href="https://dspy-docs.vercel.app/" class="external-link" target="_blank" rel="noopener">DSPy</a> and <a href="https://www.crewai.com/" class="external-link" target="_blank" rel="noopener">CrewAI</a>), we ran a notebook where we used <a href="https://docs.ragas.io/en/latest/" class="external-link" target="_blank" rel="noopener">RAGAS</a> to compare four small RAG applications and see which one performed better.</p>
+
+
+
+
+ AMTA 2024 Virtual Tutorial Day: Controlling LLM Translations of Invariant Elements with RAG
+ https://www.zansara.dev/talks/2024-09-18-amta-2024-controlling-invariants-rag/
+ Wed, 18 Sep 2024 00:00:00 +0000
+
+ https://www.zansara.dev/talks/2024-09-18-amta-2024-controlling-invariants-rag/
+ <p><a href="https://amtaweb.org/virtual-tutorial-day-program/" class="external-link" target="_blank" rel="noopener">Announcement</a>,
+<a href="https://colab.research.google.com/drive/1VMgK3DcVny_zTtAG_V3QSSdfSFBWAgmb?usp=sharing" class="external-link" target="_blank" rel="noopener">notebook</a> and
+<a href="https://docs.google.com/spreadsheets/d/1A1zk-u-RTSqBfE8LksZxihnp7KxWO7YK/edit?usp=sharing&ouid=102297935451395786183&rtpof=true&sd=true" class="external-link" target="_blank" rel="noopener">glossary</a>.
+All resources can also be found in
+<a href="https://drive.google.com/drive/folders/1Tdq92P_E_77sErGjz7jSPfJ-or9UZXvn?usp=drive_link" class="external-link" target="_blank" rel="noopener">my archive</a>.</p>
+<hr>
+<p><em>Recording coming soon.</em></p>
+<hr>
+<p>At the <a href="https://amtaweb.org/virtual-tutorial-day-program/" class="external-link" target="_blank" rel="noopener">AMTA 2024 Virtual Tutorial Day</a> I talked about controlling invariant translation elements with RAG. Several speakers intervened on the topic during the session, each bringing a different perspective on it.</p>
+<p><a href="https://www.linkedin.com/in/georgkirchner/" class="external-link" target="_blank" rel="noopener">Georg Kirchner</a> introduced the concept of invariant translation elements, such as brand names, UI elements, and corporate slogans. <a href="https://www.linkedin.com/in/christian-lang-8942b0145/" class="external-link" target="_blank" rel="noopener">Christian Lang</a> gave a comprehensive overview of the challenges of handling invariant translation elements with existing tools and of how LLMs can help at various stages of the translation, covering several approaches, including RAG. Building on his overview, I showed how to implement a simple RAG system that handles these invariants properly using <a href="https://haystack.deepset.ai/?utm_campaign=amta-2024" class="external-link" target="_blank" rel="noopener">Haystack</a>: we ran a <a href="https://colab.research.google.com/drive/1VMgK3DcVny_zTtAG_V3QSSdfSFBWAgmb?usp=sharing" class="external-link" target="_blank" rel="noopener">Colab notebook</a> live and checked how the translation changes when context about the invariants is given to the LLM making the translation. Lastly, <a href="https://www.linkedin.com/in/brunobitter/" class="external-link" target="_blank" rel="noopener">Bruno Bitter</a> gave an overview of how you can use <a href="https://www.blackbird.io/" class="external-link" target="_blank" rel="noopener">Blackbird</a> to integrate a system like this with existing CAT tools and manage the whole lifecycle of content translation.</p>
+
+
+
+
+ Building Reliable Voice Bots with Open Source Tools - Part 1
+ https://www.zansara.dev/posts/2024-09-05-odsc-europe-voice-agents-part-1/
+ Wed, 18 Sep 2024 00:00:00 +0000
+
+ https://www.zansara.dev/posts/2024-09-05-odsc-europe-voice-agents-part-1/
+ <p><em>This is part one of the write-up of my talk at <a href="https://www.zansara.dev/talks/2024-09-05-odsc-europe-voice-agents/" >ODSC Europe 2024</a>.</em></p>
+<hr>
+<p>In the last few years, the world of voice agents has seen dramatic leaps forward in the state of the art of all of its most basic components. Thanks mostly to OpenAI, bots are now able to understand human speech almost as well as a human would, to speak back with completely natural-sounding voices, and to hold a free-flowing conversation that feels extremely natural.</p>
+<p>But building voice bots is far from a solved problem. These improved capabilities are raising the bar, and even users accustomed to the simpler capabilities of old bots now expect a whole new level of quality when it comes to interacting with them.</p>
+<p>In this post we’re going to focus mostly on <strong>the challenges</strong>: we’ll discuss the basic structure of most voice bots today, their shortcomings and the main issues that you may face on your journey to improve the quality of the conversation.</p>
+<p>In <a href="https://www.zansara.dev/posts/2024-09-05-odsc-europe-voice-agents-part-2/" >Part 2</a> we are going to focus on <strong>the solutions</strong> that are available today, and we are going to build our own voice bot using <a href="https://www.pipecat.ai" >Pipecat</a>, a recently released open-source library that makes building these bots a lot simpler.</p>
+<h1 id="outline">
+ Outline
+ <a class="heading-link" href="#outline">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h1>
+<ul>
+<li><a href="#what-is-a-voice-agent" >What is a voice agent?</a>
+<ul>
+<li><a href="#speech-to-text-stt" >Speech-to-text (STT)</a></li>
+<li><a href="#text-to-speech-tts" >Text-to-speech (TTS)</a></li>
+<li><a href="#logic-engine" >Logic engine</a>
+<ul>
+<li><a href="#tree-based" >Tree-based</a></li>
+<li><a href="#intent-based" >Intent-based</a></li>
+<li><a href="#llm-based" >LLM-based</a></li>
+</ul>
+</li>
+</ul>
+</li>
+<li><a href="#new-challenges" >New challenges</a>
+<ul>
+<li><a href="#real-speech-is-not-turn-based" >Real speech is not turn-based</a></li>
+<li><a href="#real-conversation-flows-are-not-predictable" >Real conversation flows are not predictable</a></li>
+<li><a href="#llms-bring-their-own-problems" >LLMs bring their own problems</a></li>
+<li><a href="#the-context-window" >The context window</a></li>
+<li><a href="#working-in-real-time" >Working in real time</a></li>
+</ul>
+</li>
+</ul>
+<p><em>Continues in <a href="https://www.zansara.dev/posts/2024-09-05-odsc-europe-voice-agents-part-2/" >Part 2</a>.</em></p>
+<h1 id="what-is-a-voice-agent">
+ What is a voice agent?
+ <a class="heading-link" href="#what-is-a-voice-agent">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h1>
+<p>As the name says, voice agents are programs that are able to carry out a task and/or take actions and decisions on behalf of a user (“software agents”) by using voice as their primary means of communication (as opposed to the much more common text chat format). Voice agents are inherently harder to build than their text-based counterparts: computers operate primarily with text, and the art of making machines understand human voices has been an elusive problem for decades.</p>
+<p>Today, the basic architecture of a modern voice agent can be decomposed into three main fundamental building blocks:</p>
+<ul>
+<li>a <strong>speech-to-text (STT)</strong> component, tasked to translate an audio stream into readable text,</li>
+<li>the agent’s <strong>logic engine</strong>, which works entirely with text only,</li>
+<li>a <strong>text-to-speech (TTS)</strong> component, which converts the bot’s text responses back into an audio stream of synthetic speech.</li>
+</ul>
+<p><img src="https://www.zansara.dev/posts/2024-09-05-odsc-europe-voice-agents/structure-of-a-voice-bot.png" alt=""></p>
+<p>Let’s see the details of each.</p>
+<h2 id="speech-to-text-stt">
+ Speech-to-text (STT)
+ <a class="heading-link" href="#speech-to-text-stt">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h2>
+<p>Speech-to-text software converts the audio stream of a person saying something into a transcription of what that person said. Speech-to-text engines have a <a href="https://en.wikipedia.org/wiki/Speech_recognition#History" class="external-link" target="_blank" rel="noopener">long history</a>, but their limitations have always been quite severe: they used to require fine-tuning on each individual speaker, had a rather high word error rate (WER) and mainly worked strictly with native speakers of major languages, failing hard on foreign and uncommon accents and on native speakers of less mainstream languages. These issues limited the adoption of this technology to little more than niche software and research applications.</p>
+<p>With the <a href="https://openai.com/index/whisper/" class="external-link" target="_blank" rel="noopener">first release of OpenAI’s Whisper models</a> in late 2022, the state of the art improved dramatically. Whisper enabled transcription (and even direct translation) of speech from many languages with an impressively low WER, finally comparable to the performance of a human, all with relatively low resources, faster-than-realtime speed, and no fine-tuning required. On top of that, the model was free to use, as OpenAI <a href="https://huggingface.co/openai" class="external-link" target="_blank" rel="noopener">open-sourced it</a> together with a <a href="https://github.com/openai/whisper" class="external-link" target="_blank" rel="noopener">Python SDK</a>, and the details of its architecture were <a href="https://cdn.openai.com/papers/whisper.pdf" class="external-link" target="_blank" rel="noopener">published</a>, allowing the scientific community to improve on it.</p>
+<p><img src="https://www.zansara.dev/posts/2024-09-05-odsc-europe-voice-agents/whisper-wer.png" alt=""></p>
+<p><em>The WER (word error rate) of Whisper was extremely impressive at the time of its publication (see the full diagram <a href="https://github.com/openai/whisper/assets/266841/f4619d66-1058-4005-8f67-a9d811b77c62" class="external-link" target="_blank" rel="noopener">here</a>).</em></p>
+<p>Since then, speech-to-text models have kept improving at a steady pace. Nowadays the Whisper family of models sees some competition for the title of best STT model from companies such as <a href="https://deepgram.com/" class="external-link" target="_blank" rel="noopener">Deepgram</a>, but it remains one of the best open-source options.</p>
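+<p>As a minimal sketch of how little code transcription takes today, this is roughly how the open-source <code>whisper</code> Python package is used (the model size and the audio file name below are just examples):</p>
+<pre tabindex="0"><code>import whisper
+
+# Load one of the pre-trained Whisper checkpoints ("base" trades accuracy for speed).
+model = whisper.load_model("base")
+
+# Transcribe a local audio file; the result also contains language and segment info.
+result = model.transcribe("call_recording.wav")
+print(result["text"])
+</code></pre>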
+<h2 id="text-to-speech-tts">
+ Text-to-speech (TTS)
+ <a class="heading-link" href="#text-to-speech-tts">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h2>
+<p>Text-to-speech models perform the exact opposite task of speech-to-text models: their goal is to convert written text into an audio stream of synthetic speech. Text-to-speech has <a href="https://en.wikipedia.org/wiki/Speech_synthesis#History" class="external-link" target="_blank" rel="noopener">historically been an easier feat</a> than speech-to-text, but it too recently saw drastic improvements in the quality of the synthetic voices, to the point that it could nearly be considered a solved problem in its most basic form.</p>
+<p>Today many companies (such as OpenAI, <a href="https://cartesia.ai/sonic" class="external-link" target="_blank" rel="noopener">Cartesia</a>, <a href="https://elevenlabs.io/" class="external-link" target="_blank" rel="noopener">ElevenLabs</a>, Azure and many others) offer TTS software with voices that sound nearly indistinguishable from a human. They can also clone a specific human voice from remarkably little training data (just a few seconds of speech) and tune accents, inflections, tone and even emotion.</p>
+
+
+<div>
+<audio controls src="https://www.zansara.dev/posts/2024-09-05-odsc-europe-voice-agents/sonic-tts-sample.wav" style="width: 100%"></audio>
+</div>
+
+
+<p><em><a href="https://cartesia.ai/sonic" class="external-link" target="_blank" rel="noopener">Cartesia’s Sonic</a> TTS example of a gaming NPC. Note how the model subtly reproduces the breathing in between sentences.</em></p>
+<p>TTS is still improving in quality by the day, but given the already very high quality of the output, competition now tends to focus on price and performance.</p>
+<h2 id="logic-engine">
+ Logic engine
+ <a class="heading-link" href="#logic-engine">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h2>
+<p>Advancements in the agent’s ability to talk to users go hand in hand with progress in natural language understanding (NLU), another field with a <a href="https://en.wikipedia.org/wiki/Natural_language_understanding#History" class="external-link" target="_blank" rel="noopener">long and complicated history</a>. Until recently, a bot’s ability to understand the user’s request was severely limited and often available only for major languages.</p>
+<p>Based on the way their logic is implemented, the bots you may come across today fall into three different categories.</p>
+<h3 id="tree-based">
+ Tree-based
+ <a class="heading-link" href="#tree-based">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h3>
+<p>Tree-based (or rule-based) logic is one of the earliest methods of implementing a chatbot’s logic, and it is still very popular today for its simplicity. Tree-based bots don’t really try to understand what the user is saying, but listen for a keyword or key sentence that will trigger the next step. For example, a customer support chatbot may look for the keyword “refund” to give the user information about how to perform a refund, or for the name of a discount campaign to explain to the user how to take advantage of it.</p>
+<p>Tree-based logic, while somewhat functional, doesn’t really resemble a conversation and can become very frustrating when the conversation tree is not designed with care, because it’s difficult for the end user to understand which option or keyword they should use to achieve the desired outcome. It is also unsuitable for handling real questions and requests the way a human would.</p>
+<p>One of its most effective usecases is as a first-line screening to triage incoming messages.</p>
+<p><img src="https://www.zansara.dev/posts/2024-09-05-odsc-europe-voice-agents/tree-based-logic.png" alt=""></p>
+<p><em>Example of a very simple decision tree for a chatbot. While rather minimal, this bot already has several flaws: there’s no way to correct the information you entered at a previous step, and it has no ability to recognize synonyms (“I want to buy an item” would trigger the fallback route.)</em></p>
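+<p>As a purely illustrative sketch, keyword routing of this kind can be as simple as a dictionary lookup; the keywords and replies below are made up.</p>
+<pre tabindex="0"><code># Minimal sketch of tree-based (keyword) routing.
+ROUTES = {
+    "refund": "To get a refund, please tell me your order number.",
+    "discount": "Our current campaign gives you 10% off with the code SAVE10.",
+}
+FALLBACK = "Sorry, I didn't get that. You can ask about refunds or discounts."
+
+def route(user_message: str) -> str:
+    for keyword, reply in ROUTES.items():
+        if keyword in user_message.lower():
+            return reply
+    return FALLBACK
+
+print(route("I want to buy an item"))  # no keyword matches: triggers the fallback route
+</code></pre>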
+<h3 id="intent-based">
+ Intent-based
+ <a class="heading-link" href="#intent-based">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h3>
+<p>In intent-based bots, <strong>intents</strong> are defined roughly as “actions the user may want to perform”. Compared to a strict, keyword-based tree structure, intent-based bots can switch from one intent to another much more easily (because they lack a strict tree-based routing) and may use advanced AI techniques to understand what the user is actually trying to accomplish and perform the required action.</p>
+<p>Advanced voice assistants such as Siri and Alexa use variations of this intent-based system. However, as their owners know all too well, interacting with an intent-based bot doesn’t always feel natural, especially when the available intents don’t match the user’s expectations and the bot ends up triggering an unexpected action. In the long run, this leaves users carefully second-guessing which words and sentence structures activate the response they need, and eventually leads to a sort of “magic incantation” style of prompting the agent, where the user has to learn the exact sentence that the bot will recognize to perform a specific intent without misunderstandings.</p>
+<p><img src="https://www.zansara.dev/posts/2024-09-05-odsc-europe-voice-agents/amazon-echo.webp" alt=""></p>
+<p><em>Modern voice assistants like Alexa and Siri are often built on the concept of intent (image from Amazon).</em></p>
+<h3 id="llm-based">
+ LLM-based
+ <a class="heading-link" href="#llm-based">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h3>
+<p>The introduction of instruction-tuned GPT models like ChatGPT revolutionized the field of natural language understanding and, with it, the way bots can be built today. LLMs are naturally good at conversation and can formulate natural replies to any sort of question, making the conversation feel much more natural than with any previously available technique.</p>
+<p>However, LLMs tend to be harder to control. Their very ability to generate natural-sounding responses to anything makes them behave in ways that are often unexpected to the developer of the chatbot: for example, users can get an LLM-based bot to promise them anything they ask for, or can convince it to say something incorrect or even to lie.</p>
+<p>The burden of controlling the conversation, which traditionally fell on the user, now rests entirely on the shoulders of the developers, and can easily backfire.</p>
+<p><img src="https://www.zansara.dev/posts/2024-09-05-odsc-europe-voice-agents/chatgpt-takesies-backsies.png" alt=""></p>
+<p><em>In a rather <a href="https://x.com/ChrisJBakke/status/1736533308849443121" class="external-link" target="_blank" rel="noopener">famous instance</a>, a user managed to convince a Chevrolet dealership chatbot to promise selling him a Chevy Tahoe for a single dollar.</em></p>
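+<p>As a minimal, hypothetical sketch of what “LLM-based logic” means in code, here is a single turn implemented with OpenAI’s Python SDK; the system prompt, model name and guardrail are made up for illustration, and a production bot would need far more than one instruction to stay in control of the conversation.</p>
+<pre tabindex="0"><code>from openai import OpenAI
+
+client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
+
+SYSTEM_PROMPT = (
+    "You are a customer support bot for a car dealership. "
+    "Never promise prices, discounts or refunds that are not listed in the conversation."
+)
+
+def generate_reply(history: list[dict]) -> str:
+    # The system prompt is sent as the first message of every request.
+    response = client.chat.completions.create(
+        model="gpt-4o-mini",  # example model name
+        messages=[{"role": "system", "content": SYSTEM_PROMPT}] + history,
+    )
+    return response.choices[0].message.content
+
+print(generate_reply([{"role": "user", "content": "Can I buy a Chevy Tahoe for one dollar?"}]))
+</code></pre>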
+<h1 id="new-challenges">
+ New challenges
+ <a class="heading-link" href="#new-challenges">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h1>
+<p>Thanks to all these recent improvements, it would seem that making natural-sounding, smart bots is getting easier and easier. It is indeed much simpler to make a simple bot sound better, understand more and respond appropriately, but there’s still a long way to go before users can interact with these new bots as they would with a human.</p>
+<p>The issue lies in the fact that <strong>users’ expectations grow</strong> with the quality of the bot. It’s not enough for the bot to have a voice that sounds human: users want to be able to interact with it in a way that feels human too, which is far richer and more interactive than what the rigid tech of earlier chatbots has allowed so far.</p>
+<p>What does this mean in practice? What expectations might users have of our bots?</p>
+<h2 id="real-speech-is-not-turn-based">
+ Real speech is not turn-based
+ <a class="heading-link" href="#real-speech-is-not-turn-based">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h2>
+<p>Traditional bots can only handle turn-based conversations: the user talks, then the bot talks as well, then the user talks some more, and so on. A conversation with another human, however, has no such limitation: people may talk over each other, give audible feedback without interrupting, and more.</p>
+<p>Here are some examples of this richer interaction style:</p>
+<ul>
+<li>
+<p><strong>Interruptions</strong>. Interruptions occur when a person is talking and another one starts talking at the same time. It is expected that the first person stops talking, at least for a few seconds, to understand what the interruption was about, while the second person continues to talk.</p>
+</li>
+<li>
+<p><strong>Back-channeling</strong>. Back-channeling is the practice of saying “ok”, “sure”, “right” while the other person is explaining something, to give them feedback and let them know we’re paying attention to what is being said. The person who is talking is not supposed to stop: the aim of this sort of feedback is to let them know they are being heard.</p>
+</li>
+<li>
+<p><strong>Pinging</strong>. This is the natural reaction to a long silence, especially over a voice-only medium such as a phone call. When one of the two parties is supposed to speak but instead stays silent, the last one that talked might “ping” the silent party by asking “Are you there?”, “Did you hear?”, or even just “Hello?” to test whether they’re being heard. This behavior is especially difficult to handle for voice agents that have a significant delay, because it may trigger an ugly vicious cycle of repetitions and delayed replies.</p>
+</li>
+<li>
+<p><strong>Buying time</strong>. When one of the parties knows that it will stay silent for a while, a natural reaction is to notify the other party in advance by saying something like “Hold on…”, “Wait a second…”, “Let me check…” and so on. This message has the benefit of preventing the “pinging” behavior we’ve seen before, and can be very useful for voice bots that may need to carry out background work during the conversation, such as looking up information.</p>
+</li>
+<li>
+<p><strong>Audible clues</strong>. Not everything can be transcribed by a speech-to-text model, but audio carries a lot of nuance that humans often use to communicate. A simple example is pitch: humans can often tell whether they’re talking to a child, a woman or a man by the pitch of their voice, but STT engines don’t transcribe that information. So if a child picks up the phone when your bot asks for their mother or father, the model won’t pick up the obvious audible clue and will assume it is talking to the right person. Similar considerations apply to tone (to detect mood, sarcasm, etc.) and to other sounds like laughter, sobs, and more.</p>
+</li>
+</ul>
+<h2 id="real-conversation-flows-are-not-predictable">
+ Real conversation flows are not predictable
+ <a class="heading-link" href="#real-conversation-flows-are-not-predictable">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h2>
+<p>Tree-based bots, and to some degree intent-based ones too, work on the implicit assumption that conversation flows are largely predictable. Once the user has said something and the bot has replied accordingly, the user can only follow up with a fixed set of replies and nothing else.</p>
+<p>This is often a flawed assumption and the primary reason why talking to chatbots tends to be so frustrating.</p>
+<p>In reality, natural conversations are largely unpredictable. For example, they may feature:</p>
+<ul>
+<li>
+<p><strong>Sudden changes of topic</strong>. Maybe the user and the bot were talking about making a refund, but then the user changes their mind and decides to ask for assistance finding a repair center for the product. Well-designed intent-based bots can deal with that, but in practice most bots are unable to do so in a way that feels natural to the user.</p>
+</li>
+<li>
+<p><strong>Unexpected, erratic phrasing</strong>. This is common when users are nervous or in a bad mood for any reason. Erratic, convoluted phrasing, long sentences and rambling are all very natural ways for people to express themselves, but such outbursts very often confuse bots completely.</p>
+</li>
+<li>
+<p><strong>Non-native speakers</strong>. Due to the nature of language learning, non-native speakers may have trouble pronouncing words correctly, may use highly unusual synonyms, or may structure sentences in complicated ways. This is also difficult for bots to handle, because understanding the sentence is harder and transcription issues are far more likely.</p>
+</li>
+<li>
+<p><em><strong>Non sequitur</strong></em>. <em>Non sequitur</em> is an umbrella term for a sequence of sentences that bear no relation to each other in a conversation. A simple example is the user asking the bot “What’s the capital of France?” and the bot replying “It’s raining now”. When done by the bot, this is often due to a severe transcription issue or a very flawed conversation design. When done by the user, it’s often a deliberate attempt to break the bot’s logic, so it should be handled with some care.</p>
+</li>
+</ul>
+<h2 id="llms-bring-their-own-problems">
+ LLMs bring their own problems
+ <a class="heading-link" href="#llms-bring-their-own-problems">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h2>
+<p>It may seem that some of these issues, especially the ones related to conversation flow, could be easily solved with an LLM. These models, however, bring their own set of issues too:</p>
+<ul>
+<li>
+<p><strong>Hallucinations</strong>. This is the technical term for the fact that LLMs can occasionally misremember information, or straight up lie. The problem is that they’re also very confident about their statements, sometimes to the point of trying to gaslight their users. Hallucinations are a major problem for all LLMs: although the issue may seem to become more manageable with larger and smarter models, it only gets more subtle and harder to spot.</p>
+</li>
+<li>
+<p><strong>Misunderstandings</strong>. While LLMs are great at understanding what the user is trying to say, they’re not immune to misunderstandings. Unlike a human, though, LLMs rarely suspect a misunderstanding: they would rather make assumptions than ask for clarification, resulting in surprising replies and behavior reminiscent of intent-based bots.</p>
+</li>
+<li>
+<p><strong>Lack of assertiveness</strong>. LLMs are trained to listen to the user and do their best to be helpful. This means that LLMs are also not very good at taking the lead in the conversation when we need them to, and are easily misled and distracted by a motivated user. Preventing your model from giving your users a literary analysis of their unpublished poetry may sound silly, but it’s a lot harder than many suspect.</p>
+</li>
+<li>
+<p><strong>Prompt hacking</strong>. Often done with malicious intent by experienced users, prompt hacking is the practice of convincing an LLM to reveal its initial instructions, ignore them, or perform actions it is explicitly forbidden from performing. This is especially dangerous and, while a lot of work has gone into this field, it is far from a solved problem.</p>
+</li>
+</ul>
+<h2 id="the-context-window">
+ The context window
+ <a class="heading-link" href="#the-context-window">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h2>
+<p>LLMs need to keep track of the whole conversation, or at least most of it, to be effective. However, they have a limit on the amount of text they can keep in mind at any given time: this limit is called the <strong>context window</strong> and for many models it is still relatively low, at about 2000 tokens <strong>(roughly 1500-1800 words)</strong>.</p>
+<p>The problem is that this window also needs to include all the instructions your bot needs for the conversation. This initial set of instructions is called the <strong>system prompt</strong>, and it is kept slightly distinct from the other messages in the conversation to make the LLM understand that it’s not part of the dialogue itself, but rather a set of instructions about how to handle the conversation.</p>
+<p>For example, a system prompt for a customer support bot may look like this:</p>
+<pre tabindex="0"><code>You're a friendly customer support bot named VirtualAssistant.
+You are always kind to the customer and you must do your best
+to make them feel at ease and helped.
+
+You may receive a set of different requests. If the users asks
+you to do anything that is not in the list below, kindly refuse
+to do so.
+
+# Handle refunds
+
+If the user asks you to handle a refund, perform these actions:
+- Ask for their shipping code
+- Ask for their last name
+- Use the tool `get_shipping_info` to verify the shipping exists
+...
+</code></pre><p>and so on.</p>
+<p>Although very effective, system prompts have a tendency to become huge in terms of tokens. Adding information to it makes the LLM behave much more like you expect (although it’s not infallible), hallucinate less, and can even shape its personality to some degree. But if the system prompt becomes too long (more than 1000 words), this means that the bot will only be able to exchange about 800 words worth of messages with the user before it starts to <strong>forget</strong> either its instructions or the first messages of the conversation. For example, the bot will easily forget its own name and role, or it will forget the user’s name and initial demands, which can make the conversation drift completely.</p>
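+<p>To make the budget concrete, you can estimate how many tokens your system prompt consumes before deploying the bot. Here is a minimal Python sketch using the <code>tiktoken</code> library; the 2000-token window and the prompt file name are assumptions for illustration only.</p>
+<pre tabindex="0"><code>import tiktoken
+
+CONTEXT_WINDOW = 2000  # assumed context window of the model, in tokens
+
+with open("system_prompt.txt") as f:  # hypothetical file holding your prompt
+    system_prompt = f.read()
+
+# cl100k_base is the encoding used by many recent OpenAI models
+encoding = tiktoken.get_encoding("cl100k_base")
+prompt_tokens = len(encoding.encode(system_prompt))
+
+remaining = CONTEXT_WINDOW - prompt_tokens
+print(f"System prompt: {prompt_tokens} tokens, "
+      f"~{remaining} tokens left for the conversation.")
+</code></pre>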
+<h2 id="working-in-real-time">
+ Working in real time
+ <a class="heading-link" href="#working-in-real-time">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h2>
+<p>If all these issues weren’t enough, there’s also a fundamental issue related to voice interaction: <strong>latency</strong>. Voice bots interact with their users in real time: this means that the whole pipeline of transcribing the audio, understanding it, formulating a reply and synthesizing it back must be very fast.</p>
+<p>How fast? On average, people expect a reply from another person to arrive within <strong>300-500ms</strong> to sound natural. They can normally wait for about 1-2 seconds. Any longer and they’ll likely ping the bot, breaking the flow.</p>
+<p>This means that, even if we had solutions to all of the above problems (and we do have some), these solutions need to operate at blazing fast speed. Considering that LLM inference alone can take the better part of a second before the first token is even generated, latency is often one of the major issues that voice bots face when deployed at scale.</p>
+<p><img src="https://www.zansara.dev/posts/2024-09-05-odsc-europe-voice-agents/ttft.jpg" alt=""></p>
+<p><em>Time to First Token (TTFT) stats for several LLM inference providers running Llama 2 70B chat. From <a href="https://github.com/ray-project/llmperf-leaderboard" class="external-link" target="_blank" rel="noopener">LLMPerf leaderboard</a>. You can see how the time it takes for a reply to even start being produced is highly variable, going up to more than one second in some scenarios.</em></p>
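+<p>If you want to measure this for your own stack, a rough way to track Time to First Token is to stream the response and stop the clock at the first content chunk. The sketch below uses the OpenAI Python client as an example; the model name is illustrative and the same idea applies to any provider that supports streaming.</p>
+<pre tabindex="0"><code>import time
+from openai import OpenAI
+
+client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
+
+start = time.perf_counter()
+stream = client.chat.completions.create(
+    model="gpt-4o-mini",  # example model name
+    messages=[{"role": "user", "content": "Hi, I need help with a refund."}],
+    stream=True,
+)
+for chunk in stream:
+    # the first chunk that carries actual content marks the Time to First Token
+    if chunk.choices and chunk.choices[0].delta.content:
+        print(f"TTFT: {time.perf_counter() - start:.3f}s")
+        break
+</code></pre>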
+<h1 id="to-be-continued">
+ To be continued…
+ <a class="heading-link" href="#to-be-continued">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h1>
+<p><em>Interested? Stay tuned for Part 2!</em></p>
+<p class="fleuron"><a href="https://www.zansara.dev/posts/2024-05-06-teranoptia/">F]</a></p>
+
+
+
+
+ ODSC Europe: Building Reliable Voice Agents with Open Source tools
+ https://www.zansara.dev/talks/2024-09-05-odsc-europe-voice-agents/
+ Fri, 06 Sep 2024 00:00:00 +0000
+
+ https://www.zansara.dev/talks/2024-09-05-odsc-europe-voice-agents/
+ <p><a href="https://odsc.com/speakers/building-reliable-voice-agents-with-open-source-tools-2/" class="external-link" target="_blank" rel="noopener">Announcement</a>,
+<a href="https://drive.google.com/file/d/1ubk7Q_l9C7epQgYrMttHMjW1AVfdm-LT/view?usp=sharing" class="external-link" target="_blank" rel="noopener">slides</a> and
+<a href="https://colab.research.google.com/drive/1NCAAs8RB2FuqMChFKMIVWV0RiJr9O3IJ?usp=sharing" class="external-link" target="_blank" rel="noopener">notebook</a>.
+All resources can also be found on ODSC’s website and in
+<a href="https://drive.google.com/drive/folders/1rrXMTbfTZVuq9pMzneC8j-5GKdRQ6l2i?usp=sharing" class="external-link" target="_blank" rel="noopener">my archive</a>.
+Did you miss the talk? Check out the write-up <a href="https://www.zansara.dev/posts/2024-09-05-odsc-europe-voice-agents-part-1/" >here</a>.</p>
+<hr>
+<p><em>(Note: this is a recording of the notebook walkthrough only. The full recording will be shared soon).</em></p>
+
+
+<div class='iframe-wrapper'>
+<iframe src="https://drive.google.com/file/d/15Kv8THmDsnnzfVBhHAf2O11RccpzAzYK/preview" width=100% height=100% allow="autoplay"></iframe>
+</div>
+
+
+<p>At <a href="https://odsc.com/europe/" class="external-link" target="_blank" rel="noopener">ODSC Europe 2024</a> I talked about building modern and reliable voice bots using Pipecat,
+a recently released open source tool. I gave an overview of the general structure of voice bots, of the improvements
+their underlying tech recently saw, and the new challenges that developers face when implementing one of these systems.</p>
+<p>The main highlight of the talk is the <a href="https://colab.research.google.com/drive/1NCAAs8RB2FuqMChFKMIVWV0RiJr9O3IJ?usp=drive_link" class="external-link" target="_blank" rel="noopener">notebook</a>
+where I first implement a simple Pipecat bot from scratch, and then give an overview of how to blend intent detection
+and system prompt switching to improve our control of how LLM bots interact with users.</p>
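+<p>For readers who don’t want to open the notebook, here is a very rough sketch of the idea: a small classification call detects the user’s intent, and the app swaps in a focused system prompt for that intent. All names, prompts and the <code>llm</code> helper are hypothetical and only meant to illustrate the pattern, not the notebook’s actual code.</p>
+<pre tabindex="0"><code># Hypothetical sketch of intent detection + system prompt switching
+PROMPTS = {
+    "refund": "You handle refunds. Ask for the shipping code and last name...",
+    "repair": "You help the user find a repair center. Ask for their location...",
+    "other": "You are a general customer support assistant...",
+}
+
+def detect_intent(llm, user_message: str) -> str:
+    """One cheap LLM call that classifies the message into one of our intents."""
+    labels = ", ".join(PROMPTS)
+    answer = llm.generate(f"Classify this message into one of [{labels}]: {user_message}")
+    return answer.strip() if answer.strip() in PROMPTS else "other"
+
+def reply(llm, history: list, user_message: str) -> str:
+    intent = detect_intent(llm, user_message)
+    messages = [{"role": "system", "content": PROMPTS[intent]}]
+    messages += history + [{"role": "user", "content": user_message}]
+    return llm.chat(messages)  # hypothetical chat-completion wrapper
+</code></pre>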
+
+
+
+
+ EuroPython: Is RAG all you need? A look at the limits of retrieval augmented generation
+ https://www.zansara.dev/talks/2024-07-10-europython-rag/
+ Wed, 10 Jul 2024 00:00:00 +0000
+
+ https://www.zansara.dev/talks/2024-07-10-europython-rag/
+ <p><a href="https://ep2024.europython.eu/session/is-rag-all-you-need-a-look-at-the-limits-of-retrieval-augmented-generation" class="external-link" target="_blank" rel="noopener">Announcement</a>,
+<a href="https://drive.google.com/file/d/13OXMLaBQr1I_za7sqVHJWxRj5xFAg7KV/view?usp=sharing" class="external-link" target="_blank" rel="noopener">slides</a>.
+Did you miss the talk? Check out the recording on <a href="https://youtu.be/9wk7mGB_Gp4?feature=shared" class="external-link" target="_blank" rel="noopener">Youtube</a>
+or on my <a href="https://drive.google.com/file/d/1OkYQ7WMt63QkdJTU3GIpSxBZmnLfZti6/view?usp=sharing" class="external-link" target="_blank" rel="noopener">backup</a> (cut from the <a href="https://www.youtube.com/watch?v=tcXmnCJIvFc" class="external-link" target="_blank" rel="noopener">original stream</a>),
+or read the <a href="https://www.zansara.dev/posts/2024-04-29-odsc-east-rag" >write-up</a> of a previous edition of the same talk.</p>
+<hr>
+
+
+<div class='iframe-wrapper'>
+<iframe src="https://drive.google.com/file/d/1OkYQ7WMt63QkdJTU3GIpSxBZmnLfZti6/preview" width=100% height=100% allow="autoplay"></iframe>
+</div>
+
+
+<p>At <a href="https://ep2024.europython.eu/" class="external-link" target="_blank" rel="noopener">EuroPython 2024</a> I talked about RAG: how it works, how it fails, and how to evaluate its performance objectively. I gave an overview of some useful open-source tools for RAG evalution such as <a href="https://docs.relari.ai/v0.3?utm_campaign=europython-2024" class="external-link" target="_blank" rel="noopener">continuous-eval</a> and how to use them with <a href="https://haystack.deepset.ai/?utm_campaign=europython-2024" class="external-link" target="_blank" rel="noopener">Haystack</a>, and then offered some ideas on how to expand your RAG architecture further than a simple two-step process.</p>
+<p>Some resources mentioned in the talk:</p>
+<ul>
+<li><a href="https://haystack.deepset.ai/?utm_campaign=europython-2024" class="external-link" target="_blank" rel="noopener">Haystack</a>: open-source LLM framework for RAG and beyond.</li>
+<li><a href="https://docs.relari.ai/v0.3?utm_campaign=europython-2024" class="external-link" target="_blank" rel="noopener">continuous-eval</a> by <a href="https://www.relari.ai/?utm_campaign=europython-2024" class="external-link" target="_blank" rel="noopener">Relari AI</a>.</li>
+<li>Build and evaluate RAG with Haystack: <a href="https://haystack.deepset.ai/tutorials/35_model_based_evaluation_of_rag_pipelines/?utm_campaign=europython-2024" class="external-link" target="_blank" rel="noopener">https://haystack.deepset.ai/tutorials/35_model_based_evaluation_of_rag_pipelines</a></li>
+<li>Use continuous-eval with Haystack: <a href="https://github.com/relari-ai/examples/blob/main/examples/haystack/simple_rag/app.py" class="external-link" target="_blank" rel="noopener">https://github.com/relari-ai/examples/blob/main/examples/haystack/simple_rag/app.py</a></li>
+<li>Perplexity.ai: <a href="https://www.perplexity.ai/" class="external-link" target="_blank" rel="noopener">https://www.perplexity.ai/</a></li>
+</ul>
+
+
+
+
+ The Agent Compass
+ https://www.zansara.dev/posts/2024-06-10-the-agent-compass/
+ Mon, 10 Jun 2024 00:00:00 +0000
+
+ https://www.zansara.dev/posts/2024-06-10-the-agent-compass/
+ <p>The concept of Agent is one of the vaguest out there in the post-ChatGPT landscape. The word has been used to identify systems that seem to have nothing in common with one another, from complex autonomous research systems down to a simple sequence of two predefined LLM calls. Even the distinction between Agents and techniques such as RAG and prompt engineering seems blurry at best.</p>
+<p>Let’s try to shed some light on the topic by understanding just how much the term “AI Agent” covers and set some landmarks to better navigate the space.</p>
+<h2 id="defining-agent">
+ Defining “Agent”
+ <a class="heading-link" href="#defining-agent">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h2>
+<p>The problem starts with the definition of “agent”. For example, <a href="https://en.wikipedia.org/wiki/Software_agent" class="external-link" target="_blank" rel="noopener">Wikipedia</a> reports that a software agent is</p>
+<blockquote>
+<p>a computer program that acts for a user or another program in a relationship of agency.</p>
+</blockquote>
+<p>This definition is extremely high-level, to the point that it could be applied to systems ranging from ChatGPT to a thermostat. However, if we restrain our definition to “LLM-powered agents”, then it starts to mean something: an Agent is an LLM-powered application that is given some <strong>agency</strong>, which means that it can take actions to accomplish the goals set by its user. Here we see the difference between an agent and a simple chatbot, because a chatbot can only talk to a user, but doesn’t have the agency to take any action on their behalf. Instead, an Agent is a system you can effectively delegate tasks to.</p>
+<p>In short, an LLM-powered application can be called an Agent when</p>
+<blockquote>
+<p>it can take decisions and choose to perform actions in order to achieve the goals set by the user.</p>
+</blockquote>
+<h2 id="autonomous-vs-conversational">
+ Autonomous vs Conversational
+ <a class="heading-link" href="#autonomous-vs-conversational">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h2>
+<p>On top of this definition there’s an additional distinction to take into account, normally brought up by the terms <strong>autonomous</strong> and <strong>conversational</strong> agents.</p>
+<p>Autonomous Agents are applications that <strong>don’t use conversation as a tool</strong> to accomplish their goal. They can use several tools several times, but they won’t produce an answer for the user until their goal is accomplished in full. These agents normally interact with a single user, the one that set their goal, and the whole result of their operations might be a simple notification that the task is done. The fact that they can understand language is rather a feature that lets them receive the user’s task in natural language, understand it, and then navigate the material they need to use (emails, webpages, etc).</p>
+<p>An example of an autonomous agent is a <strong>virtual personal assistant</strong>: an app that can read through your emails and, for example, pay the bills for you when they’re due. This is a system that the user sets up with a few credentials and then works autonomously, without the user’s supervision, on the user’s own behalf, possibly without bothering them at all.</p>
+<p>On the contrary, Conversational Agents <strong>use conversation as a tool</strong>, often their primary one. This doesn’t have to be a conversation with the person that set them off: it’s usually a conversation with another party, that may or may not be aware that they’re talking to an autonomous system. Naturally, they behave like agents only from the perspective of the user that assigned them the task, while in many cases they have very limited or no agency from the perspective of the user that holds the conversation with them.</p>
+<p>An example of a conversational agent is a <strong>virtual salesman</strong>: an app that takes a list of potential clients and calls them one by one, trying to persuade them to buy. From the perspective of the clients receiving the call this bot is not an agent: it can perform no actions on their behalf, in fact it may not be able to perform actions at all other than talking to them. But from the perspective of the salesman the bots are agents, because they’re calling people for them, saving a lot of their time.</p>
+<p>The distinction between these two categories is very blurry, and <strong>some systems may behave like both</strong> depending on the circumstances. For example, an autonomous agent might become a conversational one if it’s configured to reschedule appointments for you by calling people, or to reply to your emails to automatically challenge parking fines, and so on. Alternatively, an LLM that asks you if it’s appropriate to use a tool before using it is behaving a bit like a conversational agent, because it’s using the chat to improve its odds of providing you a better result.</p>
+<h2 id="degrees-of-agency">
+ Degrees of agency
+ <a class="heading-link" href="#degrees-of-agency">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h2>
+<p>All the distinctions we made above are best understood as a continuous spectrum rather than hard categories. Various AI systems may have more or less agency and may be tuned towards a more “autonomous” or “conversational” behavior.</p>
+<p>In order to understand this difference in practice, let’s try to categorize some well-known LLM techniques and apps to see how “agentic” they are. Having two axes to measure by, we can build a simple compass like this:</p>
+<p><img src="https://www.zansara.dev/posts/2024-06-10-the-agent-compass/empty-compass.png" alt="a compass with two axis: no agency (left) to full agency (right) on the horizontal axis, and autonomous (bottom) to conversational (top) on the vertical axis."></p>
+<div style="text-align:center;"><i>Our Agent compass</i></div>
+<h3 id="bare-llms">
+ Bare LLMs
+ <a class="heading-link" href="#bare-llms">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h3>
+<p>Many apps out there perform nothing more than direct calls to LLMs, such as ChatGPT’s free app and other similarly simple assistants and chatbots. There are no components to these systems other than the model itself, and their mode of operation is very straightforward: a user asks a question to an LLM, and the LLM replies directly.</p>
+<p><img src="https://www.zansara.dev/posts/2024-06-10-the-agent-compass/direct-llm-call.png" alt="Diagram of the operation of a direct LLM call: a user asks a question to an LLM and the LLM replies directly."></p>
+<p>These systems are not designed with the intent of accomplishing a goal, nor can they take any actions on the user’s behalf. They focus on talking with a user in a reactive way and can do nothing other than talk back. An LLM on its own has <strong>no agency at all</strong>.</p>
+<p>At this level it also makes very little sense to distinguish between autonomous or conversational agent behavior, because the app shows no degree of autonomy at all. So we can place these systems at the very center-left of the diagram.</p>
+<p><img src="https://www.zansara.dev/posts/2024-06-10-the-agent-compass/direct-llm-call-compass.png" alt="the updated compass"></p>
+<h3 id="basic-rag">
+ Basic RAG
+ <a class="heading-link" href="#basic-rag">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h3>
+<p>Together with direct LLM calls and simple chatbots, basic RAG is also an example of an application that does not need any agency or goals to pursue in order to function. Simple RAG apps work in two stages: first the user’s question is sent to a retriever system, which fetches some additional data relevant to the question. Then the question and the additional data are sent to the LLM to formulate an answer.</p>
+<p><img src="https://www.zansara.dev/posts/2024-06-10-the-agent-compass/basic-rag.png" alt="Diagram of the operation of a RAG app: first the user question is sent to a retriever system, which fetches some additional data relevant to the question. Then, the question and the additional data is sent to the LLM to formulate an answer."></p>
+<p>This means that simple RAG is not an agent: the LLM has no role in the retrieval step and simply reacts to the RAG prompt, doing little more than what a direct LLM call does. <strong>The LLM is given no agency</strong>, takes no decisions in order to accomplish its goals, and has no tools it can decide to use, or actions it can decide to take. It’s a fully pipelined, reactive system. However, we may rank basic RAG slightly more on the autonomous side with respect to a direct LLM call, because there is one step that is performed autonomously (the retrieval).</p>
+<p><img src="https://www.zansara.dev/posts/2024-06-10-the-agent-compass/basic-rag-compass.png" alt="the updated compass"></p>
+<h3 id="agentic-rag">
+ Agentic RAG
+ <a class="heading-link" href="#agentic-rag">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h3>
+<p>Agentic RAG is a slightly more advanced version of RAG that does not always perform the retrieval step. This helps the app produce better prompts for the LLM: for example, if the user is asking a question about trivia, retrieval is very important, while if they’re quizzing the LLM with some mathematical problem, retrieval might confuse the LLM by giving it examples of solutions to different puzzles, and therefore make hallucinations more likely.</p>
+<p>This means that an agentic RAG app works as follows: when the user asks a question, before calling the retriever the app checks whether the retrieval step is necessary at all. Most of the time this preliminary check is done by an LLM as well, but in theory the same check could be done by a properly trained classifier model. Once the check is done, if retrieval was necessary it is run, otherwise the app skips directly to the LLM, which then replies to the user.</p>
+<p><img src="https://www.zansara.dev/posts/2024-06-10-the-agent-compass/agentic-rag.png" alt="Diagram of the operation of an agentic RAG app: when the user asks a question, before calling the retriever the app checks whether the retrieval step is necessary at all. Once the check is done, if retrieval was necessary it is run, otherwise the app skips directly to the LLM, which then replies to the user."></p>
+<p>You can see immediately that there’s a fundamental difference between this type of RAG and the basic pipelined form: the app needs to <strong>take a decision</strong> in order to accomplish the goal of answering the user. The goal is very limited (giving a correct answer to the user), and the decision very simple (whether or not to use a single tool), but this little bit of agency given to the LLM makes us place an application like this definitely more towards the Agent side of the diagram.</p>
+<p><img src="https://www.zansara.dev/posts/2024-06-10-the-agent-compass/agentic-rag-compass.png" alt="the updated compass"></p>
+<p>We keep Agentic RAG towards the Autonomous side because in the vast majority of cases the decision to invoke the retriever is kept hidden from the user.</p>
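+<p>In code, the only difference from basic RAG is the extra decision step. Again, a hedged sketch with hypothetical <code>retriever</code> and <code>llm</code> helpers:</p>
+<pre tabindex="0"><code>def agentic_rag(question: str, retriever, llm) -> str:
+    # The decision step: would retrieval help at all for this question?
+    decision = llm.generate(
+        "Would looking up external documents help answer the following question? "
+        f"Reply with YES or NO only.\n\nQuestion: {question}"
+    )
+    if decision.strip().upper().startswith("YES"):
+        documents = retriever.search(question, top_k=3)
+        context = "\n".join(doc.text for doc in documents)
+        prompt = f"Context:\n{context}\n\nQuestion: {question}"
+    else:
+        prompt = question  # skip retrieval entirely
+    return llm.generate(prompt)
+</code></pre>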
+<h3 id="llms-with-function-calling">
+ LLMs with function calling
+ <a class="heading-link" href="#llms-with-function-calling">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h3>
+<p>Some LLM applications, such as ChatGPT with GPT4+ or Bing Chat, can make the LLM use some predefined tools: a web search, an image generator, and maybe a few more. The way they work is quite straightforward: when a user asks a question, the LLM first needs to decide whether it should use a tool to answer the question. If it decides that a tool is needed, it calls it, otherwise it skips directly to generating a reply, which is then sent back to the user.</p>
+<p><img src="https://www.zansara.dev/posts/2024-06-10-the-agent-compass/llm-with-function-calling.png" alt="Diagram of the operation of an LLM with function calling: when a user asks a question, the LLM first needs to decide whether it should use a tool to answer the question. If it decides that a tool is needed, it calls it, otherwise it skips directly to generating a reply, which is then sent back to the user."></p>
+<p>You can see how this diagram resembles agentic RAG’s: before giving an answer to the user, the app needs to <strong>take a decision</strong>.</p>
+<p>With respect to Agentic RAG this decision is a lot more complex: it’s not a simple yes/no decision, but it involves choosing which tool to use and also generating the input parameters for the selected tool so that it produces the desired output. In many cases the tool’s output will be given to the LLM to be re-elaborated (such as the output of a web search), while in others it can go directly to the user (like in the case of image generators). This all implies that more agency is given to the system and, therefore, it can be placed more clearly towards the Agent end of the scale.</p>
+<p><img src="https://www.zansara.dev/posts/2024-06-10-the-agent-compass/llm-with-function-calling-compass.png" alt="the updated compass"></p>
+<p>We place LLMs with function calling in the middle between Conversational and Autonomous because the degree to which the user is aware of this decision can vary greatly between apps. For example, Bing Chat and ChatGPT normally notify the user that they’re going to use a tool when they do, and the user can instruct them to use them or not, so they’re slightly more conversational.</p>
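+<p>To give an idea of what this looks like in practice, here is a small example using the function-calling interface of the OpenAI Python client. The model name and the <code>web_search</code> tool are illustrative; other providers expose very similar interfaces.</p>
+<pre tabindex="0"><code>from openai import OpenAI
+
+client = OpenAI()
+
+tools = [{
+    "type": "function",
+    "function": {
+        "name": "web_search",  # illustrative tool
+        "description": "Search the web for up-to-date information.",
+        "parameters": {
+            "type": "object",
+            "properties": {"query": {"type": "string"}},
+            "required": ["query"],
+        },
+    },
+}]
+
+response = client.chat.completions.create(
+    model="gpt-4o-mini",  # example model name
+    messages=[{"role": "user", "content": "Who won the last World Cup?"}],
+    tools=tools,
+)
+
+message = response.choices[0].message
+if message.tool_calls:
+    # The LLM decided a tool is needed and generated its input parameters
+    call = message.tool_calls[0]
+    print(call.function.name, call.function.arguments)
+else:
+    print(message.content)
+</code></pre>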
+<h3 id="self-correcting-rag">
+ Self correcting RAG
+ <a class="heading-link" href="#self-correcting-rag">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h3>
+<p>Self-correcting RAG is a technique that improves on simple RAG by making the LLM double-check its replies before returning them to the user. It comes from an LLM evaluation technique called “LLM-as-a-judge”, because an LLM is used to judge the output of a different LLM or RAG pipeline.</p>
+<p>Self-correcting RAG starts as simple RAG: when the user asks a question, the retriever is called and the results are sent to the LLM to extract an answer from. However, before returning the answer to the user, another LLM is asked to judge whether, in its opinion, the answer is correct. If the second LLM agrees, the answer is sent to the user. If not, the second LLM generates a new question for the retriever and runs it again or, in other cases, simply integrates its opinion into the prompt and runs the first LLM again.</p>
+<p><img src="https://www.zansara.dev/posts/2024-06-10-the-agent-compass/self-correcting-rag.png" alt="Diagram of the operation of self correcting RAG: when the user asks a question, the retriever is called and the results are sent to the LLM to extract an answer from. However, before returning the answer to the user, another LLM is asked to judge whether in their opinion, the answer is correct. If the second LLM agrees, the answer is sent to the user. If not, the second LLM generates a new question for the retriever and runs it again, or in other cases, it simply integrates its opinion in the prompt and runs the first LLM again."></p>
+<p>Self-correcting RAG can be seen as <strong>one more step towards agentic behavior</strong> because it unlocks a new possibility for the application: <strong>the ability to try again</strong>. A self-correcting RAG app has a chance to detect its own mistakes and has the agency to decide that it’s better to try again, maybe with a slightly reworded question or different retrieval parameters, before answering the user. Given that this process is entirely autonomous, we’ll place this technique quite towards the Autonomous end of the scale.</p>
+<p><img src="https://www.zansara.dev/posts/2024-06-10-the-agent-compass/self-correcting-rag-compass.png" alt="the updated compass"></p>
+<h3 id="chain-of-thought">
+ Chain-of-thought
+ <a class="heading-link" href="#chain-of-thought">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h3>
+<p><a href="https://arxiv.org/abs/2201.11903" class="external-link" target="_blank" rel="noopener">Chain-of-thought</a> is a family of prompting techniques that makes the LLM “reason out loud”. It’s very useful when the model needs to process a very complicated question, such as a mathematical problem or a layered question like “When was the eldest sistem of the current King of Sweden born?” Assuming that the LLM knows these facts, in order to not hallucinate it’s best to ask the model to proceed “step-by-step” and find out, in order:</p>
+<ol>
+<li>Who the current King of Sweden is,</li>
+<li>Whether he has an elder sister,</li>
+<li>If yes, who she is,</li>
+<li>The age of the person identified above.</li>
+</ol>
+<p>The LLM might know the final fact in any case, but the probability of it giving the right answer increases noticeably if the LLM is prompted this way.</p>
+<p>Chain-of-thought prompts can also be seen as the LLM accomplishing the task of finding the correct answer in steps, which implies that there are two lines of thinking going on: on one side the LLM is answering the questions it’s posing to itself, while on the other it’s constantly re-assessing whether it has a final answer for the user.</p>
+<p>In the example above, the chain of thought might end at step 2 if the LLM realizes that the current King of Sweden has no elder sisters (he <a href="https://en.wikipedia.org/wiki/Carl_XVI_Gustaf#Early_life" class="external-link" target="_blank" rel="noopener">doesn’t</a>): the LLM needs to keep an eye on its own thought process and decide whether it needs to continue or not.</p>
+<p>We can summarize an app using chain-of-thought prompting like this: when a user asks a question, first of all the LLM reacts to the chain-of-thought prompt to lay out the sub-questions it needs to answer. Then it answers its own questions one by one, asking itself each time whether the final answer has already been found. When the LLM believes it has the final answer, it rewrites it for the user and returns it.</p>
+<p><img src="https://www.zansara.dev/posts/2024-06-10-the-agent-compass/chain-of-thought.png" alt="Diagram of the operation of a chain-of-thought LLM app: when a user asks a question, first of all the LLM reacts to the chain-of-thought prompt to lay out the sub-questions it needs to answer. Then it answers its own questions one by one, asking itself each time whether the final answer has already been found. When the LLM believes it has the final answer, it rewrites it for the user and returns it "></p>
+<p>This new prompting technique makes a big step towards full agency: the ability for the LLM to <strong>assess whether the goal has been achieved</strong> before returning any answer to the user. While apps like Bing Chat iterate with the user and need their feedback to reach high-level goals, chain-of-thought gives the LLM the freedom to check its own answers before having the user judge them, which makes the loop much faster and can increase the output quality dramatically.</p>
+<p>This process is similar to what self-correcting RAG does, but has a wider scope, because the LLM does not only need to decide whether an answer is correct, it can also decide to continue reasoning in order to make it more complete, more detailed, to phrase it better, and so on.</p>
+<p>Another interesting trait of chain-of-thought apps is that they introduce the concept of <strong>inner monologue</strong>. The inner monologue is a conversation that the LLM has with itself, a conversation buffer where it keeps adding messages as the reasoning develops. This monologue is not visible to the user, but helps the LLM deconstruct a complex reasoning line into a more manageable format, like a researcher that takes notes instead of keeping all their earlier reasoning inside their head all the time.</p>
+<p>Due to the wider scope of the decision-making that chain-of-thought apps are able to do, they also sit in the middle of our compass. They can be seen as slightly more autonomous than conversational due to the fact that they hide their inner monologue from the user.</p>
+<p><img src="https://www.zansara.dev/posts/2024-06-10-the-agent-compass/chain-of-thought-compass.png" alt="the updated compass"></p>
+<p>From here, the next step is straightforward: using tools.</p>
+<h3 id="multi-hop-rag">
+ Multi-hop RAG
+ <a class="heading-link" href="#multi-hop-rag">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h3>
+<p>Multi-hop RAG applications are nothing more than simple RAG apps that use chain-of-thought prompting and are free to invoke the retriever as many times as needed, and only when needed.</p>
+<p>This is how it works. When the user asks a question, a chain-of-thought prompt is generated and sent to the LLM. The LLM assesses whether it knows the answer to the question and, if not, asks itself whether a retrieval is necessary. If it decides that retrieval is necessary it calls it, otherwise it skips it and generates an answer directly. It then checks again whether the question is answered. Exiting the loop, the LLM produces a complete answer by re-reading its own inner monologue and returns this reply to the user.</p>
+<p><img src="https://www.zansara.dev/posts/2024-06-10-the-agent-compass/multi-hop-rag.png" alt="Diagram of the operation of multi-hop RAG: when the user makes a question, a chain of thought prompt is generated and sent to the LLM. The LLM assesses whether it knows the answer to the question and if not, asks itself whether a retrieval is necessary. If it decides that retrieval is necessary it calls it, otherwise it skips it and generates an answer directly. It then checks again whether the question is answered. Exiting the loop, the LLM produces a complete answer by re-reading its own inner monologue and returns this reply to the user."></p>
+<p>An app like this is getting quite close to a proper autonomous agent, because it can <strong>perform its own research autonomously</strong>. The LLM calls are made in such a way that the system is able to assess whether it knows enough to answer or whether it should do more research by formulating more questions for the retriever and then reasoning over the newly collected data.</p>
+<p>Multi-hop RAG is a very powerful technique that shows a lot of agency and autonomy, and therefore can be placed in the lower-right quadrant of our compass. However, it is still limited with respect to a “true” autonomous agent, because the only action it can take is to invoke the retriever.</p>
+<p><img src="https://www.zansara.dev/posts/2024-06-10-the-agent-compass/multi-hop-rag-compass.png" alt="the updated compass"></p>
+<h3 id="react-agents">
+ ReAct Agents
+ <a class="heading-link" href="#react-agents">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h3>
+<p>Let’s now move onto apps that can be defined proper “agents”. One of the first flavors of agentic LLM apps, and still the most popular nowadays, is called “<a href="https://arxiv.org/abs/2210.03629" class="external-link" target="_blank" rel="noopener">ReAct</a>” Agents, which stands for “Reason + Act”. ReAct is a prompting technique that belongs to the chain-of-thought extended family: it makes the LLM reason step by step, decide whether to perform any action, and then observe the result of the actions it took before moving further.</p>
+<p>A ReAct agent works more or less like this: when the user sets a goal, the app builds a ReAct prompt, which first of all asks the LLM whether the answer is already known. If the LLM says no, the prompt makes it select a tool. The tool returns some values which are added to the inner monologue of the application together with the invitation to re-assess whether the goal has been accomplished. The app loops until the answer is found, and then the answer is returned to the user.</p>
+<p><img src="https://www.zansara.dev/posts/2024-06-10-the-agent-compass/react-agent.png" alt="Diagram of the operation of a ReAct Agent: when the user sets a goal, the app builds a ReAct prompt, which first of all asks the LLM whether the answer is already known. If the LLM says no, the prompt makes it select a tool. The tool returns some values which are added to the inner monologue of the application together with the invitation to re-assess whether the goal has been accomplished. The app loops until the answer is found, and then the answer is returned to the user."></p>
+<p>As you can see, the structure is very similar to a multi-hop RAG, with an important difference: ReAct Agents normally have <strong>many tools to choose from</strong> rather than a single retriever. This gives them the agency to take much more complex decisions and can be finally called “agents”.</p>
+<p><img src="https://www.zansara.dev/posts/2024-06-10-the-agent-compass/react-agent-compass.png" alt="the updated compass"></p>
+<p>ReAct Agents are very autonomous in their tasks and rely on an inner monologue rather than a conversation with a user to achieve their goals. Therefore we place them very much on the Autonomous end of the spectrum.</p>
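+<p>The skeleton of a ReAct loop is not very different from the multi-hop RAG sketch above, except that the model picks among several tools. As before, tool implementations, prompt wording and the <code>llm</code> helper are purely illustrative.</p>
+<pre tabindex="0"><code>TOOLS = {
+    "web_search": lambda query: "...search results...",  # placeholder implementations
+    "calculator": lambda expression: "42",
+}
+
+def react_agent(goal: str, llm, max_turns: int = 10) -> str:
+    monologue = [
+        f"Goal: {goal}\n"
+        "At each step, reply either 'ACT: tool_name | tool input' to use a tool "
+        f"(available tools: {', '.join(TOOLS)}) or 'FINISH: your answer' when done."
+    ]
+    for _ in range(max_turns):
+        thought = llm.generate("\n".join(monologue))  # Reason
+        monologue.append(thought)
+        if thought.strip().startswith("FINISH:"):
+            return thought.split("FINISH:", 1)[1].strip()
+        if thought.strip().startswith("ACT:"):        # Act
+            tool_name, tool_input = thought.split("ACT:", 1)[1].split("|", 1)
+            observation = TOOLS[tool_name.strip()](tool_input.strip())
+            monologue.append(f"Observation: {observation}")  # Observe
+    return "Could not reach the goal within the allowed number of steps."
+</code></pre>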
+<h3 id="conversational-agents">
+ Conversational Agents
+ <a class="heading-link" href="#conversational-agents">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h3>
+<p>Conversational Agents are a category of apps that can vary widely. As stated earlier, conversational agents focus on using the conversation itself as a tool to accomplish goals, so in order to understand them, one has to distinguish between the people that set the goal (let’s call them <em>owners</em>) and those who talk with the bot (the <em>users</em>).</p>
+<p>Once this distinction is made, this is how the most basic conversational agents normally work. First, the owner sets a goal. The application then starts a conversation with a user and, right after the first message, starts asking itself if the given goal was accomplished. It then keeps talking to the target user until it believes the goal was attained and, once done, it returns to its owner to report the outcome.</p>
+<p><img src="https://www.zansara.dev/posts/2024-06-10-the-agent-compass/basic-conversational-agent.png" alt="Diagram of the operation of a Conversational Agent: first, the owner sets a goal. The application then starts a conversation with a user and, right after the first message, starts asking itself if the given goal was accomplished. It then keeps talking to the target user until it believes the goal was attained and, once done, it returns back to its owner to report the outcome."></p>
+<p>Basic conversational agents are very agentic in the sense that they can take a task off the hands of their owners and keep working on it until the goal is achieved. However, <strong>they have varying degrees of agency</strong> depending on how many tools they can use and how sophisticated their ability to talk to their target users is.</p>
+<p>For example, can the communication occur over one single channel only, be it email, chat, voice, or something else? Can the agent choose among different channels to reach the user? Can it perform side tasks on behalf of either party to work towards its goal? There is a large variety of these agents available and no clear naming distinction between them, so depending on their abilities, their position on our compass might be very different. This is why we place them in the top center, spreading far out in both directions.</p>
+<p><img src="https://www.zansara.dev/posts/2024-06-10-the-agent-compass/conversational-agent-compass.png" alt="the updated compass"></p>
+<h3 id="ai-crews">
+ AI Crews
+ <a class="heading-link" href="#ai-crews">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h3>
+<p>By far the most advanced agent implementation available right now is called AI Crew, such as the ones provided by <a href="https://www.crewai.com/" class="external-link" target="_blank" rel="noopener">CrewAI</a>. These apps take the concept of autonomous agent to the next level by making several different agents work together.</p>
+<p>The way these apps work is very flexible. For example, let’s imagine we are making an AI application that can build a fully working mobile game from a simple description. This is an extremely complex task that, in real life, requires several developers. To achieve the same with an AI Crew, the crew needs to contain several agents, each one with their own special skills, tools, and background knowledge. There could be:</p>
+<ul>
+<li>a Designer Agent, that has all the tools to generate artwork and assets;</li>
+<li>a Writer Agent that writes the story, the copy, the dialogues, and most of the text;</li>
+<li>a Frontend Developer Agent that designs and implements the user interface;</li>
+<li>a Game Developer Agent that writes the code for the game itself;</li>
+<li>a Manager Agent, that coordinates the work of all the other agents, keeps them on track and eventually reports the results of their work to the user.</li>
+</ul>
+<p>These agents interact with each other just like a team of humans would: by exchanging messages in a chat format, asking each other to perform actions for them, until their manager decides that the overall goal they were set to has been accomplished, and reports to the user.</p>
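+<p>To make the message-passing idea more tangible, here is a toy sketch of a crew with a manager and a few specialists. This is not the API of CrewAI or any other framework: every name and prompt is an illustrative assumption.</p>
+<pre tabindex="0"><code>class Agent:
+    def __init__(self, name: str, system_prompt: str, llm):
+        self.name, self.system_prompt, self.llm = name, system_prompt, llm
+
+    def work_on(self, task: str, shared_chat: list) -> str:
+        prompt = self.system_prompt + "\n" + "\n".join(shared_chat) + f"\nTask: {task}"
+        result = self.llm.generate(prompt)
+        shared_chat.append(f"{self.name}: {result}")  # everyone sees everyone's messages
+        return result
+
+def run_crew(goal: str, manager: Agent, specialists: dict) -> str:
+    shared_chat = [f"Overall goal: {goal}"]
+    # The manager splits the goal into one sub-task per specialist...
+    plan = manager.work_on(f"Split this goal into tasks for {list(specialists)}", shared_chat)
+    for agent in specialists.values():
+        agent.work_on(f"Do your part of this plan: {plan}", shared_chat)
+    # ...and decides when the goal is accomplished and reports back to the user
+    return manager.work_on("Summarize the crew's results for the user.", shared_chat)
+</code></pre>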
+<p>AI Crews are very advanced and dynamic systems that are still actively researched and explored. One thing that’s clear, though, is that they show the highest level of agency of any LLM-based app, so we can place them right at the bottom-right end of the scale.</p>
+<p><img src="https://www.zansara.dev/posts/2024-06-10-the-agent-compass/ai-crews-compass.png" alt="the updated compass"></p>
+<h2 id="conclusion">
+ Conclusion
+ <a class="heading-link" href="#conclusion">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h2>
+<p>What we’ve seen here are just a few examples of LLM-powered applications and how close or far they are from the concept of a “real” AI agent. AI agents are still a very active area of research, and their effectiveness keeps improving as LLMs become cheaper and more powerful.</p>
+<p>As a matter of fact, with today’s LLMs true AI agents are already possible, but in many cases they’re too brittle and expensive for real production use cases. Agentic systems today suffer from two main issues: they make <strong>huge and frequent LLM calls</strong> and they <strong>can tolerate only a very low error rate</strong> in their decision making.</p>
+<p>Inner monologues can grow to an unbounded size during the agent’s operation, making the context window size a potential limitation. A single bad decision can send a chain-of-thought reasoning train in a completely wrong direction, and many LLM calls will be performed before the system realizes its mistake, if it does at all. However, as LLMs become faster, cheaper and smarter, the day when AI Agents become reliable and cheap enough is nearer than many think.</p>
+<p>Let’s be ready for it!</p>
+<p class="fleuron"><a href="https://www.zansara.dev/posts/2024-05-06-teranoptia/">SDH</a></p>
+
+
+
+ Generating creatures with Teranoptia
+ https://www.zansara.dev/posts/2024-05-06-teranoptia/
+ Mon, 06 May 2024 00:00:00 +0000
+
+ https://www.zansara.dev/posts/2024-05-06-teranoptia/
+
+
+<style>
+ @font-face {
+ font-family: teranoptia;
+ src: url("/posts/2024-05-06-teranoptia/teranoptia/fonts/Teranoptia-Furiae.ttf");
+ }
+
+ .teranoptia {
+ font-size: 5rem;
+ font-family: teranoptia;
+ hyphens: none!important;
+ line-height: 70px;
+ }
+
+ .small {
+ font-size:3rem;
+ line-height: 40px;
+ }
+
+ .glyphset {
+ display: flex;
+ flex-wrap: wrap;
+ }
+ .glyphset div {
+ margin: 3px;
+ }
+ .glyphset div p {
+ text-align: center;
+ }
+
+</style>
+
+
+<p>Having fun with fonts doesn’t always mean obsessing over kerning and ligatures. Sometimes, writing text is not even the point!</p>
+<p>You don’t believe it? Type something in here.</p>
+
+
+
+<textarea id="test-generated-animal" class="teranoptia" style="width: 100%; line-height: 50pt;"></textarea>
+
+<div style="display: flex; gap: 10px;">
+ Characters to generate:
+ <input id="test-glyph-count" type="number" value=10 ></input>
+ <button onclick="generateTest(document.getElementById('test-glyph-count').value);">Generate!</button>
+</div>
+
+<script>
+function makeBreakable(animal){
+  // Line break trick - avoid hyphens and allow wrapping
+ const animalFragments = animal.split(/(?=[yvspmieaźACFILOSWŹv])/g);
+ animal = animalFragments.join("<wbr>");
+ return animal;
+}
+
+function generateTest(value){
+ var newAnimal = '';
+ for (var i = 0; i < value; i++) {
+ newAnimal += randomFrom(validChars);
+ }
+ document.getElementById("test-generated-animal").value = newAnimal;
+}
+
+</script>
+
+
+<p><a href="https://www.tunera.xyz/fonts/teranoptia/" class="external-link" target="_blank" rel="noopener">Teranoptia</a> is a cool font that lets you build small creatures by mapping each letter (and a few other characters) to a piece of a creature like a head, a tail, a leg, a wing and so on. By typing words you can create strings of creatures.</p>
+<p>Here is the glyphset:</p>
+
+
+<div class="glyphset">
+ <div><p>A</p><p class="teranoptia">A</p></div>
+
+ <div><p>B</p><p class="teranoptia">B</p></div>
+
+ <div><p>C</p><p class="teranoptia">C</p></div>
+
+ <div><p>D</p><p class="teranoptia">D</p></div>
+
+ <div><p>E</p><p class="teranoptia">E</p></div>
+
+ <div><p>F</p><p class="teranoptia">F</p></div>
+
+ <div><p>G</p><p class="teranoptia">G</p></div>
+
+ <div><p>H</p><p class="teranoptia">H</p></div>
+
+ <div><p>I</p><p class="teranoptia">I</p></div>
+
+ <div><p>J</p><p class="teranoptia">J</p></div>
+
+ <div><p>K</p><p class="teranoptia">K</p></div>
+
+ <div><p>L</p><p class="teranoptia">L</p></div>
+
+ <div><p>M</p><p class="teranoptia">M</p></div>
+
+ <div><p>N</p><p class="teranoptia">N</p></div>
+
+ <div><p>O</p><p class="teranoptia">O</p></div>
+
+ <div><p>P</p><p class="teranoptia">P</p></div>
+
+ <div><p>Q</p><p class="teranoptia">Q</p></div>
+
+ <div><p>R</p><p class="teranoptia">R</p></div>
+
+ <div><p>S</p><p class="teranoptia">S</p></div>
+
+ <div><p>T</p><p class="teranoptia">T</p></div>
+
+ <div><p>U</p><p class="teranoptia">U</p></div>
+
+ <div><p>V</p><p class="teranoptia">V</p></div>
+
+ <div><p>W</p><p class="teranoptia">W</p></div>
+
+ <div><p>X</p><p class="teranoptia">X</p></div>
+
+ <div><p>Ẋ</p><p class="teranoptia">Ẋ</p></div>
+
+ <div><p>Y</p><p class="teranoptia">Y</p></div>
+
+ <div><p>Z</p><p class="teranoptia">Z</p></div>
+
+ <div><p>Ź</p><p class="teranoptia">Ź</p></div>
+
+ <div><p>Ž</p><p class="teranoptia">Ž</p></div>
+
+ <div><p>Ż</p><p class="teranoptia">Ż</p></div>
+
+ <div><p>a</p><p class="teranoptia">a</p></div>
+
+ <div><p>b</p><p class="teranoptia">b</p></div>
+
+ <div><p>ḅ</p><p class="teranoptia">ḅ</p></div>
+
+ <div><p>c</p><p class="teranoptia">c</p></div>
+
+ <div><p>d</p><p class="teranoptia">d</p></div>
+
+ <div><p>e</p><p class="teranoptia">e</p></div>
+
+ <div><p>f</p><p class="teranoptia">f</p></div>
+
+ <div><p>g</p><p class="teranoptia">g</p></div>
+
+ <div><p>h</p><p class="teranoptia">h</p></div>
+
+ <div><p>i</p><p class="teranoptia">i</p></div>
+
+ <div><p>j</p><p class="teranoptia">j</p></div>
+
+ <div><p>k</p><p class="teranoptia">k</p></div>
+
+ <div><p>l</p><p class="teranoptia">l</p></div>
+
+ <div><p>m</p><p class="teranoptia">m</p></div>
+
+ <div><p>n</p><p class="teranoptia">n</p></div>
+
+ <div><p>o</p><p class="teranoptia">o</p></div>
+
+ <div><p>p</p><p class="teranoptia">p</p></div>
+
+ <div><p>q</p><p class="teranoptia">q</p></div>
+
+ <div><p>r</p><p class="teranoptia">r</p></div>
+
+ <div><p>s</p><p class="teranoptia">s</p></div>
+
+ <div><p>t</p><p class="teranoptia">t</p></div>
+
+ <div><p>u</p><p class="teranoptia">u</p></div>
+
+ <div><p>v</p><p class="teranoptia">v</p></div>
+
+ <div><p>w</p><p class="teranoptia">w</p></div>
+
+ <div><p>x</p><p class="teranoptia">x</p></div>
+
+ <div><p>y</p><p class="teranoptia">y</p></div>
+
+ <div><p>z</p><p class="teranoptia">z</p></div>
+
+ <div><p>ź</p><p class="teranoptia">ź</p></div>
+
+ <div><p>ž</p><p class="teranoptia">ž</p></div>
+
+ <div><p>ż</p><p class="teranoptia">ż</p></div>
+
+ <div><p>,</p><p class="teranoptia">,</p></div>
+
+ <div><p>*</p><p class="teranoptia">*</p></div>
+
+ <div><p>(</p><p class="teranoptia">(</p></div>
+
+ <div><p>)</p><p class="teranoptia">)</p></div>
+
+ <div><p>{</p><p class="teranoptia">{</p></div>
+
+ <div><p>}</p><p class="teranoptia">}</p></div>
+
+ <div><p>[</p><p class="teranoptia">[</p></div>
+
+ <div><p>]</p><p class="teranoptia">]</p></div>
+
+ <div><p>‐</p><p class="teranoptia">‐</p></div>
+
+ <div><p>“</p><p class="teranoptia">“</p></div>
+
+ <div><p>”</p><p class="teranoptia">”</p></div>
+
+ <div><p>‘</p><p class="teranoptia">‘</p></div>
+
+ <div><p>’</p><p class="teranoptia">’</p></div>
+
+ <div><p>«</p><p class="teranoptia">«</p></div>
+
+ <div><p>»</p><p class="teranoptia">»</p></div>
+
+ <div><p>‹</p><p class="teranoptia">‹</p></div>
+
+ <div><p>›</p><p class="teranoptia">›</p></div>
+
+ <div><p>$</p><p class="teranoptia">$</p></div>
+
+ <div><p>€</p><p class="teranoptia">€</p></div>
+</div>
+
+You'll notice that there's a lot you can do with it, from assembling simple creatures:
+
+<p class="teranoptia">vTN</p>
+
+to more complex, multi-line designs:
+
+<p class="teranoptia"><wbr> {Ž}</p>
+<p class="teranoptia">F] [Z</p>
+
+
+
+<p>Let’s play with it a bit and see how we can put together a few “correct” looking creatures.</p>
+<div class="notice info">
+ <div class="notice-content"><em>As you’re about to notice, I’m no JavaScript developer. Don’t expect high-quality JS in this post.</em></div>
+</div>
+
+<h2 id="mirroring-animals">
+ Mirroring animals
+ <a class="heading-link" href="#mirroring-animals">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h2>
+<p>To begin with, let’s start with a simple function: animal mirroring. The glyphset includes a mirrored version of each non-symmetric glyph, but the mapping is rather arbitrary, so we are going to need a map.</p>
+<p>Here are the pairs:</p>
+<p class="small teranoptia" style="letter-spacing: 5px;"> By Ev Hs Kp Nm Ri Ve Za Żź Az Cx Fu Ir Lo Ol Sh Wd Źż vE Dw Gt Jq Mn Pk Qj Tg Uf Xc Ẋḅ Yb Žž bY cX () [] {} </p>
+<h3 id="animal-mirror">
+ Animal mirror
+ <a class="heading-link" href="#animal-mirror">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h3>
+
+
+
+<div style="display: flex; gap: 10px;">
+ <input id="original-animal" type="text" class="teranoptia" style="width: 50%; text-align:right;" oninput="mirrorAnimal(this.value);" value="WYZ*p»gh"></input>
+ <p id="mirrored-animal" class="teranoptia" style="line-height: 50pt;">ST»K*abd</p>
+</div>
+
+<script>
+const mirrorPairs = {"B": "y", "y": "B", "E": "v", "v": "E", "H": "s", "s": "H", "K": "p", "p": "K", "N": "m", "m": "N", "R": "i", "i": "R", "V": "e", "e": "V", "Z": "a", "a": "Z", "Ż": "ź", "ź": "Ż", "A": "z", "z": "A", "C": "x", "x": "C", "F": "u", "u": "F", "I": "r", "r": "I", "L": "o", "o": "L", "O": "l", "l": "O", "S": "h", "h": "S", "W": "d", "d": "W", "Ź": "ż", "ż": "Ź", "v": "E", "E": "v", "D": "w", "w": "D", "G": "t", "t": "G", "J": "q", "q": "J", "M": "n", "n": "M", "P": "k", "k": "P", "Q": "j", "j": "Q", "T": "g", "g": "T", "U": "f", "f": "U", "X": "c", "c": "X", "Ẋ": "ḅ", "ḅ": "Ẋ", "Y": "b", "b": "Y", "Ž": "ž", "ž": "Ž", "b": "Y", "Y": "b", "c": "X", "X": "c", "(": ")", ")": "(", "[": "]", "]": "[", "{": "}", "}": "{"};
+
+function mirrorAnimal(original){
+ var mirror = '';
+ for (i = original.length-1; i >= 0; i--){
+ newChar = mirrorPairs[original.charAt(i)];
+ if (newChar){
+ mirror += newChar;
+ } else {
+ mirror += original.charAt(i)
+ }
+ console.log(original, original.charAt(i), mirrorPairs[original.charAt(i)], mirror);
+ }
+ document.getElementById("mirrored-animal").innerHTML = mirror;
+}
+</script>
+
+
+<div class="highlight"><pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-javascript" data-lang="javascript"><span style="display:flex;"><span><span style="color:#ff7b72">const</span> mirrorPairs <span style="color:#ff7b72;font-weight:bold">=</span> {<span style="color:#a5d6ff">"B"</span><span style="color:#ff7b72;font-weight:bold">:</span> <span style="color:#a5d6ff">"y"</span>, <span style="color:#a5d6ff">"y"</span><span style="color:#ff7b72;font-weight:bold">:</span> <span style="color:#a5d6ff">"B"</span>, <span style="color:#a5d6ff">"E"</span><span style="color:#ff7b72;font-weight:bold">:</span> <span style="color:#a5d6ff">"v"</span>, <span style="color:#a5d6ff">"v"</span><span style="color:#ff7b72;font-weight:bold">:</span> <span style="color:#a5d6ff">"E"</span>, <span style="color:#a5d6ff">"H"</span><span style="color:#ff7b72;font-weight:bold">:</span> <span style="color:#a5d6ff">"s"</span>, <span style="color:#a5d6ff">"s"</span><span style="color:#ff7b72;font-weight:bold">:</span> <span style="color:#a5d6ff">"H"</span>, <span style="color:#a5d6ff">"K"</span><span style="color:#ff7b72;font-weight:bold">:</span> <span style="color:#a5d6ff">"p"</span>, <span style="color:#a5d6ff">"p"</span><span style="color:#ff7b72;font-weight:bold">:</span> <span style="color:#a5d6ff">"K"</span>, <span style="color:#a5d6ff">"N"</span><span style="color:#ff7b72;font-weight:bold">:</span> <span style="color:#a5d6ff">"m"</span>, <span style="color:#a5d6ff">"m"</span><span style="color:#ff7b72;font-weight:bold">:</span> <span style="color:#a5d6ff">"N"</span>, <span style="color:#a5d6ff">"R"</span><span style="color:#ff7b72;font-weight:bold">:</span> <span style="color:#a5d6ff">"i"</span>, <span style="color:#a5d6ff">"i"</span><span style="color:#ff7b72;font-weight:bold">:</span> <span style="color:#a5d6ff">"R"</span>, <span style="color:#a5d6ff">"V"</span><span style="color:#ff7b72;font-weight:bold">:</span> <span style="color:#a5d6ff">"e"</span>, <span style="color:#a5d6ff">"e"</span><span style="color:#ff7b72;font-weight:bold">:</span> <span style="color:#a5d6ff">"V"</span>, <span style="color:#a5d6ff">"Z"</span><span style="color:#ff7b72;font-weight:bold">:</span> <span style="color:#a5d6ff">"a"</span>, <span style="color:#a5d6ff">"a"</span><span style="color:#ff7b72;font-weight:bold">:</span> <span style="color:#a5d6ff">"Z"</span>, <span style="color:#a5d6ff">"Ż"</span><span style="color:#ff7b72;font-weight:bold">:</span> <span style="color:#a5d6ff">"ź"</span>, <span style="color:#a5d6ff">"ź"</span><span style="color:#ff7b72;font-weight:bold">:</span> <span style="color:#a5d6ff">"Ż"</span>, <span style="color:#a5d6ff">"A"</span><span style="color:#ff7b72;font-weight:bold">:</span> <span style="color:#a5d6ff">"z"</span>, <span style="color:#a5d6ff">"z"</span><span style="color:#ff7b72;font-weight:bold">:</span> <span style="color:#a5d6ff">"A"</span>, <span style="color:#a5d6ff">"C"</span><span style="color:#ff7b72;font-weight:bold">:</span> <span style="color:#a5d6ff">"x"</span>, <span style="color:#a5d6ff">"x"</span><span style="color:#ff7b72;font-weight:bold">:</span> <span style="color:#a5d6ff">"C"</span>, <span style="color:#a5d6ff">"F"</span><span style="color:#ff7b72;font-weight:bold">:</span> <span style="color:#a5d6ff">"u"</span>, <span style="color:#a5d6ff">"u"</span><span style="color:#ff7b72;font-weight:bold">:</span> <span style="color:#a5d6ff">"F"</span>, <span style="color:#a5d6ff">"I"</span><span 
style="color:#ff7b72;font-weight:bold">:</span> <span style="color:#a5d6ff">"r"</span>, <span style="color:#a5d6ff">"r"</span><span style="color:#ff7b72;font-weight:bold">:</span> <span style="color:#a5d6ff">"I"</span>, <span style="color:#a5d6ff">"L"</span><span style="color:#ff7b72;font-weight:bold">:</span> <span style="color:#a5d6ff">"o"</span>, <span style="color:#a5d6ff">"o"</span><span style="color:#ff7b72;font-weight:bold">:</span> <span style="color:#a5d6ff">"L"</span>, <span style="color:#a5d6ff">"O"</span><span style="color:#ff7b72;font-weight:bold">:</span> <span style="color:#a5d6ff">"l"</span>, <span style="color:#a5d6ff">"l"</span><span style="color:#ff7b72;font-weight:bold">:</span> <span style="color:#a5d6ff">"O"</span>, <span style="color:#a5d6ff">"S"</span><span style="color:#ff7b72;font-weight:bold">:</span> <span style="color:#a5d6ff">"h"</span>, <span style="color:#a5d6ff">"h"</span><span style="color:#ff7b72;font-weight:bold">:</span> <span style="color:#a5d6ff">"S"</span>, <span style="color:#a5d6ff">"W"</span><span style="color:#ff7b72;font-weight:bold">:</span> <span style="color:#a5d6ff">"d"</span>, <span style="color:#a5d6ff">"d"</span><span style="color:#ff7b72;font-weight:bold">:</span> <span style="color:#a5d6ff">"W"</span>, <span style="color:#a5d6ff">"Ź"</span><span style="color:#ff7b72;font-weight:bold">:</span> <span style="color:#a5d6ff">"ż"</span>, <span style="color:#a5d6ff">"ż"</span><span style="color:#ff7b72;font-weight:bold">:</span> <span style="color:#a5d6ff">"Ź"</span>, <span style="color:#a5d6ff">"v"</span><span style="color:#ff7b72;font-weight:bold">:</span> <span style="color:#a5d6ff">"E"</span>, <span style="color:#a5d6ff">"E"</span><span style="color:#ff7b72;font-weight:bold">:</span> <span style="color:#a5d6ff">"v"</span>, <span style="color:#a5d6ff">"D"</span><span style="color:#ff7b72;font-weight:bold">:</span> <span style="color:#a5d6ff">"w"</span>, <span style="color:#a5d6ff">"w"</span><span style="color:#ff7b72;font-weight:bold">:</span> <span style="color:#a5d6ff">"D"</span>, <span style="color:#a5d6ff">"G"</span><span style="color:#ff7b72;font-weight:bold">:</span> <span style="color:#a5d6ff">"t"</span>, <span style="color:#a5d6ff">"t"</span><span style="color:#ff7b72;font-weight:bold">:</span> <span style="color:#a5d6ff">"G"</span>, <span style="color:#a5d6ff">"J"</span><span style="color:#ff7b72;font-weight:bold">:</span> <span style="color:#a5d6ff">"q"</span>, <span style="color:#a5d6ff">"q"</span><span style="color:#ff7b72;font-weight:bold">:</span> <span style="color:#a5d6ff">"J"</span>, <span style="color:#a5d6ff">"M"</span><span style="color:#ff7b72;font-weight:bold">:</span> <span style="color:#a5d6ff">"n"</span>, <span style="color:#a5d6ff">"n"</span><span style="color:#ff7b72;font-weight:bold">:</span> <span style="color:#a5d6ff">"M"</span>, <span style="color:#a5d6ff">"P"</span><span style="color:#ff7b72;font-weight:bold">:</span> <span style="color:#a5d6ff">"k"</span>, <span style="color:#a5d6ff">"k"</span><span style="color:#ff7b72;font-weight:bold">:</span> <span style="color:#a5d6ff">"P"</span>, <span style="color:#a5d6ff">"Q"</span><span style="color:#ff7b72;font-weight:bold">:</span> <span style="color:#a5d6ff">"j"</span>, <span style="color:#a5d6ff">"j"</span><span style="color:#ff7b72;font-weight:bold">:</span> <span style="color:#a5d6ff">"Q"</span>, <span style="color:#a5d6ff">"T"</span><span style="color:#ff7b72;font-weight:bold">:</span> <span style="color:#a5d6ff">"g"</span>, <span 
style="color:#a5d6ff">"g"</span><span style="color:#ff7b72;font-weight:bold">:</span> <span style="color:#a5d6ff">"T"</span>, <span style="color:#a5d6ff">"U"</span><span style="color:#ff7b72;font-weight:bold">:</span> <span style="color:#a5d6ff">"f"</span>, <span style="color:#a5d6ff">"f"</span><span style="color:#ff7b72;font-weight:bold">:</span> <span style="color:#a5d6ff">"U"</span>, <span style="color:#a5d6ff">"X"</span><span style="color:#ff7b72;font-weight:bold">:</span> <span style="color:#a5d6ff">"c"</span>, <span style="color:#a5d6ff">"c"</span><span style="color:#ff7b72;font-weight:bold">:</span> <span style="color:#a5d6ff">"X"</span>, <span style="color:#a5d6ff">"Ẋ"</span><span style="color:#ff7b72;font-weight:bold">:</span> <span style="color:#a5d6ff">"ḅ"</span>, <span style="color:#a5d6ff">"ḅ"</span><span style="color:#ff7b72;font-weight:bold">:</span> <span style="color:#a5d6ff">"Ẋ"</span>, <span style="color:#a5d6ff">"Y"</span><span style="color:#ff7b72;font-weight:bold">:</span> <span style="color:#a5d6ff">"b"</span>, <span style="color:#a5d6ff">"b"</span><span style="color:#ff7b72;font-weight:bold">:</span> <span style="color:#a5d6ff">"Y"</span>, <span style="color:#a5d6ff">"Ž"</span><span style="color:#ff7b72;font-weight:bold">:</span> <span style="color:#a5d6ff">"ž"</span>, <span style="color:#a5d6ff">"ž"</span><span style="color:#ff7b72;font-weight:bold">:</span> <span style="color:#a5d6ff">"Ž"</span>, <span style="color:#a5d6ff">"b"</span><span style="color:#ff7b72;font-weight:bold">:</span> <span style="color:#a5d6ff">"Y"</span>, <span style="color:#a5d6ff">"Y"</span><span style="color:#ff7b72;font-weight:bold">:</span> <span style="color:#a5d6ff">"b"</span>, <span style="color:#a5d6ff">"c"</span><span style="color:#ff7b72;font-weight:bold">:</span> <span style="color:#a5d6ff">"X"</span>, <span style="color:#a5d6ff">"X"</span><span style="color:#ff7b72;font-weight:bold">:</span> <span style="color:#a5d6ff">"c"</span>, <span style="color:#a5d6ff">"("</span><span style="color:#ff7b72;font-weight:bold">:</span> <span style="color:#a5d6ff">")"</span>, <span style="color:#a5d6ff">")"</span><span style="color:#ff7b72;font-weight:bold">:</span> <span style="color:#a5d6ff">"("</span>, <span style="color:#a5d6ff">"["</span><span style="color:#ff7b72;font-weight:bold">:</span> <span style="color:#a5d6ff">"]"</span>, <span style="color:#a5d6ff">"]"</span><span style="color:#ff7b72;font-weight:bold">:</span> <span style="color:#a5d6ff">"["</span>, <span style="color:#a5d6ff">"{"</span><span style="color:#ff7b72;font-weight:bold">:</span> <span style="color:#a5d6ff">"}"</span>, <span style="color:#a5d6ff">"}"</span><span style="color:#ff7b72;font-weight:bold">:</span> <span style="color:#a5d6ff">"{"</span>};
+</span></span><span style="display:flex;"><span>
+</span></span><span style="display:flex;"><span><span style="color:#ff7b72">function</span> mirrorAnimal(original){
+</span></span><span style="display:flex;"><span> <span style="color:#ff7b72">var</span> mirror <span style="color:#ff7b72;font-weight:bold">=</span> <span style="color:#a5d6ff">''</span>;
+</span></span><span style="display:flex;"><span> <span style="color:#ff7b72">for</span> (i <span style="color:#ff7b72;font-weight:bold">=</span> original.length<span style="color:#ff7b72;font-weight:bold">-</span><span style="color:#a5d6ff">1</span>; i <span style="color:#ff7b72;font-weight:bold">>=</span> <span style="color:#a5d6ff">0</span>; i<span style="color:#ff7b72;font-weight:bold">--</span>){
+</span></span><span style="display:flex;"><span> newChar <span style="color:#ff7b72;font-weight:bold">=</span> mirrorPairs[original.charAt(i)];
+</span></span><span style="display:flex;"><span> <span style="color:#ff7b72">if</span> (newChar){
+</span></span><span style="display:flex;"><span> mirror <span style="color:#ff7b72;font-weight:bold">+=</span> newChar;
+</span></span><span style="display:flex;"><span> } <span style="color:#ff7b72">else</span> {
+</span></span><span style="display:flex;"><span> mirror <span style="color:#ff7b72;font-weight:bold">+=</span> original.charAt(i)
+</span></span><span style="display:flex;"><span> }
+</span></span><span style="display:flex;"><span> }
+</span></span><span style="display:flex;"><span> <span style="color:#ff7b72">return</span> mirror;
+</span></span><span style="display:flex;"><span>}
+</span></span></code></pre></div><h2 id="random-animal-generation">
+ Random animal generation
+ <a class="heading-link" href="#random-animal-generation">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h2>
+<p>While it’s fun to build complicated animals this way, you’ll notice something: it’s pretty hard to make them come out right by simply typing something. Most of the time you need quite careful planning. In addition, there’s almost no meaningful (English) word that corresponds to a well-defined creature. Very often the characters don’t match, creating a sequence of “chopped” creatures.</p>
+<p>For example, “Hello” becomes:</p>
+<p class="teranoptia">Hello</p>
+<p>This is a problem if we want to make a parametric or random creature generator, because most of the random strings won’t look good.</p>
+<h3 id="naive-random-generator">
+ Naive random generator
+ <a class="heading-link" href="#naive-random-generator">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h3>
+
+
+<div style="display: flex; gap: 10px;">
+ Characters to generate:
+ <input id="naive-glyph-count" type="number" value=10></input>
+ <button onclick="generateNaive(document.getElementById('naive-glyph-count').value);">Generate!</button>
+</div>
+
+<p id="naive-generated-animal" class="teranoptia" style="line-height: 50pt;">n]Zgameź)‐</p>
+
+<script>
+const validChars = "ABCDEFGHIJKLMNOPQRSTUVWXẊYZŹŽŻabḅcdefghijklmnopqrstuvwxyzźžż,*(){}[]‐“”«»$"; //‘’‹›€
+
+function randomFrom(list){
+ return list[Math.floor(Math.random() * list.length)];
+}
+
+function generateNaive(value){
+ var newAnimal = '';
+ for (var i = 0; i < value; i++) {
+ newAnimal += randomFrom(validChars);
+ }
+
+ // Line break trick - helps with wrapping
+ const animalFragments = newAnimal.split('');
+ newAnimal = animalFragments.join("<wbr>");
+
+ document.getElementById("naive-generated-animal").innerHTML = newAnimal;
+}
+generateNaive(document.getElementById('naive-glyph-count').value);
+
+</script>
+
+
+<div class="highlight"><pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-javascript" data-lang="javascript"><span style="display:flex;"><span><span style="color:#ff7b72">const</span> validChars <span style="color:#ff7b72;font-weight:bold">=</span> <span style="color:#a5d6ff">"ABCDEFGHIJKLMNOPQRSTUVWXẊYZŹŽŻabḅcdefghijklmnopqrstuvwxyzźžż,*(){}[]‐“”«»$"</span>; <span style="color:#8b949e;font-style:italic">// ‘’‹›€ excluded because they're mostly vertical
+</span></span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"></span>
+</span></span><span style="display:flex;"><span><span style="color:#ff7b72">function</span> randomFrom(list){
+</span></span><span style="display:flex;"><span> <span style="color:#ff7b72">return</span> list[Math.floor(Math.random() <span style="color:#ff7b72;font-weight:bold">*</span> list.length)];
+</span></span><span style="display:flex;"><span>}
+</span></span><span style="display:flex;"><span>
+</span></span><span style="display:flex;"><span><span style="color:#ff7b72">function</span> generateNaive(value){
+</span></span><span style="display:flex;"><span> <span style="color:#ff7b72">var</span> newAnimal <span style="color:#ff7b72;font-weight:bold">=</span> <span style="color:#a5d6ff">''</span>;
+</span></span><span style="display:flex;"><span> <span style="color:#ff7b72">for</span> (<span style="color:#ff7b72">var</span> i <span style="color:#ff7b72;font-weight:bold">=</span> <span style="color:#a5d6ff">0</span>; i <span style="color:#ff7b72;font-weight:bold"><</span> value; i<span style="color:#ff7b72;font-weight:bold">++</span>) {
+</span></span><span style="display:flex;"><span> newAnimal <span style="color:#ff7b72;font-weight:bold">+=</span> randomFrom(validChars);
+</span></span><span style="display:flex;"><span> }
+</span></span><span style="display:flex;"><span> <span style="color:#ff7b72">return</span> newAnimal;
+</span></span><span style="display:flex;"><span>}
+</span></span></code></pre></div><p>Can we do better than this?</p>
+<h2 id="generating-good-animals">
+ Generating “good” animals
+ <a class="heading-link" href="#generating-good-animals">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h2>
+<p>There are many ways to define “good” or “well-formed” creatures. One of the first rules we can introduce is that we don’t want chopped body parts to float alone.</p>
+<p>Translating it into a rule we can implement: a character that is “open” on the right must be followed by a character that is open on the left, and a character that is <em>not</em> open on the right must be followed by another character that is <em>not</em> open on the left.</p>
+<p>For example, <span class="small teranoptia">A</span> may be followed by <span class="small teranoptia">B</span> to make <span class="small teranoptia">AB</span>, but <span class="small teranoptia">A</span> cannot be followed by <span class="small teranoptia">C</span> to make <span class="small teranoptia">AC</span>.</p>
+<p>In the same way, <span class="small teranoptia">Z</span> may be followed by <span class="small teranoptia">A</span> to make <span class="small teranoptia">ZA</span>, but <span class="small teranoptia">Z</span> cannot be followed by <span class="small teranoptia">ż</span> to make <span class="small teranoptia">Zż</span>.</p>
+<p>This way we will get rid of all those “chopped” monsters that make up most of the randomly generated strings.</p>
+<p>To summarize, the rules we have to implement are:</p>
+<ul>
+<li>Any character that is open on the right must be followed by another character that is open on the left.</li>
+<li>Any character that is closed on the right must be followed by another character that is closed on the left.</li>
+<li>The first character must not be open on the left.</li>
+<li>The last character must not be open on the right.</li>
+</ul>
+<h3 id="non-chopped-animals-generator">
+ Non-chopped animals generator
+ <a class="heading-link" href="#non-chopped-animals-generator">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h3>
+
+
+<div style="display: flex; gap: 10px;">
+ Characters to generate:
+ <input id="nochop-glyph-count" type="number" value=10></input>
+ <button onclick="generateNoChop(document.getElementById('nochop-glyph-count').value);">Generate!</button>
+</div>
+
+<p id="nochop-generated-animal" class="teranoptia" style="line-height: 50pt;">suSHebQ«EIl</p>
+
+<script>
+const charsOpenOnTheRightOnly = "yvspmieaźACFILOSWŹ({[";
+const charsOpenOnTheLeftOnly = "BEHKNRVZŻzxurolhdż)]}";
+const charsOpenOnBothSides = "DGJMPQTUXẊYŽbcwtqnkjgfcḅbžYX«»";
+const charsOpenOnNoSides = ",*-“”";
+
+const charsOpenOnTheRight = charsOpenOnTheRightOnly + charsOpenOnBothSides;
+const charsOpenOnTheLeft = charsOpenOnTheLeftOnly + charsOpenOnBothSides;
+const validInitialChars = charsOpenOnTheRightOnly + charsOpenOnNoSides;
+
+function generateNoChop(value){
+ document.getElementById("nochop-generated-animal").innerHTML = "";
+ var newAnimal = '' + randomFrom(validInitialChars);
+ for (var i = 0; i < value-1; i++) {
+ if (charsOpenOnTheRight.indexOf(newAnimal[i]) > -1){
+ newAnimal += randomFrom(charsOpenOnTheLeft);
+
+ } else if (charsOpenOnTheLeftOnly.indexOf(newAnimal[i]) > -1){
+ newAnimal += randomFrom(charsOpenOnTheRightOnly);
+
+ } else if (charsOpenOnNoSides.indexOf(newAnimal[i]) > -1){
+ newAnimal += randomFrom(validInitialChars);
+ }
+ }
+ // Final character
+ if (charsOpenOnTheRight.indexOf(newAnimal[i]) > -1){
+ newAnimal += randomFrom(charsOpenOnTheLeftOnly);
+ } else {
+ newAnimal += randomFrom(charsOpenOnNoSides);
+ }
+ document.getElementById("nochop-generated-animal").innerHTML = makeBreakable(newAnimal);
+}
+generateNoChop(document.getElementById("nochop-glyph-count").value);
+
+</script>
+
+
+<div class="highlight"><pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-javascript" data-lang="javascript"><span style="display:flex;"><span><span style="color:#ff7b72">const</span> charsOpenOnTheRightOnly <span style="color:#ff7b72;font-weight:bold">=</span> <span style="color:#a5d6ff">"yvspmieaźACFILOSWŹ({["</span>;
+</span></span><span style="display:flex;"><span><span style="color:#ff7b72">const</span> charsOpenOnTheLeftOnly <span style="color:#ff7b72;font-weight:bold">=</span> <span style="color:#a5d6ff">"BEHKNRVZŻzxurolhdż)]}"</span>;
+</span></span><span style="display:flex;"><span><span style="color:#ff7b72">const</span> charsOpenOnBothSides <span style="color:#ff7b72;font-weight:bold">=</span> <span style="color:#a5d6ff">"DGJMPQTUXẊYŽbcwtqnkjgfcḅbžYX«»"</span>;
+</span></span><span style="display:flex;"><span><span style="color:#ff7b72">const</span> charsOpenOnNoSides <span style="color:#ff7b72;font-weight:bold">=</span> <span style="color:#a5d6ff">",*-“”"</span>;
+</span></span><span style="display:flex;"><span>
+</span></span><span style="display:flex;"><span><span style="color:#ff7b72">const</span> charsOpenOnTheRight <span style="color:#ff7b72;font-weight:bold">=</span> charsOpenOnTheRightOnly <span style="color:#ff7b72;font-weight:bold">+</span> charsOpenOnBothSides;
+</span></span><span style="display:flex;"><span><span style="color:#ff7b72">const</span> charsOpenOnTheLeft <span style="color:#ff7b72;font-weight:bold">=</span> charsOpenOnTheLeftOnly <span style="color:#ff7b72;font-weight:bold">+</span> charsOpenOnBothSides;
+</span></span><span style="display:flex;"><span><span style="color:#ff7b72">const</span> validInitialChars <span style="color:#ff7b72;font-weight:bold">=</span> charsOpenOnTheRightOnly <span style="color:#ff7b72;font-weight:bold">+</span> charsOpenOnNoSides;
+</span></span><span style="display:flex;"><span>
+</span></span><span style="display:flex;"><span><span style="color:#ff7b72">function</span> generateNoChop(value){
+</span></span><span style="display:flex;"><span> <span style="color:#ff7b72">var</span> newAnimal <span style="color:#ff7b72;font-weight:bold">=</span> <span style="color:#a5d6ff">''</span> <span style="color:#ff7b72;font-weight:bold">+</span> randomFrom(validInitialChars);
+</span></span><span style="display:flex;"><span> <span style="color:#ff7b72">for</span> (<span style="color:#ff7b72">var</span> i <span style="color:#ff7b72;font-weight:bold">=</span> <span style="color:#a5d6ff">0</span>; i <span style="color:#ff7b72;font-weight:bold"><</span> value<span style="color:#ff7b72;font-weight:bold">-</span><span style="color:#a5d6ff">1</span>; i<span style="color:#ff7b72;font-weight:bold">++</span>) {
+</span></span><span style="display:flex;"><span> <span style="color:#ff7b72">if</span> (charsOpenOnTheRight.indexOf(newAnimal[i]) <span style="color:#ff7b72;font-weight:bold">></span> <span style="color:#ff7b72;font-weight:bold">-</span><span style="color:#a5d6ff">1</span>){
+</span></span><span style="display:flex;"><span> newAnimal <span style="color:#ff7b72;font-weight:bold">+=</span> randomFrom(charsOpenOnTheLeft);
+</span></span><span style="display:flex;"><span>
+</span></span><span style="display:flex;"><span> } <span style="color:#ff7b72">else</span> <span style="color:#ff7b72">if</span> (charsOpenOnTheLeftOnly.indexOf(newAnimal[i]) <span style="color:#ff7b72;font-weight:bold">></span> <span style="color:#ff7b72;font-weight:bold">-</span><span style="color:#a5d6ff">1</span>){
+</span></span><span style="display:flex;"><span> newAnimal <span style="color:#ff7b72;font-weight:bold">+=</span> randomFrom(charsOpenOnTheRightOnly);
+</span></span><span style="display:flex;"><span>
+</span></span><span style="display:flex;"><span> } <span style="color:#ff7b72">else</span> <span style="color:#ff7b72">if</span> (charsOpenOnNoSides.indexOf(newAnimal[i]) <span style="color:#ff7b72;font-weight:bold">></span> <span style="color:#ff7b72;font-weight:bold">-</span><span style="color:#a5d6ff">1</span>){
+</span></span><span style="display:flex;"><span> newAnimal <span style="color:#ff7b72;font-weight:bold">+=</span> randomFrom(validInitialChars);
+</span></span><span style="display:flex;"><span> }
+</span></span><span style="display:flex;"><span> }
+</span></span><span style="display:flex;"><span> <span style="color:#8b949e;font-style:italic">// Final character
+</span></span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"></span> <span style="color:#ff7b72">if</span> (charsOpenOnTheRight.indexOf(newAnimal[i]) <span style="color:#ff7b72;font-weight:bold">></span> <span style="color:#ff7b72;font-weight:bold">-</span><span style="color:#a5d6ff">1</span>){
+</span></span><span style="display:flex;"><span> newAnimal <span style="color:#ff7b72;font-weight:bold">+=</span> randomFrom(charsOpenOnTheLeftOnly);
+</span></span><span style="display:flex;"><span> } <span style="color:#ff7b72">else</span> {
+</span></span><span style="display:flex;"><span> newAnimal <span style="color:#ff7b72;font-weight:bold">+=</span> randomFrom(charsOpenOnNoSides);
+</span></span><span style="display:flex;"><span> }
+</span></span><span style="display:flex;"><span> <span style="color:#ff7b72">return</span> newAnimal;
+</span></span><span style="display:flex;"><span>}
+</span></span></code></pre></div><p>The resulting animals already look much better!</p>
+<p>There are still a few things we may want to fix. For example, some animals end up being just a pair of heads (such as <span class="small teranoptia">sN</span>); others instead have their bodies oriented in the wrong direction (like <span class="small teranoptia">IgV</span>).</p>
+<p>Let’s try to get rid of those too.</p>
+<p>The trick here is to separate the characters by orientation: elements that are “facing left”, elements that are “facing right”, and symmetric ones. At this point, it’s convenient to call them “heads”, “bodies” and “tails” to make the code more understandable, like the following:</p>
+<ul>
+<li>
+<p>Right heads: <span class="small teranoptia">BEHKNRVZŻ</span></p>
+</li>
+<li>
+<p>Left heads: <span class="small teranoptia">yvspmieaź</span></p>
+</li>
+<li>
+<p>Right tails: <span class="small teranoptia">ACFILOSWŹv</span></p>
+</li>
+<li>
+<p>Left tails: <span class="small teranoptia">zxurolhdżE</span></p>
+</li>
+<li>
+<p>Right bodies: <span class="small teranoptia" style="letter-spacing: 5px;">DGJMPQTUẊŽ</span></p>
+</li>
+<li>
+<p>Left bodies: <span class="small teranoptia" style="letter-spacing: 5px;">wtqnkjgfḅž</span></p>
+</li>
+<li>
+<p>Entering hole: <span class="small teranoptia" style="letter-spacing: 5px;">)]}</span></p>
+</li>
+<li>
+<p>Exiting hole: <span class="small teranoptia" style="letter-spacing: 5px;">([{</span></p>
+</li>
+<li>
+<p>Bounce & symmetric bodies: <span class="small teranoptia" style="letter-spacing: 5px;">«»$bcXY</span></p>
+</li>
+<li>
+<p>Singletons: <span class="small teranoptia" style="letter-spacing: 5px;">,*-</span></p>
+</li>
+</ul>
+<p>Let’s put this all together!</p>
+<h3 id="oriented-animals-generator">
+ Oriented animals generator
+ <a class="heading-link" href="#oriented-animals-generator">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h3>
+
+
+<div style="display: flex; gap: 10px;">
+ Characters to generate:
+ <input id="oriented-glyph-count" type="number" value=10></input>
+ <button onclick="generateOriented(document.getElementById('oriented-glyph-count').value);">Generate!</button>
+</div>
+
+<p id="oriented-generated-animal" class="teranoptia" style="line-height: 50pt;">suSHebQ«EIl</p>
+
+<script>
+
+const rightAnimalHeads = "BEHKNRVZŻ";
+const leftAnimalHeads = "yvspmieaź";
+const rightAnimalTails = "ACFILOSWŹv";
+const leftAnimalTails = "zxurolhdżE";
+const rightAnimalBodies = "DGJMPQTUẊŽ";
+const leftAnimalBodies = "wtqnkjgfḅž";
+const singletons = ",*‐";
+const exitingHole = "([{";
+const enteringHole = ")]}";
+const bounce = "«»$bcXY";
+
+const validStarts = leftAnimalHeads + rightAnimalTails + exitingHole;
+const validSuccessors = {
+ [exitingHole + bounce]: rightAnimalHeads + rightAnimalBodies + leftAnimalBodies + leftAnimalTails + enteringHole + bounce,
+ [enteringHole]: rightAnimalTails + leftAnimalHeads + exitingHole + singletons,
+ [rightAnimalHeads + leftAnimalTails + singletons]: rightAnimalTails + leftAnimalHeads + exitingHole + singletons,
+ [leftAnimalHeads]: leftAnimalBodies + leftAnimalBodies + leftAnimalBodies + leftAnimalTails + enteringHole + bounce,
+ [rightAnimalTails]: rightAnimalBodies + rightAnimalBodies + rightAnimalBodies + rightAnimalHeads + enteringHole + bounce,
+ [rightAnimalBodies]: rightAnimalBodies + rightAnimalBodies + rightAnimalBodies + rightAnimalHeads + enteringHole + bounce,
+ [leftAnimalBodies]: leftAnimalBodies + leftAnimalBodies + leftAnimalBodies + leftAnimalTails + enteringHole + bounce,
+};
+const validEnds = {
+ [exitingHole + bounce]: leftAnimalTails + rightAnimalHeads + enteringHole,
+ [rightAnimalHeads + leftAnimalTails + enteringHole]: singletons,
+ [leftAnimalHeads]: leftAnimalTails + enteringHole,
+ [rightAnimalTails]: rightAnimalHeads + enteringHole,
+ [rightAnimalBodies]: rightAnimalHeads,
+ [leftAnimalBodies]: leftAnimalTails,
+};
+
+function generateOriented(value){
+
+ var newAnimal = '' + randomFrom(validStarts);
+ for (var i = 0; i < value-1; i++) {
+ last_char = newAnimal[i-1];
+ for (const [predecessor, successor] of Object.entries(validSuccessors)) {
+ if (predecessor.indexOf(last_char) > -1){
+ newAnimal += randomFrom(successor);
+ break;
+ }
+ }
+ }
+ last_char = newAnimal[i-1];
+ for (const [predecessor, successor] of Object.entries(validEnds)) {
+ if (predecessor.indexOf(last_char) > -1){
+ newAnimal += randomFrom(successor);
+ break;
+ }
+ }
+ document.getElementById("oriented-generated-animal").innerHTML = makeBreakable(newAnimal);
+}
+generateOriented(document.getElementById("oriented-glyph-count").value);
+
+</script>
+
+
+<div class="highlight"><pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-javascript" data-lang="javascript"><span style="display:flex;"><span><span style="color:#ff7b72">const</span> rightAnimalHeads <span style="color:#ff7b72;font-weight:bold">=</span> <span style="color:#a5d6ff">"BEHKNRVZŻ"</span>;
+</span></span><span style="display:flex;"><span><span style="color:#ff7b72">const</span> leftAnimalHeads <span style="color:#ff7b72;font-weight:bold">=</span> <span style="color:#a5d6ff">"yvspmieaź"</span>;
+</span></span><span style="display:flex;"><span><span style="color:#ff7b72">const</span> rightAnimalTails <span style="color:#ff7b72;font-weight:bold">=</span> <span style="color:#a5d6ff">"ACFILOSWŹv"</span>;
+</span></span><span style="display:flex;"><span><span style="color:#ff7b72">const</span> leftAnimalTails <span style="color:#ff7b72;font-weight:bold">=</span> <span style="color:#a5d6ff">"zxurolhdżE"</span>;
+</span></span><span style="display:flex;"><span><span style="color:#ff7b72">const</span> rightAnimalBodies <span style="color:#ff7b72;font-weight:bold">=</span> <span style="color:#a5d6ff">"DGJMPQTUẊŽ"</span>;
+</span></span><span style="display:flex;"><span><span style="color:#ff7b72">const</span> leftAnimalBodies <span style="color:#ff7b72;font-weight:bold">=</span> <span style="color:#a5d6ff">"wtqnkjgfḅž"</span>;
+</span></span><span style="display:flex;"><span><span style="color:#ff7b72">const</span> singletons <span style="color:#ff7b72;font-weight:bold">=</span> <span style="color:#a5d6ff">",*‐"</span>;
+</span></span><span style="display:flex;"><span><span style="color:#ff7b72">const</span> exitingHole <span style="color:#ff7b72;font-weight:bold">=</span> <span style="color:#a5d6ff">"([{"</span>;
+</span></span><span style="display:flex;"><span><span style="color:#ff7b72">const</span> enteringHole <span style="color:#ff7b72;font-weight:bold">=</span> <span style="color:#a5d6ff">")]}"</span>;
+</span></span><span style="display:flex;"><span><span style="color:#ff7b72">const</span> bounce <span style="color:#ff7b72;font-weight:bold">=</span> <span style="color:#a5d6ff">"«»$bcXY"</span>;
+</span></span><span style="display:flex;"><span>
+</span></span><span style="display:flex;"><span><span style="color:#ff7b72">const</span> validStarts <span style="color:#ff7b72;font-weight:bold">=</span> leftAnimalHeads <span style="color:#ff7b72;font-weight:bold">+</span> rightAnimalTails <span style="color:#ff7b72;font-weight:bold">+</span> exitingHole;
+</span></span><span style="display:flex;"><span><span style="color:#ff7b72">const</span> validSuccessors <span style="color:#ff7b72;font-weight:bold">=</span> {
+</span></span><span style="display:flex;"><span> [exitingHole <span style="color:#ff7b72;font-weight:bold">+</span> bounce]<span style="color:#ff7b72;font-weight:bold">:</span> rightAnimalHeads <span style="color:#ff7b72;font-weight:bold">+</span> rightAnimalBodies <span style="color:#ff7b72;font-weight:bold">+</span> leftAnimalBodies <span style="color:#ff7b72;font-weight:bold">+</span> leftAnimalTails <span style="color:#ff7b72;font-weight:bold">+</span> enteringHole <span style="color:#ff7b72;font-weight:bold">+</span> bounce,
+</span></span><span style="display:flex;"><span> [enteringHole]<span style="color:#ff7b72;font-weight:bold">:</span> rightAnimalTails <span style="color:#ff7b72;font-weight:bold">+</span> leftAnimalHeads <span style="color:#ff7b72;font-weight:bold">+</span> exitingHole <span style="color:#ff7b72;font-weight:bold">+</span> singletons,
+</span></span><span style="display:flex;"><span> [rightAnimalHeads <span style="color:#ff7b72;font-weight:bold">+</span> leftAnimalTails <span style="color:#ff7b72;font-weight:bold">+</span> singletons]<span style="color:#ff7b72;font-weight:bold">:</span> rightAnimalTails <span style="color:#ff7b72;font-weight:bold">+</span> leftAnimalHeads <span style="color:#ff7b72;font-weight:bold">+</span> exitingHole <span style="color:#ff7b72;font-weight:bold">+</span> singletons,
+</span></span><span style="display:flex;"><span> [leftAnimalHeads]<span style="color:#ff7b72;font-weight:bold">:</span> leftAnimalBodies <span style="color:#ff7b72;font-weight:bold">+</span> leftAnimalBodies <span style="color:#ff7b72;font-weight:bold">+</span> leftAnimalBodies <span style="color:#ff7b72;font-weight:bold">+</span> leftAnimalTails <span style="color:#ff7b72;font-weight:bold">+</span> enteringHole <span style="color:#ff7b72;font-weight:bold">+</span> bounce,
+</span></span><span style="display:flex;"><span> [rightAnimalTails]<span style="color:#ff7b72;font-weight:bold">:</span> rightAnimalBodies <span style="color:#ff7b72;font-weight:bold">+</span> rightAnimalBodies <span style="color:#ff7b72;font-weight:bold">+</span> rightAnimalBodies <span style="color:#ff7b72;font-weight:bold">+</span> rightAnimalHeads <span style="color:#ff7b72;font-weight:bold">+</span> enteringHole <span style="color:#ff7b72;font-weight:bold">+</span> bounce,
+</span></span><span style="display:flex;"><span> [rightAnimalBodies]<span style="color:#ff7b72;font-weight:bold">:</span> rightAnimalBodies <span style="color:#ff7b72;font-weight:bold">+</span> rightAnimalBodies <span style="color:#ff7b72;font-weight:bold">+</span> rightAnimalBodies <span style="color:#ff7b72;font-weight:bold">+</span> rightAnimalHeads <span style="color:#ff7b72;font-weight:bold">+</span> enteringHole <span style="color:#ff7b72;font-weight:bold">+</span> bounce,
+</span></span><span style="display:flex;"><span> [leftAnimalBodies]<span style="color:#ff7b72;font-weight:bold">:</span> leftAnimalBodies <span style="color:#ff7b72;font-weight:bold">+</span> leftAnimalBodies <span style="color:#ff7b72;font-weight:bold">+</span> leftAnimalBodies <span style="color:#ff7b72;font-weight:bold">+</span> leftAnimalTails <span style="color:#ff7b72;font-weight:bold">+</span> enteringHole <span style="color:#ff7b72;font-weight:bold">+</span> bounce,
+</span></span><span style="display:flex;"><span>};
+</span></span><span style="display:flex;"><span><span style="color:#ff7b72">const</span> validEnds <span style="color:#ff7b72;font-weight:bold">=</span> {
+</span></span><span style="display:flex;"><span> [exitingHole <span style="color:#ff7b72;font-weight:bold">+</span> bounce]<span style="color:#ff7b72;font-weight:bold">:</span> leftAnimalTails <span style="color:#ff7b72;font-weight:bold">+</span> rightAnimalHeads <span style="color:#ff7b72;font-weight:bold">+</span> enteringHole,
+</span></span><span style="display:flex;"><span> [rightAnimalHeads <span style="color:#ff7b72;font-weight:bold">+</span> leftAnimalTails <span style="color:#ff7b72;font-weight:bold">+</span> enteringHole]<span style="color:#ff7b72;font-weight:bold">:</span> singletons,
+</span></span><span style="display:flex;"><span> [leftAnimalHeads]<span style="color:#ff7b72;font-weight:bold">:</span> leftAnimalTails <span style="color:#ff7b72;font-weight:bold">+</span> enteringHole,
+</span></span><span style="display:flex;"><span> [rightAnimalTails]<span style="color:#ff7b72;font-weight:bold">:</span> rightAnimalHeads <span style="color:#ff7b72;font-weight:bold">+</span> enteringHole,
+</span></span><span style="display:flex;"><span> [rightAnimalBodies]<span style="color:#ff7b72;font-weight:bold">:</span> rightAnimalHeads,
+</span></span><span style="display:flex;"><span> [leftAnimalBodies]<span style="color:#ff7b72;font-weight:bold">:</span> leftAnimalTails,
+</span></span><span style="display:flex;"><span>};
+</span></span><span style="display:flex;"><span>
+</span></span><span style="display:flex;"><span><span style="color:#ff7b72">function</span> generateOriented(value){
+</span></span><span style="display:flex;"><span>
+</span></span><span style="display:flex;"><span> <span style="color:#ff7b72">var</span> newAnimal <span style="color:#ff7b72;font-weight:bold">=</span> <span style="color:#a5d6ff">''</span> <span style="color:#ff7b72;font-weight:bold">+</span> randomFrom(validStarts);
+</span></span><span style="display:flex;"><span> <span style="color:#ff7b72">for</span> (<span style="color:#ff7b72">var</span> i <span style="color:#ff7b72;font-weight:bold">=</span> <span style="color:#a5d6ff">0</span>; i <span style="color:#ff7b72;font-weight:bold"><</span> value<span style="color:#ff7b72;font-weight:bold">-</span><span style="color:#a5d6ff">1</span>; i<span style="color:#ff7b72;font-weight:bold">++</span>) {
+</span></span><span style="display:flex;"><span> last_char <span style="color:#ff7b72;font-weight:bold">=</span> newAnimal[i<span style="color:#ff7b72;font-weight:bold">-</span><span style="color:#a5d6ff">1</span>];
+</span></span><span style="display:flex;"><span> <span style="color:#ff7b72">for</span> (<span style="color:#ff7b72">const</span> [predecessor, successor] <span style="color:#ff7b72">of</span> Object.entries(validSuccessors)) {
+</span></span><span style="display:flex;"><span> <span style="color:#ff7b72">if</span> (predecessor.indexOf(last_char) <span style="color:#ff7b72;font-weight:bold">></span> <span style="color:#ff7b72;font-weight:bold">-</span><span style="color:#a5d6ff">1</span>){
+</span></span><span style="display:flex;"><span> newAnimal <span style="color:#ff7b72;font-weight:bold">+=</span> randomFrom(successor);
+</span></span><span style="display:flex;"><span> <span style="color:#ff7b72">break</span>;
+</span></span><span style="display:flex;"><span> }
+</span></span><span style="display:flex;"><span> }
+</span></span><span style="display:flex;"><span> }
+</span></span><span style="display:flex;"><span> last_char <span style="color:#ff7b72;font-weight:bold">=</span> newAnimal[i<span style="color:#ff7b72;font-weight:bold">-</span><span style="color:#a5d6ff">1</span>];
+</span></span><span style="display:flex;"><span> <span style="color:#ff7b72">for</span> (<span style="color:#ff7b72">const</span> [predecessor, successor] <span style="color:#ff7b72">of</span> Object.entries(validEnds)) {
+</span></span><span style="display:flex;"><span> <span style="color:#ff7b72">if</span> (predecessor.indexOf(last_char) <span style="color:#ff7b72;font-weight:bold">></span> <span style="color:#ff7b72;font-weight:bold">-</span><span style="color:#a5d6ff">1</span>){
+</span></span><span style="display:flex;"><span> newAnimal <span style="color:#ff7b72;font-weight:bold">+=</span> randomFrom(successor);
+</span></span><span style="display:flex;"><span> <span style="color:#ff7b72">break</span>;
+</span></span><span style="display:flex;"><span> }
+</span></span><span style="display:flex;"><span> }
+</span></span><span style="display:flex;"><span> <span style="color:#ff7b72">return</span> newAnimal;
+</span></span><span style="display:flex;"><span>}
+</span></span></code></pre></div><h2 id="a-regular-grammar">
+ A regular grammar
+ <a class="heading-link" href="#a-regular-grammar">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h2>
+<p>Let’s move up a level now.</p>
+<p>What we’ve defined up to this point is a set of rules that, given a string, determine what characters are allowed next. This is called a <a href="https://en.wikipedia.org/wiki/Formal_grammar" class="external-link" target="_blank" rel="noopener"><strong>formal grammar</strong></a> in Computer Science.</p>
+<p>A grammar is defined primarily by:</p>
+<ul>
+<li>an <strong>alphabet</strong> of symbols (our Teranoptia font).</li>
+<li>a set of <strong>starting characters</strong>: all the characters that can be used at the start of the string (such as <span class="small teranoptia">a</span> or <span class="small teranoptia">*</span>).</li>
+<li>a set of <strong>terminating characters</strong>: all the characters that can be used to terminate the string (such as <span class="small teranoptia">d</span> or <span class="small teranoptia">-</span>).</li>
+<li>a set of <strong>production rules</strong>: the rules needed to generate valid strings in that grammar.</li>
+</ul>
+<p>In our case, we’re looking for a grammar that defines “well-formed” animals. For example, our production rules might look like this:</p>
+<ul>
+<li>S (the start of the string) → a (<span class="small teranoptia">a</span>)</li>
+<li>a (<span class="small teranoptia">a</span>) → ad (<span class="small teranoptia">ad</span>)</li>
+<li>a (<span class="small teranoptia">a</span>) → ab (<span class="small teranoptia">ab</span>)</li>
+<li>b (<span class="small teranoptia">b</span>) → bb (<span class="small teranoptia">bb</span>)</li>
+<li>b (<span class="small teranoptia">b</span>) → bd (<span class="small teranoptia">bd</span>)</li>
+<li>d (<span class="small teranoptia">d</span>) → E (the end of the string)</li>
+<li>, (<span class="small teranoptia">,</span>) → E (the end of the string)</li>
+</ul>
+<p>and so on. Each combination would have its own rule.</p>
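+<p>To make this concrete, here is a minimal sketch of how such production rules could be encoded and checked in JavaScript. It is only an illustration: it covers the tiny snake alphabet from the examples above rather than the full Teranoptia rule set, and the <code>productionRules</code> and <code>isValidAnimal</code> names are made up for this post.</p>
+<pre tabindex="0"><code class="language-javascript" data-lang="javascript">// Illustrative only: a tiny subset of the production rules above.
+// Each rule "xy" means that the character x may be followed by y;
+// "S" is the start of the string and "E" is the end of the string.
+const productionRules = {
+    "S": ["a", ","],       // valid starting characters (the real list is longer)
+    "a": ["ab", "ad"],     // a tail may be followed by a body or a head
+    "b": ["bb", "bd"],     // a body may be followed by another body or a head
+    "d": ["E"],            // a head may end the string
+    ",": ["E"],            // a singleton may end the string
+};
+
+// Check whether a string could have been derived by applying the rules above.
+function isValidAnimal(animal) {
+    if (animal.length === 0) { return false; }
+    // The first character must be derivable from the start symbol S.
+    if (!productionRules["S"].includes(animal[0])) { return false; }
+    // Each adjacent pair of characters must match a production rule.
+    for (let i = 0; i + 1 !== animal.length; i++) {
+        const allowed = productionRules[animal[i]] || [];
+        if (!allowed.includes(animal[i] + animal[i + 1])) { return false; }
+    }
+    // The last character must be allowed to produce the end symbol E.
+    const lastRules = productionRules[animal[animal.length - 1]] || [];
+    return lastRules.includes("E");
+}
+
+console.log(isValidAnimal("abbd"));  // true: tail, two bodies, head
+console.log(isValidAnimal("ab"));    // false: "b" cannot end the string
+</code></pre>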
+<p>Chomsky’s hierarchy distinguishes a few types of grammars; the three most relevant for us are:</p>
+<ul>
+<li><strong>Regular grammars</strong>: in all rules, the left-hand side is only a single nonterminal symbol and the right-hand side may be the empty string, or a single terminal symbol, or a single terminal symbol followed by a nonterminal symbol, but nothing else.</li>
+<li><strong>Context-free grammars</strong>: in all rules, the left-hand side of each production rule consists of only a single nonterminal symbol, while the right-hand side may contain any number of terminal and non-terminal symbols.</li>
+<li><strong>Context-sensitive grammars</strong>: rules can contain many terminal and non-terminal characters on both sides.</li>
+</ul>
+<p>In our case, all the production rules look very much like the examples we defined above: one character on the left-hand side, at most two on the right-hand side. This means we’re dealing with a regular grammar. And this is good news, because it means that this language can be encoded into a <strong>regular expression</strong>.</p>
+<h2 id="building-the-regex">
+ Building the regex
+ <a class="heading-link" href="#building-the-regex">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h2>
+<p>Regular expressions are a very powerful tool, one that needs to be used with care. They’re best used for string validation: given an arbitrary string, they are going to check whether it respects the grammar, i.e. whether the string could have been generated by applying the rules above.</p>
+<p>Having a regex for our Teranoptia animals will allow us to search for valid animals in long lists of strings, for example an English dictionary. Such a search would be prohibitively expensive without a regular expression: using one, while still quite costly, is orders of magnitude more efficient.</p>
+<p>In order to build this complex regex, let’s start with a very limited example: a regex that matches left-facing snakes.</p>
+<pre tabindex="0"><code class="language-regex" data-lang="regex">^(a(b|c|X|Y)*d)+$
+</code></pre><p>This regex is fairly straightforward: the string must start with a (<span class="small teranoptia">a</span>), can contain any number of b (<span class="small teranoptia">b</span>), c (<span class="small teranoptia">c</span>), X (<span class="small teranoptia">X</span>) and Y (<span class="small teranoptia">Y</span>), and must end with d (<span class="small teranoptia">d</span>). While we’re at it, let’s add a + to the end, meaning that this pattern can repeat multiple times: the string will simply contain many snakes.</p>
+<h3 id="left-facing-snakes-regex">
+ Left-facing snakes regex
+ <a class="heading-link" href="#left-facing-snakes-regex">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h3>
+
+
+<div style="display: flex; gap: 10px;">
+ <input id="left-facing-snakes-input" type="string" class="teranoptia" value="abd" oninput="validateLeftFacingSnake();"></input>
+ <p id="left-facing-snakes-result">Valid</p>
+</div>
+
+<script>
+var leftFacingSnake = new RegExp("^(a(b|c|X|Y)*d)+$");
+
+function validateLeftFacingSnake(){
+ const candidate = document.getElementById('left-facing-snakes-input').value;
+ if (leftFacingSnake.test(candidate)){
+ document.getElementById('left-facing-snakes-input').style.color = "green";
+ document.getElementById('left-facing-snakes-result').innerHTML = "Valid!";
+ } else {
+ document.getElementById('left-facing-snakes-input').style.color = "red";
+ document.getElementById('left-facing-snakes-result').innerHTML = "NOT valid!";
+ }
+}
+validateLeftFacingSnake()
+</script>
+
+
+<p>What would it take to extend it to snakes that face either side? Luckily, snake bodies are symmetrical, so we can take advantage of that and write:</p>
+<pre tabindex="0"><code class="language-regex" data-lang="regex">^((a|W)(b|c|X|Y)*(d|Z))+$
+</code></pre><h3 id="naive-snakes">
+ Naive snakes
+ <a class="heading-link" href="#naive-snakes">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h3>
+
+
+<div style="display: flex; gap: 10px;">
+ <input id="naive-snakes-input" type="string" class="teranoptia" value="abdWXZ" oninput="validateNaiveSnake();"></input>
+ <p id="naive-snakes-result">Valid</p>
+</div>
+
+<script>
+var naiveSnake = new RegExp("^((a|W)(b|c|X|Y)*(d|Z))+$");
+
+function validateNaiveSnake(){
+ const candidate = document.getElementById('naive-snakes-input').value;
+ if (naiveSnake.test(candidate)){
+ document.getElementById('naive-snakes-input').style.color = "green";
+ document.getElementById('naive-snakes-result').innerHTML = "Valid!";
+ } else {
+ document.getElementById('naive-snakes-input').style.color = "red";
+ document.getElementById('naive-snakes-result').innerHTML = "NOT valid!";
+ }
+}
+validateNaiveSnake();
+</script>
+
+
+<p>That looks super-promising until we realize that there’s a problem: this “snake” <span class="small teranoptia">aZ</span> also matches the regex. To generate well-formed animals we need to keep heads and tails separate. In the regex, it would look like:</p>
+<pre tabindex="0"><code class="language-regex" data-lang="regex">^(
+ (a)(b|c|X|Y)*(d) |
+ (W)(b|c|X|Y)*(Z)
+)+$
+</code></pre><h3 id="correct-snakes">
+ Correct snakes
+ <a class="heading-link" href="#correct-snakes">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h3>
+
+
+<div style="display: flex; gap: 10px;">
+ <input id="correct-snakes-input" type="string" class="teranoptia" value="abdWXZ" oninput="validateCorrectSnake();"></input>
+ <p id="correct-snakes-result">Valid</p>
+</div>
+
+<script>
+var correctSnake = new RegExp("^(((a)(b|c|X|Y)*(d))|((W)(b|c|X|Y)*(Z)))+$");
+
+function validateCorrectSnake(){
+ const candidate = document.getElementById('correct-snakes-input').value;
+ if (correctSnake.test(candidate)){
+ document.getElementById('correct-snakes-input').style.color = "green";
+ document.getElementById('correct-snakes-result').innerHTML = "Valid!";
+ } else {
+ document.getElementById('correct-snakes-input').style.color = "red";
+ document.getElementById('correct-snakes-result').innerHTML = "NOT valid!";
+ }
+}
+validateCorrectSnake()
+</script>
+
+
+<p>Once here, building the rest of the regex is simply a matter of adding the correct characters to each group. We’re going to trade some extra characters for an easier structure by duplicating the symmetric characters when needed.</p>
+<pre tabindex="0"><code class="language-regex" data-lang="regex">^(
+ // Left-facing animals
+ (
+ y|v|s|p|m|i|e|a|ź|(|[|{ // Left heads & exiting holes
+ )(
+ w|t|q|n|k|j|g|f|ḅ|ž|X|Y|b|c|$|«|» // Left & symmetric bodies
+ )*(
+ z|x|u|r|o|l|h|d|ż|E|)|]|} // Left tails & entering holes
+ ) |
+
+ // Right facing animals
+ (
+ A|C|F|I|L|O|S|W|Ź|v|(|[|{ // right tails & exiting holes
+ )(
+ D|G|J|M|P|Q|T|U|Ẋ|Ž|b|c|X|Y|$|«|» // right & symmetric bodies
+ )*(
+ B|E|H|K|N|R|V|Z|Ż|)|]|} // right heads & entering holes
+ ) |
+
+ // Singletons
+ (,|-|*)
+)+$
+</code></pre><h3 id="well-formed-animals-regex">
+ Well-formed animals regex
+ <a class="heading-link" href="#well-formed-animals-regex">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h3>
+
+
+<div style="display: flex; gap: 10px;">
+ <input id="correct-animal-input" type="string" class="teranoptia" value="abu*W«XZ" oninput="validateCorrectAnimal();"></input>
+ <p id="correct-animal-result">Valid</p>
+</div>
+
+<script>
+var correctAnimal = new RegExp("^((y|v|s|p|m|i|e|a|ź|\\(|\\[|\\{)(w|t|q|n|k|j|g|f|ḅ|ž|b|c|X|Y|\\$|«|»)*(z|x|u|r|o|l|h|d|ż|E|\\)|\\]|\\})|(A|C|F|I|L|O|S|W|Ź|v|\\(|\\[|\\{)(D|G|J|M|P|Q|T|U|Ẋ|Ž|b|c|X|Y|\\$|«|»)*(B|E|H|K|N|R|V|Z|Ż|\\)|\\]|\\})|(-|\\*|,))+$");
+
+function validateCorrectAnimal(){
+ const candidate = document.getElementById('correct-animal-input').value;
+ if (correctAnimal.test(candidate)){
+ document.getElementById('correct-animal-input').style.color = "green";
+ document.getElementById('correct-animal-result').innerHTML = "Valid!";
+ } else {
+ document.getElementById('correct-animal-input').style.color = "red";
+ document.getElementById('correct-animal-result').innerHTML = "NOT valid!";
+ }
+}
+validateCorrectAnimal();
+</script>
+
+
+<p>If you play with the above regex, you’ll notice a slight discrepancy with what our well-formed animal generator creates. The generator can create “double-headed” monsters where a symmetric body part is inserted, like <span class="small teranoptia">a«Z</span>. However, the regex does not allow it. Extending it to account for these scenarios would make it even more unreadable, so this is left as an exercise for the reader.</p>
+<h2 id="searching-for-monstrous-words">
+ Searching for “monstrous” words
+ <a class="heading-link" href="#searching-for-monstrous-words">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h2>
+<p>Let’s put the regex to use! There must be some English words that match the regex, right?</p>
+<p>Google helpfully compiled a text file with the 10,000 most frequent English words. Let’s load it up and match every line with our brand-new regex. Unfortunately Teranoptia is case-sensitive and uses quite a few odd letters and special characters, so it’s unlikely we’re going to find many interesting creatures. Still, it’s worth an attempt.</p>
+<h3 id="monster-search">
+ Monster search
+ <a class="heading-link" href="#monster-search">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h3>
+
+
+<div style="display: flex; gap: 10px;">
+ <input id="file-url" type="url" value="https://raw.githubusercontent.com/first20hours/google-10000-english/master/google-10000-english.txt" style="width: 100%;"></input>
+ <button onclick="searchFile();">Search</button>
+</div>
+<p id="search-result"></p>
+<div id="words-found"></div>
+
+<script>
+var correctAnimal = new RegExp("^((y|v|s|p|m|i|e|a|ź|\\(|\\[|\\{)(w|t|q|n|k|j|g|f|ḅ|ž|b|c|X|Y|\\$|«|»)*(z|x|u|r|o|l|h|d|ż|E|\\)|\\]|\\})|(A|C|F|I|L|O|S|W|Ź|v|\\(|\\[|\\{)(D|G|J|M|P|Q|T|U|Ẋ|Ž|b|c|X|Y|\\$|«|»)*(B|E|H|K|N|R|V|Z|Ż|\\)|\\]|\\})|(-|\\*|,))+$");
+
+function searchFile(){
+ document.getElementById('search-result').innerHTML = "Loading...";
+
+ fetch(document.getElementById('file-url').value)
+ .then((response) => {
+ if (!response.ok) {
+ throw new Error(`HTTP error: ${response.status}`);
+ }
+ return response.text();
+ })
+ .then((text) => {
+ lines = text.split('\n');
+ counter = 0;
+
+ for (i = 0; i < lines.length; i++){
+ var candidate = lines[i];
+ document.getElementById('search-result').innerHTML = "Checking " + candidate;
+ if (correctAnimal.test(candidate)){
+ document.getElementById('words-found').innerHTML += "<p>"+candidate+"<span class='teranoptia'> "+candidate+"</span></p>";
+ counter++;
+ }
+ }
+ document.getElementById('search-result').innerHTML = "Done! Found "+ counter +" animals over "+lines.length+" words tested.";
+ })
+ .catch((error) => {
+ document.getElementById('search-result').innerHTML = "Failed to fetch file :(";
+ });
+}
+</script>
+
+
+<p>Go ahead and plug in your own vocabulary file to see if your language contains more animals!</p>
+<h2 id="conclusion">
+ Conclusion
+ <a class="heading-link" href="#conclusion">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h2>
+<p>In this post I’ve just put together a few exercises for fun, but these tools can be great for teaching purposes: the output is very easy to validate visually, and the grammar involved, while not trivial, is not as complex as natural language or as dry as numerical sequences. If you need something to keep your students engaged, this might be a simple trick to help them visualize the concepts better.</p>
+<p>On my side, I think I’m going to use these neat little monsters as weird <a href="https://en.wikipedia.org/wiki/Fleuron_%28typography%29" class="external-link" target="_blank" rel="noopener">fleurons</a> :)</p>
+<p class="fleuron"><a href="https://www.zansara.dev/posts/2024-05-06-teranoptia/">su</a></p>
+<hr>
+<p><em>Download Teranoptia at this link: <a href="https://www.tunera.xyz/fonts/teranoptia/" class="external-link" target="_blank" rel="noopener">https://www.tunera.xyz/fonts/teranoptia/</a></em></p>
+
+
+
+
+ RAG, the bad parts (and the good!)
+ https://www.zansara.dev/posts/2024-04-29-odsc-east-rag/
+ Mon, 29 Apr 2024 00:00:00 +0000
+
+ https://www.zansara.dev/posts/2024-04-29-odsc-east-rag/
+ <p><em>This is a writeup of my talk at <a href="https://www.zansara.dev/talks/2024-04-25-odsc-east-rag/" >ODSC East 2024</a> and <a href="https://www.zansara.dev/talks/2024-07-10-europython-rag/" >EuroPython 2024</a>.</em></p>
+<hr>
+<p>If you’ve been at any AI or Python conference this year, there’s one acronym that you’ve probably heard in nearly every talk: it’s RAG. RAG is one of the most used techniques to enhance LLMs in production, but why is it so? And what are its weak points?</p>
+<p>In this post, we will first describe what RAG is and how it works at a high level. We will then see what types of failures we may encounter, how they happen, and a few reasons that may trigger these issues. Next, we will look at a few tools to help us evaluate a RAG application in production. Last, we’re going to list a few techniques to enhance your RAG app and make it more capable in a variety of scenarios.</p>
+<p>Let’s dive in.</p>
+<h1 id="outline">
+ Outline
+ <a class="heading-link" href="#outline">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h1>
+<ul>
+<li><a href="#what-is-rag" >What is RAG?</a></li>
+<li><a href="#why-should-i-use-it" >Why should I use it?</a>
+<ul>
+<li><a href="#a-weather-chatbot" >A weather chatbot</a></li>
+<li><a href="#a-real-world-example" >A real-world example</a></li>
+</ul>
+</li>
+<li><a href="#failure-modes" >Failure modes</a>
+<ul>
+<li><a href="#retrieval-failure" >Retrieval failure</a></li>
+<li><a href="#generation-failure" >Generation failure</a></li>
+</ul>
+</li>
+<li><a href="#evaluation-strategies" >Evaluation strategies</a>
+<ul>
+<li><a href="#evaluating-retrieval" >Evaluating Retrieval</a></li>
+<li><a href="#evaluating-generation" >Evaluating Generation</a></li>
+<li><a href="#end-to-end-evaluation" >End-to-end evaluation</a></li>
+<li><a href="#putting-it-all-together" >Putting it all together</a></li>
+</ul>
+</li>
+<li><a href="#advanced-flavors-of-rag" >Advanced flavors of RAG</a>
+<ul>
+<li><a href="#use-multiple-retrievers" >Use multiple retrievers</a></li>
+<li><a href="#self-correction" >Self-correction</a></li>
+<li><a href="#agentic-rag" >Agentic RAG</a></li>
+<li><a href="#multihop-rag" >Multihop RAG</a></li>
+</ul>
+</li>
+<li><a href="#a-word-on-finetuning" >A word on finetuning</a></li>
+<li><a href="#conclusion" >Conclusion</a></li>
+</ul>
+<h1 id="what-is-rag">
+ What is RAG?
+ <a class="heading-link" href="#what-is-rag">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h1>
+<p>RAG stands for <strong>R</strong>etrieval <strong>A</strong>ugmented <strong>G</strong>eneration, which can be explained as: “A technique to <strong>augment</strong> an LLM’s knowledge beyond its training data by <strong>retrieving</strong> contextual information before <strong>generating</strong> an answer.”</p>
+<p><img src="https://www.zansara.dev/posts/2024-04-29-odsc-east-rag/rag-diagram.png" alt=""></p>
+<p>RAG is a technique that works best for question-answering tasks, such as chatbots or similar knowledge extraction applications. This means that the typical user of a RAG app is someone who needs an answer to a question.</p>
+<p>The first step of RAG is to take the question and hand it over to a component called <a href="https://docs.haystack.deepset.ai/docs/retrievers?utm_campaign=odsc-east" class="external-link" target="_blank" rel="noopener"><strong>retriever</strong></a>. A retriever is any system that, given a question, can find data relevant to the question within a vast dataset, be it text, images, rows in a DB, or anything else.</p>
+<p>When implementing RAG, many developers think immediately that a vector database is necessary for retrieval. While vector databases such as <a href="https://haystack.deepset.ai/integrations/qdrant-document-store?utm_campaign=odsc-east" class="external-link" target="_blank" rel="noopener">Qdrant</a>, <a href="https://haystack.deepset.ai/integrations/chroma-documentstore?utm_campaign=odsc-east" class="external-link" target="_blank" rel="noopener">ChromaDB</a>, <a href="https://haystack.deepset.ai/integrations/weaviate-document-store?utm_campaign=odsc-east" class="external-link" target="_blank" rel="noopener">Weaviate</a> and so on, are great for retrieval in some applications, they’re not the only option. Keyword-based algorithms such as <a href="https://haystack.deepset.ai/integrations/elasticsearch-document-store?utm_campaign=odsc-east" class="external-link" target="_blank" rel="noopener">Elasticsearch BM25</a> or TF-IDF can be used as retrievers in a RAG application, and you can even go as far as using a <a href="https://docs.haystack.deepset.ai/docs/websearch?utm_campaign=odsc-east" class="external-link" target="_blank" rel="noopener">web search engine API</a>, such as Google or Bing. Anything that is given a question and can return information relevant to the question can be used here.</p>
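+<p>As a toy illustration of this point, here is a minimal, made-up retriever that simply scores documents by word overlap with the question. It is only a sketch: the document list, the <code>retrieve</code> function and its parameters are invented for this example, and a real application would use one of the retrieval systems mentioned above.</p>
+<pre tabindex="0"><code class="language-javascript" data-lang="javascript">// Illustrative only: a toy keyword-overlap retriever. Real systems would use
+// a vector database, BM25, or a web search API, but the contract is the same:
+// take a question, return the most relevant snippets from a dataset.
+const documents = [
+    "The weather in Lisbon tomorrow is expected to be mostly sunny, with a 25% chance of showers in the evening.",
+    "Paris is the capital of France.",
+    "RAG stands for Retrieval Augmented Generation.",
+];
+
+function retrieve(question, topK) {
+    const questionWords = new Set(question.toLowerCase().split(/\W+/));
+    const scored = documents.map(function (doc) {
+        const docWords = doc.toLowerCase().split(/\W+/);
+        // Score = number of document words that also appear in the question.
+        const score = docWords.filter(function (w) { return questionWords.has(w); }).length;
+        return { doc: doc, score: score };
+    });
+    scored.sort(function (a, b) { return b.score - a.score; });
+    return scored.slice(0, topK).map(function (item) { return item.doc; });
+}
+
+console.log(retrieve("Is it going to rain in Lisbon tomorrow?", 1));
+// prints the weather document, the one most relevant to the question
+</code></pre>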
+<p>Once our retriever has sifted through all the data and returned a few relevant snippets of context, the question and the context are assembled into a <strong>RAG prompt</strong>. It looks like this:</p>
+<div class="highlight"><pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-markdown" data-lang="markdown"><span style="display:flex;"><span>Read the text below and answer the question at the bottom.
+</span></span><span style="display:flex;"><span>
+</span></span><span style="display:flex;"><span>Text: [all the text found by the retriever]
+</span></span><span style="display:flex;"><span>
+</span></span><span style="display:flex;"><span>Question: [the user's question]
+</span></span></code></pre></div><p>This prompt is then fed to the last component, called a <a href="https://docs.haystack.deepset.ai/docs/components_overview#generators?utm_campaign=odsc-east" class="external-link" target="_blank" rel="noopener"><strong>generator</strong></a>. A generator is any system that, given a prompt, can answer the question that it contains. In practice, “generator” is an umbrella term for any LLM, be it behind an API like GPT-3.5 or running locally, such as a Llama model. The generator receives the prompt, reads and understands it, and then writes down an answer that can be given back to the user, closing the loop.</p>
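+<p>Putting the three steps together, a sketch of the whole flow might look like the following. Both helper names are invented for this post: <code>retrieve()</code> is the toy retriever sketched above, and <code>callLLM()</code> is just a placeholder for whatever generator you use; a real application would call an LLM API or a framework instead.</p>
+<pre tabindex="0"><code class="language-javascript" data-lang="javascript">// Illustrative only: how the retriever, the RAG prompt and the generator fit together.
+async function callLLM(prompt) {
+    // Placeholder: in a real app this would send the prompt to an LLM and return its reply.
+    return "(the generator's answer would appear here)";
+}
+
+async function answerWithRAG(question) {
+    // 1. Retrieval: find a few snippets of context relevant to the question.
+    const context = retrieve(question, 3).join("\n");
+
+    // 2. Prompt assembly: context first, question at the bottom.
+    const prompt =
+        "Read the text below and answer the question at the bottom.\n\n" +
+        "Text: " + context + "\n\n" +
+        "Question: " + question;
+
+    // 3. Generation: hand the prompt to the LLM and return its answer to the user.
+    return await callLLM(prompt);
+}
+</code></pre>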
+<h1 id="why-should-i-use-it">
+ Why should I use it?
+ <a class="heading-link" href="#why-should-i-use-it">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h1>
+<p>There are three main benefits of using a RAG architecture for your LLM apps instead of querying the LLM directly.</p>
+<ol>
+<li>
+<p><strong>Reduces hallucinations</strong>. The RAG prompt contains the answer to the user’s question together with the question, so the LLM doesn’t need to <em>know</em> the answer, but it only needs to read the prompt and rephrase a bit of its content.</p>
+</li>
+<li>
+<p><strong>Allows access to fresh data</strong>. RAG makes LLMs capable of reasoning about data that wasn’t present in their training set, such as highly volatile figures, news, forecasts, and so on.</p>
+</li>
+<li>
+<p><strong>Increases transparency</strong>. The retrieval step is much easier to inspect than LLM’s inference process, so it’s far easier to spot and fact-check any answer the LLM provides.</p>
+</li>
+</ol>
+<p>To understand these points better, let’s see an example.</p>
+<h2 id="a-weather-chatbot">
+ A weather chatbot
+ <a class="heading-link" href="#a-weather-chatbot">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h2>
+<p>We’re making a chatbot for a weather forecast app. Suppose the user asks an LLM directly, “Is it going to rain in Lisbon tomorrow morning?”. In that case, the LLM will make up a random answer because it obviously didn’t have tomorrow’s weather forecast for Lisbon in its training set and knows nothing about it.</p>
+<p>When an LLM is queried with a direct question, it will use its internal knowledge to answer it. LLMs have read the entire Internet during their training phase, so they learned that whenever they saw a line such as “What’s the capital of France?”, the string “Paris” always appeared among the following few words. So when a user asks the same question, the answer will likely be “Paris”.</p>
+<p>This “recalling from memory” process works for well-known facts but is not always practical. For more nuanced questions or something that the LLM hasn’t seen during training, it often fails: in an attempt to answer the question, the LLM will make up a response that is not based on any real source. This is called a <strong>hallucination</strong>, one of LLMs’ most common and feared failure modes.</p>
+<p>RAG helps prevent hallucinations because, in the RAG prompt, the question and all the data needed to answer it are explicitly given to the LLM. For our weather chatbot, the retriever will first do a Google search and find some data. Then, we will put together the RAG prompt. The result will look like this:</p>
+<div class="highlight"><pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-markdown" data-lang="markdown"><span style="display:flex;"><span>Read the text below and answer the question at the bottom.
+</span></span><span style="display:flex;"><span>
+</span></span><span style="display:flex;"><span>Text: According to the weather forecast, the weather in Lisbon tomorrow
+</span></span><span style="display:flex;"><span>is expected to be mostly sunny, with a high of 18°C and a low of 11°C.
+</span></span><span style="display:flex;"><span>There is a 25% chance of showers in the evening.
+</span></span><span style="display:flex;"><span>
+</span></span><span style="display:flex;"><span>Question: Is it going to rain in Lisbon tomorrow morning?
+</span></span></code></pre></div><p>Now, it’s clear that the LLM doesn’t have to recall anything about the weather in Lisbon from its memory because the prompt already contains the answer. The LLM only needs to rephrase the context. This makes the task much simpler and drastically reduces the chances of hallucinations.</p>
+<p>In fact, RAG is the only way to build an LLM-powered system that can answer a question like this with any confidence at all. Retraining an LLM every morning with the forecast for the day would be far more wasteful, would require a ton of data, and wouldn’t return consistent results. Imagine if we were making a chatbot that gives you figures from the stock market!</p>
+<p>In addition, a weather chatbot built with RAG <strong>can be fact-checked</strong>. If users have access to the web pages that the retriever found, they can check the pages directly when the results are not convincing, which helps build trust in the application.</p>
+<h2 id="a-real-world-example">
+ A real-world example
+ <a class="heading-link" href="#a-real-world-example">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h2>
+<p>If you want to compare a well-implemented RAG system with a plain LLM, you can put <a href="https://chat.openai.com/" class="external-link" target="_blank" rel="noopener">ChatGPT</a> (the free version, powered by GPT-3.5) and <a href="https://www.perplexity.ai/" class="external-link" target="_blank" rel="noopener">Perplexity</a> to the test. ChatGPT does not implement RAG, while Perplexity is one of the most effective implementations existing today.</p>
+<p>Let’s ask both: “Where does ODSC East 2024 take place?”</p>
+<p>ChatGPT says:</p>
+<p><img src="https://www.zansara.dev/posts/2024-04-29-odsc-east-rag/chatgpt.png" alt=""></p>
+<p>While Perplexity says:</p>
+<p><img src="https://www.zansara.dev/posts/2024-04-29-odsc-east-rag/perplexity-ai.png" alt=""></p>
+<p>Note how ChatGPT clearly says that it doesn’t know: this is better than many other LLMs, which would just make up a place and date. By contrast, Perplexity states some specific facts, and in case of doubt it’s easy to verify that they’re right by simply checking the sources above. Even just looking at the source’s URL can give users a lot more confidence in whether the answer is grounded.</p>
+<h1 id="failure-modes">
+ Failure modes
+ <a class="heading-link" href="#failure-modes">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h1>
+<p>Now that we understand how RAG works, let’s see what can go wrong in the process.</p>
+<p>As we’ve just described, a RAG app works in two steps – retrieval and generation. Therefore, we can classify RAG failures into two broad categories:</p>
+<ol>
+<li>
+<p><strong>Retrieval failures</strong>: The retriever component fails to find the correct context for the given question. Irrelevant noise ends up in the RAG prompt, which confuses the LLM and results in a wrong or unrelated answer.</p>
+</li>
+<li>
+<p><strong>Generation failures</strong>: The LLM fails to produce a correct answer even with a proper RAG prompt containing a question and all the data needed to answer it.</p>
+</li>
+</ol>
+<p>To understand them better, let’s pretend an imaginary user poses our application the following question about a <a href="https://en.wikipedia.org/wiki/Republic_of_Rose_Island" class="external-link" target="_blank" rel="noopener">little-known European microstate</a>:</p>
+<div class="highlight"><pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-markdown" data-lang="markdown"><span style="display:flex;"><span>What was the official language of the Republic of Rose Island?
+</span></span></code></pre></div><p>Here is what would happen in an ideal case:</p>
+<p><img src="https://www.zansara.dev/posts/2024-04-29-odsc-east-rag/successful-query.png" alt=""></p>
+<p>First, the retriever searches the dataset (let’s imagine, in this case, Wikipedia) and returns a few snippets. The retriever did a good job here, and the snippets contain clearly stated information about the official language of Rose Island. The LLM reads these snippets, understands them, and replies to the user (correctly):</p>
+<div class="highlight"><pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-markdown" data-lang="markdown"><span style="display:flex;"><span>The official language of the Republic of Rose Island was Esperanto.
+</span></span></code></pre></div><h2 id="retrieval-failure">
+ Retrieval failure
+ <a class="heading-link" href="#retrieval-failure">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h2>
+<p>What would happen if the retrieval step didn’t go as planned?</p>
+<p><img src="https://www.zansara.dev/posts/2024-04-29-odsc-east-rag/retrieval-failure.png" alt=""></p>
+<p>Here, the retriever finds some information about Rose Island, but none of the snippets contain any information about the official language. They only say where it was located, what happened to it, and so on. So the LLM, which knows nothing about this nation except what the prompt says, takes an educated guess and replies:</p>
+<div class="highlight"><pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-markdown" data-lang="markdown"><span style="display:flex;"><span>The official language of the Republic of Rose Island was Italian.
+</span></span></code></pre></div><p>The wrong answer here is not the LLM’s fault: the retriever is the component to blame.</p>
+<p>When and why can retrieval fail? There are as many answers to this question as there are retrieval methods, so each should be inspected for its strengths and weaknesses. However, there are a few reasons that are common to most of them.</p>
+<ul>
+<li>
+<p><strong>The relevant data does not exist in the database</strong>. When the data does not exist, it’s impossible to retrieve it. Many retrieval techniques, however, give a relevance score to each result that they return, so filtering out low-relevance snippets may help mitigate the issue.</p>
+</li>
+<li>
+<p><strong>The retrieval algorithm is too naive to match a question with its relevant context</strong>. This is a common issue for keyword-based retrieval methods such as TF-IDF or BM25 (Elasticsearch). These algorithms can’t deal with synonyms or resolve acronyms, so if the question and the relevant context don’t share the exact same words, the retrieval won’t work.</p>
+</li>
+<li>
+<p><strong>The embedding model (if used) is too small or unsuitable for the data</strong>. When doing a vector-based search, the data must be embedded before it becomes searchable. “Embedded” means that every snippet of context is associated with a list of numbers called an <strong>embedding</strong>. The quality of the embeddings then determines the quality of the retrieval. If you embed your documents with a naive embedding model, or if you are dealing with a very specific domain such as narrow medical and legal niches, the embeddings of your data won’t be able to represent its content precisely enough for the retrieval to be successful.</p>
+</li>
+<li>
+<p><strong>The data is not chunked properly (chunks too big or too small)</strong>. Retrievers thrive on data that is chunked properly. Huge blocks of text will be found relevant to almost any question and will drown the LLM in information. Chunks that are too small, such as isolated sentences or sentence fragments, won’t carry enough context for the LLM to benefit from the retriever’s output. Proper chunking can be a huge lever to improve the quality of your retrieval (see the sketch after this list).</p>
+</li>
+<li>
+<p><strong>The data and the question are in different languages</strong>. Keyword-based retrieval algorithms suffer from this issue the most because keywords in different languages rarely match. If you expect questions to come in a different language than the data you are retrieving from, consider adding a translation step or performing retrieval with a multilingual embedder instead.</p>
+</li>
+</ul>
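+<p>To make the chunking point above more concrete, here is a minimal sketch of fixed-size chunking with overlap. The 200-word window and 20-word overlap are arbitrary starting values, not recommendations: tune them on your own data and retriever.</p>
+<pre><code class="language-python">def chunk(text, size=200, overlap=20):
+    """Split a document into overlapping, word-based chunks."""
+    words = text.split()
+    step = size - overlap
+    chunks = []
+    for start in range(0, len(words), step):
+        chunks.append(" ".join(words[start:start + size]))
+    return chunks
+
+# Toy usage: a long document becomes a handful of retrievable chunks.
+document = "The Republic of Rose Island was a short-lived micronation on a platform in the Adriatic Sea. " * 30
+print(len(chunk(document)), "chunks")
+</code></pre>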
+<p>One caveat with retrieval failures is that if you’re using a very powerful LLM such as GPT-4, sometimes your LLM is smart enough to understand that the retrieved context is incorrect and will discard it, <strong>hiding the failure</strong>. This means that it’s even more important to make sure retrieval is working well in isolation, something we will see in a moment.</p>
+<h2 id="generation-failure">
+ Generation failure
+ <a class="heading-link" href="#generation-failure">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h2>
+<p>Assuming that retrieval was successful, what would happen if the LLM still hallucinated?</p>
+<p><img src="https://www.zansara.dev/posts/2024-04-29-odsc-east-rag/generation-failure.png" alt=""></p>
+<p>This is clearly an issue with our LLM: even when given all the correct data, the LLM still generated a wrong answer. Maybe our LLM doesn’t know that Esperanto is even a language? Or perhaps we’re using an LLM that doesn’t understand English well?</p>
+<p>Naturally, each LLM will have different weak points that can trigger issues like these. Here are some common reasons why you may be getting generation failures.</p>
+<ul>
+<li>
+<p><strong>The model is too small and can’t follow instructions well</strong>. When building in a resource-constrained environment (such as local smartphone apps or IoT), the choice of LLMs shrinks to just a few tiny models. However, the smaller the model, the less it will be able to understand natural language, and even when it does, the less reliably it will follow instructions. If you notice that your model consistently doesn’t pay enough attention to the question when answering it, consider switching to a larger or newer LLM.</p>
+</li>
+<li>
+<p><strong>The model knows too little about the domain to even understand the question</strong>. This can happen if your domain is highly specific, uses specific terminology, or relies on uncommon acronyms. Models are trained on general-purpose text, so they might not understand some questions without finetuning, which helps specify the meaning of the most critical key terms and acronyms. When the answers given by your model somewhat address the question but miss the point entirely and stay generic or hand-wavy, this is likely the case.</p>
+</li>
+<li>
+<p><strong>The model is not multilingual, but the questions and context may be</strong>. It’s essential that the model understands the question being asked in order to be able to answer it. The same is true for context: if the data found by the retriever is in a language that the LLM cannot understand, it won’t help it answer and might even confuse it further. Always make sure that your LLM understands the languages your users use.</p>
+</li>
+<li>
+<p><strong>The RAG prompt is not built correctly</strong>. Some LLMs, especially older or smaller ones, may be very sensitive to how the prompt is built. If your model ignores part of the context or misses the question, the prompt might contain contradicting information, or it might be simply too large. LLMs are not always great at <a href="https://cs.stanford.edu/~nfliu/papers/lost-in-the-middle.arxiv2023.pdf" class="external-link" target="_blank" rel="noopener">finding a needle in the haystack</a>: if you are consistently building huge RAG prompts and you observe generation issues, consider cutting them back to help the LLM focus on the data that actually contains the answer.</p>
+</li>
+</ul>
+<p><img src="https://www.zansara.dev/posts/2024-04-29-odsc-east-rag/lost-in-the-middle.png" alt=""></p>
+<h1 id="evaluation-strategies">
+ Evaluation strategies
+ <a class="heading-link" href="#evaluation-strategies">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h1>
+<p>Once we put our RAG system in production, we should keep an eye on its performance at scale. This is where evaluation frameworks come into play.</p>
+<p>To properly evaluate the performance of RAG, it’s best to perform two evaluation steps:</p>
+<ol>
+<li>
+<p><strong>Isolated Evaluation</strong>. Because RAG is a two-step process, failures at one stage can hide or mask failures at the other, making it hard to understand where they originate. To address this issue, evaluate retrieval and generation separately: both must work well in isolation.</p>
+</li>
+<li>
+<p><strong>End-to-end evaluation</strong>. To ensure the system works well from start to finish, it’s best to evaluate it as a whole. End-to-end evaluation brings its own set of challenges, but it correlates more directly with the quality of the overall app.</p>
+</li>
+</ol>
+<h2 id="evaluating-retrieval">
+ Evaluating Retrieval
+ <a class="heading-link" href="#evaluating-retrieval">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h2>
+<p>Each retrieval method has its own state-of-the-art evaluation method and framework, so it’s usually best to refer to those.</p>
+<p>For <strong>keyword-based</strong> retrieval algorithms such as TF-IDF, BM25, PageRank, and so on, evaluation is often done by checking that the keywords match well. For this, you can use <a href="https://en.wikipedia.org/wiki/Evaluation_measures_%28information_retrieval%29" class="external-link" target="_blank" rel="noopener">one of the many metrics</a> designed for this task: <a href="https://en.wikipedia.org/wiki/Precision_and_recall" class="external-link" target="_blank" rel="noopener">recall, precision</a>, <a href="https://en.wikipedia.org/wiki/F-score" class="external-link" target="_blank" rel="noopener">F1</a>, <a href="https://en.wikipedia.org/wiki/Mean_reciprocal_rank" class="external-link" target="_blank" rel="noopener">MRR</a>, <a href="https://en.wikipedia.org/wiki/Evaluation_measures_%28information_retrieval%29#Mean_average_precision" class="external-link" target="_blank" rel="noopener">MAP</a>, …</p>
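+<p>As a minimal sketch, precision and recall at a given cutoff can be computed directly from the IDs of the retrieved and relevant documents. The document IDs below are made up purely for illustration.</p>
+<pre><code class="language-python">def precision_recall_at_k(retrieved_ids, relevant_ids, k=5):
+    """Compute precision@k and recall@k for a single query."""
+    top_k = retrieved_ids[:k]
+    hits = len(set(top_k).intersection(relevant_ids))
+    precision = hits / len(top_k) if top_k else 0.0
+    recall = hits / len(relevant_ids) if relevant_ids else 0.0
+    return precision, recall
+
+# Made-up document IDs: one of the two relevant documents was retrieved in the top 3.
+print(precision_recall_at_k(["d3", "d7", "d1"], relevant_ids=["d1", "d9"], k=3))
+</code></pre>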
+<p>For <strong>vector-based</strong> retrievers like vector DBs, the evaluation is trickier because checking for matching keywords is not sufficient: the semantics of the question and the retrieved context must be evaluated for similarity. We are going to see some libraries that help with this when evaluating generation: in short, they use another LLM to judge the similarity or compute metrics like <a href="https://docs.ragas.io/en/latest/concepts/metrics/semantic_similarity.html" class="external-link" target="_blank" rel="noopener">semantic similarity</a>.</p>
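+<p>At the core of these semantic checks there is usually a cosine similarity between embeddings. Here is a minimal sketch where <code>embed</code> is only a stand-in for a real embedding model, which you would normally load from a library such as sentence-transformers.</p>
+<pre><code class="language-python">import math
+
+def embed(text):
+    """NOT a real embedding model: a fixed-length stand-in used only to keep the sketch runnable."""
+    return [float(ord(c)) for c in text[:16].ljust(16)]
+
+def cosine_similarity(a, b):
+    dot = sum(x * y for x, y in zip(a, b))
+    norm_a = math.sqrt(sum(x * x for x in a))
+    norm_b = math.sqrt(sum(x * x for x in b))
+    return dot / (norm_a * norm_b)
+
+score = cosine_similarity(embed("retrieved snippet"), embed("ground-truth context"))
+print(round(score, 3))
+</code></pre>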
+<h2 id="evaluating-generation">
+ Evaluating Generation
+ <a class="heading-link" href="#evaluating-generation">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h2>
+<p><img src="https://www.zansara.dev/posts/2024-04-29-odsc-east-rag/uptrain-logo.png" alt=""></p>
+<p>Evaluating an LLM’s answers to a question is still a developing art, and several libraries can help with the task. One commonly used framework is <a href="https://haystack.deepset.ai/integrations/uptrain?utm_campaign=odsc-east" class="external-link" target="_blank" rel="noopener">UpTrain</a>, which implements an “LLM-as-a-judge” approach. This means that the answers given by an LLM are then evaluated by another LLM, normally a larger and more powerful model.</p>
+<p>This approach has the benefit that responses are not simply checked strictly for the presence or absence of keywords but can be evaluated according to much more sophisticated criteria like <a href="https://docs.uptrain.ai/predefined-evaluations/response-quality/response-completeness" class="external-link" target="_blank" rel="noopener">completeness</a>, <a href="https://docs.uptrain.ai/predefined-evaluations/response-quality/response-conciseness" class="external-link" target="_blank" rel="noopener">conciseness</a>, <a href="https://docs.uptrain.ai/predefined-evaluations/response-quality/response-relevance" class="external-link" target="_blank" rel="noopener">relevance</a>, <a href="https://docs.uptrain.ai/predefined-evaluations/context-awareness/factual-accuracy" class="external-link" target="_blank" rel="noopener">factual accuracy</a>, <a href="https://docs.uptrain.ai/predefined-evaluations/conversation-evals/user-satisfaction" class="external-link" target="_blank" rel="noopener">conversation quality</a>, and more.</p>
+<p>This approach leads to a far more detailed view of what the LLM is good at and what aspects of the generation could or should be improved. The criteria to select depend strongly on the application: for example, in medical or legal apps, factual accuracy should be the primary metric to optimize for, while in customer support, user satisfaction and conversation quality are also essential. For personal assistants, it’s usually best to focus on conciseness, and so on.</p>
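+<p>Stripped of any specific library, the LLM-as-a-judge idea boils down to one extra prompt per criterion. In the sketch below, <code>call_llm</code> is a hypothetical stand-in for the judge model and the criteria list is just an example, not UpTrain’s API.</p>
+<pre><code class="language-python">JUDGE_PROMPT = """You are an impartial evaluator.
+Rate the answer below from 1 to 5 on the following criterion: {criterion}.
+Reply with the score only.
+
+Question: {question}
+Answer: {answer}"""
+
+def call_llm(prompt):
+    """Placeholder for the judge model, normally a larger LLM behind an API."""
+    return "4"
+
+def judge(question, answer, criteria=("completeness", "conciseness", "factual accuracy")):
+    """Score an answer on each criterion with a separate judge call."""
+    scores = {}
+    for criterion in criteria:
+        prompt = JUDGE_PROMPT.format(criterion=criterion, question=question, answer=answer)
+        scores[criterion] = call_llm(prompt)
+    return scores
+
+print(judge("Is it going to rain in Lisbon?", "No, it should be mostly sunny."))
+</code></pre>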
+<div class="notice info">
+ <div class="notice-content">💡 <em>UpTrain can also be used to evaluate RAG applications end-to-end. Check <a href="https://docs.uptrain.ai/getting-started/introduction" class="external-link" target="_blank" rel="noopener">its documentation</a> for details.</em></div>
+</div>
+
+<h2 id="end-to-end-evaluation">
+ End-to-end evaluation
+ <a class="heading-link" href="#end-to-end-evaluation">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h2>
+<p><img src="https://www.zansara.dev/posts/2024-04-29-odsc-east-rag/ragas-logo.png" alt=""></p>
+<p>The evaluation of RAG systems end-to-end is also quite complex and can be implemented in many ways, depending on the aspect you wish to monitor. One of the simplest approaches is to focus on semantic similarity between the question and the final answer.</p>
+<p>A popular framework that can be used for such high-level evaluation is <a href="https://haystack.deepset.ai/integrations/ragas?utm_campaign=odsc-east" class="external-link" target="_blank" rel="noopener">RAGAS</a>. In fact, RAGAS offers two interesting metrics:</p>
+<ul>
+<li>
+<p><a href="https://docs.ragas.io/en/stable/concepts/metrics/semantic_similarity.html" class="external-link" target="_blank" rel="noopener"><strong>Answer semantic similarity</strong></a>. This is computed simply by taking the cosine similarity between the answer and the ground truth.</p>
+</li>
+<li>
+<p><a href="https://docs.ragas.io/en/stable/concepts/metrics/answer_correctness.html" class="external-link" target="_blank" rel="noopener"><strong>Answer correctness</strong></a>. Answer correctness is defined as a weighted average of the semantic similarity and the F1 score between the generated answer and the ground truth. This metric is more oriented towards fact-based answers, where F1 can help ensure that relevant facts such as dates, names, and so on are explicitly stated.</p>
+</li>
+</ul>
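+<p>As a rough sketch of how such a composite metric can be put together: the weights below are illustrative assumptions, not necessarily RAGAS’ defaults, and the two input scores are assumed to come from an embedding model and a fact-level F1 computation respectively.</p>
+<pre><code class="language-python">def answer_correctness(semantic_similarity, factual_f1, weights=(0.25, 0.75)):
+    """Weighted average of semantic similarity and an F1 score over stated facts.
+
+    The weights are arbitrary here: tune them to how fact-heavy your answers are.
+    """
+    w_similarity, w_f1 = weights
+    return w_similarity * semantic_similarity + w_f1 * factual_f1
+
+print(answer_correctness(semantic_similarity=0.82, factual_f1=0.67))
+</code></pre>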
+<p>On top of evaluation metrics, RAGAS also offers the capability to build <a href="https://docs.ragas.io/en/stable/concepts/testset_generation.html" class="external-link" target="_blank" rel="noopener">synthetic evaluation datasets</a> to evaluate your app against. Such datasets spare you the work-intensive process of building a real-world evaluation dataset with human-generated questions and answers but also trade high quality for volume and speed. If your domain is very specific or you need extreme quality, synthetic datasets might not be an option, but for most real-world apps, such datasets can save tons of labeling time and resources.</p>
+<div class="notice info">
+ <div class="notice-content">💡 <em>RAGAS can also be used to evaluate each step of a RAG application in isolation. Check <a href="https://docs.ragas.io/en/stable/getstarted/index.html" class="external-link" target="_blank" rel="noopener">its documentation</a> for details.</em></div>
+</div>
+
+<div class="notice info">
+ <div class="notice-content"><p>💡 <em>I recently discovered an even more comprehensive framework for end-to-end evaluation called <a href="https://docs.relari.ai/v0.3" class="external-link" target="_blank" rel="noopener"><strong>continuous-eval</strong></a> from <a href="https://relari.ai/" class="external-link" target="_blank" rel="noopener">Relari.ai</a>, which focuses on modular evaluation of RAG pipelines. Check it out if you’re interested in this topic and RAGAS doesn’t offer enough flexibility for your use case.</em></p>
+<p><img src="https://www.zansara.dev/posts/2024-04-29-odsc-east-rag/relari-logo.png" alt=""></p></div>
+</div>
+
+<h2 id="putting-it-all-together">
+ Putting it all together
+ <a class="heading-link" href="#putting-it-all-together">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h2>
+<p><img src="https://www.zansara.dev/posts/2024-04-29-odsc-east-rag/haystack-logo.png" alt=""></p>
+<p>Once you know how you want to evaluate your app, it’s time to put it together. A convenient framework for this step is <a href="https://haystack.deepset.ai/?utm_campaign=odsc-east" class="external-link" target="_blank" rel="noopener">Haystack</a>, a Python open-source LLM framework focused on building RAG applications. Haystack is an excellent choice because it can be used through all stages of the application lifecycle, from prototyping to production, including evaluation.</p>
+<p>Haystack supports several evaluation libraries including <a href="https://haystack.deepset.ai/integrations/uptrain?utm_campaign=odsc-east" class="external-link" target="_blank" rel="noopener">UpTrain</a>, <a href="https://haystack.deepset.ai/integrations/ragas?utm_campaign=odsc-east" class="external-link" target="_blank" rel="noopener">RAGAS</a> and <a href="https://haystack.deepset.ai/integrations/deepeval?utm_campaign=odsc-east" class="external-link" target="_blank" rel="noopener">DeepEval</a>. To understand more about how to implement and evaluate a RAG application with it, check out their tutorial about model evaluation <a href="https://haystack.deepset.ai/tutorials/35_model_based_evaluation_of_rag_pipelines?utm_campaign=odsc-east" class="external-link" target="_blank" rel="noopener">here</a>.</p>
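+<p>As a hedged sketch of what this looks like in practice, here is a small BM25-based RAG pipeline written against the Haystack 2.x API as I remember it. Component names and module paths may differ slightly in your version, so treat the tutorial linked above as the source of truth.</p>
+<pre><code class="language-python"># Rough sketch of a RAG pipeline in the spirit of Haystack 2.x; double-check
+# component names and import paths against the official documentation.
+from haystack import Document, Pipeline
+from haystack.document_stores.in_memory import InMemoryDocumentStore
+from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
+from haystack.components.builders import PromptBuilder
+from haystack.components.generators import OpenAIGenerator
+
+store = InMemoryDocumentStore()
+store.write_documents([Document(content="The official language of the Republic of Rose Island was Esperanto.")])
+
+template = """Read the text below and answer the question at the bottom.
+
+Text:
+{% for doc in documents %}{{ doc.content }}
+{% endfor %}
+Question: {{ question }}"""
+
+pipe = Pipeline()
+pipe.add_component("retriever", InMemoryBM25Retriever(document_store=store))
+pipe.add_component("prompt_builder", PromptBuilder(template=template))
+pipe.add_component("llm", OpenAIGenerator())  # expects OPENAI_API_KEY in the environment
+pipe.connect("retriever", "prompt_builder.documents")
+pipe.connect("prompt_builder", "llm")
+
+question = "What was the official language of the Republic of Rose Island?"
+result = pipe.run({"retriever": {"query": question}, "prompt_builder": {"question": question}})
+print(result["llm"]["replies"][0])
+</code></pre>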
+<h1 id="advanced-flavors-of-rag">
+ Advanced flavors of RAG
+ <a class="heading-link" href="#advanced-flavors-of-rag">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h1>
+<p>Once our RAG app is ready and deployed in production, the natural next step is to look for ways to improve it even further. RAG is a very versatile technique, and many different flavors of “advanced RAG” have been experimented with, many more than I can list here. Depending on the situation, you may focus on different aspects, so let’s list some examples of tactics you can deploy to make your pipeline more powerful, context-aware, accurate, and so on.</p>
+<h2 id="use-multiple-retrievers">
+ Use multiple retrievers
+ <a class="heading-link" href="#use-multiple-retrievers">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h2>
+<p>Sometimes, a RAG app needs access to vastly different types of data simultaneously. For example, a personal assistant might need access to the Internet, your Slack, your emails, your personal notes, and maybe even your pictures. Designing a single retriever that can handle data of so many different kinds is possible, but it can be a real challenge and, in many cases, requires an entire data ingestion pipeline.</p>
+<p>Instead of going that way, you can use multiple retrievers, each specialized in a specific subset of your data: for example, one retriever that browses the web, one that searches on Slack and in your emails, and one that checks for relevant pictures.</p>
+<p>When using many retrievers, however, it’s often best to introduce another step called <strong>reranking</strong>. A reranker double-checks that all the results returned by each retriever are actually relevant and sorts them again before the RAG prompt is built. Rerankers are usually much more precise than retrievers in assessing the relative importance of various snippets of context, so they can dramatically improve the quality of the pipeline. In exceptional cases, they can be helpful even in RAG apps with a single retriever.</p>
+<p>Here is an <a href="https://haystack.deepset.ai/tutorials/33_hybrid_retrieval?utm_campaign=odsc-east" class="external-link" target="_blank" rel="noopener">example</a> of such a pipeline built with Haystack.</p>
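+<p>In plain Python terms, the pattern looks like the sketch below. The retriever functions and the <code>relevance</code> scorer are placeholders: in a real app the scorer would typically be a cross-encoder reranking model rather than the naive word-overlap used here.</p>
+<pre><code class="language-python">def relevance(question, snippet):
+    """Naive stand-in for a reranking model: counts shared words."""
+    return len(set(question.lower().split()).intersection(snippet.lower().split()))
+
+def retrieve_everywhere(question, retrievers):
+    """Run every specialized retriever (web, Slack, email, ...) and pool the results."""
+    candidates = []
+    for retriever in retrievers:
+        candidates.extend(retriever(question))
+    return candidates
+
+def rerank(question, candidates, top_k=5):
+    """Re-score all pooled snippets against the question and keep only the best."""
+    ranked = sorted(candidates, key=lambda snippet: relevance(question, snippet), reverse=True)
+    return ranked[:top_k]
+
+retrievers = [
+    lambda q: ["Web: the report deadline was announced for Friday."],
+    lambda q: ["Slack: lunch menu for today.", "Slack: reminder about the report deadline."],
+]
+question = "When is the report deadline?"
+print(rerank(question, retrieve_everywhere(question, retrievers), top_k=2))
+</code></pre>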
+<h2 id="self-correction">
+ Self-correction
+ <a class="heading-link" href="#self-correction">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h2>
+<p>We mentioned that one of the most common evaluation strategies for RAG output is “LLM-as-a-judge”: the idea of using another LLM to evaluate the answer of the first. However, why use this technique only for evaluation?</p>
+<p>Self-correcting RAG apps add one extra step at the end of the pipeline: they take the answer, pass it to a second LLM, and ask it to assess whether the answer is likely to be correct. If the check fails, the second LLM will provide some feedback on why it believes the answer is wrong, and this feedback will be given back to the first LLM to try answering another time until an agreement is reached.</p>
+<p>Self-correcting LLMs can help improve the accuracy of the answers at the expense of more LLM calls per user question.</p>
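+<p>A minimal sketch of such a loop is shown below. Both <code>generate</code> and <code>judge_llm</code> are placeholders for the two models, and the stop condition (“reply exactly OK”) is just one possible convention.</p>
+<pre><code class="language-python">CHECK_PROMPT = """Here is a question, some context, and a proposed answer.
+If the answer is correct and supported by the context, reply exactly OK.
+Otherwise, explain briefly what is wrong with it.
+
+Context: {context}
+Question: {question}
+Answer: {answer}"""
+
+def generate(question, context, feedback=""):
+    """Placeholder for the answering LLM; any feedback is folded into its prompt."""
+    return "The official language of the Republic of Rose Island was Esperanto."
+
+def judge_llm(prompt):
+    """Placeholder for the reviewing LLM."""
+    return "OK"
+
+def self_correcting_answer(question, context, max_rounds=3):
+    feedback, answer = "", ""
+    for _ in range(max_rounds):
+        answer = generate(question, context, feedback)       # first LLM writes an answer
+        verdict = judge_llm(CHECK_PROMPT.format(context=context, question=question, answer=answer))
+        if verdict.strip() == "OK":
+            break
+        feedback = verdict                                    # feed the critique back in
+    return answer
+</code></pre>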
+<h2 id="agentic-rag">
+ Agentic RAG
+ <a class="heading-link" href="#agentic-rag">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h2>
+<p>In the LLMs field, the term “agent” or “agentic” is often used to identify systems that use LLMs to make decisions. In the case of a RAG application, this term refers to a system that does not always perform retrieval but decides whether to perform it by reading the question first.</p>
+<p>For example, imagine we’re building a RAG app to help primary school children with their homework. When the question refers to topics like history or geography, RAG is very helpful to avoid hallucinations. However, if the question is about math, the retrieval step is entirely unnecessary, and it might even confuse the LLM by retrieving similar math problems with different answers.</p>
+<p>Making your RAG app agentic is as simple as giving the question to an LLM before retrieval in a prompt such as:</p>
+<div class="highlight"><pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-markdown" data-lang="markdown"><span style="display:flex;"><span>Reply YES if the answer to this question should include facts and
+</span></span><span style="display:flex;"><span>figures, NO otherwise.
+</span></span><span style="display:flex;"><span>
+</span></span><span style="display:flex;"><span>Question: What's the capital of France?
+</span></span></code></pre></div><p>Then, retrieval is run or skipped depending on whether the answer is YES or NO.</p>
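+<p>In code, the routing step can be as small as the sketch below; <code>call_llm</code>, <code>retrieve</code>, and <code>generate_answer</code> are placeholders for your own router LLM, retriever, and generator.</p>
+<pre><code class="language-python">ROUTER_PROMPT = """Reply YES if the answer to this question should include facts and
+figures, NO otherwise.
+
+Question: {question}"""
+
+def call_llm(prompt):
+    """Placeholder router LLM."""
+    return "YES"
+
+def retrieve(question):
+    """Placeholder retriever."""
+    return ["Paris is the capital of France."]
+
+def generate_answer(question, context):
+    """Placeholder generator."""
+    return "Paris."
+
+def agentic_rag(question):
+    """Run retrieval only when the router LLM says the answer needs facts and figures."""
+    decision = call_llm(ROUTER_PROMPT.format(question=question)).strip().upper()
+    context = retrieve(question) if decision.startswith("YES") else []
+    return generate_answer(question, context)
+
+print(agentic_rag("What's the capital of France?"))
+</code></pre>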
+<p>This is the most basic version of agentic RAG. Some advanced LLMs can do better: they support so-called “function calling,” which means that they can tell you exactly how to invoke the retriever and even provide specific parameters instead of simply answering YES or NO.</p>
+<p>For more information about function calling with LLMs, check out <a href="https://platform.openai.com/docs/guides/function-calling" class="external-link" target="_blank" rel="noopener">OpenAI’s documentation</a> on the topic or the equivalent documentation of your LLM provider.</p>
+<h2 id="multihop-rag">
+ Multihop RAG
+ <a class="heading-link" href="#multihop-rag">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h2>
+<p>Multihop RAG is an even more complex version of agentic RAG. Multihop pipelines often use <strong>chain-of-thought prompts</strong>, a type of prompt that looks like this:</p>
+<div class="highlight"><pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-markdown" data-lang="markdown"><span style="display:flex;"><span>You are a helpful and knowledgeable agent.
+</span></span><span style="display:flex;"><span>
+</span></span><span style="display:flex;"><span>To answer questions, you'll need to go through multiple steps involving step-by-step
+</span></span><span style="display:flex;"><span>thinking and using a search engine to do web searches. The browser will respond with
+</span></span><span style="display:flex;"><span>snippets of text from web pages. When you are ready for a final answer, respond with
+</span></span><span style="display:flex;"><span><span style="color:#a5d6ff">`Final Answer:`</span>.
+</span></span><span style="display:flex;"><span>
+</span></span><span style="display:flex;"><span>Use the following format:
+</span></span><span style="display:flex;"><span>
+</span></span><span style="display:flex;"><span><span style="color:#ff7b72">-</span> Question: the question to be answered
+</span></span><span style="display:flex;"><span><span style="color:#ff7b72">-</span> Thought: Reason if you have the final answer. If yes, answer the question. If not,
+</span></span><span style="display:flex;"><span> find out the missing information needed to answer it.
+</span></span><span style="display:flex;"><span><span style="color:#ff7b72">-</span> Search Query: the query for the search engine
+</span></span><span style="display:flex;"><span><span style="color:#ff7b72">-</span> Observation: the search engine will respond with the results
+</span></span><span style="display:flex;"><span><span style="color:#ff7b72">-</span> Final Answer: the final answer to the question, make it short (1-5 words)
+</span></span><span style="display:flex;"><span>
+</span></span><span style="display:flex;"><span>Thought, Search Query, and Observation steps can be repeated multiple times, but
+</span></span><span style="display:flex;"><span>sometimes, we can find an answer in the first pass.
+</span></span><span style="display:flex;"><span>
+</span></span><span style="display:flex;"><span>---
+</span></span><span style="display:flex;"><span>
+</span></span><span style="display:flex;"><span><span style="color:#ff7b72">-</span> Question: "Was the capital of France founded earlier than the discovery of America?"
+</span></span><span style="display:flex;"><span><span style="color:#ff7b72">-</span> Thought:
+</span></span></code></pre></div><p>This prompt is very complex, so let’s break it down:</p>
+<ol>
+<li>The LLM reads the question and decides which information to retrieve.</li>
+<li>The LLM returns a query for the search engine (or a retriever of our choice).</li>
+<li>Retrieval is run with the query the LLM provided, and the resulting context is appended to the original prompt.</li>
+<li>The entire prompt is returned to the LLM, which reads it, follows all the reasoning it did in the previous steps, and decides whether to do another search or reply to the user.</li>
+</ol>
+<p>Multihop RAG is used for autonomous exploration of a topic, but it can be very expensive because many LLM calls are performed, and the prompts tend to become really long really quickly. The process can also take quite some time, so it’s not suitable for low-latency applications. However, the idea is quite powerful, and it can be adapted into other forms.</p>
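+<p>The control flow around a chain-of-thought prompt like the one above is essentially a loop. Here is a minimal sketch where <code>call_llm</code> and <code>web_search</code> are placeholders and the stop condition relies on the <code>Final Answer:</code> marker defined in the prompt.</p>
+<pre><code class="language-python">CHAIN_OF_THOUGHT_PROMPT = "(the chain-of-thought prompt shown above)"
+
+def call_llm(prompt):
+    """Placeholder for the LLM driving the loop."""
+    return " I still need a date.\n- Search Query: when was Paris founded"
+
+def web_search(query):
+    """Placeholder for the search engine, or any other retriever of your choice."""
+    return "Paris was founded around the 3rd century BC."
+
+def multihop_rag(question, max_hops=4):
+    """Alternate LLM reasoning and searches until a 'Final Answer:' appears or the hop budget runs out."""
+    prompt = CHAIN_OF_THOUGHT_PROMPT + f"\n- Question: {question}\n- Thought:"
+    for _ in range(max_hops):
+        step = call_llm(prompt)                   # the LLM reasons, or asks for another search
+        prompt += step
+        if "Final Answer:" in step:
+            return step.split("Final Answer:")[-1].strip()
+        query = step.split("Search Query:")[-1].strip()
+        prompt += f"\n- Observation: {web_search(query)}\n- Thought:"
+    return "No final answer within the hop budget."
+</code></pre>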
+<h1 id="a-word-on-finetuning">
+ A word on finetuning
+ <a class="heading-link" href="#a-word-on-finetuning">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h1>
+<p>It’s important to remember that finetuning is not an alternative to RAG. Finetuning can and should be used together with RAG on very complex domains, such as medical or legal.</p>
+<p>When people think about finetuning, they usually focus on finetuning the LLM. In RAG, though, it is not only the LLM that needs to understand the question: it’s crucial that the retriever understands it well, too! This means <strong>the embedding model needs finetuning as much as the LLM</strong>. Finetuning your embedding models, and in some cases also your reranker, can improve the effectiveness of your RAG by orders of magnitude. Such a finetune often requires only a fraction of the training data, so it’s well worth the investment.</p>
+<p>Finetuning the LLM is also necessary if you need to alter its behavior in production, such as making it more colloquial or concise, or having it stick to a specific voice. Prompt engineering can also achieve these effects, but it’s often more brittle and can be more easily worked around. Finetuning the LLM has a much more powerful and lasting effect.</p>
+<h1 id="conclusion">
+ Conclusion
+ <a class="heading-link" href="#conclusion">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h1>
+<p>RAG is a vast topic that could fill books: this was only an overview of some of the most important concepts to remember when working on a RAG application. For more on this topic, check out my <a href="https://www.zansara.dev/posts" >other blog posts</a> and stay tuned for <a href="https://www.zansara.dev/talks" >future talks</a>!</p>
+<p class="fleuron"><a href="https://www.zansara.dev/posts/2024-05-06-teranoptia/">[K</a></p>
+
+
+
+
+ ODSC East: RAG, the bad parts (and the good!)
+ https://www.zansara.dev/talks/2024-04-25-odsc-east-rag/
+ Thu, 25 Apr 2024 00:00:00 +0000
+
+ https://www.zansara.dev/talks/2024-04-25-odsc-east-rag/
+ <p><a href="https://odsc.com/speakers/rag-the-bad-parts-and-the-good-building-a-deeper-understanding-of-this-hot-llm-paradigms-weaknesses-strengths-and-limitations/" class="external-link" target="_blank" rel="noopener">Announcement</a>,
+<a href="https://drive.google.com/file/d/19EDFCqOiAo9Cvx5fxx6Wq1Z-EoMKwxbs/view?usp=sharing" class="external-link" target="_blank" rel="noopener">slides</a>.
+Did you miss the talk? Check out the <a href="https://www.zansara.dev/posts/2024-04-29-odsc-east-rag" >write-up</a>.</p>
+<hr>
+<p>At <a href="https://odsc.com/boston/" class="external-link" target="_blank" rel="noopener">ODSC East 2024</a> I talked about RAG: how it works, how it fails, and how to evaluate its performance objectively. I gave an overview of some useful open-source tools for RAG evaluation and how to use them with <a href="https://haystack.deepset.ai/?utm_campaign=odsc-east" class="external-link" target="_blank" rel="noopener">Haystack</a>, and then offered some ideas on how to expand your RAG architecture further than a simple two-step process.</p>
+<p>Some resources mentioned in the talk:</p>
+<ul>
+<li>Haystack: open-source LLM framework for RAG and beyond: <a href="https://haystack.deepset.ai/?utm_campaign=odsc-east" class="external-link" target="_blank" rel="noopener">https://haystack.deepset.ai/</a></li>
+<li>Build and evaluate RAG with Haystack: <a href="https://haystack.deepset.ai/tutorials/35_model_based_evaluation_of_rag_pipelines/?utm_campaign=odsc-east" class="external-link" target="_blank" rel="noopener">https://haystack.deepset.ai/tutorials/35_model_based_evaluation_of_rag_pipelines</a></li>
+<li>Evaluating LLMs with UpTrain: <a href="https://docs.uptrain.ai/getting-started/introduction" class="external-link" target="_blank" rel="noopener">https://docs.uptrain.ai/getting-started/introduction</a></li>
+<li>Evaluating RAG end-to-end with RAGAS: <a href="https://docs.ragas.io/en/latest/" class="external-link" target="_blank" rel="noopener">https://docs.ragas.io/en/latest/</a></li>
+<li>Semantic Answer Similarity (SAS) metric: <a href="https://docs.ragas.io/en/latest/concepts/metrics/semantic_similarity.html" class="external-link" target="_blank" rel="noopener">https://docs.ragas.io/en/latest/concepts/metrics/semantic_similarity.html</a></li>
+<li>Answer Correctness metric: <a href="https://docs.ragas.io/en/latest/concepts/metrics/answer_correctness.html" class="external-link" target="_blank" rel="noopener">https://docs.ragas.io/en/latest/concepts/metrics/answer_correctness.html</a></li>
+<li>Perplexity.ai: <a href="https://www.perplexity.ai/" class="external-link" target="_blank" rel="noopener">https://www.perplexity.ai/</a></li>
+</ul>
+<p>Plus, shout-out to a very interesting LLM evaluation library I discovered at ODSC: <a href="https://docs.relari.ai/v0.3" class="external-link" target="_blank" rel="noopener">continuous-eval</a>. Worth checking out especially if SAS or answer correctness are too vague and high level for your domain.</p>
+
+
+
+
+ Explain me LLMs like I'm five: build a story to help anyone get the idea
+ https://www.zansara.dev/posts/2024-04-14-eli5-llms/
+ Sun, 14 Apr 2024 00:00:00 +0000
+
+ https://www.zansara.dev/posts/2024-04-14-eli5-llms/
+ <p>These days everyone’s boss seems to want some form of GenAI in their products. That doesn’t always make sense: however, understanding when it does and when it doesn’t is not obvious even for us experts, and nearly impossible for everyone else.</p>
+<p>How can we help our colleagues understand the pros and cons of this tech, and figure out when and how it makes sense to use it?</p>
+<p>In this post I am going to outline a narrative that explains LLMs without technicalities and help you frame some high-level technical decisions, such as RAG vs finetuning, or which specific model size to use, in a way that a non-technical audience can not only grasp but also reason about. We’ll start by “translating” a few terms into their “human equivalent” and then use this metaphor to reason about the differences between RAG and finetuning.</p>
+<p>Let’s dive in!</p>
+<h1 id="llms-are-high-school-students">
+ LLMs are high-school students
+ <a class="heading-link" href="#llms-are-high-school-students">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h1>
+<p>Large Language Models are often described as “super-intelligent” entities that know far more than any human could possibly know. This makes a lot of people think that they are also extremely intelligent and are able to reason about anything in a super-human way. The reality is very different: LLMs are able to memorize and repeat far more facts than humans do, but in their ability to reason they are often inferior to the average person.</p>
+<p>Rather than describing LLMs as all-knowing geniuses, it’s much better to frame them as <strong>an average high-school student</strong>. They’re not the smartest humans on the planet, but they can help a lot if you guide them through the process. And just as a normal person might, sometimes they forget things, and occasionally they remember them wrong.</p>
+<h1 id="some-llms-are-smarter-than-others">
+ Some LLMs are smarter than others
+ <a class="heading-link" href="#some-llms-are-smarter-than-others">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h1>
+<p>Language models are not all born equal. Some are inherently able to do more complex reasoning, to learn more facts and to talk more smoothly in more languages.</p>
+<p>The <strong>“IQ”</strong> of an LLM can be approximated, more or less, to its <strong>parameter count</strong>. An LLM with 7 billion parameters is almost always less clever than a 40 billion parameter model, will have a harder time learning more facts, and will be harder to reason with.</p>
+<p>However, just like with real humans, there are exceptions. Recent “small” models can easily outperform older and larger models, due to improvements in the way they’re built. Also, some small models are very good at a very specialized job and can outperform a large, general-purpose model on that task.</p>
+<h1 id="llms-learn-by-studying">
+ LLMs learn by “studying”
+ <a class="heading-link" href="#llms-learn-by-studying">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h1>
+<p>Another similarity to human students is that LLMs also learn all the facts they know by <strong>“going to school”</strong> and studying a ton of general and unrelated facts. This is what <strong>training</strong> an LLM means. This implies that, just like with students, an LLM needs a lot of varied material to study from. This material is what is usually called “training data” or “training dataset”.</p>
+<p>They can also learn more than what they currently know and <strong>specialize</strong> on a topic: all they need to do is to study further on it. This is what <strong>finetuning</strong> represents, and as you would expect, it also needs some study material. This is normally called “finetuning data/datasets”.</p>
+<p>The distinction between training and finetuning is not much about how it’s done, but mostly about <strong>the size and contents of the dataset required</strong>. The initial training usually takes a lot of time, computing power, and tons of very varied data, just like what’s needed to bring a baby to the level of a high-schooler. Finetuning instead looks like preparing for a specific test or exam: the study material is much smaller and much more specific.</p>
+<p>Keep in mind that, just like for humans, studying more can make a student a bit smarter, but it won’t make them a genius. In many cases, no amount of training and/or finetuning can close the gap between the 7 billion parameter version of an LLM and the 40 billion one.</p>
+<h1 id="every-chat-is-an-exam">
+ Every chat is an exam
+ <a class="heading-link" href="#every-chat-is-an-exam">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h1>
+<p>One of the most common use cases for LLMs is question answering, an NLP task where users ask questions to the model and expect a correct answer back. The fact that the answer must be correct means that this interaction is very similar to an <strong>exam</strong>: the LLM is being tested by the user on its knowledge.</p>
+<p>This means that, just like a student, when the LLM is used directly it has to rely on its knowledge to answer the question. If it studied the topic well it will answer accurately most of the time. However, if it didn’t study the subject, it will do what students always do: <strong>make up stuff that sounds legit</strong>, hoping that the teacher will not notice how little it knows. This is what we call <strong>hallucinations</strong>.</p>
+<p>When the user knows the correct answer, the LLM’s response can be graded, just like in a real exam, to help the LLM improve. This process is called <strong>evaluation</strong>. Just like with humans, there are many ways in which the answer can be graded: the LLM can be graded on the accuracy of the facts it recalled, on the fluency with which it delivered its answer, or on the correctness of a reasoning exercise, like a math problem. These ways of grading an LLM are called <strong>metrics</strong>.</p>
+<h1 id="making-the-exams-easier">
+ Making the exams easier
+ <a class="heading-link" href="#making-the-exams-easier">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h1>
+<p>Hallucinations are very dangerous if the user doesn’t know what the LLM was supposed to reply, so they need to be reduced, possibly eliminated entirely. It’s like we really need the students to pass the exam with flying colors, no matter how much they studied.</p>
+<p>Luckily there are many ways to help our student succeed. One way to improve the score is, naturally, to make them study more and better. Giving them more time to study (<strong>more finetuning</strong>) and better material (<strong>better finetuning datasets</strong>) is one good way to make LLMs reply correctly more often. The issue is that this method is <strong>expensive</strong>, because it needs a lot of computing power and high quality data, and the student may still forget something during the exam.</p>
+<p>We can make the exams even easier by converting them into <strong>open-book exams</strong>. Instead of asking the students to memorize all the facts and recall them during the exam, we can let them bring the book and look up the information they need when the teacher asks the question. This method can be applied to LLMs too and is called <strong>RAG</strong>, which stands for “retrieval augmented generation”.</p>
+<p>RAG has a few interesting properties. First of all, it can make it very easy even for small, “dumb” LLMs to recall nearly all the important facts correctly and consistently. By letting your students carry their history books to the exam, all of them will be able to tell you the date of any historical event by just looking it up, regardless of how smart they are or how much they studied.</p>
+<p>RAG doesn’t need a lot of data, but you need an <strong>efficient way to access it</strong>. In our metaphor, you need a well structured book with a good index to help the student find the correct facts when asked, or they might fail to find the information they need when they’re quizzed.</p>
+<p>A trait that makes RAG unique is that it can be used to keep the LLM up-to-date with <strong>information that can’t be “studied”</strong> because it changes too fast. Let’s imagine a teacher that wants to quiz the students about today’s stock prices. They can’t expect the pupils to know them if they don’t have access to the latest financial data. Even if they were to study the prices every hour, the result would be quite pointless, because all the knowledge they acquire becomes immediately irrelevant and might even confuse them.</p>
+<p>Last but not least, RAG can be used <em>together</em> with finetuning. Just as a teacher can make students study the topic and then also bring the book to the exam to make sure they will answer correctly, you can also use RAG and finetuning together.</p>
+<p>However, there are situations where RAG doesn’t help. For example, it’s pointless if the questions are asked in a language that the LLM doesn’t know, or if the exam is made of tasks that require complex reasoning. This is true for human students too: books won’t help them much to understand a foreign language to the point that they can take an exam in it, and won’t be useful to crack a hard math problem. For these sorts of exams, the students just need to be smart and study more, which in LLM terms means that you should prefer a large model and you probably need to finetune it.</p>
+<h1 id="telling-a-story">
+ Telling a story
+ <a class="heading-link" href="#telling-a-story">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h1>
+<p>Let’s recap the terminology we used:</p>
+<ul>
+<li>The <strong>LLM</strong> is a <strong>student</strong></li>
+<li>The <strong>LLM’s IQ</strong> corresponds to its <strong>parameter count</strong></li>
+<li><strong>Training</strong> an LLM is the same as making a student <strong>go to school</strong></li>
+<li><strong>Finetuning</strong> it means to make it <strong>specialize on a subject</strong> by making it study only books and other material on the subject</li>
+<li>A <strong>training dataset</strong> is the <strong>books and material</strong> the student needs to study on</li>
+<li><strong>User interactions</strong> are like <strong>university exams</strong></li>
+<li><strong>Evaluating</strong> an LLM means to <strong>score its answers</strong> as if they were the responses to a test</li>
+<li>A <strong>metric</strong> is a <strong>type of evaluation</strong> that focuses on a specific trait of the answer</li>
+<li>A <strong>hallucination</strong> is a <strong>wrong answer</strong> that the LLM makes up, just like a student would, in order to try to pass an exam when it doesn’t know the answer or can’t recall it in that moment</li>
+<li><strong>RAG (retrieval augmented generation)</strong> is like an <strong>open-book exam</strong>: it gives the LLM access to some material on the question’s topic, so it won’t need to hallucinate an answer. It will help the LLM recall facts, but it won’t make it smarter.</li>
+</ul>
+<p>By drawing a parallel with a human student, it becomes a lot easier to explain to a non-technical audience why some decisions were taken.</p>
+<p>For example, it might not be obvious why RAG is cheaper than finetuning, because both need domain-specific data. By explaining that RAG is like an open-book exam versus a closed-book one, the difference is clearer: the students need less time and effort to prepare and they’re less likely to make trivial mistakes if they can bring the book with them to the exam.</p>
+<p>Another example is hallucinations. It’s difficult for many people to understand why LLMs don’t like to say “I don’t know”, until they realise that from the LLM’s perspective every question is like an exam: better to make up something than to admit they’re unprepared! And so on.</p>
+<p>Building a shared, simple intuition of how LLMs work is a very powerful tool. Next time you’re asked to explain a technical decision related to LLMs, building a story around it may get the message across in a much more effective way and help everyone be on the same page. Give it a try!</p>
+<p class="fleuron"><a href="https://www.zansara.dev/posts/2024-05-06-teranoptia/">WYZ</a></p>
+
+
+
+ ClozeGPT: Write Anki cloze cards with a custom GPT
+ https://www.zansara.dev/posts/2024-02-28-create-anki-cloze-cards-with-custom-gpt/
+ Wed, 28 Feb 2024 00:00:00 +0000
+
+ https://www.zansara.dev/posts/2024-02-28-create-anki-cloze-cards-with-custom-gpt/
+      <p>As everyone who has been serious about studying with <a href="https://apps.ankiweb.net/" class="external-link" target="_blank" rel="noopener">Anki</a> knows, the first step of the journey is writing your own flashcards. Writing the cards yourself is often cited as the most straightforward way to make the review process more effective. However, this can become a big chore, and not having enough cards to study is a sure way to not learn anything.</p>
+<p>What can we do to make this process less tedious?</p>
+<h1 id="write-simple-cards">
+ Write simple cards
+ <a class="heading-link" href="#write-simple-cards">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h1>
+<p><a href="https://www.reddit.com/r/Anki/" class="external-link" target="_blank" rel="noopener">A lot</a> has been written about the best way to create Anki cards. However, as a <a href="https://news.ycombinator.com/item?id=39002138" class="external-link" target="_blank" rel="noopener">HackerNews commenter</a> once said:</p>
+<blockquote>
+<p>One massively overlooked way to improve spaced repetition is to make easier cards.</p>
+</blockquote>
+<p>Cards can hardly be <a href="https://www.supermemo.com/en/blog/twenty-rules-of-formulating-knowledge" class="external-link" target="_blank" rel="noopener">too simple to be effective</a>. You don’t need to write complicated tricky questions to make sure you are making the most of your reviews. On the contrary, even a long sentence where the word you need to study is highlighted is often enough to make the review worthwhile.</p>
+<p>In the case of language learning, if you’re an advanced learner, one of the easiest ways to create such cards is to <a href="https://www.supermemo.com/en/blog/learn-whole-phrases-supertip-4" class="external-link" target="_blank" rel="noopener">copy-paste a sentence</a> with your target word into a card and write the translation of that word (or sentence) on the back. But if you’re a beginner, even these cards can be complicated both to write and to review. What if the sentence where you found the new word is too complex? You’ll need to write a brand new sentence. But what if you write an incorrect sentence? And so on.</p>
+<h1 id="automating-the-process">
+ Automating the process
+ <a class="heading-link" href="#automating-the-process">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h1>
+<p>Automated card generation has been often compared to the usage of <a href="https://www.reddit.com/r/languagelearning/comments/6ysx7g/is_there_value_in_making_your_own_anki_deck_or/" class="external-link" target="_blank" rel="noopener">pre-made decks</a>, because the students don’t see the content of the cards they’re adding to their decks before doing so. However, this depends a lot on how much the automation is hiding from the user.</p>
+<p>In my family we’re currently learning Portuguese, so we end up creating a lot of cards with Portuguese vocabulary. Given that many useful words are hard to make sense of without context, having cards with sample sentences helps me massively to remember them. But our sample sentences often sound unnatural in Portuguese, even when they’re correct. It would be great if we could have a “sample sentence generator” that creates such sample sentences for me in more colloquial Portuguese!</p>
+<p>This is when we’ve got the idea of using an LLM to help with the task. GPT models are great sentence generators: can we get them to make some good sample sentence cards?</p>
+<p>A <a href="https://chat.openai.com/share/89c821b8-6048-45f3-9fc1-c3875fdbe1c5" class="external-link" target="_blank" rel="noopener">quick experiment</a> proves that there is potential to this concept.</p>
+<p><img src="https://www.zansara.dev/posts/2024-02-28-create-anki-cloze-cards-with-custom-gpt/chatgpt-anki-card-creation.png" alt=""></p>
+<h1 id="custom-gpts">
+ Custom GPTs
+ <a class="heading-link" href="#custom-gpts">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h1>
+<p>The natural next step is to store that set of instructions into a custom prompt, or as they’re called now, a <a href="https://help.openai.com/en/articles/8554407-gpts-faq#h_40756527ce" class="external-link" target="_blank" rel="noopener">custom GPT</a>. Making these small wrappers is <a href="https://help.openai.com/en/articles/8554397-creating-a-gpt" class="external-link" target="_blank" rel="noopener">really easy</a>: it requires no coding, only a well-crafted prompt and a catchy name. So we called our new GPT “ClozeGPT” and started off with a prompt like this:</p>
+<pre><code>Your job is to create Portuguese Anki cloze cards.
+I might give you a single word or a pair (word + translation).
+
+Front cards:
+- Use Anki's `{{c1::...}}` feature to template in cards.
+- You can create cards with multiple clozes.
+- Keep the verb focused, and don't rely too much on auxiliary verbs like
+ "precisar", "gostar", etc...
+- Use the English translation as a cloze hint.
+
+Back cards:
+- The back card should contain the Portuguese word.
+- If the word could be mistaken (e.g. "levantar" vs. "acordar"),
+ write a hint that can help me remember the difference.
+- The hint should be used sparingly.
+
+Examples:
+
+---------
+
+Input: cozinhar
+
+# FRONT
+```
+Eu {{c1::cozinho::cook}} todos os dias para minha família.
+```
+
+# BACK
+```
+cozinhar - to cook
+```
+---------
+
+Input: levantar
+
+# FRONT
+```
+Eu preciso {{c1::levantar::get up}} cedo amanhã para ir ao trabalho.
+```
+
+# BACK
+```
+levantar - to get up, to raise (don't mistake this with "acordar", which is to wake up from sleep)
+```
+</code></pre>
+<p>This simple prompt already gives very nice results!</p>
+<p><img src="https://www.zansara.dev/posts/2024-02-28-create-anki-cloze-cards-with-custom-gpt/beber-flashcard.png" alt=""></p>
+<h1 id="bells-and-whistles">
+ Bells and whistles
+ <a class="heading-link" href="#bells-and-whistles">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h1>
+<p>Naturally, once a tool works well it’s hard to resist the urge to add some new features to it. So for our ClozeGPT we added a few more abilities:</p>
+<pre><code># Commands
+
+## `+<<word>>`
+Expands the back card with an extra word present in the sentence.
+Include all the previous words, plus the one given.
+In this case, only the back card needs to be printed; don't show the front card again.
+
+## `R[: <<optional hint>>]`
+Regenerates the response based on the hint given.
+If the hint is absent, regenerate the sentence with a different context.
+Do not change the target words: the hint is most often a different context I would like to have a sentence for.
+
+## `Q: <<question>>`
+This is an escape to a normal chat about a related question.
+Answer the question as usual, you don't need to generate anything.
+</code></pre>
+<p>The <code>+</code> command is useful when the generated sentence contains some other interesting word you can take the occasion to learn as well:</p>
+<p><img src="https://www.zansara.dev/posts/2024-02-28-create-anki-cloze-cards-with-custom-gpt/maca-flashcard.png" alt=""></p>
+<p>The <code>R</code> command can be used to direct the card generation a bit better than with a simple press on the “Regenerate” icon:</p>
+<p><img src="https://www.zansara.dev/posts/2024-02-28-create-anki-cloze-cards-with-custom-gpt/morango-flashcard.png" alt=""></p>
+<p>And finally <code>Q</code> is a convenient escape hatch to make this GPT revert back to its usual helpful self, where it can engage in conversation.</p>
+<p><img src="https://www.zansara.dev/posts/2024-02-28-create-anki-cloze-cards-with-custom-gpt/esquecer-flashcard.png" alt=""></p>
+<h1 id="have-fun">
+ Have fun
+ <a class="heading-link" href="#have-fun">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h1>
+<p>Our small <a href="https://chat.openai.com/g/g-wmHCaGcCZ-clozegpt" class="external-link" target="_blank" rel="noopener">ClozeGPT</a> works only for Portuguese now, but feel free to play with it if you find it useful. And, of course, always keep in mind that LLMs are only <a href="https://chat.openai.com/share/07295647-9f43-4346-97a5-b35f62251d55" class="external-link" target="_blank" rel="noopener">pretending to be humans</a>.</p>
+<p><img src="https://www.zansara.dev/posts/2024-02-28-create-anki-cloze-cards-with-custom-gpt/laranja-flashcard.png" alt=""></p>
+<p><em>Front: I like orange juice in my morning coffee.</em></p>
+<p class="fleuron"><a href="https://www.zansara.dev/posts/2024-05-06-teranoptia/">SDE</a></p>
+
+
+
+ Is RAG all you need? A look at the limits of retrieval augmentation
+ https://www.zansara.dev/posts/2024-02-20-is-rag-all-you-need-odsc-east-2024-teaser/
+ Wed, 21 Feb 2024 00:00:00 +0000
+
+ https://www.zansara.dev/posts/2024-02-20-is-rag-all-you-need-odsc-east-2024-teaser/
+ <p><em>This blogpost is a teaser for <a href="https://odsc.com/speakers/rag-the-bad-parts-and-the-good-building-a-deeper-understanding-of-this-hot-llm-paradigms-weaknesses-strengths-and-limitations/" class="external-link" target="_blank" rel="noopener">my upcoming talk</a> at ODSC East 2024 in Boston, April 23-25. It is published on the ODSC blog <a href="https://opendatascience.com/is-rag-all-you-need-a-look-at-the-limits-of-retrieval-augmentation/" class="external-link" target="_blank" rel="noopener">at this link</a>.</em></p>
+<hr>
+<p>Retrieval Augmented Generation (RAG) is by far one of the most popular and effective techniques to bring LLMs to production. Introduced in a 2020 Meta <a href="https://arxiv.org/abs/2005.11401" class="external-link" target="_blank" rel="noopener">paper</a>, it has since taken off and evolved into a field of its own, fueled by the immediate benefits that it provides: lowered risk of hallucinations, access to updated information, and so on. On top of this, RAG is relatively cheap to implement for the benefit it provides, especially when compared to costly techniques like LLM finetuning. This makes it a no-brainer for a lot of use cases, to the point that nowadays nearly every production system that uses LLMs seems to be implemented as some form of RAG.</p>
+<p><img src="https://www.zansara.dev/posts/2024-02-20-is-rag-all-you-need-odsc-east-2024-teaser/rag_paper.png" alt=""></p>
+<p><em>A diagram of a RAG system from the <a href="https://arxiv.org/abs/2005.11401" class="external-link" target="_blank" rel="noopener">original paper</a>.</em></p>
+<p>However, retrieval augmentation is not the silver bullet that many claim it to be. Alongside these obvious benefits, RAG brings its own set of weaknesses and limitations, which are good to be aware of when scale and accuracy need to be improved further.</p>
+<h1 id="how-does-a-rag-application-fail">
+ How does a RAG application fail?
+ <a class="heading-link" href="#how-does-a-rag-application-fail">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h1>
+<p>At a high level, RAG introduces a retrieval step right before the LLM generation. This means that we can classify the failure modes of a RAG system into two main categories:</p>
+<ul>
+<li>
+<p>Retrieval failures: when the retriever returns only documents that are irrelevant to the query or misleading, which in turn gives the LLM wrong information to build the final answer from.</p>
+</li>
+<li>
+<p>Generation failures: when the LLM generates a reply that is unrelated or directly contradicts the documents that were retrieved. This is a classic LLM hallucination.</p>
+</li>
+</ul>
+<p>When developing a simple system or a PoC, these sorts of errors tend to have a limited impact on the results, as long as you are using the best available tools. Powerful LLMs such as GPT-4 and Mixtral are not at all prone to hallucination when the provided documents are correct and relevant, and specialized systems such as vector databases, combined with specialized embedding models, can easily achieve high retrieval accuracy, precision and recall on most queries.</p>
+<p>However, as the system scales to larger corpora, lower quality documents, or niche and specialized fields, these errors end up amplifying each other and may degrade the overall system performance noticeably. Having a good grasp of the underlying causes of these issues, and an idea of how to minimize them, can make a huge difference.</p>
+<p><img src="https://www.zansara.dev/posts/2024-02-20-is-rag-all-you-need-odsc-east-2024-teaser/rag_failures.png" alt=""></p>
+<p><em>The difference between retrieval and generation failures. Identifying where your RAG system is more likely to fail is key to improve the quality of its answers.</em></p>
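+<p>To make this distinction concrete, here is a minimal, hypothetical triage sketch. It assumes you have a small test set with a gold answer for each query and uses naive substring matching; a real evaluation would use semantic matching, but the decision logic stays the same.</p>
+<pre><code># Hypothetical sketch: a naive triage helper that separates retrieval
+# failures from generation failures, given a gold answer per test query.
+def classify_failure(gold_answer, retrieved_docs, generated_answer):
+    gold = gold_answer.lower()
+    if not any(gold in doc.lower() for doc in retrieved_docs):
+        # The right information never reached the LLM.
+        return "retrieval failure"
+    if gold not in generated_answer.lower():
+        # The information was retrieved, but the LLM ignored or contradicted it.
+        return "generation failure"
+    return "success"
+
+print(classify_failure(
+    gold_answer="14 days",
+    retrieved_docs=["Refunds are accepted within 14 days of purchase."],
+    generated_answer="You can request a refund within 30 days.",
+))  # prints "generation failure"
+</code></pre>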
+<h1 id="a-case-study-customer-support-chatbots">
+ A case study: customer support chatbots
+ <a class="heading-link" href="#a-case-study-customer-support-chatbots">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h1>
+<p>One of the most common applications of LLMs is a chatbot that helps users by answering their questions about a product or a service. Apps like this can be used in situations that are more or less sensitive for the user and difficult for the LLM: from simple developer documentation search and customer support for airlines or banks, up to bots that provide legal or medical advice.</p>
+<p>These three systems are very similar from a high-level perspective: the LLM needs to use snippets retrieved from a corpus of documents to build a coherent answer for the user. In fact, RAG is a fitting architecture for all of them, so let’s assume that all three systems are built more or less equally, with a retrieval step followed by a generation one.
+Let’s see what challenges are involved in each of them.</p>
+<h2 id="enhanced-search-for-developer-documentation">
+ Enhanced search for developer documentation
+ <a class="heading-link" href="#enhanced-search-for-developer-documentation">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h2>
+<p>For this use case, RAG is usually sufficient to achieve good results. A simple proof of concept may even overshoot expectations.</p>
+<p>When present and done well, developer documentation is structured and easy for a chatbot to understand. Retrieval is usually easy and effective, and the LLM can reinterpret the retrieved snippets effectively. On top of that, hallucinations are easy to spot by the user or even by an automated system like a REPL, so they have a limited impact on the perceived quality of the results.</p>
+<p>As a bonus, the queries are very likely to always be in English, which happens to be the language of the documentation too, and the language LLMs are strongest at.</p>
+<p><img src="https://www.zansara.dev/posts/2024-02-20-is-rag-all-you-need-odsc-east-2024-teaser/mongodb.png" alt=""></p>
+<p><em>The MongoDB documentation provides a chatbot interface which is quite useful.</em></p>
+<h2 id="customer-support-bots-for-airlines-and-banks">
+ Customer support bots for airlines and banks
+ <a class="heading-link" href="#customer-support-bots-for-airlines-and-banks">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h2>
+<p>In this case, the small annoyances that are already present above have a <a href="https://www.theguardian.com/world/2024/feb/16/air-canada-chatbot-lawsuit" class="external-link" target="_blank" rel="noopener">much stronger impact</a>.</p>
+<p>Even if your airline or bank’s customer support pages are top notch, hallucinations are not as easy to spot, because to make sure that the answers are accurate the user needs to check the sources that the LLM is quoting… which defeats the whole point of the generation step. And what if the user cannot read such pages at all, for example because they speak a minority language? On top of that, LLMs tend to perform worse and hallucinate more often in languages other than English, exacerbating the problem exactly where it’s already more acute.</p>
+<p><img src="https://www.zansara.dev/posts/2024-02-20-is-rag-all-you-need-odsc-east-2024-teaser/air_canada.png" alt=""></p>
+<p><em>You are going to need a very good RAG system and a huge disclaimer to avoid <a href="https://www.theguardian.com/world/2024/feb/16/air-canada-chatbot-lawsuit" class="external-link" target="_blank" rel="noopener">this scenario</a>.</em></p>
+<h2 id="bots-that-provide-legal-or-medical-advice">
+ Bots that provide legal or medical advice
+ <a class="heading-link" href="#bots-that-provide-legal-or-medical-advice">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h2>
+<p>The third case brings the exact same issues to a whole new level. In these scenarios, vanilla RAG is normally not enough.</p>
+<p>Laws and scientific articles are hard for the average person to read, require specialized knowledge to understand, and need to be read in context: asking the user to check the sources that the LLM is quoting is just not possible. And while retrieval on this type of document is feasible, its accuracy is not as high as on simple, straightforward text.</p>
+<p>Even worse, LLMs often have no reliable background knowledge on these topics, so their replies need to be strongly grounded in relevant documents for the answers to be correct and dependable. While a simple RAG implementation is still better than a vanilla reply from GPT-4, the results can be problematic in entirely different ways.</p>
+<p><img src="https://www.zansara.dev/posts/2024-02-20-is-rag-all-you-need-odsc-east-2024-teaser/medical_questions.png" alt=""></p>
+<p><em><a href="https://www.sciencedirect.com/science/article/pii/S2949761223000366" class="external-link" target="_blank" rel="noopener">Research is being done</a>, but the results are not promising yet.</em></p>
+<h1 id="ok-but-what-can-we-do">
+ Ok, but what can we do?
+ <a class="heading-link" href="#ok-but-what-can-we-do">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h1>
+<p>Moving your simple PoC to real-world use cases without reducing the quality of the responses requires a deeper understanding of how retrieval and generation work together. You need to be able to measure your system’s performance, analyze the causes of failures, and plan experiments to improve these metrics. Often you will need to complement the system with other techniques that improve its retrieval and generation abilities, in order to reach the quality thresholds that make it useful at all.</p>
+<p>In my upcoming talk at ODSC East “RAG, the bad parts (and the good!): building a deeper understanding of this hot LLM paradigm’s weaknesses, strengths, and limitations” we are going to cover all these topics:</p>
+<ul>
+<li>
+<p>how to <strong>measure the performance</strong> of your RAG applications, from simple metrics like F1 to more sophisticated approaches like Semantic Answer Similarity (see the small sketch after this list).</p>
+</li>
+<li>
+<p>how to <strong>identify if you’re dealing with a retrieval or a generation failure</strong> and where to look for a solution: is the problem in your documents’ content, in their size, or in the way you chunk and embed them? Or is it the LLM that is causing the most trouble, maybe due to the way you are prompting it?</p>
+</li>
+<li>
+<p>what <strong>techniques can help you raise the quality of the answers</strong>, from simple prompt engineering tricks like few-shot prompting, all the way up to finetuning, self-correction loops and entailment checks.</p>
+</li>
+</ul>
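+<p>As a taste of the first point, here is a hedged sketch of the simple end of that spectrum: a SQuAD-style token-overlap F1 between a generated answer and a gold answer. Semantic Answer Similarity would replace the token overlap with an embedding model, but it is used in the same way.</p>
+<pre><code># Hedged sketch: token-overlap F1 between a generated and a gold answer.
+from collections import Counter
+
+def answer_f1(prediction, gold):
+    pred_tokens = prediction.lower().split()
+    gold_tokens = gold.lower().split()
+    common = Counter(pred_tokens) & Counter(gold_tokens)
+    overlap = sum(common.values())
+    if overlap == 0:
+        return 0.0
+    precision = overlap / len(pred_tokens)
+    recall = overlap / len(gold_tokens)
+    return 2 * precision * recall / (precision + recall)
+
+print(answer_f1("within 30 days of purchase", "within 14 days of purchase"))  # 0.8
+</code></pre>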
+<p>Make sure to attend the <a href="https://odsc.com/speakers/rag-the-bad-parts-and-the-good-building-a-deeper-understanding-of-this-hot-llm-paradigms-weaknesses-strengths-and-limitations/" class="external-link" target="_blank" rel="noopener">talk</a> to learn more about all these techniques and how to apply them in your projects.</p>
+<p class="fleuron"><a href="https://www.zansara.dev/posts/2024-05-06-teranoptia/">ažo</a></p>
+
+
+
+ Headless WiFi setup on Raspberry Pi OS "Bookworm" without the Raspberry Pi Imager
+ https://www.zansara.dev/posts/2024-01-06-raspberrypi-headless-bookworm-wifi-config/
+ Sat, 06 Jan 2024 00:00:00 +0000
+
+ https://www.zansara.dev/posts/2024-01-06-raspberrypi-headless-bookworm-wifi-config/
+ <p>Setting up a Raspberry Pi headless without the Raspberry Pi Imager used to be a fairly simple process for the average Linux user, to the point where a how-to and a few searches on the Raspberry Pi forums would sort the process out. After flashing the image with <code>dd</code>, creating <code>ssh</code> in the boot partition and populating <code>wpa_supplicant.conf</code> was normally enough to get started.</p>
+<p>However with the <a href="https://www.raspberrypi.com/news/bookworm-the-new-version-of-raspberry-pi-os/" class="external-link" target="_blank" rel="noopener">recently released Raspberry Pi OS 12 “Bookworm”</a> this second step <a href="https://www.raspberrypi.com/documentation/computers/configuration.html#connect-to-a-wireless-network" class="external-link" target="_blank" rel="noopener">doesn’t work anymore</a> and the only recommendation that users receive is to “just use the Raspberry Pi Imager” (like <a href="https://github.com/raspberrypi/bookworm-feedback/issues/72" class="external-link" target="_blank" rel="noopener">here</a>).</p>
+<p>But what does the Imager really do to configure the OS? Is it really that complex that it requires downloading a dedicated installer?</p>
+<p>In this post I’m going to find out first how to get the OS to connect to the WiFi without the Imager, and then I’m going to dig a bit deeper to find out why such advice is given and how the Imager performs this configuration step.</p>
+<h1 id="network-manager">
+ Network Manager
+ <a class="heading-link" href="#network-manager">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h1>
+<p>In the <a href="https://www.raspberrypi.com/news/bookworm-the-new-version-of-raspberry-pi-os/" class="external-link" target="_blank" rel="noopener">announcement</a> of the new OS release, one of the highlights is the move to <a href="https://networkmanager.dev/" class="external-link" target="_blank" rel="noopener">NetworkManager</a> as the default mechanism to deal with networking. While this move undoubtedly brings many advantages, it is the reason why the classic technique of dropping a <code>wpa_supplicant.conf</code> file under <code>/etc/wpa_supplicant/</code> no longer works.</p>
+<p>The good news is that NetworkManager can also be configured manually with a text file. The file needs to be called <code>SSID.nmconnection</code> (replace <code>SSID</code> with your network’s SSID) and placed under <code>/etc/NetworkManager/system-connections/</code> in the Pi’s <code>rootfs</code> partition.</p>
+<div class="highlight"><pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-toml" data-lang="toml"><span style="display:flex;"><span>[connection]
+</span></span><span style="display:flex;"><span>
+</span></span><span style="display:flex;"><span>id=SSID
+</span></span><span style="display:flex;"><span>uuid= <span style="color:#8b949e;font-style:italic"># random UUID in the format 11111111-1111-1111-1111-111111111111</span>
+</span></span><span style="display:flex;"><span>type=wifi
+</span></span><span style="display:flex;"><span>autoconnect=<span style="color:#79c0ff">true</span>
+</span></span><span style="display:flex;"><span>
+</span></span><span style="display:flex;"><span>[wifi]
+</span></span><span style="display:flex;"><span>mode=infrastructure
+</span></span><span style="display:flex;"><span>ssid=SSID
+</span></span><span style="display:flex;"><span>
+</span></span><span style="display:flex;"><span>[wifi-security]
+</span></span><span style="display:flex;"><span>auth-alg=open
+</span></span><span style="display:flex;"><span>key-mgmt=wpa-psk
+</span></span><span style="display:flex;"><span>psk=PASSWORD
+</span></span><span style="display:flex;"><span>
+</span></span><span style="display:flex;"><span>[ipv4]
+</span></span><span style="display:flex;"><span>method=auto
+</span></span><span style="display:flex;"><span>
+</span></span><span style="display:flex;"><span>[ipv6]
+</span></span><span style="display:flex;"><span>method=auto
+</span></span></code></pre></div><p>(replace <code>SSID</code> and <code>PASSWORD</code> with your wifi network’s SSID and password). <a href="https://developer-old.gnome.org/NetworkManager/stable/nm-settings-keyfile.html" class="external-link" target="_blank" rel="noopener">Here</a> you can find the full syntax for this file.</p>
+<p>You’ll also need to configure its access rights:</p>
+<div class="highlight"><pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>sudo chmod -R <span style="color:#a5d6ff">600</span> <path-to-rootfs>/etc/NetworkManager/system-connections/SSID.nmconnection
+</span></span><span style="display:flex;"><span>sudo chown -R root:root <path-to-rootfs>/etc/NetworkManager/system-connections/SSID.nmconnection
+</span></span></code></pre></div><p>Once this is done, let’s not forget to create an empty <code>ssh</code> file in the <code>bootfs</code> partition to enable the SSH server:</p>
+<div class="highlight"><pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>touch <path-to-bootfs>/ssh
+</span></span></code></pre></div><p>and, as it was already the case in Bullseye to <a href="https://www.raspberrypi.com/news/raspberry-pi-bullseye-update-april-2022/" class="external-link" target="_blank" rel="noopener">configure the default user</a> with <code>userconfig.txt</code>:</p>
+<div class="highlight"><pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>echo <span style="color:#a5d6ff">'mypassword'</span> | openssl passwd -6 -stdin | awk <span style="color:#a5d6ff">'{print "myuser:"$1}'</span> > <path-to-bootfs>/userconfig.txt
+</span></span></code></pre></div><p>So far it doesn’t seem too complicated. However, interestingly, this is <strong>not</strong> what the Raspberry Pi Imager does, because if you use it to flash the image and check the result, these files are nowhere to be found. Is there a better way to go about this?</p>
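+<p>As a side note before answering that: since the manual route boils down to writing a couple of small files, it is easy to script. Here is a hypothetical Python sketch that writes the NetworkManager profile shown above and fixes its permissions; the mount point, SSID and password are placeholders you need to adapt, and it must run as root to set the ownership.</p>
+<pre><code># Hypothetical sketch: write the NetworkManager profile described above.
+import os
+import uuid
+from pathlib import Path
+
+ROOTFS = Path("/media/user/rootfs")  # assumption: where the rootfs partition is mounted
+SSID = "MY-SSID"
+PASSWORD = "MY-PASSWORD"
+
+profile = f"""[connection]
+id={SSID}
+uuid={uuid.uuid4()}
+type=wifi
+autoconnect=true
+
+[wifi]
+mode=infrastructure
+ssid={SSID}
+
+[wifi-security]
+auth-alg=open
+key-mgmt=wpa-psk
+psk={PASSWORD}
+
+[ipv4]
+method=auto
+
+[ipv6]
+method=auto
+"""
+
+conn_file = ROOTFS / "etc/NetworkManager/system-connections" / f"{SSID}.nmconnection"
+conn_file.write_text(profile)
+os.chmod(conn_file, 0o600)  # NetworkManager ignores world-readable credential files
+os.chown(conn_file, 0, 0)   # root:root, requires running as root
+</code></pre>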
+<h1 id="raspberry-pi-imager">
+ Raspberry Pi Imager
+ <a class="heading-link" href="#raspberry-pi-imager">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h1>
+<p>To find out what the Imager does, my first idea was to have a peek at its <a href="https://github.com/raspberrypi/rpi-imager" class="external-link" target="_blank" rel="noopener">source code</a>. Being a Qt application, the source might be quite intimidating, but with some searching it’s possible to locate this interesting <a href="https://github.com/raspberrypi/rpi-imager/blob/6f6a90adbb88c135534d5f20cc2a10f167ea43a3/src/imagewriter.cpp#L1214" class="external-link" target="_blank" rel="noopener">snippet</a>:</p>
+<div class="highlight"><pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-cpp" data-lang="cpp"><span style="display:flex;"><span><span style="color:#ff7b72">void</span> ImageWriter<span style="color:#ff7b72;font-weight:bold">::</span>setImageCustomization(<span style="color:#ff7b72">const</span> QByteArray <span style="color:#ff7b72;font-weight:bold">&</span>config, <span style="color:#ff7b72">const</span> QByteArray <span style="color:#ff7b72;font-weight:bold">&</span>cmdline, <span style="color:#ff7b72">const</span> QByteArray <span style="color:#ff7b72;font-weight:bold">&</span>firstrun, <span style="color:#ff7b72">const</span> QByteArray <span style="color:#ff7b72;font-weight:bold">&</span>cloudinit, <span style="color:#ff7b72">const</span> QByteArray <span style="color:#ff7b72;font-weight:bold">&</span>cloudinitNetwork)
+</span></span><span style="display:flex;"><span>{
+</span></span><span style="display:flex;"><span> _config <span style="color:#ff7b72;font-weight:bold">=</span> config;
+</span></span><span style="display:flex;"><span> _cmdline <span style="color:#ff7b72;font-weight:bold">=</span> cmdline;
+</span></span><span style="display:flex;"><span> _firstrun <span style="color:#ff7b72;font-weight:bold">=</span> firstrun;
+</span></span><span style="display:flex;"><span> _cloudinit <span style="color:#ff7b72;font-weight:bold">=</span> cloudinit;
+</span></span><span style="display:flex;"><span> _cloudinitNetwork <span style="color:#ff7b72;font-weight:bold">=</span> cloudinitNetwork;
+</span></span><span style="display:flex;"><span>
+</span></span><span style="display:flex;"><span> qDebug() <span style="color:#ff7b72;font-weight:bold"><<</span> <span style="color:#a5d6ff">"Custom config.txt entries:"</span> <span style="color:#ff7b72;font-weight:bold"><<</span> config;
+</span></span><span style="display:flex;"><span> qDebug() <span style="color:#ff7b72;font-weight:bold"><<</span> <span style="color:#a5d6ff">"Custom cmdline.txt entries:"</span> <span style="color:#ff7b72;font-weight:bold"><<</span> cmdline;
+</span></span><span style="display:flex;"><span> qDebug() <span style="color:#ff7b72;font-weight:bold"><<</span> <span style="color:#a5d6ff">"Custom firstuse.sh:"</span> <span style="color:#ff7b72;font-weight:bold"><<</span> firstrun;
+</span></span><span style="display:flex;"><span> qDebug() <span style="color:#ff7b72;font-weight:bold"><<</span> <span style="color:#a5d6ff">"Cloudinit:"</span> <span style="color:#ff7b72;font-weight:bold"><<</span> cloudinit;
+</span></span><span style="display:flex;"><span>}
+</span></span></code></pre></div><p>I’m no C++ expert, but this function tells me a few things:</p>
+<ol>
+<li>The Imager writes the configuration in these files: <code>config.txt</code>, <code>cmdline.txt</code>, <code>firstuse.sh</code> (we’ll soon figure out this is a typo: the file is actually called <code>firstrun.sh</code>).</li>
+<li>It also prepares a “Cloudinit” configuration file, but it’s unclear whether and where it writes it.</li>
+<li>The content of these files is printed to the console as debug output.</li>
+</ol>
+<p>So let’s enable the debug logs and see what they produce:</p>
+<div class="highlight"><pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>rpi-imager --debug
+</span></span></code></pre></div><p>The console stays quiet until I configure the user, password, WiFi and so on in the Imager, at which point it starts printing all the expected configuration files to the console.</p>
+<details>
+<summary><i>Click here to see the full output</i></summary>
+<pre tabindex="0"><code>Custom config.txt entries: ""
+Custom cmdline.txt entries: " cfg80211.ieee80211_regdom=PT"
+Custom firstuse.sh: "#!/bin/bash
+
+set +e
+
+CURRENT_HOSTNAME=`cat /etc/hostname | tr -d " \ \
+\\r"`
+if [ -f /usr/lib/raspberrypi-sys-mods/imager_custom ]; then
+ /usr/lib/raspberrypi-sys-mods/imager_custom set_hostname raspberrypi
+else
+ echo raspberrypi >/etc/hostname
+ sed -i "s/127.0.1.1.*$CURRENT_HOSTNAME/127.0.1.1\ raspberrypi/g" /etc/hosts
+fi
+FIRSTUSER=`getent passwd 1000 | cut -d: -f1`
+FIRSTUSERHOME=`getent passwd 1000 | cut -d: -f6`
+if [ -f /usr/lib/raspberrypi-sys-mods/imager_custom ]; then
+ /usr/lib/raspberrypi-sys-mods/imager_custom enable_ssh
+else
+ systemctl enable ssh
+fi
+if [ -f /usr/lib/userconf-pi/userconf ]; then
+ /usr/lib/userconf-pi/userconf 'myuser' '<hash-of-the-user-password>'
+else
+ echo "$FIRSTUSER:"'<hash-of-the-user-password>' | chpasswd -e
+ if [ "$FIRSTUSER" != "myuser" ]; then
+ usermod -l "myuser" "$FIRSTUSER"
+ usermod -m -d "/home/myuser" "myuser"
+ groupmod -n "myuser" "$FIRSTUSER"
+ if grep -q "^autologin-user=" /etc/lightdm/lightdm.conf ; then
+ sed /etc/lightdm/lightdm.conf -i -e "s/^autologin-user=.*/autologin-user=myuser/"
+ fi
+ if [ -f /etc/systemd/system/getty@tty1.service.d/autologin.conf ]; then
+ sed /etc/systemd/system/getty@tty1.service.d/autologin.conf -i -e "s/$FIRSTUSER/myuser/"
+ fi
+ if [ -f /etc/sudoers.d/010_pi-nopasswd ]; then
+ sed -i "s/^$FIRSTUSER /myuser /" /etc/sudoers.d/010_pi-nopasswd
+ fi
+ fi
+fi
+if [ -f /usr/lib/raspberrypi-sys-mods/imager_custom ]; then
+ /usr/lib/raspberrypi-sys-mods/imager_custom set_wlan 'MY-SSID' 'MY-PASSWORD' 'PT'
+else
+cat >/etc/wpa_supplicant/wpa_supplicant.conf <<'WPAEOF'
+country=PT
+ctrl_interface=DIR=/var/run/wpa_supplicant GROUP=netdev
+ap_scan=1
+
+update_config=1
+network={
+ ssid="MY-SSID"
+ psk=MY-PASSWORD
+}
+
+WPAEOF
+ chmod 600 /etc/wpa_supplicant/wpa_supplicant.conf
+ rfkill unblock wifi
+ for filename in /var/lib/systemd/rfkill/*:wlan ; do
+ echo 0 > $filename
+ done
+fi
+if [ -f /usr/lib/raspberrypi-sys-mods/imager_custom ]; then
+ /usr/lib/raspberrypi-sys-mods/imager_custom set_keymap 'us'
+ /usr/lib/raspberrypi-sys-mods/imager_custom set_timezone 'Europe/Lisbon'
+else
+ rm -f /etc/localtime
+ echo "Europe/Lisbon" >/etc/timezone
+ dpkg-reconfigure -f noninteractive tzdata
+cat >/etc/default/keyboard <<'KBEOF'
+XKBMODEL="pc105"
+XKBLAYOUT="us"
+XKBVARIANT=""
+XKBOPTIONS=""
+
+KBEOF
+ dpkg-reconfigure -f noninteractive keyboard-configuration
+fi
+rm -f /boot/firstrun.sh
+sed -i 's| systemd.run.*||g' /boot/cmdline.txt
+exit 0
+"
+
+Cloudinit: "hostname: raspberrypi
+manage_etc_hosts: true
+packages:
+- avahi-daemon
+apt:
+ conf: |
+ Acquire {
+ Check-Date "false";
+ };
+
+users:
+- name: myuser
+ groups: users,adm,dialout,audio,netdev,video,plugdev,cdrom,games,input,gpio,spi,i2c,render,sudo
+ shell: /bin/bash
+ lock_passwd: false
+ passwd: <hash-of-the-user-password>
+
+ssh_pwauth: true
+
+timezone: Europe/Lisbon
+runcmd:
+- localectl set-x11-keymap "us" pc105
+- setupcon -k --force || true
+
+
+"
+</code></pre></details>
+<p>Among these the most interesting file is <code>firstrun.sh</code>, which we can quickly locate in the <code>bootfs</code> partition. Here is its content:</p>
+<div class="highlight"><pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span><span style="color:#8b949e;font-weight:bold;font-style:italic">#!/bin/bash
+</span></span></span><span style="display:flex;"><span><span style="color:#8b949e;font-weight:bold;font-style:italic"></span>
+</span></span><span style="display:flex;"><span>set +e
+</span></span><span style="display:flex;"><span>
+</span></span><span style="display:flex;"><span><span style="color:#79c0ff">CURRENT_HOSTNAME</span><span style="color:#ff7b72;font-weight:bold">=</span><span style="color:#a5d6ff">`</span>cat /etc/hostname | tr -d <span style="color:#a5d6ff">" \ \
+</span></span></span><span style="display:flex;"><span><span style="color:#a5d6ff">\\r"</span><span style="color:#a5d6ff">`</span>
+</span></span><span style="display:flex;"><span><span style="color:#ff7b72">if</span> <span style="color:#ff7b72;font-weight:bold">[</span> -f /usr/lib/raspberrypi-sys-mods/imager_custom <span style="color:#ff7b72;font-weight:bold">]</span>; <span style="color:#ff7b72">then</span>
+</span></span><span style="display:flex;"><span> /usr/lib/raspberrypi-sys-mods/imager_custom set_hostname raspberrypi
+</span></span><span style="display:flex;"><span><span style="color:#ff7b72">else</span>
+</span></span><span style="display:flex;"><span> echo raspberrypi >/etc/hostname
+</span></span><span style="display:flex;"><span> sed -i <span style="color:#a5d6ff">"s/127.0.1.1.*</span><span style="color:#79c0ff">$CURRENT_HOSTNAME</span><span style="color:#a5d6ff">/127.0.1.1\ raspberrypi/g"</span> /etc/hosts
+</span></span><span style="display:flex;"><span><span style="color:#ff7b72">fi</span>
+</span></span><span style="display:flex;"><span><span style="color:#79c0ff">FIRSTUSER</span><span style="color:#ff7b72;font-weight:bold">=</span><span style="color:#a5d6ff">`</span>getent passwd <span style="color:#a5d6ff">1000</span> | cut -d: -f1<span style="color:#a5d6ff">`</span>
+</span></span><span style="display:flex;"><span><span style="color:#79c0ff">FIRSTUSERHOME</span><span style="color:#ff7b72;font-weight:bold">=</span><span style="color:#a5d6ff">`</span>getent passwd <span style="color:#a5d6ff">1000</span> | cut -d: -f6<span style="color:#a5d6ff">`</span>
+</span></span><span style="display:flex;"><span><span style="color:#ff7b72">if</span> <span style="color:#ff7b72;font-weight:bold">[</span> -f /usr/lib/raspberrypi-sys-mods/imager_custom <span style="color:#ff7b72;font-weight:bold">]</span>; <span style="color:#ff7b72">then</span>
+</span></span><span style="display:flex;"><span> /usr/lib/raspberrypi-sys-mods/imager_custom enable_ssh
+</span></span><span style="display:flex;"><span><span style="color:#ff7b72">else</span>
+</span></span><span style="display:flex;"><span> systemctl enable ssh
+</span></span><span style="display:flex;"><span><span style="color:#ff7b72">fi</span>
+</span></span><span style="display:flex;"><span><span style="color:#ff7b72">if</span> <span style="color:#ff7b72;font-weight:bold">[</span> -f /usr/lib/userconf-pi/userconf <span style="color:#ff7b72;font-weight:bold">]</span>; <span style="color:#ff7b72">then</span>
+</span></span><span style="display:flex;"><span> /usr/lib/userconf-pi/userconf <span style="color:#a5d6ff">'myuser'</span> <span style="color:#a5d6ff">'<hash-of-the-user-password>'</span>
+</span></span><span style="display:flex;"><span><span style="color:#ff7b72">else</span>
+</span></span><span style="display:flex;"><span> echo <span style="color:#a5d6ff">"</span><span style="color:#79c0ff">$FIRSTUSER</span><span style="color:#a5d6ff">:"</span><span style="color:#a5d6ff">'<hash-of-the-user-password>'</span> | chpasswd -e
+</span></span><span style="display:flex;"><span> <span style="color:#ff7b72">if</span> <span style="color:#ff7b72;font-weight:bold">[</span> <span style="color:#a5d6ff">"</span><span style="color:#79c0ff">$FIRSTUSER</span><span style="color:#a5d6ff">"</span> !<span style="color:#ff7b72;font-weight:bold">=</span> <span style="color:#a5d6ff">"myuser"</span> <span style="color:#ff7b72;font-weight:bold">]</span>; <span style="color:#ff7b72">then</span>
+</span></span><span style="display:flex;"><span> usermod -l <span style="color:#a5d6ff">"myuser"</span> <span style="color:#a5d6ff">"</span><span style="color:#79c0ff">$FIRSTUSER</span><span style="color:#a5d6ff">"</span>
+</span></span><span style="display:flex;"><span> usermod -m -d <span style="color:#a5d6ff">"/home/myuser"</span> <span style="color:#a5d6ff">"myuser"</span>
+</span></span><span style="display:flex;"><span> groupmod -n <span style="color:#a5d6ff">"myuser"</span> <span style="color:#a5d6ff">"</span><span style="color:#79c0ff">$FIRSTUSER</span><span style="color:#a5d6ff">"</span>
+</span></span><span style="display:flex;"><span> <span style="color:#ff7b72">if</span> grep -q <span style="color:#a5d6ff">"^autologin-user="</span> /etc/lightdm/lightdm.conf ; <span style="color:#ff7b72">then</span>
+</span></span><span style="display:flex;"><span> sed /etc/lightdm/lightdm.conf -i -e <span style="color:#a5d6ff">"s/^autologin-user=.*/autologin-user=myuser/"</span>
+</span></span><span style="display:flex;"><span> <span style="color:#ff7b72">fi</span>
+</span></span><span style="display:flex;"><span> <span style="color:#ff7b72">if</span> <span style="color:#ff7b72;font-weight:bold">[</span> -f /etc/systemd/system/getty@tty1.service.d/autologin.conf <span style="color:#ff7b72;font-weight:bold">]</span>; <span style="color:#ff7b72">then</span>
+</span></span><span style="display:flex;"><span> sed /etc/systemd/system/getty@tty1.service.d/autologin.conf -i -e <span style="color:#a5d6ff">"s/</span><span style="color:#79c0ff">$FIRSTUSER</span><span style="color:#a5d6ff">/myuser/"</span>
+</span></span><span style="display:flex;"><span> <span style="color:#ff7b72">fi</span>
+</span></span><span style="display:flex;"><span> <span style="color:#ff7b72">if</span> <span style="color:#ff7b72;font-weight:bold">[</span> -f /etc/sudoers.d/010_pi-nopasswd <span style="color:#ff7b72;font-weight:bold">]</span>; <span style="color:#ff7b72">then</span>
+</span></span><span style="display:flex;"><span> sed -i <span style="color:#a5d6ff">"s/^</span><span style="color:#79c0ff">$FIRSTUSER</span><span style="color:#a5d6ff"> /myuser /"</span> /etc/sudoers.d/010_pi-nopasswd
+</span></span><span style="display:flex;"><span> <span style="color:#ff7b72">fi</span>
+</span></span><span style="display:flex;"><span> <span style="color:#ff7b72">fi</span>
+</span></span><span style="display:flex;"><span><span style="color:#ff7b72">fi</span>
+</span></span><span style="display:flex;"><span><span style="color:#ff7b72">if</span> <span style="color:#ff7b72;font-weight:bold">[</span> -f /usr/lib/raspberrypi-sys-mods/imager_custom <span style="color:#ff7b72;font-weight:bold">]</span>; <span style="color:#ff7b72">then</span>
+</span></span><span style="display:flex;"><span> /usr/lib/raspberrypi-sys-mods/imager_custom set_wlan <span style="color:#a5d6ff">'MY-SSID'</span> <span style="color:#a5d6ff">'MY-PASSWORD'</span> <span style="color:#a5d6ff">'PT'</span>
+</span></span><span style="display:flex;"><span><span style="color:#ff7b72">else</span>
+</span></span><span style="display:flex;"><span>cat >/etc/wpa_supplicant/wpa_supplicant.conf <span style="color:#a5d6ff"><<'WPAEOF'
+</span></span></span><span style="display:flex;"><span><span style="color:#a5d6ff">country=PT
+</span></span></span><span style="display:flex;"><span><span style="color:#a5d6ff">ctrl_interface=DIR=/var/run/wpa_supplicant GROUP=netdev
+</span></span></span><span style="display:flex;"><span><span style="color:#a5d6ff">ap_scan=1
+</span></span></span><span style="display:flex;"><span><span style="color:#a5d6ff">
+</span></span></span><span style="display:flex;"><span><span style="color:#a5d6ff">update_config=1
+</span></span></span><span style="display:flex;"><span><span style="color:#a5d6ff">network={
+</span></span></span><span style="display:flex;"><span><span style="color:#a5d6ff"> ssid="MY-SSID"
+</span></span></span><span style="display:flex;"><span><span style="color:#a5d6ff"> psk=MY-PASSWORD
+</span></span></span><span style="display:flex;"><span><span style="color:#a5d6ff">}
+</span></span></span><span style="display:flex;"><span><span style="color:#a5d6ff">
+</span></span></span><span style="display:flex;"><span><span style="color:#a5d6ff">WPAEOF</span>
+</span></span><span style="display:flex;"><span> chmod <span style="color:#a5d6ff">600</span> /etc/wpa_supplicant/wpa_supplicant.conf
+</span></span><span style="display:flex;"><span> rfkill unblock wifi
+</span></span><span style="display:flex;"><span> <span style="color:#ff7b72">for</span> filename in /var/lib/systemd/rfkill/*:wlan ; <span style="color:#ff7b72">do</span>
+</span></span><span style="display:flex;"><span> echo <span style="color:#a5d6ff">0</span> > <span style="color:#79c0ff">$filename</span>
+</span></span><span style="display:flex;"><span> <span style="color:#ff7b72">done</span>
+</span></span><span style="display:flex;"><span><span style="color:#ff7b72">fi</span>
+</span></span><span style="display:flex;"><span><span style="color:#ff7b72">if</span> <span style="color:#ff7b72;font-weight:bold">[</span> -f /usr/lib/raspberrypi-sys-mods/imager_custom <span style="color:#ff7b72;font-weight:bold">]</span>; <span style="color:#ff7b72">then</span>
+</span></span><span style="display:flex;"><span> /usr/lib/raspberrypi-sys-mods/imager_custom set_keymap <span style="color:#a5d6ff">'us'</span>
+</span></span><span style="display:flex;"><span> /usr/lib/raspberrypi-sys-mods/imager_custom set_timezone <span style="color:#a5d6ff">'Europe/Lisbon'</span>
+</span></span><span style="display:flex;"><span><span style="color:#ff7b72">else</span>
+</span></span><span style="display:flex;"><span> rm -f /etc/localtime
+</span></span><span style="display:flex;"><span> echo <span style="color:#a5d6ff">"Europe/Lisbon"</span> >/etc/timezone
+</span></span><span style="display:flex;"><span> dpkg-reconfigure -f noninteractive tzdata
+</span></span><span style="display:flex;"><span>cat >/etc/default/keyboard <span style="color:#a5d6ff"><<'KBEOF'
+</span></span></span><span style="display:flex;"><span><span style="color:#a5d6ff">XKBMODEL="pc105"
+</span></span></span><span style="display:flex;"><span><span style="color:#a5d6ff">XKBLAYOUT="us"
+</span></span></span><span style="display:flex;"><span><span style="color:#a5d6ff">XKBVARIANT=""
+</span></span></span><span style="display:flex;"><span><span style="color:#a5d6ff">XKBOPTIONS=""
+</span></span></span><span style="display:flex;"><span><span style="color:#a5d6ff">
+</span></span></span><span style="display:flex;"><span><span style="color:#a5d6ff">KBEOF</span>
+</span></span><span style="display:flex;"><span> dpkg-reconfigure -f noninteractive keyboard-configuration
+</span></span><span style="display:flex;"><span><span style="color:#ff7b72">fi</span>
+</span></span><span style="display:flex;"><span>rm -f /boot/firstrun.sh
+</span></span><span style="display:flex;"><span>sed -i <span style="color:#a5d6ff">'s| systemd.run.*||g'</span> /boot/cmdline.txt
+</span></span><span style="display:flex;"><span>exit <span style="color:#a5d6ff">0</span>
+</span></span></code></pre></div><details>
+<summary> <i>Side note: how does the OS know that it should run this file on its first boot? </i></summary>
+<p>Imager also writes a file called <code>cmdline.txt</code> in the boot partition, which contains the following:</p>
+<pre tabindex="0"><code>console=serial0,115200 console=tty1 root=PARTUUID=57c84f67-02 rootfstype=ext4 fsck.repair=yes rootwait quiet init=/usr/lib/raspberrypi-sys-mods/firstboot cfg80211.ieee80211_regdom=PT systemd.run=/boot/firstrun.sh systemd.run_success_action=reboot systemd.unit=kernel-command-line.target
+</code></pre><p>Note the reference to <code>/boot/firstrun.sh</code>. If you plan to implement your own <code>firstrun.sh</code> file and want to change its name, don’t forget to modify this line as well.</p>
+</details>
+<p>That’s a lot of Bash in one go, but upon inspection one can spot a recurring pattern. For example, when setting the hostname, it does this:</p>
+<div class="highlight"><pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span><span style="color:#ff7b72">if</span> <span style="color:#ff7b72;font-weight:bold">[</span> -f /usr/lib/raspberrypi-sys-mods/imager_custom <span style="color:#ff7b72;font-weight:bold">]</span>; <span style="color:#ff7b72">then</span>
+</span></span><span style="display:flex;"><span> /usr/lib/raspberrypi-sys-mods/imager_custom set_hostname raspberrypi
+</span></span><span style="display:flex;"><span><span style="color:#ff7b72">else</span>
+</span></span><span style="display:flex;"><span> echo raspberrypi >/etc/hostname
+</span></span><span style="display:flex;"><span> sed -i <span style="color:#a5d6ff">"s/127.0.1.1.*</span><span style="color:#79c0ff">$CURRENT_HOSTNAME</span><span style="color:#a5d6ff">/127.0.1.1\ raspberrypi/g"</span> /etc/hosts
+</span></span><span style="display:flex;"><span><span style="color:#ff7b72">fi</span>
+</span></span></code></pre></div><p>The script clearly signals that there is a “preferred” way to set the hostname: using <code>/usr/lib/raspberrypi-sys-mods/imager_custom set_hostname [NAME]</code>. Only if this executable is not available does it fall back to the “traditional” way of setting the hostname by editing <code>/etc/hosts</code>.</p>
+<p>The same patterns repeat a few times to perform the following operations:</p>
+<ul>
+<li>set the hostname (<code>/usr/lib/raspberrypi-sys-mods/imager_custom set_hostname [NAME]</code>)</li>
+<li>enable ssh (<code>/usr/lib/raspberrypi-sys-mods/imager_custom enable_ssh</code>)</li>
+<li>configure the user (<code>/usr/lib/userconf-pi/userconf [USERNAME] [HASHED-PASSWORD]</code>)</li>
+<li>configure the WiFi (<code>/usr/lib/raspberrypi-sys-mods/imager_custom set_wlan [MY-SSID] [MY-PASSWORD] [2-LETTER-COUNTRY-CODE]</code>)</li>
+<li>set the keyboard layout (<code>/usr/lib/raspberrypi-sys-mods/imager_custom set_keymap [CODE]</code>)</li>
+<li>set the timezone (<code>/usr/lib/raspberrypi-sys-mods/imager_custom set_timezone [TIMEZONE-NAME]</code>)</li>
+</ul>
+<p>It seems like using <code>raspberrypi-sys-mods</code> to configure the OS at first boot is the way to go in this RPi OS version, and it might remain true in future versions as well. There are <a href="https://github.com/RPi-Distro/raspberrypi-sys-mods/issues/82#issuecomment-1779109991" class="external-link" target="_blank" rel="noopener">hints</a> that the Raspberry Pi OS team is going to move to <a href="https://cloudinit.readthedocs.io/en/latest/index.html" class="external-link" target="_blank" rel="noopener"><code>cloud-init</code></a> in the near future, but for now this seems to be the way that the initial setup is done.</p>
+<h1 id="raspberrypi-sys-mods">
+ raspberrypi-sys-mods
+ <a class="heading-link" href="#raspberrypi-sys-mods">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h1>
+<p>So let’s check out what <code>raspberrypi-sys-mods</code> does! The source code can be found here: <a href="https://github.com/RPi-Distro/raspberrypi-sys-mods" class="external-link" target="_blank" rel="noopener">raspberrypi-sys-mods</a>.</p>
+<p>Given that we’re interested in the WiFi configuration, let’s head straight to the <code>imager_custom</code> script (<a href="https://github.com/RPi-Distro/raspberrypi-sys-mods/blob/2e256445b65995f62db80e6a267313275cad51e4/usr/lib/raspberrypi-sys-mods/imager_custom#L97" class="external-link" target="_blank" rel="noopener">here</a>), where we discover that it’s a Bash script which does this:</p>
+<div class="highlight"><pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span><span style="color:#79c0ff">CONNFILE</span><span style="color:#ff7b72;font-weight:bold">=</span>/etc/NetworkManager/system-connections/preconfigured.nmconnection
+</span></span><span style="display:flex;"><span> <span style="color:#79c0ff">UUID</span><span style="color:#ff7b72;font-weight:bold">=</span><span style="color:#ff7b72">$(</span>uuid -v4<span style="color:#ff7b72">)</span>
+</span></span><span style="display:flex;"><span> cat <span style="color:#a5d6ff"><<- EOF >${CONNFILE}
+</span></span></span><span style="display:flex;"><span><span style="color:#a5d6ff"> [connection]
+</span></span></span><span style="display:flex;"><span><span style="color:#a5d6ff"> id=preconfigured
+</span></span></span><span style="display:flex;"><span><span style="color:#a5d6ff"> uuid=${UUID}
+</span></span></span><span style="display:flex;"><span><span style="color:#a5d6ff"> type=wifi
+</span></span></span><span style="display:flex;"><span><span style="color:#a5d6ff"> [wifi]
+</span></span></span><span style="display:flex;"><span><span style="color:#a5d6ff"> mode=infrastructure
+</span></span></span><span style="display:flex;"><span><span style="color:#a5d6ff"> ssid=${SSID}
+</span></span></span><span style="display:flex;"><span><span style="color:#a5d6ff"> hidden=${HIDDEN}
+</span></span></span><span style="display:flex;"><span><span style="color:#a5d6ff"> [ipv4]
+</span></span></span><span style="display:flex;"><span><span style="color:#a5d6ff"> method=auto
+</span></span></span><span style="display:flex;"><span><span style="color:#a5d6ff"> [ipv6]
+</span></span></span><span style="display:flex;"><span><span style="color:#a5d6ff"> addr-gen-mode=default
+</span></span></span><span style="display:flex;"><span><span style="color:#a5d6ff"> method=auto
+</span></span></span><span style="display:flex;"><span><span style="color:#a5d6ff"> [proxy]
+</span></span></span><span style="display:flex;"><span><span style="color:#a5d6ff"> EOF</span>
+</span></span><span style="display:flex;"><span>
+</span></span><span style="display:flex;"><span> <span style="color:#ff7b72">if</span> <span style="color:#ff7b72;font-weight:bold">[</span> ! -z <span style="color:#a5d6ff">"</span><span style="color:#a5d6ff">${</span><span style="color:#79c0ff">PASS</span><span style="color:#a5d6ff">}</span><span style="color:#a5d6ff">"</span> <span style="color:#ff7b72;font-weight:bold">]</span>; <span style="color:#ff7b72">then</span>
+</span></span><span style="display:flex;"><span> cat <span style="color:#a5d6ff"><<- EOF >>${CONNFILE}
+</span></span></span><span style="display:flex;"><span><span style="color:#a5d6ff"> [wifi-security]
+</span></span></span><span style="display:flex;"><span><span style="color:#a5d6ff"> key-mgmt=wpa-psk
+</span></span></span><span style="display:flex;"><span><span style="color:#a5d6ff"> psk=${PASS}
+</span></span></span><span style="display:flex;"><span><span style="color:#a5d6ff"> EOF</span>
+</span></span><span style="display:flex;"><span> <span style="color:#ff7b72">fi</span>
+</span></span><span style="display:flex;"><span>
+</span></span><span style="display:flex;"><span> <span style="color:#8b949e;font-style:italic"># NetworkManager will ignore nmconnection files with incorrect permissions,</span>
+</span></span><span style="display:flex;"><span> <span style="color:#8b949e;font-style:italic"># to prevent Wi-Fi credentials accidentally being world-readable.</span>
+</span></span><span style="display:flex;"><span> chmod <span style="color:#a5d6ff">600</span> <span style="color:#a5d6ff">${</span><span style="color:#79c0ff">CONNFILE</span><span style="color:#a5d6ff">}</span>
+</span></span></code></pre></div><p>So after all this searching, we’re back to square one. This utility is doing exactly what we’ve done at the start: it writes a NetworkManager configuration file called <code>preconfigured.nmconnection</code> and it fills it in with the information that we’ve provided to the Imager, then changes the permissions to make sure NetworkManager can use it.</p>
+<h1 id="conclusion">
+ Conclusion
+ <a class="heading-link" href="#conclusion">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h1>
+<p>It would be great if the Raspberry Pi OS team expanded their documentation to include this information, so that users aren’t left wondering what makes the RPi Imager so special and whether their manual setup is the right way to go or rather a hack that is likely to break. For now there seems to be one solid approach to this problem, and we’ll see what changes in the next version of Raspberry Pi OS.</p>
+<p>On this note, you should remember that a manual NetworkManager configuration, the Imager, and <code>raspberrypi-sys-mods</code> may produce nearly identical results right now, but when choosing which approach to use for your project you should also keep in mind the maintenance burden that this decision brings.</p>
+<p>Doing a manual configuration is easier on many levels, but only if you don’t intend to support other versions of RPi OS. If you do, or if you expect to migrate when a new version comes out, you should consider doing something similar to what the Imager does: use a <code>firstrun.sh</code> file that tries to use <code>raspberrypi-sys-mods</code> and falls back to a manual configuration only if that executable is missing. That is likely to make migrations easier if the Raspberry Pi OS team should choose once again to modify the way that headless setups work.</p>
+<p class="fleuron"><a href="https://www.zansara.dev/posts/2024-05-06-teranoptia/">FŻ</a></p>
+
+
+
+ DataHour: Optimizing LLMs with Retrieval Augmented Generation and Haystack 2.0
+ https://www.zansara.dev/talks/2023-12-15-datahour-rag/
+ Fri, 15 Dec 2023 00:00:00 +0000
+
+ https://www.zansara.dev/talks/2023-12-15-datahour-rag/
+ <p><a href="https://drive.google.com/file/d/1OkFr4u9ZOraJRF406IQgQh4YC8GLHbzA/view?usp=drive_link" class="external-link" target="_blank" rel="noopener">Recording</a>, <a href="https://drive.google.com/file/d/1n1tbiUW2wZPGC49WK9pYEIZlZuCER-hu/view?usp=sharing" class="external-link" target="_blank" rel="noopener">slides</a>, <a href="https://drive.google.com/file/d/17FXuS7X70UF02IYmOr-yEDQYg_gp9cFv/view?usp=sharing" class="external-link" target="_blank" rel="noopener">Colab</a>, <a href="https://gist.github.com/ZanSara/6075d418c1494e780f7098db32bc6cf6" class="external-link" target="_blank" rel="noopener">gist</a>. All the material can also be found on <a href="https://community.analyticsvidhya.com/c/datahour/optimizing-llms-with-retrieval-augmented-generation-and-haystack-2-0" class="external-link" target="_blank" rel="noopener">Analytics Vidhya’s community</a> and on <a href="https://drive.google.com/drive/folders/1KwCEDTCsm9hrRaFUPHpzdTpVsOJSnvGk?usp=drive_link" class="external-link" target="_blank" rel="noopener">my backup</a>.</p>
+<hr>
+
+
+<div class='iframe-wrapper'>
+<iframe src="https://drive.google.com/file/d/1OkFr4u9ZOraJRF406IQgQh4YC8GLHbzA/preview" width=100% height=100% allow="autoplay"></iframe>
+</div>
+
+
+<p>In this hour-long workshop organized by <a href="https://www.analyticsvidhya.com/" class="external-link" target="_blank" rel="noopener">Analytics Vidhya</a> I give an overview of what RAG is, what problems it solves, and how it works.</p>
+<p>After a brief introduction to Haystack, I show in practice how to use Haystack 2.0 to assemble a Pipeline that performs RAG on a local database and then on the Web with a simple change.</p>
+<p>I also mention how to use and implement custom Haystack components, and share a lot of resources on the topic of RAG and Haystack 2.0.</p>
+<p>This was my most popular talk to date, with over a hundred attendees watching live and asking several questions.</p>
+<p>Other resources mentioned in the talk are:</p>
+<ul>
+<li><a href="https://haystack.deepset.ai/blog/customizing-rag-to-summarize-hacker-news-posts-with-haystack2?utm_campaign=developer-relations&utm_source=data-hour-event&utm_medium=webinar" class="external-link" target="_blank" rel="noopener">Blog post about custom components</a></li>
+<li><a href="https://haystack.deepset.ai/tutorials/28_structured_output_with_loop?utm_campaign=developer-relations&utm_source=data-hour-event&utm_medium=webinar" class="external-link" target="_blank" rel="noopener">LLM structured output example</a></li>
+<li><a href="https://haystack.deepset.ai/advent-of-haystack?utm_campaign=developer-relations&utm_source=data-hour-event&utm_medium=webinar" class="external-link" target="_blank" rel="noopener">Advent of Haystack</a></li>
+</ul>
+
+
+
+
+ Pointer[183]: Haystack, creare LLM Applications in modo facile
+ https://www.zansara.dev/talks/2023-12-15-pointerpodcast-haystack/
+ Fri, 15 Dec 2023 00:00:00 +0000
+
+ https://www.zansara.dev/talks/2023-12-15-pointerpodcast-haystack/
+ <p><a href="https://pointerpodcast.it/p/pointer183-haystack-creare-llm-applications-in-modo-facile-con-stefano-fiorucci-e-sara-zanzottera" class="external-link" target="_blank" rel="noopener">Episode link</a>. Backup recording <a href="https://drive.google.com/file/d/1BOoAhfvWou_J4J7RstgKAHPs3Pre2YAw/view?usp=sharing" class="external-link" target="_blank" rel="noopener">here</a>.</p>
+<hr>
+<p><em>The podcast was recorded in Italian for <a href="https://pointerpodcast.it" class="external-link" target="_blank" rel="noopener">PointerPodcast</a> with <a href="https://www.linkedin.com/in/luca-corbucci-b6156a123/" class="external-link" target="_blank" rel="noopener">Luca Corbucci</a>, <a href="https://www.linkedin.com/in/eugenio-paluello-851b3280/" class="external-link" target="_blank" rel="noopener">Eugenio Paluello</a> and <a href="https://www.linkedin.com/in/stefano-fiorucci/" class="external-link" target="_blank" rel="noopener">Stefano Fiorucci</a>.</em></p>
+<hr>
+
+
+<div>
+<audio controls src="https://hosting.pointerpodcast.it/records/pointer183.mp3" style="width: 100%"></audio>
+</div>
+
+
+<p>To wrap up 2023 in style, in this episode we talk about LLMs, which have been a central topic of the tech scene in the year that is about to end. We invited two experts in the field, Sara Zanzottera and Stefano Fiorucci.</p>
+<p>Both of our guests work for deepset as NLP Engineers. deepset is the company behind Haystack, one of the best-known open-source frameworks for LLMs, which has recently reached version 2.0 beta. Haystack itself was one of the topics we discussed with our guests, trying to understand its potential.</p>
+<p>But is it even possible to work on a framework like this while staying up to date with the news of a field in constant evolution? This is one of the many questions Sara and Stefano answered. Interested in the world of LLMs? Don’t miss this episode!</p>
+
+
+
+
+ Brekeke
+ https://www.zansara.dev/projects/brekeke/
+ Fri, 01 Dec 2023 00:00:00 +0000
+
+ https://www.zansara.dev/projects/brekeke/
+        <p>With the rise of ever more powerful LLMs, I am experimenting with ways to interact with them that don’t necessarily involve a laptop, a keyboard or a screen.</p>
+<p>I codenamed all of these experiments “Brekeke”, the sound frogs make in Hungarian (don’t ask why). These experiments focus mostly on small home automation tasks and run on a swarm of Raspberry Pis.</p>
+
+
+
+
+ The World of Web RAG
+ https://www.zansara.dev/posts/2023-11-09-haystack-series-simple-web-rag/
+ Thu, 09 Nov 2023 00:00:00 +0000
+
+ https://www.zansara.dev/posts/2023-11-09-haystack-series-simple-web-rag/
+        <p><em>Last updated: 18/01/2024</em></p>
+<hr>
+<p>In an earlier post of the Haystack 2.0 series, we’ve seen how to build RAG and indexing pipelines. An application that uses these two pipelines is practical if you have an extensive, private collection of documents and need to perform RAG on such data only. However, in many cases, you may want to get data from the Internet: from news outlets, documentation pages, and so on.</p>
+<p>In this post, we will see how to build a Web RAG application: a RAG pipeline that can search the Web for the information needed to answer your questions.</p>
+<div class="notice info">
+ <div class="notice-content">💡 <em>Do you want to see the code in action? Check out the <a href="https://colab.research.google.com/drive/1dGMPxReo730j7_zQDZOu-0SGf-pk4XDL?usp=sharing" class="external-link" target="_blank" rel="noopener">Colab notebook</a> or the <a href="https://gist.github.com/ZanSara/0907a8f3ae19f62998cc061ed6e8ce53" class="external-link" target="_blank" rel="noopener">gist</a>.</em></div>
+</div>
+
+<div class="notice warning">
+ <div class="notice-content"><i>⚠️ <strong>Warning:</strong></i> <em>This code was tested on <code>haystack-ai==2.0.0b5</code>. Haystack 2.0 is still unstable, so later versions might introduce breaking changes without notice until Haystack 2.0 is officially released. The concepts and components, however, stay the same.</em></div>
+</div>
+
+<h1 id="searching-the-web">
+ Searching the Web
+ <a class="heading-link" href="#searching-the-web">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h1>
+<p>As we’ve seen <a href="https://www.zansara.dev/posts/2023-10-27-haystack-series-rag" >earlier</a>, a Haystack RAG Pipeline is made of three components: a Retriever, a PromptBuilder, and a Generator, and looks like this:</p>
+<p><img src="https://www.zansara.dev/posts/2023-11-09-haystack-series-simple-web-rag/bm25-rag-pipeline.png" alt="BM25 RAG Pipeline"></p>
+<p>To make this pipeline use the Web as its data source, we need to replace the retriever with a component that, instead of looking up information in a local document store, can search the web.</p>
+<p>Haystack 2.0 already provides a search engine component called <code>SerperDevWebSearch</code>. It uses <a href="https://serper.dev/" class="external-link" target="_blank" rel="noopener">SerperDev’s API</a> to query popular search engines and return two types of data: a list of text snippets coming from the search engine’s preview boxes and a list of links, which point to the top search results.</p>
+<p>To begin, let’s see how to use this component in isolation.</p>
+<div class="highlight"><pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#ff7b72">from</span> <span style="color:#ff7b72">haystack.components.websearch</span> <span style="color:#ff7b72">import</span> SerperDevWebSearch
+</span></span><span style="display:flex;"><span>
+</span></span><span style="display:flex;"><span>question <span style="color:#ff7b72;font-weight:bold">=</span> <span style="color:#a5d6ff">"What's the official language of the Republic of Rose Island?"</span>
+</span></span><span style="display:flex;"><span>
+</span></span><span style="display:flex;"><span>search <span style="color:#ff7b72;font-weight:bold">=</span> SerperDevWebSearch(api_key<span style="color:#ff7b72;font-weight:bold">=</span>serperdev_api_key)
+</span></span><span style="display:flex;"><span>results <span style="color:#ff7b72;font-weight:bold">=</span> search<span style="color:#ff7b72;font-weight:bold">.</span>run(query<span style="color:#ff7b72;font-weight:bold">=</span>question)
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># returns {</span>
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># "documents": [</span>
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># Document(content='Esperanto', meta={'title': 'Republic of Rose Island - Wikipedia', 'link': 'https://en.wikipedia.org/wiki/Republic_of_Rose_Island'}),</span>
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># Document(content="The Republic of Rose Island was a short-lived micronation on a man-made platform in the Adriatic Sea. It's a story that few people knew of until recently, ...", meta={'title': 'Rose Island - The story of a micronation', 'link': 'https://www.rose-island.co/', 'imageUrl': 'https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcQiRCfTO6OwFS32SX37S-7OadDZCNK6Fy_NZVGsci2gcIS-zcinhOcGhgU&s', 'position': 1},</span>
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># ...</span>
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># ], </span>
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># "links": [</span>
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># 'https://www.rose-island.co/',</span>
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># 'https://www.defactoborders.org/places/rose-island',</span>
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># ...</span>
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># ]</span>
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># }</span>
+</span></span></code></pre></div><p><code>SerperDevWebSearch</code> is a component with a simple interface. Starting from its output, we can see that it returns not one but two different values in the returned dictionary: <code>documents</code> and <code>links</code>.</p>
+<p><code>links</code> is the most straightforward and represents the top results that Google found relevant for the input query. It’s a list of strings, each containing a URL. You can configure the number of links to return with the <code>top_k</code> init parameter.</p>
+<p><code>documents</code> instead is a list of already fully formed Document objects. The content of these objects corresponds to the “answer boxes” that Google often returns together with its search results. Given that these snippets are usually clean and short pieces of text, they’re perfect to be fed directly to an LLM without further processing.</p>
+<p>Besides the API key and <code>top_k</code>, <code>SerperDevWebSearch</code> also accepts an <code>allowed_domains</code> init parameter, which restricts the domains Google is allowed to look into during search, and <code>search_params</code>, a more generic dictionary input that lets you pass any additional search parameter SerperDev’s API understands.</p>
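+<p>As a quick illustration, here is a minimal sketch of how these init parameters could be combined. The <code>serperdev_api_key</code> variable is assumed to be defined as in the snippets above, and the specific <code>allowed_domains</code> and <code>search_params</code> values are placeholders chosen for the example, not recommendations:</p>
+<pre><code class="language-python">from haystack.components.websearch import SerperDevWebSearch
+
+# Hypothetical configuration: return at most 3 results, search one domain only,
+# and forward an extra parameter to SerperDev's API.
+search = SerperDevWebSearch(
+    api_key=serperdev_api_key,
+    top_k=3,
+    allowed_domains=["en.wikipedia.org"],
+    search_params={"gl": "us"},
+)
+results = search.run(query="What's the official language of the Republic of Rose Island?")
+</code></pre>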
+<h1 id="a-minimal-web-rag-pipeline">
+ A Minimal Web RAG Pipeline
+ <a class="heading-link" href="#a-minimal-web-rag-pipeline">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h1>
+<p><code>SerperDevWebSearch</code> is actually the bare minimum we need to be able to build our very first Web RAG Pipeline. All we need to do is replace our original example’s Retriever with our search component.</p>
+<p>This is the result:</p>
+<div class="highlight"><pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#ff7b72">from</span> <span style="color:#ff7b72">haystack</span> <span style="color:#ff7b72">import</span> Pipeline
+</span></span><span style="display:flex;"><span><span style="color:#ff7b72">from</span> <span style="color:#ff7b72">haystack.components.builders</span> <span style="color:#ff7b72">import</span> PromptBuilder
+</span></span><span style="display:flex;"><span><span style="color:#ff7b72">from</span> <span style="color:#ff7b72">haystack.components.generators</span> <span style="color:#ff7b72">import</span> OpenAIGenerator
+</span></span><span style="display:flex;"><span>
+</span></span><span style="display:flex;"><span>template <span style="color:#ff7b72;font-weight:bold">=</span> <span style="color:#a5d6ff">"""
+</span></span></span><span style="display:flex;"><span><span style="color:#a5d6ff">Question: {{ question }}
+</span></span></span><span style="display:flex;"><span><span style="color:#a5d6ff">
+</span></span></span><span style="display:flex;"><span><span style="color:#a5d6ff">Google Search Answer Boxes:
+</span></span></span><span style="display:flex;"><span><span style="color:#a5d6ff">{</span><span style="color:#a5d6ff">% f</span><span style="color:#a5d6ff">or document in documents %}
+</span></span></span><span style="display:flex;"><span><span style="color:#a5d6ff"> {{ document.content }}
+</span></span></span><span style="display:flex;"><span><span style="color:#a5d6ff">{</span><span style="color:#a5d6ff">% e</span><span style="color:#a5d6ff">ndfor %}
+</span></span></span><span style="display:flex;"><span><span style="color:#a5d6ff">
+</span></span></span><span style="display:flex;"><span><span style="color:#a5d6ff">Please reformulate the information above to
+</span></span></span><span style="display:flex;"><span><span style="color:#a5d6ff">answer the user's question.
+</span></span></span><span style="display:flex;"><span><span style="color:#a5d6ff">"""</span>
+</span></span><span style="display:flex;"><span>pipe <span style="color:#ff7b72;font-weight:bold">=</span> Pipeline()
+</span></span><span style="display:flex;"><span>
+</span></span><span style="display:flex;"><span>pipe<span style="color:#ff7b72;font-weight:bold">.</span>add_component(<span style="color:#a5d6ff">"search"</span>, SerperDevWebSearch(api_key<span style="color:#ff7b72;font-weight:bold">=</span>serperdev_api_key))
+</span></span><span style="display:flex;"><span>pipe<span style="color:#ff7b72;font-weight:bold">.</span>add_component(<span style="color:#a5d6ff">"prompt_builder"</span>, PromptBuilder(template<span style="color:#ff7b72;font-weight:bold">=</span>template))
+</span></span><span style="display:flex;"><span>pipe<span style="color:#ff7b72;font-weight:bold">.</span>add_component(<span style="color:#a5d6ff">"llm"</span>, OpenAIGenerator(api_key<span style="color:#ff7b72;font-weight:bold">=</span>api_key))
+</span></span><span style="display:flex;"><span>pipe<span style="color:#ff7b72;font-weight:bold">.</span>connect(<span style="color:#a5d6ff">"search.documents"</span>, <span style="color:#a5d6ff">"prompt_builder.documents"</span>)
+</span></span><span style="display:flex;"><span>pipe<span style="color:#ff7b72;font-weight:bold">.</span>connect(<span style="color:#a5d6ff">"prompt_builder"</span>, <span style="color:#a5d6ff">"llm"</span>)
+</span></span><span style="display:flex;"><span>
+</span></span><span style="display:flex;"><span>question <span style="color:#ff7b72;font-weight:bold">=</span> <span style="color:#a5d6ff">"What's the official language of the Republic of Rose Island?"</span>
+</span></span><span style="display:flex;"><span>pipe<span style="color:#ff7b72;font-weight:bold">.</span>run({
+</span></span><span style="display:flex;"><span> <span style="color:#a5d6ff">"search"</span>: {<span style="color:#a5d6ff">"query"</span>: question},
+</span></span><span style="display:flex;"><span> <span style="color:#a5d6ff">"prompt_builder"</span>: {<span style="color:#a5d6ff">"question"</span>: question}
+</span></span><span style="display:flex;"><span>})
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># returns {</span>
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># 'llm': {</span>
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># 'replies': [</span>
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># "The official language of the Republic of Rose Island is Esperanto. This artificial language was chosen by the residents of Rose Island as their national language when they declared independence in 1968. However, it's important to note that despite having their own language, government, currency, and postal service, Rose Island was never officially recognized as an independent nation by any country."</span>
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># ],</span>
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># 'metadata': [...]</span>
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># }</span>
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># }</span>
+</span></span></code></pre></div><p><img src="https://www.zansara.dev/posts/2023-11-09-haystack-series-simple-web-rag/minimal-web-rag-pipeline.png" alt="Minimal Web RAG Pipeline"></p>
+<p>This solution is already quite effective for simple questions because Google does most of the heavy lifting of reading the content of the top results, extracting the relevant snippets, and packaging them up in a way that is really easy to access and understand by the model.</p>
+<p>However, there are situations in which this approach is not sufficient. For example, for highly technical or nuanced questions, the answer box does not provide enough context for the LLM to elaborate and grasp the entire scope of the discussion. In these situations, we may need to turn to the second output of <code>SerperDevWebSearch</code>: the links.</p>
+<h1 id="fetching-urls">
+ Fetching URLs
+ <a class="heading-link" href="#fetching-urls">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h1>
+<p>Haystack offers a component to read the content of a URL: <code>LinkContentFetcher</code>. Let’s see this component in action.</p>
+<div class="highlight"><pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#ff7b72">from</span> <span style="color:#ff7b72">haystack.components.fetchers</span> <span style="color:#ff7b72">import</span> LinkContentFetcher
+</span></span><span style="display:flex;"><span>
+</span></span><span style="display:flex;"><span>fetcher <span style="color:#ff7b72;font-weight:bold">=</span> LinkContentFetcher()
+</span></span><span style="display:flex;"><span>fetcher<span style="color:#ff7b72;font-weight:bold">.</span>run(urls<span style="color:#ff7b72;font-weight:bold">=</span>[<span style="color:#a5d6ff">"https://en.wikipedia.org/wiki/Republic_of_Rose_Island"</span>])
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># returns {</span>
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># "streams": [</span>
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># ByteStream(data=b"<DOCTYPE html>\n<...")</span>
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># ]</span>
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># }</span>
+</span></span></code></pre></div><p>First, let’s notice that <code>LinkContentFetcher</code> outputs a list of <code>ByteStream</code> objects. <code>ByteStream</code> is a Haystack abstraction that makes handling binary streams and files equally easy. When a component produces <code>ByteStream</code> as output, you can directly pass these objects to a Converter component that can extract their textual content without saving such binary content to a file.</p>
+<p>These features come in handy to connect <code>LinkContentFetcher</code> to a component we’ve already met before: <code>HTMLToDocument</code>.</p>
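+<p>As a small sketch of that connection outside of a pipeline, the <code>streams</code> output of the fetcher can be passed straight into the converter’s <code>sources</code> input (the URL is just the example used above):</p>
+<pre><code class="language-python">from haystack.components.fetchers import LinkContentFetcher
+from haystack.components.converters import HTMLToDocument
+
+fetcher = LinkContentFetcher()
+converter = HTMLToDocument()
+
+# download the page as ByteStream objects, then turn them into Documents
+streams = fetcher.run(urls=["https://en.wikipedia.org/wiki/Republic_of_Rose_Island"])["streams"]
+documents = converter.run(sources=streams)["documents"]
+</code></pre>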
+<h1 id="processing-the-page">
+ Processing the page
+ <a class="heading-link" href="#processing-the-page">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h1>
+<p>In a <a href="https://www.zansara.dev/posts/2023-11-05-haystack-series-minimal-indexing" >previous post</a>, we’ve seen how Haystack can convert web pages into clean Documents ready to be stored in a Document Store. We will reuse many of the components we have discussed there, so if you missed it, make sure to check it out.</p>
+<p>From the pipeline in question, we’re interested in three of its components: <code>HTMLToDocument</code>, <code>DocumentCleaner</code>, and <code>DocumentSplitter</code>. Once the search component returns the links and <code>LinkContentFetcher</code> downloads their content, we can connect it to <code>HTMLToDocument</code> to extract the text, and then to <code>DocumentCleaner</code> and <code>DocumentSplitter</code> to clean and chunk the content. These documents can then go to the <code>PromptBuilder</code>, resulting in a pipeline such as this:</p>
+<div class="highlight"><pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span>template <span style="color:#ff7b72;font-weight:bold">=</span> <span style="color:#a5d6ff">"""
+</span></span></span><span style="display:flex;"><span><span style="color:#a5d6ff">Question: {{ question }}
+</span></span></span><span style="display:flex;"><span><span style="color:#a5d6ff">
+</span></span></span><span style="display:flex;"><span><span style="color:#a5d6ff">Context:
+</span></span></span><span style="display:flex;"><span><span style="color:#a5d6ff">{</span><span style="color:#a5d6ff">% f</span><span style="color:#a5d6ff">or document in documents %}
+</span></span></span><span style="display:flex;"><span><span style="color:#a5d6ff"> {{ document.content }}
+</span></span></span><span style="display:flex;"><span><span style="color:#a5d6ff">{</span><span style="color:#a5d6ff">% e</span><span style="color:#a5d6ff">ndfor %}
+</span></span></span><span style="display:flex;"><span><span style="color:#a5d6ff">
+</span></span></span><span style="display:flex;"><span><span style="color:#a5d6ff">Please reformulate the information above to answer the user's question.
+</span></span></span><span style="display:flex;"><span><span style="color:#a5d6ff">"""</span>
+</span></span><span style="display:flex;"><span>pipe <span style="color:#ff7b72;font-weight:bold">=</span> Pipeline()
+</span></span><span style="display:flex;"><span>
+</span></span><span style="display:flex;"><span>pipe<span style="color:#ff7b72;font-weight:bold">.</span>add_component(<span style="color:#a5d6ff">"search"</span>, SerperDevWebSearch(api_key<span style="color:#ff7b72;font-weight:bold">=</span>serperdev_api_key))
+</span></span><span style="display:flex;"><span>pipe<span style="color:#ff7b72;font-weight:bold">.</span>add_component(<span style="color:#a5d6ff">"fetcher"</span>, LinkContentFetcher())
+</span></span><span style="display:flex;"><span>pipe<span style="color:#ff7b72;font-weight:bold">.</span>add_component(<span style="color:#a5d6ff">"converter"</span>, HTMLToDocument())
+</span></span><span style="display:flex;"><span>pipe<span style="color:#ff7b72;font-weight:bold">.</span>add_component(<span style="color:#a5d6ff">"cleaner"</span>, DocumentCleaner())
+</span></span><span style="display:flex;"><span>pipe<span style="color:#ff7b72;font-weight:bold">.</span>add_component(<span style="color:#a5d6ff">"splitter"</span>, DocumentSplitter(split_by<span style="color:#ff7b72;font-weight:bold">=</span><span style="color:#a5d6ff">"sentence"</span>, split_length<span style="color:#ff7b72;font-weight:bold">=</span><span style="color:#a5d6ff">3</span>))
+</span></span><span style="display:flex;"><span>pipe<span style="color:#ff7b72;font-weight:bold">.</span>add_component(<span style="color:#a5d6ff">"prompt_builder"</span>, PromptBuilder(template<span style="color:#ff7b72;font-weight:bold">=</span>template))
+</span></span><span style="display:flex;"><span>pipe<span style="color:#ff7b72;font-weight:bold">.</span>add_component(<span style="color:#a5d6ff">"llm"</span>, OpenAIGenerator(api_key<span style="color:#ff7b72;font-weight:bold">=</span>api_key))
+</span></span><span style="display:flex;"><span>pipe<span style="color:#ff7b72;font-weight:bold">.</span>connect(<span style="color:#a5d6ff">"search.links"</span>, <span style="color:#a5d6ff">"fetcher"</span>)
+</span></span><span style="display:flex;"><span>pipe<span style="color:#ff7b72;font-weight:bold">.</span>connect(<span style="color:#a5d6ff">"fetcher"</span>, <span style="color:#a5d6ff">"converter"</span>)
+</span></span><span style="display:flex;"><span>pipe<span style="color:#ff7b72;font-weight:bold">.</span>connect(<span style="color:#a5d6ff">"converter"</span>, <span style="color:#a5d6ff">"cleaner"</span>)
+</span></span><span style="display:flex;"><span>pipe<span style="color:#ff7b72;font-weight:bold">.</span>connect(<span style="color:#a5d6ff">"cleaner"</span>, <span style="color:#a5d6ff">"splitter"</span>)
+</span></span><span style="display:flex;"><span>pipe<span style="color:#ff7b72;font-weight:bold">.</span>connect(<span style="color:#a5d6ff">"splitter"</span>, <span style="color:#a5d6ff">"prompt_builder.documents"</span>)
+</span></span><span style="display:flex;"><span>pipe<span style="color:#ff7b72;font-weight:bold">.</span>connect(<span style="color:#a5d6ff">"prompt_builder"</span>, <span style="color:#a5d6ff">"llm"</span>)
+</span></span><span style="display:flex;"><span>
+</span></span><span style="display:flex;"><span>question <span style="color:#ff7b72;font-weight:bold">=</span> <span style="color:#a5d6ff">"What's the official language of the Republic of Rose Island?"</span>
+</span></span><span style="display:flex;"><span>pipe<span style="color:#ff7b72;font-weight:bold">.</span>run({
+</span></span><span style="display:flex;"><span> <span style="color:#a5d6ff">"search"</span>: {<span style="color:#a5d6ff">"query"</span>: question},
+</span></span><span style="display:flex;"><span> <span style="color:#a5d6ff">"prompt_builder"</span>: {<span style="color:#a5d6ff">"question"</span>: question}
+</span></span><span style="display:flex;"><span>})
+</span></span></code></pre></div><p><img src="https://www.zansara.dev/posts/2023-11-09-haystack-series-simple-web-rag/incorrect-web-rag-pipeline.png" alt="Incorrect Web RAG Pipeline"></p>
+<p>However, running this pipeline results in a crash.</p>
+<pre tabindex="0"><code>PipelineRuntimeError: llm raised 'InvalidRequestError: This model's maximum context
+length is 4097 tokens. However, your messages resulted in 4911 tokens. Please reduce
+the length of the messages.'
+</code></pre><p>Reading the error message reveals the issue right away: the LLM received too much text. And that’s to be expected because we just passed the entire content of several web pages to it.</p>
+<p>We need to find a way to filter only the most relevant documents from the long list that is generated by <code>DocumentSplitter</code>.</p>
+<h1 id="ranking-documents-on-the-fly">
+ Ranking Documents on the fly
+ <a class="heading-link" href="#ranking-documents-on-the-fly">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h1>
+<p>Retrievers are optimized to use the efficient retrieval engines of document stores to sift quickly through vast collections of Documents. However, Haystack also provides smaller, standalone components that work very well on shorter lists and don’t require a full-blown vector database engine to function.</p>
+<p>These components are called rankers. One example of such a component is <code>TransformersSimilarityRanker</code>: a ranker that uses a model from the <code>transformers</code> library to rank Documents by their similarity to a given query.</p>
+<p>Let’s see how it works:</p>
+<div class="highlight"><pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#ff7b72">from</span> <span style="color:#ff7b72">haystack.components.rankers</span> <span style="color:#ff7b72">import</span> TransformersSimilarityRanker
+</span></span><span style="display:flex;"><span>
+</span></span><span style="display:flex;"><span>ranker <span style="color:#ff7b72;font-weight:bold">=</span> TransformersSimilarityRanker()
+</span></span><span style="display:flex;"><span>ranker<span style="color:#ff7b72;font-weight:bold">.</span>warm_up()
+</span></span><span style="display:flex;"><span>ranker<span style="color:#ff7b72;font-weight:bold">.</span>run(
+</span></span><span style="display:flex;"><span> query<span style="color:#ff7b72;font-weight:bold">=</span><span style="color:#a5d6ff">"What's the official language of the Republic of Rose Island?"</span>,
+</span></span><span style="display:flex;"><span> documents<span style="color:#ff7b72;font-weight:bold">=</span>documents,
+</span></span><span style="display:flex;"><span> top_k<span style="color:#ff7b72;font-weight:bold">=</span><span style="color:#a5d6ff">1</span>
+</span></span><span style="display:flex;"><span> )
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># returns {</span>
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># 'documents': [</span>
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># Document(content="Island under construction\nRepublic of Rose Island\nThe Republic of Rose Island ( Esperanto : Respubliko de la Insulo de la Rozoj; Italian : Repubblica dell'Isola delle Rose) was a short-lived micronation on a man-made platform in the Adriatic Sea , 11 kilometres (6.8\xa0mi) off the coast of the province of Rimini , Italy, built by Italian engineer Giorgio Rosa, who made himself its president and declared it an independent state on 1 May 1968. [1] [2] Rose Island had its own government, currency, post office, and commercial establishments, and the official language was Esperanto .", meta={'source_id': '03bfe5f7b7a7ec623e854d2bc5eb36ba3cdf06e1e2771b3a529eeb7e669431b6'}, score=7.594357490539551)</span>
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># ]</span>
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># }</span>
+</span></span></code></pre></div><p>This component has a feature we haven’t encountered before: the <code>warm_up()</code> method.</p>
+<p>Components that need to initialize heavy resources, such as a language model, load them in the <code>warm_up()</code> method rather than at construction time. When they are used in a Pipeline, <code>Pipeline.run()</code> takes care of calling <code>warm_up()</code> on all components before running; when used standalone, users need to call <code>warm_up()</code> explicitly to prepare the object to run.</p>
+<p><code>TransformersSimilarityRanker</code> accepts a few parameters. When initialized, it takes a <code>model_name_or_path</code> with the Hugging Face ID of the model to use for ranking: this value defaults to <code>cross-encoder/ms-marco-MiniLM-L-6-v2</code>. It also takes <code>token</code>, to allow users to download private models from the Hugging Face Hub, <code>device</code>, to let them leverage PyTorch’s ability to select the hardware to run on, and <code>top_k</code>, the maximum number of documents to return. <code>top_k</code>, as we see above, can also be passed to <code>run()</code>, and the value passed at runtime overrides the one set at initialization; it defaults to 10.</p>
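+<p>As a brief sketch of these init parameters (the model ID is simply the default mentioned above, and <code>top_k=3</code> is an arbitrary choice for the example):</p>
+<pre><code class="language-python">from haystack.components.rankers import TransformersSimilarityRanker
+
+# keep the default cross-encoder model, but return only the 3 best chunks
+ranker = TransformersSimilarityRanker(
+    model_name_or_path="cross-encoder/ms-marco-MiniLM-L-6-v2",
+    top_k=3,
+)
+ranker.warm_up()  # load the model before the first run() call
+</code></pre>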
+<p>Let’s also put this component in the pipeline: its place is between the splitter and the prompt builder.</p>
+<div class="highlight"><pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span>template <span style="color:#ff7b72;font-weight:bold">=</span> <span style="color:#a5d6ff">"""
+</span></span></span><span style="display:flex;"><span><span style="color:#a5d6ff">Question: {{ question }}
+</span></span></span><span style="display:flex;"><span><span style="color:#a5d6ff">
+</span></span></span><span style="display:flex;"><span><span style="color:#a5d6ff">Context:
+</span></span></span><span style="display:flex;"><span><span style="color:#a5d6ff">{</span><span style="color:#a5d6ff">% f</span><span style="color:#a5d6ff">or document in documents %}
+</span></span></span><span style="display:flex;"><span><span style="color:#a5d6ff"> {{ document.content }}
+</span></span></span><span style="display:flex;"><span><span style="color:#a5d6ff">{</span><span style="color:#a5d6ff">% e</span><span style="color:#a5d6ff">ndfor %}
+</span></span></span><span style="display:flex;"><span><span style="color:#a5d6ff">
+</span></span></span><span style="display:flex;"><span><span style="color:#a5d6ff">Please reformulate the information above to answer the user's question.
+</span></span></span><span style="display:flex;"><span><span style="color:#a5d6ff">"""</span>
+</span></span><span style="display:flex;"><span>pipe <span style="color:#ff7b72;font-weight:bold">=</span> Pipeline()
+</span></span><span style="display:flex;"><span>
+</span></span><span style="display:flex;"><span>pipe<span style="color:#ff7b72;font-weight:bold">.</span>add_component(<span style="color:#a5d6ff">"search"</span>, SerperDevWebSearch(api_key<span style="color:#ff7b72;font-weight:bold">=</span>serperdev_api_key))
+</span></span><span style="display:flex;"><span>pipe<span style="color:#ff7b72;font-weight:bold">.</span>add_component(<span style="color:#a5d6ff">"fetcher"</span>, LinkContentFetcher())
+</span></span><span style="display:flex;"><span>pipe<span style="color:#ff7b72;font-weight:bold">.</span>add_component(<span style="color:#a5d6ff">"converter"</span>, HTMLToDocument())
+</span></span><span style="display:flex;"><span>pipe<span style="color:#ff7b72;font-weight:bold">.</span>add_component(<span style="color:#a5d6ff">"cleaner"</span>, DocumentCleaner())
+</span></span><span style="display:flex;"><span>pipe<span style="color:#ff7b72;font-weight:bold">.</span>add_component(<span style="color:#a5d6ff">"splitter"</span>, DocumentSplitter(split_by<span style="color:#ff7b72;font-weight:bold">=</span><span style="color:#a5d6ff">"sentence"</span>, split_length<span style="color:#ff7b72;font-weight:bold">=</span><span style="color:#a5d6ff">3</span>))
+</span></span><span style="display:flex;"><span>pipe<span style="color:#ff7b72;font-weight:bold">.</span>add_component(<span style="color:#a5d6ff">"ranker"</span>, TransformersSimilarityRanker())
+</span></span><span style="display:flex;"><span>pipe<span style="color:#ff7b72;font-weight:bold">.</span>add_component(<span style="color:#a5d6ff">"prompt_builder"</span>, PromptBuilder(template<span style="color:#ff7b72;font-weight:bold">=</span>template))
+</span></span><span style="display:flex;"><span>pipe<span style="color:#ff7b72;font-weight:bold">.</span>add_component(<span style="color:#a5d6ff">"llm"</span>, OpenAIGenerator(api_key<span style="color:#ff7b72;font-weight:bold">=</span>api_key))
+</span></span><span style="display:flex;"><span>pipe<span style="color:#ff7b72;font-weight:bold">.</span>connect(<span style="color:#a5d6ff">"search.links"</span>, <span style="color:#a5d6ff">"fetcher"</span>)
+</span></span><span style="display:flex;"><span>pipe<span style="color:#ff7b72;font-weight:bold">.</span>connect(<span style="color:#a5d6ff">"fetcher"</span>, <span style="color:#a5d6ff">"converter"</span>)
+</span></span><span style="display:flex;"><span>pipe<span style="color:#ff7b72;font-weight:bold">.</span>connect(<span style="color:#a5d6ff">"converter"</span>, <span style="color:#a5d6ff">"cleaner"</span>)
+</span></span><span style="display:flex;"><span>pipe<span style="color:#ff7b72;font-weight:bold">.</span>connect(<span style="color:#a5d6ff">"cleaner"</span>, <span style="color:#a5d6ff">"splitter"</span>)
+</span></span><span style="display:flex;"><span>pipe<span style="color:#ff7b72;font-weight:bold">.</span>connect(<span style="color:#a5d6ff">"splitter"</span>, <span style="color:#a5d6ff">"ranker"</span>)
+</span></span><span style="display:flex;"><span>pipe<span style="color:#ff7b72;font-weight:bold">.</span>connect(<span style="color:#a5d6ff">"ranker"</span>, <span style="color:#a5d6ff">"prompt_builder.documents"</span>)
+</span></span><span style="display:flex;"><span>pipe<span style="color:#ff7b72;font-weight:bold">.</span>connect(<span style="color:#a5d6ff">"prompt_builder"</span>, <span style="color:#a5d6ff">"llm"</span>)
+</span></span><span style="display:flex;"><span>
+</span></span><span style="display:flex;"><span>question <span style="color:#ff7b72;font-weight:bold">=</span> <span style="color:#a5d6ff">"What's the official language of the Republic of Rose Island?"</span>
+</span></span><span style="display:flex;"><span>
+</span></span><span style="display:flex;"><span>pipe<span style="color:#ff7b72;font-weight:bold">.</span>run({
+</span></span><span style="display:flex;"><span> <span style="color:#a5d6ff">"search"</span>: {<span style="color:#a5d6ff">"query"</span>: question},
+</span></span><span style="display:flex;"><span> <span style="color:#a5d6ff">"ranker"</span>: {<span style="color:#a5d6ff">"query"</span>: question},
+</span></span><span style="display:flex;"><span> <span style="color:#a5d6ff">"prompt_builder"</span>: {<span style="color:#a5d6ff">"question"</span>: question}
+</span></span><span style="display:flex;"><span>})
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># returns {</span>
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># 'llm': {</span>
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># 'replies': [</span>
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># 'The official language of the Republic of Rose Island was Esperanto.'</span>
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># ],</span>
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># 'metadata': [...]</span>
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># }</span>
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># }</span>
+</span></span></code></pre></div><p><img src="https://www.zansara.dev/posts/2023-11-09-haystack-series-simple-web-rag/unfiltered-web-rag-pipeline.png" alt="Unfiltered Web RAG Pipeline"></p>
+<p>Note how the ranker needs to know the question to compare the documents, just like the search and prompt builder components do. So, we need to pass the value to the pipeline’s <code>run()</code> call.</p>
+<h1 id="filtering-file-types">
+ Filtering file types
+ <a class="heading-link" href="#filtering-file-types">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h1>
+<p>The pipeline we just built works great in most cases. However, it may occasionally fail if the search component happens to return some URL that does not point to a web page but, for example, directly to a video, a PDF, or a PPTX.</p>
+<p>Haystack does offer some facilities to deal with these file types, but we will see these converters in another post. For now, let’s only filter those links out to prevent <code>HTMLToDocument</code> from crashing.</p>
+<p>This task could be approached with Haystack in several ways, but the simplest in this scenario is to use a component that would typically be used for a slightly different purpose. This component is called <code>FileTypeRouter</code>.</p>
+<p><code>FileTypeRouter</code> is designed to route different files to their appropriate converters by checking their mime type. It does so by inspecting the content or the extension of the files it receives as input and producing an output dictionary with a separate list for each identified type.</p>
+<p>However, we can also conveniently use this component as a filter. Let’s see how!</p>
+<div class="highlight"><pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#ff7b72">from</span> <span style="color:#ff7b72">haystack.components.routers</span> <span style="color:#ff7b72">import</span> FileTypeRouter
+</span></span><span style="display:flex;"><span>
+</span></span><span style="display:flex;"><span>router <span style="color:#ff7b72;font-weight:bold">=</span> FileTypeRouter(mime_types<span style="color:#ff7b72;font-weight:bold">=</span>[<span style="color:#a5d6ff">"text/html"</span>])
+</span></span><span style="display:flex;"><span>router<span style="color:#ff7b72;font-weight:bold">.</span>run(sources<span style="color:#ff7b72;font-weight:bold">=</span>[<span style="color:#a5d6ff">"Republic_of_Rose_Island.txt"</span>, <span style="color:#a5d6ff">"Republic_of_Rose_Island.html"</span>])
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># returns defaultdict(list,</span>
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># {'unclassified': [PosixPath('Republic_of_Rose_Island.txt')],</span>
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># 'text/html': [PosixPath('Republic_of_Rose_Island.html')]})</span>
+</span></span></code></pre></div><p><code>FileTypeRouter</code> must always be initialized with the list of mime types it is supposed to handle. Not only that, but this component can also deal with files that do not match any of the expected mime types by putting them all under the <code>unclassified</code> category.</p>
+<p>By putting this component between <code>LinkContentFetcher</code> and <code>HTMLToDocument</code>, we can make it forward along the pipeline only the files that match the <code>text/html</code> mime type and silently discard all others.</p>
+<p>Notice how, in the pipeline below, I explicitly connect the <code>text/html</code> output only:</p>
+<div class="highlight"><pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span>template <span style="color:#ff7b72;font-weight:bold">=</span> <span style="color:#a5d6ff">"""
+</span></span></span><span style="display:flex;"><span><span style="color:#a5d6ff">Question: {{ question }}
+</span></span></span><span style="display:flex;"><span><span style="color:#a5d6ff">
+</span></span></span><span style="display:flex;"><span><span style="color:#a5d6ff">Google Search Answer Boxes:
+</span></span></span><span style="display:flex;"><span><span style="color:#a5d6ff">{</span><span style="color:#a5d6ff">% f</span><span style="color:#a5d6ff">or document in documents %}
+</span></span></span><span style="display:flex;"><span><span style="color:#a5d6ff"> {{ document.content }}
+</span></span></span><span style="display:flex;"><span><span style="color:#a5d6ff">{</span><span style="color:#a5d6ff">% e</span><span style="color:#a5d6ff">ndfor %}
+</span></span></span><span style="display:flex;"><span><span style="color:#a5d6ff">
+</span></span></span><span style="display:flex;"><span><span style="color:#a5d6ff">Please reformulate the information above to answer the user's question.
+</span></span></span><span style="display:flex;"><span><span style="color:#a5d6ff">"""</span>
+</span></span><span style="display:flex;"><span>pipe <span style="color:#ff7b72;font-weight:bold">=</span> Pipeline()
+</span></span><span style="display:flex;"><span>
+</span></span><span style="display:flex;"><span>pipe<span style="color:#ff7b72;font-weight:bold">.</span>add_component(<span style="color:#a5d6ff">"search"</span>, SerperDevWebSearch(api_key<span style="color:#ff7b72;font-weight:bold">=</span>serperdev_api_key))
+</span></span><span style="display:flex;"><span>pipe<span style="color:#ff7b72;font-weight:bold">.</span>add_component(<span style="color:#a5d6ff">"fetcher"</span>, LinkContentFetcher())
+</span></span><span style="display:flex;"><span>pipe<span style="color:#ff7b72;font-weight:bold">.</span>add_component(<span style="color:#a5d6ff">"filter"</span>, FileTypeRouter(mime_types<span style="color:#ff7b72;font-weight:bold">=</span>[<span style="color:#a5d6ff">"text/html"</span>]))
+</span></span><span style="display:flex;"><span>pipe<span style="color:#ff7b72;font-weight:bold">.</span>add_component(<span style="color:#a5d6ff">"converter"</span>, HTMLToDocument())
+</span></span><span style="display:flex;"><span>pipe<span style="color:#ff7b72;font-weight:bold">.</span>add_component(<span style="color:#a5d6ff">"cleaner"</span>, DocumentCleaner())
+</span></span><span style="display:flex;"><span>pipe<span style="color:#ff7b72;font-weight:bold">.</span>add_component(<span style="color:#a5d6ff">"splitter"</span>, DocumentSplitter(split_by<span style="color:#ff7b72;font-weight:bold">=</span><span style="color:#a5d6ff">"sentence"</span>, split_length<span style="color:#ff7b72;font-weight:bold">=</span><span style="color:#a5d6ff">3</span>))
+</span></span><span style="display:flex;"><span>pipe<span style="color:#ff7b72;font-weight:bold">.</span>add_component(<span style="color:#a5d6ff">"ranker"</span>, TransformersSimilarityRanker())
+</span></span><span style="display:flex;"><span>pipe<span style="color:#ff7b72;font-weight:bold">.</span>add_component(<span style="color:#a5d6ff">"prompt_builder"</span>, PromptBuilder(template<span style="color:#ff7b72;font-weight:bold">=</span>template))
+</span></span><span style="display:flex;"><span>pipe<span style="color:#ff7b72;font-weight:bold">.</span>add_component(<span style="color:#a5d6ff">"llm"</span>, OpenAIGenerator(api_key<span style="color:#ff7b72;font-weight:bold">=</span>api_key))
+</span></span><span style="display:flex;"><span>pipe<span style="color:#ff7b72;font-weight:bold">.</span>connect(<span style="color:#a5d6ff">"search.links"</span>, <span style="color:#a5d6ff">"fetcher"</span>)
+</span></span><span style="display:flex;"><span>pipe<span style="color:#ff7b72;font-weight:bold">.</span>connect(<span style="color:#a5d6ff">"fetcher"</span>, <span style="color:#a5d6ff">"filter"</span>)
+</span></span><span style="display:flex;"><span>pipe<span style="color:#ff7b72;font-weight:bold">.</span>connect(<span style="color:#a5d6ff">"filter.text/html"</span>, <span style="color:#a5d6ff">"converter"</span>)
+</span></span><span style="display:flex;"><span>pipe<span style="color:#ff7b72;font-weight:bold">.</span>connect(<span style="color:#a5d6ff">"converter"</span>, <span style="color:#a5d6ff">"cleaner"</span>)
+</span></span><span style="display:flex;"><span>pipe<span style="color:#ff7b72;font-weight:bold">.</span>connect(<span style="color:#a5d6ff">"cleaner"</span>, <span style="color:#a5d6ff">"splitter"</span>)
+</span></span><span style="display:flex;"><span>pipe<span style="color:#ff7b72;font-weight:bold">.</span>connect(<span style="color:#a5d6ff">"splitter"</span>, <span style="color:#a5d6ff">"ranker"</span>)
+</span></span><span style="display:flex;"><span>pipe<span style="color:#ff7b72;font-weight:bold">.</span>connect(<span style="color:#a5d6ff">"ranker"</span>, <span style="color:#a5d6ff">"prompt_builder.documents"</span>)
+</span></span><span style="display:flex;"><span>pipe<span style="color:#ff7b72;font-weight:bold">.</span>connect(<span style="color:#a5d6ff">"prompt_builder"</span>, <span style="color:#a5d6ff">"llm"</span>)
+</span></span><span style="display:flex;"><span>
+</span></span><span style="display:flex;"><span>question <span style="color:#ff7b72;font-weight:bold">=</span> <span style="color:#a5d6ff">"What's the official language of the Republic of Rose Island?"</span>
+</span></span><span style="display:flex;"><span>
+</span></span><span style="display:flex;"><span>pipe<span style="color:#ff7b72;font-weight:bold">.</span>run({
+</span></span><span style="display:flex;"><span> <span style="color:#a5d6ff">"search"</span>: {<span style="color:#a5d6ff">"query"</span>: question},
+</span></span><span style="display:flex;"><span> <span style="color:#a5d6ff">"ranker"</span>: {<span style="color:#a5d6ff">"query"</span>: question},
+</span></span><span style="display:flex;"><span> <span style="color:#a5d6ff">"prompt_builder"</span>: {<span style="color:#a5d6ff">"question"</span>: question}
+</span></span><span style="display:flex;"><span>})
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># returns {</span>
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># 'llm': {</span>
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># 'replies': [</span>
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># 'The official language of the Republic of Rose Island was Esperanto.'</span>
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># ],</span>
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># 'metadata': [...]</span>
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># }</span>
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># }</span>
+</span></span></code></pre></div><p><img src="https://www.zansara.dev/posts/2023-11-09-haystack-series-simple-web-rag/html-web-rag-pipeline.png" alt="HTML-only Web RAG Pipeline"></p>
+<p>With this last addition, we added quite a bit of robustness to our pipeline, making it less likely to fail.</p>
+<h1 id="wrapping-up">
+ Wrapping up
+ <a class="heading-link" href="#wrapping-up">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h1>
+<p>Web RAG is a use case that can be expanded to cover many scenarios, resulting in very complex pipelines. Haystack helps make sense of their complexity with pipeline graphs and detailed error messages in case of mismatched connections. However, pipelines this large can become overwhelming, especially when more branches are added.</p>
+<p>In one of our next posts, we will see how to cover such use cases while keeping the resulting complexity as low as possible.</p>
+<hr>
+<p><em>Previous: <a href="https://www.zansara.dev/posts/2023-11-05-haystack-series-minimal-indexing" >Indexing data for RAG applications</a></em></p>
+<p><em>See the entire series here: <a href="https://www.zansara.dev/series/haystack-2.0-series/" >Haystack 2.0 series</a></em></p>
+<p><small><em>Cover image from <a href="https://commons.wikimedia.org/wiki/File:Isola_delle_Rose_1968.jpg" class="external-link" target="_blank" rel="noopener">Wikipedia</a></em></small></p>
+
+
+
+
+ Indexing data for RAG applications
+ https://www.zansara.dev/posts/2023-11-05-haystack-series-minimal-indexing/
+ Sun, 05 Nov 2023 00:00:00 +0000
+
+ https://www.zansara.dev/posts/2023-11-05-haystack-series-minimal-indexing/
+        <p><em>Last updated: 18/01/2024</em></p>
+<hr>
+<p>In the <a href="https://www.zansara.dev/posts/2023-10-27-haystack-series-rag" >previous post</a> of the Haystack 2.0 series, we saw how to build RAG pipelines using a generator, a prompt builder, and a retriever with its document store. However, the content of our document store wasn’t extensive, and populating one with clean, properly formatted data is not an easy task. How can we approach this problem?</p>
+<p>In this post, I will show you how to use Haystack 2.0 to create large amounts of documents from a few web pages and write them into a document store that you can then use for retrieval.</p>
+<div class="notice info">
+ <div class="notice-content">💡 <em>Do you want to see the code in action? Check out the <a href="https://colab.research.google.com/drive/155CtcumiK5w3wX6FWyM1dG3OqnhwnCqy?usp=sharing" class="external-link" target="_blank" rel="noopener">Colab notebook</a> or the <a href="https://gist.github.com/ZanSara/ba7efd241c61ccfd12ed48195e23bb34" class="external-link" target="_blank" rel="noopener">gist</a>.</em></div>
+</div>
+
+<div class="notice warning">
+ <div class="notice-content"><i>⚠️ <strong>Warning:</strong></i> <em>This code was tested on <code>haystack-ai==2.0.0b5</code>. Haystack 2.0 is still unstable, so later versions might introduce breaking changes without notice until Haystack 2.0 is officially released. The concepts and components, however, stay the same.</em></div>
+</div>
+
+<h1 id="the-task">
+ The task
+ <a class="heading-link" href="#the-task">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h1>
+<p>In Haystack’s terminology, the process of extracting information from a group of files and storing the data in a document store is called “indexing”. The process includes, at the very minimum, reading the content of a file, generating a Document object containing all its text, and then storing it in a document store.</p>
+<p>However, indexing pipelines often do more than this. They can process more than one file type, like .txt, .pdf, .docx, .html, audio, video, and images. Having many file types to convert, they route each file to the proper converter based on its type. Files tend to contain way more text than a normal LLM can chew, so they need to split those huge Documents into smaller chunks. Also, the converters are not perfect at reading text from the files, so they need to clean the data from artifacts such as page numbers, headers, footers, and so on. On top of all of this, if you plan to use a retriever that is based on embedding similarity, your indexing pipeline will also need to embed all documents before writing them into the store.</p>
+<p>Sounds like a lot of work!</p>
+<p>In this post, we will focus on the preprocessing part of the pipeline: cleaning, splitting, and writing documents. I will talk about the other functionalities of indexing pipelines, such as document embedding and multiple file types routing, in later posts.</p>
+<h1 id="converting-files">
+ Converting files
+ <a class="heading-link" href="#converting-files">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h1>
+<p>As we’ve just seen, the most important task of this pipeline is to convert files into Documents. Haystack provides several converters for this task: at the time of writing, it supports:</p>
+<ul>
+<li>Raw text files (<code>TextFileToDocument</code>)</li>
+<li>HTML files, so web pages in general (<code>HTMLToDocument</code>)</li>
+<li>PDF files, by extracting text natively (<code>PyPDFToDocument</code>)</li>
+<li>Image files, PDFs with images, and Office files with images, by OCR (<code>AzureOCRDocumentConverter</code>)</li>
+<li>Audio files, doing transcription with Whisper either locally (<code>LocalWhisperTranscriber</code>) or remotely using OpenAI’s hosted models (<code>RemoteWhisperTranscriber</code>)</li>
+<li>A ton of <a href="https://tika.apache.org/2.9.1/formats.html" class="external-link" target="_blank" rel="noopener">other formats</a>, such as Microsoft’s Office formats, thanks to <a href="https://tika.apache.org/" class="external-link" target="_blank" rel="noopener">Apache Tika</a> (<code>TikaDocumentConverter</code>)</li>
+</ul>
+<p>For this example, let’s assume we have a collection of web pages downloaded from the Internet. These pages are our only source of information and contain all we want our RAG application to know about.</p>
+<p>In this case, our converter of choice is <code>HTMLToDocument</code>. <code>HTMLToDocument</code> is a Haystack component that understands HTML and can filter all the markup away, leaving only meaningful text. Remember that this is a file converter, not a URL fetcher: it can only process local files, such as a website crawl. Haystack provides some components to fetch web pages, but we will see them later.</p>
+<p>Here is how you can use this converter:</p>
+<div class="highlight"><pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#ff7b72">from</span> <span style="color:#ff7b72">haystack.components.converters</span> <span style="color:#ff7b72">import</span> HTMLToDocument
+</span></span><span style="display:flex;"><span>
+</span></span><span style="display:flex;"><span>path <span style="color:#ff7b72;font-weight:bold">=</span> <span style="color:#a5d6ff">"Republic_of_Rose_Island.html"</span>
+</span></span><span style="display:flex;"><span>
+</span></span><span style="display:flex;"><span>converter <span style="color:#ff7b72;font-weight:bold">=</span> HTMLToDocument()
+</span></span><span style="display:flex;"><span>converter<span style="color:#ff7b72;font-weight:bold">.</span>run(sources<span style="color:#ff7b72;font-weight:bold">=</span>[path])
+</span></span><span style="display:flex;"><span>
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># returns {"documents": [Document(content="The Republic of Rose Isla...")]}</span>
+</span></span></code></pre></div><p><code>HTMLToDocument</code> is a straightforward component that offers close to no parameters to customize its behavior. Of its API, one notable feature is its input type: this converter can take paths to local files in the form of strings or <code>Path</code> objects, but it also accepts <code>ByteStream</code> objects.</p>
+<p><code>ByteStream</code> is a handy Haystack abstraction that makes handling binary streams easier. If a component accepts <code>ByteStream</code> as input, you don’t necessarily have to save your web pages to file before passing them to this converter. This allows components that retrieve large files from the Internet to pipe their output directly into this component without saving the data to disk first, which can save a lot of time.</p>
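+<p>For example, here is a minimal sketch of feeding a <code>ByteStream</code> to the converter instead of a path. The import path of <code>ByteStream</code> is the one I’d expect in this Haystack version, but double-check it against the release you’re using:</p>
+<pre><code class="language-python">from pathlib import Path
+
+from haystack.components.converters import HTMLToDocument
+from haystack.dataclasses import ByteStream
+
+# build a ByteStream from bytes that could just as well come from the network
+stream = ByteStream(data=Path("Republic_of_Rose_Island.html").read_bytes())
+
+converter = HTMLToDocument()
+converter.run(sources=[stream])
+# returns {"documents": [Document(content="The Republic of Rose Isla...")]}
+</code></pre>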
+<h1 id="cleaning-the-text">
+ Cleaning the text
+ <a class="heading-link" href="#cleaning-the-text">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h1>
+<p>With <code>HTMLToDocument</code>, we can convert whole web pages into large Document objects. The converter typically does a decent job of filtering out the markup. Still, it’s not always perfect. To compensate for these occasional issues, Haystack offers a component called <code>DocumentCleaner</code> that can remove noise from the text of the documents.</p>
+<p>Just like any other component, <code>DocumentCleaner</code> is straightforward to use:</p>
+<div class="highlight"><pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#ff7b72">from</span> <span style="color:#ff7b72">haystack.components.preprocessors</span> <span style="color:#ff7b72">import</span> DocumentCleaner
+</span></span><span style="display:flex;"><span>
+</span></span><span style="display:flex;"><span>cleaner <span style="color:#ff7b72;font-weight:bold">=</span> DocumentCleaner()
+</span></span><span style="display:flex;"><span>cleaner<span style="color:#ff7b72;font-weight:bold">.</span>run(documents<span style="color:#ff7b72;font-weight:bold">=</span>documents)
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># returns {"documents": [Document(content=...), Document(content=...), ...]}</span>
+</span></span></code></pre></div><p>The effectiveness of <code>DocumentCleaner</code> depends a lot on the type of converter you use. Some flags, such as <code>remove_empty_lines</code> and <code>remove_extra_whitespace</code>, are minor fixes that can come in handy but usually have little impact on the quality of the results when used in a RAG pipeline. They can, however, make a vast difference for Extractive QA pipelines.</p>
+<p>Other parameters, like <code>remove_substrings</code> or <code>remove_regex</code>, work very well but need manual inspection and iteration from a human to get right. For example, for Wikipedia pages, we could use these parameters to remove all instances of the word <code>"Wikipedia"</code>, which are undoubtedly many and irrelevant.</p>
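+<p>As an illustration, here is a sketch of a cleaner configured for a Wikipedia crawl. The values are only examples, and you should double-check the parameter names against the Haystack version you are using:</p>
+<div class="highlight"><pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python">from haystack.components.preprocessors import DocumentCleaner
+
+cleaner = DocumentCleaner(
+    remove_empty_lines=True,
+    remove_substrings=["Wikipedia"],  # a boilerplate token we know is irrelevant for our use case
+)
+cleaner.run(documents=documents)
+
+# returns {"documents": [...]} with the listed substring stripped from every Document's content
+</code></pre></div>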
+<p>Finally, <code>remove_repeated_substrings</code> is a convenient option that removes headers and footers from long texts, such as books and articles. However, it works only for PDFs and, to a limited degree, for text files, because it relies on the presence of form feed characters (<code>\f</code>), which are rarely found in web pages.</p>
+<h1 id="splitting-the-text">
+ Splitting the text
+ <a class="heading-link" href="#splitting-the-text">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h1>
+<p>Now that the text is cleaned up, we can move on to a more exciting step: text splitting.</p>
+<p>So far, each Document stored the content of an entire file. If a file was a whole book with hundreds of pages, a single Document would contain hundreds of thousands of words, which is clearly too much for an LLM to make sense of. Such a large Document is also challenging for Retrievers to understand because it contains so much text that it looks relevant to every possible question. To populate our document store with data that can be used effectively by a RAG pipeline, we need to chunk this data into much smaller Documents.</p>
+<p>That’s where <code>DocumentSplitter</code> comes into play.</p>
+<div class="notice info">
+ <div class="notice-content"><p>💡 <em>With LLMs in a race to offer the <a href="https://magic.dev/blog/ltm-1" class="external-link" target="_blank" rel="noopener">largest context window</a> and research showing that such a chase is <a href="https://arxiv.org/abs/2307.03172" class="external-link" target="_blank" rel="noopener">counterproductive</a>, there is no general consensus about how splitting Documents for RAG impacts the LLM’s performance.</em></p>
+<p><em>What you need to keep in mind is that splitting implies a tradeoff. Huge documents will always be slightly relevant for every question, but they will bring a lot of context, which may or may not confuse the model. On the other hand, tiny Documents are much more likely to be retrieved only for questions they’re highly relevant for, but they might provide too little context for the LLM to really understand their meaning.</em></p>
+<p><em>Tweaking the size of your Documents for the specific LLM you’re using and the topic of your documents is one way to optimize your RAG pipeline, so be ready to experiment with different Document sizes before committing to one.</em></p></div>
+</div>
+
+<p>How is it used?</p>
+<div class="highlight"><pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#ff7b72">from</span> <span style="color:#ff7b72">haystack.components.preprocessors.text_document_splitter</span> <span style="color:#ff7b72">import</span> DocumentSplitter
+</span></span><span style="display:flex;"><span>
+</span></span><span style="display:flex;"><span>text_splitter <span style="color:#ff7b72;font-weight:bold">=</span> DocumentSplitter(split_by<span style="color:#ff7b72;font-weight:bold">=</span><span style="color:#a5d6ff">"sentence"</span>, split_length<span style="color:#ff7b72;font-weight:bold">=</span><span style="color:#a5d6ff">5</span>)
+</span></span><span style="display:flex;"><span>text_splitter<span style="color:#ff7b72;font-weight:bold">.</span>run(documents<span style="color:#ff7b72;font-weight:bold">=</span>documents)
+</span></span><span style="display:flex;"><span>
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># returns {"documents": [Document(content=...), Document(content=...), ...]}</span>
+</span></span></code></pre></div><p><code>DocumentSplitter</code> lets you configure the approximate size of the chunks you want to generate with three parameters: <code>split_by</code>, <code>split_length</code>, and <code>split_overlap</code>.</p>
+<p><code>split_by</code> defines the unit to use when splitting some text. For now, the options are <code>word</code>, <code>sentence</code>, and <code>passage</code> (paragraph), but we will soon add other options.</p>
+<p><code>split_length</code> is the number of units (as defined above) that each Document should include. For example, if the unit is <code>sentence</code>, <code>split_length=10</code> means that each of your Documents will contain 10 sentences worth of text (except, usually, the last one, which may contain fewer). If the unit were <code>word</code>, each Document would instead contain 10 words.</p>
+<p><code>split_overlap</code> is the number of units that each Document should repeat from the previous one. For example, if the unit is <code>sentence</code> and the length is <code>10</code>, setting <code>split_overlap=2</code> means that the last two sentences of the first Document will also be present at the start of the second, which will therefore add only 8 new sentences, for a total of 10. This repetition carries over until the end of the text being split.</p>
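+<p>To make the overlap concrete, here is a small sketch that uses the same API as above, just with <code>split_overlap</code> set:</p>
+<div class="highlight"><pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python">from haystack.components.preprocessors.text_document_splitter import DocumentSplitter
+
+# Each chunk contains 10 sentences, the first 2 of which are repeated from the previous chunk
+splitter = DocumentSplitter(split_by="sentence", split_length=10, split_overlap=2)
+splitter.run(documents=documents)
+
+# returns {"documents": [Document(content=...), ...]} where consecutive chunks share 2 sentences
+</code></pre></div>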
+<h1 id="writing-to-the-store">
+ Writing to the store
+ <a class="heading-link" href="#writing-to-the-store">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h1>
+<p>Once all of this is done, we can finally move on to the last step of our journey: writing the Documents into our document store. We first create the document store:</p>
+<div class="highlight"><pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#ff7b72">from</span> <span style="color:#ff7b72">haystack.document_stores.in_memory</span> <span style="color:#ff7b72">import</span> InMemoryDocumentStore
+</span></span><span style="display:flex;"><span>
+</span></span><span style="display:flex;"><span>document_store <span style="color:#ff7b72;font-weight:bold">=</span> InMemoryDocumentStore()
+</span></span></code></pre></div><p>and then use <code>DocumentWriter</code> to actually write the documents in:</p>
+<div class="highlight"><pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#ff7b72">from</span> <span style="color:#ff7b72">haystack.components.writers</span> <span style="color:#ff7b72">import</span> DocumentWriter
+</span></span><span style="display:flex;"><span>
+</span></span><span style="display:flex;"><span>writer <span style="color:#ff7b72;font-weight:bold">=</span> DocumentWriter(document_store<span style="color:#ff7b72;font-weight:bold">=</span>document_store)
+</span></span><span style="display:flex;"><span>writer<span style="color:#ff7b72;font-weight:bold">.</span>run(documents<span style="color:#ff7b72;font-weight:bold">=</span>documents_with_embeddings)
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># returns {"documents_written": 120}</span>
+</span></span></code></pre></div><p>If you’ve read my <a href="https://www.zansara.dev/posts/2023-10-27-haystack-series-rag" >previous post</a> about RAG pipelines, you may wonder: why use <code>DocumentWriter</code> when we could call the <code>.write_documents()</code> method of our document store?</p>
+<p>In fact, the two approaches are fully equivalent: <code>DocumentWriter</code> does nothing more than call the <code>.write_documents()</code> method of the document store. The difference is that <code>DocumentWriter</code> is the way to go when you are using a Pipeline, which is what we’re going to do next.</p>
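+<p>Outside of a Pipeline, the equivalence looks like this (a sketch, assuming <code>documents</code> is the list produced by the splitter):</p>
+<div class="highlight"><pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"># Without a Pipeline, writing directly to the store...
+document_store.write_documents(documents)
+
+# ...does the same thing as running the component:
+writer = DocumentWriter(document_store=document_store)
+writer.run(documents=documents)
+</code></pre></div>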
+<h1 id="putting-it-all-together">
+ Putting it all together
+ <a class="heading-link" href="#putting-it-all-together">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h1>
+<p>We finally have all the components we need to go from a list of web pages to a document store populated with clean and short Document objects. Let’s build a Pipeline to sum up this process:</p>
+<div class="highlight"><pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#ff7b72">from</span> <span style="color:#ff7b72">haystack</span> <span style="color:#ff7b72">import</span> Pipeline
+</span></span><span style="display:flex;"><span>
+</span></span><span style="display:flex;"><span>document_store <span style="color:#ff7b72;font-weight:bold">=</span> InMemoryDocumentStore()
+</span></span><span style="display:flex;"><span>
+</span></span><span style="display:flex;"><span>pipeline <span style="color:#ff7b72;font-weight:bold">=</span> Pipeline()
+</span></span><span style="display:flex;"><span>pipeline<span style="color:#ff7b72;font-weight:bold">.</span>add_component(<span style="color:#a5d6ff">"converter"</span>, HTMLToDocument())
+</span></span><span style="display:flex;"><span>pipeline<span style="color:#ff7b72;font-weight:bold">.</span>add_component(<span style="color:#a5d6ff">"cleaner"</span>, DocumentCleaner())
+</span></span><span style="display:flex;"><span>pipeline<span style="color:#ff7b72;font-weight:bold">.</span>add_component(<span style="color:#a5d6ff">"splitter"</span>, DocumentSplitter(split_by<span style="color:#ff7b72;font-weight:bold">=</span><span style="color:#a5d6ff">"sentence"</span>, split_length<span style="color:#ff7b72;font-weight:bold">=</span><span style="color:#a5d6ff">5</span>))
+</span></span><span style="display:flex;"><span>pipeline<span style="color:#ff7b72;font-weight:bold">.</span>add_component(<span style="color:#a5d6ff">"writer"</span>, DocumentWriter(document_store<span style="color:#ff7b72;font-weight:bold">=</span>document_store))
+</span></span><span style="display:flex;"><span>pipeline<span style="color:#ff7b72;font-weight:bold">.</span>connect(<span style="color:#a5d6ff">"converter"</span>, <span style="color:#a5d6ff">"cleaner"</span>)
+</span></span><span style="display:flex;"><span>pipeline<span style="color:#ff7b72;font-weight:bold">.</span>connect(<span style="color:#a5d6ff">"cleaner"</span>, <span style="color:#a5d6ff">"splitter"</span>)
+</span></span><span style="display:flex;"><span>pipeline<span style="color:#ff7b72;font-weight:bold">.</span>connect(<span style="color:#a5d6ff">"splitter"</span>, <span style="color:#a5d6ff">"writer"</span>)
+</span></span><span style="display:flex;"><span>
+</span></span><span style="display:flex;"><span>pipeline<span style="color:#ff7b72;font-weight:bold">.</span>draw(<span style="color:#a5d6ff">"simple-indexing-pipeline.png"</span>)
+</span></span><span style="display:flex;"><span>
+</span></span><span style="display:flex;"><span>pipeline<span style="color:#ff7b72;font-weight:bold">.</span>run({<span style="color:#a5d6ff">"converter"</span>: {<span style="color:#a5d6ff">"sources"</span>: file_names}})
+</span></span></code></pre></div><p><img src="https://www.zansara.dev/posts/2023-11-05-haystack-series-minimal-indexing/simple-indexing-pipeline.png" alt="Indexing Pipeline"></p>
+<p>That’s it! We now have a fully functional indexing pipeline that can take a list of web pages and convert them into Documents that our RAG pipeline can use. As long as the RAG pipeline reads from the same store we are writing the Documents to, we can add as many Documents as we need to keep the chatbot’s answers up to date without having to touch the RAG pipeline.</p>
+<p>To try it out, we only need to take the RAG pipeline we built in <a href="https://www.zansara.dev/posts/2023-10-27-haystack-series-rag" >my previous post</a> and connect it to the same document store we just populated:</p>
+<div class="highlight"><pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#ff7b72">from</span> <span style="color:#ff7b72">haystack.components.generators</span> <span style="color:#ff7b72">import</span> OpenAIGenerator
+</span></span><span style="display:flex;"><span><span style="color:#ff7b72">from</span> <span style="color:#ff7b72">haystack.components.builders.prompt_builder</span> <span style="color:#ff7b72">import</span> PromptBuilder
+</span></span><span style="display:flex;"><span><span style="color:#ff7b72">from</span> <span style="color:#ff7b72">haystack.components.retrievers.in_memory</span> <span style="color:#ff7b72">import</span> InMemoryBM25Retriever
+</span></span><span style="display:flex;"><span>
+</span></span><span style="display:flex;"><span>template <span style="color:#ff7b72;font-weight:bold">=</span> <span style="color:#a5d6ff">"""
+</span></span></span><span style="display:flex;"><span><span style="color:#a5d6ff">Given the following information, answer the question: {{ question }}
+</span></span></span><span style="display:flex;"><span><span style="color:#a5d6ff">
+</span></span></span><span style="display:flex;"><span><span style="color:#a5d6ff">{</span><span style="color:#a5d6ff">% f</span><span style="color:#a5d6ff">or document in documents %}
+</span></span></span><span style="display:flex;"><span><span style="color:#a5d6ff"> {{ document.content }}
+</span></span></span><span style="display:flex;"><span><span style="color:#a5d6ff">{</span><span style="color:#a5d6ff">% e</span><span style="color:#a5d6ff">ndfor %}
+</span></span></span><span style="display:flex;"><span><span style="color:#a5d6ff">"""</span>
+</span></span><span style="display:flex;"><span>pipe <span style="color:#ff7b72;font-weight:bold">=</span> Pipeline()
+</span></span><span style="display:flex;"><span>
+</span></span><span style="display:flex;"><span>pipe<span style="color:#ff7b72;font-weight:bold">.</span>add_component(<span style="color:#a5d6ff">"retriever"</span>, InMemoryBM25Retriever(document_store<span style="color:#ff7b72;font-weight:bold">=</span>document_store))
+</span></span><span style="display:flex;"><span>pipe<span style="color:#ff7b72;font-weight:bold">.</span>add_component(<span style="color:#a5d6ff">"prompt_builder"</span>, PromptBuilder(template<span style="color:#ff7b72;font-weight:bold">=</span>template))
+</span></span><span style="display:flex;"><span>pipe<span style="color:#ff7b72;font-weight:bold">.</span>add_component(<span style="color:#a5d6ff">"llm"</span>, OpenAIGenerator(api_key<span style="color:#ff7b72;font-weight:bold">=</span>api_key))
+</span></span><span style="display:flex;"><span>pipe<span style="color:#ff7b72;font-weight:bold">.</span>connect(<span style="color:#a5d6ff">"retriever"</span>, <span style="color:#a5d6ff">"prompt_builder.documents"</span>)
+</span></span><span style="display:flex;"><span>pipe<span style="color:#ff7b72;font-weight:bold">.</span>connect(<span style="color:#a5d6ff">"prompt_builder"</span>, <span style="color:#a5d6ff">"llm"</span>)
+</span></span><span style="display:flex;"><span>
+</span></span><span style="display:flex;"><span>question <span style="color:#ff7b72;font-weight:bold">=</span> <span style="color:#a5d6ff">"Is there any documentary about the story of Rose Island? Can you tell me something about that?"</span>
+</span></span><span style="display:flex;"><span>pipe<span style="color:#ff7b72;font-weight:bold">.</span>run({
+</span></span><span style="display:flex;"><span> <span style="color:#a5d6ff">"retriever"</span>: {<span style="color:#a5d6ff">"query"</span>: question},
+</span></span><span style="display:flex;"><span> <span style="color:#a5d6ff">"prompt_builder"</span>: {<span style="color:#a5d6ff">"question"</span>: question}
+</span></span><span style="display:flex;"><span>})
+</span></span><span style="display:flex;"><span>
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># returns {</span>
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># 'llm': {</span>
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># 'replies': [</span>
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># 'Yes, there is a documentary about the story of Rose Island. It is </span>
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># called "Rose Island" and was released on Netflix on 8 December 2020. </span>
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># The documentary follows the true story of Giorgio Rosa, an Italian </span>
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># engineer who built his own island in the Adriatic sea in the late </span>
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># 1960s. The island housed a restaurant, bar, souvenir shop, and even </span>
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># a post office. Rosa\'s goal was to have his self-made structure </span>
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># recognized as an independent state, leading to a battle with the </span>
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># Italian authorities. The film depicts the construction of the island </span>
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># and Rosa\'s refusal to dismantle it despite government demands. The </span>
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># story of Rose Island was relatively unknown until the release of the </span>
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># documentary. The film showcases the technology Rosa invented to build </span>
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># the island and explores themes of freedom and resilience.'</span>
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># ],</span>
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># 'metadata': [...]</span>
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># }</span>
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># }</span>
+</span></span></code></pre></div><p>And suddenly, our chatbot knows everything about Rose Island without us having to feed the data to the document store by hand.</p>
+<h1 id="wrapping-up">
+ Wrapping up
+ <a class="heading-link" href="#wrapping-up">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h1>
+<p>Indexing pipelines can be powerful tools, even in their simplest form, like the one we just built. However, it doesn’t end here: Haystack offers many more facilities to extend what’s possible with indexing pipelines, like doing web searches, downloading files from the web, processing many other file types, and so on.</p>
+<p>We will see how soon, so stay tuned!</p>
+<hr>
+<p><em>Next: <a href="https://www.zansara.dev/posts/2023-11-09-haystack-series-simple-web-rag" >The World of Web RAG</a></em></p>
+<p><em>Previous: <a href="https://www.zansara.dev/posts/2023-10-27-haystack-series-rag" >RAG Pipelines from scratch</a></em></p>
+<p><em>See the entire series here: <a href="https://www.zansara.dev/series/haystack-2.0-series/" >Haystack 2.0 series</a></em></p>
+<p><small><em>Cover image from <a href="https://bertolamifineart.bidinside.com/en/lot/126352/1968-insula-de-la-rozoj-o-isola-delle-/" class="external-link" target="_blank" rel="noopener">this website.</a></em></small></p>
+
+
+
+
+ RAG Pipelines from scratch
+ https://www.zansara.dev/posts/2023-10-27-haystack-series-rag/
+ Fri, 27 Oct 2023 00:00:00 +0000
+
+ https://www.zansara.dev/posts/2023-10-27-haystack-series-rag/
+<p><em>Last updated: 18/01/2024 - Read it on the <a href="https://haystack.deepset.ai/blog/rag-pipelines-from-scratch" class="external-link" target="_blank" rel="noopener">Haystack Blog</a>.</em></p>
+<hr>
+<p>Retrieval Augmented Generation (RAG) is quickly becoming an essential technique to make LLMs more reliable and effective at answering any question, regardless of how specific. To stay relevant in today’s NLP landscape, Haystack must enable it.</p>
+<p>Let’s see how to build such applications with Haystack 2.0, from a direct call to an LLM to a fully-fledged, production-ready RAG pipeline that scales. At the end of this post, we will have an application that can answer questions about world countries based on data stored in a private database. At that point, the knowledge of the LLM will be limited only by the content of our data store, and all of this can be accomplished without fine-tuning language models.</p>
+<div class="notice info">
+ <div class="notice-content">💡 <em>I recently gave a talk about RAG applications in Haystack 2.0, so if you prefer videos to blog posts, you can find the recording <a href="https://zansara.dev/talks/2023-10-12-office-hours-rag-pipelines/" class="external-link" target="_blank" rel="noopener">here</a>. Keep in mind that the code shown might be outdated.</em></div>
+</div>
+
+<h2 id="what-is-rag">
+ What is RAG?
+ <a class="heading-link" href="#what-is-rag">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h2>
+<p>The idea of Retrieval Augmented Generation was first defined in a <a href="https://arxiv.org/abs/2005.11401" class="external-link" target="_blank" rel="noopener">paper</a> by Meta in 2020. It was designed to solve a few of the inherent limitations of seq2seq models (language models that, given a sentence, can finish writing it for you), such as:</p>
+<ul>
+<li>Their internal knowledge, as vast as it may be, will always be limited and at least slightly out of date.</li>
+<li>They work best on generic topics rather than niche and specific areas, unless they’re fine-tuned for the purpose, which is a costly and slow process.</li>
+<li>All models, even those with subject-matter expertise, tend to “hallucinate”: they confidently produce false statements backed by apparently solid reasoning.</li>
+<li>They cannot reliably cite their sources or tell where their knowledge comes from, which makes fact-checking their replies nontrivial.</li>
+</ul>
+<p>RAG solves these issues by “grounding” the LLM in reality: some relevant, up-to-date, and trusted information is provided to the model together with the question. In this way, the LLM doesn’t need to draw information from its internal knowledge, but can base its replies on the snippets provided by the user.</p>
+<p><img src="https://www.zansara.dev/posts/2023-10-27-haystack-series-rag/rag-paper-image.png" alt="RAG Paper diagram" title="A visual representation of RAG from the original paper"></p>
+<p>As you can see in the image above (taken directly from the original paper), a system such as RAG is made of two parts: one that finds text snippets that are relevant to the question asked by the user and a generative model, usually an LLM, that rephrases the snippets into a coherent answer for the question.</p>
+<p>Let’s build one of these with Haystack 2.0!</p>
+<div class="notice info">
+ <div class="notice-content">💡 <em>Do you want to see this code in action? Check out the Colab notebook <a href="https://colab.research.google.com/drive/1FkDNS3hTO4oPXHFbXQcldls0kf-KTq-r?usp=sharing" class="external-link" target="_blank" rel="noopener">here</a> or the gist <a href="https://gist.github.com/ZanSara/0af1c2ac6c71d0a723c179cc6ec1ac41" class="external-link" target="_blank" rel="noopener">here</a></em>.</div>
+</div>
+
+<div class="notice warning">
+ <div class="notice-content">⚠️ <strong>Warning:</strong> <em>This code was tested on <code>haystack-ai==2.0.0b5</code>. Haystack 2.0 is still unstable, so later versions might introduce breaking changes without notice until Haystack 2.0 is officially released. The concepts and components however stay the same.</em></div>
+</div>
+
+<h2 id="generators-haystacks-llm-components">
+ Generators: Haystack’s LLM components
+ <a class="heading-link" href="#generators-haystacks-llm-components">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h2>
+<p>Like every NLP framework that deserves the name, Haystack supports LLMs in different ways. The easiest way to query an LLM in Haystack 2.0 is through a Generator component: depending on which LLM you use and how you intend to query it (chat, text completion, etc…), you should pick the appropriate class.</p>
+<p>We’re going to use <code>gpt-3.5-turbo</code> (the model behind ChatGPT) for these examples, so the component we need is <a href="https://docs.haystack.deepset.ai/v2.0/docs/openaigenerator" class="external-link" target="_blank" rel="noopener"><code>OpenAIGenerator</code></a>. Here is all the code required to use it to query OpenAI’s <code>gpt-3.5-turbo</code> :</p>
+<div class="highlight"><pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#ff7b72">from</span> <span style="color:#ff7b72">haystack.components.generators</span> <span style="color:#ff7b72">import</span> OpenAIGenerator
+</span></span><span style="display:flex;"><span>
+</span></span><span style="display:flex;"><span>generator <span style="color:#ff7b72;font-weight:bold">=</span> OpenAIGenerator(api_key<span style="color:#ff7b72;font-weight:bold">=</span>api_key)
+</span></span><span style="display:flex;"><span>generator<span style="color:#ff7b72;font-weight:bold">.</span>run(prompt<span style="color:#ff7b72;font-weight:bold">=</span><span style="color:#a5d6ff">"What's the official language of France?"</span>)
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># returns {"replies": ['The official language of France is French.']}</span>
+</span></span></code></pre></div><p>You can select your favorite OpenAI model by specifying a <code>model_name</code> at initialization, for example, <code>gpt-4</code>. It also supports setting an <code>api_base_url</code> for private deployments, a <code>streaming_callback</code> if you want to see the output generated live in the terminal, and optional <code>kwargs</code> to let you pass whatever other parameter the model understands, such as the number of answers (<code>n</code>), the temperature (<code>temperature</code>), etc.</p>
+<p>Note that in this case, we’re passing the API key to the component’s constructor. This is unnecessary: <code>OpenAIGenerator</code> can read the value from the <code>OPENAI_API_KEY</code> environment variable and also from the <code>api_key</code> module variable of <a href="https://github.com/openai/openai-python#usage" class="external-link" target="_blank" rel="noopener"><code>openai</code>’s SDK</a>.</p>
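+<p>Here is a sketch that puts these options together. The parameter names follow the description above (they may differ in later Haystack versions), and the streaming callback is just an example that prints chunks to the terminal as they arrive:</p>
+<div class="highlight"><pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python">import os
+from haystack.components.generators import OpenAIGenerator
+
+# Instead of passing api_key explicitly, we can rely on the environment variable
+os.environ["OPENAI_API_KEY"] = "..."  # placeholder: set your real key outside the code
+
+generator = OpenAIGenerator(
+    model_name="gpt-4",  # pick a different OpenAI model
+    streaming_callback=lambda chunk: print(chunk.content, end="", flush=True),  # print each streamed chunk
+)
+generator.run(prompt="What's the official language of France?")
+</code></pre></div>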
+<p>Right now, Haystack supports HuggingFace models through the <a href="https://docs.haystack.deepset.ai/v2.0/docs/huggingfacelocalgenerator" class="external-link" target="_blank" rel="noopener"><code>HuggingFaceLocalGenerator</code></a> and <a href="https://docs.haystack.deepset.ai/v2.0/docs/huggingfacetgigenerator" class="external-link" target="_blank" rel="noopener"><code>HuggingFaceTGIGenerator</code></a> components, and many more LLMs are coming soon.</p>
+<h2 id="promptbuilder-structured-prompts-from-templates">
+ PromptBuilder: structured prompts from templates
+ <a class="heading-link" href="#promptbuilder-structured-prompts-from-templates">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h2>
+<p>Let’s imagine that our LLM-powered application also comes with some pre-defined questions that the user can select instead of typing in full. For example, instead of asking them to type <code>What's the official language of France?</code>, we let them select <code>Tell me the official languages</code> from a list, and they simply need to type “France” (or “Wakanda” for a change - our chatbot needs some challenges too).</p>
+<p>In this scenario, we have two pieces of the prompt: a variable (the country name, like “France”) and a prompt template, which in this case is <code>"What's the official language of {{ country }}?"</code></p>
+<p>Haystack offers a component that can render variables into prompt templates: it’s called <a href="https://docs.haystack.deepset.ai/v2.0/docs/promptbuilder" class="external-link" target="_blank" rel="noopener"><code>PromptBuilder</code></a>. Like the generators we’ve seen before, <code>PromptBuilder</code> is nearly trivial to initialize and use.</p>
+<div class="highlight"><pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#ff7b72">from</span> <span style="color:#ff7b72">haystack.components.builders</span> <span style="color:#ff7b72">import</span> PromptBuilder
+</span></span><span style="display:flex;"><span>
+</span></span><span style="display:flex;"><span>prompt_builder <span style="color:#ff7b72;font-weight:bold">=</span> PromptBuilder(template<span style="color:#ff7b72;font-weight:bold">=</span><span style="color:#a5d6ff">"What's the official language of {{ country }}?"</span>)
+</span></span><span style="display:flex;"><span>prompt_builder<span style="color:#ff7b72;font-weight:bold">.</span>run(country<span style="color:#ff7b72;font-weight:bold">=</span><span style="color:#a5d6ff">"France"</span>)
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># returns {'prompt': "What's the official language of France?"}</span>
+</span></span></code></pre></div><p>Note how we defined a variable, <code>country</code>, by wrapping its name in double curly brackets. PromptBuilder lets you define any input variable that way: if the prompt template was <code>"What's the official language of {{ nation }}?"</code>, the <code>run()</code> method of <code>PromptBuilder</code> would expect a <code>nation</code> input.</p>
+<p>This syntax comes from <a href="https://jinja.palletsprojects.com/en/3.0.x/intro/" class="external-link" target="_blank" rel="noopener">Jinja2</a>, a popular templating library for Python. If you have ever used Flask, Django, or Ansible, you will feel at home with <code>PromptBuilder</code>. If, instead, you have never heard of any of these libraries, you can check out the <a href="https://jinja.palletsprojects.com/en/3.0.x/templates/" class="external-link" target="_blank" rel="noopener">syntax</a> in Jinja’s documentation. Jinja has a powerful templating language and offers far more features than you’ll ever need in prompt templates, ranging from simple if statements and for loops to object access through dot notation, template nesting, variable manipulation, macros, full-fledged import and encapsulation of templates, and more.</p>
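+<p>As a small, purely illustrative example of those extra features, here is a template with an if statement next to the usual variable substitution, assuming <code>PromptBuilder</code> infers both variables from the template just like it inferred <code>country</code> above:</p>
+<div class="highlight"><pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python">from haystack.components.builders import PromptBuilder
+
+template = "{% if formal %}Could you state{% else %}What's{% endif %} the official language of {{ country }}?"
+
+prompt_builder = PromptBuilder(template=template)
+prompt_builder.run(country="France", formal=True)
+
+# expected to return something like
+# {'prompt': "Could you state the official language of France?"}
+</code></pre></div>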
+<h2 id="a-simple-generative-pipeline">
+ A Simple Generative Pipeline
+ <a class="heading-link" href="#a-simple-generative-pipeline">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h2>
+<p>With these two components, we can assemble a minimal pipeline to see how they work together. Connecting them is trivial: <code>PromptBuilder</code> generates a <code>prompt</code> output, and <code>OpenAIGenerator</code> expects an input with the same name and type.</p>
+<div class="highlight"><pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#ff7b72">from</span> <span style="color:#ff7b72">haystack</span> <span style="color:#ff7b72">import</span> Pipeline
+</span></span><span style="display:flex;"><span><span style="color:#ff7b72">from</span> <span style="color:#ff7b72">haystack.components.generators</span> <span style="color:#ff7b72">import</span> OpenAIGenerator
+</span></span><span style="display:flex;"><span><span style="color:#ff7b72">from</span> <span style="color:#ff7b72">haystack.components.builders</span> <span style="color:#ff7b72">import</span> PromptBuilder
+</span></span><span style="display:flex;"><span>
+</span></span><span style="display:flex;"><span>pipe <span style="color:#ff7b72;font-weight:bold">=</span> Pipeline()
+</span></span><span style="display:flex;"><span>pipe<span style="color:#ff7b72;font-weight:bold">.</span>add_component(<span style="color:#a5d6ff">"prompt_builder"</span>, PromptBuilder(template<span style="color:#ff7b72;font-weight:bold">=</span><span style="color:#a5d6ff">"What's the official language of {{ country }}?"</span>))
+</span></span><span style="display:flex;"><span>pipe<span style="color:#ff7b72;font-weight:bold">.</span>add_component(<span style="color:#a5d6ff">"llm"</span>, OpenAIGenerator(api_key<span style="color:#ff7b72;font-weight:bold">=</span>api_key))
+</span></span><span style="display:flex;"><span>pipe<span style="color:#ff7b72;font-weight:bold">.</span>connect(<span style="color:#a5d6ff">"prompt_builder"</span>, <span style="color:#a5d6ff">"llm"</span>)
+</span></span><span style="display:flex;"><span>
+</span></span><span style="display:flex;"><span>pipe<span style="color:#ff7b72;font-weight:bold">.</span>run({<span style="color:#a5d6ff">"prompt_builder"</span>: {<span style="color:#a5d6ff">"country"</span>: <span style="color:#a5d6ff">"France"</span>}})
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># returns {"llm": {"replies": ['The official language of France is French.'] }}</span>
+</span></span></code></pre></div><p>Here is the pipeline graph:</p>
+<p><img src="https://www.zansara.dev/posts/2023-10-27-haystack-series-rag/simple-llm-pipeline.png" alt="Simple LLM pipeline"></p>
+<h2 id="make-the-llm-cheat">
+ Make the LLM cheat
+ <a class="heading-link" href="#make-the-llm-cheat">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h2>
+<p>Building the Generative part of a RAG application was very simple! So far, we have only provided the question to the LLM, with no information to base its answers on. Nowadays, LLMs possess a lot of general knowledge, so questions about famous countries such as France or Germany are easy for them to answer correctly. However, when using an app about world countries, some users may be interested in knowing more about obscure countries or defunct microstates that no longer exist. In this case, ChatGPT is unlikely to provide the correct answer without any help.</p>
+<p>For example, let’s ask our pipeline something <em>really</em> obscure.</p>
+<div class="highlight"><pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span>pipe<span style="color:#ff7b72;font-weight:bold">.</span>run({<span style="color:#a5d6ff">"prompt_builder"</span>: {<span style="color:#a5d6ff">"country"</span>: <span style="color:#a5d6ff">"the Republic of Rose Island"</span>}})
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># returns {</span>
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># "llm": {</span>
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># "replies": [</span>
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># 'The official language of the Republic of Rose Island was Italian.'</span>
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># ]</span>
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># }</span>
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># }</span>
+</span></span></code></pre></div><p>The answer is an educated guess but is not accurate: although it was located just outside of Italy’s territorial waters, according to <a href="https://en.wikipedia.org/wiki/Republic_of_Rose_Island" class="external-link" target="_blank" rel="noopener">Wikipedia</a> the official language of this short-lived micronation was Esperanto.</p>
+<p>How can we get ChatGPT to reply to such a question correctly? One way is to make it “cheat” by providing the answer as part of the question. In fact, <code>PromptBuilder</code> is designed to serve precisely this use case.</p>
+<p>Here is our new, more advanced prompt:</p>
+<div class="highlight"><pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-text" data-lang="text"><span style="display:flex;"><span>Given the following information, answer the question.
+</span></span><span style="display:flex;"><span>Context: {{ context }}
+</span></span><span style="display:flex;"><span>Question: {{ question }}
+</span></span></code></pre></div><p>Let’s build a new pipeline using this prompt!</p>
+<div class="highlight"><pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span>context_template <span style="color:#ff7b72;font-weight:bold">=</span> <span style="color:#a5d6ff">"""
+</span></span></span><span style="display:flex;"><span><span style="color:#a5d6ff">Given the following information, answer the question.
+</span></span></span><span style="display:flex;"><span><span style="color:#a5d6ff">Context: {{ context }}
+</span></span></span><span style="display:flex;"><span><span style="color:#a5d6ff">Question: {{ question }}
+</span></span></span><span style="display:flex;"><span><span style="color:#a5d6ff">"""</span>
+</span></span><span style="display:flex;"><span>language_template <span style="color:#ff7b72;font-weight:bold">=</span> <span style="color:#a5d6ff">"What's the official language of {{ country }}?"</span>
+</span></span><span style="display:flex;"><span>
+</span></span><span style="display:flex;"><span>pipe <span style="color:#ff7b72;font-weight:bold">=</span> Pipeline()
+</span></span><span style="display:flex;"><span>pipe<span style="color:#ff7b72;font-weight:bold">.</span>add_component(<span style="color:#a5d6ff">"context_prompt"</span>, PromptBuilder(template<span style="color:#ff7b72;font-weight:bold">=</span>context_template))
+</span></span><span style="display:flex;"><span>pipe<span style="color:#ff7b72;font-weight:bold">.</span>add_component(<span style="color:#a5d6ff">"language_prompt"</span>, PromptBuilder(template<span style="color:#ff7b72;font-weight:bold">=</span>language_template))
+</span></span><span style="display:flex;"><span>pipe<span style="color:#ff7b72;font-weight:bold">.</span>add_component(<span style="color:#a5d6ff">"llm"</span>, OpenAIGenerator(api_key<span style="color:#ff7b72;font-weight:bold">=</span>api_key))
+</span></span><span style="display:flex;"><span>pipe<span style="color:#ff7b72;font-weight:bold">.</span>connect(<span style="color:#a5d6ff">"language_prompt"</span>, <span style="color:#a5d6ff">"context_prompt.question"</span>)
+</span></span><span style="display:flex;"><span>pipe<span style="color:#ff7b72;font-weight:bold">.</span>connect(<span style="color:#a5d6ff">"context_prompt"</span>, <span style="color:#a5d6ff">"llm"</span>)
+</span></span><span style="display:flex;"><span>
+</span></span><span style="display:flex;"><span>pipe<span style="color:#ff7b72;font-weight:bold">.</span>run({
+</span></span><span style="display:flex;"><span> <span style="color:#a5d6ff">"context_prompt"</span>: {<span style="color:#a5d6ff">"context"</span>: <span style="color:#a5d6ff">"Rose Island had its own government, currency, post office, and commercial establishments, and the official language was Esperanto."</span>}
+</span></span><span style="display:flex;"><span> <span style="color:#a5d6ff">"language_prompt"</span>: {<span style="color:#a5d6ff">"country"</span>: <span style="color:#a5d6ff">"the Republic of Rose Island"</span>}
+</span></span><span style="display:flex;"><span>})
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># returns {</span>
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># "llm": {</span>
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># "replies": [</span>
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># 'The official language of the Republic of Rose Island is Esperanto.'</span>
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># ]</span>
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># }</span>
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># }</span>
+</span></span></code></pre></div><p>Let’s look at the graph of our Pipeline:</p>
+<p><img src="https://www.zansara.dev/posts/2023-10-27-haystack-series-rag/double-promptbuilder-pipeline.png" alt="Double PromptBuilder pipeline"></p>
+<p>The beauty of <code>PromptBuilder</code> lies in its flexibility. It allows users to chain instances together to assemble complex prompts from simpler schemas: for example, we used the output of the first <code>PromptBuilder</code> as the value of <code>question</code> in the second prompt.</p>
+<p>However, in this specific scenario, we can build a simpler system by merging the two prompts into one.</p>
+<div class="highlight"><pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-text" data-lang="text"><span style="display:flex;"><span>Given the following information, answer the question.
+</span></span><span style="display:flex;"><span>Context: {{ context }}
+</span></span><span style="display:flex;"><span>Question: What's the official language of {{ country }}?
+</span></span></code></pre></div><p>Using this new prompt, the resulting pipeline once again becomes very similar to our first one.</p>
+<div class="highlight"><pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span>template <span style="color:#ff7b72;font-weight:bold">=</span> <span style="color:#a5d6ff">"""
+</span></span></span><span style="display:flex;"><span><span style="color:#a5d6ff">Given the following information, answer the question.
+</span></span></span><span style="display:flex;"><span><span style="color:#a5d6ff">Context: {{ context }}
+</span></span></span><span style="display:flex;"><span><span style="color:#a5d6ff">Question: What's the official language of {{ country }}?
+</span></span></span><span style="display:flex;"><span><span style="color:#a5d6ff">"""</span>
+</span></span><span style="display:flex;"><span>pipe <span style="color:#ff7b72;font-weight:bold">=</span> Pipeline()
+</span></span><span style="display:flex;"><span>pipe<span style="color:#ff7b72;font-weight:bold">.</span>add_component(<span style="color:#a5d6ff">"prompt_builder"</span>, PromptBuilder(template<span style="color:#ff7b72;font-weight:bold">=</span>template))
+</span></span><span style="display:flex;"><span>pipe<span style="color:#ff7b72;font-weight:bold">.</span>add_component(<span style="color:#a5d6ff">"llm"</span>, OpenAIGenerator(api_key<span style="color:#ff7b72;font-weight:bold">=</span>api_key))
+</span></span><span style="display:flex;"><span>pipe<span style="color:#ff7b72;font-weight:bold">.</span>connect(<span style="color:#a5d6ff">"prompt_builder"</span>, <span style="color:#a5d6ff">"llm"</span>)
+</span></span><span style="display:flex;"><span>
+</span></span><span style="display:flex;"><span>pipe<span style="color:#ff7b72;font-weight:bold">.</span>run({
+</span></span><span style="display:flex;"><span> <span style="color:#a5d6ff">"prompt_builder"</span>: {
+</span></span><span style="display:flex;"><span> <span style="color:#a5d6ff">"context"</span>: <span style="color:#a5d6ff">"Rose Island had its own government, currency, post office, and commercial establishments, and the official language was Esperanto."</span>,
+</span></span><span style="display:flex;"><span> <span style="color:#a5d6ff">"country"</span>: <span style="color:#a5d6ff">"the Republic of Rose Island"</span>
+</span></span><span style="display:flex;"><span> }
+</span></span><span style="display:flex;"><span>})
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># returns {</span>
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># "llm": {</span>
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># "replies": [</span>
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># 'The official language of the Republic of Rose Island is Esperanto.'</span>
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># ]</span>
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># }</span>
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># }</span>
+</span></span></code></pre></div><p><img src="https://www.zansara.dev/posts/2023-10-27-haystack-series-rag/double-variable-promptbuilder-pipeline.png" alt="PromptBuilder with two inputs pipeline"></p>
+<h2 id="retrieving-the-context">
+ Retrieving the context
+ <a class="heading-link" href="#retrieving-the-context">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h2>
+<p>So far, we’ve been playing with prompts, but the fundamental question remains unanswered: where do we get the correct text snippet for the question the user is asking? We can’t expect such information to be part of the input: we need our system to be able to fetch this information on its own, based solely on the query.</p>
+<p>Thankfully, retrieving relevant information from large <a href="https://en.wikipedia.org/wiki/Text_corpus" class="external-link" target="_blank" rel="noopener">corpora</a> (a technical term for extensive collections of data, usually text) is a task that Haystack has excelled at since its inception: the components that perform this task are called <a href="https://docs.haystack.deepset.ai/v2.0/docs/retrievers" class="external-link" target="_blank" rel="noopener">Retrievers</a>.</p>
+<p>Retrieval can be performed on different data sources: to begin, let’s assume we’re searching for data in a local database, which is the use case that most Retrievers are geared towards.</p>
+<p>Let’s create a small local database to store information about some European countries. Haystack offers a neat object for these small-scale demos: <code>InMemoryDocumentStore</code>. This document store is little more than a Python dictionary under the hood but provides the exact same API as much more powerful data stores and vector stores, such as <a href="https://github.com/deepset-ai/haystack-core-integrations/tree/main/document_stores/elasticsearch" class="external-link" target="_blank" rel="noopener">Elasticsearch</a> or <a href="https://haystack.deepset.ai/integrations/chroma-documentstore" class="external-link" target="_blank" rel="noopener">ChromaDB</a>. Keep in mind that the object is called “Document Store” and not simply “datastore” because what it stores are Haystack’s Document objects: small dataclasses that help other components make sense of the data they receive.</p>
+<p>So, let’s initialize an <code>InMemoryDocumentStore</code> and write some <code>Documents</code> into it.</p>
+<div class="highlight"><pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#ff7b72">from</span> <span style="color:#ff7b72">haystack.dataclasses</span> <span style="color:#ff7b72">import</span> Document
+</span></span><span style="display:flex;"><span><span style="color:#ff7b72">from</span> <span style="color:#ff7b72">haystack.document_stores.in_memory</span> <span style="color:#ff7b72">import</span> InMemoryDocumentStore
+</span></span><span style="display:flex;"><span>
+</span></span><span style="display:flex;"><span>documents <span style="color:#ff7b72;font-weight:bold">=</span> [
+</span></span><span style="display:flex;"><span> Document(content<span style="color:#ff7b72;font-weight:bold">=</span><span style="color:#a5d6ff">"German is the the official language of Germany."</span>),
+</span></span><span style="display:flex;"><span> Document(content<span style="color:#ff7b72;font-weight:bold">=</span><span style="color:#a5d6ff">"The capital of France is Paris, and its official language is French."</span>),
+</span></span><span style="display:flex;"><span> Document(content<span style="color:#ff7b72;font-weight:bold">=</span><span style="color:#a5d6ff">"Italy recognizes a few official languages, but the most widespread one is Italian."</span>),
+</span></span><span style="display:flex;"><span> Document(content<span style="color:#ff7b72;font-weight:bold">=</span><span style="color:#a5d6ff">"Esperanto has been adopted as official language for some microstates as well, such as the Republic of Rose Island, a short-lived microstate built on a sea platform in the Adriatic Sea."</span>)
+</span></span><span style="display:flex;"><span>]
+</span></span><span style="display:flex;"><span>docstore <span style="color:#ff7b72;font-weight:bold">=</span> InMemoryDocumentStore()
+</span></span><span style="display:flex;"><span>docstore<span style="color:#ff7b72;font-weight:bold">.</span>write_documents(documents<span style="color:#ff7b72;font-weight:bold">=</span>documents)
+</span></span><span style="display:flex;"><span>
+</span></span><span style="display:flex;"><span>docstore<span style="color:#ff7b72;font-weight:bold">.</span>filter_documents()
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># returns [</span>
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># Document(content="German is the the official language of Germany."), </span>
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># Document(content="The capital of France is Paris, and its official language is French."),</span>
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># Document(content="Esperanto has been adopted as official language for some microstates as well, such as the Republic of Rose Island, a short-lived microstate built on a sea platform in the Adriatic Sea."),</span>
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># Document(content="Italy recognizes a few official languages, but the most widespread one is Italian."),</span>
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># ]</span>
+</span></span></code></pre></div><p>Once the document store is set up, we can initialize a retriever. In Haystack 2.0, each document store comes with its own set of highly optimized retrievers: <code>InMemoryDocumentStore</code> offers two, one based on BM25 ranking and one based on embedding similarity.</p>
+<p>Let’s start with the BM25-based retriever, which is slightly easier to set up, and first use it in isolation to see how it behaves.</p>
+<div class="highlight"><pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#ff7b72">from</span> <span style="color:#ff7b72">haystack.components.retrievers.in_memory</span> <span style="color:#ff7b72">import</span> InMemoryBM25Retriever
+</span></span><span style="display:flex;"><span>
+</span></span><span style="display:flex;"><span>retriever <span style="color:#ff7b72;font-weight:bold">=</span> InMemoryBM25Retriever(document_store<span style="color:#ff7b72;font-weight:bold">=</span>docstore)
+</span></span><span style="display:flex;"><span>retriever<span style="color:#ff7b72;font-weight:bold">.</span>run(query<span style="color:#ff7b72;font-weight:bold">=</span><span style="color:#a5d6ff">"Rose Island"</span>, top_k<span style="color:#ff7b72;font-weight:bold">=</span><span style="color:#a5d6ff">1</span>)
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># returns [</span>
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># Document(content="Esperanto has been adopted as official language for some microstates as well, such as the Republic of Rose Island, a short-lived microstate built on a sea platform in the Adriatic Sea.")</span>
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># ]</span>
+</span></span><span style="display:flex;"><span>
+</span></span><span style="display:flex;"><span>retriever<span style="color:#ff7b72;font-weight:bold">.</span>run(query<span style="color:#ff7b72;font-weight:bold">=</span><span style="color:#a5d6ff">"Rose Island"</span>, top_k<span style="color:#ff7b72;font-weight:bold">=</span><span style="color:#a5d6ff">3</span>)
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># returns [</span>
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># Document(content="Esperanto has been adopted as official language for some microstates as well, such as the Republic of Rose Island, a short-lived microstate built on a sea platform in the Adriatic Sea.")</span>
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># Document(content="Italy recognizes a few official languages, but the most widespread one is Italian."),</span>
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># Document(content="The capital of France is Paris, and its official language is French."),</span>
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># ]</span>
+</span></span></code></pre></div><p>We see that <a href="https://docs.haystack.deepset.ai/v2.0/reference/retriever-api#inmemorybm25retriever" class="external-link" target="_blank" rel="noopener"><code>InMemoryBM25Retriever</code></a> accepts a few parameters. <code>query</code> is the question we want to find relevant documents for. In the case of BM25, the algorithm only searches for exact word matches. The resulting retriever is very fast, but it doesn’t fail gracefully: it can’t handle spelling mistakes, synonyms, or descriptions of an entity. For example, documents containing the word “cat” would be considered irrelevant against a query such as “felines”.</p>
+<p><code>top_k</code> controls the number of documents returned. We can see that in the first example, only one document is returned, the correct one. In the second, where <code>top_k = 3</code>, the retriever is forced to return three documents even if just one is relevant, so it picks the other two randomly. Although the behavior is not optimal, BM25 guarantees that if there is a document that is relevant to the query, it will be in the first position, so for now, we can use it with <code>top_k=1</code>.</p>
+<p>Retrievers also accept a <code>filters</code> parameter, which lets you pre-filter the documents before retrieval. This is a powerful technique that comes in handy in complex applications, but for now we have no use for it. I will talk about this topic, called metadata filtering, in more detail in a later post.</p>
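+<p>As a purely illustrative sketch (not part of the example above), this is roughly how such a call could look, assuming our documents carried a hypothetical <code>language</code> metadata field; the exact filter syntax depends on your Haystack 2.0 version and document store, so treat it as a placeholder rather than a reference.</p>
+<div class="highlight"><pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"># Hypothetical sketch: restrict retrieval to documents whose (assumed)
+# "language" metadata field equals "Esperanto". The filter format shown
+# here is illustrative; check your document store's documentation for
+# the exact syntax it supports.
+retriever.run(
+    query="Rose Island",
+    top_k=1,
+    filters={"language": "Esperanto"},
+)
+</code></pre></div>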
+<p>Let’s now make use of this new component in our Pipeline.</p>
+<h2 id="our-first-rag-pipeline">
+ Our first RAG Pipeline
+ <a class="heading-link" href="#our-first-rag-pipeline">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h2>
+<p>The retriever does not return a single string but a list of Documents. How do we put the content of these objects into our prompt template?</p>
+<p>It’s time to use Jinja’s powerful syntax to do some unpacking on our behalf.</p>
+<div class="highlight"><pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-text" data-lang="text"><span style="display:flex;"><span>Given the following information, answer the question.
+</span></span><span style="display:flex;"><span>
+</span></span><span style="display:flex;"><span>Context:
+</span></span><span style="display:flex;"><span>{% for document in documents %}
+</span></span><span style="display:flex;"><span> {{ document.content }}
+</span></span><span style="display:flex;"><span>{% endfor %}
+</span></span><span style="display:flex;"><span>
+</span></span><span style="display:flex;"><span>Question: What's the official language of {{ country }}?
+</span></span></code></pre></div><p>Notice how, despite the slightly alien syntax for a Python programmer, what the template does is reasonably evident: it iterates over the documents and, for each of them, renders their <code>content</code> field.</p>
+<p>With all these pieces set up, we can finally put them all together.</p>
+<div class="highlight"><pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span>template <span style="color:#ff7b72;font-weight:bold">=</span> <span style="color:#a5d6ff">"""
+</span></span></span><span style="display:flex;"><span><span style="color:#a5d6ff">Given the following information, answer the question.
+</span></span></span><span style="display:flex;"><span><span style="color:#a5d6ff">
+</span></span></span><span style="display:flex;"><span><span style="color:#a5d6ff">Context:
+</span></span></span><span style="display:flex;"><span><span style="color:#a5d6ff">{</span><span style="color:#a5d6ff">% f</span><span style="color:#a5d6ff">or document in documents %}
+</span></span></span><span style="display:flex;"><span><span style="color:#a5d6ff"> {{ document.content }}
+</span></span></span><span style="display:flex;"><span><span style="color:#a5d6ff">{</span><span style="color:#a5d6ff">% e</span><span style="color:#a5d6ff">ndfor %}
+</span></span></span><span style="display:flex;"><span><span style="color:#a5d6ff">
+</span></span></span><span style="display:flex;"><span><span style="color:#a5d6ff">Question: What's the official language of {{ country }}?
+</span></span></span><span style="display:flex;"><span><span style="color:#a5d6ff">"""</span>
+</span></span><span style="display:flex;"><span>pipe <span style="color:#ff7b72;font-weight:bold">=</span> Pipeline()
+</span></span><span style="display:flex;"><span>
+</span></span><span style="display:flex;"><span>pipe<span style="color:#ff7b72;font-weight:bold">.</span>add_component(<span style="color:#a5d6ff">"retriever"</span>, InMemoryBM25Retriever(document_store<span style="color:#ff7b72;font-weight:bold">=</span>docstore))
+</span></span><span style="display:flex;"><span>pipe<span style="color:#ff7b72;font-weight:bold">.</span>add_component(<span style="color:#a5d6ff">"prompt_builder"</span>, PromptBuilder(template<span style="color:#ff7b72;font-weight:bold">=</span>template))
+</span></span><span style="display:flex;"><span>pipe<span style="color:#ff7b72;font-weight:bold">.</span>add_component(<span style="color:#a5d6ff">"llm"</span>, OpenAIGenerator(api_key<span style="color:#ff7b72;font-weight:bold">=</span>api_key))
+</span></span><span style="display:flex;"><span>pipe<span style="color:#ff7b72;font-weight:bold">.</span>connect(<span style="color:#a5d6ff">"retriever"</span>, <span style="color:#a5d6ff">"prompt_builder.documents"</span>)
+</span></span><span style="display:flex;"><span>pipe<span style="color:#ff7b72;font-weight:bold">.</span>connect(<span style="color:#a5d6ff">"prompt_builder"</span>, <span style="color:#a5d6ff">"llm"</span>)
+</span></span><span style="display:flex;"><span>
+</span></span><span style="display:flex;"><span>pipe<span style="color:#ff7b72;font-weight:bold">.</span>run({
+</span></span><span style="display:flex;"><span> <span style="color:#a5d6ff">"retriever"</span>: {<span style="color:#a5d6ff">"query"</span>: country},
+</span></span><span style="display:flex;"><span> <span style="color:#a5d6ff">"prompt_builder"</span>: {
+</span></span><span style="display:flex;"><span> <span style="color:#a5d6ff">"country"</span>: <span style="color:#a5d6ff">"the Republic of Rose Island"</span>
+</span></span><span style="display:flex;"><span> }
+</span></span><span style="display:flex;"><span>})
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># returns {</span>
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># "llm": {</span>
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># "replies": [</span>
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># 'The official language of the Republic of Rose Island is Esperanto.'</span>
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># ]</span>
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># }</span>
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># }</span>
+</span></span></code></pre></div><p><img src="https://www.zansara.dev/posts/2023-10-27-haystack-series-rag/bm25-rag-pipeline.png" alt="BM25 RAG Pipeline"></p>
+<p>Congratulations! We’ve just built our first, true-to-its-name RAG Pipeline.</p>
+<h2 id="scaling-up-elasticsearch">
+ Scaling up: Elasticsearch
+ <a class="heading-link" href="#scaling-up-elasticsearch">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h2>
+<p>So, we now have our running prototype. What does it take to scale this system up for production workloads?</p>
+<p>Of course, scaling up a system to production readiness is no simple task that can be addressed in a paragraph. Still, we can start this journey with one component that can readily be improved: the document store.</p>
+<p><code>InMemoryDocumentStore</code> is clearly a toy implementation: Haystack supports much more performant document stores that make more sense to use in a production scenario. Since we have built our app with a BM25 retriever, let’s select <a href="https://haystack.deepset.ai/integrations/elasticsearch-document-store" class="external-link" target="_blank" rel="noopener">Elasticsearch</a> as our production-ready document store of choice.</p>
+<div class="notice warning">
+ <div class="notice-content">⚠️ <strong>Warning:</strong> <em>While ES is a valid document store to use in this scenario, nowadays if often makes more sense to choose a more specialized document store such as <a href="https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/weaviate" class="external-link" target="_blank" rel="noopener">Weaviate</a>, <a href="https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/qdrant" class="external-link" target="_blank" rel="noopener">Qdrant</a>, and so on. Check <a href="https://github.com/deepset-ai/haystack-core-integrations/tree/main" class="external-link" target="_blank" rel="noopener">this page</a> to see which document stores are currently supported for Haystack 2.0.</em></div>
+</div>
+
+<p>How do we use Elasticsearch in our pipeline? All it takes is to swap out <code>InMemoryDocumentStore</code> and <code>InMemoryBM25Retriever</code> with their Elasticsearch counterparts, which offer nearly identical APIs.</p>
+<p>First, let’s create the document store: we will need a slightly more complex setup to connect to the Elasticsearch backend. In this example, we use Elasticsearch version 8.8.0, but any Elasticsearch 8 version should work.</p>
+<div class="highlight"><pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#ff7b72">from</span> <span style="color:#ff7b72">elasticsearch_haystack.document_store</span> <span style="color:#ff7b72">import</span> ElasticsearchDocumentStore
+</span></span><span style="display:flex;"><span>
+</span></span><span style="display:flex;"><span>host <span style="color:#ff7b72;font-weight:bold">=</span> os<span style="color:#ff7b72;font-weight:bold">.</span>environ<span style="color:#ff7b72;font-weight:bold">.</span>get(<span style="color:#a5d6ff">"ELASTICSEARCH_HOST"</span>, <span style="color:#a5d6ff">"https://localhost:9200"</span>)
+</span></span><span style="display:flex;"><span>user <span style="color:#ff7b72;font-weight:bold">=</span> <span style="color:#a5d6ff">"elastic"</span>
+</span></span><span style="display:flex;"><span>pwd <span style="color:#ff7b72;font-weight:bold">=</span> os<span style="color:#ff7b72;font-weight:bold">.</span>environ[<span style="color:#a5d6ff">"ELASTICSEARCH_PASSWORD"</span>] <span style="color:#8b949e;font-style:italic"># You need to provide this value</span>
+</span></span><span style="display:flex;"><span>
+</span></span><span style="display:flex;"><span>docstore <span style="color:#ff7b72;font-weight:bold">=</span> ElasticsearchDocumentStore(
+</span></span><span style="display:flex;"><span> hosts<span style="color:#ff7b72;font-weight:bold">=</span>[host],
+</span></span><span style="display:flex;"><span> basic_auth<span style="color:#ff7b72;font-weight:bold">=</span>(user, pwd),
+</span></span><span style="display:flex;"><span> ca_certs<span style="color:#ff7b72;font-weight:bold">=</span><span style="color:#a5d6ff">"/content/elasticsearch-8.8.0/config/certs/http_ca.crt"</span>
+</span></span><span style="display:flex;"><span>)
+</span></span></code></pre></div><p>Now, let’s write our four documents into the store again. In this case, we specify the duplicate policy, so if the documents were already present, they would be overwritten. All Haystack document stores offer three policies to handle duplicates: <code>FAIL</code> (the default), <code>SKIP</code>, and <code>OVERWRITE</code>.</p>
+<div class="highlight"><pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#ff7b72">from</span> <span style="color:#ff7b72">haystack.document_stores</span> <span style="color:#ff7b72">import</span> DuplicatePolicy
+</span></span><span style="display:flex;"><span>documents <span style="color:#ff7b72;font-weight:bold">=</span> [
+</span></span><span style="display:flex;"><span> Document(content<span style="color:#ff7b72;font-weight:bold">=</span><span style="color:#a5d6ff">"German is the the official language of Germany."</span>),
+</span></span><span style="display:flex;"><span> Document(content<span style="color:#ff7b72;font-weight:bold">=</span><span style="color:#a5d6ff">"The capital of France is Paris, and its official language is French."</span>),
+</span></span><span style="display:flex;"><span> Document(content<span style="color:#ff7b72;font-weight:bold">=</span><span style="color:#a5d6ff">"Italy recognizes a few official languages, but the most widespread one is Italian."</span>),
+</span></span><span style="display:flex;"><span> Document(content<span style="color:#ff7b72;font-weight:bold">=</span><span style="color:#a5d6ff">"Esperanto has been adopted as official language for some microstates as well, such as the Republic of Rose Island, a short-lived microstate built on a sea platform in the Adriatic Sea."</span>)
+</span></span><span style="display:flex;"><span>]
+</span></span><span style="display:flex;"><span>docstore<span style="color:#ff7b72;font-weight:bold">.</span>write_documents(documents<span style="color:#ff7b72;font-weight:bold">=</span>documents, policy<span style="color:#ff7b72;font-weight:bold">=</span>DuplicatePolicy<span style="color:#ff7b72;font-weight:bold">.</span>OVERWRITE)
+</span></span></code></pre></div><p>Once this is done, we are ready to build the same pipeline as before, but using <code>ElasticsearchBM25Retriever</code>.</p>
+<div class="highlight"><pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#ff7b72">from</span> <span style="color:#ff7b72">elasticsearch_haystack.bm25_retriever</span> <span style="color:#ff7b72">import</span> ElasticsearchBM25Retriever
+</span></span><span style="display:flex;"><span>
+</span></span><span style="display:flex;"><span>template <span style="color:#ff7b72;font-weight:bold">=</span> <span style="color:#a5d6ff">"""
+</span></span></span><span style="display:flex;"><span><span style="color:#a5d6ff">Given the following information, answer the question.
+</span></span></span><span style="display:flex;"><span><span style="color:#a5d6ff">
+</span></span></span><span style="display:flex;"><span><span style="color:#a5d6ff">Context:
+</span></span></span><span style="display:flex;"><span><span style="color:#a5d6ff">{</span><span style="color:#a5d6ff">% f</span><span style="color:#a5d6ff">or document in documents %}
+</span></span></span><span style="display:flex;"><span><span style="color:#a5d6ff"> {{ document.content }}
+</span></span></span><span style="display:flex;"><span><span style="color:#a5d6ff">{</span><span style="color:#a5d6ff">% e</span><span style="color:#a5d6ff">ndfor %}
+</span></span></span><span style="display:flex;"><span><span style="color:#a5d6ff">
+</span></span></span><span style="display:flex;"><span><span style="color:#a5d6ff">Question: What's the official language of {{ country }}?
+</span></span></span><span style="display:flex;"><span><span style="color:#a5d6ff">"""</span>
+</span></span><span style="display:flex;"><span>
+</span></span><span style="display:flex;"><span>pipe <span style="color:#ff7b72;font-weight:bold">=</span> Pipeline()
+</span></span><span style="display:flex;"><span>pipe<span style="color:#ff7b72;font-weight:bold">.</span>add_component(<span style="color:#a5d6ff">"retriever"</span>, ElasticsearchBM25Retriever(document_store<span style="color:#ff7b72;font-weight:bold">=</span>docstore))
+</span></span><span style="display:flex;"><span>pipe<span style="color:#ff7b72;font-weight:bold">.</span>add_component(<span style="color:#a5d6ff">"prompt_builder"</span>, PromptBuilder(template<span style="color:#ff7b72;font-weight:bold">=</span>template))
+</span></span><span style="display:flex;"><span>pipe<span style="color:#ff7b72;font-weight:bold">.</span>add_component(<span style="color:#a5d6ff">"llm"</span>, OpenAIGenerator(api_key<span style="color:#ff7b72;font-weight:bold">=</span>api_key))
+</span></span><span style="display:flex;"><span>pipe<span style="color:#ff7b72;font-weight:bold">.</span>connect(<span style="color:#a5d6ff">"retriever"</span>, <span style="color:#a5d6ff">"prompt_builder.documents"</span>)
+</span></span><span style="display:flex;"><span>pipe<span style="color:#ff7b72;font-weight:bold">.</span>connect(<span style="color:#a5d6ff">"prompt_builder"</span>, <span style="color:#a5d6ff">"llm"</span>)
+</span></span><span style="display:flex;"><span>
+</span></span><span style="display:flex;"><span>pipe<span style="color:#ff7b72;font-weight:bold">.</span>draw(<span style="color:#a5d6ff">"elasticsearch-rag-pipeline.png"</span>)
+</span></span><span style="display:flex;"><span>
+</span></span><span style="display:flex;"><span>country <span style="color:#ff7b72;font-weight:bold">=</span> <span style="color:#a5d6ff">"the Republic of Rose Island"</span>
+</span></span><span style="display:flex;"><span>pipe<span style="color:#ff7b72;font-weight:bold">.</span>run({
+</span></span><span style="display:flex;"><span> <span style="color:#a5d6ff">"retriever"</span>: {<span style="color:#a5d6ff">"query"</span>: country},
+</span></span><span style="display:flex;"><span> <span style="color:#a5d6ff">"prompt_builder"</span>: {<span style="color:#a5d6ff">"country"</span>: country}
+</span></span><span style="display:flex;"><span>})
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># returns {</span>
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># "llm": {</span>
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># "replies": [</span>
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># 'The official language of the Republic of Rose Island is Esperanto.'</span>
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># ]</span>
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># }</span>
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># }</span>
+</span></span></code></pre></div><p><img src="https://www.zansara.dev/posts/2023-10-27-haystack-series-rag/elasticsearch-rag-pipeline.png" alt="Elasticsearch RAG Pipeline"></p>
+<p>That’s it! We’re now running the same pipeline over a production-ready Elasticsearch instance.</p>
+<h2 id="wrapping-up">
+ Wrapping up
+ <a class="heading-link" href="#wrapping-up">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h2>
+<p>In this post, we’ve detailed some fundamental components that make RAG applications possible with Haystack: Generators, the PromptBuilder, and Retrievers. We’ve seen how they can all be used in isolation and how you can make Pipelines out of them to achieve the same goal. Last, we’ve experimented with some of the (very early!) features that make Haystack 2.0 production-ready and easy to scale up from a simple demo with minimal changes.</p>
+<p>However, this is just the start of our journey into RAG. Stay tuned!</p>
+<hr>
+<p><em>Next: <a href="https://www.zansara.dev/posts/2023-11-05-haystack-series-minimal-indexing" >Indexing data for RAG applications</a></em></p>
+<p><em>Previous: <a href="https://www.zansara.dev/posts/2023-10-26-haystack-series-canals" >Canals: a new concept of Pipeline</a></em></p>
+<p><em>See the entire series here: <a href="https://www.zansara.dev/series/haystack-2.0-series/" >Haystack 2.0 series</a></em></p>
+<p><small><em>Cover image from <a href="https://it.wikipedia.org/wiki/File:Isoladellerose.jpg" class="external-link" target="_blank" rel="noopener">Wikipedia</a></em></small></p>
+
+
+
+
+ A New Approach to Haystack Pipelines
+ https://www.zansara.dev/posts/2023-10-26-haystack-series-canals/
+ Thu, 26 Oct 2023 00:00:00 +0000
+
+ https://www.zansara.dev/posts/2023-10-26-haystack-series-canals/
+ <p><em>Updated on 21/12/2023</em></p>
+<hr>
+<p>As we have seen in <a href="https://www.zansara.dev/posts/2023-10-15-haystack-series-pipeline/" class="external-link" target="_blank" rel="noopener">the previous episode of this series</a>, Haystack’s Pipeline is a powerful concept that comes with its set of benefits and shortcomings. In Haystack 2.0, the pipeline was one of the first items that we focused our attention on, and it was the starting point of the entire rewrite.</p>
+<p>What does this mean in practice? Let’s look at what Haystack Pipelines in 2.0 will be like, how they differ from their 1.x counterparts, and the pros and cons of this new paradigm.</p>
+<h2 id="new-use-cases">
+ New Use Cases
+ <a class="heading-link" href="#new-use-cases">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h2>
+<p>I’ve already written <a href="https://www.zansara.dev/posts/2023-10-15-haystack-series-pipeline/" class="external-link" target="_blank" rel="noopener">at length</a> about what made the original Pipeline concept so powerful and its weaknesses. Pipelines were highly effective for the use cases we could conceive of while developing them, but they didn’t generalize well to unforeseen situations.</p>
+<p>For a long time, Haystack could afford not to focus on use cases that didn’t fit its architecture, as I have mentioned in my <a href="https://www.zansara.dev/posts/2023-10-11-haystack-series-why/" class="external-link" target="_blank" rel="noopener">previous post</a> about the reasons for the rewrite. The pipeline was then more than sufficient for its purposes.</p>
+<p>However, the situation flipped as LLMs and Generative AI “entered” the scene abruptly at the end of 2022 (although they had certainly been around for longer). Our <code>Pipeline</code>, although usable and still quite powerful in many LLM use cases, seemingly overfit the original use cases it was designed for.</p>
+<p>Let’s take one of these use cases and see where it leads us.</p>
+<h2 id="rag-pipelines">
+ RAG Pipelines
+ <a class="heading-link" href="#rag-pipelines">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h2>
+<p>Let’s take one typical example: <a href="https://www.deepset.ai/blog/llms-retrieval-augmentation" class="external-link" target="_blank" rel="noopener">retrieval augmented generation</a>, or RAG for short. This technique has been used since the very early days of the Generative AI boom as an easy way to strongly <a href="https://haystack.deepset.ai/blog/generative-vs-extractive-models" class="external-link" target="_blank" rel="noopener">reduce hallucinations</a> and improve the alignment of LLMs. The basic idea is: instead of directly asking a question, such as <code>"What's the capital of France?"</code>, we send the model a more complex prompt that includes both the question and a snippet of text that contains the answer. Such a prompt might be:</p>
+<div class="highlight"><pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-text" data-lang="text"><span style="display:flex;"><span>Given the following paragraph, answer the question.
+</span></span><span style="display:flex;"><span>
+</span></span><span style="display:flex;"><span>Paragraph: France is a unitary semi-presidential republic with its capital in Paris,
+</span></span><span style="display:flex;"><span>the country's largest city and main cultural and commercial centre; other major urban
+</span></span><span style="display:flex;"><span>areas include Marseille, Lyon, Toulouse, Lille, Bordeaux, Strasbourg and Nice.
+</span></span><span style="display:flex;"><span>
+</span></span><span style="display:flex;"><span>Question: What's the capital of France?
+</span></span><span style="display:flex;"><span>
+</span></span><span style="display:flex;"><span>Answer:
+</span></span></code></pre></div><p>In this situation, the task of the LLM becomes far easier: instead of drawing facts from its internal knowledge, which might be lacking, inaccurate, or out-of-date, the model can use the paragraph’s content to answer the question, improving the model’s performance significantly.</p>
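+<p>As a minimal, plain-Python illustration (not part of any Haystack API), this is how such a prompt could be assembled from a retrieved paragraph and a question before sending it to the model:</p>
+<div class="highlight"><pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"># Illustrative only: build a RAG-style prompt with plain string formatting.
+paragraph = (
+    "France is a unitary semi-presidential republic with its capital in Paris, "
+    "the country's largest city and main cultural and commercial centre."
+)
+question = "What's the capital of France?"
+
+prompt = (
+    "Given the following paragraph, answer the question.\n\n"
+    f"Paragraph: {paragraph}\n\n"
+    f"Question: {question}\n\n"
+    "Answer:"
+)
+print(prompt)
+</code></pre></div>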
+<p>We now have a new problem, though. How can we provide the correct snippets of text to the LLM? This is where the “retrieval” keyword comes up.</p>
+<p>One of Haystack’s primary use cases had been <a href="https://huggingface.co/tasks/question-answering" class="external-link" target="_blank" rel="noopener">Extractive Question Answering</a>: a system where a Retriever component searches a Document Store (such as a vector or SQL database) for snippets of text that are the most relevant to a given question. It then sends such snippets to a Reader (an extractive model), which highlights the keywords that answer the original question.</p>
+<p>By replacing a Reader model with an LLM, we get a Retrieval Augmented Generation Pipeline. Easy!</p>
+<p><img src="https://www.zansara.dev/posts/2023-10-26-haystack-series-canals/gen-vs-ext-qa-pipeline.png" alt="Generative vs Extractive QA Pipeline Graph"></p>
+<p>So far, everything checks out. Supporting RAG with Haystack feels not only possible but natural. Let’s take this simple example one step forward: what if, instead of getting the data from a document store, I want to retrieve data from the Internet?</p>
+<h2 id="web-rag">
+ Web RAG
+ <a class="heading-link" href="#web-rag">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h2>
+<p>At first glance, the task may not seem daunting. We surely need a special Retriever that, instead of searching through a DB, searches through the Internet using a search engine. But the core concepts stay the same, and so, we assume, should the pipeline’s graph. The end result should be something like this:</p>
+<p><img src="https://www.zansara.dev/posts/2023-10-26-haystack-series-canals/initial-web-rag-pipeline.png" alt="Initial Web RAG Pipeline Graph"></p>
+<p>However, the problem doesn’t end there. Search engines return links, which need to be accessed, and the content of the webpage downloaded. Such pages may be extensive and contain artifacts, so the resulting text needs to be cleaned, reduced into paragraphs, potentially embedded by a retrieval model, ranked against the original query, and only the top few resulting pieces of text need to be passed over to the LLM. Just by including these minimal requirements, our pipeline already looks like this:</p>
+<p><img src="https://www.zansara.dev/posts/2023-10-26-haystack-series-canals/linear-web-rag-pipeline.png" alt="Linear Web RAG Pipeline Graph"></p>
+<p>And we still need to consider that URLs may reference not HTML pages but PDFs, videos, zip files, and so on. We need file converters, zip extractors, audio transcribers, and so on.</p>
+<p><img src="https://www.zansara.dev/posts/2023-10-26-haystack-series-canals/multifile-web-rag-pipeline.png" alt="Multiple File Type Web RAG Pipeline Graph"></p>
+<p>You may notice how this use case moved quickly from looking like a simple query pipeline into a strange overlap of a query and an indexing pipeline. As we’ve learned in the previous post, indexing pipelines have their own set of quirks, one of which is that they can’t simultaneously process files of different types. But we can only expect the search engine to return exclusively HTML pages or exclusively PDFs if we deliberately filter everything else out, which makes the pipeline less effective. In fact, a pipeline that can read content from different file types, such as the one above, can’t really be made to work.</p>
+<p>And what if, on top of this, we need to cache the resulting documents to reduce latency? What if I wanted to get the results from Google’s page 2, but only if the content of page 1 did not answer our question? At this point, the pipeline is hard to imagine, let alone draw.</p>
+<p>Although Web RAG is somewhat possible in Haystack, it stretches far beyond what the pipeline was designed to handle. Can we do better?</p>
+<h2 id="pinpointing-the-issue">
+ Pinpointing the issue
+ <a class="heading-link" href="#pinpointing-the-issue">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h2>
+<p>When we went back to the drawing board to address these concerns, the first step was pinpointing the issue.</p>
+<p>The root problem, as we realized, is that Haystack Pipelines treat each component as a locomotive treats its wagons. They all look the same from the pipeline’s perspective, they can all be connected in any order, and they all go from A to B rolling over the same pair of rails, all passing through the same stations.</p>
+<p><img src="https://www.zansara.dev/posts/2023-10-26-haystack-series-canals/train.png" alt="Cargo Train"></p>
+<p>In Haystack 1, components are designed to serve the pipeline’s needs first. A good component is identical to all the others, provides the exact interface the pipeline requires, and can be connected to any other in any order. The components are awkward to use outside of a pipeline due to the very same <code>run()</code> method that makes the pipeline so ergonomic. Why does the Ranker, which needs only a query and a list of Documents to operate, also accept <code>file_paths</code> and <code>meta</code> in its <code>run()</code> method? It does so solely to satisfy the pipeline’s requirements, which in turn only exist to make all components forcefully compatible with each other.</p>
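+<p>To make the point concrete, here is a rough, simplified sketch (not the exact Haystack 1.x code) of what such a uniform <code>run()</code> signature forces a component into:</p>
+<div class="highlight"><pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"># Simplified sketch of a Haystack 1.x-style node: the uniform contract means
+# even a ranker that only needs `query` and `documents` ends up accepting
+# unrelated parameters such as `file_paths` and `meta`.
+class SketchRanker:
+    def run(self, query=None, documents=None, file_paths=None, meta=None, **kwargs):
+        # Only `query` and `documents` are actually used; the other parameters
+        # exist purely to satisfy the pipeline's uniform contract.
+        ranked = list(documents or [])  # a real ranker would reorder these by relevance to `query`
+        return {"documents": ranked, "query": query}
+</code></pre></div>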
+<p>Just like a locomotive, the pipeline pushes the components over the input data one by one. When seen in this light, it’s painfully obvious why the indexing pipeline we’ve seen earlier can’t work: the “pipeline train” can only go on one branch at a time. Component trains can’t split mid-execution. They are designed to all see the same data all the time. Even when branching happens, all branches always see the same data. Sending different wagons onto different rails is not possible by design.</p>
+<h2 id="breaking-it-down">
+ Breaking it down
+ <a class="heading-link" href="#breaking-it-down">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h2>
+<p>The issue’s core is more evident when seen in this light. The pipeline is the only object that drives the execution, while components tend to be as passive and uniform as possible. This approach doesn’t scale: components are fundamentally different, and asking them to all appear equal forces them to hide their differences, making bugs and odd behavior more likely. As the number of components to handle grows, their variety will increase regardless, so the pipeline must always be aware of all the possibilities to manage them and progressively add edge cases that rapidly increase its complexity.</p>
+<p>Therefore, the pipeline rewrite for Haystack 2.0 focused on one core principle: the components will define and drive the execution process. There is no locomotive anymore: every component needs to find its own way, grabbing the data it needs from its producers and sending its results to whoever needs them by declaring the proper connections. In the railway metaphor, it’s like adding a steering wheel to each container: the result is a truck, and the resulting system now looks like a highway.</p>
+<p><img src="https://www.zansara.dev/posts/2023-10-26-haystack-series-canals/highway.png" alt="Highway"></p>
+<p>Just as railways are excellent at going from A to B when you only need to take a few well-known routes and never another, highways are unbeatable at reaching every possible destination with the same effort, even though they need a driver for each wagon. A “highway” Pipeline requires more work from the Components’ side, but it frees them to go wherever they need to with a precision that a “railway” pipeline cannot accomplish.</p>
+<h2 id="the-structure-of-haystack-20">
+ The Structure of Haystack 2.0
+ <a class="heading-link" href="#the-structure-of-haystack-20">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h2>
+<p>By design, the pipeline in Haystack 2.0 is not geared toward specific NLP use cases: it’s a minimal, generic <a href="https://en.wikipedia.org/wiki/Extract,_transform,_load" class="external-link" target="_blank" rel="noopener">ETL</a>-like class.</p>
+<p>At its core, Haystack 2.0 builds upon these two fundamental concepts:</p>
+<ul>
+<li>
+<p>The <code>Component</code> protocol, a well-defined API that Python classes need to respect to be understood by the pipeline.</p>
+</li>
+<li>
+<p>The <code>Pipeline</code> object, the graph resolver and execution engine that also performs validation and provides a few utilities on top.</p>
+</li>
+</ul>
+<p>Let’s explore these two concepts one by one.</p>
+<h3 id="the-pipeline-api">
+ The Pipeline API
+ <a class="heading-link" href="#the-pipeline-api">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h3>
+<p>The new <code>Pipeline</code> object may vaguely remind you of Haystack’s original pipeline, and using one should feel very familiar. For example, this is how you assemble a simple Pipeline that performs two additions in Haystack 2.0.</p>
+<div class="highlight"><pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#ff7b72">from</span> <span style="color:#ff7b72">canals</span> <span style="color:#ff7b72">import</span> Pipeline
+</span></span><span style="display:flex;"><span><span style="color:#ff7b72">from</span> <span style="color:#ff7b72">sample_components</span> <span style="color:#ff7b72">import</span> AddFixedValue
+</span></span><span style="display:flex;"><span>
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># Create the Pipeline object</span>
+</span></span><span style="display:flex;"><span>pipeline <span style="color:#ff7b72;font-weight:bold">=</span> Pipeline()
+</span></span><span style="display:flex;"><span>
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># Add the components - note the missing`inputs` parameter</span>
+</span></span><span style="display:flex;"><span>pipeline<span style="color:#ff7b72;font-weight:bold">.</span>add_component(<span style="color:#a5d6ff">"add_one"</span>, AddFixedValue(add<span style="color:#ff7b72;font-weight:bold">=</span><span style="color:#a5d6ff">1</span>))
+</span></span><span style="display:flex;"><span>pipeline<span style="color:#ff7b72;font-weight:bold">.</span>add_component(<span style="color:#a5d6ff">"add_two"</span>, AddFixedValue(add<span style="color:#ff7b72;font-weight:bold">=</span><span style="color:#a5d6ff">2</span>))
+</span></span><span style="display:flex;"><span>
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># Connect them together</span>
+</span></span><span style="display:flex;"><span>pipeline<span style="color:#ff7b72;font-weight:bold">.</span>connect(<span style="color:#a5d6ff">"add_one.result"</span>, <span style="color:#a5d6ff">"add_two.value"</span>)
+</span></span><span style="display:flex;"><span>
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># Draw the pipeline</span>
+</span></span><span style="display:flex;"><span>pipeline<span style="color:#ff7b72;font-weight:bold">.</span>draw(<span style="color:#a5d6ff">"two_additions_pipeline.png"</span>)
+</span></span><span style="display:flex;"><span>
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># Run the pipeline</span>
+</span></span><span style="display:flex;"><span>results <span style="color:#ff7b72;font-weight:bold">=</span> pipeline<span style="color:#ff7b72;font-weight:bold">.</span>run({<span style="color:#a5d6ff">"add_one"</span>: {<span style="color:#a5d6ff">"value"</span>: <span style="color:#a5d6ff">1</span>}})
+</span></span><span style="display:flex;"><span>
+</span></span><span style="display:flex;"><span>print(results)
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># prints '{"add_two": {"result": 4}}'</span>
+</span></span></code></pre></div><p>Creating the pipeline requires no special attention; however, you can now pass a <code>max_loops_allowed</code> parameter to limit looping when it’s a risk. By contrast, old Haystack 1.x Pipelines did not support loops at all.</p>
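+<p>As a minimal sketch, assuming the parameter is accepted by the constructor in your version:</p>
+<div class="highlight"><pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"># Sketch: cap how many times the pipeline may iterate over a loop.
+# Verify the exact parameter name against your version's API reference.
+pipeline = Pipeline(max_loops_allowed=10)
+</code></pre></div>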
+<p>Next, components are added by calling the <code>Pipeline.add_component(name, component)</code> method. This is also subject to very similar requirements to the previous <code>pipeline.add_node</code>:</p>
+<ul>
+<li>Every component needs a unique name.</li>
+<li>Some are reserved (for now, only <code>_debug</code>).</li>
+<li>Instances are not reusable.</li>
+<li>The object needs to be a component.</li>
+</ul>
+<p>However, we no longer connect the components to each other using this function because, although it is possible to implement in principle, it feels more awkward to use in the case of loops.</p>
+<p>Consequently, we introduced a new method, <code>Pipeline.connect()</code>. This method follows the syntax <code>("producer_component.output_name", "consumer_component.input_name")</code>: so we don’t simply line up two components one after the other, but we explicitly connect one of their outputs to one of their inputs.</p>
+<p>This change allows pipelines to perform a much more careful validation of such connections. As we will discover soon, pipeline components in Haystack 2.0 must declare the type of their inputs and outputs. In this way, pipelines can not only make sure that the inputs and outputs exist for the given component, but they can also check whether their types match and explain connection failures in great detail. For example, if there is a type mismatch, <code>Pipeline.connect()</code> will return an error such as:</p>
+<div class="highlight"><pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-markdown" data-lang="markdown"><span style="display:flex;"><span>Cannot connect 'greeter.greeting' with 'add_two.value': their declared input and output
+</span></span><span style="display:flex;"><span>types do not match.
+</span></span><span style="display:flex;"><span>
+</span></span><span style="display:flex;"><span>greeter:
+</span></span><span style="display:flex;"><span><span style="color:#ff7b72">-</span> greeting: str
+</span></span><span style="display:flex;"><span>add_two:
+</span></span><span style="display:flex;"><span><span style="color:#ff7b72">-</span> value: int (available)
+</span></span><span style="display:flex;"><span><span style="color:#ff7b72">-</span> add: Optional[int] (available)
+</span></span></code></pre></div><p>Once the components are connected together, the resulting pipeline can be drawn. Pipeline drawings in Haystack 2.0 show far more details than their predecessors because the components are forced to share much more information about what they need to run, the types of these variables, and so on. The pipeline above draws the following image:</p>
+<p><img src="https://www.zansara.dev/posts/2023-10-26-haystack-series-canals/two_additions_pipeline.png" alt="A Pipeline making two additions"></p>
+<p>You can see how the component classes, their inputs and outputs, and all the connections are named and typed.</p>
+<p>So, how do you run such a pipeline? By just providing a dictionary of input values. Each starting component should have a small dictionary with all the necessary inputs. In the example above, we pass <code>1</code> to the <code>value</code> input of <code>add_one</code>. The results mirror the input’s structure: <code>add_two</code> is at the end of the pipeline, so the pipeline will return a dictionary where under the <code>add_two</code> key there is a dictionary: <code>{"result": 4}</code>.</p>
+<p>By looking at the diagram, you may have noticed that these two components have optional inputs. They’re not necessary for the pipeline to run, but they can be used to dynamically control the behavior of these components. In this case, <code>add</code> controls the “fixed value” this component adds to its primary input. For example:</p>
+<div class="highlight"><pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span>pipeline<span style="color:#ff7b72;font-weight:bold">.</span>run({<span style="color:#a5d6ff">"add_one"</span>: {<span style="color:#a5d6ff">"value"</span>: <span style="color:#a5d6ff">1</span>, <span style="color:#a5d6ff">"add"</span>: <span style="color:#a5d6ff">2</span>}})
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># returns '{"add_two": {"result": 5}}'</span>
+</span></span></code></pre></div><div class="highlight"><pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span>pipeline<span style="color:#ff7b72;font-weight:bold">.</span>run({<span style="color:#a5d6ff">"add_one"</span>: {<span style="color:#a5d6ff">"value"</span>: <span style="color:#a5d6ff">1</span>}, <span style="color:#a5d6ff">"add_two"</span>: {<span style="color:#a5d6ff">"add"</span>: <span style="color:#a5d6ff">10</span>}})
+</span></span><span style="display:flex;"><span><span style="color:#8b949e;font-style:italic"># returns '{"add_two": {"result": 12}}'</span>
+</span></span></code></pre></div><p>One evident difficulty of this API is that it might be challenging to understand what to provide to the run method for each component. This issue has also been considered: the pipeline offers a <code>Pipeline.inputs()</code> method that returns a structured representation of all the expected input. For our pipeline, it looks like:</p>
+<div class="highlight"><pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span>{
+</span></span><span style="display:flex;"><span> <span style="color:#a5d6ff">"add_one"</span>: {
+</span></span><span style="display:flex;"><span> <span style="color:#a5d6ff">"value"</span>: {
+</span></span><span style="display:flex;"><span> <span style="color:#a5d6ff">"type"</span>: int,
+</span></span><span style="display:flex;"><span> <span style="color:#a5d6ff">"is_optional"</span>: <span style="color:#79c0ff">False</span>
+</span></span><span style="display:flex;"><span> },
+</span></span><span style="display:flex;"><span> <span style="color:#a5d6ff">"add"</span>: {
+</span></span><span style="display:flex;"><span> <span style="color:#a5d6ff">"type"</span>: typing<span style="color:#ff7b72;font-weight:bold">.</span>Optional[int],
+</span></span><span style="display:flex;"><span> <span style="color:#a5d6ff">"is_optional"</span>: <span style="color:#79c0ff">True</span>
+</span></span><span style="display:flex;"><span> }
+</span></span><span style="display:flex;"><span> },
+</span></span><span style="display:flex;"><span> <span style="color:#a5d6ff">"add_two"</span>: {
+</span></span><span style="display:flex;"><span> <span style="color:#a5d6ff">"add"</span>: {
+</span></span><span style="display:flex;"><span> <span style="color:#a5d6ff">"type"</span>: typing<span style="color:#ff7b72;font-weight:bold">.</span>Optional[int],
+</span></span><span style="display:flex;"><span> <span style="color:#a5d6ff">"is_optional"</span>: <span style="color:#79c0ff">True</span>
+</span></span><span style="display:flex;"><span> }
+</span></span><span style="display:flex;"><span> }
+</span></span><span style="display:flex;"><span>}
+</span></span></code></pre></div><h2 id="the-component-api">
+ The Component API
+ <a class="heading-link" href="#the-component-api">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h2>
+<p>Now that we covered the Pipeline’s API, let’s have a look at what it takes for a Python class to be treated as a pipeline component.</p>
+<p>You are going to need:</p>
+<ul>
+<li>
+<p><strong>A <code>@component</code> decorator</strong>. All component classes must be decorated with the <code>@component</code> decorator. This allows a pipeline to discover and validate them.</p>
+</li>
+<li>
+<p><strong>A <code>run()</code> method</strong>. This is the method where the main functionality of the component should be carried out. It’s invoked by <code>Pipeline.run()</code> and has a few constraints, which we will describe later.</p>
+</li>
+<li>
+<p><strong>A <code>@component.output_types()</code> decorator for the <code>run()</code> method</strong>. This allows the pipeline to validate the connections between components.</p>
+</li>
+<li>
+<p>Optionally, <strong>a <code>warm_up()</code> method</strong>. It can be used to defer the loading of a heavy resource (think a local LLM or an embedding model) to the warm-up stage that occurs right before the first execution of the pipeline. Components that use <code>warm_up()</code> can be added to a Pipeline and connected before the heavy operations are carried out. In this way, the validation that a <code>Pipeline</code> performs can happen before resources are wasted. A sketch of such a component follows this list.</p>
+</li>
+</ul>
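+<p>As a rough sketch (the model-loading helper is purely hypothetical), a component using <code>warm_up()</code> could look like this:</p>
+<div class="highlight"><pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python">from canals import component
+
+
+@component
+class SketchGenerator:
+    """
+    Sketch of a component that defers loading a heavy resource to warm_up(),
+    so pipeline validation can happen before the expensive work is done.
+    """
+
+    def __init__(self, model_name: str):
+        self.model_name = model_name
+        self.model = None  # nothing heavy happens at construction time
+
+    def warm_up(self):
+        # Load the heavy resource only once, right before the first run.
+        if self.model is None:
+            self.model = load_heavy_model(self.model_name)  # hypothetical helper
+
+    @component.output_types(reply=str)
+    def run(self, prompt: str):
+        return {"reply": self.model.generate(prompt)}
+</code></pre></div>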
+<p>To summarize, a minimal component can look like this:</p>
+<div class="highlight"><pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#ff7b72">from</span> <span style="color:#ff7b72">canals</span> <span style="color:#ff7b72">import</span> component
+</span></span><span style="display:flex;"><span>
+</span></span><span style="display:flex;"><span><span style="color:#d2a8ff;font-weight:bold">@component</span>
+</span></span><span style="display:flex;"><span><span style="color:#ff7b72">class</span> <span style="color:#f0883e;font-weight:bold">Double</span>:
+</span></span><span style="display:flex;"><span>
+</span></span><span style="display:flex;"><span> <span style="color:#d2a8ff;font-weight:bold">@component.output_types</span>(result<span style="color:#ff7b72;font-weight:bold">=</span>int)
+</span></span><span style="display:flex;"><span> <span style="color:#ff7b72">def</span> <span style="color:#d2a8ff;font-weight:bold">run</span>(self, value: int):
+</span></span><span style="display:flex;"><span> <span style="color:#ff7b72">return</span> {<span style="color:#a5d6ff">"result"</span>: value <span style="color:#ff7b72;font-weight:bold">*</span> <span style="color:#a5d6ff">2</span>}
+</span></span></code></pre></div><h3 id="pipeline-validation">
+ Pipeline Validation
+ <a class="heading-link" href="#pipeline-validation">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h3>
+<p>Note how the <code>run()</code> method has a few peculiar features. One is that all the method parameters need to be typed: if <code>value</code> was not declared as <code>value: int</code>, the pipeline would raise an exception demanding type hints.</p>
+<p>This is the way components declare to the pipeline which inputs they expect and of which type: this is the first half of the information needed to perform the validation that <code>Pipeline.connect()</code> carries out.</p>
+<p>The other half of the information comes from the <code>@component.output_types</code> decorator. Pipelines demand that components declare how many outputs they will produce and of what type. One may ask why we don’t rely on typing for the outputs, just as we do for the inputs, and simply declare components as:</p>
+<div class="highlight"><pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#d2a8ff;font-weight:bold">@component</span>
+</span></span><span style="display:flex;"><span><span style="color:#ff7b72">class</span> <span style="color:#f0883e;font-weight:bold">Double</span>:
+</span></span><span style="display:flex;"><span>
+</span></span><span style="display:flex;"><span> <span style="color:#ff7b72">def</span> <span style="color:#d2a8ff;font-weight:bold">run</span>(self, value: int) <span style="color:#ff7b72;font-weight:bold">-></span> int:
+</span></span><span style="display:flex;"><span> <span style="color:#ff7b72">return</span> value <span style="color:#ff7b72;font-weight:bold">*</span> <span style="color:#a5d6ff">2</span>
+</span></span></code></pre></div><p>For <code>Double</code>, this is a legitimate solution. However, let’s see an example with another component called <code>CheckParity</code>: if a component’s input value is even, it sends it unchanged over the <code>even</code> output, while if it’s odd, it will send it over the <code>odd</code> output. The following clearly doesn’t work: nowhere are we telling Canals which output is even and which one is odd.</p>
+<div class="highlight"><pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#d2a8ff;font-weight:bold">@component</span>
+</span></span><span style="display:flex;"><span><span style="color:#ff7b72">class</span> <span style="color:#f0883e;font-weight:bold">CheckParity</span>:
+</span></span><span style="display:flex;"><span>
+</span></span><span style="display:flex;"><span> <span style="color:#ff7b72">def</span> <span style="color:#d2a8ff;font-weight:bold">run</span>(self, value: int) <span style="color:#ff7b72;font-weight:bold">-></span> int:
+</span></span><span style="display:flex;"><span> <span style="color:#ff7b72">if</span> value <span style="color:#ff7b72;font-weight:bold">%</span> <span style="color:#a5d6ff">2</span> <span style="color:#ff7b72;font-weight:bold">==</span> <span style="color:#a5d6ff">0</span>:
+</span></span><span style="display:flex;"><span> <span style="color:#ff7b72">return</span> value
+</span></span><span style="display:flex;"><span> <span style="color:#ff7b72">return</span> value
+</span></span></code></pre></div><p>How about this instead?</p>
+<div class="highlight"><pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#d2a8ff;font-weight:bold">@component</span>
+</span></span><span style="display:flex;"><span><span style="color:#ff7b72">class</span> <span style="color:#f0883e;font-weight:bold">CheckParity</span>:
+</span></span><span style="display:flex;"><span>
+</span></span><span style="display:flex;"><span> <span style="color:#ff7b72">def</span> <span style="color:#d2a8ff;font-weight:bold">run</span>(self, value: int) <span style="color:#ff7b72;font-weight:bold">-></span> Dict[str, int]:
+</span></span><span style="display:flex;"><span> <span style="color:#ff7b72">if</span> value <span style="color:#ff7b72;font-weight:bold">%</span> <span style="color:#a5d6ff">2</span> <span style="color:#ff7b72;font-weight:bold">==</span> <span style="color:#a5d6ff">0</span>:
+</span></span><span style="display:flex;"><span> <span style="color:#ff7b72">return</span> {<span style="color:#a5d6ff">"even"</span>: value}
+</span></span><span style="display:flex;"><span> <span style="color:#ff7b72">return</span> {<span style="color:#a5d6ff">"odd"</span>: value}
+</span></span></code></pre></div><p>This approach carries all the information required. However, such information is only available after the <code>run()</code> method is called. Unless we parse the method to discover all return statements and their keys (which is not always possible), pipelines cannot know all the keys the return dictionary may have. So, it can’t validate the connections when <code>Pipeline.connect()</code> is called.</p>
+<p>The decorator bridges the gap by allowing the class to declare in advance what outputs it will produce and of which type. Pipeline trusts this information to be correct and validates the connections accordingly.</p>
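+<p>Following the decorator pattern shown earlier, a sketch of <code>CheckParity</code> that declares both outputs could look like this:</p>
+<div class="highlight"><pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python">from canals import component
+
+
+@component
+class CheckParity:
+
+    # Declare both possible outputs and their types up front, so the pipeline
+    # can validate connections before run() is ever called.
+    @component.output_types(even=int, odd=int)
+    def run(self, value: int):
+        if value % 2 == 0:
+            return {"even": value}
+        return {"odd": value}
+</code></pre></div>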
+<p>Okay, but what if the component is very dynamic? The output type may depend on the input type. Perhaps the number of inputs depends on some initialization parameter. In these cases, pipelines allow components to declare their input and output types in their <code>__init__</code> method, as follows:</p>
+<div class="highlight"><pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#d2a8ff;font-weight:bold">@component</span>
+</span></span><span style="display:flex;"><span><span style="color:#ff7b72">class</span> <span style="color:#f0883e;font-weight:bold">HighlyDynamicComponent</span>:
+</span></span><span style="display:flex;"><span>
+</span></span><span style="display:flex;"><span> <span style="color:#ff7b72">def</span> __init__(self, <span style="color:#ff7b72;font-weight:bold">...</span>):
+</span></span><span style="display:flex;"><span> component<span style="color:#ff7b72;font-weight:bold">.</span>set_input_types(self, input_name<span style="color:#ff7b72;font-weight:bold">=</span>input_type, <span style="color:#ff7b72;font-weight:bold">...</span>)
+</span></span><span style="display:flex;"><span> component<span style="color:#ff7b72;font-weight:bold">.</span>set_output_types(self, output_name<span style="color:#ff7b72;font-weight:bold">=</span>output_type, <span style="color:#ff7b72;font-weight:bold">...</span>)
+</span></span><span style="display:flex;"><span>
+</span></span><span style="display:flex;"><span> <span style="color:#ff7b72">def</span> <span style="color:#d2a8ff;font-weight:bold">run</span>(self, <span style="color:#ff7b72;font-weight:bold">**</span>kwargs):
+</span></span><span style="display:flex;"><span> <span style="color:#ff7b72;font-weight:bold">...</span>
+</span></span></code></pre></div><p>Note that there’s no more typing on <code>run()</code>, and the decorator is gone. The information provided in the init method is sufficient for the pipeline to validate the connections.</p>
+<p>One more feature of the inputs and output declarations relates to optional and variadic values. Pipelines in Haystack 2.0 support both through a mix of type checking and signature inspection. For example, let’s have a look at what the <code>AddFixedValue</code> component we’ve seen earlier looks like:</p>
+<div class="highlight"><pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#ff7b72">from</span> <span style="color:#ff7b72">typing</span> <span style="color:#ff7b72">import</span> Optional
+</span></span><span style="display:flex;"><span><span style="color:#ff7b72">from</span> <span style="color:#ff7b72">canals</span> <span style="color:#ff7b72">import</span> component
+</span></span><span style="display:flex;"><span>
+</span></span><span style="display:flex;"><span>
+</span></span><span style="display:flex;"><span><span style="color:#d2a8ff;font-weight:bold">@component</span>
+</span></span><span style="display:flex;"><span><span style="color:#ff7b72">class</span> <span style="color:#f0883e;font-weight:bold">AddFixedValue</span>:
+</span></span><span style="display:flex;"><span> <span style="color:#a5d6ff">"""
+</span></span></span><span style="display:flex;"><span><span style="color:#a5d6ff"> Adds two values together.
+</span></span></span><span style="display:flex;"><span><span style="color:#a5d6ff"> """</span>
+</span></span><span style="display:flex;"><span>
+</span></span><span style="display:flex;"><span> <span style="color:#ff7b72">def</span> __init__(self, add: int <span style="color:#ff7b72;font-weight:bold">=</span> <span style="color:#a5d6ff">1</span>):
+</span></span><span style="display:flex;"><span> self<span style="color:#ff7b72;font-weight:bold">.</span>add <span style="color:#ff7b72;font-weight:bold">=</span> add
+</span></span><span style="display:flex;"><span>
+</span></span><span style="display:flex;"><span> <span style="color:#d2a8ff;font-weight:bold">@component.output_types</span>(result<span style="color:#ff7b72;font-weight:bold">=</span>int)
+</span></span><span style="display:flex;"><span> <span style="color:#ff7b72">def</span> <span style="color:#d2a8ff;font-weight:bold">run</span>(self, value: int, add: Optional[int] <span style="color:#ff7b72;font-weight:bold">=</span> <span style="color:#79c0ff">None</span>):
+</span></span><span style="display:flex;"><span> <span style="color:#a5d6ff">"""
+</span></span></span><span style="display:flex;"><span><span style="color:#a5d6ff"> Adds two values together.
+</span></span></span><span style="display:flex;"><span><span style="color:#a5d6ff"> """</span>
+</span></span><span style="display:flex;"><span> <span style="color:#ff7b72">if</span> add <span style="color:#ff7b72;font-weight:bold">is</span> <span style="color:#79c0ff">None</span>:
+</span></span><span style="display:flex;"><span> add <span style="color:#ff7b72;font-weight:bold">=</span> self<span style="color:#ff7b72;font-weight:bold">.</span>add
+</span></span><span style="display:flex;"><span> <span style="color:#ff7b72">return</span> {<span style="color:#a5d6ff">"result"</span>: value <span style="color:#ff7b72;font-weight:bold">+</span> add}
+</span></span></code></pre></div><p>You can see that <code>add</code>, the optional parameter we met before, has a default value. Adding a default value to a parameter in the <code>run()</code> signature tells the pipeline that the parameter itself is optional, so the component can run even if that specific input doesn’t receive any value from the pipeline’s input or other components.</p>
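+<p>For instance, running the component above on its own, a quick illustration using the class exactly as defined:</p>
+<div class="highlight"><pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python">add_two = AddFixedValue(add=2)
+
+print(add_two.run(value=1))           # {"result": 3}: falls back to the value set at init time
+print(add_two.run(value=1, add=10))   # {"result": 11}: the optional runtime input overrides it
+</code></pre></div>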
+<p>Another component that generalizes the sum operation is <code>Sum</code>, which instead looks like this:</p>
+<div class="highlight"><pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#ff7b72">from</span> <span style="color:#ff7b72">canals</span> <span style="color:#ff7b72">import</span> component
+</span></span><span style="display:flex;"><span><span style="color:#ff7b72">from</span> <span style="color:#ff7b72">canals.component.types</span> <span style="color:#ff7b72">import</span> Variadic
+</span></span><span style="display:flex;"><span>
+</span></span><span style="display:flex;"><span><span style="color:#d2a8ff;font-weight:bold">@component</span>
+</span></span><span style="display:flex;"><span><span style="color:#ff7b72">class</span> <span style="color:#f0883e;font-weight:bold">Sum</span>:
+</span></span><span style="display:flex;"><span> <span style="color:#a5d6ff">"""
+</span></span></span><span style="display:flex;"><span><span style="color:#a5d6ff"> Adds all its inputs together.
+</span></span></span><span style="display:flex;"><span><span style="color:#a5d6ff"> """</span>
+</span></span><span style="display:flex;"><span>
+</span></span><span style="display:flex;"><span> <span style="color:#d2a8ff;font-weight:bold">@component.output_types</span>(total<span style="color:#ff7b72;font-weight:bold">=</span>int)
+</span></span><span style="display:flex;"><span> <span style="color:#ff7b72">def</span> <span style="color:#d2a8ff;font-weight:bold">run</span>(self, values: Variadic[int]):
+</span></span><span style="display:flex;"><span> <span style="color:#a5d6ff">"""
+</span></span></span><span style="display:flex;"><span><span style="color:#a5d6ff"> :param values: the values to sum
+</span></span></span><span style="display:flex;"><span><span style="color:#a5d6ff"> """</span>
+</span></span><span style="display:flex;"><span> <span style="color:#ff7b72">return</span> {<span style="color:#a5d6ff">"total"</span>: sum(v <span style="color:#ff7b72">for</span> v <span style="color:#ff7b72;font-weight:bold">in</span> values <span style="color:#ff7b72">if</span> v <span style="color:#ff7b72;font-weight:bold">is</span> <span style="color:#ff7b72;font-weight:bold">not</span> <span style="color:#79c0ff">None</span>)}
+</span></span></code></pre></div><p>In this case, we used the special type <code>Variadic</code> to tell the pipeline that the <code>values</code> input can receive data from multiple producers, instead of just one. Therefore, <code>values</code> is going to be a list type, but it can be connected to single <code>int</code> outputs, making it a valuable aggregator.</p>
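+<p>As a rough sketch of how this comes together, two producers can both be connected to the single <code>values</code> input of <code>Sum</code>. Note that the <code>add_component</code> method name and the shape of the <code>run()</code> payload below are assumptions based on the API used elsewhere in this series, not verbatim from this post.</p>
+<div class="highlight"><pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python">from canals import Pipeline
+
+pipeline = Pipeline()
+pipeline.add_component("add_one", AddFixedValue(add=1))
+pipeline.add_component("add_two", AddFixedValue(add=2))
+pipeline.add_component("sum", Sum())
+
+# Both producers feed the same variadic "values" input of Sum.
+pipeline.connect("add_one.result", "sum.values")
+pipeline.connect("add_two.result", "sum.values")
+
+results = pipeline.run({"add_one": {"value": 1}, "add_two": {"value": 1}})
+print(results)  # expected: {"sum": {"total": 5}}
+</code></pre></div>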
+<h2 id="serialization">
+ Serialization
+ <a class="heading-link" href="#serialization">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h2>
+<p>Just like old Haystack Pipelines, the new pipelines can be serialized. However, this feature suffered from problems similar to those plaguing the execution model, so it was changed radically.</p>
+<p>The original pipeline intrusively gathered information about each of its components when initialized, leveraging the shared <code>BaseComponent</code> class. Conversely, the new <code>Pipeline</code> delegates the serialization process entirely to its components.</p>
+<p>If a component wishes to be serializable, it must provide two additional methods, <code>to_dict</code> and <code>from_dict</code>, which perform serialization to and deserialization from a dictionary. The pipeline limits itself to calling each of its components’ methods, collecting their output, grouping them together with some limited extra information (such as the connections between them), and returning the result.</p>
+<p>For example, if <code>AddFixedValue</code> were serializable, its serialized version could look like this:</p>
+<div class="highlight"><pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span>{
+</span></span><span style="display:flex;"><span> <span style="color:#a5d6ff">"type"</span>: <span style="color:#a5d6ff">"AddFixedValue"</span>,
+</span></span><span style="display:flex;"><span> <span style="color:#a5d6ff">"init_parameters"</span>: {
+</span></span><span style="display:flex;"><span> <span style="color:#a5d6ff">"add"</span>: <span style="color:#a5d6ff">1</span>
+</span></span><span style="display:flex;"><span> }
+</span></span><span style="display:flex;"><span>}
+</span></span></code></pre></div><p>The entire pipeline we used above would end up as follows:</p>
+<div class="highlight"><pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span>{
+</span></span><span style="display:flex;"><span> <span style="color:#a5d6ff">"max_loops_allowed"</span>: <span style="color:#a5d6ff">100</span>,
+</span></span><span style="display:flex;"><span> <span style="color:#a5d6ff">"components"</span>: {
+</span></span><span style="display:flex;"><span> <span style="color:#a5d6ff">"add_one"</span>: {
+</span></span><span style="display:flex;"><span> <span style="color:#a5d6ff">"type"</span>: <span style="color:#a5d6ff">"AddFixedValue"</span>,
+</span></span><span style="display:flex;"><span> <span style="color:#a5d6ff">"init_parameters"</span>: {
+</span></span><span style="display:flex;"><span> <span style="color:#a5d6ff">"add"</span>: <span style="color:#a5d6ff">1</span>
+</span></span><span style="display:flex;"><span> }
+</span></span><span style="display:flex;"><span> },
+</span></span><span style="display:flex;"><span> <span style="color:#a5d6ff">"add_two"</span>: {
+</span></span><span style="display:flex;"><span> <span style="color:#a5d6ff">"type"</span>: <span style="color:#a5d6ff">"AddFixedValue"</span>,
+</span></span><span style="display:flex;"><span> <span style="color:#a5d6ff">"init_parameters"</span>: {
+</span></span><span style="display:flex;"><span> <span style="color:#a5d6ff">"add"</span>: <span style="color:#a5d6ff">2</span>
+</span></span><span style="display:flex;"><span> }
+</span></span><span style="display:flex;"><span> }
+</span></span><span style="display:flex;"><span> },
+</span></span><span style="display:flex;"><span> <span style="color:#a5d6ff">"connections"</span>: [
+</span></span><span style="display:flex;"><span> {
+</span></span><span style="display:flex;"><span> <span style="color:#a5d6ff">"sender"</span>: <span style="color:#a5d6ff">"add_one.result"</span>,
+</span></span><span style="display:flex;"><span> <span style="color:#a5d6ff">"receiver"</span>: <span style="color:#a5d6ff">"add_two.value"</span>,
+</span></span><span style="display:flex;"><span> }
+</span></span><span style="display:flex;"><span> ]
+</span></span><span style="display:flex;"><span>}
+</span></span></code></pre></div><p>Notice how the components are free to perform serialization in the way they see fit. The only requirement imposed by the <code>Pipeline</code> is the presence of two top-level keys, <code>type</code> and <code>init_parameters</code>, which are necessary for the pipeline to deserialize each component into the correct class.</p>
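+<p>As a minimal, hand-written sketch (illustrative, not the actual implementation), a pair of <code>to_dict</code> and <code>from_dict</code> methods that satisfy this requirement could look like this for <code>AddFixedValue</code>:</p>
+<div class="highlight"><pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python">    # Inside the AddFixedValue class shown earlier
+
+    def to_dict(self):
+        # The only hard requirement: the "type" and "init_parameters" top-level keys.
+        return {"type": "AddFixedValue", "init_parameters": {"add": self.add}}
+
+    @classmethod
+    def from_dict(cls, data):
+        return cls(**data["init_parameters"])
+</code></pre></div>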
+<p>This is useful, especially if the component’s state includes some non-trivial values, such as objects, API keys, or other special values. Pipeline no longer needs to know how to serialize everything the Components may contain: the task is fully delegated to the components themselves, which always know best what needs to be done.</p>
+<h2 id="but-do-we-need-any-of-this">
+ But… do we need any of this?
+ <a class="heading-link" href="#but-do-we-need-any-of-this">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h2>
+<p>Having done a tour of the new <code>Pipeline</code> features, you might have noticed one detail. There’s a bit more work involved in using a Pipeline than there was before: you can’t just chain every component after every other. There are connections to be made, validation to perform, graphs to assemble, and so on.</p>
+<p>In exchange, the pipeline is now way more powerful than before. Sure, but so is a plain Python script. Do we <em>really</em> need the Pipeline object? And what do we need it for?</p>
+<ul>
+<li>
+<p><strong>Validation</strong>. While components normally validate their inputs and outputs, the pipeline does all the validation before the components run, even before loading heavy resources. This makes the whole system far less likely to fail at runtime for a simple input/output mismatch, which can be priceless for complex applications.</p>
+</li>
+<li>
+<p><strong>Serialization</strong>. Redistributing code is always tricky: redistributing a JSON file is much safer. Pipelines make it possible to represent complex systems in a readable JSON file that can be edited, shared, stored, deployed, and re-deployed on different backends as needed.</p>
+</li>
+<li>
+<p><strong>Drawing</strong>. The new Pipeline offers a way to visualize your system clearly and automatically, which is often very handy for debugging, inspecting the system, and collaborating on the pipeline’s design.</p>
+</li>
+<li>
+<p>On top of this, the pipeline abstraction promotes flatter API surfaces by discouraging components from nesting within one another and by encouraging easy-to-use, single-responsibility components that are simple to reason about.</p>
+</li>
+</ul>
+<p>Having said all of this, however, we don’t believe that the pipeline design makes Haystack win or lose. Pipelines are just a bonus on top of what provides the real value: a broad set of components that reliably perform well-defined tasks. That’s why the Component API does not make the <code>run()</code> method awkward to use outside of a Pipeline: calling <code>Sum.run(values=[1, 2, 3])</code> feels Pythonic on its own, and always will.</p>
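+<p>For instance, using the <code>Sum</code> component defined above directly, with no Pipeline in sight:</p>
+<div class="highlight"><pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python">summer = Sum()
+print(summer.run(values=[1, 2, 3]))  # {"total": 6}
+</code></pre></div>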
+<p>In the following posts, I will explore the world of Haystack components, starting from our now familiar use case: RAG Pipelines.</p>
+<hr>
+<p><em>Next: <a href="https://www.zansara.dev/posts/2023-10-27-haystack-series-rag" >RAG Pipelines from scratch</a></em></p>
+<p><em>Previous: <a href="https://www.zansara.dev/posts/2023-10-15-haystack-series-pipeline" >Haystack’s Pipeline</a></em></p>
+<p><em>See the entire series here: <a href="https://www.zansara.dev/series/haystack-2.0-series/" >Haystack 2.0 series</a></em></p>
+
+
+
+
+ Haystack's Pipeline - A Deep Dive
+ https://www.zansara.dev/posts/2023-10-15-haystack-series-pipeline/
+ Sun, 15 Oct 2023 00:00:00 +0000
+
+ https://www.zansara.dev/posts/2023-10-15-haystack-series-pipeline/
+ <p>If you’ve ever looked at Haystack before, you must have come across the <a href="https://docs.haystack.deepset.ai/docs/pipelines" class="external-link" target="_blank" rel="noopener">Pipeline</a>, one of the most prominent concepts of the framework. However, this abstraction is by no means an obvious choice when it comes to NLP libraries. Why did we adopt this concept, and what does it bring us?</p>
+<p>In this post, I go into all the details of how the Pipeline abstraction works in Haystack now, why it works this way, and its strengths and weaknesses. This deep dive into the current state of the framework is also a premise for the next episode, where I will explain how Haystack 2.0 addresses this version’s shortcomings.</p>
+<p>If you think you already know how Haystack Pipelines work, give this post a chance: I might manage to change your mind.</p>
+<h2 id="a-bit-of-history">
+ A Bit Of History
+ <a class="heading-link" href="#a-bit-of-history">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h2>
+<p>Interestingly, in the very first releases of Haystack, Pipelines were not a thing. Version 0.1.0 was released with a simpler object, the <a href="https://github.com/deepset-ai/haystack/blob/d2c77f307788899eb562d3cb6e42c69b968b9f2a/haystack/__init__.py#L16" class="external-link" target="_blank" rel="noopener">Finder</a>, which did little more than glue together a <a href="https://docs.haystack.deepset.ai/docs/retriever" class="external-link" target="_blank" rel="noopener">Retriever</a> and a <a href="https://docs.haystack.deepset.ai/docs/reader" class="external-link" target="_blank" rel="noopener">Reader</a>, the two fundamental building blocks of a <a href="https://docs.haystack.deepset.ai/docs/glossary#semantic-search" class="external-link" target="_blank" rel="noopener">semantic search</a> application.</p>
+<p>In the next few months, however, the capabilities of language models expanded to enable many more use cases. One hot topic was <a href="https://haystack.deepset.ai/blog/hybrid-retrieval" class="external-link" target="_blank" rel="noopener">hybrid retrieval</a>: a system composed of two different Retrievers, an optional <a href="https://docs.haystack.deepset.ai/docs/ranker" class="external-link" target="_blank" rel="noopener">Ranker</a>, and an optional Reader. This kind of application clearly didn’t fit the Finder’s design, so in <a href="https://github.com/deepset-ai/haystack/releases/tag/v0.6.0" class="external-link" target="_blank" rel="noopener">version 0.6.0</a> the <a href="https://docs.haystack.deepset.ai/docs/pipelines" class="external-link" target="_blank" rel="noopener">Pipeline</a> object was introduced: a new abstraction that helped users build applications as a graph of components.</p>
+<p>Pipeline’s API was a huge step forward from Finder. It instantly enabled seemingly endless combinations of components, unlocked almost every conceivable use case, and became a foundational Haystack concept meant to stay for a very long time. In fact, the API offered by the first version of Pipeline has changed very little since its initial release.</p>
+<p>This is the snippet included in the release notes of version 0.6.0 to showcase hybrid retrieval. Does it look familiar?</p>
+<div class="highlight"><pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span>p <span style="color:#ff7b72;font-weight:bold">=</span> Pipeline()
+</span></span><span style="display:flex;"><span>p<span style="color:#ff7b72;font-weight:bold">.</span>add_node(component<span style="color:#ff7b72;font-weight:bold">=</span>es_retriever, name<span style="color:#ff7b72;font-weight:bold">=</span><span style="color:#a5d6ff">"ESRetriever"</span>, inputs<span style="color:#ff7b72;font-weight:bold">=</span>[<span style="color:#a5d6ff">"Query"</span>])
+</span></span><span style="display:flex;"><span>p<span style="color:#ff7b72;font-weight:bold">.</span>add_node(component<span style="color:#ff7b72;font-weight:bold">=</span>dpr_retriever, name<span style="color:#ff7b72;font-weight:bold">=</span><span style="color:#a5d6ff">"DPRRetriever"</span>, inputs<span style="color:#ff7b72;font-weight:bold">=</span>[<span style="color:#a5d6ff">"Query"</span>])
+</span></span><span style="display:flex;"><span>p<span style="color:#ff7b72;font-weight:bold">.</span>add_node(component<span style="color:#ff7b72;font-weight:bold">=</span>JoinDocuments(join_mode<span style="color:#ff7b72;font-weight:bold">=</span><span style="color:#a5d6ff">"concatenate"</span>), name<span style="color:#ff7b72;font-weight:bold">=</span><span style="color:#a5d6ff">"JoinResults"</span>, inputs<span style="color:#ff7b72;font-weight:bold">=</span>[<span style="color:#a5d6ff">"ESRetriever"</span>, <span style="color:#a5d6ff">"DPRRetriever"</span>])
+</span></span><span style="display:flex;"><span>p<span style="color:#ff7b72;font-weight:bold">.</span>add_node(component<span style="color:#ff7b72;font-weight:bold">=</span>reader, name<span style="color:#ff7b72;font-weight:bold">=</span><span style="color:#a5d6ff">"QAReader"</span>, inputs<span style="color:#ff7b72;font-weight:bold">=</span>[<span style="color:#a5d6ff">"JoinResults"</span>])
+</span></span><span style="display:flex;"><span>res <span style="color:#ff7b72;font-weight:bold">=</span> p<span style="color:#ff7b72;font-weight:bold">.</span>run(query<span style="color:#ff7b72;font-weight:bold">=</span><span style="color:#a5d6ff">"What did Einstein work on?"</span>, top_k_retriever<span style="color:#ff7b72;font-weight:bold">=</span><span style="color:#a5d6ff">1</span>)
+</span></span></code></pre></div><h2 id="a-powerful-abstraction">
+ A Powerful Abstraction
+ <a class="heading-link" href="#a-powerful-abstraction">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h2>
+<p>One fascinating aspect of this Pipeline model is the simplicity of its user-facing API. In almost all examples, you see only two or three methods used:</p>
+<ul>
+<li><code>add_node</code>: to add a component to the graph and connect it to the others.</li>
+<li><code>run</code>: to run the Pipeline from start to finish.</li>
+<li><code>draw</code>: to draw the graph of the Pipeline to an image.</li>
+</ul>
+<p>At this level, users don’t need to know what kind of data the components need to function, what they produce, or even what the components <em>do</em>: all they need to know is the place they must occupy in the graph for the system to work.</p>
+<p>For example, as long as the users know that their hybrid retrieval pipeline should look more or less like this (note: this is the output of <code>Pipeline.draw()</code>), translating it into a Haystack Pipeline object using a few <code>add_node</code> calls is mostly straightforward.</p>
+<p><img src="https://www.zansara.dev/posts/2023-10-15-haystack-series-pipeline/hybrid-retrieval.png" alt="Hybrid Retrieval"></p>
+<p>This fact is reflected by the documentation of the various components as well. For example, this is how the documentation page for Ranker opens:</p>
+<p><img src="https://www.zansara.dev/posts/2023-10-15-haystack-series-pipeline/ranker-docs.png" alt="Ranker Documentation"></p>
+<p>Note how the first piece of information about this component is <em>where to place it</em>. Right after, it specifies its inputs and outputs, even though it’s not immediately clear why we need this information, and then it lists which specific classes can cover the role of a Ranker.</p>
+<p>The message is clear: all Ranker classes are functionally interchangeable, and as long as you place them correctly in the Pipeline, they will fulfill the function of Ranker as you expect them to. Users don’t need to understand what distinguishes <code>CohereRanker</code> from <code>RecentnessReranker</code> unless they want to: the documentation promises that you can swap them safely, and thanks to the Pipeline abstraction, this statement mostly holds true.</p>
+<h2 id="ready-made-pipelines">
+ Ready-made Pipelines
+ <a class="heading-link" href="#ready-made-pipelines">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h2>
+<p>But how can the users know which sort of graph they have to build?</p>
+<p>Most NLP applications are made of a relatively limited number of high-level components: Retrievers, Readers, Rankers, plus the occasional Classifier, Translator, or Summarizer. Systems requiring something more than these components used to be really rare, at least when talking about “query” pipelines (more on this later).</p>
+<p>Therefore, at this level of abstraction, there are just a few graph topologies possible. Better yet, they could each be mapped to high-level use cases such as semantic search, language-agnostic document search, hybrid retrieval, and so on.</p>
+<p>But the crucial point is that, in most cases, tailoring the application did not require any changes to the graph’s shape. Users only needed to identify their use case, find an example or a tutorial defining the shape of the Pipeline they needed, and then swap the individual components with other instances from the same category until they found the best combination for their exact requirements.</p>
+<p>This workflow was evident and encouraged: it was the philosophy behind Finder as well, and from version 0.6.0, Haystack immediately provided what are called “<a href="https://docs.haystack.deepset.ai/docs/ready_made_pipelines" class="external-link" target="_blank" rel="noopener">Ready-made Pipelines</a>”: objects that initialized the graph on the user’s behalf and expected as input the components to place at each point of the graph (for example, a Reader and a Retriever in the case of simple Extractive QA).</p>
+<p>With this further abstraction on top of Pipeline, creating an NLP application became an action that doesn’t even require the user to be aware of the existence of the graph. In fact:</p>
+<div class="highlight"><pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span>pipeline <span style="color:#ff7b72;font-weight:bold">=</span> ExtractiveQAPipeline(reader, retriever)
+</span></span></code></pre></div><p>is enough to get your Extractive QA application ready to answer your questions. And you can ask them with just one more line.</p>
+<div class="highlight"><pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span>answers <span style="color:#ff7b72;font-weight:bold">=</span> pipeline<span style="color:#ff7b72;font-weight:bold">.</span>run(query<span style="color:#ff7b72;font-weight:bold">=</span><span style="color:#a5d6ff">"What did Einstein work on?"</span>)
+</span></span></code></pre></div><h2 id="flexibility-powered-by-dags">
+ “Flexibility powered by DAGs”
+ <a class="heading-link" href="#flexibility-powered-by-dags">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h2>
+<p>This abstraction is extremely powerful for the use cases it was designed for. There are a few layers of abstraction, trading ease of use for customization, that users can choose from depending on their expertise, and that help them progress from a simple ready-made Pipeline to fully custom graphs.</p>
+<p>However, the focus was so strongly oriented toward the initial stages of the user’s journey that power users’ needs were sometimes forgotten. Such issues didn’t show up immediately, but they quickly added friction as soon as users tried to customize their system beyond the examples from the tutorials and the documentation.</p>
+<p>For an example of these issues, let’s talk about pipelines with branches. Here are two small, apparently very similar pipelines.</p>
+<p><img src="https://www.zansara.dev/posts/2023-10-15-haystack-series-pipeline/branching-query-pipelines.png" alt="Query Classification vs Hybrid Retrieval"></p>
+<p>The first Pipeline represents the Hybrid Retrieval use case we’ve met before. Here, the Query node sends its output to both retrievers, and they both produce some output. For the Reader to make sense of this data, we need a Join node that merges the two lists into one and a Ranker that takes the merged list and sorts it again by similarity to the query. Ranker then sends the rearranged list to the Reader.</p>
+<p>The second Pipeline instead performs a simpler form of Hybrid Retrieval. Here, the Query node sends its outputs to a Query Classifier, which then triggers only one of the two retrievers, the one that is expected to perform better on it. The triggered Retriever then sends its output directly to the Reader, which doesn’t need to know which Retriever the data comes from. So, in this case, we don’t need the Join node.</p>
+<p>The two pipelines are built as you would expect, with a bunch of <code>add_node</code> calls. You can even run them with identical code, the same code needed for every other Pipeline we’ve seen so far.</p>
+<div class="highlight"><pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span>pipeline_1 <span style="color:#ff7b72;font-weight:bold">=</span> Pipeline()
+</span></span><span style="display:flex;"><span>pipeline_1<span style="color:#ff7b72;font-weight:bold">.</span>add_node(component<span style="color:#ff7b72;font-weight:bold">=</span>sparse_retriever, name<span style="color:#ff7b72;font-weight:bold">=</span><span style="color:#a5d6ff">"SparseRetriever"</span>, inputs<span style="color:#ff7b72;font-weight:bold">=</span>[<span style="color:#a5d6ff">"Query"</span>])
+</span></span><span style="display:flex;"><span>pipeline_1<span style="color:#ff7b72;font-weight:bold">.</span>add_node(component<span style="color:#ff7b72;font-weight:bold">=</span>dense_retriever, name<span style="color:#ff7b72;font-weight:bold">=</span><span style="color:#a5d6ff">"DenseRetriever"</span>, inputs<span style="color:#ff7b72;font-weight:bold">=</span>[<span style="color:#a5d6ff">"Query"</span>])
+</span></span><span style="display:flex;"><span>pipeline_1<span style="color:#ff7b72;font-weight:bold">.</span>add_node(component<span style="color:#ff7b72;font-weight:bold">=</span>join_documents, name<span style="color:#ff7b72;font-weight:bold">=</span><span style="color:#a5d6ff">"JoinDocuments"</span>, inputs<span style="color:#ff7b72;font-weight:bold">=</span>[<span style="color:#a5d6ff">"SparseRetriever"</span>, <span style="color:#a5d6ff">"DenseRetriever"</span>])
+</span></span><span style="display:flex;"><span>pipeline_1<span style="color:#ff7b72;font-weight:bold">.</span>add_node(component<span style="color:#ff7b72;font-weight:bold">=</span>rerank, name<span style="color:#ff7b72;font-weight:bold">=</span><span style="color:#a5d6ff">"Ranker"</span>, inputs<span style="color:#ff7b72;font-weight:bold">=</span>[<span style="color:#a5d6ff">"JoinDocuments"</span>])
+</span></span><span style="display:flex;"><span>pipeline_1<span style="color:#ff7b72;font-weight:bold">.</span>add_node(component<span style="color:#ff7b72;font-weight:bold">=</span>reader, name<span style="color:#ff7b72;font-weight:bold">=</span><span style="color:#a5d6ff">"Reader"</span>, inputs<span style="color:#ff7b72;font-weight:bold">=</span>[<span style="color:#a5d6ff">"SparseRetriever"</span>, <span style="color:#a5d6ff">"DenseRetriever"</span>])
+</span></span><span style="display:flex;"><span>
+</span></span><span style="display:flex;"><span>answers <span style="color:#ff7b72;font-weight:bold">=</span> pipeline_1<span style="color:#ff7b72;font-weight:bold">.</span>run(query<span style="color:#ff7b72;font-weight:bold">=</span><span style="color:#a5d6ff">"What did Einstein work on?"</span>)
+</span></span></code></pre></div><div class="highlight"><pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span>pipeline_2 <span style="color:#ff7b72;font-weight:bold">=</span> Pipeline()
+</span></span><span style="display:flex;"><span>pipeline_2<span style="color:#ff7b72;font-weight:bold">.</span>add_node(component<span style="color:#ff7b72;font-weight:bold">=</span>query_classifier, name<span style="color:#ff7b72;font-weight:bold">=</span><span style="color:#a5d6ff">"QueryClassifier"</span>, inputs<span style="color:#ff7b72;font-weight:bold">=</span>[<span style="color:#a5d6ff">"Query"</span>])
+</span></span><span style="display:flex;"><span>pipeline_2<span style="color:#ff7b72;font-weight:bold">.</span>add_node(component<span style="color:#ff7b72;font-weight:bold">=</span>sparse_retriever, name<span style="color:#ff7b72;font-weight:bold">=</span><span style="color:#a5d6ff">"SparseRetriever"</span>, inputs<span style="color:#ff7b72;font-weight:bold">=</span>[<span style="color:#a5d6ff">"QueryClassifier"</span>])
+</span></span><span style="display:flex;"><span>pipeline_2<span style="color:#ff7b72;font-weight:bold">.</span>add_node(component<span style="color:#ff7b72;font-weight:bold">=</span>dense_retriever, name<span style="color:#ff7b72;font-weight:bold">=</span><span style="color:#a5d6ff">"DenseRetriever"</span>, inputs<span style="color:#ff7b72;font-weight:bold">=</span>[<span style="color:#a5d6ff">"QueryClassifier"</span>])
+</span></span><span style="display:flex;"><span>pipeline_2<span style="color:#ff7b72;font-weight:bold">.</span>add_node(component<span style="color:#ff7b72;font-weight:bold">=</span>reader, name<span style="color:#ff7b72;font-weight:bold">=</span><span style="color:#a5d6ff">"Reader"</span>, inputs<span style="color:#ff7b72;font-weight:bold">=</span>[<span style="color:#a5d6ff">"SparseRetriever"</span>, <span style="color:#a5d6ff">"DenseRetriever"</span>])
+</span></span><span style="display:flex;"><span>
+</span></span><span style="display:flex;"><span>answers <span style="color:#ff7b72;font-weight:bold">=</span> pipeline_2<span style="color:#ff7b72;font-weight:bold">.</span>run(query<span style="color:#ff7b72;font-weight:bold">=</span><span style="color:#a5d6ff">"What did Einstein work on?"</span>)
+</span></span></code></pre></div><p>Both pipelines run as you would expect them to. Hooray! Pipelines can branch and join!</p>
+<p>Now, let’s take the first Pipeline and customize it further.</p>
+<p>For example, imagine we want to expand language support to include French. The dense Retriever has no issues handling several languages as long as we select a multilingual model; however, the sparse Retriever needs the keywords to match, so we must translate the queries to English to find some relevant documents in our English-only knowledge base.</p>
+<p>Here is what the Pipeline ends up looking like. Language Classifier sends all French queries over <code>output_1</code> and all English queries over <code>output_2</code>. In this way, the query passes through the Translator node only if it is written in French.</p>
+<p><img src="https://www.zansara.dev/posts/2023-10-15-haystack-series-pipeline/multilingual-hybrid-retrieval.png" alt="Multilingual Hybrid Retrieval"></p>
+<div class="highlight"><pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span>pipeline <span style="color:#ff7b72;font-weight:bold">=</span> Pipeline()
+</span></span><span style="display:flex;"><span>pipeline<span style="color:#ff7b72;font-weight:bold">.</span>add_node(component<span style="color:#ff7b72;font-weight:bold">=</span>language_classifier, name<span style="color:#ff7b72;font-weight:bold">=</span><span style="color:#a5d6ff">"LanguageClassifier"</span>, inputs<span style="color:#ff7b72;font-weight:bold">=</span>[<span style="color:#a5d6ff">"Query"</span>])
+</span></span><span style="display:flex;"><span>pipeline<span style="color:#ff7b72;font-weight:bold">.</span>add_node(component<span style="color:#ff7b72;font-weight:bold">=</span>translator, name<span style="color:#ff7b72;font-weight:bold">=</span><span style="color:#a5d6ff">"Translator"</span>, inputs<span style="color:#ff7b72;font-weight:bold">=</span>[<span style="color:#a5d6ff">"LanguageClassifier.output_1"</span>])
+</span></span><span style="display:flex;"><span>pipeline<span style="color:#ff7b72;font-weight:bold">.</span>add_node(component<span style="color:#ff7b72;font-weight:bold">=</span>sparse_retriever, name<span style="color:#ff7b72;font-weight:bold">=</span><span style="color:#a5d6ff">"SparseRetriever"</span>, inputs<span style="color:#ff7b72;font-weight:bold">=</span>[<span style="color:#a5d6ff">"Translator"</span>, <span style="color:#a5d6ff">"LanguageClassifier.output_2"</span>])
+</span></span><span style="display:flex;"><span>pipeline<span style="color:#ff7b72;font-weight:bold">.</span>add_node(component<span style="color:#ff7b72;font-weight:bold">=</span>dense_retriever, name<span style="color:#ff7b72;font-weight:bold">=</span><span style="color:#a5d6ff">"DenseRetriever"</span>, inputs<span style="color:#ff7b72;font-weight:bold">=</span>[<span style="color:#a5d6ff">"LanguageClassifier.output_1"</span>, <span style="color:#a5d6ff">"LanguageClassifier.output_2"</span>])
+</span></span><span style="display:flex;"><span>pipeline<span style="color:#ff7b72;font-weight:bold">.</span>add_node(component<span style="color:#ff7b72;font-weight:bold">=</span>join_documents, name<span style="color:#ff7b72;font-weight:bold">=</span><span style="color:#a5d6ff">"JoinDocuments"</span>, inputs<span style="color:#ff7b72;font-weight:bold">=</span>[<span style="color:#a5d6ff">"SparseRetriever"</span>, <span style="color:#a5d6ff">"DenseRetriever"</span>])
+</span></span><span style="display:flex;"><span>pipeline<span style="color:#ff7b72;font-weight:bold">.</span>add_node(component<span style="color:#ff7b72;font-weight:bold">=</span>rerank, name<span style="color:#ff7b72;font-weight:bold">=</span><span style="color:#a5d6ff">"Ranker"</span>, inputs<span style="color:#ff7b72;font-weight:bold">=</span>[<span style="color:#a5d6ff">"JoinDocuments"</span>])
+</span></span><span style="display:flex;"><span>pipeline<span style="color:#ff7b72;font-weight:bold">.</span>add_node(component<span style="color:#ff7b72;font-weight:bold">=</span>reader, name<span style="color:#ff7b72;font-weight:bold">=</span><span style="color:#a5d6ff">"Reader"</span>, inputs<span style="color:#ff7b72;font-weight:bold">=</span>[<span style="color:#a5d6ff">"Ranker"</span>])
+</span></span></code></pre></div><p>But… wait. Let’s look again at the graph and at the code. DenseRetriever should receive <em>two</em> inputs from Language Classifier: both <code>output_1</code> and <code>output_2</code>, because it can handle both languages. What’s going on? Is this a bug in <code>draw()</code>?</p>
+<p>Thanks to the <code>debug=True</code> parameter of <code>Pipeline.run()</code>, we start inspecting what each node saw during the execution, and we realize quickly that our worst fears are true: this is a bug in the Pipeline implementation. The underlying library powering the Pipeline’s graphs takes the definition of Directed Acyclic Graphs very seriously and does not allow two nodes to be connected by more than one edge. There are, of course, other graph classes supporting this case, but Haystack happens to use the wrong one.</p>
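+<p>For the curious, here is a tiny illustration of the behavior described above. It assumes a networkx <code>DiGraph</code> under the hood, which is the kind of graph class that silently collapses parallel edges; the snippet is illustrative and not taken from Haystack’s code.</p>
+<div class="highlight"><pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python">import networkx as nx
+
+graph = nx.DiGraph()
+graph.add_edge("LanguageClassifier", "DenseRetriever", label="output_1")
+# Adding a second edge between the same two nodes silently overwrites the first one.
+graph.add_edge("LanguageClassifier", "DenseRetriever", label="output_2")
+
+print(graph.number_of_edges())  # 1
+print(graph["LanguageClassifier"]["DenseRetriever"])  # {'label': 'output_2'}
+# A MultiDiGraph, by contrast, would have kept both edges.
+</code></pre></div>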
+<p>Interestingly, Pipeline doesn’t even notice the problem and does not fail. It runs as the drawing suggests: when the query happens to be in French, only the sparse Retriever will process it.</p>
+<p>Clearly, this is not good for us.</p>
+<p>Well, let’s look for a workaround. Given that we’re Haystack power users by now, we realize that we can use a Join node with a single input as a “no-op” node. If we put it along one of the edges, that edge won’t directly connect Language Classifier and Dense Retriever, so the bug should be solved.</p>
+<p>So here is our current Pipeline:</p>
+<p><img src="https://www.zansara.dev/posts/2023-10-15-haystack-series-pipeline/multilingual-hybrid-retrieval-with-noop.png" alt="Multilingual Hybrid Retrieval with No-Op Joiner"></p>
+<div class="highlight"><pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span>pipeline <span style="color:#ff7b72;font-weight:bold">=</span> Pipeline()
+</span></span><span style="display:flex;"><span>pipeline<span style="color:#ff7b72;font-weight:bold">.</span>add_node(component<span style="color:#ff7b72;font-weight:bold">=</span>language_classifier, name<span style="color:#ff7b72;font-weight:bold">=</span><span style="color:#a5d6ff">"LanguageClassifier"</span>, inputs<span style="color:#ff7b72;font-weight:bold">=</span>[<span style="color:#a5d6ff">"Query"</span>])
+</span></span><span style="display:flex;"><span>pipeline<span style="color:#ff7b72;font-weight:bold">.</span>add_node(component<span style="color:#ff7b72;font-weight:bold">=</span>translator, name<span style="color:#ff7b72;font-weight:bold">=</span><span style="color:#a5d6ff">"Translator"</span>, inputs<span style="color:#ff7b72;font-weight:bold">=</span>[<span style="color:#a5d6ff">"LanguageClassifier.output_1"</span>])
+</span></span><span style="display:flex;"><span>pipeline<span style="color:#ff7b72;font-weight:bold">.</span>add_node(component<span style="color:#ff7b72;font-weight:bold">=</span>sparse_retriever, name<span style="color:#ff7b72;font-weight:bold">=</span><span style="color:#a5d6ff">"SparseRetriever"</span>, inputs<span style="color:#ff7b72;font-weight:bold">=</span>[<span style="color:#a5d6ff">"Translator"</span>, <span style="color:#a5d6ff">"LanguageClassifier.output_2"</span>])
+</span></span><span style="display:flex;"><span>pipeline<span style="color:#ff7b72;font-weight:bold">.</span>add_node(component<span style="color:#ff7b72;font-weight:bold">=</span>no_op_join, name<span style="color:#ff7b72;font-weight:bold">=</span><span style="color:#a5d6ff">"NoOpJoin"</span>, inputs<span style="color:#ff7b72;font-weight:bold">=</span>[<span style="color:#a5d6ff">"LanguageClassifier.output_1"</span>])
+</span></span><span style="display:flex;"><span>pipeline<span style="color:#ff7b72;font-weight:bold">.</span>add_node(component<span style="color:#ff7b72;font-weight:bold">=</span>dense_retriever, name<span style="color:#ff7b72;font-weight:bold">=</span><span style="color:#a5d6ff">"DenseRetriever"</span>, inputs<span style="color:#ff7b72;font-weight:bold">=</span>[<span style="color:#a5d6ff">"NoOpJoin"</span>, <span style="color:#a5d6ff">"LanguageClassifier.output_2"</span>])
+</span></span><span style="display:flex;"><span>pipeline<span style="color:#ff7b72;font-weight:bold">.</span>add_node(component<span style="color:#ff7b72;font-weight:bold">=</span>join_documents, name<span style="color:#ff7b72;font-weight:bold">=</span><span style="color:#a5d6ff">"JoinDocuments"</span>, inputs<span style="color:#ff7b72;font-weight:bold">=</span>[<span style="color:#a5d6ff">"SparseRetriever"</span>, <span style="color:#a5d6ff">"DenseRetriever"</span>])
+</span></span><span style="display:flex;"><span>pipeline<span style="color:#ff7b72;font-weight:bold">.</span>add_node(component<span style="color:#ff7b72;font-weight:bold">=</span>rerank, name<span style="color:#ff7b72;font-weight:bold">=</span><span style="color:#a5d6ff">"Ranker"</span>, inputs<span style="color:#ff7b72;font-weight:bold">=</span>[<span style="color:#a5d6ff">"JoinDocuments"</span>])
+</span></span><span style="display:flex;"><span>pipeline<span style="color:#ff7b72;font-weight:bold">.</span>add_node(component<span style="color:#ff7b72;font-weight:bold">=</span>reader, name<span style="color:#ff7b72;font-weight:bold">=</span><span style="color:#a5d6ff">"Reader"</span>, inputs<span style="color:#ff7b72;font-weight:bold">=</span>[<span style="color:#a5d6ff">"Ranker"</span>])
+</span></span></code></pre></div><p>Great news: the Pipeline now runs as we expect! However, when we run a French query, the results are better but still surprisingly bad.</p>
+<p>What now? Is the dense Retriever still not running? Is the Translation node doing a poor job?</p>
+<p>After some debugging, we realize that the Translator is amazingly good and the Retrievers are both running. But we forgot another piece of the puzzle: Ranker needs the query to be in the same language as the documents. It requires the English version of the query, just like the sparse Retriever does. However, right now, it receives the original French query, and that’s the reason for the poor performance. We soon realize that this is also very important for the Reader.</p>
+<p>So… how does the Pipeline pass the query down to the Ranker?</p>
+<p>Until this point, we didn’t need to know exactly how values are passed from one component to the next. We didn’t need to care about their inputs and outputs at all: Pipeline was doing all this dirty work for us. Suddenly, we need to tell the Pipeline which query to pass to the Ranker, and we have no idea how to do that.</p>
+<p>Worse yet. There is <em>no way</em> to reliably do that. The documentation seems to blissfully ignore the topic, docstrings give us no pointers, and looking at <a href="https://github.com/deepset-ai/haystack/blob/aaee03aee87e96acd8791b9eff999055a8203237/haystack/pipelines/base.py#L483" class="external-link" target="_blank" rel="noopener">the routing code of Pipeline</a> we quickly get dizzy and give up. We dig through the Pipeline API several times until we’re confident that there’s nothing that can help.</p>
+<p>Well, there must be at least some workaround. Maybe we can forget about this issue by rearranging the nodes.</p>
+<p>One easy way out is to translate the query for both retrievers instead of only for the sparse one. This solution also eliminates the NoOpJoin node we introduced earlier, so it doesn’t sound too bad.</p>
+<p>The Pipeline looks like this now.</p>
+<p><img src="https://www.zansara.dev/posts/2023-10-15-haystack-series-pipeline/multilingual-hybrid-retrieval-two-translators.png" alt="Multilingual Hybrid Retrieval with two Translators"></p>
+<div class="highlight"><pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span>pipeline <span style="color:#ff7b72;font-weight:bold">=</span> Pipeline()
+</span></span><span style="display:flex;"><span>pipeline<span style="color:#ff7b72;font-weight:bold">.</span>add_node(component<span style="color:#ff7b72;font-weight:bold">=</span>language_classifier, name<span style="color:#ff7b72;font-weight:bold">=</span><span style="color:#a5d6ff">"LanguageClassifier"</span>, inputs<span style="color:#ff7b72;font-weight:bold">=</span>[<span style="color:#a5d6ff">"Query"</span>])
+</span></span><span style="display:flex;"><span>pipeline<span style="color:#ff7b72;font-weight:bold">.</span>add_node(component<span style="color:#ff7b72;font-weight:bold">=</span>translator, name<span style="color:#ff7b72;font-weight:bold">=</span><span style="color:#a5d6ff">"Translator"</span>, inputs<span style="color:#ff7b72;font-weight:bold">=</span>[<span style="color:#a5d6ff">"LanguageClassifier.output_1"</span>])
+</span></span><span style="display:flex;"><span>pipeline<span style="color:#ff7b72;font-weight:bold">.</span>add_node(component<span style="color:#ff7b72;font-weight:bold">=</span>sparse_retriever, name<span style="color:#ff7b72;font-weight:bold">=</span><span style="color:#a5d6ff">"SparseRetriever"</span>, inputs<span style="color:#ff7b72;font-weight:bold">=</span>[<span style="color:#a5d6ff">"Translator"</span>, <span style="color:#a5d6ff">"LanguageClassifier.output_2"</span>])
+</span></span><span style="display:flex;"><span>pipeline<span style="color:#ff7b72;font-weight:bold">.</span>add_node(component<span style="color:#ff7b72;font-weight:bold">=</span>translator_2, name<span style="color:#ff7b72;font-weight:bold">=</span><span style="color:#a5d6ff">"Translator2"</span>, inputs<span style="color:#ff7b72;font-weight:bold">=</span>[<span style="color:#a5d6ff">"LanguageClassifier.output_1"</span>])
+</span></span><span style="display:flex;"><span>pipeline<span style="color:#ff7b72;font-weight:bold">.</span>add_node(component<span style="color:#ff7b72;font-weight:bold">=</span>dense_retriever, name<span style="color:#ff7b72;font-weight:bold">=</span><span style="color:#a5d6ff">"DenseRetriever"</span>, inputs<span style="color:#ff7b72;font-weight:bold">=</span>[<span style="color:#a5d6ff">"Translator2"</span>, <span style="color:#a5d6ff">"LanguageClassifier.output_2"</span>])
+</span></span><span style="display:flex;"><span>pipeline<span style="color:#ff7b72;font-weight:bold">.</span>add_node(component<span style="color:#ff7b72;font-weight:bold">=</span>join_documents, name<span style="color:#ff7b72;font-weight:bold">=</span><span style="color:#a5d6ff">"JoinDocuments"</span>, inputs<span style="color:#ff7b72;font-weight:bold">=</span>[<span style="color:#a5d6ff">"SparseRetriever"</span>, <span style="color:#a5d6ff">"DenseRetriever"</span>])
+</span></span><span style="display:flex;"><span>pipeline<span style="color:#ff7b72;font-weight:bold">.</span>add_node(component<span style="color:#ff7b72;font-weight:bold">=</span>rerank, name<span style="color:#ff7b72;font-weight:bold">=</span><span style="color:#a5d6ff">"Ranker"</span>, inputs<span style="color:#ff7b72;font-weight:bold">=</span>[<span style="color:#a5d6ff">"JoinDocuments"</span>])
+</span></span><span style="display:flex;"><span>pipeline<span style="color:#ff7b72;font-weight:bold">.</span>add_node(component<span style="color:#ff7b72;font-weight:bold">=</span>reader, name<span style="color:#ff7b72;font-weight:bold">=</span><span style="color:#a5d6ff">"Reader"</span>, inputs<span style="color:#ff7b72;font-weight:bold">=</span>[<span style="color:#a5d6ff">"Ranker"</span>])
+</span></span></code></pre></div><p>We now have two nodes that contain identical translator components. Given that they are stateless, we can surely place the same instance in both places, with different names, and avoid doubling its memory footprint just to work around a couple of Pipeline bugs. After all, Translator nodes use relatively heavy models for machine translation.</p>
+<p>This is what Pipeline replies as soon as we try.</p>
+<pre tabindex="0"><code>PipelineConfigError: Cannot add node 'Translator2'. You have already added the same
+instance to the Pipeline under the name 'Translator'.
+</code></pre><p>Okay, so it seems like we can’t re-use components in two places: there is an explicit check against this, for some reason. Alright, let’s rearrange this Pipeline <em>again</em> with this new constraint in mind.</p>
+<p>How about we first translate the query and then distribute it?</p>
+<p><img src="https://www.zansara.dev/posts/2023-10-15-haystack-series-pipeline/multilingual-hybrid-retrieval-translate-and-distribute.png" alt="Multilingual Hybrid Retrieval, translate-and-distribute"></p>
+<div class="highlight"><pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span>pipeline <span style="color:#ff7b72;font-weight:bold">=</span> Pipeline()
+</span></span><span style="display:flex;"><span>pipeline<span style="color:#ff7b72;font-weight:bold">.</span>add_node(component<span style="color:#ff7b72;font-weight:bold">=</span>language_classifier, name<span style="color:#ff7b72;font-weight:bold">=</span><span style="color:#a5d6ff">"LanguageClassifier"</span>, inputs<span style="color:#ff7b72;font-weight:bold">=</span>[<span style="color:#a5d6ff">"Query"</span>])
+</span></span><span style="display:flex;"><span>pipeline<span style="color:#ff7b72;font-weight:bold">.</span>add_node(component<span style="color:#ff7b72;font-weight:bold">=</span>translator, name<span style="color:#ff7b72;font-weight:bold">=</span><span style="color:#a5d6ff">"Translator"</span>, inputs<span style="color:#ff7b72;font-weight:bold">=</span>[<span style="color:#a5d6ff">"LanguageClassifier.output_1"</span>])
+</span></span><span style="display:flex;"><span>pipeline<span style="color:#ff7b72;font-weight:bold">.</span>add_node(component<span style="color:#ff7b72;font-weight:bold">=</span>sparse_retriever, name<span style="color:#ff7b72;font-weight:bold">=</span><span style="color:#a5d6ff">"SparseRetriever"</span>, inputs<span style="color:#ff7b72;font-weight:bold">=</span>[<span style="color:#a5d6ff">"Translator"</span>, <span style="color:#a5d6ff">"LanguageClassifier.output_2"</span>])
+</span></span><span style="display:flex;"><span>pipeline<span style="color:#ff7b72;font-weight:bold">.</span>add_node(component<span style="color:#ff7b72;font-weight:bold">=</span>dense_retriever, name<span style="color:#ff7b72;font-weight:bold">=</span><span style="color:#a5d6ff">"DenseRetriever"</span>, inputs<span style="color:#ff7b72;font-weight:bold">=</span>[<span style="color:#a5d6ff">"Translator"</span>, <span style="color:#a5d6ff">"LanguageClassifier.output_2"</span>])
+</span></span><span style="display:flex;"><span>pipeline<span style="color:#ff7b72;font-weight:bold">.</span>add_node(component<span style="color:#ff7b72;font-weight:bold">=</span>join_documents, name<span style="color:#ff7b72;font-weight:bold">=</span><span style="color:#a5d6ff">"JoinDocuments"</span>, inputs<span style="color:#ff7b72;font-weight:bold">=</span>[<span style="color:#a5d6ff">"SparseRetriever"</span>, <span style="color:#a5d6ff">"DenseRetriever"</span>])
+</span></span><span style="display:flex;"><span>pipeline<span style="color:#ff7b72;font-weight:bold">.</span>add_node(component<span style="color:#ff7b72;font-weight:bold">=</span>rerank, name<span style="color:#ff7b72;font-weight:bold">=</span><span style="color:#a5d6ff">"Ranker"</span>, inputs<span style="color:#ff7b72;font-weight:bold">=</span>[<span style="color:#a5d6ff">"JoinDocuments"</span>])
+</span></span><span style="display:flex;"><span>pipeline<span style="color:#ff7b72;font-weight:bold">.</span>add_node(component<span style="color:#ff7b72;font-weight:bold">=</span>reader, name<span style="color:#ff7b72;font-weight:bold">=</span><span style="color:#a5d6ff">"Reader"</span>, inputs<span style="color:#ff7b72;font-weight:bold">=</span>[<span style="color:#a5d6ff">"Ranker"</span>])
+</span></span></code></pre></div><p>Looks neat: there is now no way for the original French query to reach Ranker. Right?</p>
+<p>We run the pipeline again and soon realize that nothing has changed. The query received by Ranker is still in French, untranslated. Shuffling the order of the <code>add_node</code> calls and the names of the components in the <code>inputs</code> parameters seems to have no effect on the graph. We even try to connect Translator directly with Ranker in a desperate attempt to forward the correct value, but Pipeline now starts throwing obscure, apparently meaningless error messages like:</p>
+<pre tabindex="0"><code>BaseRanker.run() missing 1 required positional argument: 'documents'
+</code></pre><p>Isn’t Ranker receiving the documents from JoinDocuments? Where did they go?</p>
+<p>Having wasted far too much time on this relatively simple Pipeline, we throw in the towel, go to Haystack’s Discord server, and ask for help.</p>
+<p>Soon enough, one of the maintainers shows up and promises a workaround ASAP. We’re skeptical at this point, but the workaround, in fact, exists.</p>
+<p>It’s just not very pretty.</p>
+<p><img src="https://www.zansara.dev/posts/2023-10-15-haystack-series-pipeline/multilingual-hybrid-retrieval-workaround.png" alt="Multilingual Hybrid Retrieval, working version"></p>
+<div class="highlight"><pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span>pipeline <span style="color:#ff7b72;font-weight:bold">=</span> Pipeline()
+</span></span><span style="display:flex;"><span>pipeline<span style="color:#ff7b72;font-weight:bold">.</span>add_node(component<span style="color:#ff7b72;font-weight:bold">=</span>language_classifier, name<span style="color:#ff7b72;font-weight:bold">=</span><span style="color:#a5d6ff">"LanguageClassifier"</span>, inputs<span style="color:#ff7b72;font-weight:bold">=</span>[<span style="color:#a5d6ff">"Query"</span>])
+</span></span><span style="display:flex;"><span>pipeline<span style="color:#ff7b72;font-weight:bold">.</span>add_node(component<span style="color:#ff7b72;font-weight:bold">=</span>translator_workaround, name<span style="color:#ff7b72;font-weight:bold">=</span><span style="color:#a5d6ff">"TranslatorWorkaround"</span>, inputs<span style="color:#ff7b72;font-weight:bold">=</span>[<span style="color:#a5d6ff">"LanguageClassifier.output_2"</span>])
+</span></span><span style="display:flex;"><span>pipeline<span style="color:#ff7b72;font-weight:bold">.</span>add_node(component<span style="color:#ff7b72;font-weight:bold">=</span>sparse_retriever, name<span style="color:#ff7b72;font-weight:bold">=</span><span style="color:#a5d6ff">"SparseRetriever"</span>, inputs<span style="color:#ff7b72;font-weight:bold">=</span>[<span style="color:#a5d6ff">"LanguageClassifier.output_1"</span>, <span style="color:#a5d6ff">"TranslatorWorkaround"</span>])
+</span></span><span style="display:flex;"><span>pipeline<span style="color:#ff7b72;font-weight:bold">.</span>add_node(component<span style="color:#ff7b72;font-weight:bold">=</span>dense_retriever, name<span style="color:#ff7b72;font-weight:bold">=</span><span style="color:#a5d6ff">"DenseRetriever"</span>, inputs<span style="color:#ff7b72;font-weight:bold">=</span>[<span style="color:#a5d6ff">"LanguageClassifier.output_1"</span>, <span style="color:#a5d6ff">"TranslatorWorkaround"</span>])
+</span></span><span style="display:flex;"><span>pipeline<span style="color:#ff7b72;font-weight:bold">.</span>add_node(component<span style="color:#ff7b72;font-weight:bold">=</span>join_documents, name<span style="color:#ff7b72;font-weight:bold">=</span><span style="color:#a5d6ff">"JoinDocuments"</span>, inputs<span style="color:#ff7b72;font-weight:bold">=</span>[<span style="color:#a5d6ff">"SparseRetriever"</span>, <span style="color:#a5d6ff">"DenseRetriever"</span>])
+</span></span><span style="display:flex;"><span>pipeline<span style="color:#ff7b72;font-weight:bold">.</span>add_node(component<span style="color:#ff7b72;font-weight:bold">=</span>join_query_workaround, name<span style="color:#ff7b72;font-weight:bold">=</span><span style="color:#a5d6ff">"JoinQueryWorkaround"</span>, inputs<span style="color:#ff7b72;font-weight:bold">=</span>[<span style="color:#a5d6ff">"TranslatorWorkaround"</span>, <span style="color:#a5d6ff">"JoinDocuments"</span>])
+</span></span><span style="display:flex;"><span>pipeline<span style="color:#ff7b72;font-weight:bold">.</span>add_node(component<span style="color:#ff7b72;font-weight:bold">=</span>rerank, name<span style="color:#ff7b72;font-weight:bold">=</span><span style="color:#a5d6ff">"Ranker"</span>, inputs<span style="color:#ff7b72;font-weight:bold">=</span>[<span style="color:#a5d6ff">"JoinQueryWorkaround"</span>])
+</span></span><span style="display:flex;"><span>pipeline<span style="color:#ff7b72;font-weight:bold">.</span>add_node(component<span style="color:#ff7b72;font-weight:bold">=</span>reader, name<span style="color:#ff7b72;font-weight:bold">=</span><span style="color:#a5d6ff">"Reader"</span>, inputs<span style="color:#ff7b72;font-weight:bold">=</span>[<span style="color:#a5d6ff">"Ranker"</span>])
+</span></span></code></pre></div><p>Note that you need two custom nodes: a wrapper for the Translator and a brand-new Join node.</p>
+<div class="highlight"><pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#ff7b72">class</span> <span style="color:#f0883e;font-weight:bold">TranslatorWorkaround</span>(TransformersTranslator):
+</span></span><span style="display:flex;"><span>
+</span></span><span style="display:flex;"><span> outgoing_edges <span style="color:#ff7b72;font-weight:bold">=</span> <span style="color:#a5d6ff">1</span>
+</span></span><span style="display:flex;"><span>
+</span></span><span style="display:flex;"><span> <span style="color:#ff7b72">def</span> <span style="color:#d2a8ff;font-weight:bold">run</span>(self, query):
+</span></span><span style="display:flex;"><span> results, edge <span style="color:#ff7b72;font-weight:bold">=</span> super()<span style="color:#ff7b72;font-weight:bold">.</span>run(query<span style="color:#ff7b72;font-weight:bold">=</span>query)
+</span></span><span style="display:flex;"><span> <span style="color:#ff7b72">return</span> {<span style="color:#ff7b72;font-weight:bold">**</span>results, <span style="color:#a5d6ff">"documents"</span>: [] }, <span style="color:#a5d6ff">"output_1"</span>
+</span></span><span style="display:flex;"><span>
+</span></span><span style="display:flex;"><span> <span style="color:#ff7b72">def</span> <span style="color:#d2a8ff;font-weight:bold">run_batch</span>(self, queries):
+</span></span><span style="display:flex;"><span> <span style="color:#ff7b72">pass</span>
+</span></span><span style="display:flex;"><span>
+</span></span><span style="display:flex;"><span>
+</span></span><span style="display:flex;"><span><span style="color:#ff7b72">class</span> <span style="color:#f0883e;font-weight:bold">JoinQueryWorkaround</span>(JoinNode):
+</span></span><span style="display:flex;"><span>
+</span></span><span style="display:flex;"><span> <span style="color:#ff7b72">def</span> <span style="color:#d2a8ff;font-weight:bold">run_accumulated</span>(self, inputs, <span style="color:#ff7b72;font-weight:bold">*</span>args, <span style="color:#ff7b72;font-weight:bold">**</span>kwargs):
+</span></span><span style="display:flex;"><span> <span style="color:#ff7b72">return</span> {<span style="color:#a5d6ff">"query"</span>: inputs[<span style="color:#a5d6ff">0</span>]<span style="color:#ff7b72;font-weight:bold">.</span>get(<span style="color:#a5d6ff">"query"</span>, <span style="color:#79c0ff">None</span>), <span style="color:#a5d6ff">"documents"</span>: inputs[<span style="color:#a5d6ff">1</span>]<span style="color:#ff7b72;font-weight:bold">.</span>get(<span style="color:#a5d6ff">"documents"</span>, <span style="color:#79c0ff">None</span>)}, <span style="color:#a5d6ff">"output_1"</span>
+</span></span><span style="display:flex;"><span>
+</span></span><span style="display:flex;"><span> <span style="color:#ff7b72">def</span> <span style="color:#d2a8ff;font-weight:bold">run_batch_accumulated</span>(self, inputs):
+</span></span><span style="display:flex;"><span> <span style="color:#ff7b72">pass</span>
+</span></span></code></pre></div><p>Along with this beautiful code, we also receive an explanation about how the <code>JoinQueryWorkaround</code> node works only for this specific Pipeline and is pretty hard to generalize, which is why it’s not present in Haystack right now. I’ll spare you the details: you will have an idea why by the end of this journey.</p>
+<p>Wanna play with this Pipeline yourself and try to make it work in another way? Check out the <a href="https://drive.google.com/file/d/18Gqfd0O828T71Gc-IHeU4v7OXwaPk7Fc/view?usp=sharing" class="external-link" target="_blank" rel="noopener">Colab</a> or the <a href="https://gist.github.com/ZanSara/33020a980f2f535e2529df4ca4e8f08a" class="external-link" target="_blank" rel="noopener">gist</a> and have fun.</p>
+<p>Having learned only that it’s better not to implement unusual branching patterns with Haystack unless you’re ready for a fight, let’s now turn to the indexing side of your application. We’ll stick to the basics this time.</p>
+<h2 id="indexing-pipelines">
+ Indexing Pipelines
+ <a class="heading-link" href="#indexing-pipelines">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h2>
+<p>Indexing pipelines’ main goal is to transform files into Documents from which a query pipeline can later retrieve information. They mostly look like the following.</p>
+<p><img src="https://www.zansara.dev/posts/2023-10-15-haystack-series-pipeline/indexing-pipeline.png" alt="Indexing Pipeline"></p>
+<p>And the code looks just like you would expect.</p>
+<div class="highlight"><pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span>pipeline <span style="color:#ff7b72;font-weight:bold">=</span> Pipeline()
+</span></span><span style="display:flex;"><span>pipeline<span style="color:#ff7b72;font-weight:bold">.</span>add_node(component<span style="color:#ff7b72;font-weight:bold">=</span>file_type_classifier, name<span style="color:#ff7b72;font-weight:bold">=</span><span style="color:#a5d6ff">"FileTypeClassifier"</span>, inputs<span style="color:#ff7b72;font-weight:bold">=</span>[<span style="color:#a5d6ff">"File"</span>])
+</span></span><span style="display:flex;"><span>pipeline<span style="color:#ff7b72;font-weight:bold">.</span>add_node(component<span style="color:#ff7b72;font-weight:bold">=</span>text_converter, name<span style="color:#ff7b72;font-weight:bold">=</span><span style="color:#a5d6ff">"TextConverter"</span>, inputs<span style="color:#ff7b72;font-weight:bold">=</span>[<span style="color:#a5d6ff">"FileTypeClassifier.output_1"</span>])
+</span></span><span style="display:flex;"><span>pipeline<span style="color:#ff7b72;font-weight:bold">.</span>add_node(component<span style="color:#ff7b72;font-weight:bold">=</span>pdf_converter, name<span style="color:#ff7b72;font-weight:bold">=</span><span style="color:#a5d6ff">"PdfConverter"</span>, inputs<span style="color:#ff7b72;font-weight:bold">=</span>[<span style="color:#a5d6ff">"FileTypeClassifier.output_2"</span>])
+</span></span><span style="display:flex;"><span>pipeline<span style="color:#ff7b72;font-weight:bold">.</span>add_node(component<span style="color:#ff7b72;font-weight:bold">=</span>docx_converter, name<span style="color:#ff7b72;font-weight:bold">=</span><span style="color:#a5d6ff">"DocxConverter"</span>, inputs<span style="color:#ff7b72;font-weight:bold">=</span>[<span style="color:#a5d6ff">"FileTypeClassifier.output_4"</span>])
+</span></span><span style="display:flex;"><span>pipeline<span style="color:#ff7b72;font-weight:bold">.</span>add_node(component<span style="color:#ff7b72;font-weight:bold">=</span>join_documents, name<span style="color:#ff7b72;font-weight:bold">=</span><span style="color:#a5d6ff">"JoinDocuments"</span>, inputs<span style="color:#ff7b72;font-weight:bold">=</span>[<span style="color:#a5d6ff">"TextConverter"</span>, <span style="color:#a5d6ff">"PdfConverter"</span>, <span style="color:#a5d6ff">"DocxConverter"</span>])
+</span></span><span style="display:flex;"><span>pipeline<span style="color:#ff7b72;font-weight:bold">.</span>add_node(component<span style="color:#ff7b72;font-weight:bold">=</span>preprocessor, name<span style="color:#ff7b72;font-weight:bold">=</span><span style="color:#a5d6ff">"Preprocessor"</span>, inputs<span style="color:#ff7b72;font-weight:bold">=</span>[<span style="color:#a5d6ff">"JoinDocuments"</span>])
+</span></span><span style="display:flex;"><span>pipeline<span style="color:#ff7b72;font-weight:bold">.</span>add_node(component<span style="color:#ff7b72;font-weight:bold">=</span>document_store, name<span style="color:#ff7b72;font-weight:bold">=</span><span style="color:#a5d6ff">"DocumentStore"</span>, inputs<span style="color:#ff7b72;font-weight:bold">=</span>[<span style="color:#a5d6ff">"Preprocessor"</span>])
+</span></span><span style="display:flex;"><span>
+</span></span><span style="display:flex;"><span>pipeline<span style="color:#ff7b72;font-weight:bold">.</span>run(file_paths<span style="color:#ff7b72;font-weight:bold">=</span>paths)
+</span></span></code></pre></div><p>There is nothing surprising here. The starting node is File instead of Query, which seems logical given that this Pipeline expects a list of files, not a query. There is a document store at the end, which we haven’t used in query pipelines so far, but it doesn’t look too strange. It’s all quite intuitive.</p>
+<p>Indexing pipelines are run by giving them the paths of the files to convert. In this scenario more than one Converter may run, so we place a Join node before the PreProcessor to merge their outputs. We make sure that the directory contains only files we can convert (in this case .txt, .pdf, and .docx) and then we run the code above.</p>
+<p>The code, however, fails.</p>
+<pre tabindex="0"><code>ValueError: Multiple non-default file types are not allowed at once.
+</code></pre><p>The more we look at the error, the less it makes sense. What are non-default file types? Why are they not allowed at once, and what can we do to fix that?</p>
+<p>We head for the documentation, where we find a lead.</p>
+<p><img src="https://www.zansara.dev/posts/2023-10-15-haystack-series-pipeline/filetypeclassifier-docs.png" alt="FileTypeClassifier documentation"></p>
+<p>So it seems like the FileTypeClassifier can only process the files if they’re all of the same type.</p>
+<p>After all we’ve been through with the Hybrid Retrieval pipelines, this sounds wrong. We know that Pipeline can run two branches at the same time: we were doing just that a moment ago. Why can’t FileTypeClassifier send data to two converters just like LanguageClassifier sends data to two retrievers?</p>
+<p>Turns out, this is <em>not</em> the same thing.</p>
+<p>Let’s compare the three pipelines and try to spot the difference.</p>
+<p><img src="https://www.zansara.dev/posts/2023-10-15-haystack-series-pipeline/all-branching-pipelines.png" alt="All branching pipelines, side by side"></p>
+<p>In the first case, Query sends the very same value to both Retrievers. So, from the component’s perspective, there’s a single output being produced: the Pipeline takes care of copying it for all nodes connected to it.</p>
+<p>In the second case, QueryClassifier can send the query to either Retriever but never to both. So, the component can produce two different outputs, but at every run, it will always return just one.</p>
+<p>In the third case, FileTypeClassifier may need to produce two different outputs simultaneously: for example, one with a list of text files and one with a list of PDFs. And it turns out this can’t be done. This is, unfortunately, a well-known limitation of the Pipeline/BaseComponent API design.
+The output of a component is defined as a tuple, <code>(output_values, output_edge)</code>, and nodes can’t produce a list of these tuples to send different values to different nodes.</p>
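+<p>To make this limitation more concrete, here is a minimal, hypothetical sketch of a decision node (the class name and the routing logic are made up for illustration, and it assumes Haystack 1.x’s <code>BaseComponent</code>): it shows what the <code>(output_values, output_edge)</code> contract allows a single run to return, and what it cannot.</p>
+<pre><code class="language-python">from haystack.nodes.base import BaseComponent
+
+# Hypothetical decision node (NOT Haystack's real FileTypeClassifier):
+# a sketch of what the (output_values, output_edge) contract allows.
+class FileRouterSketch(BaseComponent):
+
+    outgoing_edges = 2
+
+    def run(self, file_paths):
+        txt_files = [p for p in file_paths if p.endswith(".txt")]
+        pdf_files = [p for p in file_paths if p.endswith(".pdf")]
+
+        # Allowed: one dictionary of values routed down ONE edge per run.
+        if txt_files and not pdf_files:
+            return {"file_paths": txt_files}, "output_1"
+        if pdf_files and not txt_files:
+            return {"file_paths": pdf_files}, "output_2"
+
+        # Not possible: returning a list of tuples, such as
+        # [({"file_paths": txt_files}, "output_1"), ({"file_paths": pdf_files}, "output_2")],
+        # to route different values to different edges in the same run.
+        raise ValueError("Multiple non-default file types are not allowed at once.")
+
+    def run_batch(self, file_paths):
+        pass
+</code></pre>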
+<p>That’s the end of the story. This time, there is no workaround. You must pass the files individually or forget about using a Pipeline for this task.</p>
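+<p>If you still want to reuse the indexing Pipeline above, the only option left is to drive it from the outside, one file at a time. Here is a minimal sketch of that loop, reusing the <code>pipeline</code> and <code>paths</code> variables from the code above:</p>
+<pre><code class="language-python"># Feed the files one by one, so the FileTypeClassifier never receives
+# more than one file type in a single run.
+for path in paths:
+    pipeline.run(file_paths=[path])
+</code></pre>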
+<h2 id="validation">
+ Validation
+ <a class="heading-link" href="#validation">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h2>
+<p>On top of these challenges, other tradeoffs had to be made for the API to look so simple at first glance. One of these is connection validation.</p>
+<p>Let’s imagine we quickly skimmed through a tutorial and got one bit of information wrong: we mistakenly believe that in an Extractive QA Pipeline, you need to place a Reader in front of a Retriever. So we sit down and write this.</p>
+<div class="highlight"><pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span>p <span style="color:#ff7b72;font-weight:bold">=</span> Pipeline()
+</span></span><span style="display:flex;"><span>p<span style="color:#ff7b72;font-weight:bold">.</span>add_node(component<span style="color:#ff7b72;font-weight:bold">=</span>reader, name<span style="color:#ff7b72;font-weight:bold">=</span><span style="color:#a5d6ff">"Reader"</span>, inputs<span style="color:#ff7b72;font-weight:bold">=</span>[<span style="color:#a5d6ff">"Query"</span>])
+</span></span><span style="display:flex;"><span>p<span style="color:#ff7b72;font-weight:bold">.</span>add_node(component<span style="color:#ff7b72;font-weight:bold">=</span>retriever, name<span style="color:#ff7b72;font-weight:bold">=</span><span style="color:#a5d6ff">"Retriever"</span>, inputs<span style="color:#ff7b72;font-weight:bold">=</span>[<span style="color:#a5d6ff">"Reader"</span>])
+</span></span></code></pre></div><p>Up to this point, running the script raises no error. Haystack is happy to connect these two components in this order. You can even <code>draw()</code> this Pipeline just fine.</p>
+<p><img src="https://www.zansara.dev/posts/2023-10-15-haystack-series-pipeline/swapped-retriever-reader.png" alt="Swapper Retriever/Reader Pipeline"></p>
+<p>Alright, so what happens when we run it?</p>
+<div class="highlight"><pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span>res <span style="color:#ff7b72;font-weight:bold">=</span> p<span style="color:#ff7b72;font-weight:bold">.</span>run(query<span style="color:#ff7b72;font-weight:bold">=</span><span style="color:#a5d6ff">"What did Einstein work on?"</span>)
+</span></span></code></pre></div><pre tabindex="0"><code>BaseReader.run() missing 1 required positional argument: 'documents'
+</code></pre><p>This is the same error we’ve seen in the translating hybrid retrieval pipeline earlier, but fear not! Here, we can follow the suggestion of the error message by doing:</p>
+<div class="highlight"><pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span>res <span style="color:#ff7b72;font-weight:bold">=</span> p<span style="color:#ff7b72;font-weight:bold">.</span>run(query<span style="color:#ff7b72;font-weight:bold">=</span><span style="color:#a5d6ff">"What did Einstein work on?"</span>, documents<span style="color:#ff7b72;font-weight:bold">=</span>document_store<span style="color:#ff7b72;font-weight:bold">.</span>get_all_documents())
+</span></span></code></pre></div><p>And to our surprise, this Pipeline doesn’t crash. It just hangs there, showing an insanely slow progress bar, telling us that some inference is in progress. A few hours later, we kill the process and consider switching to another framework because this one is clearly very slow.</p>
+<p>What happened?</p>
+<p>The cause of this issue is the same thing that makes connecting Haystack components in a Pipeline so effortless, and it’s related to the way components and Pipeline communicate. If you check <code>Pipeline.run()</code>’s signature, you’ll see that it looks like this:</p>
+<div class="highlight"><pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#ff7b72">def</span> <span style="color:#d2a8ff;font-weight:bold">run</span>(
+</span></span><span style="display:flex;"><span> self,
+</span></span><span style="display:flex;"><span> query: Optional[str] <span style="color:#ff7b72;font-weight:bold">=</span> <span style="color:#79c0ff">None</span>,
+</span></span><span style="display:flex;"><span> file_paths: Optional[List[str]] <span style="color:#ff7b72;font-weight:bold">=</span> <span style="color:#79c0ff">None</span>,
+</span></span><span style="display:flex;"><span> labels: Optional[MultiLabel] <span style="color:#ff7b72;font-weight:bold">=</span> <span style="color:#79c0ff">None</span>,
+</span></span><span style="display:flex;"><span> documents: Optional[List[Document]] <span style="color:#ff7b72;font-weight:bold">=</span> <span style="color:#79c0ff">None</span>,
+</span></span><span style="display:flex;"><span> meta: Optional[Union[dict, List[dict]]] <span style="color:#ff7b72;font-weight:bold">=</span> <span style="color:#79c0ff">None</span>,
+</span></span><span style="display:flex;"><span> params: Optional[dict] <span style="color:#ff7b72;font-weight:bold">=</span> <span style="color:#79c0ff">None</span>,
+</span></span><span style="display:flex;"><span> debug: Optional[bool] <span style="color:#ff7b72;font-weight:bold">=</span> <span style="color:#79c0ff">None</span>,
+</span></span><span style="display:flex;"><span>):
+</span></span></code></pre></div><p>which mirrors the <code>BaseComponent.run()</code> signature of the base class all nodes have to inherit from.</p>
+<div class="highlight"><pre tabindex="0" style="color:#e6edf3;background-color:#0d1117;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#d2a8ff;font-weight:bold">@abstractmethod</span>
+</span></span><span style="display:flex;"><span><span style="color:#ff7b72">def</span> <span style="color:#d2a8ff;font-weight:bold">run</span>(
+</span></span><span style="display:flex;"><span> self,
+</span></span><span style="display:flex;"><span> query: Optional[str] <span style="color:#ff7b72;font-weight:bold">=</span> <span style="color:#79c0ff">None</span>,
+</span></span><span style="display:flex;"><span> file_paths: Optional[List[str]] <span style="color:#ff7b72;font-weight:bold">=</span> <span style="color:#79c0ff">None</span>,
+</span></span><span style="display:flex;"><span> labels: Optional[MultiLabel] <span style="color:#ff7b72;font-weight:bold">=</span> <span style="color:#79c0ff">None</span>,
+</span></span><span style="display:flex;"><span> documents: Optional[List[Document]] <span style="color:#ff7b72;font-weight:bold">=</span> <span style="color:#79c0ff">None</span>,
+</span></span><span style="display:flex;"><span> meta: Optional[dict] <span style="color:#ff7b72;font-weight:bold">=</span> <span style="color:#79c0ff">None</span>,
+</span></span><span style="display:flex;"><span>) <span style="color:#ff7b72;font-weight:bold">-></span> Tuple[Dict, str]:
+</span></span></code></pre></div><p>This match means a few things:</p>
+<ul>
+<li>
+<p>Every component can be connected to every other because their inputs are identical.</p>
+</li>
+<li>
+<p>Every component can only output the same variables received as input.</p>
+</li>
+<li>
+<p>It’s impossible to tell if it makes sense to connect two components because their inputs and outputs always match.</p>
+</li>
+</ul>
+<p>Take this with a grain of salt: the actual implementation is far more nuanced than what I just showed you. The fundamental problem, however, remains: components try to be as compatible as possible with all the others, and they have no way to signal, either to the Pipeline or to the users, that they’re meant to be connected only to some nodes and not to others.</p>
+<p>In addition to this problem, to respect the shared signature, components often take inputs that they don’t use. A Ranker only needs documents, so all the other inputs required by the run method’s signature go unused. What do components do with these values? It depends (a sketch of the first behavior follows the list below):</p>
+<ul>
+<li>Some have them in the signature and forward them unchanged.</li>
+<li>Some have them in the signature and don’t forward them.</li>
+<li>Some don’t have them in the signature, breaking the inheritance pattern, and Pipeline reacts by assuming that they should be added unchanged to the output dictionary.</li>
+</ul>
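+<p>To picture that first behavior, here is a hedged sketch of a hypothetical ranker-like node (it is not taken from Haystack’s codebase, and it assumes Haystack 1.x’s <code>BaseComponent</code> and the <code>Document.score</code> attribute): it only works on <code>documents</code>, yet it accepts the whole shared signature and forwards the unused values unchanged.</p>
+<pre><code class="language-python">from haystack.nodes.base import BaseComponent
+
+# Hypothetical node: it only cares about `documents`, but the shared
+# signature forces it to accept every other input and decide what to do with it.
+class PassthroughRankerSketch(BaseComponent):
+
+    outgoing_edges = 1
+
+    def run(self, query=None, file_paths=None, labels=None, documents=None, meta=None):
+        # The only real work: sort the documents by score, highest first.
+        ranked = sorted(documents or [], key=lambda doc: doc.score or 0, reverse=True)
+        output = {
+            "query": query,            # unused, forwarded as-is
+            "file_paths": file_paths,  # unused, forwarded as-is
+            "labels": labels,          # unused, forwarded as-is
+            "documents": ranked,       # the only value this node actually touches
+            "meta": meta,              # unused, forwarded as-is
+        }
+        return output, "output_1"
+
+    def run_batch(self, **kwargs):
+        pass
+</code></pre>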
+<p>If you look closely at the two workaround nodes for the Hybrid Retrieval pipeline we tried to build before, you’ll notice the fix focuses entirely on altering the routing of the unused parameters <code>query</code> and <code>documents</code> to make the Pipeline behave the way the user expects. However, this behavior does not generalize: a different pipeline would require another behavior, which is why the components behave differently in the first place.</p>
+<h2 id="wrapping-up">
+ Wrapping up
+ <a class="heading-link" href="#wrapping-up">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h2>
+<p>I could go on for ages talking about the shortcomings of complex Pipelines, but I’d rather stop here.</p>
+<p>Along this journey into the guts of Haystack Pipelines, we’ve seen at the same time some beautiful APIs and the ugly consequences of their implementation. As always, there’s no free lunch: trying to over-simplify the interface will bite back as soon as the use cases become nontrivial.</p>
+<p>However, we believe that this concept has huge potential and that this version of Pipeline can be improved a lot before the impact on the API becomes too heavy. In Haystack 2.0, armed with the experience we gained working with this implementation of Pipeline, we reimplemented it in a fundamentally different way, which will prevent many of these issues.</p>
+<p>In the next post, we’re going to see how.</p>
+<hr>
+<p><em>Next: <a href="https://www.zansara.dev/posts/2023-10-26-haystack-series-canals" >Canals: a new concept of Pipeline</a></em></p>
+<p><em>Previous: <a href="https://www.zansara.dev/posts/2023-10-11-haystack-series-why" >Why rewriting Haystack?!</a></em></p>
+<p><em>See the entire series here: <a href="https://www.zansara.dev/series/haystack-2.0-series/" >Haystack 2.0 series</a></em></p>
+
+
+
+
+ Office Hours: RAG Pipelines
+ https://www.zansara.dev/talks/2023-10-12-office-hours-rag-pipelines/
+ Thu, 12 Oct 2023 00:00:00 +0000
+
+ https://www.zansara.dev/talks/2023-10-12-office-hours-rag-pipelines/
+ <p><a href="https://drive.google.com/file/d/1UXGi4raiCQmrxOfOexL-Qh0CVbtiSm89/view?usp=drive_link" class="external-link" target="_blank" rel="noopener">Recording</a>, <a href="https://gist.github.com/ZanSara/5975901eea972c126f8e1c2341686dfb" class="external-link" target="_blank" rel="noopener">notebook</a>. All the material can also be found <a href="https://drive.google.com/drive/folders/17CIfoy6c4INs0O_X6YCa3CYXkjRvWm7X?usp=drive_link" class="external-link" target="_blank" rel="noopener">here</a>.</p>
+<hr>
+
+
+<div class='iframe-wrapper'>
+<iframe src="https://drive.google.com/file/d/1UXGi4raiCQmrxOfOexL-Qh0CVbtiSm89/preview" width="100%" height="100%" allow="autoplay"></iframe>
+</div>
+
+
+<p>In this <a href="https://discord.com/invite/VBpFzsgRVF" class="external-link" target="_blank" rel="noopener">Office Hours</a> I walk through the LLM support offered by Haystack 2.0 to date: Generator, PromptBuilder, and how to connect them to different types of Retrievers to build Retrieval Augmented Generation (RAG) applications.</p>
+<p>In under 40 minutes we go from a simple query to ChatGPT all the way up to a full pipeline that retrieves documents from the Internet, splits them into chunks, and feeds them to an LLM to ground its replies.</p>
+<p>The talk also shows, indirectly, how Pipelines can help users compose these systems quickly, visualize them, and connect their different parts together thanks to verbose error messages.</p>
+
+
+
+
+ Why rewriting Haystack?!
+ https://www.zansara.dev/posts/2023-10-11-haystack-series-why/
+ Wed, 11 Oct 2023 00:00:00 +0000
+
+ https://www.zansara.dev/posts/2023-10-11-haystack-series-why/
+ <p>Before even diving into what Haystack 2.0 is, how it was built, and how it works, let’s spend a few words about the whats and the whys.</p>
+<p>First of all, <em>what is</em> Haystack?</p>
+<p>And next, why on Earth did we decide to rewrite it from the ground up?</p>
+<h3 id="a-pioneer-framework">
+ A Pioneer Framework
+ <a class="heading-link" href="#a-pioneer-framework">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h3>
+<p>Haystack is a relatively young framework, its initial release dating back to <a href="https://github.com/deepset-ai/haystack/releases/tag/0.1.0" class="external-link" target="_blank" rel="noopener">November 28th, 2019</a>. Back then, Natural Language Processing was a field that had just started taking its first steps outside of research labs, and Haystack was one of the first libraries that promised enterprise-grade, production-ready NLP features. We were proud to enable use cases such as <a href="https://medium.com/deepset-ai/what-semantic-search-can-do-for-you-ea5b1e8dfa7f" class="external-link" target="_blank" rel="noopener">semantic search</a>, <a href="https://medium.com/deepset-ai/semantic-faq-search-with-haystack-6a03b1e13053" class="external-link" target="_blank" rel="noopener">FAQ matching</a>, document similarity, document summarization, machine translation, language-agnostic search, and so on.</p>
+<p>The field was niche but constantly moving, and research was lively. <a href="https://arxiv.org/abs/1810.04805" class="external-link" target="_blank" rel="noopener">The BERT paper</a> had been published a few months before Haystack’s first release, unlocking a small revolution. In the shade of much larger research labs, <a href="https://www.deepset.ai/" class="external-link" target="_blank" rel="noopener">deepset</a>, then just a pre-seed stage startup, was also pouring effort into <a href="https://arxiv.org/abs/2104.12741" class="external-link" target="_blank" rel="noopener">research</a> and <a href="https://huggingface.co/deepset" class="external-link" target="_blank" rel="noopener">model training</a>.</p>
+<p>In those times, competition was close to non-existent. The field was still quite technical, and most people didn’t fully understand its potential. We were free to explore features and use cases at our own pace and set the direction for our product. This allowed us to decide what to work on, what to double down on, and what to deprioritize, postpone, or ignore. Haystack was nurturing its own garden in what was fundamentally a green field.</p>
+<h3 id="chatgpt">
+ ChatGPT
+ <a class="heading-link" href="#chatgpt">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h3>
+<p>This rather idyllic situation came to an end all too abruptly at the end of November 2022, when <a href="https://openai.com/blog/chatgpt" class="external-link" target="_blank" rel="noopener">ChatGPT was released</a>.</p>
+<p>For us in the NLP field, everything seemed to change overnight. Day by day. For <em>months</em>.</p>
+<p>The speed of progress went from lively to faster-than-light all at once. Every company with the budget to train an LLM seemed to be doing so, and researchers kept releasing new models just as quickly. Open-source contributors pushed to reduce the hardware requirements for inference lower and lower. My best memory of those times is the drama of <a href="https://github.com/facebookresearch/llama/pull/73" class="external-link" target="_blank" rel="noopener">LLaMA’s first “release”</a>: I remember betting on March 2nd that within a week I would be running LLaMA models on my laptop, and I wasn’t even surprised when my prediction <a href="https://news.ycombinator.com/item?id=35100086" class="external-link" target="_blank" rel="noopener">turned out to be true</a> with the release of <a href="https://github.com/ggerganov/llama.cpp" class="external-link" target="_blank" rel="noopener">llama.cpp</a> on March 10th.</p>
+<p>Of course, keeping up with this situation was far beyond us. Competitors started to spawn like mushrooms, and our space was quickly crowded with new startups, far more agile and aggressive than us. We suddenly needed to compete and realized we weren’t used to it.</p>
+<h3 id="promptnode-vs-farmreader">
+ PromptNode vs FARMReader
+ <a class="heading-link" href="#promptnode-vs-farmreader">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h3>
+<p>Luckily, Haystack seemed capable of keeping up, at least for a while. Thanks to the efforts of <a href="https://twitter.com/vladblagoje" class="external-link" target="_blank" rel="noopener">Vladimir Blagojevic</a>, a few weeks after ChatGPT became a sensation, we added some decent support for LLMs in the form of <a href="https://github.com/deepset-ai/haystack/pull/3665" class="external-link" target="_blank" rel="noopener">PromptNode</a>. Our SaaS team could soon bring new LLM-powered features to our customers. We even managed to add support for <a href="https://github.com/deepset-ai/haystack/pull/3925" class="external-link" target="_blank" rel="noopener">Agents</a>, another hot topic in the wake of ChatGPT.</p>
+<p>However, the go-to library for LLMs was not Haystack in the minds of most developers. It was <a href="https://docs.langchain.com/docs/" class="external-link" target="_blank" rel="noopener">LangChain</a>, and for a long time, it seemed like we would never be able to challenge their status and popularity. Everyone was talking about it, everyone was building demos, products, and startups on it; its development speed was unbelievable, and, in the day-to-day discourse of the newly born LLM community, Haystack was nowhere to be found.</p>
+<p>Why?</p>
+<p>That’s because no one even realized that Haystack, the semantic search framework from 2019, also supported LLMs. All our documentation, tutorials, blog posts, research efforts, models on HuggingFace, <em>everything</em> was pointing towards semantic search. LLMs were nowhere to be seen.</p>
+<p>And semantic search was going down <em>fast</em>.</p>
+<p><img src="https://www.zansara.dev/posts/2023-10-11-haystack-series-why/reader-model-downloads.png" alt="Reader Models downloads graph"></p>
+<p>The image above shows today’s monthly downloads for one of deepset’s most successful models on HuggingFace,
+<a href="https://huggingface.co/deepset/roberta-base-squad2" class="external-link" target="_blank" rel="noopener">deepset/roberta-base-squad2</a>. This model performs <a href="https://huggingface.co/tasks/question-answering" class="external-link" target="_blank" rel="noopener">extractive Question Answering</a>, our former primary use case before the release of ChatGPT. Even with more than one and a half million downloads monthly, this model is experiencing a disastrous collapse in popularity, and in the current landscape, it is unlikely to ever recover.</p>
+<h3 id="a-sort-of-pivot">
+ A (Sort Of) Pivot
+ <a class="heading-link" href="#a-sort-of-pivot">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h3>
+<p>In this context, around February 2023, we decided to bet on the rise of LLMs and committed to focusing all our efforts on becoming the #1 framework powering production-grade LLM applications.</p>
+<p>As we quickly realized, this was far from an easy proposition. Extractive QA was deeply ingrained not only in our public image but in our codebase as well: implementing and maintaining PromptNode was proving more and more painful by the day, and when we tried to fit the concept of Agents into Haystack, it felt uncomfortably like trying to force a square peg into a round hole.</p>
+<p>Haystack pipelines made extractive QA straightforward for the users and were highly optimized for this use case. But supporting LLMs was nothing like enabling extractive QA. Using Haystack for LLMs was quite a painful experience, and at the same time, modifying the Pipeline class to accommodate them seemed like the best way to mess with all the users that relied on the current Pipeline for their existing, value-generating applications. Making mistakes with Pipeline could ruin us.</p>
+<p>With this realization in mind, we took what seemed like the best option for the future of Haystack: a rewrite. The knowledge and experience we gained while working on Haystack 1 could fuel the design of Haystack 2 and act as a reference frame for it. Unlike our competitors, we already knew a lot about how to make NLP work at scale. We had made many mistakes that we would avoid in our next iteration. We knew that focusing on the best possible developer experience fueled the growth of Haystack 1 in the early days, and we were committed to doing the same for its next version.</p>
+<p>So, the redesign of Haystack started, and it started from the concept of Pipeline.</p>
+<h3 id="fast-forward">
+ Fast-forward
+ <a class="heading-link" href="#fast-forward">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h3>
+<p>Haystack 2.0 hasn’t been released yet, but for now, it seems that we have made the right decision at the start of the year.</p>
+<p>Haystack’s name is starting to appear more often in discussions around LLMs. The general tone of the community is steadily shifting, and scaling up, rather than experimenting, is now the focus. Competitors are re-orienting themselves toward production-readiness, something we’re visibly more experienced with. At the same time, LangChain is becoming a victim of its own success, collecting more and more criticism for its lack of documentation, leaky abstractions, and confusing architecture. Other competitors are gaining steam, but the overall landscape no longer feels as hostile.</p>
+<p>In the next post, I will explore the technical side of Haystack 2.0 and delve deeper into the concept of Pipelines: what they are, how to use them, how they evolved from Haystack 1 to Haystack 2, and why.</p>
+<hr>
+<p><em>Next: <a href="https://www.zansara.dev/posts/2023-10-15-haystack-series-pipeline" >Haystack’s Pipeline - A Deep Dive</a></em></p>
+<p><em>Previous: <a href="https://www.zansara.dev/posts/2023-10-10-haystack-series-intro" >Haystack 2.0: What is it?</a></em></p>
+<p><em>See the entire series here: <a href="https://www.zansara.dev/series/haystack-2.0-series/" >Haystack 2.0 series</a></em></p>
+
+
+
+
+ Haystack 2.0: What is it?
+ https://www.zansara.dev/posts/2023-10-10-haystack-series-intro/
+ Tue, 10 Oct 2023 00:00:00 +0000
+
+ https://www.zansara.dev/posts/2023-10-10-haystack-series-intro/
+      <p>December is finally approaching, and with it the release of <a href="https://github.com/deepset-ai/haystack" class="external-link" target="_blank" rel="noopener">Haystack</a> 2.0. At <a href="https://www.deepset.ai/" class="external-link" target="_blank" rel="noopener">deepset</a>, we’ve been talking about it for months, we’ve been iterating on the core concepts what feels like a million times, and it looks like we’re finally getting ready for the approaching deadline.</p>
+<p>But what is it that makes this release so special?</p>
+<p>In short, Haystack 2.0 is a complete rewrite. A huge, big-bang style change. Almost no code survived the migration unmodified: we’ve gone through the entire 100,000+ lines of the codebase and redone everything in under a year. For our small team, this is a huge accomplishment.</p>
+<p>In this series, I want to explain what Haystack 2 is from the perspective of the team that developed it. I’m gonna talk about what makes the new Pipeline so different from the old one, how to use the new components and features, how they compare with their equivalents in Haystack 1 (when possible), and the principles that guided the redesign. I had the pleasure (and sometimes the burden) of being involved in nearly all aspects of this process, from the requirements definition to the release, and I drove many of them through several iterations. In these posts, you can expect a mix of technical details and some diversions on the history and rationale behind each decision, as I’ve seen and understood them.</p>
+<p>For the curious readers, we have already released a lot of information about Haystack 2.0: check out <a href="https://github.com/deepset-ai/haystack/discussions/5568" class="external-link" target="_blank" rel="noopener">this GitHub Discussion</a>, or join us on <a href="https://discord.com/invite/VBpFzsgRVF" class="external-link" target="_blank" rel="noopener">Haystack’s Discord server</a> and peek into the <code>haystack-2.0</code> channel for regular updates. We are also slowly building <a href="https://docs.haystack.deepset.ai/v2.0/docs" class="external-link" target="_blank" rel="noopener">brand new documentation</a> for everything, and don’t worry: we’ll make sure to make it as outstanding as the Haystack 1.x version is.</p>
+<p>We also regularly showcase 2.0 features in our Office Hours on Discord. Follow <a href="https://twitter.com/Haystack_AI" class="external-link" target="_blank" rel="noopener">@Haystack_AI</a> or <a href="https://twitter.com/deepset_ai" class="external-link" target="_blank" rel="noopener">@deepset_ai</a> on Twitter to stay up-to-date, or <a href="https://www.linkedin.com/company/deepset-ai" class="external-link" target="_blank" rel="noopener">deepset</a> on LinkedIn. And you’ll find me and the rest of the team on <a href="https://github.com/deepset-ai/haystack" class="external-link" target="_blank" rel="noopener">GitHub</a> frantically (re)writing code and filing down the rough edges before the big release.</p>
+<p>Stay tuned!</p>
+<hr>
+<p><em>Next: <a href="https://www.zansara.dev/posts/2023-10-11-haystack-series-why" >Why rewriting Haystack?!</a></em></p>
+<p><em>See the entire series here: <a href="https://www.zansara.dev/series/haystack-2.0-series/" >Haystack 2.0 series</a></em></p>
+
+
+
+
+ An (unofficial) Python SDK for Verbix
+ https://www.zansara.dev/posts/2023-09-10-python-verbix-sdk/
+ Sun, 10 Sep 2023 00:00:00 +0000
+
+ https://www.zansara.dev/posts/2023-09-10-python-verbix-sdk/
+ <p>PyPI package: <a href="https://pypi.org/project/verbix-sdk/" class="external-link" target="_blank" rel="noopener">https://pypi.org/project/verbix-sdk/</a></p>
+<p>GitHub Repo: <a href="https://github.com/ZanSara/verbix-sdk" class="external-link" target="_blank" rel="noopener">https://github.com/ZanSara/verbix-sdk</a></p>
+<p>Minimal Docs: <a href="https://github.com/ZanSara/verbix-sdk/blob/main/README.md" class="external-link" target="_blank" rel="noopener">https://github.com/ZanSara/verbix-sdk/blob/main/README.md</a></p>
+<hr>
+<p>As part of a larger side project which is still in the works (<a href="https://github.com/ebisu-flashcards" class="external-link" target="_blank" rel="noopener">Ebisu Flashcards</a>), these days I found myself looking for a decent API for verb conjugations in different languages. My requirements were “simple”:</p>
+<ul>
+<li>Supports many languages, including Italian, Portuguese and Hungarian</li>
+<li>Conjugates irregulars properly</li>
+<li>Offers an API access to the conjugation tables</li>
+<li>Refuses to conjugate anything except for known verbs</li>
+<li>(Optional) Highlights the irregularities in some way</li>
+</ul>
+<p>Surprisingly, there seems to be a shortage of good alternatives in this field. None of the websites that host polished conjugation data seem to offer API access (looking at you, <a href="https://conjugator.reverso.net" class="external-link" target="_blank" rel="noopener">Reverso</a> – you’ll get your own post one day), and most of the simpler ones use heuristics to conjugate, which makes them very prone to errors. So for now I ended up choosing <a href="https://verbix.com" class="external-link" target="_blank" rel="noopener">Verbix</a> to start from.</p>
+<p>Unfortunately the website doesn’t inspire much confidence. I attempted to email the creator only to see them <a href="https://verbix.com/contact.html" class="external-link" target="_blank" rel="noopener">close their email account</a> a while later, an <a href="https://api.verbix.com/" class="external-link" target="_blank" rel="noopener">update to their API</a> seems to have stalled halfway, and the <a href="https://verb-blog.verbix.com/" class="external-link" target="_blank" rel="noopener">blog seems dead</a>. I often have the feeling this site might go under any minute, as soon as their domain registration expires.</p>
+<p>But there are pros to it, as long as it lasts. Verbix offers verb conjugation and noun declension tables for some <a href="https://verbix.com/languages/" class="external-link" target="_blank" rel="noopener">very niche languages, dialects and conlangs</a>, to a degree that many other popular websites don’t even come close to. To support such variety they use heuristics to create the conjugation tables, which is not ideal: for Hungarian, for example, I could easily get it to conjugate <a href="https://verbix.com/webverbix/go.php?T1=meegy&Submit=Go&D1=121&H1=221" class="external-link" target="_blank" rel="noopener">verbs that don’t exist</a> or that have spelling mistakes. On the other hand, their API does have a field that says whether the verb is known or not, which is a great way to filter out false positives.</p>
+<p>So I decided to go the extra mile and write a small Python SDK for their API: <a href="https://pypi.org/project/verbix-sdk/" class="external-link" target="_blank" rel="noopener">verbix-sdk</a>. Enjoy it while it lasts…</p>
+
+
+
+
+ Office Hours: Haystack 2.0
+ https://www.zansara.dev/talks/2023-08-03-office-hours-haystack-2.0-status/
+ Thu, 03 Aug 2023 00:00:00 +0000
+
+ https://www.zansara.dev/talks/2023-08-03-office-hours-haystack-2.0-status/
+ <p><a href="https://drive.google.com/file/d/1PyAlvJ22Z6o1bls07Do5kx2WMTdotsM7/view?usp=drive_link" class="external-link" target="_blank" rel="noopener">Recording</a>, <a href="https://drive.google.com/file/d/1QFNisUk2HzwRL_27bpr338maxLvDBr9D/preview" class="external-link" target="_blank" rel="noopener">slides</a>. All the material can also be found <a href="https://drive.google.com/drive/folders/1zmXwxsSgqDgvYf2ptjHocdtzOroqaudw?usp=drive_link" class="external-link" target="_blank" rel="noopener">here</a>.</p>
+<hr>
+
+
+<div class='iframe-wrapper'>
+<iframe src="https://drive.google.com/file/d/1PyAlvJ22Z6o1bls07Do5kx2WMTdotsM7/preview" width="100%" height="100%" allow="autoplay"></iframe>
+</div>
+
+
+<p>In this <a href="https://discord.com/invite/VBpFzsgRVF" class="external-link" target="_blank" rel="noopener">Office Hours</a> I presented for the first time to our Discord community a preview of the upcoming 2.0 release of Haystack, which has been in the works since the start of the year. As rumors started to spread about the presence of a <code>preview</code> module in the latest Haystack 1.x releases, we took the opportunity to share this early draft of the project and collect early feedback.</p>
+<p>Haystack 2.0 is a total rewrite that rethinks many of the core concepts of the framework and makes LLM support its primary concern, while making sure to support all the use cases its predecessor enabled. The rewrite addresses some well-known, long-standing issues with the pipeline’s design and with the relationship between the pipeline, its components, and the document stores, and aims at drastically improving the developer experience and the framework’s extensibility.</p>
+<p>As the main designer of this rewrite, I walked the community through a slightly re-hashed version of the slide deck I had presented internally just a few days earlier at an All Hands on the same topic.</p>
+
+
+
+
+ OpenNLP Meetup: A Practical Introduction to Image Retrieval
+ https://www.zansara.dev/talks/2022-12-01-open-nlp-meetup/
+ Thu, 01 Dec 2022 00:00:00 +0000
+
+ https://www.zansara.dev/talks/2022-12-01-open-nlp-meetup/
+ <p><a href="https://www.youtube.com/watch?v=7Idjl3OR0FY" class="external-link" target="_blank" rel="noopener">Youtube link</a>,
+<a href="https://gist.github.com/ZanSara/dc4b22e7ffe2a56647e0afba7537c46b" class="external-link" target="_blank" rel="noopener">slides</a>, <a href="https://gist.github.com/ZanSara/9e8557830cc866fcf43a2c5623688c74" class="external-link" target="_blank" rel="noopener">Colab</a> (live coding).
+All the material can also be found <a href="https://drive.google.com/drive/folders/1_3b8PsvykHeM0jSHsMUWQ-4h_VADutcX?usp=drive_link" class="external-link" target="_blank" rel="noopener">here</a>.</p>
+<hr>
+
+
+<div class='iframe-wrapper'>
+<iframe src="https://drive.google.com/file/d/19mxD-xUJ-14G-2XAqXEVpZfqR2MsSZTn/preview" width="100%" height="100%" allow="autoplay"></iframe>
+</div>
+
+
+<h2 id="a-practical-introduction-to-image-retrieval">
+ A Practical Introduction to Image Retrieval
+ <a class="heading-link" href="#a-practical-introduction-to-image-retrieval">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h2>
+<p><em>by Sara Zanzottera from deepset</em></p>
+<p>Search should not be limited to text only. Recently, Transformers-based NLP models started crossing the boundaries of text data and exploring the possibilities of other modalities, like tabular data, images, audio files, and more. Text-to-text generation models like GPT now have their counterparts in text-to-image models, like Stable Diffusion. But what about search? In this talk we’re going to experiment with CLIP, a text-to-image search model, to look for animals matching specific characteristics in a dataset of pictures. Does CLIP know which one is “The fastest animal in the world”?</p>
+<hr>
+<p>For the 7th <a href="https://www.meetup.com/open-nlp-meetup/" class="external-link" target="_blank" rel="noopener">OpenNLP meetup</a> I presented the topic of Image Retrieval, a feature that I’ve recently added to Haystack in the form of a <a href="https://docs.haystack.deepset.ai/docs/retriever#multimodal-retrieval" class="external-link" target="_blank" rel="noopener">MultiModal Retriever</a> (see the <a href="https://haystack.deepset.ai/tutorials/19_text_to_image_search_pipeline_with_multimodal_retriever" class="external-link" target="_blank" rel="noopener">Tutorial</a>).</p>
+<p>The talk consists of 5 parts:</p>
+<ul>
+<li>An introduction of the topic of Image Retrieval</li>
+<li>A mention of the current SOTA model (CLIP)</li>
+<li>An overview of Haystack</li>
+<li>A step-by-step description of how image retrieval applications can be implemented with Haystack</li>
+<li>A live coding session where I start from a blank Colab notebook and build a fully working image retrieval system from the ground up, to the point where I can run queries live.</li>
+</ul>
+<p>Towards the end I briefly mention an even more advanced version of this image retrieval system, which I had no time to implement live. However, I later built a notebook implementing such a system, and you can find it here: <a href="https://gist.github.com/ZanSara/31ed3fc8252bb74b1952f2d0fe253ed0" class="external-link" target="_blank" rel="noopener">Cheetah.ipynb</a></p>
+<p>The slides were generated from the linked Jupyter notebook with <code>jupyter nbconvert Dec_1st_OpenNLP_Meetup.ipynb --to slides --post serve</code>.</p>
+
+
+
+
+ Adopting PyQt For Beam Instrumentation GUI Development At CERN
+ https://www.zansara.dev/publications/thpv014/
+ Tue, 01 Mar 2022 00:00:00 +0000
+
+ https://www.zansara.dev/publications/thpv014/
+ <h2 id="abstract">
+ Abstract
+ <a class="heading-link" href="#abstract">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h2>
+<p>As Java GUI toolkits become deprecated, the Beam Instrumentation (BI) group at CERN has investigated alternatives and selected PyQt as one of the suitable technologies for future GUIs, in accordance with the paper presented at ICALEPCS19. This paper presents tools created, or adapted, to seamlessly integrate future PyQt GUI development alongside current Java-oriented workflows and the controls environment. This includes (a) creating a project template and a GUI management tool to ease and standardize our development process, (b) rewriting our previously Java-centric Expert GUI Launcher to be language-agnostic and (c) porting a selection of operational GUIs from Java to PyQt, to test the feasibility of the development process and identify bottlenecks. To conclude, the challenges we anticipate for the BI GUI developer community in adopting this new technology are also discussed.</p>
+<hr>
+<p>Get the full text here: <a href="https://www.zansara.dev/publications/thpv014.pdf" >Adopting PyQt For Beam Instrumentation GUI Development At CERN</a></p>
+<p>Get the poster: <a href="https://www.zansara.dev/publications/thpv014-poster.pdf" >PDF</a></p>
+<p>Publisher’s entry: <a href="https://accelconf.web.cern.ch/icalepcs2021/doi/JACoW-ICALEPCS2021-THPV014.html" class="external-link" target="_blank" rel="noopener">THPV014</a></p>
+
+
+
+
+ Evolution of the CERN Beam Instrumentation Offline Analysis Framework (OAF)
+ https://www.zansara.dev/publications/thpv042/
+ Sat, 11 Dec 2021 00:00:00 +0000
+
+ https://www.zansara.dev/publications/thpv042/
+ <h2 id="abstract">
+ Abstract
+ <a class="heading-link" href="#abstract">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h2>
+<p>The CERN accelerators require a large number of instruments, measuring different beam parameters like position, losses, current etc. The instruments’ associated electronics and software also produce information about their status. All these data are stored in a database for later analysis. The Beam Instrumentation group developed the Offline Analysis Framework some years ago to regularly and systematically analyze these data. The framework has been successfully used for nearly 100 different analyses that ran regularly by the end of the LHC run 2. Currently it is being updated for run 3 with modern and efficient tools to improve its usability and data analysis power. In particular, the architecture has been reviewed to have a modular design to facilitate the maintenance and the future evolution of the tool. A new web based application is being developed to facilitate the users’ access both to online configuration and to results. This paper will describe all these evolutions and outline possible lines of work for further improvements.</p>
+<hr>
+<p>Get the full text here: <a href="https://www.zansara.dev/publications/thpv042.pdf" >Evolution of the CERN Beam Instrumentation Offline Analysis Framework (OAF)</a></p>
+<p>Publisher’s entry: <a href="https://accelconf.web.cern.ch/icalepcs2021/doi/JACoW-ICALEPCS2021-THPV042.html" class="external-link" target="_blank" rel="noopener">THPV042</a>.</p>
+
+
+
+
+ My Dotfiles
+ https://www.zansara.dev/posts/2021-12-11-dotfiles/
+ Sat, 11 Dec 2021 00:00:00 +0000
+
+ https://www.zansara.dev/posts/2021-12-11-dotfiles/
+ <p>GitHub Repo: <a href="https://github.com/ZanSara/dotfiles" class="external-link" target="_blank" rel="noopener">https://github.com/ZanSara/dotfiles</a></p>
+<hr>
+<p>What Linux developer would I be if I didn’t also have my very own dotfiles repo?</p>
+<p>After many years of iterations I finally found a combination that lasted quite a while, so I figured it’s time to treat them as a real project. It was originally optimized for my laptop, but then I realized it works quite well on my three-monitor desk setup as well without major issues.</p>
+<p>It sports:</p>
+<ul>
+<li><a href="https://github.com/Airblader/i3" class="external-link" target="_blank" rel="noopener">i3-wm</a> as window manager (of course, with gaps),</li>
+<li>The typical trio of <a href="https://github.com/polybar/polybar" class="external-link" target="_blank" rel="noopener">polybar</a> , <a href="https://github.com/davatorium/rofi" class="external-link" target="_blank" rel="noopener">rofi</a> and <a href="https://github.com/dunst-project/dunst" class="external-link" target="_blank" rel="noopener">dunst</a> to handle top bar, start menu and notifications respectively,</li>
+<li>The odd choice of <a href="https://github.com/nullgemm/ly" class="external-link" target="_blank" rel="noopener">Ly</a> as my display manager. I just love the minimal, TUI aesthetics of it. Don’t forget to enable Doom’s flames!</li>
+<li>A minimalistic animated background from <a href="https://www.jwz.org/xscreensaver/screenshots/" class="external-link" target="_blank" rel="noopener">xscreensaver</a>, <a href="https://www.youtube.com/watch?v=spQRFDmDMeg" class="external-link" target="_blank" rel="noopener">Grav</a>. It’s configured to leave no trails and stay black and white. An odd choice, and yet it manages to use no resources, stay very minimal, and bring a very (in my opinion) futuristic look to the entire setup.</li>
+<li><a href="https://github.com/ohmybash/oh-my-bash/tree/master/themes/font" class="external-link" target="_blank" rel="noopener">OhMyBash</a> with the <a href="https://github.com/ohmybash/oh-my-bash/tree/master/themes/font" class="external-link" target="_blank" rel="noopener">font</a> theme,</li>
+<li>Other small amenities, like <a href="https://docs.rockylinux.org/gemstones/nmtui/" class="external-link" target="_blank" rel="noopener">nmtui</a> for network management, Japanese numerals as workspace indicators, etc.</li>
+</ul>
+<p>Feel free to take what you like. If you end up using any of these, make sure to share the outcomes!</p>
+
+
+
+
+ Ebisu Flashcards - In Progress!
+ https://www.zansara.dev/projects/ebisu-flashcards/
+ Tue, 01 Jun 2021 00:00:00 +0000
+
+ https://www.zansara.dev/projects/ebisu-flashcards/
+
+
+
+
+ ZanzoCam: An open-source alpine web camera
+ https://www.zansara.dev/talks/2021-05-24-zanzocam-pavia/
+ Mon, 24 May 2021 00:00:00 +0000
+
+ https://www.zansara.dev/talks/2021-05-24-zanzocam-pavia/
+ <p>Slides: <a href="https://www.zansara.dev/talks/2021-05-24-zanzocam-pavia.pdf" >ZanzoCam: An open-source alpine web camera</a></p>
+<hr>
+<p>On May 24th 2021 I held a talk about the <a href="https://zanzocam.github.io/en" class="external-link" target="_blank" rel="noopener">ZanzoCam project</a>
+as invited speaker for the <a href="http://hsw2021.gnudd.com/" class="external-link" target="_blank" rel="noopener">“Hardware and Software Codesign”</a> course at
+<a href="https://portale.unipv.it/it" class="external-link" target="_blank" rel="noopener">Università di Pavia</a>.</p>
+<p>The slides go through the entire lifecycle of the <a href="https://zanzocam.github.io/en" class="external-link" target="_blank" rel="noopener">ZanzoCam project</a>,
+from the very inception of it, the market research, our decision process, earlier prototypes, and
+then go into a more detailed explanation of the design and implementation of the project from
+a hardware and software perspective, with some notes about our financial situation and project management.</p>
+
+
+
+
+ Our Journey From Java to PyQt and Web For CERN Accelerator Control GUIs
+ https://www.zansara.dev/publications/tucpr03/
+ Sun, 30 Aug 2020 00:00:00 +0000
+
+ https://www.zansara.dev/publications/tucpr03/
+ <h2 id="abstract">
+ Abstract
+ <a class="heading-link" href="#abstract">
+ <i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
+ <span class="sr-only">Link to heading</span>
+ </a>
+</h2>
+<p>For more than 15 years, operational GUIs for accelerator controls and some lab applications for equipment experts have been developed in Java, first with Swing and more recently with JavaFX. In March 2018, Oracle announced that Java GUIs were not part of their strategy anymore*. They will not ship JavaFX after Java 8 and there are hints that they would like to get rid of Swing as well. This was a wakeup call for us. We took the opportunity to reconsider all technical options for developing operational GUIs. Our options ranged from sticking with JavaFX, over using the Qt framework (either using PyQt or developing our own Java Bindings to Qt), to using Web technology both in a browser and in native desktop applications. This article explains the reasons for moving away from Java as the main GUI technology and describes the analysis and hands-on evaluations that we went through before choosing the replacement.</p>
+<hr>
+<p>Get the full text here: <a href="https://www.zansara.dev/publications/tucpr03.pdf" >Our Journey From Java to PyQt and Web For CERN Accelerator Control GUIs</a></p>
+<p>Publisher’s entry: <a href="https://accelconf.web.cern.ch/icalepcs2019/doi/JACoW-ICALEPCS2019-TUCPR03.html" class="external-link" target="_blank" rel="noopener">TUCPR03</a>.</p>
+
+
+
+
+ ZanzoCam
+ https://www.zansara.dev/projects/zanzocam/
+ Wed, 01 Jan 2020 00:00:00 +0000
+
+ https://www.zansara.dev/projects/zanzocam/
+ <p>Main website: <a href="https://zanzocam.github.io/" class="external-link" target="_blank" rel="noopener">https://zanzocam.github.io/</a></p>
+<hr>
+<p>ZanzoCam is a low-power, low-frequency camera based on Raspberry Pi, designed to operate autonomously in remote locations and under harsh conditions. It was designed and developed between 2019 and 2021 for <a href="https://www.cai.it/gruppo_regionale/gr-lombardia/" class="external-link" target="_blank" rel="noopener">CAI Lombardia</a> by a team of two people, with me as the software developer and the other person responsible for the hardware design. CAI later deployed several of these devices on their affiliated huts.</p>
+<p>ZanzoCams are designed to work reliably in the harsh conditions of alpine winters, be as power-efficient as possible, and tolerate unstable network connections: they feature a robust HTTP- or FTP-based picture upload strategy which is remotely configurable from a very simple, single-file web panel. The camera software also improves on the basic capabilities of picamera to take pictures in dark conditions, making ZanzoCams able to shoot good pictures for a few hours after sunset.</p>
+<p>The camera is highly configurable: photo size and frequency, server address and protocol, all the overlays (color, size, position, text and images) and several other parameters can be configured remotely without the need to expose any ports of the device to the internet. They work reliably without the need for a VPN and at the same time are quite secure by design.</p>
+<p>ZanzoCams mostly serve CAI and the hut managers for self-promotion, and help hikers and climbers assess the local conditions before attempting a hike. Pictures taken for this purpose are sent to <a href="https://www.rifugi.lombardia.it/" class="external-link" target="_blank" rel="noopener">RifugiLombardia</a>, and you can see many of them <a href="https://www.rifugi.lombardia.it/territorio-lombardo/webcam" class="external-link" target="_blank" rel="noopener">at this page</a>.</p>
+<p>However, it has also been used by glaciologists to monitor glacier conditions, outlook and extension over the years. <a href="https://www.servizioglaciologicolombardo.it/webcam-3" class="external-link" target="_blank" rel="noopener">Here you can see their webcams</a>, some of which are ZanzoCams.</p>
+<p>Here is the latest picture from <a href="https://maps.app.goo.gl/PwdVC82VHwdPZJDE6" class="external-link" target="_blank" rel="noopener">Rifugio M. Del Grande - R. Camerini</a>, the test location for the original prototype:</p>
+<p><img src="https://webcam.rifugi.lombardia.it/rifugio/00003157/pictures/image__0.jpg" alt="ZanzoCam of Rifugio M. Del Grande - R. Camerini"></p>
+<p>And here is one of the cameras serving a local glaciology research group, <a href="https://www.servizioglaciologicolombardo.it/" class="external-link" target="_blank" rel="noopener">Servizio Glaciologico Lombardo</a>:</p>
+<p><img src="https://webcam.rifugi.lombardia.it/rifugio/90003157/pictures/image__0.jpg" alt="ZanzoCam of M. Disgrazia"></p>
+<p>Both of these cameras are fully solar-powered.</p>
+<p>ZanzoCam is fully open-source: check the <a href="https://github.com/ZanzoCam?view_as=public" class="external-link" target="_blank" rel="noopener">GitHub repo</a>. Due to this decision to open-source the project, I was invited by <a href="https://portale.unipv.it/it" class="external-link" target="_blank" rel="noopener">Università di Pavia</a> to hold a lecture about the project as part of their <a href="http://hsw2021.gnudd.com/" class="external-link" target="_blank" rel="noopener">“Hardware and Software Codesign”</a> course. Check out the slides of the lecture <a href="talks/zanzocam-pavia/" >here</a>.</p>
+
+
+
+
+ Evaluation of Qt as GUI Framework for Accelerator Controls
+ https://www.zansara.dev/publications/msc-thesis/
+ Thu, 20 Dec 2018 00:00:00 +0000
+
+ https://www.zansara.dev/publications/msc-thesis/
+ <p>This is the full-text of my MSc thesis, written in collaboration with
+<a href="https://www.polimi.it/" class="external-link" target="_blank" rel="noopener">Politecnico di Milano</a> and <a href="https://home.cern/" class="external-link" target="_blank" rel="noopener">CERN</a>.</p>
+<hr>
+<p>Get the full text here: <a href="https://www.zansara.dev/publications/msc-thesis.pdf" >Evaluation of Qt as GUI Framework for Accelerator Controls</a></p>
+<p>Publisher’s entry: <a href="https://hdl.handle.net/10589/144860" class="external-link" target="_blank" rel="noopener">10589/144860</a>.</p>
+
+
+
+
+ CAI Sovico's Website
+ https://www.zansara.dev/projects/booking-system/
+ Fri, 01 Jan 2016 00:00:00 +0000
+
+ https://www.zansara.dev/projects/booking-system/
+ <p>Main website: <a href="https://www.caisovico.it" class="external-link" target="_blank" rel="noopener">https://www.caisovico.it</a></p>
+<hr>
+<p>Since my bachelor studies I have maintained the IT infrastructure of an alpine hut, <a href="https://maps.app.goo.gl/PwdVC82VHwdPZJDE6" class="external-link" target="_blank" rel="noopener">Rifugio M. Del Grande - R. Camerini</a>. I count this as my first important project, one that people, mostly older and not very tech savvy, depended on to run a real business.</p>
+<p>The website went through several iterations as web technologies evolved, as well as the type of servers we could afford. Right now it features minimal HTML/CSS static pages, plus a reservations system built on a PHP 8 / MySQL backend with a vanilla JS frontend. It also includes an FTP server that supports a couple of <a href="https://www.zansara.dev/projects/zanzocam/" >ZanzoCams</a> and a <a href="http://www.meteoproject.it/ftp/stazioni/caisovico/" class="external-link" target="_blank" rel="noopener">weather monitoring station</a>.</p>
+
+
+
+
+ About
+ https://www.zansara.dev/about/
+ Mon, 01 Jan 0001 00:00:00 +0000
+
+ https://www.zansara.dev/about/
+ <p>I am a Python and Gen AI software engineer based in Portugal, currently working as Lead AI Engineer for
+<a href="https://www.kwal.ai/" class="external-link" target="_blank" rel="noopener">Kwal</a> on voice agents and conversation analysis with LLMs for the recruitment industry.
+I am also open to collaborations on short-term projects and to give <a href="https://www.zansara.dev/talks" >talks</a> on the topic of AI, LLMs, and
+related fields.</p>
+<p>Until recently I worked for <a href="https://www.deepset.ai/" class="external-link" target="_blank" rel="noopener">deepset</a>, a German startup working on NLP
+<a href="https://www.deepset.ai/about" class="external-link" target="_blank" rel="noopener">since “before it was cool”</a>, where I was the
+<a href="https://github.com/deepset-ai/haystack/graphs/contributors" class="external-link" target="_blank" rel="noopener">main contributor</a> of
+<a href="https://haystack.deepset.ai/" class="external-link" target="_blank" rel="noopener">Haystack</a>, their open-source framework for building highly customizable,
+production-ready NLP and LLM applications.</p>
+<p>Previously I worked for a few years at <a href="https://home.cern/" class="external-link" target="_blank" rel="noopener">CERN</a>, where I began my software engineering career.
+During my time there I had the privilege of driving one <a href="https://www.zansara.dev/publications/tucpr03/" >major decision</a> to migrate the graphical
+interface’s software of the accelerator’s control systems from Java to PyQt, and then of helping a client department
+<a href="https://www.zansara.dev/publications/thpv014/" >migrate</a> to this stack. I have also worked on other infrastructure and data pipelines, some of
+which resulted in <a href="https://www.zansara.dev/publications/thpv042/" >publication</a>.</p>
+<p>Outside of work I have too many <a href="https://www.zansara.dev/projects" >pet projects</a> to follow up with than the free time I can dedicate to them.
+I love science fiction and space exploration, I enjoy challenging hikes in nature and learning languages, as much as
+such process can be enjoyed.</p>
+<p>I speak native Italian and fluent English, but I’ve also learned French during my time at CERN, I’m studying Hungarian
+for family reasons, and Portuguese because I currently live there. I still can understand some Russian and I have a
+very basic understanding of Chinese, both from my teenage and university years.</p>
+<p>You can find my latest CV <a href="https://www.zansara.dev/me/sara_zanzottera_cv.pdf" >here</a>. Check out also my <a href="https://www.zansara.dev/projects" >projects</a>,
+my <a href="https://www.zansara.dev/publications" >publications</a> and my <a href="https://www.zansara.dev/talks" >talks</a>. If you prefer newsletters you can find my posts also on
+<a href="https://zansara.substack.com/" class="external-link" target="_blank" rel="noopener">Substack</a>.</p>
+<p>The best way to get in touch with me is through <a href="mailto:blog@zansara.dev" >email</a> or
+<a href="https://www.linkedin.com/in/sarazanzottera" class="external-link" target="_blank" rel="noopener">LinkedIn</a>.</p>
+
+
+
+
+
\ No newline at end of file
diff --git a/themes/hugo-coder/static/js/coder.js b/js/coder.js
similarity index 100%
rename from themes/hugo-coder/static/js/coder.js
rename to js/coder.js
diff --git a/static/me/avatar-black-bg.jpg b/me/avatar-black-bg.jpg
similarity index 100%
rename from static/me/avatar-black-bg.jpg
rename to me/avatar-black-bg.jpg
diff --git a/static/me/avatar-color.svg b/me/avatar-color.svg
similarity index 100%
rename from static/me/avatar-color.svg
rename to me/avatar-color.svg
diff --git a/static/me/avatar-mono.svg b/me/avatar-mono.svg
similarity index 100%
rename from static/me/avatar-mono.svg
rename to me/avatar-mono.svg
diff --git a/static/me/avatar-old.jpeg b/me/avatar-old.jpeg
similarity index 100%
rename from static/me/avatar-old.jpeg
rename to me/avatar-old.jpeg
diff --git a/static/me/avatar.png b/me/avatar.png
similarity index 100%
rename from static/me/avatar.png
rename to me/avatar.png
diff --git a/static/me/sara_zanzottera_cv.pdf b/me/sara_zanzottera_cv.pdf
similarity index 100%
rename from static/me/sara_zanzottera_cv.pdf
rename to me/sara_zanzottera_cv.pdf
diff --git a/static/posts/2021-12-11-dotfiles/cover.png b/posts/2021-12-11-dotfiles/cover.png
similarity index 100%
rename from static/posts/2021-12-11-dotfiles/cover.png
rename to posts/2021-12-11-dotfiles/cover.png
diff --git a/posts/2021-12-11-dotfiles/index.html b/posts/2021-12-11-dotfiles/index.html
new file mode 100644
index 00000000..9c879b74
--- /dev/null
+++ b/posts/2021-12-11-dotfiles/index.html
@@ -0,0 +1,264 @@
+ My Dotfiles · Sara Zan
What Linux developer would I be if I didn’t also have my very own dotfiles repo?
+
After many years of iterations I finally found a combination that lasted quite a while, so I figured it’s time to treat them as a real project. It was originally optimized for my laptop, but then I realized it works quite well on my three-monitor desk setup as well without major issues.
The typical trio of polybar, rofi and dunst to handle top bar, start menu and notifications respectively,
+
The odd choice of Ly as my display manager. I just love the minimal, TUI aesthetics of it. Don’t forget to enable Doom’s flames!
+
A minimalistic animated background from xscreensaver, Grav. It’s configured to leave no trails and stay black and white. An odd choice, and yet it manages to use no resources, stay very minimal, and bring a very (in my opinion) futuristic look to the entire setup.
As part of a larger side project which is still in the works (Ebisu Flashcards), these days I found myself looking for a decent API for verb conjugations in different languages. My requirements were “simple”:
+
+
Supports many languages, including Italian, Portuguese and Hungarian
+
Conjugates irregulars properly
+
Offers API access to the conjugation tables
+
Refuses to conjugate anything except for known verbs
+
(Optional) Highlights the irregularities in some way
+
+
Surprisingly, there seems to be a shortage of good alternatives in this field. All the websites that host polished conjugation data don’t seem to offer API access (looking at you, Reverso – you’ll get your own post one day), and most of the simpler ones use heuristics to conjugate, which makes them very prone to errors. So for now I ended up choosing Verbix to start from.
+
Unfortunately the website doesn’t inspire much confidence. I attempted to email the creator just to see them close their email account a while later, an update in their API seems to have stalled half-way, and the blog seems dead. I often have the feeling this site might go under any minute, as soon as their domain registration expires.
+
But there are pros to it, as long as it lasts. Verbix offers verb conjugation and noun declension tables for some very niche languages, dialects and conlangs, to a degree that many other popular websites don’t even come close to. To support such variety they use heuristics to create the conjugation tables, which is not ideal: for Hungarian, for example, I could easily get it to conjugate verbs that don’t exist or that have spelling mistakes. On the other hand, their API does have a field that says whether the verb is known or not, which is a great way to filter out false positives.
+
So I decided to go the extra mile and I wrote a small Python SDK for their API: verbix-sdk. Enjoy it while it lasts…
December is finally approaching, and with it the release of Haystack 2.0. At deepset, we’ve been talking about it for months, we’ve been iterating on the core concepts for what feels like a million times, and it looks like we’re finally getting ready for the approaching deadline.
+
But what is it that makes this release so special?
+
In short, Haystack 2.0 is a complete rewrite. A huge, big-bang style change. Almost no code survived the migration unmodified: we’ve been across the entire 100,000+ lines of the codebase and redone everything in under a year. For our small team, this is a huge accomplishment.
+
In this series, I want to explain what Haystack 2 is from the perspective of the team that developed it. I’m gonna talk about what makes the new Pipeline so different from the old one, how to use new components and features, how these compare with their equivalents in Haystack 1 (when possible), and the principles that guided the redesign. I had the pleasure (and sometimes the burden) of being involved in nearly all aspects of this process, from the requirements definition to the release, and I drove many of them through several iterations. In these posts, you can expect a mix of technical details and some diversions on the history and rationale behind each decision, as I’ve seen and understood them.
+
For the curious readers, we have already released a lot of information about Haystack 2.0: check out this GitHub Discussion, or join us on Haystack’s Discord server and peek into the haystack-2.0 channel for regular updates. We are also slowly building brand-new documentation for everything, and don’t worry: we’ll make sure to make it as outstanding as the Haystack 1.x version is.
+
We also regularly feature 2.0 features in our Office Hours on Discord. Follow @Haystack_AI or @deepset_ai on Twitter to stay up-to-date, or deepset on Linkedin. And you’ll find me and the rest of the team on GitHub frantically (re)writing code and filing down the rough edges before the big release.
Haystack is a relatively young framework, its initial release dating back to November 28th, 2019. Back then, Natural Language Processing was a field that had just started taking its first steps outside of research labs, and Haystack was one of the first libraries that promised enterprise-grade, production-ready NLP features. We were proud to enable use cases such as semantic search, FAQ matching, document similarity, document summarization, machine translation, language-agnostic search, and so on.
+
The field was niche but constantly moving, and research was lively. The BERT paper had been published a few months before Haystack’s first release, unlocking a small revolution. In the shade of much larger research labs, deepset, then just a pre-seed stage startup, was also pouring effort into research and model training.
+
In those times, competition was close to non-existent. The field was still quite technical, and most people didn’t fully understand its potential. We were free to explore features and use cases at our own pace and set the direction for our product. This allowed us to decide what to work on, what to double down on, and what to deprioritize, postpone, or ignore. Haystack was nurturing its own garden in what was fundamentally a green field.
This rather idyllic situation came to an end all too abruptly at the end of November 2022, when ChatGPT was released.
+
For us in the NLP field, everything seemed to change overnight. Day by day. For months.
+
The speed of progress went from lively to faster-than-light all at once. Every company with the budget to train an LLM seemed to be doing so, and researchers kept releasing new models just as quickly. Open-source contributors pushed to reduce the hardware requirements for inference lower and lower. My best memory of those times is the drama of LLaMA’s first “release”: I remember betting on March 2nd that within a week I would be running LLaMA models on my laptop, and I wasn’t even surprised when my prediction turned out to be true with the release of llama.cpp on March 10th.
+
Of course, keeping up with this situation was far beyond us. Competitors started to spawn like mushrooms, and our space was quickly crowded with new startups, far more agile and aggressive than us. We suddenly needed to compete and realized we weren’t used to it.
Luckily, Haystack seemed capable of keeping up, at least for a while. Thanks to the efforts of Vladimir Blagojevic, a few weeks after ChatGPT became a sensation, we added some decent support for LLMs in the form of PromptNode. Our SaaS team could soon bring new LLM-powered features to our customers. We even managed to add support for Agents, another hot topic in the wake of ChatGPT.
+
However, the go-to library for LLMs was not Haystack in the mind of most developers. It was LangChain, and for a long time, it seemed like we would never be able to challenge their status and popularity. Everyone was talking about it, everyone was building demos, products, and startups on it, its development speed was unbelievable and, in the day-to-day discourse of the newly born LLM community, Haystack was nowhere to be found.
+
Why?
+
That’s because no one even realized that Haystack, the semantic search framework from 2019, also supported LLMs. All our documentation, tutorials, blog posts, research efforts, models on HuggingFace, everything was pointing towards semantic search. LLMs were nowhere to be seen.
+
And semantic search was going down fast.
+
+
The image above shows today’s monthly downloads for one of deepset’s most successful models on HuggingFace,
+deepset/roberta-base-squad2. This model performs extractive Question Answering, our former primary use case before the release of ChatGPT. Even with more than one and a half million downloads monthly, this model is experiencing a disastrous collapse in popularity, and in the current landscape, it is unlikely to ever recover.
In this context, around February 2023, we decided to bet on the rise of LLMs and committed to focus all our efforts towards becoming the #1 framework powering production-grade LLM applications.
+
As we quickly realized, this was by far not an easy proposition. Extractive QA was not only ingrained deeply in our public image but in our codebase as well: implementing and maintaining PromptNode was proving more and more painful by the day, and when we tried to fit the concept of Agents into Haystack, it felt uncomfortably like trying to force a square peg into a round hole.
+
Haystack pipelines made extractive QA straightforward for the users and were highly optimized for this use case. But supporting LLMs was nothing like enabling extractive QA. Using Haystack for LLMs was quite a painful experience, and at the same time, modifying the Pipeline class to accommodate them seemed like the best way to mess with all the users that relied on the current Pipeline for their existing, value-generating applications. Making mistakes with Pipeline could ruin us.
+
With this realization in mind, we took what seemed the best option for the future of Haystack: a rewrite. The knowledge and experience we gained while working on Haystack 1 could fuel the design of Haystack 2 and act as a reference frame for it. Unlike our competitors, we already knew a lot about how to make NLP work at scale. We made many mistakes we would avoid in our next iteration. We knew that focusing on the best possible developer experience fueled the growth of Haystack 1 in the early days, and we were committed to doing the same for the next version of it.
+
So, the redesign of Haystack started, and it started from the concept of Pipeline.
Haystack 2.0 hasn’t been released yet, but for now, it seems that we have made the right decision at the start of the year.
+
Haystack’s name is starting to appear more often in discussions around LLMs. The general tone of the community is steadily shifting, and scaling up, rather than experimenting, is now the focus. Competitors are re-orienting themselves toward production-readiness, something we’re visibly more experienced with. At the same time, LangChain is becoming a victim of its own success, collecting more and more criticism for its lack of documentation, leaky abstractions, and confusing architecture. Other competitors are gaining steam, but the overall landscape no longer feels as hostile.
+
In the next post, I will explore the technical side of Haystack 2.0 and delve deeper into the concept of Pipelines: what they are, how to use them, how they evolved from Haystack 1 to Haystack 2, and why.
If you’ve ever looked at Haystack before, you must have come across the Pipeline, one of the most prominent concepts of the framework. However, this abstraction is by no means an obvious choice when it comes to NLP libraries. Why did we adopt this concept, and what does it bring us?
+
In this post, I go into all the details of how the Pipeline abstraction works in Haystack now, why it works this way, and its strengths and weaknesses. This deep dive into the current state of the framework is also a premise for the next episode, where I will explain how Haystack 2.0 addresses this version’s shortcomings.
+
If you think you already know how Haystack Pipelines work, give this post a chance: I might manage to change your mind.
Interestingly, in the very first releases of Haystack, Pipelines were not a thing. Version 0.1.0 was released with a simpler object, the Finder, that did little more than gluing together a Retriever and a Reader, the two fundamental building blocks of a semantic search application.
+
In the next few months, however, the capabilities of language models expanded to enable many more use cases. One hot topic was hybrid retrieval: a system composed of two different Retrievers, an optional Ranker, and an optional Reader. This kind of application clearly didn’t fit the Finder’s design, so in version 0.6.0 the Pipeline object was introduced: a new abstraction that helped users build applications as a graph of components.
+
Pipeline’s API was a huge step forward from Finder. It instantly enabled seemingly endless combinations of components, unlocked almost all use cases conceivable, and became a foundational Haystack concept meant to stay for a very long time. In fact, the API offered by the first version of Pipeline changed very little since its initial release.
+
This is the snippet included in the release notes of version 0.6.0 to showcase hybrid retrieval. Does it look familiar?
+
p = Pipeline()
+p.add_node(component=es_retriever, name="ESRetriever", inputs=["Query"])
+p.add_node(component=dpr_retriever, name="DPRRetriever", inputs=["Query"])
+p.add_node(component=JoinDocuments(join_mode="concatenate"), name="JoinResults", inputs=["ESRetriever", "DPRRetriever"])
+p.add_node(component=reader, name="QAReader", inputs=["JoinResults"])
+res = p.run(query="What did Einstein work on?", top_k_retriever=1)
+
One fascinating aspect of this Pipeline model is the simplicity of its user-facing API. In almost all examples, you see only two or three methods used:
+
+
add_node: to add a component to the graph and connect it to the others.
+
run: to run the Pipeline from start to finish.
+
draw: to draw the graph of the Pipeline to an image.
+
+
At this level, users don’t need to know what kind of data the components need to function, what they produce, or even what the components do: all they need to know is the place they must occupy in the graph for the system to work.
+
For example, as long as the users know that their hybrid retrieval pipeline should look more or less like this (note: this is the output of Pipeline.draw()), translating it into a Haystack Pipeline object using a few add_node calls is mostly straightforward.
+
+
This fact is reflected by the documentation of the various components as well. For example, this is how the documentation page for Ranker opens:
+
+
Note how the first information about this component is where to place it. Right after, it specifies its inputs and outputs, even though it’s not immediately clear why we need this information, and then lists which specific classes can cover the role of a Ranker.
+
The message is clear: all Ranker classes are functionally interchangeable, and as long as you place them correctly in the Pipeline, they will fulfill the function of Ranker as you expect them to. Users don’t need to understand what distinguishes CohereRanker from RecentnessReranker unless they want to: the documentation promises that you can swap them safely, and thanks to the Pipeline abstraction, this statement mostly holds true.
But how can the users know which sort of graph they have to build?
+
Most NLP applications are made of a relatively limited number of high-level components: Retrievers, Readers, Rankers, plus the occasional Classifier, Translator, or Summarizer. Systems requiring something more than these components used to be really rare, at least when talking about “query” pipelines (more on this later).
+
Therefore, at this level of abstraction, there are just a few graph topologies possible. Better yet, they could each be mapped to high-level use cases such as semantic search, language-agnostic document search, hybrid retrieval, and so on.
+
But the crucial point is that, in most cases, tailoring the application did not require any changes to the graph’s shape. Users only need to identify their use case, find an example or a tutorial defining the shape of the Pipeline they need, and then swap the single components with other instances from the same category until they find the best combination for their exact requirements.
+
This workflow was evident and encouraged: it was the philosophy behind Finder as well, and from version 0.6.0, Haystack immediately provided what are called “Ready-made Pipelines”: objects that initialized the graph on the user’s behalf, and expected as input the components to place in each point of the graph: for example a Reader and a Retriever, in case of simple Extractive QA.
+
With this further abstraction on top of Pipeline, creating an NLP application became an action that doesn’t even require the user to be aware of the existence of the graph. In fact:
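For example, a sketch of an Extractive QA application built on a ready-made pipeline could look like this (assuming a reader and a retriever are already initialized, as in the snippets above):

from haystack.pipelines import ExtractiveQAPipeline

# The ready-made pipeline builds the Query -> Retriever -> Reader graph on our behalf
pipe = ExtractiveQAPipeline(reader=reader, retriever=retriever)
answers = pipe.run(query="What did Einstein work on?")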
This abstraction is extremely powerful for the use cases that it was designed for. There are a few layers of ease of use vs. customization the user can choose from depending on their expertise, which help them progress from a simple ready-made Pipeline to fully custom graphs.
+
However, the focus was oriented so much on the initial stages of the user’s journey that power-users’ needs were sometimes forgotten. Such issues didn’t show immediately, but quickly added friction as soon as the users tried to customize their system beyond the examples from the tutorials and the documentation.
+
For an example of these issues, let’s talk about pipelines with branches. Here are two small, apparently very similar pipelines.
+
+
The first Pipeline represents the Hybrid Retrieval use case we’ve met with before. Here, the Query node sends its outputs to both retrievers, and they both produce some output. For the Reader to make sense of this data, we need a Join node that merges the two lists into one and a Ranker that takes the lists and sorts them again by similarity to the query. Ranker then sends the rearranged list to the Reader.
+
The second Pipeline instead performs a simpler form of Hybrid Retrieval. Here, the Query node sends its outputs to a Query Classifier, which then triggers only one of the two retrievers, the one that is expected to perform better on it. The triggered Retriever then sends its output directly to the Reader, which doesn’t need to know which Retriever the data comes from. So, in this case, we don’t need the Join node.
+
The two pipelines are built as you would expect, with a bunch of add_node calls. You can even run them with the same identical code, which is the same code needed for every other Pipeline we’ve seen so far.
+
pipeline_1 = Pipeline()
+pipeline_1.add_node(component=sparse_retriever, name="SparseRetriever", inputs=["Query"])
+pipeline_1.add_node(component=dense_retriever, name="DenseRetriever", inputs=["Query"])
+pipeline_1.add_node(component=join_documents, name="JoinDocuments", inputs=["SparseRetriever", "DenseRetriever"])
+pipeline_1.add_node(component=rerank, name="Ranker", inputs=["JoinDocuments"])
+pipeline_1.add_node(component=reader, name="Reader", inputs=["Ranker"])
+
+answers = pipeline_1.run(query="What did Einstein work on?")
+
pipeline_2 = Pipeline()
+pipeline_2.add_node(component=query_classifier, name="QueryClassifier", inputs=["Query"])
+pipeline_2.add_node(component=sparse_retriever, name="SparseRetriever", inputs=["QueryClassifier"])
+pipeline_2.add_node(component=dense_retriever, name="DenseRetriever", inputs=["QueryClassifier"])
+pipeline_2.add_node(component=reader, name="Reader", inputs=["SparseRetriever", "DenseRetriever"])
+
+answers = pipeline_2.run(query="What did Einstein work on?")
+
Both pipelines run as you would expect them to. Hooray! Pipelines can branch and join!
+
Now, let’s take the first Pipeline and customize it further.
+
For example, imagine we want to expand language support to include French. The dense Retriever has no issues handling several languages as long as we select a multilingual model; however, the sparse Retriever needs the keywords to match, so we must translate the queries to English to find some relevant documents in our English-only knowledge base.
+
Here is what the Pipeline ends up looking like. Language Classifier sends all French queries over output_1 and all English queries over output_2. In this way, the query passes through the Translator node only if it is written in French.
But… wait. Let’s look again at the graph and at the code. DenseRetriever should receive two inputs from Language Classifier: both output_1 and output_2, because it can handle both languages. What’s going on? Is this a bug in draw()?
+
Thanks to the debug=True parameter of Pipeline.run(), we start inspecting what each node saw during the execution, and we realize quickly that our worst fears are true: this is a bug in the Pipeline implementation. The underlying library powering the Pipeline’s graphs takes the definition of Directed Acyclic Graphs very seriously and does not allow two nodes to be connected by more than one edge. There are, of course, other graph classes supporting this case, but Haystack happens to use the wrong one.
+
Interestingly, Pipeline doesn’t even notice the problem and does not fail. It runs as the drawing suggests: when the query happens to be in French, only the sparse Retriever will process it.
+
Clearly, this is not good for us.
+
Well, let’s look for a workaround. Given that we’re Haystack power users by now, we realize that we can use a Join node with a single input as a “no-op” node. If we put it along one of the edges, that edge won’t directly connect Language Classifier and Dense Retriever, so the bug should be solved.
Great news: the Pipeline now runs as we expect! However, when we run a French query, the results are better but still surprisingly bad.
+
What now? Is the dense Retriever still not running? Is the Translation node doing a poor job?
+
Some debugging later, we realize that the Translator is amazingly good and the Retrievers are both running. But we forgot another piece of the puzzle: Ranker needs the query to be in the same language as the documents. It requires the English version of the query, just like the sparse Retriever does. However, right now, it receives the original French query, and that’s the reason for the lack of performance. We soon realize that this is very important also for the Reader.
+
So… how does the Pipeline pass the query down to the Ranker?
+
Until this point, we didn’t need to know how exactly values are passed from one component to the next. We didn’t need to care about their inputs and outputs at all: Pipeline was doing all this dirty work for us. Suddenly, we need to tell the Pipeline which query to pass to the Ranker and we have no idea how to do that.
+
Worse yet. There is no way to reliably do that. The documentation seems to blissfully ignore the topic, docstrings give us no pointers, and looking at the routing code of Pipeline we quickly get dizzy and give up the chase. We dig through the Pipeline API several times until we’re confident that there’s nothing that can help.
+
Well, there must be at least some workaround. Maybe we can forget about this issue by rearranging the nodes.
+
One easy way out is to translate the query for both retrievers instead of only for the sparse one. This solution also eliminates the NoOpJoin node we introduced earlier, so it doesn’t sound too bad.
We now have two nodes that contain identical translator components. Given that they are stateless, we can surely place the same instance in both places, with different names, and avoid doubling its memory footprint just to work around a couple of Pipeline bugs. After all, Translator nodes use relatively heavy models for machine translation.
+
This is what Pipeline replies as soon as we try.
+
PipelineConfigError: Cannot add node 'Translator2'. You have already added the same
+instance to the Pipeline under the name 'Translator'.
+
Okay, so it seems like we can’t re-use components in two places: there is an explicit check against this, for some reason. Alright, let’s rearrange this Pipeline again with this new constraint in mind.
+
How about we first translate the query and then distribute it?
Looks neat: there is no way now for the original French query to reach Ranker. Right?
+
We run the pipeline again and soon realize that nothing has changed. The query received by Ranker is still in French, untranslated. Shuffling the order of the add_node calls and the names of the components in the inputs parameters seems to have no effect on the graph. We even try to connect Translator directly with Ranker in a desperate attempt to forward the correct value, but Pipeline now starts throwing obscure, apparently meaningless error messages like:
Along with this beautiful code, we also receive an explanation about how the JoinQueryWorkaround node works only for this specific Pipeline and is pretty hard to generalize, which is why it’s not present in Haystack right now. I’ll spare you the details: you will have an idea why by the end of this journey.
+
Wanna play with this Pipeline yourself and try to make it work in another way? Check out the Colab or the gist and have fun.
+
Having learned only that it’s better not to implement unusual branching patterns with Haystack unless you’re ready for a fight, let’s now turn to the indexing side of your application. We’ll stick to the basics this time.
Indexing pipelines’ main goal is to transform files into Documents from which a query pipeline can later retrieve information. They mostly look like the following.
+
+
And the code looks just like you would expect.
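A sketch of what such an indexing pipeline could look like (node names, converter instances and the classifier's output numbering are illustrative and assume the classifier's default list of supported file types):

indexing_pipeline = Pipeline()
indexing_pipeline.add_node(component=file_type_classifier, name="FileTypeClassifier", inputs=["File"])
indexing_pipeline.add_node(component=text_converter, name="TextConverter", inputs=["FileTypeClassifier.output_1"])
indexing_pipeline.add_node(component=pdf_converter, name="PdfConverter", inputs=["FileTypeClassifier.output_2"])
indexing_pipeline.add_node(component=docx_converter, name="DocxConverter", inputs=["FileTypeClassifier.output_4"])
indexing_pipeline.add_node(component=join_documents, name="JoinDocuments", inputs=["TextConverter", "PdfConverter", "DocxConverter"])
indexing_pipeline.add_node(component=preprocessor, name="PreProcessor", inputs=["JoinDocuments"])
indexing_pipeline.add_node(component=document_store, name="DocumentStore", inputs=["PreProcessor"])

indexing_pipeline.run(file_paths=paths)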
There is no surprising stuff here. The starting node is File instead of Query, which seems logical given that this Pipeline expects a list of files, not a query. There is a document store at the end which we didn’t use in query pipelines so far, but it’s not looking too strange. It’s all quite intuitive.
+
Indexing pipelines are run by giving them the paths of the files to convert. In this scenario, more than one Converter may run, so we place a Join node before the PreProcessor to make sense of the merge. We make sure that the directory contains only files that we can convert, in this case, .txt, .pdf, and .docx, and then we run the code above.
+
The code, however, fails.
+
ValueError: Multiple non-default file types are not allowed at once.
+
The more we look at the error, the less it makes sense. What are non-default file types? Why are they not allowed at once, and what can I do to fix that?
+
We head for the documentation, where we find a lead.
+
+
So it seems like the File Classifier can only process the files if they’re all of the same type.
+
After all we’ve been through with the Hybrid Retrieval pipelines, this sounds wrong. We know that Pipeline can run two branches at the same time. We’ve been doing it all the time just a moment ago. Why can’t FileTypeClassifier send data to two converters just like LanguageClassifier sends data to two retrievers?
+
Turns out, this is not the same thing.
+
Let’s compare the three pipelines and try to spot the difference.
+
+
In the first case, Query sends the same identical value to both Retrievers. So, from the component’s perspective, there’s a single output being produced: the Pipeline takes care of copying it for all nodes connected to it.
+
In the second case, QueryClassifier can send the query to either Retriever but never to both. So, the component can produce two different outputs, but at every run, it will always return just one.
+
In the third case, FileTypeClassifier may need to produce two different outputs simultaneously: for example, one with a list of text files and one with a list of PDFs. And it turns out this can’t be done. This is, unfortunately, a well-known limitation of the Pipeline/BaseComponent API design.
+The output of a component is defined as a tuple, (output_values, output_edge), and nodes can’t produce a list of these tuples to send different values to different nodes.
+
That’s the end of the story. This time, there is no workaround. You must pass the files individually or forget about using a Pipeline for this task.
On top of these challenges, other tradeoffs had to be made for the API to look so simple at first glance. One of these is connection validation.
+
Let’s imagine we quickly skimmed through a tutorial and got one bit of information wrong: we mistakenly believe that in an Extractive QA Pipeline, you need to place a Reader in front of a Retriever. So we sit down and write this.
+
p = Pipeline()
+p.add_node(component=reader, name="Reader", inputs=["Query"])
+p.add_node(component=retriever, name="Retriever", inputs=["Reader"])
+
Up to this point, running the script raises no error. Haystack is happy to connect these two components in this order. You can even draw() this Pipeline just fine.
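The failure only shows up at run time, when the Pipeline tries to feed data through the graph. A sketch of the failing call (the exact error text is omitted here; in essence, it complains that the Reader never received the documents it needs):

res = p.run(query="What did Einstein work on?")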
This is the same error we’ve seen in the translating hybrid retrieval pipeline earlier, but fear not! Here, we can follow the suggestion of the error message by doing:
+
res = p.run(query="What did Einstein work on?", documents=document_store.get_all_documents())
+
And to our surprise, this Pipeline doesn’t crash. It just hangs there, showing an insanely slow progress bar, telling us that some inference is in progress. A few hours later, we kill the process and consider switching to another framework because this one is clearly very slow.
+
What happened?
+
The cause of this issue is the same thing that makes connecting Haystack components in a Pipeline so effortless, and it’s related to the way components and Pipeline communicate. If you check Pipeline.run()’s signature, you’ll see that it looks like this:
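From memory of Haystack 1.x, the signature looks roughly like the sketch below (exact type annotations may differ), and every component's own run() method is expected to accept essentially the same set of parameters:

def run(
    self,
    query: Optional[str] = None,
    file_paths: Optional[List[str]] = None,
    labels: Optional[MultiLabel] = None,
    documents: Optional[List[Document]] = None,
    meta: Optional[dict] = None,
    params: Optional[dict] = None,
    debug: Optional[bool] = None,
):
    ...

A few consequences follow directly from this design: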
Every component can be connected to every other because their inputs are identical.
+
+
+
Every component can only output the same variables received as input.
+
+
+
It’s impossible to tell if it makes sense to connect two components because their inputs and outputs always match.
+
+
+
Take this with a grain of salt: the actual implementation is far more nuanced than what I just showed you, but the problem is fundamentally this: components are trying to be as compatible as possible with all others and they have no way to signal, to the Pipeline or to the users, that they’re meant to be connected only to some nodes and not to others.
+
In addition to this problem, to respect the shared signature, components often take inputs that they don’t use. A Ranker only needs documents, so all the other inputs required by the run method signature go unused. What do components do with the values? It depends:
+
+
Some have them in the signature and forward them unchanged.
+
Some have them in the signature and don’t forward them.
+
Some don’t have them in the signature, breaking the inheritance pattern, and Pipeline reacts by assuming that they should be added unchanged to the output dictionary.
+
+
If you check closely the two workaround nodes for the Hybrid Retrieval pipeline we tried to build before, you’ll notice the fix entirely focuses on altering the routing of the unused parameters query and documents to make the Pipeline behave the way the user expects. However, this behavior does not generalize: a different pipeline would require another behavior, which is why the components behave differently in the first place.
I could go on for ages talking about the shortcomings of complex Pipelines, but I’d rather stop here.
+
Along this journey into the guts of Haystack Pipelines, we’ve seen at the same time some beautiful APIs and the ugly consequences of their implementation. As always, there’s no free lunch: trying to over-simplify the interface will bite back as soon as the use cases become nontrivial.
+
However, we believe that this concept has a huge potential and that this version of Pipeline can be improved a lot before the impact on the API becomes too heavy. In Haystack 2.0, armed with the experience we gained working with this implementation of Pipeline, we reimplemented it in a fundamentally different way, which will prevent many of these issues.
As we have seen in the previous episode of this series, Haystack’s Pipeline is a powerful concept that comes with its set of benefits and shortcomings. In Haystack 2.0, the pipeline was one of the first items that we focused our attention on, and it was the starting point of the entire rewrite.
+
What does this mean in practice? Let’s look at what Haystack Pipelines in 2.0 will be like, how they differ from their 1.x counterparts, and the pros and cons of this new paradigm.
I’ve already written at length about what made the original Pipeline concept so powerful and its weaknesses. Pipelines were overly effective for the use cases we could conceive while developing them, but they didn’t generalize well on unforeseen situations.
+
For a long time, Haystack was able to afford not focusing on use cases that didn’t fit its architecture, as I have mentioned in my previous post about the reasons for the rewrite. The pipeline was then more than sufficient for its purposes.
+
+However, the situation flipped as LLMs and Generative AI “entered” the scene abruptly at the end of 2022 (although they had certainly been around for longer). Our Pipeline, although usable and still quite powerful for many LLM use cases, seemingly overfit the original use cases it was designed for.
+
Let’s take one of these use cases and see where it leads us.
Let’s take one typical example: retrieval augmented generation, or RAG for short. This technique has been used since the very early days of the Generative AI boom as an easy way to strongly reduce hallucinations and improve the alignment of LLMs. The basic idea is: instead of asking the model a question directly, such as "What's the capital of France?", we send it a more complex prompt that includes both the question and the data needed to answer it. Such a prompt might be:
+
Given the following paragraph, answer the question.
+
+Paragraph: France is a unitary semi-presidential republic with its capital in Paris,
+the country's largest city and main cultural and commercial centre; other major urban
+areas include Marseille, Lyon, Toulouse, Lille, Bordeaux, Strasbourg and Nice.
+
+Question: What's the capital of France?
+
+Answer:
+
In this situation, the task of the LLM becomes far easier: instead of drawing facts from its internal knowledge, which might be lacking, inaccurate, or out-of-date, the model can use the paragraph’s content to answer the question, improving the model’s performance significantly.
+
We now have a new problem, though. How can we provide the correct snippets of text to the LLM? This is where the “retrieval” keyword comes up.
+
One of Haystack’s primary use cases had been Extractive Question Answering: a system where a Retriever component searches a Document Store (such as a vector or SQL database) for snippets of text that are the most relevant to a given question. It then sends such snippets to a Reader (an extractive model), which highlights the keywords that answer the original question.
+
By replacing a Reader model with an LLM, we get a Retrieval Augmented Generation Pipeline. Easy!
+
+
So far, everything checks out. Supporting RAG with Haystack feels not only possible but natural. Let’s take this simple example one step forward: what if, instead of getting the data from a document store, I want to retrieve data from the Internet?
At first glance, the task may not seem daunting. We surely need a special Retriever that, instead of searching through a DB, searches through the Internet using a search engine. But the core concepts stay the same, and so, we assume, should the pipeline’s graph. The end result should be something like this:
+
+
However, the problem doesn’t end there. Search engines return links, which need to be accessed, and the content of the webpage downloaded. Such pages may be extensive and contain artifacts, so the resulting text needs to be cleaned, reduced into paragraphs, potentially embedded by a retrieval model, ranked against the original query, and only the top few resulting pieces of text need to be passed over to the LLM. Just by including these minimal requirements, our pipeline already looks like this:
+
+
And we still need to consider that URLs may reference not HTML pages but PDFs, videos, zip files, and so on. We need file converters, zip extractors, audio transcribers, and so on.
+
+
You may notice how this use case moved quickly from looking like a simple query pipeline into a strange overlap of a query and an indexing pipeline. As we’ve learned in the previous post, indexing pipelines have their own set of quirks, one of which is that they can’t simultaneously process files of different types. But we can only expect the Search Engine to retrieve HTML files or PDFs if we filter them out on purpose, which makes the pipeline less effective. In fact, a pipeline that can read content from different file types, such as the one above, can’t really be made to work.
+
And what if, on top of this, we need to cache the resulting documents to reduce latency? What if I wanted to get the results from Google’s page 2, but only if the content of page 1 did not answer our question? At this point, the pipeline is hard to imagine, let alone draw.
+
Although Web RAG is somewhat possible in Haystack, it stretches far beyond what the pipeline was designed to handle. Can we do better?
When we went back to the drawing board to address these concerns, the first step was pinpointing the issue.
+
The root problem, as we realized, is that Haystack Pipelines treat each component as a locomotive treats its wagons. They all look the same from the pipeline’s perspective, they can all be connected in any order, and they all go from A to B rolling over the same pair of rails, all passing through the same stations.
+
+
In Haystack 1, components are designed to serve the pipeline’s needs first. A good component is identical to all the others, provides the exact interface the pipeline requires, and can be connected to any other in any order. The components are awkward to use outside of a pipeline due to the same run() method that makes the pipeline so ergonomic. Why does the Ranker, which needs only a query and a list of Documents to operate, also accept file_paths and meta in its run() method? It does so uniquely to satisfy the pipeline’s requirements, which in turn only exist to make all components forcefully compatible with each other.
+
Just like a locomotive, the pipeline pushes the components over the input data one by one. When seen in this light, it’s painfully obvious why the indexing pipeline we’ve seen earlier can’t work: the “pipeline train” can only go on one branch at a time. Component trains can’t split mid-execution. They are designed to all see the same data all the time. Even when branching happens, all branches always see the same data. Sending different wagons onto different rails is not possible by design.
The issue’s core is more evident when seen in this light. The pipeline is the only object that drives the execution, while components tend to be as passive and uniform as possible. This approach doesn’t scale: components are fundamentally different, and asking them to all appear equal forces them to hide their differences, making bugs and odd behavior more likely. As the number of components to handle grows, their variety will increase regardless, so the pipeline must always be aware of all the possibilities to manage them and progressively add edge cases that rapidly increase its complexity.
+
+Therefore, the pipeline rewrite for Haystack 2.0 focused on one core principle: the components will define and drive the execution process. There is no locomotive anymore: every component needs to find its own way, grabbing the data it needs from its producers and sending its results to whoever needs them by declaring the proper connections. In the railway metaphor, it’s like adding a steering wheel to each container: the result is a truck, and the resulting system now looks like a highway.
+
+
Just as railways are excellent at going from A to B when you only need to take a few well-known routes and never another, highways are unbeatable at reaching every possible destination with the same effort, even though they need a driver for each wagon. A “highway” Pipeline requires more work from the Components’ side, but it frees them to go wherever they need to with a precision that a “railway” pipeline cannot accomplish.
The new Pipeline object may vaguely remind you of Haystack’s original pipeline, and using one should feel very familiar. For example, this is how you assemble a simple Pipeline that performs two additions in Haystack 2.0.
+
from canals import Pipeline
+from sample_components import AddFixedValue
+
+# Create the Pipeline object
+pipeline = Pipeline()
+
+# Add the components - note the missing `inputs` parameter
+pipeline.add_component("add_one", AddFixedValue(add=1))
+pipeline.add_component("add_two", AddFixedValue(add=2))
+
+# Connect them together
+pipeline.connect("add_one.result", "add_two.value")
+
+# Draw the pipeline
+pipeline.draw("two_additions_pipeline.png")
+
+# Run the pipeline
+results = pipeline.run({"add_one": {"value": 1}})
+
+print(results)
+# prints '{"add_two": {"result": 4}}'
+
Creating the pipeline requires no special attention: however, you can now pass a max_loops_allowed parameter, to limit looping when it’s a risk. On the contrary, old Haystack 1.x Pipelines did not support loops at all.
+
Next, components are added by calling the Pipeline.add_component(name, component) method. This is also subject to very similar requirements to the previous pipeline.add_node:
+
+
Every component needs a unique name.
+
Some names are reserved (for now, only _debug).
+
Instances are not reusable.
+
The object needs to be a component.
+However, we no longer connect the components to each other using this function because, although it is possible to implement in principle, it feels more awkward to use in the case of loops.
+
+
+Consequently, we introduced a new method, Pipeline.connect(). This method follows the syntax ("producer_component.output_name", "consumer_component.input_name"): so we don’t simply line up two components one after the other, but we connect one of their outputs to one of their inputs in an explicit manner.
+
This change allows pipelines to perform a much more careful validation of such connections. As we will discover soon, pipeline components in Haystack 2.0 must declare the type of their inputs and outputs. In this way, pipelines not only can make sure that the inputs and outputs exist for the given component, but they can also check whether their types match and can explain connection failures in great detail. For example, if there were a type mismatch, Pipeline.connect() will return an error such as:
+
Cannot connect 'greeter.greeting' with 'add_two.value': their declared input and output
+types do not match.
+
+greeter:
+- greeting: str
+add_two:
+- value: int (available)
+- add: Optional[int] (available)
+
Once the components are connected together, the resulting pipeline can be drawn. Pipeline drawings in Haystack 2.0 show far more details than their predecessors because the components are forced to share much more information about what they need to run, the types of these variables, and so on. The pipeline above draws the following image:
+
+
You can see how the components’ classes, their inputs and outputs, and all the connections are named and typed.
+
So, how do you run such a pipeline? By just providing a dictionary of input values. Each starting component should have a small dictionary with all the necessary inputs. In the example above, we pass 1 to the value input of add_one. The results mirror the input’s structure: add_two is at the end of the pipeline, so the pipeline will return a dictionary where under the add_two key there is a dictionary: {"result": 4}.
+
By looking at the diagram, you may have noticed that these two components have optional inputs. They’re not necessary for the pipeline to run, but they can be used to dynamically control the behavior of these components. In this case, add controls the “fixed value” this component adds to its primary input. For example:
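A sketch with the two-additions pipeline above, overriding add for both components at run time (the values are arbitrary):

results = pipeline.run({
    "add_one": {"value": 1, "add": 2},
    "add_two": {"add": 10},
})

print(results)
# prints '{"add_two": {"result": 13}}'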
One evident difficulty of this API is that it might be challenging to understand what to provide to the run method for each component. This issue has also been considered: the pipeline offers a Pipeline.inputs() method that returns a structured representation of all the expected inputs. For our pipeline, it looks like:
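Something along these lines (a sketch of the returned structure for the two-additions pipeline; the exact field names may differ):

print(pipeline.inputs())
# {
#     "add_one": {
#         "value": {"type": int, "is_mandatory": True},
#         "add": {"type": Optional[int], "is_mandatory": False},
#     },
#     "add_two": {
#         "add": {"type": Optional[int], "is_mandatory": False},
#     },
# }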
Now that we covered the Pipeline’s API, let’s have a look at what it takes for a Python class to be treated as a pipeline component.
+
You are going to need:
+
+
+
A @component decorator. All component classes must be decorated with the @component decorator. This allows a pipeline to discover and validate them.
+
+
+
A run() method. This is the method where the main functionality of the component should be carried out. It’s invoked by Pipeline.run() and has a few constraints, which we will describe later.
+
+
+
A @component.output_types() decorator for the run() method. This allows the pipeline to validate the connections between components.
+
+
+
Optionally, a warm_up() method. It can be used to defer the loading of a heavy resource (think a local LLM or an embedding model) to the warm-up stage that occurs right before the first execution of the pipeline. Components that use warm_up() can be added to a Pipeline and connected before the heavy operations are carried out. In this way, the validation that a Pipeline performs can happen before resources are wasted.
+
+
+
To summarize, a minimal component can look like this:
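Here is a minimal sketch of such a class, using the Double component discussed just below (reconstructed from these requirements rather than copied verbatim):

from canals import component

@component
class Double:
    """
    Doubles the value it receives.
    """

    @component.output_types(value=int)
    def run(self, value: int):
        # Return a dictionary keyed by output name
        return {"value": value * 2}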
Note how the run() method has a few peculiar features. One is that all the method parameters need to be typed: if value was not declared as value: int, the pipeline would raise an exception demanding type annotations.
+
This is the way components declare to the pipeline which inputs they expect and of which type: this is the first half of the information needed to perform the validation that Pipeline.connect() carries out.
+
The other half of the information comes from the @component.output_types decorator. Pipelines demand that components declare how many outputs the component will produce and of what type. One may ask why not rely on typing for the outputs, just as we’ve done for the inputs. So why not simply declare components as:
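That is, relying on the return type annotation alone, roughly like this (a sketch):

@component
class Double:

    def run(self, value: int) -> int:
        return value * 2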
For Double, this is a legitimate solution. However, let’s see an example with another component called CheckParity: if a component’s input value is even, it sends it unchanged over the even output, while if it’s odd, it will send it over the odd output. The following clearly doesn’t work: we’re not communicating anywhere to Canals which output is even and which one is odd.
+
@component
class CheckParity:

    def run(self, value: int) -> int:
        if value % 2 == 0:
            return value
        return value
+
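A version that does carry the output names could return a dictionary keyed by output name, roughly like this (a sketch):

@component
class CheckParity:

    def run(self, value: int):
        if value % 2 == 0:
            return {"even": value}
        return {"odd": value}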
This approach carries all the information required. However, such information is only available after the run() method is called. Unless we parse the method to discover all return statements and their keys (which is not always possible), pipelines cannot know all the keys the return dictionary may have. So, it can’t validate the connections when Pipeline.connect() is called.
+
The decorator bridges the gap by allowing the class to declare in advance what outputs it will produce and of which type. Pipeline trusts this information to be correct and validates the connections accordingly.
+
Okay, but what if the component is very dynamic? The output type may depend on the input type. Perhaps the number of inputs depends on some initialization parameter. In these cases, pipelines allow components to declare the inputs and output types in their init method as such:
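A sketch of what this can look like, using a hypothetical Repeater component and assuming the component.set_input_types() / component.set_output_types() helpers (these helper names follow current Haystack 2.x; the canals-era API may differ slightly):

from canals import component

@component
class Repeater:
    """
    Forwards its input value to a configurable number of outputs.
    """

    def __init__(self, outputs_count: int = 2):
        self.outputs_count = outputs_count
        # Declare inputs and outputs programmatically instead of through type hints
        # (helper names assumed; see lead-in above)
        component.set_input_types(self, value=int)
        component.set_output_types(self, **{f"output_{i}": int for i in range(outputs_count)})

    def run(self, **kwargs):
        return {f"output_{i}": kwargs["value"] for i in range(self.outputs_count)}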
Note that there’s no more typing on run(), and the decorator is gone. The information provided in the init method is sufficient for the pipeline to validate the connections.
+
One more feature of the inputs and output declarations relates to optional and variadic values. Pipelines in Haystack 2.0 support both through a mix of type checking and signature inspection. For example, let’s have a look at what the AddFixedValue component we’ve seen earlier looks like:
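A sketch of the component (the default of 1 for add is illustrative):

from typing import Optional

from canals import component

@component
class AddFixedValue:
    """
    Adds a fixed value to its input.
    """

    @component.output_types(result=int)
    def run(self, value: int, add: Optional[int] = 1):
        return {"result": value + add}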
You can see that add, the optional parameter we met before, has a default value. Adding a default value to a parameter in the run() signature tells the pipeline that the parameter itself is optional, so the component can run even if that specific input doesn’t receive any value from the pipeline’s input or other components.
+
Another component that generalizes the sum operation is Sum, which instead looks like this:
+
from canals import component
from canals.component.types import Variadic

@component
class Sum:
    """
    Adds all its inputs together.
    """

    @component.output_types(total=int)
    def run(self, values: Variadic[int]):
        """
        :param values: the values to sum
        """
        return {"total": sum(v for v in values if v is not None)}
+
In this case, we used the special type Variadic to tell the pipeline that the values input can receive data from multiple producers, instead of just one. Therefore, values is going to be a list type, but it can be connected to single int outputs, making it a valuable aggregator.
Just like old Haystack Pipelines, the new pipelines can be serialized. However, this feature suffered from problems similar to those plaguing the execution model, so it was changed radically.
+
The original Pipeline gathered intrusive information about each of its components when initialized, leveraging the shared BaseComponent class. Conversely, the new Pipeline delegates the serialization process entirely to its components.
+
If a component wishes to be serializable, it must provide two additional methods, to_dict and from_dict, which perform serialization and deserialization to a dictionary. The pipeline limits itself to calling each of its component’s methods, collecting their output, grouping them together with some limited extra information (such as the connections between them), and returning the result.
+
For example, if AddFixedValue were serializable, its serialized version could look like this:
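Something along these lines (the exact dictionary below is an illustration built from the two required keys, not the canonical output):

{
    "type": "AddFixedValue",
    "init_parameters": {
        "add": 1
    }
}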
Notice how the components are free to perform serialization in the way they see fit. The only requirement imposed by the Pipeline is the presence of two top-level keys, type and init_parameters, which are necessary for the pipeline to deserialize each component into the correct class.
+
This is useful, especially if the component’s state includes some non-trivial values, such as objects, API keys, or other special values. The Pipeline no longer needs to know how to serialize everything the components may contain: the task is fully delegated to them, and they always know best what needs to be done.
Having done a tour of the new Pipeline features, one might have noticed one detail. There’s a bit more work involved in using a Pipeline than there was before: you can’t just chain every component after every other. There are connections to be made, validation to perform, graphs to assemble, and so on.
+
In exchange, the pipeline is now way more powerful than before. Sure, but so is a plain Python script. Do we really need the Pipeline object? And what do we need it for?
+
+
+
Validation. While components normally validate their inputs and outputs, the pipeline does all the validation before the components run, even before loading heavy resources. This makes the whole system far less likely to fail at runtime for a simple input/output mismatch, which can be priceless for complex applications.
+
+
+
Serialization. Redistributing code is always tricky: redistributing a JSON file is much safer. Pipelines make it possible to represent complex systems in a readable JSON file that can be edited, shared, stored, deployed, and re-deployed on different backends at need.
+
+
+
Drawing: The new Pipeline offers a way to see your system clearly and automatically, which is often very handy for debugging, inspecting the system, and collaborating on the pipeline’s design.
+
+
+
On top of this, the pipeline abstraction promotes flatter API surfaces by discouraging the nesting of components within one another and by providing easy-to-use, single-responsibility components that are easy to reason about.
+
+
+
Having said all of this, however, we don’t believe that the pipeline design makes Haystack win or lose. Pipelines are just a bonus on top of what provides the real value: a broad set of components that reliably perform well-defined tasks. That’s why the Component API does not make the run() method awkward to use outside of a Pipeline: calling Sum.run(values=[1, 2, 3]) feels Pythonic outside of a pipeline and always will.
+
In the following posts, I will explore the world of Haystack components, starting from our now familiar use cases: RAG Pipelines.
Last updated: 18/01/2023 - Read it on the Haystack Blog.
+
+
Retrieval Augmented Generation (RAG) is quickly becoming an essential technique to make LLMs more reliable and effective at answering any question, regardless of how specific. To stay relevant in today’s NLP landscape, Haystack must enable it.
+
Let’s see how to build such applications with Haystack 2.0, from a direct call to an LLM to a fully-fledged, production-ready RAG pipeline that scales. At the end of this post, we will have an application that can answer questions about world countries based on data stored in a private database. At that point, the knowledge of the LLM will be only limited by the content of our data store, and all of this can be accomplished without fine-tuning language models.
+
+
💡 I recently gave a talk about RAG applications in Haystack 2.0, so if you prefer videos to blog posts, you can find the recording here. Keep in mind that the code shown might be outdated.
The idea of Retrieval Augmented Generation was first defined in a paper by Meta in 2020. It was designed to solve a few of the inherent limitations of seq2seq models (language models that, given a sentence, can finish writing it for you), such as:
+
+
Their internal knowledge, as vast as it may be, will always be limited and at least slightly out of date.
+
They work best on generic topics rather than niche and specific areas unless they’re fine-tuned on purpose, which is a costly and slow process.
+
All models, even those with subject-matter expertise, tend to “hallucinate”: they confidently produce false statements backed by apparently solid reasoning.
+
They cannot reliably cite their sources or tell where their knowledge comes from, which makes fact-checking their replies nontrivial.
+
+
RAG solves these issues by “grounding” the LLM to reality: it provides some relevant, up-to-date, and trusted information to the model together with the question. In this way, the LLM doesn’t need to draw information from its internal knowledge, but it can base its replies on the snippets provided by the user.
+
+
As you can see in the image above (taken directly from the original paper), a system such as RAG is made of two parts: one that finds text snippets that are relevant to the question asked by the user and a generative model, usually an LLM, that rephrases the snippets into a coherent answer for the question.
+
Let’s build one of these with Haystack 2.0!
+
+
💡 Do you want to see this code in action? Check out the Colab notebook here or the gist here.
+
+
+
+
⚠️ Warning: This code was tested on haystack-ai==2.0.0b5. Haystack 2.0 is still unstable, so later versions might introduce breaking changes without notice until Haystack 2.0 is officially released. The concepts and components, however, stay the same.
Like every NLP framework worthy of the name, Haystack supports LLMs in different ways. The easiest way to query an LLM in Haystack 2.0 is through a Generator component: depending on which LLM you use and how you intend to query it (chat, text completion, etc.), you should pick the appropriate class.
+
We’re going to use gpt-3.5-turbo (the model behind ChatGPT) for these examples, so the component we need is OpenAIGenerator. Here is all the code required to use it to query OpenAI’s gpt-3.5-turbo:
+
from haystack.components.generators import OpenAIGenerator

generator = OpenAIGenerator(api_key=api_key)
generator.run(prompt="What's the official language of France?")
# returns {"replies": ['The official language of France is French.']}
+
You can select your favorite OpenAI model by specifying a model_name at initialization, for example, gpt-4. It also supports setting an api_base_url for private deployments, a streaming_callback if you want to see the output generated live in the terminal, and optional kwargs to let you pass whatever other parameter the model understands, such as the number of answers (n), the temperature (temperature), etc.
+
Note that in this case, we’re passing the API key to the component’s constructor. This is not strictly necessary: OpenAIGenerator can also read the value from the OPENAI_API_KEY environment variable or from the api_key module variable of openai’s SDK.
Let’s imagine that our LLM-powered application also comes with some pre-defined questions that the user can select instead of typing in full. For example, instead of asking them to type What's the official language of France?, we let them select Tell me the official languages from a list, and they simply need to type “France” (or “Wakanda” for a change - our chatbot needs some challenges too).
+
In this scenario, we have two pieces of the prompt: a variable (the country name, like “France”) and a prompt template, which in this case is "What's the official language of {{ country }}?"
+
Haystack offers a component that can render variables into prompt templates: it’s called PromptBuilder. Like the generators we’ve seen before, PromptBuilder is nearly trivial to initialize and use.
+
from haystack.components.builders import PromptBuilder

prompt_builder = PromptBuilder(template="What's the official language of {{ country }}?")
prompt_builder.run(country="France")
# returns {'prompt': "What's the official language of France?"}
+
Note how we defined a variable, country, by wrapping its name in double curly brackets. PromptBuilder lets you define any input variable that way: if the prompt template was "What's the official language of {{ nation }}?", the run() method of PromptBuilder would have expected a nation input.
+
This syntax comes from Jinja2, a popular templating library for Python. If you have ever used Flask, Django, or Ansible, you will feel at home with PromptBuilder. If instead you have never heard of any of these libraries, you can check out the syntax in Jinja’s documentation. Jinja has a powerful templating language and offers way more features than you’ll ever need in prompt templates, ranging from simple if statements and for loops to object access through dot notation, nesting of templates, variable manipulation, macros, full-fledged import and encapsulation of templates, and more.
With these two components, we can assemble a minimal pipeline to see how they work together. Connecting them is trivial: PromptBuilder generates a prompt output, and OpenAIGenerator expects an input with the same name and type.
+
from haystack import Pipeline
from haystack.components.generators import OpenAIGenerator
from haystack.components.builders import PromptBuilder

pipe = Pipeline()
pipe.add_component("prompt_builder", PromptBuilder(template="What's the official language of {{ country }}?"))
pipe.add_component("llm", OpenAIGenerator(api_key=api_key))
pipe.connect("prompt_builder", "llm")

pipe.run({"prompt_builder": {"country": "France"}})
# returns {"llm": {"replies": ['The official language of France is French.']}}
+
Building the Generative part of a RAG application was very simple! So far, we only provided the question to the LLM, but no information to base its answers on. Nowadays, LLMs possess a lot of general knowledge, so questions about famous countries such as France or Germany are easy for them to reply to correctly. However, when using an app about world countries, some users may be interested in knowing more about obscure or defunct microstates that don’t exist anymore. In this case, ChatGPT is unlikely to provide the correct answer without any help.
+
For example, let’s ask our pipeline something really obscure.
+
pipe.run({"prompt_builder": {"country": "the Republic of Rose Island"}})
+# returns {
+# "llm": {
+# "replies": [
+# 'The official language of the Republic of Rose Island was Italian.'
+# ]
+# }
+# }
+
The answer is an educated guess but is not accurate: although it was located just outside of Italy’s territorial waters, according to Wikipedia the official language of this short-lived micronation was Esperanto.
+
How can we get ChatGPT to reply to such a question correctly? One way is to make it “cheat” by providing the answer as part of the question. In fact, PromptBuilder is designed to serve precisely this use case.
+
Here is our new, more advanced prompt:
+
Given the following information, answer the question.
Context: {{ context }}
Question: {{ question }}
+
Let’s build a new pipeline using this prompt!
+
context_template = """
Given the following information, answer the question.
Context: {{ context }}
Question: {{ question }}
"""
language_template = "What's the official language of {{ country }}?"

pipe = Pipeline()
pipe.add_component("context_prompt", PromptBuilder(template=context_template))
pipe.add_component("language_prompt", PromptBuilder(template=language_template))
pipe.add_component("llm", OpenAIGenerator(api_key=api_key))
pipe.connect("language_prompt", "context_prompt.question")
pipe.connect("context_prompt", "llm")

pipe.run({
    "context_prompt": {"context": "Rose Island had its own government, currency, post office, and commercial establishments, and the official language was Esperanto."},
    "language_prompt": {"country": "the Republic of Rose Island"}
})
# returns {
#     "llm": {
#         "replies": [
#             'The official language of the Republic of Rose Island is Esperanto.'
#         ]
#     }
# }
+
Let’s look at the graph of our Pipeline:
+
+
The beauty of PromptBuilder lies in its flexibility. It allows users to chain instances together to assemble complex prompts from simpler schemas: for example, we used the output of the first PromptBuilder as the value of question in the second prompt.
+
However, in this specific scenario, we can build a simpler system by merging the two prompts into one.
+
Given the following information, answer the question.
Context: {{ context }}
Question: What's the official language of {{ country }}?
+
Using this new prompt, the resulting pipeline becomes again very similar to our first.
+
template = """
Given the following information, answer the question.
Context: {{ context }}
Question: What's the official language of {{ country }}?
"""
pipe = Pipeline()
pipe.add_component("prompt_builder", PromptBuilder(template=template))
pipe.add_component("llm", OpenAIGenerator(api_key=api_key))
pipe.connect("prompt_builder", "llm")

pipe.run({
    "prompt_builder": {
        "context": "Rose Island had its own government, currency, post office, and commercial establishments, and the official language was Esperanto.",
        "country": "the Republic of Rose Island"
    }
})
# returns {
#     "llm": {
#         "replies": [
#             'The official language of the Republic of Rose Island is Esperanto.'
#         ]
#     }
# }
+
For now, we’ve been playing with prompts, but the fundamental question remains unanswered: where do we get the correct text snippet for the question the user is asking? We can’t expect such information as part of the input: we need our system to be able to fetch this information independently, based uniquely on the query.
+
Thankfully, retrieving relevant information from large corpora (a technical term for extensive collections of data, usually text) is a task that Haystack has excelled at since its inception: the components that perform this task are called Retrievers.
+
Retrieval can be performed on different data sources: to begin, let’s assume we’re searching for data in a local database, which is the use case that most Retrievers are geared towards.
+
Let’s create a small local database to store information about some European countries. Haystack offers a neat object for these small-scale demos: InMemoryDocumentStore. This document store is little more than a Python dictionary under the hood but provides the exact same API as much more powerful data stores and vector stores, such as Elasticsearch or ChromaDB. Keep in mind that the object is called “Document Store” and not simply “datastore” because what it stores are Haystack’s Document objects: small dataclasses that help other components make sense of the data that they receive.
+
So, let’s initialize an InMemoryDocumentStore and write some Documents into it.
+
from haystack.dataclasses import Document
from haystack.document_stores.in_memory import InMemoryDocumentStore

documents = [
    Document(content="German is the official language of Germany."),
    Document(content="The capital of France is Paris, and its official language is French."),
    Document(content="Italy recognizes a few official languages, but the most widespread one is Italian."),
    Document(content="Esperanto has been adopted as official language for some microstates as well, such as the Republic of Rose Island, a short-lived microstate built on a sea platform in the Adriatic Sea.")
]
docstore = InMemoryDocumentStore()
docstore.write_documents(documents=documents)

docstore.filter_documents()
# returns [
#     Document(content="German is the official language of Germany."),
#     Document(content="The capital of France is Paris, and its official language is French."),
#     Document(content="Esperanto has been adopted as official language for some microstates as well, such as the Republic of Rose Island, a short-lived microstate built on a sea platform in the Adriatic Sea."),
#     Document(content="Italy recognizes a few official languages, but the most widespread one is Italian."),
# ]
+
Once the document store is set up, we can initialize a retriever. In Haystack 2.0, each document store comes with its own set of highly optimized retrievers: InMemoryDocumentStore offers two, one based on BM25 ranking and one based on embedding similarity.
+
Let’s start with the BM25-based retriever, which is slightly easier to set up. Let’s first use it in isolation to see how it behaves.
+
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever

retriever = InMemoryBM25Retriever(document_store=docstore)
retriever.run(query="Rose Island", top_k=1)
# returns [
#     Document(content="Esperanto has been adopted as official language for some microstates as well, such as the Republic of Rose Island, a short-lived microstate built on a sea platform in the Adriatic Sea.")
# ]

retriever.run(query="Rose Island", top_k=3)
# returns [
#     Document(content="Esperanto has been adopted as official language for some microstates as well, such as the Republic of Rose Island, a short-lived microstate built on a sea platform in the Adriatic Sea."),
#     Document(content="Italy recognizes a few official languages, but the most widespread one is Italian."),
#     Document(content="The capital of France is Paris, and its official language is French."),
# ]
+
We see that InMemoryBM25Retriever accepts a few parameters. query is the question we want to find relevant documents for. In the case of BM25, the algorithm only searches for exact word matches. The resulting retriever is very fast, but it doesn’t degrade gracefully: it can’t handle spelling mistakes, synonyms, or descriptions of an entity. For example, documents containing the word “cat” would be considered irrelevant against a query such as “felines”.
+
top_k controls the number of documents returned. We can see that in the first example, only one document is returned, the correct one. In the second, where top_k = 3, the retriever is forced to return three documents even if just one is relevant, so it picks the other two randomly. Although the behavior is not optimal, BM25 guarantees that if there is a document that is relevant to the query, it will be in the first position, so for now, we can use it with top_k=1.
+
Retrievers also accept a filters parameter, which lets you pre-filter the documents before retrieval. This is a powerful technique that comes in useful in complex applications, but for now we have no use for it. I will talk in more detail about this topic, called metadata filtering, in a later post.
+
Let’s now make use of this new component in our Pipeline.
The retriever does not return a single string but a list of Documents. How do we put the content of these objects into our prompt template?
+
It’s time to use Jinja’s powerful syntax to do some unpacking on our behalf.
+
Given the following information, answer the question.

Context:
{% for document in documents %}
    {{ document.content }}
{% endfor %}

Question: What's the official language of {{ country }}?
+
Notice how, despite the slightly alien syntax for a Python programmer, what the template does is reasonably evident: it iterates over the documents and, for each of them, renders their content field.
+
With all these pieces set up, we can finally put them all together.
+
template = """
Given the following information, answer the question.

Context:
{% for document in documents %}
    {{ document.content }}
{% endfor %}

Question: What's the official language of {{ country }}?
"""
pipe = Pipeline()

pipe.add_component("retriever", InMemoryBM25Retriever(document_store=docstore))
pipe.add_component("prompt_builder", PromptBuilder(template=template))
pipe.add_component("llm", OpenAIGenerator(api_key=api_key))
pipe.connect("retriever", "prompt_builder.documents")
pipe.connect("prompt_builder", "llm")

country = "the Republic of Rose Island"
pipe.run({
    "retriever": {"query": country},
    "prompt_builder": {"country": country}
})
# returns {
#     "llm": {
#         "replies": [
#             'The official language of the Republic of Rose Island is Esperanto.'
#         ]
#     }
# }
+
+
Congratulations! We’ve just built our first, true-to-its-name RAG Pipeline.
So, we now have our running prototype. What does it take to scale this system up for production workloads?
+
Of course, scaling up a system to production readiness is no simple task that can be addressed in a paragraph. Still, we can start this journey with one component that can readily be improved: the document store.
+
InMemoryDocumentStore is clearly a toy implementation: Haystack supports much more performant document stores that make more sense to use in a production scenario. Since we have built our app with a BM25 retriever, let’s select Elasticsearch as our production-ready document store of choice.
+
+
⚠️ Warning: While ES is a valid document store to use in this scenario, nowadays it often makes more sense to choose a more specialized document store such as Weaviate, Qdrant, and so on. Check this page to see which document stores are currently supported for Haystack 2.0.
+
+
+
How do we use Elasticsearch in our pipeline? All it takes is to swap out InMemoryDocumentStore and InMemoryBM25Retriever with their Elasticsearch counterparts, which offer nearly identical APIs.
+
First, let’s create the document store: we will need a slightly more complex setup to connect to the Elasticsearch backend. In this example, we use Elasticsearch version 8.8.0, but every Elasticsearch 8 version should work.
+
import os

from elasticsearch_haystack.document_store import ElasticsearchDocumentStore

host = os.environ.get("ELASTICSEARCH_HOST", "https://localhost:9200")
user = "elastic"
pwd = os.environ["ELASTICSEARCH_PASSWORD"]  # You need to provide this value

docstore = ElasticsearchDocumentStore(
    hosts=[host],
    basic_auth=(user, pwd),
    ca_certs="/content/elasticsearch-8.8.0/config/certs/http_ca.crt"
)
+
Now, let’s write our four documents into the store again. In this case, we specify the duplicate policy, so if the documents were already present, they would be overwritten. All Haystack document stores offer three policies to handle duplicates: FAIL (the default), SKIP, and OVERWRITE.
+
from haystack.document_stores import DuplicatePolicy

documents = [
    Document(content="German is the official language of Germany."),
    Document(content="The capital of France is Paris, and its official language is French."),
    Document(content="Italy recognizes a few official languages, but the most widespread one is Italian."),
    Document(content="Esperanto has been adopted as official language for some microstates as well, such as the Republic of Rose Island, a short-lived microstate built on a sea platform in the Adriatic Sea.")
]
docstore.write_documents(documents=documents, policy=DuplicatePolicy.OVERWRITE)
+
Once this is done, we are ready to build the same pipeline as before, but using ElasticsearchBM25Retriever.
+
from elasticsearch_haystack.bm25_retriever import ElasticsearchBM25Retriever

template = """
Given the following information, answer the question.

Context:
{% for document in documents %}
    {{ document.content }}
{% endfor %}

Question: What's the official language of {{ country }}?
"""

pipe = Pipeline()
pipe.add_component("retriever", ElasticsearchBM25Retriever(document_store=docstore))
pipe.add_component("prompt_builder", PromptBuilder(template=template))
pipe.add_component("llm", OpenAIGenerator(api_key=api_key))
pipe.connect("retriever", "prompt_builder.documents")
pipe.connect("prompt_builder", "llm")

pipe.draw("elasticsearch-rag-pipeline.png")

country = "the Republic of Rose Island"
pipe.run({
    "retriever": {"query": country},
    "prompt_builder": {"country": country}
})
# returns {
#     "llm": {
#         "replies": [
#             'The official language of the Republic of Rose Island is Esperanto.'
#         ]
#     }
# }
+
+
That’s it! We’re now running the same pipeline over a production-ready Elasticsearch instance.
In this post, we’ve detailed some fundamental components that make RAG applications possible with Haystack: Generators, the PromptBuilder, and Retrievers. We’ve seen how they can all be used in isolation and how you can make Pipelines out of them to achieve the same goal. Last, we’ve experimented with some of the (very early!) features that make Haystack 2.0 production-ready and easy to scale up from a simple demo with minimal changes.
+
However, this is just the start of our journey into RAG. Stay tuned!
In the previous post of the Haystack 2.0 series, we saw how to build RAG pipelines using a generator, a prompt builder, and a retriever with its document store. However, the content of our document store wasn’t extensive, and populating one with clean, properly formatted data is not an easy task. How can we approach this problem?
+
In this post, I will show you how to use Haystack 2.0 to create large amounts of documents from a few web pages and write them into a document store that you can then use for retrieval.
+
+
💡 Do you want to see the code in action? Check out the Colab notebook or the gist.
+
+
+
+
⚠️ Warning: This code was tested on haystack-ai==2.0.0b5. Haystack 2.0 is still unstable, so later versions might introduce breaking changes without notice until Haystack 2.0 is officially released. The concepts and components, however, stay the same.
In Haystack’s terminology, the process of extracting information from a group of files and storing the data in a document store is called “indexing”. The process includes, at the very minimum, reading the content of a file, generating a Document object containing all its text, and then storing it in a document store.
+
However, indexing pipelines often do more than this. They can process more than one file type, like .txt, .pdf, .docx, .html, audio, video, and images. Having many file types to convert, they route each file to the proper converter based on its type. Files tend to contain way more text than a normal LLM can chew, so they need to split those huge Documents into smaller chunks. Also, the converters are not perfect at reading text from the files, so they need to clean the data from artifacts such as page numbers, headers, footers, and so on. On top of all of this, if you plan to use a retriever that is based on embedding similarity, your indexing pipeline will also need to embed all documents before writing them into the store.
+
Sounds like a lot of work!
+
In this post, we will focus on the preprocessing part of the pipeline: cleaning, splitting, and writing documents. I will talk about the other functionalities of indexing pipelines, such as document embedding and the routing of multiple file types, in later posts.
As we’ve just seen, the most important task of this pipeline is to convert files into Documents. Haystack provides several converters for this task: at the time of writing, it supports:
+
+
Raw text files (TextFileToDocument)
+
HTML files, so web pages in general (HTMLToDocument)
+
PDF files, by extracting text natively (PyPDFToDocument)
+
Image files, PDFs with images, and Office files with images, by OCR (AzureOCRDocumentConverter)
+
Audio files, doing transcription with Whisper either locally (LocalWhisperTranscriber) or remotely using OpenAI’s hosted models (RemoteWhisperTranscriber)
+
A ton of other formats, such as Microsoft’s Office formats, thanks to Apache Tika (TikaDocumentConverter)
+
+
For this example, let’s assume we have a collection of web pages downloaded from the Internet. These pages are our only source of information and contain all we want our RAG application to know about.
+
In this case, our converter of choice is HTMLToDocument. HTMLToDocument is a Haystack component that understands HTML and can filter all the markup away, leaving only meaningful text. Remember that this is a file converter, not a URL fetcher: it can only process local files, such as a website crawl. Haystack provides some components to fetch web pages, but we will see them later.
+
Here is how you can use this converter:
+
from haystack.components.converters import HTMLToDocument

path = "Republic_of_Rose_Island.html"

converter = HTMLToDocument()
converter.run(sources=[path])

# returns {"documents": [Document(content="The Republic of Rose Isla...")]}
+
HTMLToDocument is a straightforward component that offers close to no parameters to customize its behavior. Of its API, one notable feature is its input type: this converter can take paths to local files in the form of strings or Path objects, but it also accepts ByteStream objects.
+
ByteStream is a handy Haystack abstraction that makes handling binary streams easier. If a component accepts ByteStream as input, you don’t necessarily have to save your web pages to file before passing them to this converter. This allows components that retrieve large files from the Internet to pipe their output directly into this component without saving the data to disk first, which can save a lot of time.
With HTMLToDocument, we can convert whole web pages into large Document objects. The converter typically does a decent job of filtering out the markup. Still, it’s not always perfect. To compensate for these occasional issues, Haystack offers a component called DocumentCleaner that can remove noise from the text of the documents.
+
Just like any other component, DocumentCleaner is straightforward to use:
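A minimal sketch, assuming the import path used by current Haystack 2.x releases and a documents list like the one produced by the converter above:

from haystack.components.preprocessors import DocumentCleaner

cleaner = DocumentCleaner()
cleaner.run(documents=documents)
# returns {"documents": [Document(content=...), Document(content=...), ...]}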
The effectiveness of DocumentCleaner depends a lot on the type of converter you use. Some flags, such as remove_empty_lines and remove_extra_whitespace, are minor fixes that can come in handy but usually have little impact on the quality of the results when used in a RAG pipeline. They can, however, make a vast difference for Extractive QA pipelines.
+
Other parameters, like remove_substrings or remove_regex, work very well but need manual inspection and iteration from a human to get right. For example, for Wikipedia pages, we could use these parameters to remove all instances of the word "Wikipedia", which are undoubtedly many and irrelevant.
+
Finally, remove_repeated_substrings is a convenient method that removes headers and footers from long text, for example, books and articles. However, it works only for PDFs and, to a limited degree, for text files because it relies on the presence of form feed characters (\f), which are rarely present in web pages.
Now that the text is cleaned up, we can move on to a more exciting step: text splitting.
+
So far, each Document stored the content of an entire file. If a file was a whole book with hundreds of pages, a single Document would contain hundreds of thousands of words, which is clearly too much for an LLM to make sense of. Such a large Document is also challenging for Retrievers to understand because it contains so much text that it looks relevant to every possible question. To populate our document store with data that can be used effectively by a RAG pipeline, we need to chunk this data into much smaller Documents.
+
That’s where DocumentSplitter comes into play.
+
+
💡 With LLMs in a race to offer the largest context window and research showing that such a chase is counterproductive, there is no general consensus about how splitting Documents for RAG impacts the LLM’s performance.
+
What you need to keep in mind is that splitting implies a tradeoff. Huge documents will always be slightly relevant for every question, but they will bring a lot of context, which may or may not confuse the model. On the other hand, tiny Documents are much more likely to be retrieved only for questions they’re highly relevant for, but they might provide too little context for the LLM to really understand their meaning.
+
Tweaking the size of your Documents for the specific LLM you’re using and the topic of your documents is one way to optimize your RAG pipeline, so be ready to experiment with different Document sizes before committing to one.
DocumentSplitter lets you configure the approximate size of the chunks you want to generate with three parameters: split_by, split_length, and split_overlap.
+
split_by defines the unit to use when splitting some text. For now, the options are word, sentence, and passage (paragraph), but we will soon add other options.
+
split_length is the number of units (as defined above) that each document should include. For example, if the unit is sentence, split_length=10 means that all your Documents will contain 10 sentences’ worth of text (except usually the last document, which may have fewer). If the unit was word, each Document would instead contain 10 words.
+
split_overlap is the number of units that should be repeated from the previous Document. For example, if the unit is sentence and the length is 10, setting split_overlap=2 means that the last two sentences of the first document will also be present at the start of the second, which will include only 8 new sentences for a total of 10. Such repetition carries over to the end of the text to split.
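As a standalone sketch (split_length=5 is just an example value, and the import path is the one used by current Haystack 2.x releases):

from haystack.components.preprocessors import DocumentSplitter

splitter = DocumentSplitter(split_by="sentence", split_length=5, split_overlap=0)
splitter.run(documents=documents)
# returns {"documents": [many smaller Document objects]}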
Once all of this is done, we can finally move on to the last step of our journey: writing the Documents into our document store. We first create the document store:
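A sketch of this setup, reusing InMemoryDocumentStore and the DocumentWriter component discussed next (import paths as in current Haystack 2.x releases):

from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.components.writers import DocumentWriter

document_store = InMemoryDocumentStore()
writer = DocumentWriter(document_store=document_store)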
If you’ve read my previous post about RAG pipelines, you may wonder: why use DocumentWriter when we could call the .write_documents() method of our document store?
+
In fact, the two methods are fully equivalent: DocumentWriter does nothing more than calling the .write_documents() method of the document store. The difference is that DocumentWriter is the way to go if you are using a Pipeline, which is what we’re going to do next.
We finally have all the components we need to go from a list of web pages to a document store populated with clean and short Document objects. Let’s build a Pipeline to sum up this process:
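A sketch of such an indexing pipeline, assuming the components shown earlier and the document_store created above (the split_length of 5 and the single file name are illustrative):

from haystack import Pipeline
from haystack.components.converters import HTMLToDocument
from haystack.components.preprocessors import DocumentCleaner, DocumentSplitter
from haystack.components.writers import DocumentWriter

pipeline = Pipeline()
pipeline.add_component("converter", HTMLToDocument())
pipeline.add_component("cleaner", DocumentCleaner())
pipeline.add_component("splitter", DocumentSplitter(split_by="sentence", split_length=5))
pipeline.add_component("writer", DocumentWriter(document_store=document_store))
pipeline.connect("converter", "cleaner")
pipeline.connect("cleaner", "splitter")
pipeline.connect("splitter", "writer")

pipeline.run({"converter": {"sources": ["Republic_of_Rose_Island.html"]}})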
That’s it! We now have a fully functional indexing pipeline that can take a list of web pages and convert them into Documents that our RAG pipeline can use. As long as the RAG pipeline reads from the same store we are writing the Documents to, we can add as many Documents as we need to keep the chatbot’s answers up to date without having to touch the RAG pipeline.
+
To try it out, we only need to take the RAG pipeline we built in my previous post and connect it to the same document store we just populated:
+
from haystack.components.generators import OpenAIGenerator
from haystack.components.builders.prompt_builder import PromptBuilder
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever

template = """
Given the following information, answer the question: {{ question }}

{% for document in documents %}
    {{ document.content }}
{% endfor %}
"""
pipe = Pipeline()

pipe.add_component("retriever", InMemoryBM25Retriever(document_store=document_store))
pipe.add_component("prompt_builder", PromptBuilder(template=template))
pipe.add_component("llm", OpenAIGenerator(api_key=api_key))
pipe.connect("retriever", "prompt_builder.documents")
pipe.connect("prompt_builder", "llm")

question = "Is there any documentary about the story of Rose Island? Can you tell me something about that?"
pipe.run({
    "retriever": {"query": question},
    "prompt_builder": {"question": question}
})

# returns {
#     'llm': {
#         'replies': [
#             'Yes, there is a documentary about the story of Rose Island. It is
#             called "Rose Island" and was released on Netflix on 8 December 2020.
#             The documentary follows the true story of Giorgio Rosa, an Italian
#             engineer who built his own island in the Adriatic sea in the late
#             1960s. The island housed a restaurant, bar, souvenir shop, and even
#             a post office. Rosa\'s goal was to have his self-made structure
#             recognized as an independent state, leading to a battle with the
#             Italian authorities. The film depicts the construction of the island
#             and Rosa\'s refusal to dismantle it despite government demands. The
#             story of Rose Island was relatively unknown until the release of the
#             documentary. The film showcases the technology Rosa invented to build
#             the island and explores themes of freedom and resilience.'
#         ],
#         'metadata': [...]
#     }
# }
+
And suddenly, our chatbot knows everything about Rose Island without us having to feed the data to the document store by hand.
Indexing pipelines can be powerful tools, even in their simplest form, like the one we just built. However, it doesn’t end here: Haystack offers many more facilities to extend what’s possible with indexing pipelines, like doing web searches, downloading files from the web, processing many other file types, and so on.
In an earlier post of the Haystack 2.0 series, we’ve seen how to build RAG and indexing pipelines. An application that uses these two pipelines is practical if you have an extensive, private collection of documents and need to perform RAG on such data only. However, in many cases, you may want to get data from the Internet: from news outlets, documentation pages, and so on.
+
In this post, we will see how to build a Web RAG application: a RAG pipeline that can search the Web for the information needed to answer your questions.
+
+
💡 Do you want to see the code in action? Check out the Colab notebook or the gist.
+
+
+
+
⚠️ Warning: This code was tested on haystack-ai==2.0.0b5. Haystack 2.0 is still unstable, so later versions might introduce breaking changes without notice until Haystack 2.0 is officially released. The concepts and components, however, stay the same.
As we’ve seen earlier, a Haystack RAG Pipeline is made of three components: a Retriever, a PromptBuilder, and a Generator, and looks like this:
+
+
To make this pipeline use the Web as its data source, we need to replace the retriever with a component that does not look for information in a local document store, but can instead search the web.
+
Haystack 2.0 already provides a search engine component called SerperDevWebSearch. It uses SerperDev’s API to query popular search engines and return two types of data: a list of text snippets coming from the search engine’s preview boxes and a list of links, which point to the top search results.
+
To begin, let’s see how to use this component in isolation.
+
from haystack.components.websearch import SerperDevWebSearch

question = "What's the official language of the Republic of Rose Island?"

search = SerperDevWebSearch(api_key=serperdev_api_key)
results = search.run(query=question)
# returns {
#     "documents": [
#         Document(content='Esperanto', meta={'title': 'Republic of Rose Island - Wikipedia', 'link': 'https://en.wikipedia.org/wiki/Republic_of_Rose_Island'}),
#         Document(content="The Republic of Rose Island was a short-lived micronation on a man-made platform in the Adriatic Sea. It's a story that few people knew of until recently, ...", meta={'title': 'Rose Island - The story of a micronation', 'link': 'https://www.rose-island.co/', 'imageUrl': 'https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcQiRCfTO6OwFS32SX37S-7OadDZCNK6Fy_NZVGsci2gcIS-zcinhOcGhgU&s', 'position': 1}),
#         ...
#     ],
#     "links": [
#         'https://www.rose-island.co/',
#         'https://www.defactoborders.org/places/rose-island',
#         ...
#     ]
# }
+
SerperDevWebSearch is a component with a simple interface. Starting from its output, we can see that it returns not one but two different values in the returned dictionary: documents and links.
+
links is the most straightforward and represents the top results that Google found relevant for the input query. It’s a list of strings, each containing a URL. You can configure the number of links to return with the top_k init parameter.
+
documents instead is a list of already fully formed Document objects. The content of these objects corresponds to the “answer boxes” that Google often returns together with its search results. Given that these snippets are usually clean, short pieces of text, they’re perfect to be fed directly to an LLM without further processing.
+
Other than expecting an API key as an init parameter and top_k to control the number of results, SerperDevWebSearch also accepts an allowed_domains parameter, which lets you configure the domains Google is allowed to look into during search, and search_params, a more generic dictionary input that lets you pass any additional search parameter SerperDev’s API understands.
SerperDevWebSearch is actually the bare minimum we need to be able to build our very first Web RAG Pipeline. All we need to do is replace our original example’s Retriever with our search component.
+
This is the result:
+
from haystack import Pipeline
from haystack.components.builders import PromptBuilder
from haystack.components.generators import OpenAIGenerator

template = """
Question: {{ question }}

Google Search Answer Boxes:
{% for document in documents %}
    {{ document.content }}
{% endfor %}

Please reformulate the information above to
answer the user's question.
"""
pipe = Pipeline()

pipe.add_component("search", SerperDevWebSearch(api_key=serperdev_api_key))
pipe.add_component("prompt_builder", PromptBuilder(template=template))
pipe.add_component("llm", OpenAIGenerator(api_key=api_key))
pipe.connect("search.documents", "prompt_builder.documents")
pipe.connect("prompt_builder", "llm")

question = "What's the official language of the Republic of Rose Island?"
pipe.run({
    "search": {"query": question},
    "prompt_builder": {"question": question}
})
# returns {
#     'llm': {
#         'replies': [
#             "The official language of the Republic of Rose Island is Esperanto. This artificial language was chosen by the residents of Rose Island as their national language when they declared independence in 1968. However, it's important to note that despite having their own language, government, currency, and postal service, Rose Island was never officially recognized as an independent nation by any country."
#         ],
#         'metadata': [...]
#     }
# }
+
+
This solution is already quite effective for simple questions because Google does most of the heavy lifting of reading the content of the top results, extracting the relevant snippets, and packaging them up in a way that is really easy to access and understand by the model.
+
However, there are situations in which this approach is not sufficient. For example, for highly technical or nuanced questions, the answer box does not provide enough context for the LLM to elaborate and grasp the entire scope of the discussion. In these situations, we may need to turn to the second output of SerperDevWebSearch: the links.
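The component that turns those links into page content is LinkContentFetcher. Here is a quick standalone sketch (the import path and output key follow current Haystack 2.x releases, and the URL is just an example):

from haystack.components.fetchers import LinkContentFetcher

fetcher = LinkContentFetcher()
fetcher.run(urls=["https://en.wikipedia.org/wiki/Republic_of_Rose_Island"])
# returns {"streams": [ByteStream(data=b"<!DOCTYPE html>...")]}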
First, let’s notice that LinkContentFetcher outputs a list of ByteStream objects. ByteStream is a Haystack abstraction that makes handling binary streams and files equally easy. When a component produces ByteStream as output, you can directly pass these objects to a Converter component that can extract its textual content without saving such binary content to a file.
+
These features come in handy to connect LinkContentFetcher to a component we’ve already met before: HTMLToDocument.
In a previous post, we’ve seen how Haystack can convert web pages into clean Documents ready to be stored in a Document Store. We will reuse many of the components we have discussed there, so if you missed it, make sure to check it out.
+
From the pipeline in question, we’re interested in three of its components: HTMLToDocument, DocumentCleaner, and DocumentSplitter. Once the search component returns the links and LinkContentFetcher downloaded their content, we can connect it to HTMLToDocument to extract the text and DocumentCleaner and DocumentSplitter to clean and chunk the content, respectively. These documents then can go to the PromptBuilder, resulting in a pipeline such as this:
+
template = """
Question: {{ question }}

Context:
{% for document in documents %}
    {{ document.content }}
{% endfor %}

Please reformulate the information above to answer the user's question.
"""
pipe = Pipeline()

pipe.add_component("search", SerperDevWebSearch(api_key=serperdev_api_key))
pipe.add_component("fetcher", LinkContentFetcher())
pipe.add_component("converter", HTMLToDocument())
pipe.add_component("cleaner", DocumentCleaner())
pipe.add_component("splitter", DocumentSplitter(split_by="sentence", split_length=3))
pipe.add_component("prompt_builder", PromptBuilder(template=template))
pipe.add_component("llm", OpenAIGenerator(api_key=api_key))
pipe.connect("search.links", "fetcher")
pipe.connect("fetcher", "converter")
pipe.connect("converter", "cleaner")
pipe.connect("cleaner", "splitter")
pipe.connect("splitter", "prompt_builder.documents")
pipe.connect("prompt_builder", "llm")

question = "What's the official language of the Republic of Rose Island?"
pipe.run({
    "search": {"query": question},
    "prompt_builder": {"question": question}
})
+
+
However, running this pipeline results in a crash.
+
PipelineRuntimeError: llm raised 'InvalidRequestError: This model's maximum context
length is 4097 tokens. However, your messages resulted in 4911 tokens. Please reduce
the length of the messages.'
+
Reading the error message reveals the issue right away: the LLM received too much text. And that’s to be expected because we just passed the entire content of several web pages to it.
+
We need to find a way to filter only the most relevant documents from the long list that is generated by DocumentSplitter.
Retrievers are optimized to use the efficient retrieval engines of document stores to sift quickly through vast collections of Documents. However, Haystack also provides smaller, standalone components that work very well on shorter lists and don’t require a full-blown vector database engine to function.
+
These components are called rankers. One example of such a component is TransformersSimilarityRanker: a ranker that uses a model from the transformers library to rank Documents by their similarity to a given query.
+
Let’s see how it works:
+
from haystack.components.rankers import TransformersSimilarityRanker

ranker = TransformersSimilarityRanker()
ranker.warm_up()
ranker.run(
    query="What's the official language of the Republic of Rose Island?",
    documents=documents,
    top_k=1
)
# returns {
#     'documents': [
#         Document(content="Island under construction\nRepublic of Rose Island\nThe Republic of Rose Island ( Esperanto : Respubliko de la Insulo de la Rozoj; Italian : Repubblica dell'Isola delle Rose) was a short-lived micronation on a man-made platform in the Adriatic Sea , 11 kilometres (6.8\xa0mi) off the coast of the province of Rimini , Italy, built by Italian engineer Giorgio Rosa, who made himself its president and declared it an independent state on 1 May 1968. [1] [2] Rose Island had its own government, currency, post office, and commercial establishments, and the official language was Esperanto .", meta={'source_id': '03bfe5f7b7a7ec623e854d2bc5eb36ba3cdf06e1e2771b3a529eeb7e669431b6'}, score=7.594357490539551)
#     ]
# }
+
This component has a feature we haven’t encountered before: the warm_up() method.
+
Components that need to initialize heavy resources, such as a language model, should perform this operation in the warm_up() method rather than at initialization time. When they are used in a Pipeline, Pipeline.run() takes care of calling warm_up() on all components before running; when used standalone, users need to call warm_up() explicitly to prepare the object to run.
+
TransformersSimilarityRanker accepts a few parameters. When initialized, it accepts a model_name_or_path with the HuggingFace ID of the model to use for ranking: this value defaults to cross-encoder/ms-marco-MiniLM-L-6-v2. It also takes token, to allow users to download private models from the Models Hub, device, to let them leverage PyTorch’s ability to select the hardware to run on, and top_k, the maximum number of documents to return. top_k, as we see above, can also be passed to run(), and the latter overrides the former if both are set. This value defaults to 10.
+
Let’s also put this component in the pipeline: its place is between the splitter and the prompt builder.
+
template = """
Question: {{ question }}

Context:
{% for document in documents %}
    {{ document.content }}
{% endfor %}

Please reformulate the information above to answer the user's question.
"""
pipe = Pipeline()

pipe.add_component("search", SerperDevWebSearch(api_key=serperdev_api_key))
pipe.add_component("fetcher", LinkContentFetcher())
pipe.add_component("converter", HTMLToDocument())
pipe.add_component("cleaner", DocumentCleaner())
pipe.add_component("splitter", DocumentSplitter(split_by="sentence", split_length=3))
pipe.add_component("ranker", TransformersSimilarityRanker())
pipe.add_component("prompt_builder", PromptBuilder(template=template))
pipe.add_component("llm", OpenAIGenerator(api_key=api_key))
pipe.connect("search.links", "fetcher")
pipe.connect("fetcher", "converter")
pipe.connect("converter", "cleaner")
pipe.connect("cleaner", "splitter")
pipe.connect("splitter", "ranker")
pipe.connect("ranker", "prompt_builder.documents")
pipe.connect("prompt_builder", "llm")

question = "What's the official language of the Republic of Rose Island?"

pipe.run({
    "search": {"query": question},
    "ranker": {"query": question},
    "prompt_builder": {"question": question}
})
# returns {
#     'llm': {
#         'replies': [
#             'The official language of the Republic of Rose Island was Esperanto.'
#         ],
#         'metadata': [...]
#     }
# }
+
+
Note how the ranker needs to know the question to compare the documents, just like the search and prompt builder components do. So, we need to pass the value to the pipeline’s run() call.
The pipeline we just built works great in most cases. However, it may occasionally fail if the search component happens to return some URL that does not point to a web page but, for example, directly to a video, a PDF, or a PPTX.
+
Haystack does offer some facilities to deal with these file types, but we will see these converters in another post. For now, let’s only filter those links out to prevent HTMLToDocument from crashing.
+
This task could be approached with Haystack in several ways, but the simplest in this scenario is to use a component that would typically be used for a slightly different purpose. This component is called FileTypeRouter.
+
FileTypeRouter is designed to route different files to their appropriate converters by checking their mime type. It does so by inspecting the content or the extension of the files it receives in input and producing an output dictionary with a separate list for each identified type.
+
However, we can also conveniently use this component as a filter. Let’s see how!
FileTypeRouter must always be initialized with the list of mime types it is supposed to handle. Not only that, but this component can also deal with files that do not match any of the expected mime types by putting them all under the unclassified category.
+
By putting this component between LinkContentFetcher and HTMLToDocument, we can make it forward along the pipeline only the files that match the text/html mime type and silently discard all others.
+
Notice how, in the pipeline below, I explicitly connect the text/html output only:
+
template = """
Question: {{ question }}

Google Search Answer Boxes:
{% for document in documents %}
    {{ document.content }}
{% endfor %}

Please reformulate the information above to answer the user's question.
"""
pipe = Pipeline()

pipe.add_component("search", SerperDevWebSearch(api_key=serperdev_api_key))
pipe.add_component("fetcher", LinkContentFetcher())
pipe.add_component("filter", FileTypeRouter(mime_types=["text/html"]))
pipe.add_component("converter", HTMLToDocument())
pipe.add_component("cleaner", DocumentCleaner())
pipe.add_component("splitter", DocumentSplitter(split_by="sentence", split_length=3))
pipe.add_component("ranker", TransformersSimilarityRanker())
pipe.add_component("prompt_builder", PromptBuilder(template=template))
pipe.add_component("llm", OpenAIGenerator(api_key=api_key))
pipe.connect("search.links", "fetcher")
pipe.connect("fetcher", "filter")
pipe.connect("filter.text/html", "converter")
pipe.connect("converter", "cleaner")
pipe.connect("cleaner", "splitter")
pipe.connect("splitter", "ranker")
pipe.connect("ranker", "prompt_builder.documents")
pipe.connect("prompt_builder", "llm")

question = "What's the official language of the Republic of Rose Island?"

pipe.run({
    "search": {"query": question},
    "ranker": {"query": question},
    "prompt_builder": {"question": question}
})
# returns {
#     'llm': {
#         'replies': [
#             'The official language of the Republic of Rose Island was Esperanto.'
#         ],
#         'metadata': [...]
#     }
# }
+
+
With this last addition, we added quite a bit of robustness to our pipeline, making it less likely to fail.
Web RAG is a use case that can be extended in many directions, resulting in very complex pipelines. Haystack helps make sense of this complexity with pipeline graphs and detailed error messages in case of mismatched connections. However, pipelines this large can become overwhelming, especially when more branches are added.
+
In one of our next posts, we will see how to cover such use cases while keeping the resulting complexity as low as possible.
Setting up a Raspberry Pi headless without the Raspberry Pi Imager used to be a fairly simple process for the average Linux user, to the point where a how-to and a few searches on the Raspberry Pi forums would sort the process out. After flashing the image with dd, creating an empty file named ssh in the boot partition and populating wpa_supplicant.conf was normally enough to get started.
But what does the Imager really do to configure the OS? Is it really that complex that it requires downloading a dedicated installer?
+
In this post I’m going to find out first how to get the OS to connect to the WiFi without the Imager, and then I’m going to dig a bit deeper to find out why such advice is given and how the Imager performs this configuration step.
In the announcement of the new OS release, one of the highlights is the move to NetworkManager as the default mechanism to deal with networking. While this move undoubtedly brings many advantages, it is the reason why the classic technique of dropping a wpa_supplicant.conf file under /etc/wpa_supplicant/ no longer works.
+
The good news is that NetworkManager, too, can be configured manually with a text file. The file needs to be called SSID.nmconnection (replace SSID with your network’s SSID) and placed under /etc/NetworkManager/system-connections/ in the Pi’s rootfs partition.
+
[connection]
+
+id=SSID
+uuid= # random UUID in the format 11111111-1111-1111-1111-111111111111
+type=wifi
+autoconnect=true
+
+[wifi]
+mode=infrastructure
+ssid=SSID
+
+[wifi-security]
+auth-alg=open
+key-mgmt=wpa-psk
+psk=PASSWORD
+
+[ipv4]
+method=auto
+
+[ipv6]
+method=auto
+
(replace SSID and PASSWORD with your wifi network’s SSID and password). Here you can find the full syntax for this file.
+
You’ll also need to restrict its access rights so that only root can read and write it (mode 600, owned by root); NetworkManager ignores connection files with looser permissions.
So far it doesn’t seem too complicated. However, interestingly, this is not what the Raspberry Pi Imager does, because if you use it to flash the image and check the result, these files are nowhere to be found. Is there a better way to go about this?
To find out what the Imager does, my first idea was to have a peek at its source code. Being a Qt application, the source might be quite intimidating, but with some searching it’s possible to locate this interesting snippet:
I’m no C++ expert, but this function tells me a few things:
+
+
The Imager writes the configuration in these files: config.txt, cmdline.txt, firstuse.sh (we’ll soon figure out this is a typo: the file is actually called firstrun.sh).
+
It also prepares a “Cloudinit” configuration file, but it’s unclear whether and where it writes it.
+
The content of these files is printed to the console as debug output.
+
+
So let’s enable the debug logs and see what they produce:
+
rpi-imager --debug
+
The console stays quiet until I configure the user, password, WiFi and so on in the Imager, at which point it starts printing all the expected configuration files to the console.
Note the reference to /boot/firstrun.sh. If you plan to implement your own firstrun.sh file and want to change its name, don’t forget to modify this line as well.
+
+
That’s a lot of Bash in one go, but upon inspection one can spot a recurring pattern. For example, when setting the hostname, it does this:
The script clearly signals that there is a “preferred” way to set the hostname: using /usr/lib/raspberrypi-sys-mods/imager_custom set_hostname [NAME]. Only if this executable is not available does it fall back to the “traditional” way of setting the hostname by editing /etc/hosts.
+
The same patterns repeat a few times to perform the following operations:
+
+
set the hostname (/usr/lib/raspberrypi-sys-mods/imager_custom set_hostname [NAME])
configure the user (/usr/lib/userconf-pi/userconf [USERNAME] [HASHED-PASSWORD])
+
configure the WiFi (/usr/lib/raspberrypi-sys-mods/imager_custom set_wlan [MY-SSID] [MY-PASSWORD] [2-LETTER-COUNTRY-CODE])
+
set the keyboard layout (/usr/lib/raspberrypi-sys-mods/imager_custom set_keymap [CODE])
+
set the timezone (/usr/lib/raspberrypi-sys-mods/imager_custom set_timezone [TIMEZONE-NAME])
+
+
It seems like using raspberrypi-sys-mods to configure the OS at first boot is the way to go in this version of Raspberry Pi OS, and it might remain true in future versions as well. There are hints that the Raspberry Pi OS team is going to move to cloud-init in the near future, but for now this seems to be how the initial setup is done.
So let’s check out what raspberrypi-sys-mods does! The source code can be found here: raspberrypi-sys-mods.
+
Given that we’re interested in the WiFi configuration, let’s head straight to the imager_custom script (here), where we discover that it’s a Bash script which does this:
So after all this searching, we’re back to square one. This utility is doing exactly what we’ve done at the start: it writes a NetworkManager configuration file called preconfigured.nmconnection and it fills it in with the information that we’ve provided to the Imager, then changes the permissions to make sure NetworkManager can use it.
It would be great if the Raspberry Pi OS team would expand their documentation to include this information, so that users aren’t left wondering what makes the RPi Imager so special and whether their manual setup is the right way to go or rather a hack that is likely to break. For now it seems like there is one solid good approach to this problem, and we are going to see what is going to change in the next version of the Raspberry Pi OS.
+
On this note you should remember that doing a manual configuration of NetworkManager, using the Imager, or using raspberrypi-sys-mods may be nearly identical right now, but when choosing which approach to use for your project you should also keep in mind the maintenance burden that this decision brings.
+
Doing a manual configuration is easier on many levels, but only if you don’t intend to support other versions of RPi OS. If you do, or if you expect to migrate when a new version comes out, you should consider doing something similar to what the Imager does: use a firstrun.sh file that tries to use raspberrypi-sys-mods and falls back to a manual configuration only if that executable is missing. That is likely to make migrations easier if the Raspberry Pi OS team should choose once again to modify the way that headless setups work.
This blogpost is a teaser for my upcoming talk at ODSC East 2024 in Boston, April 23-25. It is published on the ODSC blog at this link.
+
+
Retrieval Augmented Generation (RAG) is by far one of the most popular and effective techniques to bring LLMs to production. Introduced in a Meta paper in 2020, it has since taken off and evolved into a field of its own, fueled by the immediate benefits that it provides: a lower risk of hallucinations, access to updated information, and so on. On top of this, RAG is relatively cheap to implement for the benefit it provides, especially when compared to costly techniques like LLM finetuning. This makes it a no-brainer for a lot of use cases, to the point that nowadays nearly every production system built on LLMs seems to be implemented as some form of RAG.
However, retrieval augmentation is not the silver bullet that many claim it is. Alongside these obvious benefits, RAG brings its own set of weaknesses and limitations, which it’s good to be aware of when scale and accuracy need to be improved further.
At a high level, RAG introduces a retrieval step right before the LLM generation. This means that we can classify the failure modes of a RAG system into two main categories:
+
+
+
Retrieval failures: when the retriever returns only documents which are irrelevant to the query or misleading, which in turn gives the LLM wrong information to build the final answer from.
+
+
+
Generation failures: when the LLM generates a reply that is unrelated or directly contradicts the documents that were retrieved. This is a classic LLM hallucination.
+
+
+
When developing a simple system or a PoC, these sorts of errors tend to have a limited impact on the results, as long as you are using the best available tools. Powerful LLMs such as GPT-4 and Mixtral are not at all prone to hallucination when the provided documents are correct and relevant, and specialized systems such as vector databases, combined with specialized embedding models, can easily achieve high retrieval accuracy, precision and recall on most queries.
+
However, as the system scales to larger corpora, lower quality documents, or niche and specialized fields, these errors end up amplifying each other and may degrade the overall system performance noticeably. Having a good grasp of the underlying causes of these issues, and an idea of how to minimize them, can make a huge difference.
+
+
The difference between retrieval and generation failures. Identifying where your RAG system is more likely to fail is key to improving the quality of its answers.
One of the most common applications of LLMs is a chatbot that helps users by answering their questions about a product or a service. Apps like this can be used in situations that are more or less sensitive for the user and difficult for the LLM: from simple developer documentation search, to customer support for airlines or banks, up to bots that provide legal or medical advice.
+
These three systems are very similar from a high-level perspective: the LLM needs to use snippets retrieved from a corpus of documents to build a coherent answer for the user. In fact, RAG is a fitting architecture for all of them, so let’s assume that all three systems are built more or less equally, with a retrieval step followed by a generation one.
Let’s see what challenges are involved in each of them.
For this use case, RAG is usually sufficient to achieve good results. A simple proof of concept may even exceed expectations.
+
When present and done well, developer documentation is structured and easy for a chatbot to understand. Retrieval is usually easy and effective, and the LLM can reinterpret the retrieved snippets effectively. On top of that, hallucinations are easy to spot by the user or even by an automated system like a REPL, so they have a limited impact on the perceived quality of the results.
+
As a bonus, the queries are very likely to be in English, which happens to be the language of the documentation too, as well as the language LLMs are strongest at.
+
+
The MongoDB documentation provides a chatbot interface which is quite useful.
In this case, the small annoyances that are already present above have a much stronger impact.
+
Even if your airline or bank’s customer support pages are top notch, hallucinations are not as easy to spot, because to make sure that the answers are accurate the user needs to check the sources that the LLM is quoting… which defeats the whole point of the generation step. And what if the user cannot read such pages at all? Maybe they speak a minority language and can’t read them in the first place. LLMs also tend to perform worse in languages other than English and hallucinate more often, exacerbating the problem exactly where it’s already more acute.
+
+
You are going to need a very good RAG system and a huge disclaimer to avoid this scenario.
The third case brings the exact same issues to a whole new level. In these scenarios, vanilla RAG is normally not enough.
+
Laws and scientific articles are hard to read for the average person, require specialized knowledge to understand, and they need to be read in context: asking the user to check the sources that the LLM is quoting is just not possible. And while retrieval on this type of documents is feasible, its accuracy is not as high as on simple, straightforward text.
+
Even worse, LLMs often have no reliable background knowledge on these topics, so their replies need to be strongly grounded in relevant documents for the answers to be correct and dependable. While a simple RAG implementation is still better than a vanilla reply from GPT-4, the results can be problematic in entirely different ways.
Moving your simple PoC to real world use cases without reducing the quality of the response requires a deeper understanding of how the retrieval and the generation work together. You need to be able to measure your system’s performance, to analyze the causes of the failures, and to plan experiments to improve such metrics. Often you will need to complement it with other techniques that can improve its retrieval and generation abilities to reach the quality thresholds that makes such a system useful at all.
+
In my upcoming talk at ODSC East “RAG, the bad parts (and the good!): building a deeper understanding of this hot LLM paradigm’s weaknesses, strengths, and limitations” we are going to cover all these topics:
+
+
+
how to measure the performance of your RAG applications, from simple metrics like F1 to more sophisticated approaches like Semantic Answer Similarity (a minimal F1 example follows this list).
+
+
+
how to identify if you’re dealing with a retrieval or a generation failure and where to look for a solution: is the problem in your documents content, in their size, in the way you chunk them or embed them? Or is the LLM that is causing the most trouble, maybe due to the way you are prompting it?
+
+
+
what techniques can help you raise the quality of the answers, from simple prompt engineering tricks like few-shot prompting, all the way up to finetuning, self-correction loops and entailment checks.
+
+
+
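To make the first point a bit more concrete, here is a minimal, self-contained sketch (an illustration of mine, not material from the talk) of a SQuAD-style token-level F1 between a generated answer and a reference answer:

```python
from collections import Counter

def answer_f1(prediction: str, ground_truth: str) -> float:
    """Token-level F1 between a generated answer and a reference answer (SQuAD-style)."""
    pred_tokens = prediction.lower().split()
    truth_tokens = ground_truth.lower().split()
    if not pred_tokens or not truth_tokens:
        return float(pred_tokens == truth_tokens)
    # Count the tokens the two answers share, with multiplicity
    common = Counter(pred_tokens) & Counter(truth_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(truth_tokens)
    return 2 * precision * recall / (precision + recall)

print(answer_f1("The official language was Esperanto", "Esperanto was the official language"))  # 1.0
```

Semantic Answer Similarity works on embeddings rather than raw tokens, so it also rewards paraphrases that a token-level F1 would miss.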
Make sure to attend the talk to learn more about all these techniques and how to apply them in your projects.
As everyone who has been serious about studying with Anki knows, the first step of the journey is writing your own flashcards. Writing the cards yourself is often cited as the most straightforward way to make the review process more effective. However, this can become a big chore, and not having enough cards to study is a sure way to not learn anything.
A lot has been written about the best way to create Anki cards. However, as a HackerNews commenter once said:
+
+
One massively overlooked way to improve spaced repetition is to make easier cards.
+
+
Cards can hardly be too simple to be effective. You don’t need to write complicated tricky questions to make sure you are making the most of your reviews. On the contrary, even a long sentence where the word you need to study is highlighted is often enough to make the review worthwhile.
+
In the case of language learning, if you’re an advanced learner one of the easiest ways to create such cards is to copy-paste a sentence with your target word into a card and write the translation of that word (or sentence) on the back. But if you’re a beginner, even these cards can be complicated both to write and to review. What if the sentence where you found the new word is too complex? You’ll need to write a brand new sentence. But what if you write an incorrect sentence? And so on.
Automated card generation has been often compared to the usage of pre-made decks, because the students don’t see the content of the cards they’re adding to their decks before doing so. However, this depends a lot on how much the automation is hiding from the user.
+
In my family we’re currently learning Portuguese, so we end up creating a lot of cards with Portuguese vocabulary. Given that many useful words are hard to make sense of without context, having cards with sample sentences helps me massively to remember them. But our sample sentences often sound unnatural in Portuguese, even when they’re correct. It would be great if we could have a “sample sentence generator” that creates such sample sentences for me in more colloquial Portuguese!
+
This is when we got the idea of using an LLM to help with the task. GPT models are great sentence generators: can we get them to make some good sample sentence cards?
+
A quick experiment proves that there is potential to this concept.
The natural next step is to store that set of instructions into a custom prompt, or as they’re called now, a custom GPT. Making these small wrappers is really easy: it requires no coding, only a well-crafted prompt and a catchy name. So we called our new GPT “ClozeGPT” and started off with a prompt like this:
+
Your job is to create Portuguese Anki cloze cards.
+I might give you a single word or a pair (word + translation).
+
+Front cards:
+- Use Anki's `{{c1::...}}` feature to template in cards.
+- You can create cards with multiple clozes.
+- Keep the verb focused, and don't rely too much on auxiliary verbs like
+ "precisar", "gostar", etc...
+- Use the English translation as a cloze hint.
+
+Back cards:
+- The back card should contain the Portuguese word.
+- If the word could be mistaken (e.g. "levantar" vs. "acordar"),
+ write a hint that can help me remember the difference.
+- The hint should be used sparingly.
+
+Examples:
+
+---------
+
+Input: cozinhar
+
+# FRONT
+```
+Eu {{c1::cozinho::cook}} todos os dias para minha família.
+```
+
+# BACK
+```
+cozinhar - to cook
+```
+---------
+
+Input: levantar
+
+# FRONT
+```
+Eu preciso {{c1::levantar::get up}} cedo amanhã para ir ao trabalho.
+```
+
+# BACK
+```
+levantar - to get up, to raise (don't mistake this with "acordar", which is to wake up from sleep)
+```
+
+
This simple prompt already gives very nice results!
Naturally, once a tool works well it’s hard to resist the urge to add some new features to it. So for our ClozeGPT we added a few more abilities:
+
# Commands
+
+## `+<<word>>`
+Expands the back card with an extra word present in the sentence.
+Include all the previous words, plus the one given.
+In this case, only the back card needs to be printed; don't show the front card again.
+
+## `R[: <<optional hint>>]`
+Regenerates the response based on the hint given.
+If the hint is absent, regenerate the sentence with a different context.
+Do not change the target words; the hint is most often a different context I would like to have a sentence for.
+
+## `Q: <<question>>`
+This is an escape to a normal chat about a related question.
+Answer the question as usual, you don't need to generate anything.
+
+
The + command is useful when the generated sentence contains some other interesting word you can take the occasion to learn as well:
+
+
The R command can be used to direct the card generation a bit better than with a simple press on the “Regenerate” icon:
+
+
And finally Q is a convenient escape hatch to make this GPT revert back to its usual helpful self, where it can engage in conversation.
Our small ClozeGPT works only for Portuguese now, but feel free to play with it if you find it useful. And, of course, always keep in mind that LLMs are only pretending to be humans.
These days everyone’s boss seems to want some form of GenAI in their products. That doesn’t always make sense: however, understanding when it does and when it doesn’t is not obvious even for us experts, and nearly impossible for everyone else.
+
How can we help our colleagues understand the pros and cons of this tech, and figure out when and how it makes sense to use it?
+
In this post I am going to outline a narrative that explains LLMs without technicalities and helps you frame some high-level technical decisions, such as RAG vs finetuning, or which specific model size to use, in a way that a non-technical audience can not only grasp but also reason about. We’ll start by “translating” a few terms into their “human equivalent” and then use this metaphor to reason about the differences between RAG and finetuning.
Large Language Models are often described as “super-intelligent” entities that know far more than any human could possibly know. This makes a lot of people think that they are also extremely intelligent and able to reason about anything in a super-human way. The reality is very different: LLMs are able to memorize and repeat far more facts than humans do, but in their ability to reason they are often inferior to the average person.
+
Rather than describing LLMs as all-knowing geniuses, it’s much better to frame them as an average high-school student. They’re not the smartest humans on the planet, but they can help a lot if you guide them through the process. And just as a normal person might, sometimes they forget things, and occasionally they remember them wrong.
Language models are not all born equal. Some are inherently able to do more complex reasoning, to learn more facts and to talk more smoothly in more languages.
+
The “IQ” of an LLM can be approximated, more or less, to its parameter count. An LLM with 7 billion parameters is almost always less clever than a 40 billion parameter model, will have a harder time learning more facts, and will be harder to reason with.
+
However, just like with real humans, there are exceptions. Recent “small” models can easily outperform older and larger models, due to improvements in the way they’re built. Also, some small models are very good at a few very specialized jobs and can outperform a large, general-purpose model on those tasks.
Another similarity to human students is that LLMs learn all the facts they know by “going to school” and studying a ton of general and unrelated facts. This is what training an LLM means. This implies that, just like students, an LLM needs a lot of varied material to study from. This material is what is usually called “training data” or a “training dataset”.
+
They can also learn more than what they currently know and specialize on a topic: all they need to do is to study further on it. This is what finetuning represents, and as you would expect, it also needs some study material. This is normally called “finetuning data/datasets”.
+
The distinction between training and finetuning is not so much about how it’s done, but mostly about the size and contents of the dataset required. The initial training usually takes a lot of time, computing power, and tons of very varied data, just like what’s needed to bring a baby to the level of a high-schooler. Finetuning instead looks like preparing for a specific test or exam: the study material is a lot smaller and a lot more specific.
+
Keep in mind that, just like for humans, studying more can make a student a bit smarter, but it won’t make them a genius. In many cases, no amount of training and/or finetuning can close the gap between the 7 billion parameter version of an LLM and the 40 billion one.
One of the most common usecases for LLMs is question answering, an NLP task where users ask questions to the model and expect a correct answer back. The fact that the answer must be correct means that this interaction is very similar to an exam: the LLM is being tested by the user on its knowledge.
+
This means that, just like a student, when the LLM is used directly it has to rely on its own knowledge to answer the question. If it studied the topic well it will answer accurately most of the time. However, if it didn’t study the subject, it will do what students always do: make up stuff that sounds legit, hoping that the teacher won’t notice how little they know. This is what we call hallucinations.
+
When the answer is known to the user the answer of the LLM can be graded, just like in a real exam, to make the LLM improve. This process is called evaluation. Just like with humans, there are many ways in which the answer can be graded: the LLM can be graded on the accuracy of the facts it recalled, or the fluency it delivered its answer with, or it can be scored on the correctness of a reasoning exercise, like a math problem. These ways of grading an LLM are called metrics.
Hallucinations are very dangerous if the user doesn’t know what the LLM was supposed to reply, so they need to be reduced, possibly eliminated entirely. It’s like we really need the students to pass the exam with flying colors, no matter how much they studied.
+
Luckily there are many ways to help our student succeed. One way to improve the score is, naturally, to make them study more and better. Giving them more time to study (more finetuning) and better material (better finetuning datasets) is one good way to make LLMs reply correctly more often. The issue is that this method is expensive, because it needs a lot of computing power and high quality data, and the student may still forget something during the exam.
+
We can make the exams even easier by converting them into open-book exams. Instead of asking the students to memorize all the facts and recall them during the exam, we can let them bring the book and lookup the information they need when the teacher asks the question. This method can be applied to LLMs too and is called RAG, which stands for “retrieval augmented generation”.
+
RAG has a few interesting properties. First of all, it makes it very easy even for “dumb”, small LLMs to recall nearly all the important facts correctly and consistently. By letting your students carry their history books to the exam, all of them will be able to tell you the date of any historical event by just looking it up, regardless of how smart they are or how much they studied.
+
RAG doesn’t need a lot of data, but you need an efficient way to access it. In our metaphor, you need a well structured book with a good index to help the student find the correct facts when asked, or they might fail to find the information they need when they’re quizzed.
+
A trait that makes RAG unique is that it can be used to keep the LLM up-to-date with information that can’t be “studied” because it changes too fast. Let’s imagine a teacher who wants to quiz the students about today’s stock prices. They can’t expect the pupils to know them if they don’t have access to the latest financial data. Even if they were to study the prices every hour, the result would be quite pointless, because all the knowledge they acquire becomes immediately irrelevant and might even confuse them.
+
Last but not least, RAG can be used together with finetuning. Just as a teacher can make students study the topic and then also bring the book to the exam to make sure they will answer correctly, you can also use RAG and finetuning together.
+
However, there are situations where RAG doesn’t help. For example, it’s pointless if the questions are asked in a language that the LLM doesn’t know, or if the exam is made of tasks that require complex reasoning. This is true for human students too: books won’t help them much to understand a foreign language to the point that they can take an exam in it, and won’t be useful to crack a hard math problem. For these sorts of exams the students just need to be smart and study more, which in LLM terms means that you should prefer a large model and you probably need to finetune it.
Training an LLM is the same as making a student go to school
+
Finetuning it means to make it specialize on a subject by making it study only books and other material on the subject
+
A training dataset is the books and material the student needs to study on
+
User interactions are like university exams
+
Evaluating an LLM means to score its answers as if they were the responses to a test
+
A metric is a type of evaluation that focuses on a specific trait of the answer
+
A hallucination is a wrong answer that the LLM makes up, just like a student would, in order to try to pass an exam when it doesn’t know the answer or can’t recall it in that moment
+
RAG (retrieval augmented generation) is like an open-book exam: it gives the LLM access to some material on the question’s topic, so it won’t need to hallucinate an answer. It will help the LLM recall facts, but it won’t make it smarter.
+
+
By drawing a parallel with a human student, it can be a lot easier to explain to a non-technical audience why some decisions were taken.
+
For example, it might not be obvious why RAG is cheaper than finetuning, because both need domain-specific data. By explaining that RAG is like an open-book exam versus a closed-book one, the difference is clearer: the students need less time and effort to prepare, and they’re less likely to make trivial mistakes if they can bring the book with them to the exam.
+
Another example is hallucinations. It’s difficult for many people to understand why LLMs don’t like to say “I don’t know”, until they realise that from the LLM’s perspective every question is like an exam: better to make something up than to admit they’re unprepared! And so on.
+
Building a shared, simple intuition of how LLMs work is a very powerful tool. Next time you’re asked to explain a technical decision related to LLMs, building a story around it may get the message across far more effectively and help everyone be on the same page. Give it a try!
If you’ve been at any AI or Python conference this year, there’s one acronym that you’ve probably heard in nearly every talk: it’s RAG. RAG is one of the most used techniques to enhance LLMs in production, but why is it so? And what are its weak points?
+
In this post, we will first describe what RAG is and how it works at a high level. We will then see what type of failures we may encounter, how they happen, and a few reasons that may trigger these issues. Next, we will look at a few tools to help us evaluate a RAG application in production. Last, we’re going to list a few techniques to enhance your RAG app and make it more capable in a variety of scenarios.
RAG stands for Retrieval Augmented Generation, which can be explained as: “a technique to augment an LLM’s knowledge beyond its training data by retrieving contextual information before generating an answer.”
+
+
RAG is a technique that works best for question-answering tasks, such as chatbots or similar knowledge extraction applications. This means that the user of a RAG app is a user who needs an answer to a question.
+
The first step of RAG is to take the question and hand it over to a component called retriever. A retriever is any system that, given a question, can find data relevant to the question within a vast dataset, be it text, images, rows in a DB, or anything else.
+
When implementing RAG, many developers think immediately that a vector database is necessary for retrieval. While vector databases such as Qdrant, ChromaDB, Weaviate and so on, are great for retrieval in some applications, they’re not the only option. Keyword-based algorithms such as Elasticsearch BM25 or TF-IDF can be used as retrievers in a RAG application, and you can even go as far as using a web search engine API, such as Google or Bing. Anything that is given a question and can return information relevant to the question can be used here.
+
Once our retriever has sifted through all the data and returned a few relevant snippets of context, the question and the context are assembled into a RAG prompt. It looks like this:
+
Read the text below and answer the question at the bottom.
+
+Text: [all the text found by the retriever]
+
+Question: [the user's question]
+
This prompt is then fed to the last component, called a generator. A generator is any system that, given a prompt, can answer the question that it contains. In practice, “generator” is an umbrella term for any LLM, be it behind an API like GPT-3.5 or running locally, such as a Llama model. The generator receives the prompt, reads and understands it, and then writes down an answer that can be given back to the user, closing the loop.
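To make the whole flow concrete, here is a minimal sketch of the retrieve → prompt → generate loop. The `retrieve` function is a placeholder for any retriever, and the OpenAI client and model name are just one possible choice of generator:

```python
from openai import OpenAI

def retrieve(question: str) -> list[str]:
    # Placeholder: any retriever works here (BM25, a vector DB, a web search API...)
    return ["According to the weather forecast, tomorrow in Lisbon will be mostly sunny."]

def answer(question: str) -> str:
    context = "\n".join(retrieve(question))
    # Assemble the RAG prompt: the retrieved context plus the user's question
    prompt = (
        "Read the text below and answer the question at the bottom.\n\n"
        f"Text: {context}\n\n"
        f"Question: {question}"
    )
    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
    reply = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return reply.choices[0].message.content

print(answer("Is it going to rain in Lisbon tomorrow morning?"))
```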
There are three main benefits of using a RAG architecture for your LLM apps instead of querying the LLM directly.
+
+
+
Reduces hallucinations. The RAG prompt contains the answer to the user’s question together with the question, so the LLM doesn’t need to know the answer, but it only needs to read the prompt and rephrase a bit of its content.
+
+
+
Allows access to fresh data. RAG makes LLMs capable of reasoning about data that wasn’t present in their training set, such as highly volatile figures, news, forecasts, and so on.
+
+
+
Increases transparency. The retrieval step is much easier to inspect than LLM’s inference process, so it’s far easier to spot and fact-check any answer the LLM provides.
+
+
+
To understand these points better, let’s see an example.
We’re making a chatbot for a weather forecast app. Suppose the user asks an LLM directly, “Is it going to rain in Lisbon tomorrow morning?”. In that case, the LLM will make up a random answer because it obviously didn’t have tomorrow’s weather forecast for Lisbon in its training set and knows nothing about it.
+
When an LLM is queried with a direct question, it will use its internal knowledge to answer it. LLMs have read the entire Internet during their training phase, so they learned that whenever they saw a line such as “What’s the capital of France?”, the string “Paris” always appeared among the following few words. So when a user asks the same question, the answer will likely be “Paris”.
+
This “recalling from memory” process works for well-known facts but is not always practical. For more nuanced questions or something that the LLM hasn’t seen during training, it often fails: in an attempt to answer the question, the LLM will make up a response that is not based on any real source. This is called a hallucination, one of LLMs’ most common and feared failure modes.
+
RAG helps prevent hallucinations because, in the RAG prompt, the question and all the data needed to answer it are explicitly given to the LLM. For our weather chatbot, the retriever will first do a Google search and find some data. Then, we will put together the RAG prompt. The result will look like this:
+
Read the text below and answer the question at the bottom.
+
+Text: According to the weather forecast, the weather in Lisbon tomorrow
+is expected to be mostly sunny, with a high of 18°C and a low of 11°C.
+There is a 25% chance of showers in the evening.
+
+Question: Is it going to rain in Lisbon tomorrow morning?
+
Now, it’s clear that the LLM doesn’t have to recall anything about the weather in Lisbon from its memory because the prompt already contains the answer. The LLM only needs to rephrase the context. This makes the task much simpler and drastically reduces the chances of hallucinations.
+
In fact, RAG is the only way to build an LLM-powered system that can answer a question like this with any confidence at all. Retraining an LLM every morning with the forecast for the day would be a lot more wasteful, require a ton of data, and wouldn’t return consistent results. Imagine if we were making a chatbot that gives you figures from the stock market!
+
In addition, a weather chatbot built with RAG can be fact-checked. If users have access to the web pages that the retriever found, they can check the pages directly when the results are not convincing, which helps build trust in the application.
If you want to compare a well-implemented RAG system with a plain LLM, you can put ChatGPT (the free version, powered by GPT-3.5) and Perplexity to the test. ChatGPT does not implement RAG, while Perplexity is one of the most effective implementations existing today.
+
Let’s ask both: “Where does ODSC East 2024 take place?”
+
ChatGPT says:
+
+
While Perplexity says:
+
+
Note how ChatGPT clearly says that it doesn’t know: this is better than many other LLMs, which would just make up a place and date. On the contrary, Perplexity states some specific facts, and in case of doubt it’s easy to verify that it’s right by simply checking the sources above. Even just looking at the source’s URL can give users a lot more confidence in whether the answer is grounded.
Now that we understand how RAG works, let’s see what can go wrong in the process.
+
As we’ve just described, a RAG app works in two steps – retrieval and generation. Therefore, we can classify RAG failures into two broad categories:
+
+
+
Retrieval failures: The retriever component fails to find the correct context for the given question. As a result, the RAG prompt is filled with irrelevant noise, which confuses the LLM and results in a wrong or unrelated answer.
+
+
+
Generation failures: The LLM fails to produce a correct answer even with a proper RAG prompt containing a question and all the data needed to answer it.
+
+
+
To understand them better, let’s pretend an imaginary user poses our application the following question about a little-known European microstate:
+
What was the official language of the Republic of Rose Island?
+
Here is what would happen in an ideal case:
+
+
First, the retriever searches the dataset (let’s imagine, in this case, Wikipedia) and returns a few snippets. The retriever did a good job here, and the snippets contain clearly stated information about the official language of Rose Island. The LLM reads these snippets, understands them, and replies to the user (correctly):
+
The official language of the Republic of Rose Island was Esperanto.
+
What would happen if the retrieval step didn’t go as planned?
+
+
Here, the retriever finds some information about Rose Island, but none of the snippets contain any information about the official language. They only say where it was located, what happened to it, and so on. So the LLM, which knows nothing about this nation except what the prompt says, takes an educated guess and replies:
+
The official language of the Republic of Rose Island was Italian.
+
The wrong answer here is not the LLM’s fault: the retriever is the component to blame.
+
When and why can retrieval fail? There are as many answers to this question as retrieval methods, so each should be inspected for its strengths and weaknesses. However there are a few reasons that are common to most of them.
+
+
+
The relevant data does not exist in the database. When the data does not exist, it’s impossible to retrieve it. Many retrieval techniques, however, give a relevance score to each result that they return, so filtering out low-relevance snippets may help mitigate the issue (a minimal filtering sketch follows this list).
+
+
+
The retrieval algorithm is too naive to match a question with its relevant context. This is a common issue for keyword-based retrieval methods such as TF-IDF or BM25 (Elasticsearch). These algorithms can’t deal with synonyms or resolve acronyms, so if the question and the relevant context don’t share the exact same words, the retrieval won’t work.
+
+
+
The embedding model (if used) is too small or unsuitable for the data. When doing a vector-based search, the data must be embedded before it can be searched. “Embedded” means that every snippet of context is associated with a list of numbers called an embedding. The quality of the embeddings then determines the quality of the retrieval. If you embed your documents with a naive embedding model, or if you are dealing with a very specific domain such as a narrow medical or legal niche, the embeddings of your data won’t be able to represent their content precisely enough for the retrieval to be successful.
+
+
+
The data is not chunked properly (chunks are too big or too small). Retrievers thrive on data that is chunked properly. Huge blocks of text will be found relevant to almost any question and will drown the LLM in information. Sentences that are too short, or sentence fragments, won’t carry enough context for the LLM to benefit from the retriever’s output. Proper chunking can be a huge lever to improve the quality of your retrieval.
+
+
+
The data and the question are in different languages. Keyword-based retrieval algorithms suffer from this issue the most because keywords in different languages rarely match. If you expect questions to come in a different language than the data you are retrieving from, consider adding a translation step or performing retrieval with a multilingual embedder instead.
+
+
+
One caveat with retrieval failures is that if you’re using a very powerful LLM such as GPT-4, sometimes your LLM is smart enough to understand that the retrieved context is incorrect and will discard it, hiding the failure. This means that it’s even more important to make sure retrieval is working well in isolation, something we will see in a moment.
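As a small illustration of the first point in the list above, here is what score-based filtering can look like, assuming a retriever that returns (snippet, score) pairs; the threshold is arbitrary and should be tuned on your own data:

```python
def filter_by_relevance(results: list[tuple[str, float]], threshold: float = 0.5) -> list[str]:
    """Keep only the snippets whose relevance score clears the threshold.

    If nothing passes the bar, an empty list is returned, so the caller can tell
    the LLM that no relevant context was found instead of stuffing noise into the prompt.
    """
    return [snippet for snippet, score in results if score >= threshold]

retrieved = [
    ("Rose Island was a platform in the Adriatic Sea, off the coast of Rimini...", 0.91),
    ("The weather in Lisbon tomorrow is expected to be mostly sunny...", 0.12),  # irrelevant noise
]
print(filter_by_relevance(retrieved))  # only the first snippet survives
```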
Assuming that retrieval was successful, what would happen if the LLM still hallucinated?
+
+
This is clearly an issue with our LLM: even when given all the correct data, the LLM still generated a wrong answer. Maybe our LLM doesn’t know that Esperanto is even a language? Or perhaps we’re using an LLM that doesn’t understand English well?
+
Naturally, each LLM will have different weak points that can trigger issues like these. Here are some common reasons why you may be getting generation failures.
+
+
+
The model is too small and can’t follow instructions well. When building in a resource-constrained environment (such as local smartphone apps or IoT), the choice of LLMs shrinks to just a few tiny models. However, the smaller the model, the less it will be able to understand natural language, and even when it does, its ability to follow instructions is limited. If you notice that your model consistently doesn’t pay enough attention to the question when answering it, consider switching to a larger or newer LLM.
+
+
+
The model knows too little about the domain to even understand the question. This can happen if your domain is highly specific, uses specific terminology, or relies on uncommon acronyms. Models are trained on general-purpose text, so they might not understand some questions without finetuning, which helps specify the meaning of the most critical key terms and acronyms. When the answers given by your model somewhat address the question but miss the point entirely and stay generic or hand-wavy, this is likely the case.
+
+
+
The model is not multilingual, but the questions and context may be. It’s essential that the model understands the question being asked in order to be able to answer it. The same is true for context: if the data found by the retriever is in a language that the LLM cannot understand, it won’t help it answer and might even confuse it further. Always make sure that your LLM understands the languages your users use.
+
+
+
The RAG prompt is not built correctly. Some LLMs, especially older or smaller ones, may be very sensitive to how the prompt is built. If your model ignores part of the context or misses the question, the prompt might contain contradicting information, or it might simply be too large. LLMs are not always great at finding a needle in the haystack: if you are consistently building huge RAG prompts and you observe generation issues, consider cutting them back to help the LLM focus on the data that actually contains the answer.
Once we put our RAG system in production, we should keep an eye on its performance at scale. This is where evaluation frameworks come into play.
+
To properly evaluate the performance of RAG, it’s best to perform two evaluation steps:
+
+
+
Isolated evaluation. Because RAG is a two-step process, failures at one stage can hide or mask failures at the other, so it’s hard to understand where they originate from. To address this issue, evaluate the retrieval and generation steps separately: both must work well in isolation.
+
+
+
End to end evaluation. To ensure the system works well from start to finish, it’s best to evaluate it as a whole. End-to-end evaluation brings its own set of challenges, but it correlates more directly to the quality of the overall app.
Each retrieval method has its own state-of-the-art evaluation method and framework, so it’s usually best to refer to those.
+
For keyword-based retrieval algorithms such as TF-IDF, BM25, PageRank, and so on, evaluation is often done by checking how well the keywords match. For this, you can use one of the many metrics designed for the purpose: recall, precision, F1, MRR, MAP, …
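For instance, given the ID of the document that is actually relevant to each question, recall@k and the reciprocal rank can be computed in a few lines of plain Python (a simplified sketch that assumes a single relevant document per query; averaging the reciprocal rank over all queries gives MRR):

```python
def recall_at_k(retrieved_ids: list[str], relevant_id: str, k: int = 5) -> float:
    """1.0 if the relevant document appears among the first k results, 0.0 otherwise."""
    return float(relevant_id in retrieved_ids[:k])

def reciprocal_rank(retrieved_ids: list[str], relevant_id: str) -> float:
    """1 / rank of the relevant document, or 0.0 if it was not retrieved at all."""
    for rank, doc_id in enumerate(retrieved_ids, start=1):
        if doc_id == relevant_id:
            return 1.0 / rank
    return 0.0

results = ["doc_7", "doc_2", "doc_9"]
print(recall_at_k(results, "doc_2", k=2))  # 1.0
print(reciprocal_rank(results, "doc_2"))   # 0.5
```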
+
For vector-based retrievers like vector DBs, the evaluation is trickier because checking for matching keywords is not sufficient: the semantics of the question and the answer must be evaluated for similarity. We are going to see some libraries that help with this when evaluating generation: in short, they use another LLM to judge the similarity or compute metrics like semantic similarity.
Evaluating an LLM’s answers to a question is still a developing art, and several libraries can help with the task. One commonly used framework is UpTrain, which implements an “LLM-as-a-judge” approach. This means that the answers given by an LLM are then evaluated by another LLM, normally a larger and more powerful model.
+
This approach has the benefit that responses are not simply checked strictly for the presence or absence of keywords but can be evaluated according to much more sophisticated criteria like completeness, conciseness, relevance, factual accuracy, conversation quality, and more.
+
This approach leads to a far more detailed view of what the LLM is good at and what aspects of the generation could or should be improved. The criteria to select depend strongly on the application: for example, in medical or legal apps, factual accuracy should be the primary metric to optimize for, while in customer support, user satisfaction and conversation quality are also essential. For personal assistants, it’s usually best to focus on conciseness, and so on.
+
+
💡 UpTrain can also be used to evaluate RAG applications end-to-end. Check its documentation for details.
The evaluation of RAG systems end-to-end is also quite complex and can be implemented in many ways, depending on the aspect you wish to monitor. One of the simplest approaches is to focus on semantic similarity between the question and the final answer.
+
A popular framework that can be used for such high-level evaluation is RAGAS. In fact, RAGAS offers two interesting metrics:
+
+
+
Answer semantic similarity. This is computed simply by taking the cosine similarity between the answer and the ground truth (a minimal sketch follows this list).
+
+
+
Answer correctness. Answer correctness is defined as a weighted average of the semantic similarity and the F1 score between the generated answer and the ground truth. This metric is more oriented towards fact-based answers, where F1 can help ensure that relevant facts such as dates, names, and so on are explicitly stated.
+
+
+
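As mentioned in the first bullet, answer semantic similarity boils down to a cosine similarity between two embeddings. A minimal sketch of the idea (not RAGAS’ actual API), using the sentence-transformers library and an arbitrary choice of embedding model:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # any sentence embedding model works

answer = "The official language of the Republic of Rose Island was Esperanto."
ground_truth = "Rose Island's official language was Esperanto."

embeddings = model.encode([answer, ground_truth])
similarity = util.cos_sim(embeddings[0], embeddings[1]).item()
print(f"Answer semantic similarity: {similarity:.2f}")
```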
On top of evaluation metrics, RAGAS also offers the capability to build synthetic evaluation datasets to evaluate your app against. Such datasets spare you the work-intensive process of building a real-world evaluation dataset with human-generated questions and answers but also trade high quality for volume and speed. If your domain is very specific or you need extreme quality, synthetic datasets might not be an option, but for most real-world apps, such datasets can save tons of labeling time and resources.
+
+
💡 RAGAS can also be used to evaluate each step of a RAG application in isolation. Check its documentation for details.
+
+
+
+
💡 I recently discovered an even more comprehensive framework for end-to-end evaluation called continuous-eval from Relari.ai, which focuses on modular evaluation of RAG pipelines. Check it out if you’re interested in this topic and RAGAS doesn’t offer enough flexibility for your use case.
Once you know how you want to evaluate your app, it’s time to put it together. A convenient framework for this step is Haystack, a Python open-source LLM framework focused on building RAG applications. Haystack is an excellent choice because it can be used through all stages of the application lifecycle, from prototyping to production, including evaluation.
+
Haystack supports several evaluation libraries including UpTrain, RAGAS and DeepEval. To understand more about how to implement and evaluate a RAG application with it, check out their tutorial about model evaluation here.
Once our RAG app is ready and deployed in production, the natural next step is to look for ways to improve it even further. RAG is a very versatile technique, and many different flavors of “advanced RAG” have been experimented with, many more than I can list here. Depending on the situation, you may focus on different aspects, so let’s list some examples of tactics you can deploy to make your pipeline more powerful, context-aware, accurate, and so on.
Sometimes, a RAG app needs access to vastly different types of data simultaneously. For example, a personal assistant might need access to the Internet, your Slack, your emails, your personal notes, and maybe even your pictures. Designing a single retriever that can handle data of so many different kinds is possible. Still, it can be a real challenge and require, in many cases, an entire data ingestion pipeline.
+
Instead of going that way, you can instead use multiple retrievers, each specialized to a specific subset of your data: for example, one retriever that browses the web, one that searches on Slack and in your emails, one that checks for relevant pictures.
+
When using many retrievers, however, it’s often best to introduce another step called reranking. A reranker double-checks that all the results returned by each retriever are actually relevant and sorts them again before the RAG prompt is built. Rerankers are usually much more precise than retrievers in assessing the relative importance of various snippets of context, so they can dramatically improve the quality of the pipeline. In exceptional cases, they can be helpful even in RAG apps with a single retriever.
+
Here is an example of such a pipeline built with Haystack.
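Below is a rough sketch of what such a hybrid pipeline could look like, assuming Haystack 2.x component names and an in-memory document store that already contains your (embedded) documents; a real app would swap in its own retrievers, stores and models:

```python
from haystack import Pipeline
from haystack.components.embedders import SentenceTransformersTextEmbedder
from haystack.components.joiners import DocumentJoiner
from haystack.components.rankers import TransformersSimilarityRanker
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever, InMemoryEmbeddingRetriever
from haystack.document_stores.in_memory import InMemoryDocumentStore

store = InMemoryDocumentStore()  # imagine it already contains your documents

pipe = Pipeline()
pipe.add_component("bm25", InMemoryBM25Retriever(document_store=store))
pipe.add_component("query_embedder", SentenceTransformersTextEmbedder())
pipe.add_component("embedding_retriever", InMemoryEmbeddingRetriever(document_store=store))
pipe.add_component("joiner", DocumentJoiner())                   # merges the two result lists
pipe.add_component("reranker", TransformersSimilarityRanker())   # re-sorts them by relevance

pipe.connect("bm25.documents", "joiner.documents")
pipe.connect("query_embedder.embedding", "embedding_retriever.query_embedding")
pipe.connect("embedding_retriever.documents", "joiner.documents")
pipe.connect("joiner.documents", "reranker.documents")

question = "What was the official language of the Republic of Rose Island?"
results = pipe.run({
    "bm25": {"query": question},
    "query_embedder": {"text": question},
    "reranker": {"query": question},
})
```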
We mentioned that one of the most common evaluation strategies for RAG output is “LLM-as-a-judge”: the idea of using another LLM to evaluate the answer of the first. However, why use this technique only for evaluation?
+
Self-correcting RAG apps add one extra step at the end of the pipeline: they take the answer, pass it to a second LLM, and ask it to assess whether the answer is likely to be correct. If the check fails, the second LLM will provide some feedback on why it believes the answer is wrong, and this feedback will be given back to the first LLM to try answering another time until an agreement is reached.
+
Self-correcting LLMs can help improve the accuracy of the answers at the expense of more LLM calls per user question.
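In a very simplified sketch, the loop looks like this; generate_answer and critique_answer are placeholders for the two LLM calls, not a specific library API:

```python
def generate_answer(prompt: str, feedback: str = "") -> str:
    # Placeholder for the first LLM call: answer the RAG prompt,
    # optionally taking the judge's previous feedback into account.
    return "The official language of the Republic of Rose Island was Esperanto."

def critique_answer(prompt: str, answer: str) -> tuple[bool, str]:
    # Placeholder for the second LLM call: check the answer against the
    # context contained in the prompt and return (is_correct, feedback).
    return True, ""

def self_correcting_rag(prompt: str, max_rounds: int = 3) -> str:
    answer = generate_answer(prompt)
    for _ in range(max_rounds):
        ok, feedback = critique_answer(prompt, answer)
        if ok:
            return answer  # the judge is satisfied, stop looping
        answer = generate_answer(prompt, feedback)  # retry with the judge's feedback
    return answer  # give up after max_rounds and return the last attempt
```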
In the LLMs field, the term “agent” or “agentic” is often used to identify systems that use LLMs to make decisions. In the case of a RAG application, this term refers to a system that does not always perform retrieval but decides whether to perform it by reading the question first.
+
For example, imagine we’re building a RAG app to help primary school children with their homework. When the question refers to topics like history or geography, RAG is very helpful to avoid hallucinations. However, if the question regards math, the retrieval step is entirely unnecessary, and it might even confuse the LLM by retrieving similar math problems with different answers.
+
Making your RAG app agentic is as simple as giving the question to an LLM before retrieval in a prompt such as:
+
Reply YES if the answer to this question should include facts and
+figures, NO otherwise.
+
+Question: What's the capital of France?
+
Then, retrieval is run or skipped depending on whether the answer is YES or NO.
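In code, the routing step can be as small as this sketch, where ask_llm and retrieve are placeholders for your LLM client and retriever:

```python
ROUTING_PROMPT = (
    "Reply YES if the answer to this question should include facts and figures, "
    "NO otherwise.\n\nQuestion: {question}"
)

def ask_llm(prompt: str) -> str:
    # Placeholder for any LLM call (an API, a local model, ...)
    return "YES"

def retrieve(question: str) -> list[str]:
    # Placeholder for any retriever
    return ["...relevant snippets..."]

def maybe_retrieve(question: str) -> list[str]:
    decision = ask_llm(ROUTING_PROMPT.format(question=question)).strip().upper()
    # Run retrieval only when the router says the answer needs facts and figures
    return retrieve(question) if decision.startswith("YES") else []
```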
+
This is the most basic version of agentic RAG. Some advanced LLMs can do better: they support so-called “function calling,” which means that they can tell you exactly how to invoke the retriever and even provide specific parameters instead of simply answering YES or NO.
+
For more information about function calling with LLMs, check out OpenAI’s documentation on the topic or the equivalent documentation of your LLM provider.
Multihop RAG is an even more complex version of agentic RAG. Multihop pipelines often use chain-of-thought prompts, a type of prompt that looks like this:
+
You are a helpful and knowledgeable agent.
+
+To answer questions, you'll need to go through multiple steps involving step-by-step
+thinking and using a search engine to do web searches. The browser will respond with
+snippets of text from web pages. When you are ready for a final answer, respond with
+`Final Answer:`.
+
+Use the following format:
+
+- Question: the question to be answered
+- Thought: Reason if you have the final answer. If yes, answer the question. If not,
+ find out the missing information needed to answer it.
+- Search Query: the query for the search engine
+- Observation: the search engine will respond with the results
+- Final Answer: the final answer to the question, make it short (1-5 words)
+
+Thought, Search Query, and Observation steps can be repeated multiple times, but
+sometimes, we can find an answer in the first pass.
+
+---
+
+- Question: "Was the capital of France founded earlier than the discovery of America?"
+- Thought:
+
This prompt is very complex, so let’s break it down:
+
+
The LLM reads the question and decides which information to retrieve.
+
The LLM returns a query for the search engine (or a retriever of our choice).
+
Retrieval is run with the query the LLM provided, and the resulting context is appended to the original prompt.
+
The entire prompt is returned to the LLM, which reads it, follows all the reasoning it did in the previous steps, and decides whether to do another search or reply to the user.
+
+
Multihop RAG is used for autonomous exploration of a topic, but it can be very expensive because many LLM calls are performed, and the prompts tend to become really long really quickly. The process can also take quite some time, so it’s not suitable for low-latency applications. However, the idea is quite powerful, and it can be adapted into other forms.
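Stripped of all the details, the control loop driving a multihop pipeline looks roughly like this sketch; ask_llm and search are placeholders, and the template is the chain-of-thought prompt shown above:

```python
def ask_llm(prompt: str) -> str:
    # Placeholder for the LLM call
    return " I can answer this directly.\n- Final Answer: Yes"

def search(query: str) -> str:
    # Placeholder for the retriever or web search
    return "Paris was founded around the 3rd century BC; America was reached in 1492."

def multihop_answer(question: str, prompt_template: str, max_hops: int = 5) -> str:
    prompt = prompt_template + f"\n- Question: {question}\n- Thought:"
    for _ in range(max_hops):
        completion = ask_llm(prompt)  # the LLM reasons, then either answers or asks for a search
        prompt += completion
        if "Final Answer:" in completion:
            return completion.split("Final Answer:")[-1].strip()
        query = completion.split("Search Query:")[-1].strip()
        observation = search(query)   # run the retriever with the query the LLM wrote
        prompt += f"\n- Observation: {observation}\n- Thought:"
    return "No final answer found within the allowed number of hops."
```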
It’s important to remember that finetuning is not an alternative to RAG. Finetuning can and should be used together with RAG on very complex domains, such as medical or legal.
+
When people think about finetuning, they usually focus on finetuning the LLM. In RAG, though, it is not only the LLM that needs to understand the question: it’s crucial that the retriever understands it well, too! This means the embedding model needs finetuning as much as the LLM. Finetuning your embedding models, and in some cases also your reranker, can improve the effectiveness of your RAG by orders of magnitude. Such a finetune often requires only a fraction of the training data, so it’s well worth the investment.
+
Finetuning the LLM is also necessary if you need to alter its behavior in production, such as making it more colloquial or more concise, or having it stick to a specific voice. Prompt engineering can also achieve these effects, but it’s often more brittle and can be more easily worked around. Finetuning the LLM has a much more powerful and lasting effect.
RAG is a vast topic that could fill books: this was only an overview of some of the most important concepts to remember when working on a RAG application. For more on this topic, check out my other blog posts and stay tuned for future talks!
Having fun with fonts doesn’t always mean obsessing over kerning and ligatures. Sometimes, writing text is not even the point!
+
You don’t believe it? Type something in here.
Teranoptia is a cool font that lets you build small creatures by mapping each letter (and a few other characters) to a piece of a creature like a head, a tail, a leg, a wing and so on. By typing words you can create strings of creatures.
+
Here is the glyphset:
+
+
+
+
A B C D E F G H I J K L M N O P Q R S T U V W X Ẋ Y Z Ź Ž Ż a b ḅ c d e f g h i j k l m n o p q r s t u v w x y z ź ž ż , * ( ) { } [ ] ‐ “ ” ‘ ’ « » ‹ › $ €

+You'll notice that there's a lot you can do with it, from assembling simple creatures:
+
+
vTN
+
+to more complex, multi-line designs:
+
+
{Ž}
+
F] [Z
+
+
+
+
Let’s play with it a bit and see how we can put together a few “correct” looking creatures.
+
+
As you’re about to notice, I’m no JavaScript developer. Don’t expect high-quality JS in this post.
To begin with, let’s start with a simple function: animal mirroring. The glyphset includes a mirrored version of each non-symmetric glyph, but the mapping is rather arbitrary, so we are going to need a map.
+
Here are the pairs:
+
By Ev Hs Kp Nm Ri Ve Za Żź Az Cx Fu Ir Lo Ol Sh Wd Źż vE Dw Gt Jq Mn Pk Qj Tg Uf Xc Ẋḅ Yb Žž bY cX () [] {}
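+
The post's interactive demos are written in JavaScript, but the idea is easy to sketch in Python: keep a map of mirrored glyphs, reverse the string, and swap each character with its counterpart. The pairs below are exactly the ones listed above.
+
```python
# Sketch of the mirroring function (illustrative Python version of the idea;
# the post's own demos are in JavaScript).
MIRROR_PAIRS = [
    "By", "Ev", "Hs", "Kp", "Nm", "Ri", "Ve", "Za", "Żź", "Az", "Cx", "Fu",
    "Ir", "Lo", "Ol", "Sh", "Wd", "Źż", "vE", "Dw", "Gt", "Jq", "Mn", "Pk",
    "Qj", "Tg", "Uf", "Xc", "Ẋḅ", "Yb", "Žž", "bY", "cX", "()", "[]", "{}",
]
MIRROR = {pair[0]: pair[1] for pair in MIRROR_PAIRS}

def mirror(animal: str) -> str:
    """Reverse the string and swap each glyph with its mirrored counterpart."""
    return "".join(MIRROR.get(char, char) for char in reversed(animal))

print(mirror("vTN"))  # prints "mgE"
```
+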
While it’s fun to build complicated animals this way, you’ll notice something: it’s pretty hard to make them come out right by simply typing something. Most of the time you need quite careful planning. In addition there’s almost no meaningful (English) word that corresponds to a well-defined creature. Very often the characters don’t match, creating a sequence of “chopped” creatures.
+
For example, “Hello” becomes:
+
Hello
+
This is a problem if we want to make a parametric or random creature generator, because most of the random strings won’t look good.
There are many ways to define “good” or “well-formed” creatures. One of the first rules we can introduce is that we don’t want chopped body parts to float alone.
+
Translating it into a rule we can implement: a character that is “open” on the right must be followed by a character that is open on the left, and a character that is not open on the right must be followed by another character that is not open on the left.
+
For example, A may be followed by B to make AB, but A cannot be followed by C to make AC.
+
In the same way, Z may be followed by A to make ZA, but Z cannot be followed by ż to make Zż.
+
This way we will get rid of all those “chopped” monsters that make up most of the randomly generated strings.
+
To summarize, the rules we have to implement are:
+
+
Any character that is open on the right must be followed by another character that is open on the left.
+
Any character that is closed on the right must be followed by another character that is closed on the left.
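+
A minimal way to encode these two rules is to record, for every glyph, whether its left and right sides are open, and then check each adjacent pair. The sketch below is in Python, and the openness tables are deliberately left as placeholders, since the real values have to be read off the glyph shapes.
+
```python
# Sketch of the adjacency check. OPEN_LEFT / OPEN_RIGHT must be filled in by
# looking at each glyph in the font; they are left as placeholders here.
OPEN_LEFT = {}   # e.g. {"b": True, ...}
OPEN_RIGHT = {}  # e.g. {"a": True, ...}

def is_well_formed(animal: str) -> bool:
    """True if, for every adjacent pair, the right side of the first glyph and
    the left side of the second glyph are either both open or both closed."""
    return all(
        OPEN_RIGHT.get(left, False) == OPEN_LEFT.get(right, False)
        for left, right in zip(animal, animal[1:])
    )
```
+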
There are still a few things we may want to fix. For example, some animals end up being just a pair of heads (such as sN); others instead have their bodies oriented in the wrong direction (like IgV).
+
Let’s try to get rid of those too.
+
The trick here is to separate the characters into three groups: elements that are “facing left”, elements that are “facing right”, and symmetric ones. At this point, it’s convenient to call them “heads”, “bodies” and “tails” to make the code more understandable, like the following:
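(The original post builds these groups in JavaScript; below is a partial sketch of the same idea in Python. Only a handful of glyphs are assigned to each group, purely as an illustration, and which end counts as “head” versus “tail” is a placeholder choice.)
+
```python
import random

# Partial, illustrative grouping -- the full lists come from the font's shapes.
TAILS  = ["a"]                 # glyphs that can open a creature
BODIES = ["b", "c", "X", "Y"]  # symmetric middle parts
HEADS  = ["d"]                 # glyphs that can close a creature

def random_animal(max_body_parts: int = 4) -> str:
    """Assemble tail + a few body parts + head, so nothing is left chopped."""
    body = "".join(random.choices(BODIES, k=random.randint(0, max_body_parts)))
    return random.choice(TAILS) + body + random.choice(HEADS)

print(random_animal())  # e.g. "acXbd"
```
+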
What we’ve defined up to this point is a set of rules that, given a string, determine what characters are allowed next. This is called a formal grammar in Computer Science.
+
A grammar is defined primarily by:
+
+
an alphabet of symbols (our Teranoptia font).
+
a set of starting characters: all the characters that can be used at the start of the string (such as a or *).
+
a set of terminating characters: all the characters that can be used to terminate the string (such as d or ‐).
+
a set of production rules: the rules needed to generate valid strings in that grammar.
+
+
In our case, we’re looking for a grammar that defines “well formed” animals. For example, our production rules might look like this:
+
+
S (the start of the string) → a (a)
+
a (a) → ad (ad)
+
a (a) → ab (ab)
+
b (b) → bb (bb)
+
b (b) → bd (bd)
+
d (d) → E (the end of the string)
+
, (,) → E (the end of the string)
+
+
and so on. Each combination would have its own rule.
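+
As a small illustration, these production rules can be written down as data and used to derive valid strings. The Python sketch below only encodes the handful of rules listed above.
+
```python
import random

# The production rules above, as a map from a symbol to the symbols that may
# follow it. "E" marks the end of the string.
RULES = {
    "S": ["a"],        # S -> a
    "a": ["d", "b"],   # a -> ad, a -> ab
    "b": ["b", "d"],   # b -> bb, b -> bd
    "d": ["E"],        # d -> end of string
}

def derive() -> str:
    """Generate a valid string by repeatedly applying a random applicable rule."""
    out, current = "", "S"
    while True:
        nxt = random.choice(RULES[current])
        if nxt == "E":
            return out
        out += nxt
        current = nxt

print(derive())  # e.g. "abbd"
```
+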
+
There are three main types of grammars according to Chomsky’s hierarchy:
+
+
Regular grammars: in all rules, the left-hand side is only a single nonterminal symbol, and the right-hand side may be the empty string, a single terminal symbol, or a single terminal symbol followed by a nonterminal symbol, but nothing else.
+
Context-free grammars: in all rules, the left-hand side of each production rule consists of only a single nonterminal symbol, while the right-hand side may contain any number of terminal and non-terminal symbols.
+
Context-sensitive grammars: rules can contain many terminal and non-terminal characters on both sides.
+
+
In our case, all the production rules look very much like the examples we defined above: one character on the left-hand side, at most two on the right-hand side. This means we’re dealing with a regular grammar. And this is good news, because it means that this language can be encoded into a regular expression.
Regular expressions are a very powerful tool, one that needs to be used with care. They’re best used for string validation: given an arbitrary string, they are going to check whether it respects the grammar, i.e. whether it could have been generated by applying the rules above.
+
Having a regex for our Teranoptia animals will allow us to search for valid animals in long lists of strings, for example an English dictionary. Such a search would have been prohibitively expensive without a regular expression: using one, while still quite costly, is orders of magnitude more efficient.
+
In order to build this complex regex, let’s start with a very limited example: a regex that matches left-facing snakes.
+
^(a(b|c|X|Y)*d)+$
+
This regex is fairly straightforward: the string must start with a (a), can contain any number of b (b), c (c), X (X) and Y (Y), and must end with d (d). While we’re at it, let’s add a + to the end, meaning that this pattern can repeat multiple times: the string will simply contain many snakes.
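+
Here is a quick way to check this pattern, sketched in Python:
+
```python
import re

SNAKE = re.compile(r"^(a(b|c|X|Y)*d)+$")

print(bool(SNAKE.match("abXd")))   # True: one well-formed snake
print(bool(SNAKE.match("adad")))   # True: two snakes in a row
print(bool(SNAKE.match("abX")))    # False: the snake is missing its final d
```
+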
That looks super-promising, but as soon as we extend the same pattern to the rest of the glyphset and lump all heads and tails into the same groups, we hit a problem: a “snake” like aZ would also match. To generate well-formed animals we need to keep heads and tails in separate groups of the regex.
Once heads and tails have their own groups, building the rest of the regex is simply a matter of adding the correct characters to each group. We’re gonna trade some extra characters for an easier structure by duplicating the symmetric characters when needed.
If you play with the above regex, you’ll notice a slight discrepancy with what our well-formed animal generator creates. The generator can create “double-headed” monsters where a symmetric body part is inserted, like a«Z. However, the regex does not allow it. Extending it to account for these scenarios would make it even more unreadable, so this is left as an exercise for the reader.
Let’s put the regex to use! There must be some English words that match the regex, right?
+
Google helpfully compiled a text file with the 10,000 most frequent English words. Let’s load it up and match every line with our brand-new regex. Unfortunately Teranoptia is case-sensitive and uses quite a few odd letters and special characters, so it’s unlikely we’re going to find many interesting creatures. Still, it’s worth an attempt.
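+
Here is what that experiment could look like, sketched in Python. The regex below only lists a handful of glyphs per group, and the word list is assumed to have been downloaded beforehand to a local file.
+
```python
import re

# Placeholder regex: the real one lists every tail, body and head glyph in its groups.
ANIMAL = re.compile(r"^((a)(b|c|X|Y)*(d))+$")

# google-10000-english.txt: one word per line, downloaded beforehand.
with open("google-10000-english.txt", encoding="utf-8") as wordlist:
    words = [line.strip() for line in wordlist]

creatures = [word for word in words if ANIMAL.match(word)]
print(f"{len(creatures)} words are well-formed creatures:", creatures[:20])
```
+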
In this post I’ve just put together a few exercises for fun, but these tools can be great for teaching purposes: the output is very easy to validate visually, and the grammar involved, while not trivial, is not as complex as natural language or as dry as numerical sequences. If you need something to keep your students engaged, this might be a simple trick to help them visualize the concepts better.
+
On my side, I think I’m going to use these neat little monsters as weird fleurons :)
+
diff --git a/themes/hugo-coder/static/fonts/teranoptia/COPYRIGHT.md b/posts/2024-05-06-teranoptia/teranoptia/COPYRIGHT.md
similarity index 100%
rename from themes/hugo-coder/static/fonts/teranoptia/COPYRIGHT.md
rename to posts/2024-05-06-teranoptia/teranoptia/COPYRIGHT.md
diff --git a/themes/hugo-coder/static/fonts/teranoptia/LICENSE.txt b/posts/2024-05-06-teranoptia/teranoptia/LICENSE.txt
similarity index 100%
rename from themes/hugo-coder/static/fonts/teranoptia/LICENSE.txt
rename to posts/2024-05-06-teranoptia/teranoptia/LICENSE.txt
diff --git a/themes/hugo-coder/static/fonts/teranoptia/METADATA.yml b/posts/2024-05-06-teranoptia/teranoptia/METADATA.yml
similarity index 100%
rename from themes/hugo-coder/static/fonts/teranoptia/METADATA.yml
rename to posts/2024-05-06-teranoptia/teranoptia/METADATA.yml
diff --git a/themes/hugo-coder/static/fonts/teranoptia/README.md b/posts/2024-05-06-teranoptia/teranoptia/README.md
similarity index 100%
rename from themes/hugo-coder/static/fonts/teranoptia/README.md
rename to posts/2024-05-06-teranoptia/teranoptia/README.md
diff --git a/themes/hugo-coder/static/fonts/teranoptia/TRADEMARKS.md b/posts/2024-05-06-teranoptia/teranoptia/TRADEMARKS.md
similarity index 100%
rename from themes/hugo-coder/static/fonts/teranoptia/TRADEMARKS.md
rename to posts/2024-05-06-teranoptia/teranoptia/TRADEMARKS.md
diff --git a/themes/hugo-coder/static/fonts/teranoptia/documentation/specimen/teranopia-specimen-01.png b/posts/2024-05-06-teranoptia/teranoptia/documentation/specimen/teranopia-specimen-01.png
similarity index 100%
rename from themes/hugo-coder/static/fonts/teranoptia/documentation/specimen/teranopia-specimen-01.png
rename to posts/2024-05-06-teranoptia/teranoptia/documentation/specimen/teranopia-specimen-01.png
diff --git a/themes/hugo-coder/static/fonts/teranoptia/documentation/specimen/teranopia-specimen-02.png b/posts/2024-05-06-teranoptia/teranoptia/documentation/specimen/teranopia-specimen-02.png
similarity index 100%
rename from themes/hugo-coder/static/fonts/teranoptia/documentation/specimen/teranopia-specimen-02.png
rename to posts/2024-05-06-teranoptia/teranoptia/documentation/specimen/teranopia-specimen-02.png
diff --git a/themes/hugo-coder/static/fonts/teranoptia/documentation/specimen/teranopia-specimen-03.png b/posts/2024-05-06-teranoptia/teranoptia/documentation/specimen/teranopia-specimen-03.png
similarity index 100%
rename from themes/hugo-coder/static/fonts/teranoptia/documentation/specimen/teranopia-specimen-03.png
rename to posts/2024-05-06-teranoptia/teranoptia/documentation/specimen/teranopia-specimen-03.png
diff --git a/themes/hugo-coder/static/fonts/teranoptia/documentation/specimen/teranopia-specimen-04.png b/posts/2024-05-06-teranoptia/teranoptia/documentation/specimen/teranopia-specimen-04.png
similarity index 100%
rename from themes/hugo-coder/static/fonts/teranoptia/documentation/specimen/teranopia-specimen-04.png
rename to posts/2024-05-06-teranoptia/teranoptia/documentation/specimen/teranopia-specimen-04.png
diff --git a/themes/hugo-coder/static/fonts/teranoptia/documentation/teranoptia-specimen-print.pdf b/posts/2024-05-06-teranoptia/teranoptia/documentation/teranoptia-specimen-print.pdf
similarity index 100%
rename from themes/hugo-coder/static/fonts/teranoptia/documentation/teranoptia-specimen-print.pdf
rename to posts/2024-05-06-teranoptia/teranoptia/documentation/teranoptia-specimen-print.pdf
diff --git a/themes/hugo-coder/static/fonts/teranoptia/documentation/teranoptia-specimen-web.pdf b/posts/2024-05-06-teranoptia/teranoptia/documentation/teranoptia-specimen-web.pdf
similarity index 100%
rename from themes/hugo-coder/static/fonts/teranoptia/documentation/teranoptia-specimen-web.pdf
rename to posts/2024-05-06-teranoptia/teranoptia/documentation/teranoptia-specimen-web.pdf
diff --git a/themes/hugo-coder/static/fonts/teranoptia/fonts/Teranoptia-Furiae.otf b/posts/2024-05-06-teranoptia/teranoptia/fonts/Teranoptia-Furiae.otf
similarity index 100%
rename from themes/hugo-coder/static/fonts/teranoptia/fonts/Teranoptia-Furiae.otf
rename to posts/2024-05-06-teranoptia/teranoptia/fonts/Teranoptia-Furiae.otf
diff --git a/themes/hugo-coder/static/fonts/teranoptia/fonts/Teranoptia-Furiae.ttf b/posts/2024-05-06-teranoptia/teranoptia/fonts/Teranoptia-Furiae.ttf
similarity index 100%
rename from themes/hugo-coder/static/fonts/teranoptia/fonts/Teranoptia-Furiae.ttf
rename to posts/2024-05-06-teranoptia/teranoptia/fonts/Teranoptia-Furiae.ttf
diff --git a/themes/hugo-coder/static/fonts/teranoptia/fonts/web/Teranoptia-Furiae.woff b/posts/2024-05-06-teranoptia/teranoptia/fonts/web/Teranoptia-Furiae.woff
similarity index 100%
rename from themes/hugo-coder/static/fonts/teranoptia/fonts/web/Teranoptia-Furiae.woff
rename to posts/2024-05-06-teranoptia/teranoptia/fonts/web/Teranoptia-Furiae.woff
diff --git a/themes/hugo-coder/static/fonts/teranoptia/fonts/web/Teranoptia-Furiae.woff2 b/posts/2024-05-06-teranoptia/teranoptia/fonts/web/Teranoptia-Furiae.woff2
similarity index 100%
rename from themes/hugo-coder/static/fonts/teranoptia/fonts/web/Teranoptia-Furiae.woff2
rename to posts/2024-05-06-teranoptia/teranoptia/fonts/web/Teranoptia-Furiae.woff2
diff --git a/themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/features.fea b/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/features.fea
similarity index 100%
rename from themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/features.fea
rename to posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/features.fea
diff --git a/themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/fontinfo.plist b/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/fontinfo.plist
similarity index 100%
rename from themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/fontinfo.plist
rename to posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/fontinfo.plist
diff --git a/themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/A_.glif b/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/A_.glif
similarity index 100%
rename from themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/A_.glif
rename to posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/A_.glif
diff --git a/themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/B_.glif b/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/B_.glif
similarity index 100%
rename from themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/B_.glif
rename to posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/B_.glif
diff --git a/themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/C_.glif b/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/C_.glif
similarity index 100%
rename from themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/C_.glif
rename to posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/C_.glif
diff --git a/themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/D_.glif b/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/D_.glif
similarity index 100%
rename from themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/D_.glif
rename to posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/D_.glif
diff --git a/themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/E_.glif b/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/E_.glif
similarity index 100%
rename from themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/E_.glif
rename to posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/E_.glif
diff --git a/themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/E_uro.glif b/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/E_uro.glif
similarity index 100%
rename from themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/E_uro.glif
rename to posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/E_uro.glif
diff --git a/themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/F_.glif b/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/F_.glif
similarity index 100%
rename from themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/F_.glif
rename to posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/F_.glif
diff --git a/themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/G_.glif b/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/G_.glif
similarity index 100%
rename from themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/G_.glif
rename to posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/G_.glif
diff --git a/themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/H_.glif b/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/H_.glif
similarity index 100%
rename from themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/H_.glif
rename to posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/H_.glif
diff --git a/themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/I_.glif b/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/I_.glif
similarity index 100%
rename from themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/I_.glif
rename to posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/I_.glif
diff --git a/themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/J_.glif b/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/J_.glif
similarity index 100%
rename from themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/J_.glif
rename to posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/J_.glif
diff --git a/themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/K_.glif b/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/K_.glif
similarity index 100%
rename from themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/K_.glif
rename to posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/K_.glif
diff --git a/themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/L_.glif b/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/L_.glif
similarity index 100%
rename from themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/L_.glif
rename to posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/L_.glif
diff --git a/themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/M_.glif b/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/M_.glif
similarity index 100%
rename from themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/M_.glif
rename to posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/M_.glif
diff --git a/themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/N_.glif b/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/N_.glif
similarity index 100%
rename from themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/N_.glif
rename to posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/N_.glif
diff --git a/themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/O_.glif b/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/O_.glif
similarity index 100%
rename from themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/O_.glif
rename to posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/O_.glif
diff --git a/themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/P_.glif b/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/P_.glif
similarity index 100%
rename from themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/P_.glif
rename to posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/P_.glif
diff --git a/themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/Q_.glif b/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/Q_.glif
similarity index 100%
rename from themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/Q_.glif
rename to posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/Q_.glif
diff --git a/themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/R_.glif b/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/R_.glif
similarity index 100%
rename from themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/R_.glif
rename to posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/R_.glif
diff --git a/themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/S_.glif b/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/S_.glif
similarity index 100%
rename from themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/S_.glif
rename to posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/S_.glif
diff --git a/themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/T_.glif b/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/T_.glif
similarity index 100%
rename from themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/T_.glif
rename to posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/T_.glif
diff --git a/themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/U_.glif b/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/U_.glif
similarity index 100%
rename from themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/U_.glif
rename to posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/U_.glif
diff --git a/themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/V_.glif b/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/V_.glif
similarity index 100%
rename from themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/V_.glif
rename to posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/V_.glif
diff --git a/themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/W_.glif b/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/W_.glif
similarity index 100%
rename from themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/W_.glif
rename to posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/W_.glif
diff --git a/themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/X_.glif b/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/X_.glif
similarity index 100%
rename from themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/X_.glif
rename to posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/X_.glif
diff --git a/themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/Y_.glif b/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/Y_.glif
similarity index 100%
rename from themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/Y_.glif
rename to posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/Y_.glif
diff --git a/themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/Z_.glif b/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/Z_.glif
similarity index 100%
rename from themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/Z_.glif
rename to posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/Z_.glif
diff --git a/themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/Z_acute.glif b/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/Z_acute.glif
similarity index 100%
rename from themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/Z_acute.glif
rename to posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/Z_acute.glif
diff --git a/themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/Z_caron.glif b/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/Z_caron.glif
similarity index 100%
rename from themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/Z_caron.glif
rename to posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/Z_caron.glif
diff --git a/themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/Z_dotaccent.glif b/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/Z_dotaccent.glif
similarity index 100%
rename from themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/Z_dotaccent.glif
rename to posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/Z_dotaccent.glif
diff --git a/themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/_notdef.glif b/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/_notdef.glif
similarity index 100%
rename from themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/_notdef.glif
rename to posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/_notdef.glif
diff --git a/themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/a.glif b/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/a.glif
similarity index 100%
rename from themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/a.glif
rename to posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/a.glif
diff --git a/themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/asterisk.glif b/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/asterisk.glif
similarity index 100%
rename from themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/asterisk.glif
rename to posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/asterisk.glif
diff --git a/themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/b.glif b/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/b.glif
similarity index 100%
rename from themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/b.glif
rename to posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/b.glif
diff --git a/themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/braceleft.glif b/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/braceleft.glif
similarity index 100%
rename from themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/braceleft.glif
rename to posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/braceleft.glif
diff --git a/themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/braceright.glif b/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/braceright.glif
similarity index 100%
rename from themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/braceright.glif
rename to posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/braceright.glif
diff --git a/themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/bracketleft.glif b/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/bracketleft.glif
similarity index 100%
rename from themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/bracketleft.glif
rename to posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/bracketleft.glif
diff --git a/themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/bracketright.glif b/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/bracketright.glif
similarity index 100%
rename from themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/bracketright.glif
rename to posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/bracketright.glif
diff --git a/themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/c.glif b/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/c.glif
similarity index 100%
rename from themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/c.glif
rename to posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/c.glif
diff --git a/themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/comma.glif b/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/comma.glif
similarity index 100%
rename from themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/comma.glif
rename to posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/comma.glif
diff --git a/themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/contents.plist b/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/contents.plist
similarity index 100%
rename from themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/contents.plist
rename to posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/contents.plist
diff --git a/themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/d.glif b/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/d.glif
similarity index 100%
rename from themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/d.glif
rename to posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/d.glif
diff --git a/themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/dollar.glif b/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/dollar.glif
similarity index 100%
rename from themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/dollar.glif
rename to posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/dollar.glif
diff --git a/themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/e.glif b/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/e.glif
similarity index 100%
rename from themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/e.glif
rename to posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/e.glif
diff --git a/themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/f.glif b/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/f.glif
similarity index 100%
rename from themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/f.glif
rename to posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/f.glif
diff --git a/themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/g.glif b/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/g.glif
similarity index 100%
rename from themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/g.glif
rename to posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/g.glif
diff --git a/themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/guillemotleft.glif b/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/guillemotleft.glif
similarity index 100%
rename from themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/guillemotleft.glif
rename to posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/guillemotleft.glif
diff --git a/themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/guillemotright.glif b/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/guillemotright.glif
similarity index 100%
rename from themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/guillemotright.glif
rename to posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/guillemotright.glif
diff --git a/themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/guilsinglleft.glif b/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/guilsinglleft.glif
similarity index 100%
rename from themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/guilsinglleft.glif
rename to posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/guilsinglleft.glif
diff --git a/themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/guilsinglright.glif b/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/guilsinglright.glif
similarity index 100%
rename from themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/guilsinglright.glif
rename to posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/guilsinglright.glif
diff --git a/themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/h.glif b/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/h.glif
similarity index 100%
rename from themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/h.glif
rename to posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/h.glif
diff --git a/themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/hyphen.glif b/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/hyphen.glif
similarity index 100%
rename from themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/hyphen.glif
rename to posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/hyphen.glif
diff --git a/themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/i.glif b/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/i.glif
similarity index 100%
rename from themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/i.glif
rename to posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/i.glif
diff --git a/themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/j.glif b/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/j.glif
similarity index 100%
rename from themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/j.glif
rename to posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/j.glif
diff --git a/themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/k.glif b/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/k.glif
similarity index 100%
rename from themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/k.glif
rename to posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/k.glif
diff --git a/themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/l.glif b/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/l.glif
similarity index 100%
rename from themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/l.glif
rename to posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/l.glif
diff --git a/themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/m.glif b/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/m.glif
similarity index 100%
rename from themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/m.glif
rename to posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/m.glif
diff --git a/themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/n.glif b/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/n.glif
similarity index 100%
rename from themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/n.glif
rename to posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/n.glif
diff --git a/themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/o.glif b/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/o.glif
similarity index 100%
rename from themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/o.glif
rename to posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/o.glif
diff --git a/themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/p.glif b/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/p.glif
similarity index 100%
rename from themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/p.glif
rename to posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/p.glif
diff --git a/themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/parenleft.glif b/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/parenleft.glif
similarity index 100%
rename from themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/parenleft.glif
rename to posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/parenleft.glif
diff --git a/themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/parenright.glif b/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/parenright.glif
similarity index 100%
rename from themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/parenright.glif
rename to posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/parenright.glif
diff --git a/themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/q.glif b/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/q.glif
similarity index 100%
rename from themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/q.glif
rename to posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/q.glif
diff --git a/themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/quotedblleft.glif b/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/quotedblleft.glif
similarity index 100%
rename from themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/quotedblleft.glif
rename to posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/quotedblleft.glif
diff --git a/themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/quotedblright.glif b/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/quotedblright.glif
similarity index 100%
rename from themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/quotedblright.glif
rename to posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/quotedblright.glif
diff --git a/themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/quoteleft.glif b/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/quoteleft.glif
similarity index 100%
rename from themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/quoteleft.glif
rename to posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/quoteleft.glif
diff --git a/themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/quoteright.glif b/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/quoteright.glif
similarity index 100%
rename from themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/quoteright.glif
rename to posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/quoteright.glif
diff --git a/themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/r.glif b/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/r.glif
similarity index 100%
rename from themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/r.glif
rename to posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/r.glif
diff --git a/themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/s.glif b/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/s.glif
similarity index 100%
rename from themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/s.glif
rename to posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/s.glif
diff --git a/themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/space.glif b/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/space.glif
similarity index 100%
rename from themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/space.glif
rename to posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/space.glif
diff --git a/themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/t.glif b/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/t.glif
similarity index 100%
rename from themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/t.glif
rename to posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/t.glif
diff --git a/themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/tainome.glif b/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/tainome.glif
similarity index 100%
rename from themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/tainome.glif
rename to posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/tainome.glif
diff --git a/themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/u.glif b/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/u.glif
similarity index 100%
rename from themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/u.glif
rename to posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/u.glif
diff --git a/themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/uni1E_05.glif b/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/uni1E_05.glif
similarity index 100%
rename from themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/uni1E_05.glif
rename to posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/uni1E_05.glif
diff --git a/themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/uni1E_8A_.glif b/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/uni1E_8A_.glif
similarity index 100%
rename from themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/uni1E_8A_.glif
rename to posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/uni1E_8A_.glif
diff --git a/themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/v.glif b/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/v.glif
similarity index 100%
rename from themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/v.glif
rename to posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/v.glif
diff --git a/themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/w.glif b/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/w.glif
similarity index 100%
rename from themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/w.glif
rename to posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/w.glif
diff --git a/themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/x.glif b/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/x.glif
similarity index 100%
rename from themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/x.glif
rename to posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/x.glif
diff --git a/themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/y.glif b/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/y.glif
similarity index 100%
rename from themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/y.glif
rename to posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/y.glif
diff --git a/themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/z.glif b/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/z.glif
similarity index 100%
rename from themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/z.glif
rename to posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/z.glif
diff --git a/themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/zacute.glif b/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/zacute.glif
similarity index 100%
rename from themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/zacute.glif
rename to posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/zacute.glif
diff --git a/themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/zcaron.glif b/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/zcaron.glif
similarity index 100%
rename from themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/zcaron.glif
rename to posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/zcaron.glif
diff --git a/themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/zdotaccent.glif b/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/zdotaccent.glif
similarity index 100%
rename from themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/zdotaccent.glif
rename to posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/glyphs/zdotaccent.glif
diff --git a/themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/layercontents.plist b/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/layercontents.plist
similarity index 100%
rename from themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/layercontents.plist
rename to posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/layercontents.plist
diff --git a/themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/lib.plist b/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/lib.plist
similarity index 100%
rename from themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/lib.plist
rename to posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/lib.plist
diff --git a/themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/metainfo.plist b/posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/metainfo.plist
similarity index 100%
rename from themes/hugo-coder/static/fonts/teranoptia/sources/Teranoptia-Furiae.ufo/metainfo.plist
rename to posts/2024-05-06-teranoptia/teranoptia/sources/Teranoptia-Furiae.ufo/metainfo.plist
diff --git a/themes/hugo-coder/static/fonts/teranoptia/sources/teranoptia.glyphs b/posts/2024-05-06-teranoptia/teranoptia/sources/teranoptia.glyphs
similarity index 100%
rename from themes/hugo-coder/static/fonts/teranoptia/sources/teranoptia.glyphs
rename to posts/2024-05-06-teranoptia/teranoptia/sources/teranoptia.glyphs
diff --git a/static/posts/2024-06-10-the-agent-compass/agentic-rag-compass.png b/posts/2024-06-10-the-agent-compass/agentic-rag-compass.png
similarity index 100%
rename from static/posts/2024-06-10-the-agent-compass/agentic-rag-compass.png
rename to posts/2024-06-10-the-agent-compass/agentic-rag-compass.png
diff --git a/static/posts/2024-06-10-the-agent-compass/agentic-rag.png b/posts/2024-06-10-the-agent-compass/agentic-rag.png
similarity index 100%
rename from static/posts/2024-06-10-the-agent-compass/agentic-rag.png
rename to posts/2024-06-10-the-agent-compass/agentic-rag.png
diff --git a/static/posts/2024-06-10-the-agent-compass/ai-crews-compass.png b/posts/2024-06-10-the-agent-compass/ai-crews-compass.png
similarity index 100%
rename from static/posts/2024-06-10-the-agent-compass/ai-crews-compass.png
rename to posts/2024-06-10-the-agent-compass/ai-crews-compass.png
diff --git a/static/posts/2024-06-10-the-agent-compass/basic-conversational-agent.png b/posts/2024-06-10-the-agent-compass/basic-conversational-agent.png
similarity index 100%
rename from static/posts/2024-06-10-the-agent-compass/basic-conversational-agent.png
rename to posts/2024-06-10-the-agent-compass/basic-conversational-agent.png
diff --git a/static/posts/2024-06-10-the-agent-compass/basic-rag-compass.png b/posts/2024-06-10-the-agent-compass/basic-rag-compass.png
similarity index 100%
rename from static/posts/2024-06-10-the-agent-compass/basic-rag-compass.png
rename to posts/2024-06-10-the-agent-compass/basic-rag-compass.png
diff --git a/static/posts/2024-06-10-the-agent-compass/basic-rag.png b/posts/2024-06-10-the-agent-compass/basic-rag.png
similarity index 100%
rename from static/posts/2024-06-10-the-agent-compass/basic-rag.png
rename to posts/2024-06-10-the-agent-compass/basic-rag.png
diff --git a/static/posts/2024-06-10-the-agent-compass/chain-of-thought-compass.png b/posts/2024-06-10-the-agent-compass/chain-of-thought-compass.png
similarity index 100%
rename from static/posts/2024-06-10-the-agent-compass/chain-of-thought-compass.png
rename to posts/2024-06-10-the-agent-compass/chain-of-thought-compass.png
diff --git a/static/posts/2024-06-10-the-agent-compass/chain-of-thought.png b/posts/2024-06-10-the-agent-compass/chain-of-thought.png
similarity index 100%
rename from static/posts/2024-06-10-the-agent-compass/chain-of-thought.png
rename to posts/2024-06-10-the-agent-compass/chain-of-thought.png
diff --git a/static/posts/2024-06-10-the-agent-compass/compass-full.png b/posts/2024-06-10-the-agent-compass/compass-full.png
similarity index 100%
rename from static/posts/2024-06-10-the-agent-compass/compass-full.png
rename to posts/2024-06-10-the-agent-compass/compass-full.png
diff --git a/static/posts/2024-06-10-the-agent-compass/compass.png b/posts/2024-06-10-the-agent-compass/compass.png
similarity index 100%
rename from static/posts/2024-06-10-the-agent-compass/compass.png
rename to posts/2024-06-10-the-agent-compass/compass.png
diff --git a/static/posts/2024-06-10-the-agent-compass/conversational-agent-compass.png b/posts/2024-06-10-the-agent-compass/conversational-agent-compass.png
similarity index 100%
rename from static/posts/2024-06-10-the-agent-compass/conversational-agent-compass.png
rename to posts/2024-06-10-the-agent-compass/conversational-agent-compass.png
diff --git a/static/posts/2024-06-10-the-agent-compass/cover.png b/posts/2024-06-10-the-agent-compass/cover.png
similarity index 100%
rename from static/posts/2024-06-10-the-agent-compass/cover.png
rename to posts/2024-06-10-the-agent-compass/cover.png
diff --git a/static/posts/2024-06-10-the-agent-compass/diagrams.excalidraw b/posts/2024-06-10-the-agent-compass/diagrams.excalidraw
similarity index 100%
rename from static/posts/2024-06-10-the-agent-compass/diagrams.excalidraw
rename to posts/2024-06-10-the-agent-compass/diagrams.excalidraw
diff --git a/static/posts/2024-06-10-the-agent-compass/direct-llm-call-compass.png b/posts/2024-06-10-the-agent-compass/direct-llm-call-compass.png
similarity index 100%
rename from static/posts/2024-06-10-the-agent-compass/direct-llm-call-compass.png
rename to posts/2024-06-10-the-agent-compass/direct-llm-call-compass.png
diff --git a/static/posts/2024-06-10-the-agent-compass/direct-llm-call.png b/posts/2024-06-10-the-agent-compass/direct-llm-call.png
similarity index 100%
rename from static/posts/2024-06-10-the-agent-compass/direct-llm-call.png
rename to posts/2024-06-10-the-agent-compass/direct-llm-call.png
diff --git a/static/posts/2024-06-10-the-agent-compass/empty-compass.png b/posts/2024-06-10-the-agent-compass/empty-compass.png
similarity index 100%
rename from static/posts/2024-06-10-the-agent-compass/empty-compass.png
rename to posts/2024-06-10-the-agent-compass/empty-compass.png
diff --git a/posts/2024-06-10-the-agent-compass/index.html b/posts/2024-06-10-the-agent-compass/index.html
new file mode 100644
index 00000000..f0ffb991
--- /dev/null
+++ b/posts/2024-06-10-the-agent-compass/index.html
@@ -0,0 +1,443 @@
+
+ The Agent Compass · Sara Zan
+
The concept of Agent is one of the vaguest out there in the post-ChatGPT landscape. The word has been used to identify systems that seem to have nothing in common with one another, from complex autonomous research systems down to a simple sequence of two predefined LLM calls. Even the distinction between Agents and techniques such as RAG and prompt engineering seems blurry at best.
+
Let’s try to shed some light on the topic by understanding just how much the term “AI Agent” covers and set some landmarks to better navigate the space.
The problem starts with the definition of “agent”. For example, Wikipedia reports that a software agent is
+
+
a computer program that acts for a user or another program in a relationship of agency.
+
+
This definition is extremely high-level, to the point that it could be applied to systems ranging from ChatGPT to a thermostat. However, if we restrict our definition to “LLM-powered agents”, then it starts to mean something: an Agent is an LLM-powered application that is given some agency, which means that it can take actions to accomplish the goals set by its user. Here we see the difference between an agent and a simple chatbot: a chatbot can only talk to a user, but it doesn’t have the agency to take any action on their behalf. An Agent, instead, is a system you can effectively delegate tasks to.
+
In short, an LLM-powered application can be called an Agent when
+
+
it can take decisions and choose to perform actions in order to achieve the goals set by the user.
On top of this definition there’s an additional distinction to take into account, normally brought up by the terms autonomous and conversational agents.
+
Autonomous Agents are applications that don’t use conversation as a tool to accomplish their goal. They can use several tools several times, but they won’t produce an answer for the user until their goal is accomplished in full. These agents normally interact with a single user, the one that set their goal, and the whole result of their operations might be a simple notification that the task is done. The fact that they can understand language is rather a feature that lets them receive the user’s task in natural language, understand it, and then navigate the material they need to use (emails, webpages, etc.).
+
An example of an autonomous agent is a virtual personal assistant: an app that can read through your emails and, for example, pay the bills for you when they’re due. This is a system that the user sets up with a few credentials and then works autonomously, without the user’s supervision, on the user’s own behalf, possibly without bothering them at all.
+
On the contrary, Conversational Agents use conversation as a tool, often their primary one. This doesn’t have to be a conversation with the person that set them off: it’s usually a conversation with another party, who may or may not be aware that they’re talking to an autonomous system. Naturally, they behave like agents only from the perspective of the user that assigned them the task, while in many cases they have very limited or no agency from the perspective of the person holding the conversation with them.
+
An example of a conversational agent is a virtual salesman: an app that takes a list of potential clients and calls them one by one, trying to persuade them to buy. From the perspective of the clients receiving the call this bot is not an agent: it can perform no actions on their behalf, in fact it may not be able to perform actions at all other than talking to them. But from the perspective of the salesperson who delegated the calls, the bots are agents, because they’re calling people on their behalf, saving a lot of their time.
+
The distinction between these two categories is very blurry, and some systems may behave like both depending on the circumstances. For example, an autonomous agent might become a conversational one if it’s configured to reschedule appointments for you by calling people, or to reply to your emails to automatically challenge parking fines, and so on. Alternatively, an LLM that asks you whether it’s appropriate to use a tool before using it is behaving a bit like a conversational agent, because it’s using the chat to improve its odds of providing you a better result.
All the distinctions we made above are best understood as a continuous spectrum rather than hard categories. Various AI systems may have more or less agency and may be tuned towards a more “autonomous” or “conversational” behavior.
+
In order to understand this difference in practice, let’s try to categorize some well-known LLM techniques and apps to see how “agentic” they are. Having two axes to measure by, we can build a simple compass like this:
Many apps out there perform nothing more than direct calls to LLMs, such as ChatGPT’s free app and other similarly simple assistants and chatbots. There are no components to these systems other than the model itself, and their mode of operation is very straightforward: a user asks a question to an LLM, and the LLM replies directly.
+
+
These systems are not designed with the intent of accomplishing a goal, nor can they take any actions on the user’s behalf. They focus on talking with a user in a reactive way and can do nothing other than talk back. An LLM on its own has no agency at all.
+
At this level it also makes very little sense to distinguish between autonomous and conversational agent behavior, because the entire app shows no degree of autonomy. So we can place them at the very center-left of the diagram.
Together with direct LLM calls and simple chatbots, basic RAG is also an example of an application that does not need any agency or goals to pursue in order to function. Simple RAG apps work in two stages: first the user question is sent to a retriever system, which fetches some additional data relevant to the question. Then, the question and the additional data are sent to the LLM to formulate an answer.
+
+
This means that simple RAG is not an agent: the LLM has no role in the retrieval step and simply reacts to the RAG prompt, doing little more than what a direct LLM call does. The LLM is given no agency, takes no decisions in order to accomplish its goals, and has no tools it can decide to use, or actions it can decide to take. It’s a fully pipelined, reactive system. However, we may rank basic RAG more on the autonomous side with respect to a direct LLM call, because there is one step that is done autonomously (the retrieval).
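To make these two stages concrete, here is a minimal Python sketch. The retrieve() and call_llm() helpers are hypothetical stand-ins for whatever retriever and LLM client the app actually uses.

def basic_rag(question: str) -> str:
    # Stage 1: fetch additional data relevant to the question.
    documents = retrieve(question)
    # Stage 2: send question + data to the LLM to formulate an answer.
    prompt = f"Answer using only this context:\n{documents}\n\nQuestion: {question}"
    return call_llm(prompt)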
Agentic RAG is a slightly more advanced version of RAG that does not always perform the retrieval step. This helps the app produce better prompts for the LLM: for example, if the user is asking a question about trivia, retrieval is very important, while if they’re quizzing the LLM with some mathematical problem, retrieval might confuse the LLM by giving it examples of solutions to different puzzles, and therefore make hallucinations more likely.
+
This means that an agentic RAG app works as follows: when the user asks a question, before calling the retriever the app checks whether the retrieval step is necessary at all. Most of the time the preliminary check is done by an LLM as well, but in theory the same check could be done by a properly trained classifier model. Once the check is done, if retrieval was necessary it is run, otherwise the app skips directly to the LLM, which then replies to the user.
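A minimal sketch of this routing step, again with hypothetical retrieve() and call_llm() helpers; here the preliminary check is itself an LLM call, but a small trained classifier could take its place.

def agentic_rag(question: str) -> str:
    # Decision step: is retrieval needed at all?
    decision = call_llm(f"Does answering this question require looking up documents? Answer YES or NO.\n\n{question}")
    if decision.strip().upper().startswith("YES"):
        context = retrieve(question)
        prompt = f"Context:\n{context}\n\nQuestion: {question}"
    else:
        # Skip retrieval and go straight to the LLM.
        prompt = question
    return call_llm(prompt)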
+
+
You can see immediately that there’s a fundamental difference between this type of RAG and the basic pipelined form: the app needs to take a decision in order to accomplish the goal of answering the user. The goal is very limited (giving a correct answer to the user), and the decision very simple (use or not use a single tool), but this little bit of agency given to the LLM makes us place an application like this definitely more towards the Agent side of the diagram.
+
+
We keep Agentic RAG towards the Autonomous side because in the vast majority of cases the decision to invoke the retriever is kept hidden from the user.
Some LLM applications, such as ChatGPT with GPT4+ or Bing Chat, can make the LLM use some predefined tools: a web search, an image generator, and maybe a few more. The way they work is quite straightforward: when a user asks a question, the LLM first needs to decide whether it should use a tool to answer the question. If it decides that a tool is needed, it calls it, otherwise it skips directly to generating a reply, which is then sent back to the user.
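The sketch below illustrates the idea with a made-up tool registry and the usual hypothetical call_llm() helper; production apps would normally rely on the provider's native function-calling API, which also produces the tool arguments as structured JSON.

TOOLS = {
    "web_search": lambda query: f"Top results for {query!r}",     # placeholder implementation
    "generate_image": lambda prompt: f"<image for {prompt!r}>",   # placeholder implementation
}

def answer_with_tools(question: str) -> str:
    # Decision step: which tool, if any, and with what input?
    choice = call_llm(
        f"Tools available: {list(TOOLS)}. If one is needed, reply 'tool_name | input', otherwise reply 'none'.\n\n{question}"
    )
    if choice.strip().lower() == "none":
        return call_llm(question)
    tool_name, _, tool_input = choice.partition("|")
    result = TOOLS[tool_name.strip()](tool_input.strip())
    # Let the LLM re-elaborate the tool output into the final reply.
    return call_llm(f"Question: {question}\nTool output: {result}\nWrite the reply for the user.")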
+
+
You can see how this diagram resembles agentic RAG’s: before giving an answer to the user, the app needs to take a decision.
+
With respect to Agentic RAG this decision is a lot more complex: it’s not a simple yes/no decision, but it involves choosing which tool to use and also generating the input parameters that will make the selected tool produce the desired output. In many cases the tool’s output will be given to the LLM to be re-elaborated (such as the output of a web search), while in some others it can go directly to the user (as in the case of image generators). This all implies that more agency is given to the system and, therefore, it can be placed more clearly towards the Agent end of the scale.
+
+
We place LLMs with function calling in the middle between Conversational and Autonomous because the degree to which the user is aware of this decision can vary greatly between apps. For example, Bing Chat and ChatGPT normally notify the user that they’re going to use a tool when they do, and the user can instruct them to use them or not, so they’re slightly more conversational.
Self-correcting RAG is a technique that improves on simple RAG by making the LLM double-check its replies before returning them to the user. It comes from an LLM evaluation technique called “LLM-as-a-judge”, because an LLM is used to judge the output of a different LLM or RAG pipeline.
+
Self-correcting RAG starts as simple RAG: when the user asks a question, the retriever is called and the results are sent to the LLM to extract an answer from. However, before returning the answer to the user, another LLM is asked to judge whether, in its opinion, the answer is correct. If the second LLM agrees, the answer is sent to the user. If not, the second LLM generates a new question for the retriever and runs it again, or in other cases, it simply integrates its opinion into the prompt and runs the first LLM again.
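Stripped down to its essence, and with the same hypothetical retrieve() and call_llm() helpers, the judge loop can be sketched like this:

def self_correcting_rag(question: str, max_attempts: int = 3) -> str:
    query = question
    answer = ""
    for _ in range(max_attempts):
        context = retrieve(query)
        answer = call_llm(f"Context:\n{context}\n\nQuestion: {question}")
        # A second LLM judges the answer before it reaches the user.
        verdict = call_llm(f"Question: {question}\nAnswer: {answer}\nIf correct reply OK, otherwise suggest a better retrieval query.")
        if verdict.strip().upper().startswith("OK"):
            return answer
        query = verdict  # retry with the judge's reformulated query
    return answer  # return the best attempt after exhausting retries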
+
+
Self-correcting RAG can be seen as one more step towards agentic behavior because it unlocks a new possibility for the application: the ability to try again. A self-correcting RAG app has a chance to detect its own mistakes and has the agency to decide that it’s better to try again, maybe with a slightly reworded question or different retrieval parameters, before answering the user. Given that this process is entirely autonomous, we’ll place this technique quite towards the Autonomous end of the scale.
Chain-of-thought is a family of prompting techniques that makes the LLM “reason out loud”. It’s very useful when the model needs to process a very complicated question, such as a mathematical problem or a layered question like “When was the eldest sister of the current King of Sweden born?” Assuming that the LLM knows these facts, in order not to hallucinate it’s best to ask the model to proceed “step-by-step” and find out, in order:
+
+
Who the current King of Sweden is,
+
Whether he has an elder sister,
+
If yes, who she is,
+
The age of the person identified above.
+
+
The LLM might know the final fact in any case, but the probability of it giving the right answer increases noticeably if the LLM is prompted this way.
+
Chain-of-thought prompts can also be seen as the LLM accomplishing the task of finding the correct answer in steps, which implies that there are two lines of thinking going on: on one side the LLM is answering the questions it’s posing to itself, while on the other it’s constantly re-assessing whether it has a final answer for the user.
+
In the example above, the chain of thought might end at step 2 if the LLM realizes that the current King of Sweden has no elder sisters (he doesn’t): the LLM needs to keep an eye on its own thought process and decide whether it needs to continue or not.
+
We can summarize an app using chain-of-thought prompting like this: when a user asks a question, first of all the LLM reacts to the chain-of-thought prompt to lay out the sub-questions it needs to answer. Then it answers its own questions one by one, asking itself each time whether the final answer has already been found. When the LLM believes it has the final answer, it rewrites it for the user and returns it.
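In sketch form, with the hypothetical call_llm() helper, the loop above looks like this; the inner_monologue list is the buffer where the LLM keeps its step-by-step notes:

def chain_of_thought(question: str, max_steps: int = 10) -> str:
    inner_monologue = [f"Let's think step by step about: {question}"]
    for _ in range(max_steps):
        # Answer the next sub-question and add it to the monologue.
        step = call_llm("\n".join(inner_monologue) + "\nWhat is the next sub-question, and what is its answer?")
        inner_monologue.append(step)
        # Re-assess whether the final answer has been found.
        done = call_llm("\n".join(inner_monologue) + "\nDo we have the final answer yet? YES or NO.")
        if done.strip().upper().startswith("YES"):
            break
    # Rewrite the conclusion for the user.
    return call_llm("\n".join(inner_monologue) + "\nWrite the final answer for the user.")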
+
+
This new prompting technique makes a big step towards full agency: the ability for the LLM to assess whether the goal has been achieved before returning any answer to the user. While apps like Bing Chat iterate with the user and need their feedback to reach high-level goals, chain-of-thought gives the LLM the freedom to check its own answers before having the user judge them, which makes the loop much faster and can increase the output quality dramatically.
+
This process is similar to what self-correcting RAG does, but has a wider scope, because the LLM does not only need to decide whether an answer is correct, it can also decide to continue reasoning in order to make it more complete, more detailed, to phrase it better, and so on.
+
Another interesting trait of chain-of-thought apps is that they introduce the concept of inner monologue. The inner monologue is a conversation that the LLM has with itself, a conversation buffer where it keeps adding messages as the reasoning develops. This monologue is not visible to the user, but helps the LLM deconstruct a complex reasoning line into a more manageable format, like a researcher that takes notes instead of keeping all their earlier reasoning inside their head all the time.
+
Due to the wider scope of the decision-making that chain-of-thought apps are able to do, they also sit in the middle of our compass. They can be seen as slightly more autonomous than conversational due to the fact that they hide their inner monologue from the user.
+
+
From here, the next step is straightforward: using tools.
Multi-hop RAG applications are nothing more than simple RAG apps that use chain-of-thought prompting and are free to invoke the retriever as many times as needed, and only when needed.
+
This is how it works. When the user asks a question, a chain-of-thought prompt is generated and sent to the LLM. The LLM assesses whether it knows the answer to the question and, if not, asks itself whether a retrieval is necessary. If it decides that retrieval is necessary it calls the retriever, otherwise it skips it and generates an answer directly. It then checks again whether the question is answered. Once it exits the loop, the LLM produces a complete answer by re-reading its own inner monologue and returns this reply to the user.
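The same loop, with the retriever as the only tool, is essentially the whole of multi-hop RAG; retrieve() and call_llm() are hypothetical as before:

def multi_hop_rag(question: str, max_hops: int = 5) -> str:
    inner_monologue = [f"Question: {question}"]
    for _ in range(max_hops):
        # Decide whether more research is needed and, if so, what to search for.
        status = call_llm("\n".join(inner_monologue) + "\nCan you answer now? Reply DONE, or write the next retrieval query.")
        if status.strip().upper().startswith("DONE"):
            break
        inner_monologue.append(f"Retrieved for '{status.strip()}':\n{retrieve(status)}")
    # Re-read the inner monologue and produce the complete answer.
    return call_llm("\n".join(inner_monologue) + "\nWrite the final answer for the user.")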
+
+
An app like this is getting quite close to a proper autonomous agent, because it can perform its own research autonomously. The LLM calls are made in such a way that the system is able to assess whether it knows enough to answer or whether it should do more research by formulating more questions for the retriever and then reasoning over the new collected data.
+
Multi-hop RAG is a very powerful technique that shows a lot of agency and autonomy, and therefore can be placed in the lower-right quadrant of our compass. However, it is still limited with respect to a “true” autonomous agent, because the only action it can take is to invoke the retriever.
Let’s now move on to apps that can be called proper “agents”. One of the first flavors of agentic LLM apps, and still the most popular nowadays, is “ReAct” Agents, where ReAct stands for “Reason + Act”. ReAct is a prompting technique that belongs to the chain-of-thought extended family: it makes the LLM reason step by step, decide whether to perform any action, and then observe the result of the actions it took before moving further.
+
A ReAct agent works more or less like this: when the user sets a goal, the app builds a ReAct prompt, which first of all asks the LLM whether the answer is already known. If the LLM says no, the prompt makes it select a tool. The tool returns some values which are added to the inner monologue of the application together with the invitation to re-assess whether the goal has been accomplished. The app loops until the answer is found, and then the answer is returned to the user.
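A bare-bones version of this loop might look like the sketch below, reusing the made-up TOOLS registry and hypothetical call_llm() helper from earlier; real ReAct implementations add structured output parsing, error handling and stricter stop conditions:

def react_agent(goal: str, max_iterations: int = 10) -> str:
    inner_monologue = [f"Goal: {goal}"]
    for _ in range(max_iterations):
        step = call_llm(
            "\n".join(inner_monologue)
            + "\nIf you know the final answer, reply 'FINAL: <answer>'."
            + f" Otherwise reply 'ACTION: <tool> | <input>' using one of {list(TOOLS)}."
        )
        if step.strip().startswith("FINAL:"):
            return step.split("FINAL:", 1)[1].strip()
        if "ACTION:" not in step:
            inner_monologue.append(step)  # keep the reasoning and loop again
            continue
        # Run the chosen tool and append the observation to the inner monologue.
        tool_name, _, tool_input = step.split("ACTION:", 1)[1].partition("|")
        tool = TOOLS.get(tool_name.strip())
        observation = tool(tool_input.strip()) if tool else f"Unknown tool: {tool_name.strip()}"
        inner_monologue.append(f"{step}\nObservation: {observation}")
    return "Could not accomplish the goal within the iteration limit."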
+
+
As you can see, the structure is very similar to a multi-hop RAG, with an important difference: ReAct Agents normally have many tools to choose from rather than a single retriever. This gives them the agency to take much more complex decisions, and they can finally be called “agents”.
+
+
ReAct Agents are very autonomous in their tasks and rely on an inner monologue rather than a conversation with a user to achieve their goals. Therefore we place them very much on the Autonomous end of the spectrum.
Conversational Agents are a category of apps that can vary widely. As stated earlier, conversational agents focus on using the conversation itself as a tool to accomplish goals, so in order to understand them, one has to distinguish between the people that set the goal (let’s call them owners) and those who talk with the bot (the users).
+
Once this distinction is made, this is how the most basic conversational agents normally work. First, the owner sets a goal. The application then starts a conversation with a user and, right after the first message, starts asking itself whether the given goal was accomplished. It then keeps talking to the target user until it believes the goal was attained and, once done, it returns to its owner to report the outcome.
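Reduced to its simplest form, and with hypothetical call_llm(), send_message() and wait_for_reply() helpers standing in for the LLM client and the messaging channel, the loop looks like this:

def conversational_agent(goal: str, max_turns: int = 20) -> str:
    transcript = []
    message = call_llm(f"Goal from the owner: {goal}\nWrite the opening message to the target user.")
    for _ in range(max_turns):
        send_message(message)
        transcript.append(("agent", message))
        transcript.append(("user", wait_for_reply()))
        # After every exchange, check whether the owner's goal has been accomplished.
        if call_llm(f"Goal: {goal}\nConversation: {transcript}\nIs the goal accomplished? YES or NO.").startswith("YES"):
            break
        message = call_llm(f"Goal: {goal}\nConversation: {transcript}\nWrite the next message to the user.")
    # Report the outcome back to the owner.
    return call_llm(f"Goal: {goal}\nConversation: {transcript}\nWrite a short outcome report for the owner.")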
+
+
Basic conversational agents are very agentic in the sense that they can take a task off the hands of their owners and keep working on it until the goal is achieved. However, they have varying degrees of agency depending on how many tools they can use and how sophisticated their ability to talk to their target users is.
+
For example, can the communication occur over one single channel, be it email, chat, voice, or something else? Can the agent choose among different channels to reach the user? Can it perform side tasks on behalf of either party to work towards its goal? There is a large variety of these agents available and no clear naming distinction between them, so depending on their abilities, their position on our compass might be very different. This is why we place them in the top center, spreading far out in both directions.
By far the most advanced agent implementations available right now are AI Crews, such as the ones provided by CrewAI. These apps take the concept of autonomous agent to the next level by making several different agents work together.
+
The way these apps work is very flexible. For example, let’s imagine we are making an AI application that can build a fully working mobile game from a simple description. This is an extremely complex task that, in real life, requires several developers. To achieve the same with an AI Crew, the crew needs to contain several agents, each one with their own special skills, tools, and background knowledge. There could be:
+
+
a Designer Agent, that has all the tools to generate artwork and assets;
+
a Writer Agent that writes the story, the copy, the dialogues, and most of the text;
+
a Frontend Developer Agent that designs and implements the user interface;
+
a Game Developer Agent that writes the code for the game itself;
+
a Manager Agent, that coordinates the work of all the other agents, keeps them on track and eventually reports the results of their work to the user.
+
+
These agents interact with each other just like a team of humans would: by exchanging messages in a chat format, asking each other to perform actions for them, until their manager decides that the overall goal they were set to has been accomplished, and reports to the user.
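Very loosely, the coordination mechanism can be illustrated with plain Python (this is not CrewAI's actual API, just a sketch of the shared-chat idea, using the hypothetical call_llm() helper from earlier):

AGENT_PERSONAS = {
    "designer": "You generate artwork and assets.",
    "writer": "You write the story, the copy and the dialogues.",
    "frontend_dev": "You design and implement the user interface.",
    "game_dev": "You write the code of the game itself.",
    "manager": "You coordinate everyone and decide when the goal is accomplished.",
}

def run_crew(goal: str, max_rounds: int = 20) -> str:
    shared_chat = [f"Overall goal: {goal}"]
    for _ in range(max_rounds):
        for name, persona in AGENT_PERSONAS.items():
            # Each agent reads the shared chat and adds its next message, possibly asking others for help.
            reply = call_llm(persona + "\nChat so far:\n" + "\n".join(shared_chat) + "\nYour next message:")
            shared_chat.append(f"{name}: {reply}")
        # The manager decides whether the overall goal has been reached.
        if call_llm("\n".join(shared_chat) + "\nManager, is the goal accomplished? YES or NO.").startswith("YES"):
            break
    return call_llm("\n".join(shared_chat) + "\nManager, report the result to the user.")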
+
AI Crews are very advanced and dynamic systems that are still actively researched and explored. One thing that’s clear though is that they show the highest level of agency of any LLM-based app, so we can place them right at the very bottom-right end of the scale.
What we’ve seen here are just a few examples of LLM-powered applications and how close or far they are from the concept of a “real” AI agent. AI agents are still a very active area of research, and they are becoming more and more viable as LLMs become cheaper and more powerful.
+
As a matter of fact, with today’s LLMs true AI agents are possible, but in many cases they’re too brittle and expensive for real production use cases. Agentic systems today suffer from two main issues: they perform huge and frequent LLM calls, and they can only tolerate a very low error rate in their decision making.
+
Inner monologues can grow to an unbounded size during the agent’s operation, making the context window size a potential limitation. A single bad decision can send a chain-of-thought reasoning train in a completely wrong direction, and many LLM calls will be performed before the system realizes its mistake, if it does at all. However, as LLMs become faster, cheaper and smarter, the day when AI Agents will become reliable and cheap enough is nearer than many think.
This is part one of the write-up of my talk at ODSC Europe 2024.
+
+
In the last few years, the world of voice agents saw dramatic leaps forward in the state of the art of all its most basic components. Thanks mostly to OpenAI, bots are now able to understand human speech almost like a human would, they’re able to speak back with completely natural-sounding voices, and they are able to hold a free conversation that feels extremely natural.
+
But building voice bots is far from a solved problem. These improved capabilities are raising the bar, and even users accustomed to the simpler capabilities of old bots now expect a whole new level of quality when it comes to interacting with them.
+
In this post we’re going to focus mostly on the challenges: we’ll discuss the basic structure of most voice bots today, their shortcomings and the main issues that you may face on your journey to improve the quality of the conversation.
+
In Part 2 we are going to focus on the solutions that are available today, and we are going to build our own voice bot using Pipecat, a recently released open-source library that makes building these bots a lot simpler.
As the name says, voice agents are programs that are able to carry out a task and/or take actions and decisions on behalf of a user (“software agents”) by using voice as their primary means of communication (as opposed to the much more common text chat format). Voice agents are inherently harder to build than their text-based counterparts: computers operate primarily with text, and the art of making machines understand human voices has been an elusive problem for decades.
+
Today, the basic architecture of a modern voice agent can be decomposed into three fundamental building blocks (sketched in code right after this list):
+
+
a speech-to-text (STT) component, tasked to translate an audio stream into readable text,
+
the agent’s logic engine, which works entirely with text only,
+
a text-to-speech (TTS) component, which converts the bot’s text responses back into an audio stream of synthetic speech.
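Here is a toy version of that three-block loop, with hypothetical speech_to_text(), call_llm() and text_to_speech() helpers standing in for whatever STT, LLM and TTS providers the bot actually uses:

def handle_turn(audio_in: bytes, history: list) -> bytes:
    # 1. Speech-to-text: turn the user's audio into a transcript.
    user_text = speech_to_text(audio_in)
    history.append({"role": "user", "content": user_text})
    # 2. Logic engine: the agent reasons entirely in text.
    bot_text = call_llm(history)
    history.append({"role": "assistant", "content": bot_text})
    # 3. Text-to-speech: turn the reply back into synthetic speech.
    return text_to_speech(bot_text)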
Speech-to-text software converts the audio stream of a person saying something into a transcription of what the person said. Speech-to-text engines have a long history, but their limitations have always been quite severe: they used to require fine-tuning on each individual speaker, had a rather high word error rate (WER), and worked strictly with native speakers of major languages, failing hard on foreign and uncommon accents and on native speakers of less mainstream languages. These issues limited the adoption of this technology to little more than niche software and research applications.
+
With the first release of OpenAI’s Whisper models in late 2022, the state of the art improved dramatically. Whisper enabled transcription (and even direct translation) of speech from many languages with an impressively low WER, finally comparable to the performance of a human, all with relatively low resources, higher-than-realtime speed, and no finetuning required. Not only that, but the model was free to use, as OpenAI open-sourced it together with a Python SDK, and the details of its architecture were published, allowing the scientific community to improve on it.
+
+
The WER (word error rate) of Whisper was extremely impressive at the time of its publication (see the full diagram here).
+
Since then, speech-to-text models have kept improving at a steady pace. Nowadays the Whisper family of models sees some competition for the title of best STT model from companies such as Deepgram, but it’s still one of the best options among open-source models.
Text-to-speech models perform the exact opposite task to speech-to-text models: their goal is to convert written text into an audio stream of synthetic speech. Text-to-speech has historically been an easier feat than speech-to-text, but it also recently saw drastic improvements in the quality of the synthetic voices, to the point that it could nearly be considered a solved problem in its most basic form.
+
Today many companies (such as OpenAI, Cartesia, ElevenLabs, Azure and many others) offer TTS software with voices that sound nearly indistinguishable from a human. They also have the capability to clone a specific human voice with remarkably little training data (just a few seconds of speech) and to tune accents, inflections, tone and even emotion.
+
+
+
+
+
+
+
+
Cartesia’s Sonic TTS example of a gaming NPC. Note how the model subtly reproduces the breathing in between sentences.
+
TTS is still improving in quality by the day, but due to the incredibly high quality of the output competition now tends to focus on price and performance.
Advancements in the agent’s ability to talk to users go hand in hand with the progress of natural language understanding (NLU), another field with a long and complicated history. Until recently, the bot’s ability to understand the user’s request has been severely limited and often available only for major languages.
+
Based on the way their logic is implemented, today you may come across bots that fall into three different categories.
Tree-based (or rule-based) logic is one of the earliest methods of implementing a chatbot’s logic, still very popular today for its simplicity. Tree-based bots don’t really try to understand what the user is saying, but listen to the user looking for a keyword or key sentence that will trigger the next step. For example, a customer support chatbot may look for the keyword “refund” to give the user any information about how to perform a refund, or the name of a discount campaign to explain to the user how to take advantage of it.
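In its crudest form this is just keyword matching, as in the illustrative sketch below (the keywords and canned replies are made up):

ROUTES = {
    "refund": "To request a refund, please tell me your order number.",
    "black friday": "The Black Friday campaign gives you 20% off: here is how to redeem it...",
}

def tree_based_reply(user_message: str) -> str:
    for keyword, reply in ROUTES.items():
        if keyword in user_message.lower():
            return reply
    # Fallback route: no keyword matched.
    return "Sorry, I didn't understand. You can ask me about refunds or the Black Friday campaign."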
+
Tree-based logic, while somewhat functional, doesn’t really resemble a conversation and can become very frustrating to the user when the conversation tree is not designed with care, because it’s difficult for the end user to understand which option or keyword they should use to achieve the desired outcome. It is also unsuitable for handling real questions and requests the way a human would.
+
One of its most effective use cases is as a first-line screening tool to triage incoming messages.
+
+
Example of a very simple decision tree for a chatbot. While rather minimal, this bot already has several flaws: there’s no way to correct the information you entered at a previous step, and it has no ability to recognize synonyms (“I want to buy an item” would trigger the fallback route.)
In intent-based bots, intents are defined roughly as “actions the users may want to perform”. With respect to a strict, keyword-based tree structure, intent-based bots may switch from one intent to another much more easily (because they lack a strict tree-based routing) and may use advanced AI techniques to understand what the user is actually trying to accomplish and perform the required action.
+
Advanced voice assistants such as Siri and Alexa use variations of this intent-based system. However, as their owners know all too well, interacting with an intent-based bot doesn’t always feel natural, especially when the available intents don’t match the user’s expectations and the bot ends up triggering an unexpected action. In the long run, users end up carefully second-guessing which words and sentence structures activate the response they need, which eventually leads to a sort of “magical incantation” style of prompting the agent, where the user has to learn the “magic sentence” that the bot will recognize to perform a specific intent without misunderstandings.
+
+
Modern voice assistants like Alexa and Siri are often built on the concept of intent (image from Amazon).
The introduction of instruction-tuned GPT models like ChatGPT revolutionized the field of natural language understanding and, with it, the way bots can be built today. LLMs are naturally good at conversation and can formulate natural replies to any sort of question, making the conversation feel much more natural than with any technique that was ever available earlier.
+
However, LLMs tend to be harder to control. Their very ability to generate natural-sounding responses to anything makes them behave in ways that are often unexpected to the developer of the chatbot: for example, users can get the LLM-based bot to promise them anything they ask for, or convince it to say something incorrect, or even to occasionally lie.
+
The burden of controlling the conversation, which traditionally fell on the user, is now entirely on the shoulders of the developers and can easily backfire.
+
+
In a rather famous instance, a user managed to convince a Chevrolet dealership chatbot to promise to sell him a Chevy Tahoe for a single dollar.
Thanks to all these recent improvements, it would seem that making natural-sounding, smart bots is getting easier and easier. It is indeed much simpler to make a simple bot sound better, understand more and respond appropriately, but there’s still a long way to go before users can interact with these new bots as they would with a human.
+
The issue lies in the fact that users’ expectations grow with the quality of the bot. It’s not enough for the bot to have a voice that sounds human: users want to be able to interact with it in a way that feels human too, which is far richer and more interactive than what the rigid tech of earlier chatbots has allowed so far.
+
What does this mean in practice? What are the expectations that users might have from our bots?
Traditional bots can only handle turn-based conversations: the user talks, then the bot talks as well, then the user talks some more, and so on. A conversation with another human, however, has no such limitation: people may talk over each other, give audible feedback without interrupting, and more.
+
Here are some examples of this richer interaction style:
+
+
+
Interruptions. Interruptions occur when a person is talking and another one starts talking at the same time. It is expected that the first person stops talking, at least for a few seconds, to understand what the interruption was about, while the second person continues to talk.
+
+
+
Back-channeling. Back-channeling is the practice of saying “ok”, “sure”, “right” while the other person is explaining something, to give them feedback and let them know we’re paying attention to what is being said. The person that is talking is not supposed to stop: the aim of this sort of feedback is to let them know they are being heard.
+
+
+
Pinging. This is the natural reaction to a long silence, especially over a voice-only medium such as a phone call. When one of the two parties is supposed to speak but instead stays silent, the last one that talked might “ping” the silent party by asking “Are you there?”, “Did you hear me?”, or even just “Hello?” to test whether they’re being heard. This behavior is especially difficult to handle for voice agents that have a significant delay, because it may trigger an ugly vicious cycle of repetitions and delayed replies.
+
+
+
Buying time. When one of the parties knows that they will stay silent for a while, a natural reaction is to notify the other party in advance by saying something like “Hold on…”, “Wait a second…”, “Let me check…” and so on. This message has the benefit of preventing the “pinging” behavior we’ve seen above and can be very useful for voice bots that may need to carry out background work during the conversation, such as looking up information.
+
+
+
Audible clues. Not everything can be transcribed by a speech-to-text model, but audio carries a lot of nuance that humans often use to communicate. A simple example is pitch: humans can often tell if they’re talking to a child, a woman or a man by the pitch of their voice, but STT engines don’t transcribe that information. So if a child picks up the phone when your bot asks for their mother or father, the model won’t pick up the obvious audible clue and will assume it is talking to the right person. Similar considerations should be made for tone (to detect mood, sarcasm, etc.) or other sounds like laughter, sobs, and more.
Tree-based bots, and to some degree intent-based too, work on the implicit assumption that conversation flows are largely predictable. Once the user said something and the bot replied accordingly, they can only follow up with a fixed set of replies and nothing else.
+
This is often a flawed assumption and the primary reason why talking to chatbots tends to be so frustrating.
+
In reality, natural conversations are largely unpredictable. For example, they may feature:
+
+
+
Sudden changes of topic. Maybe user and bot were talking about making a refund, but then the user changes their mind and decides to ask for assistance finding a repair center for the product. Well designed intent-based bots can deal with that, but most bots are in practice unable to do so in a way that feels natural to the user.
+
+
+
Unexpected, erratic phrasing. This is common when users are nervous or in a bad mood for any reason. Erratic, convoluted phrasing, long sentences and rambling are all very natural ways for people to express themselves, but such outbursts very often confuse bots completely.
+
+
+
Non-native speakers. Due to the nature of language learning, non-native speakers may have trouble pronouncing words correctly, may use highly unusual synonyms, or may structure sentences in complicated ways. This is also difficult for bots to handle, because understanding the sentence is harder and transcription issues are far more likely.
+
+
+
Non sequitur. Non sequitur is an umbrella term for a sequence of sentences that bear no relation to each other in a conversation. A simple example is the user asking the bot “What’s the capital of France?” and the bot replying “It’s raining now”. When done by the bot, this is often due to a severe transcription issue or a very flawed conversation design. When done by the user, it often signals a malicious attempt to break the bot’s logic, so it should be handled with some care.
It may seem that some of these issues, especially the ones related to conversation flow, could be easily solved with an LLM. These models, however, bring their own set of issues too:
+
+
+
Hallucinations. This is a technical term to say that LLMs can occasionally mis-remember information, or straight up lie. The problem is that they’re also very confident about their statements, sometimes to the point of trying to gaslight their users. Hallucinations are a major problem for all LLMs: although it may seem to get more manageable with larger and smarter models, the problem only gets more subtle and harder to spot.
+
+
+
Misunderstandings. While LLMs are great at understanding what the user is trying to say, they’re not immune to misunderstandings. Unlike a human though, LLMs rarely suspect a misunderstanding and tend to make assumptions rather than ask for clarifications, resulting in surprising replies and behavior reminiscent of intent-based bots.
+
+
+
Lack of assertiveness. LLMs are trained to listen to the user and do their best to be helpful. This means that LLMs are also not very good at taking the lead in the conversation when we need them to, and are easily misled and distracted by a motivated user. Preventing your model from giving your users a literary analysis of their unpublished poetry may sound silly, but it’s a lot harder than many suspect.
+
+
+
Prompt hacking. Often done with malicious intent by experienced users, prompt hacking is the practice of convincing an LLM to reveal its initial instructions, ignore them, and perform actions it was explicitly forbidden from taking. This is especially dangerous and, while a lot of work has gone into this field, it is far from a solved problem.
LLMs need to keep track of the whole conversation, or at least most of it, to be effective. However, there is a limit to the amount of text they can keep in mind at any given time: this limit is called the context window and for many models it is still relatively low, at about 2000 tokens (roughly 1500-1800 words).
+
The problem is that this window also needs to include all the instructions your bot needs for the conversation. This initial set of instructions is called the system prompt, and it is kept slightly distinct from the other messages in the conversation to make the LLM understand that it’s not part of the dialogue, but a set of instructions about how to handle the conversation.
+
For example, a system prompt for a customer support bot may look like this:
+
You're a friendly customer support bot named VirtualAssistant.
+You are always kind to the customer and you must do your best
+to make them feel at ease and helped.
+
+You may receive a set of different requests. If the user asks
+you to do anything that is not in the list below, kindly refuse
+to do so.
+
+# Handle refunds
+
+If the user asks you to handle a refund, perform these actions:
+- Ask for their shipping code
+- Ask for their last name
+- Use the tool `get_shipping_info` to verify the shipping exists
+...
+
and so on.
+
Although very effective, system prompts have a tendency to become huge in terms of tokens. Adding information to it makes the LLM behave much more like you expect (although it’s not infallible), hallucinate less, and can even shape its personality to some degree. But if the system prompt becomes too long (more than 1000 words), this means that the bot will only be able to exchange about 800 words worth of messages with the user before it starts to forget either its instructions or the first messages of the conversation. For example, the bot will easily forget its own name and role, or it will forget the user’s name and initial demands, which can make the conversation drift completely.
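To keep an eye on this budget you can count tokens explicitly; the sketch below uses the tiktoken library, and the 2000-token window is just the example figure used above:

import tiktoken

encoding = tiktoken.get_encoding("cl100k_base")
CONTEXT_WINDOW = 2000  # example figure from above

def remaining_budget(system_prompt: str, conversation: list) -> int:
    used = len(encoding.encode(system_prompt))
    used += sum(len(encoding.encode(message)) for message in conversation)
    return CONTEXT_WINDOW - used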
If all these issues weren’t enough, there’s also a fundamental issue related to voice interaction: latency. Voice bots interact with their users in real time: this means that the whole pipeline of transcribing the user’s speech, understanding it, formulating a reply and synthesizing it back must be very fast.
+
How fast? On average, people expect a reply from another person to arrive within 300-500ms to sound natural. They can normally wait for about 1-2 seconds. Any longer and they’ll likely ping the bot, breaking the flow.
+
This means that, even if we had solutions to all of the above problems (and we do have some), these solutions need to operate at blazing-fast speed. Considering that an LLM reply alone can take the better part of a second to even start being generated, latency is often one of the major issues that voice bots face when deployed at scale.
+
+
Time to First Token (TTFT) stats for several LLM inference providers running Llama 2 70B chat. From LLMPerf leaderboard. You can see how the time it takes for a reply to even start being produced is highly variable, going up to more than one second in some scenarios.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
diff --git a/posts/index.xml b/posts/index.xml
new file mode 100644
index 00000000..9f7096bc
--- /dev/null
+++ b/posts/index.xml
@@ -0,0 +1,131 @@
+
+
+
+ Posts on Sara Zan
+ https://www.zansara.dev/posts/
+ Recent content in Posts on Sara Zan
+ Hugo
+ en
+ Wed, 18 Sep 2024 00:00:00 +0000
+
+
+ Building Reliable Voice Bots with Open Source Tools - Part 1
+ https://www.zansara.dev/posts/2024-09-05-odsc-europe-voice-agents-part-1/
+ Wed, 18 Sep 2024 00:00:00 +0000
+ https://www.zansara.dev/posts/2024-09-05-odsc-europe-voice-agents-part-1/
+ <p><em>This is part one of the write-up of my talk at <a href="https://www.zansara.dev/talks/2024-09-05-odsc-europe-voice-agents/" >ODSC Europe 2024</a>.</em></p>
<hr>
<p>In the last few years, the world of voice agents saw dramatic leaps forward in the state of the art of all its most basic components. Thanks mostly to OpenAI, bots are now able to understand human speech almost like a human would, they’re able to speak back with completely naturally sounding voices, and are able to hold a free conversation that feels extremely natural.</p>
+
+
+ The Agent Compass
+ https://www.zansara.dev/posts/2024-06-10-the-agent-compass/
+ Mon, 10 Jun 2024 00:00:00 +0000
+ https://www.zansara.dev/posts/2024-06-10-the-agent-compass/
+ <p>The concept of Agent is one of the vaguest out there in the post-ChatGPT landscape. The word has been used to identify systems that seem to have nothing in common with one another, from complex autonomous research systems down to a simple sequence of two predefined LLM calls. Even the distinction between Agents and techniques such as RAG and prompt engineering seems blurry at best.</p>
<p>Let’s try to shed some light on the topic by understanding just how much the term “AI Agent” covers and set some landmarks to better navigate the space.</p>
+
+
+ Generating creatures with Teranoptia
+ https://www.zansara.dev/posts/2024-05-06-teranoptia/
+ Mon, 06 May 2024 00:00:00 +0000
+ https://www.zansara.dev/posts/2024-05-06-teranoptia/
+ <style>
@font-face {
font-family: teranoptia;
src: url("/posts/2024-05-06-teranoptia/teranoptia/fonts/Teranoptia-Furiae.ttf");
}
.teranoptia {
font-size: 5rem;
font-family: teranoptia;
hyphens: none!important;
line-height: 70px;
}
.small {
font-size:3rem;
line-height: 40px;
}
.glyphset {
display: flex;
flex-wrap: wrap;
}
.glyphset div {
margin: 3px;
}
.glyphset div p {
text-align: center;
}
</style>
<p>Having fun with fonts doesn’t always mean obsessing over kerning and ligatures. Sometimes, writing text is not even the point!</p>
<p>You don’t believe it? Type something in here.</p>
+
+
+ RAG, the bad parts (and the good!)
+ https://www.zansara.dev/posts/2024-04-29-odsc-east-rag/
+ Mon, 29 Apr 2024 00:00:00 +0000
+ https://www.zansara.dev/posts/2024-04-29-odsc-east-rag/
+ <p><em>This is a writeup of my talk at <a href="https://www.zansara.dev/talks/2024-04-25-odsc-east-rag/" >ODSC East 2024</a> and <a href="https://www.zansara.dev/talks/2024-07-10-europython-rag/" >EuroPython 2024</a>.</em></p>
<hr>
<p>If you’ve been at any AI or Python conference this year, there’s one acronym that you’ve probably heard in nearly every talk: it’s RAG. RAG is one of the most used techniques to enhance LLMs in production, but why is it so? And what are its weak points?</p>
<p>In this post, we will first describe what RAG is and how it works at a high level. We will then see what type of failures we may encounter, how they happen, and a few reasons that may trigger these issues. Next, we will look at a few tools to help us evaluate a RAG application in production. Last, we’re going to list a few techniques to enhance your RAG app and make it more capable in a variety of scenarios.</p>
+
+
+ Explain me LLMs like I'm five: build a story to help anyone get the idea
+ https://www.zansara.dev/posts/2024-04-14-eli5-llms/
+ Sun, 14 Apr 2024 00:00:00 +0000
+ https://www.zansara.dev/posts/2024-04-14-eli5-llms/
+ <p>These days everyone’s boss seems to want some form of GenAI in their products. That doesn’t always make sense: however, understanding when it does and when it doesn’t is not obvious even for us experts, and nearly impossible for everyone else.</p>
<p>How can we help our colleagues understand the pros and cons of this tech, and figure out when and how it makes sense to use it?</p>
<p>In this post I am going to outline a narrative that explains LLMs without technicalities and help you frame some high level technical decisions, such as RAG vs finetuning, or which specific model size to use, in a way that a non-technical audience can not only grasp but also reason about. We’ll start by “translating” a few terms into their “human equivalent” and then use this metaphor to reason about the differences between RAG and finetuning.</p>
+
+
+ ClozeGPT: Write Anki cloze cards with a custom GPT
+ https://www.zansara.dev/posts/2024-02-28-create-anki-cloze-cards-with-custom-gpt/
+ Wed, 28 Feb 2024 00:00:00 +0000
+ https://www.zansara.dev/posts/2024-02-28-create-anki-cloze-cards-with-custom-gpt/
<p>As everyone who has been serious about studying with <a href="https://apps.ankiweb.net/" class="external-link" target="_blank" rel="noopener">Anki</a> knows, the first step of the journey is writing your own flashcards. Writing the cards yourself is often cited as the most straightforward way to make the review process more effective. However, this can become a big chore, and not having enough cards to study is a sure way to not learn anything.</p>
<p>What can we do to make this process less tedious?</p>
+
+
+ Is RAG all you need? A look at the limits of retrieval augmentation
+ https://www.zansara.dev/posts/2024-02-20-is-rag-all-you-need-odsc-east-2024-teaser/
+ Wed, 21 Feb 2024 00:00:00 +0000
+ https://www.zansara.dev/posts/2024-02-20-is-rag-all-you-need-odsc-east-2024-teaser/
+ <p><em>This blogpost is a teaser for <a href="https://odsc.com/speakers/rag-the-bad-parts-and-the-good-building-a-deeper-understanding-of-this-hot-llm-paradigms-weaknesses-strengths-and-limitations/" class="external-link" target="_blank" rel="noopener">my upcoming talk</a> at ODSC East 2024 in Boston, April 23-25. It is published on the ODSC blog <a href="https://opendatascience.com/is-rag-all-you-need-a-look-at-the-limits-of-retrieval-augmentation/" class="external-link" target="_blank" rel="noopener">at this link</a>.</em></p>
<hr>
<p>Retrieval Augmented Generation (RAG) is by far one of the most popular and effective techniques to bring LLMs to production. Introduced by a Meta <a href="https://arxiv.org/abs/2005.11401" class="external-link" target="_blank" rel="noopener">paper</a> in 2021, it since took off and evolved to become a field in itself, fueled by the immediate benefits that it provides: lowered risk of hallucinations, access to updated information, and so on. On top of this, RAG is relatively cheap to implement for the benefit it provides, especially when compared to costly techniques like LLM finetuning. This makes it a no-brainer for a lot of usecases, to the point that nowadays every production system that uses LLMs in production seems to be implemented as some form of RAG.</p>
+
+
+ Headless WiFi setup on Raspberry Pi OS "Bookworm" without the Raspberry Pi Imager
+ https://www.zansara.dev/posts/2024-01-06-raspberrypi-headless-bookworm-wifi-config/
+ Sat, 06 Jan 2024 00:00:00 +0000
+ https://www.zansara.dev/posts/2024-01-06-raspberrypi-headless-bookworm-wifi-config/
+ <p>Setting up a Raspberry Pi headless without the Raspberry Pi Imager used to be a fairly simple process for the average Linux user, to the point where a how-to and a few searches on the Raspberry Pi forums would sort the process out. After flashing the image with <code>dd</code>, creating <code>ssh</code> in the boot partition and populating <code>wpa_supplicant.conf</code> was normally enough to get started.</p>
<p>However with the <a href="https://www.raspberrypi.com/news/bookworm-the-new-version-of-raspberry-pi-os/" class="external-link" target="_blank" rel="noopener">recently released Raspberry Pi OS 12 “Bookworm”</a> this second step <a href="https://www.raspberrypi.com/documentation/computers/configuration.html#connect-to-a-wireless-network" class="external-link" target="_blank" rel="noopener">doesn’t work anymore</a> and the only recommendation that users receive is to “just use the Raspberry Pi Imager” (like <a href="https://github.com/raspberrypi/bookworm-feedback/issues/72" class="external-link" target="_blank" rel="noopener">here</a>).</p>
+
+
+ The World of Web RAG
+ https://www.zansara.dev/posts/2023-11-09-haystack-series-simple-web-rag/
+ Thu, 09 Nov 2023 00:00:00 +0000
+ https://www.zansara.dev/posts/2023-11-09-haystack-series-simple-web-rag/
+ <p><em>Last updated: 18/01/2023</em></p>
<hr>
<p>In an earlier post of the Haystack 2.0 series, we’ve seen how to build RAG and indexing pipelines. An application that uses these two pipelines is practical if you have an extensive, private collection of documents and need to perform RAG on such data only. However, in many cases, you may want to get data from the Internet: from news outlets, documentation pages, and so on.</p>
<p>In this post, we will see how to build a Web RAG application: a RAG pipeline that can search the Web for the information needed to answer your questions.</p>
+
+
+ Indexing data for RAG applications
+ https://www.zansara.dev/posts/2023-11-05-haystack-series-minimal-indexing/
+ Sun, 05 Nov 2023 00:00:00 +0000
+ https://www.zansara.dev/posts/2023-11-05-haystack-series-minimal-indexing/
+ <p><em>Last updated: 18/01/2023</em></p>
<hr>
<p>In the <a href="https://www.zansara.dev/posts/2023-10-27-haystack-series-rag" >previous post</a> of the Haystack 2.0 series, we saw how to build RAG pipelines using a generator, a prompt builder, and a retriever with its document store. However, the content of our document store wasn’t extensive, and populating one with clean, properly formatted data is not an easy task. How can we approach this problem?</p>
<p>In this post, I will show you how to use Haystack 2.0 to create large amounts of documents from a few web pages and write them a document store that you can then use for retrieval.</p>
+
+
+ RAG Pipelines from scratch
+ https://www.zansara.dev/posts/2023-10-27-haystack-series-rag/
+ Fri, 27 Oct 2023 00:00:00 +0000
+ https://www.zansara.dev/posts/2023-10-27-haystack-series-rag/
+ <p><em>Last updated: 18/01/2023 - Read it on the <a href="https://haystack.deepset.ai/blog/rag-pipelines-from-scratch" class="external-link" target="_blank" rel="noopener">Haystack Blog</a>.</em></p>
<hr>
<p>Retrieval Augmented Generation (RAG) is quickly becoming an essential technique to make LLMs more reliable and effective at answering any question, regardless of how specific. To stay relevant in today’s NLP landscape, Haystack must enable it.</p>
<p>Let’s see how to build such applications with Haystack 2.0, from a direct call to an LLM to a fully-fledged, production-ready RAG pipeline that scales. At the end of this post, we will have an application that can answer questions about world countries based on data stored in a private database. At that point, the knowledge of the LLM will be only limited by the content of our data store, and all of this can be accomplished without fine-tuning language models.</p>
+
+
+ A New Approach to Haystack Pipelines
+ https://www.zansara.dev/posts/2023-10-26-haystack-series-canals/
+ Thu, 26 Oct 2023 00:00:00 +0000
+ https://www.zansara.dev/posts/2023-10-26-haystack-series-canals/
+ <p><em>Updated on 21/12/2023</em></p>
<hr>
<p>As we have seen in <a href="https://www.zansara.dev/posts/2023-10-15-haystack-series-pipeline/" class="external-link" target="_blank" rel="noopener">the previous episode of this series</a>, Haystack’s Pipeline is a powerful concept that comes with its set of benefits and shortcomings. In Haystack 2.0, the pipeline was one of the first items that we focused our attention on, and it was the starting point of the entire rewrite.</p>
<p>What does this mean in practice? Let’s look at what Haystack Pipelines in 2.0 will be like, how they differ from their 1.x counterparts, and the pros and cons of this new paradigm.</p>
+
+
+ Haystack's Pipeline - A Deep Dive
+ https://www.zansara.dev/posts/2023-10-15-haystack-series-pipeline/
+ Sun, 15 Oct 2023 00:00:00 +0000
+ https://www.zansara.dev/posts/2023-10-15-haystack-series-pipeline/
+ <p>If you’ve ever looked at Haystack before, you must have come across the <a href="https://docs.haystack.deepset.ai/docs/pipelines" class="external-link" target="_blank" rel="noopener">Pipeline</a>, one of the most prominent concepts of the framework. However, this abstraction is by no means an obvious choice when it comes to NLP libraries. Why did we adopt this concept, and what does it bring us?</p>
<p>In this post, I go into all the details of how the Pipeline abstraction works in Haystack now, why it works this way, and its strengths and weaknesses. This deep dive into the current state of the framework is also a premise for the next episode, where I will explain how Haystack 2.0 addresses this version’s shortcomings.</p>
+
+
+ Why rewriting Haystack?!
+ https://www.zansara.dev/posts/2023-10-11-haystack-series-why/
+ Wed, 11 Oct 2023 00:00:00 +0000
+ https://www.zansara.dev/posts/2023-10-11-haystack-series-why/
+ <p>Before even diving into what Haystack 2.0 is, how it was built, and how it works, let’s spend a few words about the whats and the whys.</p>
<p>First of all, <em>what is</em> Haystack?</p>
<p>And next, why on Earth did we decide to rewrite it from the ground up?</p>
<h3 id="a-pioneer-framework">
A Pioneer Framework
<a class="heading-link" href="#a-pioneer-framework">
<i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
<span class="sr-only">Link to heading</span>
</a>
</h3>
<p>Haystack is a relatively young framework, its initial release dating back to <a href="https://github.com/deepset-ai/haystack/releases/tag/0.1.0" class="external-link" target="_blank" rel="noopener">November 28th, 2019</a>. Back then, Natural Language Processing was a field that had just started moving its first step outside of research labs, and Haystack was one of the first libraries that promised enterprise-grade, production-ready NLP features. We were proud to enable use cases such as <a href="https://medium.com/deepset-ai/what-semantic-search-can-do-for-you-ea5b1e8dfa7f" class="external-link" target="_blank" rel="noopener">semantic search</a>, <a href="https://medium.com/deepset-ai/semantic-faq-search-with-haystack-6a03b1e13053" class="external-link" target="_blank" rel="noopener">FAQ matching</a>, document similarity, document summarization, machine translation, language-agnostic search, and so on.</p>
+
+
+ Haystack 2.0: What is it?
+ https://www.zansara.dev/posts/2023-10-10-haystack-series-intro/
+ Tue, 10 Oct 2023 00:00:00 +0000
+ https://www.zansara.dev/posts/2023-10-10-haystack-series-intro/
+ <p>December is finally approaching, and with it the release of a <a href="https://github.com/deepset-ai/haystack" class="external-link" target="_blank" rel="noopener">Haystack</a> 2.0. At <a href="https://www.deepset.ai/" class="external-link" target="_blank" rel="noopener">deepset</a>, we’ve been talking about it for months, we’ve been iterating on the core concepts what feels like a million times, and it looks like we’re finally getting ready for the approaching deadline.</p>
<p>But what is it that makes this release so special?</p>
<p>In short, Haystack 2.0 is a complete rewrite. A huge, big-bang style change. Almost no code survived the migration unmodified: we’ve been across the entire 100,000+ lines of the codebase and redone everything in under a year. For our small team, this is a huge accomplishment.</p>
+
+
+ An (unofficial) Python SDK for Verbix
+ https://www.zansara.dev/posts/2023-09-10-python-verbix-sdk/
+ Sun, 10 Sep 2023 00:00:00 +0000
+ https://www.zansara.dev/posts/2023-09-10-python-verbix-sdk/
+ <p>PyPI package: <a href="https://pypi.org/project/verbix-sdk/" class="external-link" target="_blank" rel="noopener">https://pypi.org/project/verbix-sdk/</a></p>
<p>GitHub Repo: <a href="https://github.com/ZanSara/verbix-sdk" class="external-link" target="_blank" rel="noopener">https://github.com/ZanSara/verbix-sdk</a></p>
<p>Minimal Docs: <a href="https://github.com/ZanSara/verbix-sdk/blob/main/README.md" class="external-link" target="_blank" rel="noopener">https://github.com/ZanSara/verbix-sdk/blob/main/README.md</a></p>
<hr>
<p>As part of a larger side project which is still in the works (<a href="https://github.com/ebisu-flashcards" class="external-link" target="_blank" rel="noopener">Ebisu Flashcards</a>), these days I found myself looking for some decent API for verbs conjugations in different languages. My requirements were “simple”:</p>
<ul>
<li>Supports many languages, including Italian, Portuguese and Hungarian</li>
<li>Conjugates irregulars properly</li>
<li>Offers an API access to the conjugation tables</li>
<li>Refuses to conjugate anything except for known verbs</li>
<li>(Optional) Highlights the irregularities in some way</li>
</ul>
<p>Surprisingly, there seems to be a shortage of good alternatives in this field. All websites that host polished conjugation data don’t seem to offer API access (looking at you, <a href="https://conjugator.reverso.net" class="external-link" target="_blank" rel="noopener">Reverso</a> – you’ll get your own post one day), and most of the simplest ones use heuristics to conjugate, which makes them very prone to errors. So for now I ended up choosing <a href="https://verbix.com" class="external-link" target="_blank" rel="noopener">Verbix</a> to start from.</p>
+
+
+ My Dotfiles
+ https://www.zansara.dev/posts/2021-12-11-dotfiles/
+ Sat, 11 Dec 2021 00:00:00 +0000
+ https://www.zansara.dev/posts/2021-12-11-dotfiles/
+ <p>GitHub Repo: <a href="https://github.com/ZanSara/dotfiles" class="external-link" target="_blank" rel="noopener">https://github.com/ZanSara/dotfiles</a></p>
<hr>
<p>What Linux developer would I be if I didn’t also have my very own dotfiles repo?</p>
<p>After many years of iterations I finally found a combination that lasted quite a while, so I figured it’s time to treat them as a real project. It was originally optimized for my laptop, but then I realized it works quite well on my three-monitor desk setup as well without major issues.</p>
+
+
+
diff --git a/posts/is-rag-all-you-need/index.html b/posts/is-rag-all-you-need/index.html
new file mode 100644
index 00000000..15053e55
--- /dev/null
+++ b/posts/is-rag-all-you-need/index.html
@@ -0,0 +1,10 @@
+
+
+
+ https://www.zansara.dev/posts/2024-02-20-is-rag-all-you-need-odsc-east-2024-teaser/
+
+
+
+
+
+
diff --git a/static/posts/outdated/2024-02-07-haystack-series-multiplexer/auto_correction_loop_no_multiplexer.png b/posts/outdated/2024-02-07-haystack-series-multiplexer/auto_correction_loop_no_multiplexer.png
similarity index 100%
rename from static/posts/outdated/2024-02-07-haystack-series-multiplexer/auto_correction_loop_no_multiplexer.png
rename to posts/outdated/2024-02-07-haystack-series-multiplexer/auto_correction_loop_no_multiplexer.png
diff --git a/static/posts/outdated/2024-02-07-haystack-series-multiplexer/auto_correction_loop_with_multiplexer.jpeg b/posts/outdated/2024-02-07-haystack-series-multiplexer/auto_correction_loop_with_multiplexer.jpeg
similarity index 100%
rename from static/posts/outdated/2024-02-07-haystack-series-multiplexer/auto_correction_loop_with_multiplexer.jpeg
rename to posts/outdated/2024-02-07-haystack-series-multiplexer/auto_correction_loop_with_multiplexer.jpeg
diff --git a/static/posts/outdated/2024-02-07-haystack-series-multiplexer/broken_pipeline.png b/posts/outdated/2024-02-07-haystack-series-multiplexer/broken_pipeline.png
similarity index 100%
rename from static/posts/outdated/2024-02-07-haystack-series-multiplexer/broken_pipeline.png
rename to posts/outdated/2024-02-07-haystack-series-multiplexer/broken_pipeline.png
diff --git a/static/posts/outdated/2024-02-07-haystack-series-multiplexer/cover.jpeg b/posts/outdated/2024-02-07-haystack-series-multiplexer/cover.jpeg
similarity index 100%
rename from static/posts/outdated/2024-02-07-haystack-series-multiplexer/cover.jpeg
rename to posts/outdated/2024-02-07-haystack-series-multiplexer/cover.jpeg
diff --git a/static/posts/outdated/2024-02-07-haystack-series-multiplexer/hybrid_search_no_multiplexer.jpeg b/posts/outdated/2024-02-07-haystack-series-multiplexer/hybrid_search_no_multiplexer.jpeg
similarity index 100%
rename from static/posts/outdated/2024-02-07-haystack-series-multiplexer/hybrid_search_no_multiplexer.jpeg
rename to posts/outdated/2024-02-07-haystack-series-multiplexer/hybrid_search_no_multiplexer.jpeg
diff --git a/static/posts/outdated/2024-02-07-haystack-series-multiplexer/hybrid_search_with_multiplexer.jpeg b/posts/outdated/2024-02-07-haystack-series-multiplexer/hybrid_search_with_multiplexer.jpeg
similarity index 100%
rename from static/posts/outdated/2024-02-07-haystack-series-multiplexer/hybrid_search_with_multiplexer.jpeg
rename to posts/outdated/2024-02-07-haystack-series-multiplexer/hybrid_search_with_multiplexer.jpeg
diff --git a/posts/page/1/index.html b/posts/page/1/index.html
new file mode 100644
index 00000000..1238a547
--- /dev/null
+++ b/posts/page/1/index.html
@@ -0,0 +1,10 @@
+
+
+
+ https://www.zansara.dev/posts/
+
+
+
+
+
+
diff --git a/projects/booking-system/index.html b/projects/booking-system/index.html
new file mode 100644
index 00000000..3f65247f
--- /dev/null
+++ b/projects/booking-system/index.html
@@ -0,0 +1,225 @@
+
+
+
+
+
+ CAI Sovico's Website · Sara Zan
Since my bachelor studies I have maintained the IT infrastructure of an alpine hut, Rifugio M. Del Grande - R. Camerini. I count this as my first important project, one that people, mostly older and not very tech savvy, depended on to run a real business.
+
The website went through several iterations as web technologies evolved, as did the type of servers we could afford. Right now it features minimal HTML/CSS static pages, plus a reservation system built on a PHP 8 / MySQL backend with a vanilla JS frontend. It also includes an FTP server that supports a couple of ZanzoCams and a weather monitoring station.
With the rise of more and more powerful LLMs, I am experimenting with different ways to interact with them that don’t necessarily involve a laptop, a keyboard or a screen.
+
I codenamed all of these experiments “Brekeke”, the sound frogs make in Hungarian (don’t ask why). These experiments focus mostly on small home automation tasks and run on a swarm of Raspberry Pis.
+
diff --git a/projects/index.xml b/projects/index.xml
new file mode 100644
index 00000000..e89d4713
--- /dev/null
+++ b/projects/index.xml
@@ -0,0 +1,40 @@
+
+
+
+ Projects on Sara Zan
+ https://www.zansara.dev/projects/
+ Recent content in Projects on Sara Zan
+ Hugo
+ en
+ Fri, 01 Dec 2023 00:00:00 +0000
+
+
+ Brekeke
+ https://www.zansara.dev/projects/brekeke/
+ Fri, 01 Dec 2023 00:00:00 +0000
+ https://www.zansara.dev/projects/brekeke/
+      <p>With the rise of more and more powerful LLMs, I am experimenting with different ways to interact with them that don’t necessarily involve a laptop, a keyboard or a screen.</p>
<p>I codenamed all of these experiments “Brekeke”, the sound frogs make in Hungarian (don’t ask why). These experiments focus mostly on small home automation tasks and run on a swarm of Raspberry Pis.</p>
+
+
+ Ebisu Flashcards - In Progress!
+ https://www.zansara.dev/projects/ebisu-flashcards/
+ Tue, 01 Jun 2021 00:00:00 +0000
+ https://www.zansara.dev/projects/ebisu-flashcards/
+
+
+
+ ZanzoCam
+ https://www.zansara.dev/projects/zanzocam/
+ Wed, 01 Jan 2020 00:00:00 +0000
+ https://www.zansara.dev/projects/zanzocam/
+ <p>Main website: <a href="https://zanzocam.github.io/" class="external-link" target="_blank" rel="noopener">https://zanzocam.github.io/</a></p>
<hr>
<p>ZanzoCam is a low-power, low-frequency camera based on Raspberry Pi, designed to operate autonomously in remote locations and under harsh conditions. It was designed and developed between 2019 and 2021 for <a href="https://www.cai.it/gruppo_regionale/gr-lombardia/" class="external-link" target="_blank" rel="noopener">CAI Lombardia</a> by a team of two people, with me as the software developer and the other responsible for the hardware design. CAI later deployed several of these devices on their affiliate huts.</p>
<p>ZanzoCams are designed to work reliably in the harsh conditions of alpine winters, be as power-efficient as possible, and tolerate unstable network connections: they feature a robust HTTP- or FTP-based picture upload strategy which is remotely configurable from a very simple, single-file web panel. The camera software also improves on the basic capabilities of picamera to take pictures in dark conditions, making ZanzoCams able to shoot good pictures for a few hours after sunset.</p>
+
+
+ CAI Sovico's Website
+ https://www.zansara.dev/projects/booking-system/
+ Fri, 01 Jan 2016 00:00:00 +0000
+ https://www.zansara.dev/projects/booking-system/
+ <p>Main website: <a href="https://www.caisovico.it" class="external-link" target="_blank" rel="noopener">https://www.caisovico.it</a></p>
<hr>
<p>Since my bachelor studies I have maintained the IT infrastructure of an alpine hut, <a href="https://maps.app.goo.gl/PwdVC82VHwdPZJDE6" class="external-link" target="_blank" rel="noopener">Rifugio M. Del Grande - R. Camerini</a>. I count this as my first important project, one that people, mostly older and not very tech savvy, depended on to run a real business.</p>
<p>The website went through several iterations as web technologies evolved, as did the type of servers we could afford. Right now it features minimal HTML/CSS static pages, plus a reservation system built on a PHP 8 / MySQL backend with a vanilla JS frontend. It also includes an FTP server that supports a couple of <a href="https://www.zansara.dev/projects/zanzocam/" >ZanzoCams</a> and a <a href="http://www.meteoproject.it/ftp/stazioni/caisovico/" class="external-link" target="_blank" rel="noopener">weather monitoring station</a>.</p>
+
+
+
diff --git a/projects/page/1/index.html b/projects/page/1/index.html
new file mode 100644
index 00000000..9f1ef2b8
--- /dev/null
+++ b/projects/page/1/index.html
@@ -0,0 +1,10 @@
+
+
+
+ https://www.zansara.dev/projects/
+
+
+
+
+
+
diff --git a/static/projects/zanzocam.png b/projects/zanzocam.png
similarity index 100%
rename from static/projects/zanzocam.png
rename to projects/zanzocam.png
diff --git a/projects/zanzocam/index.html b/projects/zanzocam/index.html
new file mode 100644
index 00000000..60e6ea9e
--- /dev/null
+++ b/projects/zanzocam/index.html
@@ -0,0 +1,234 @@
+
+
+
+
+
+ ZanzoCam · Sara Zan
ZanzoCam is a low-power, low-frequency camera based on Raspberry Pi, designed to operate autonomously in remote locations and under harsh conditions. It was designed and developed between 2019 and 2021 for CAI Lombardia by a team of two people, with me as the software developer and the other responsible for the hardware design. CAI later deployed several of these devices on their affiliate huts.
+
ZanzoCams are designed to work reliably in the harsh conditions of alpine winters, be as power-efficient as possible, and tolerate unstable network connections: they feature a robust HTTP- or FTP-based picture upload strategy which is remotely configurable from a very simple, single-file web panel. The camera software also improves on the basic capabilities of picamera to take pictures in dark conditions, making ZanzoCams able to shoot good pictures for a few hours after sunset.
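The low-light trick is essentially the long-exposure recipe from the picamera documentation: slow the framerate down, lengthen the shutter speed, raise the ISO and freeze the exposure gains before capturing. A minimal sketch of that idea (an illustrative snippet, not ZanzoCam’s actual code) looks like this:

```python
# Minimal long-exposure capture with picamera, following the "capturing in low
# light" recipe from its documentation. Illustrative sketch, not ZanzoCam's code.
from fractions import Fraction
from time import sleep

from picamera import PiCamera

camera = PiCamera(resolution=(1280, 720), framerate=Fraction(1, 6), sensor_mode=3)
camera.shutter_speed = 6_000_000   # 6-second exposure, in microseconds
camera.iso = 800
sleep(30)                          # let the sensor settle on its gains
camera.exposure_mode = 'off'       # freeze analog/digital gains
camera.capture('after_sunset.jpg')
```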
+
The camera is highly configurable: photo size and frequency, server address and protocol, all the overlays (color, size, position, text and images) and several other parameters can be configured remotely without the need to expose any ports of the device to the internet. They work reliably without the need for a VPN and at the same time are quite secure by design.
+
ZanzoCams mostly serve CAI and the hut managers for self-promotion, and help hikers and climbers assess the local conditions before attempting a hike. Pictures taken for this purpose are sent to RifugiLombardia, and you can see many of them at this page.
+
However, it has also been used by glaciologists to monitor glacier conditions, outlook and extent over the years. Here you can see their webcams, some of which are ZanzoCams.
ZanzoCam is fully open-source: check out the GitHub repo. Thanks to the decision to open-source the project, I was invited by Università di Pavia to give a lecture about it as part of their “Hardware and Software Codesign” course. Check out the slides of the lecture here.
+
diff --git a/publications/index.xml b/publications/index.xml
new file mode 100644
index 00000000..92ec4111
--- /dev/null
+++ b/publications/index.xml
@@ -0,0 +1,40 @@
+
+
+
+ Publications on Sara Zan
+ https://www.zansara.dev/publications/
+ Recent content in Publications on Sara Zan
+ Hugo
+ en
+ Tue, 01 Mar 2022 00:00:00 +0000
+
+
+ Adopting PyQt For Beam Instrumentation GUI Development At CERN
+ https://www.zansara.dev/publications/thpv014/
+ Tue, 01 Mar 2022 00:00:00 +0000
+ https://www.zansara.dev/publications/thpv014/
+ <h2 id="abstract">
Abstract
<a class="heading-link" href="#abstract">
<i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
<span class="sr-only">Link to heading</span>
</a>
</h2>
<p>As Java GUI toolkits become deprecated, the Beam Instrumentation (BI) group at CERN has investigated alternatives and selected PyQt as one of the suitable technologies for future GUIs, in accordance with the paper presented at ICALEPCS19. This paper presents tools created, or adapted, to seamlessly integrate future PyQt GUI development alongside current Java oriented workflows and the controls environment. This includes (a) creating a project template and a GUI management tool to ease and standardize our development process, (b) rewriting our previously Java-centric Expert GUI Launcher to be language-agnostic and (c) porting a selection of operational GUIs from Java to PyQt, to test the feasibility of the development process and identify bottlenecks. To conclude, the challenges we anticipate for the BI GUI developer community in adopting this new technology are also discussed.</p>
+
+
+ Evolution of the CERN Beam Instrumentation Offline Analysis Framework (OAF)
+ https://www.zansara.dev/publications/thpv042/
+ Sat, 11 Dec 2021 00:00:00 +0000
+ https://www.zansara.dev/publications/thpv042/
+ <h2 id="abstract">
Abstract
<a class="heading-link" href="#abstract">
<i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
<span class="sr-only">Link to heading</span>
</a>
</h2>
<p>The CERN accelerators require a large number of instruments, measuring different beam parameters like position, losses, current etc. The instruments’ associated electronics and software also produce information about their status. All these data are stored in a database for later analysis. The Beam Instrumentation group developed the Offline Analysis Framework some years ago to regularly and systematically analyze these data. The framework has been successfully used for nearly 100 different analyses that ran regularly by the end of the LHC run 2. Currently it is being updated for run 3 with modern and efficient tools to improve its usability and data analysis power. In particular, the architecture has been reviewed to have a modular design to facilitate the maintenance and the future evolution of the tool. A new web based application is being developed to facilitate the users’ access both to online configuration and to results. This paper will describe all these evolutions and outline possible lines of work for further improvements.</p>
+
+
+ Our Journey From Java to PyQt and Web For CERN Accelerator Control GUIs
+ https://www.zansara.dev/publications/tucpr03/
+ Sun, 30 Aug 2020 00:00:00 +0000
+ https://www.zansara.dev/publications/tucpr03/
+ <h2 id="abstract">
Abstract
<a class="heading-link" href="#abstract">
<i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
<span class="sr-only">Link to heading</span>
</a>
</h2>
<p>For more than 15 years, operational GUIs for accelerator controls and some lab applications for equipment experts have been developed in Java, first with Swing and more recently with JavaFX. In March 2018, Oracle announced that Java GUIs were not part of their strategy anymore*. They will not ship JavaFX after Java 8 and there are hints that they would like to get rid of Swing as well. This was a wakeup call for us. We took the opportunity to reconsider all technical options for developing operational GUIs. Our options ranged from sticking with JavaFX, over using the Qt framework (either using PyQt or developing our own Java Bindings to Qt), to using Web technology both in a browser and in native desktop applications. This article explains the reasons for moving away from Java as the main GUI technology and describes the analysis and hands-on evaluations that we went through before choosing the replacement.</p>
+
+
+ Evaluation of Qt as GUI Framework for Accelerator Controls
+ https://www.zansara.dev/publications/msc-thesis/
+ Thu, 20 Dec 2018 00:00:00 +0000
+ https://www.zansara.dev/publications/msc-thesis/
+ <p>This is the full-text of my MSc thesis, written in collaboration with
<a href="https://www.polimi.it/" class="external-link" target="_blank" rel="noopener">Politecnico di Milano</a> and <a href="https://home.cern/" class="external-link" target="_blank" rel="noopener">CERN</a>.</p>
<hr>
<p>Get the full text here: <a href="https://www.zansara.dev/publications/msc-thesis.pdf" >Evaluation of Qt as GUI Framework for Accelerator Controls</a></p>
<p>Publisher’s entry: <a href="https://hdl.handle.net/10589/144860" class="external-link" target="_blank" rel="noopener">10589/144860</a>.</p>
+
+
+
diff --git a/static/publications/msc-thesis.pdf b/publications/msc-thesis.pdf
similarity index 100%
rename from static/publications/msc-thesis.pdf
rename to publications/msc-thesis.pdf
diff --git a/static/publications/msc-thesis.png b/publications/msc-thesis.png
similarity index 100%
rename from static/publications/msc-thesis.png
rename to publications/msc-thesis.png
diff --git a/publications/msc-thesis/index.html b/publications/msc-thesis/index.html
new file mode 100644
index 00000000..4fb73d8e
--- /dev/null
+++ b/publications/msc-thesis/index.html
@@ -0,0 +1,235 @@
+
+
+
+
+
+ Evaluation of Qt as GUI Framework for Accelerator Controls · Sara Zan
As Java GUI toolkits become deprecated, the Beam Instrumentation (BI) group at CERN has investigated alternatives and selected PyQt as one of the suitable technologies for future GUIs, in accordance with the paper presented at ICALEPCS19. This paper presents tools created, or adapted, to seamlessly integrate future PyQt GUI development alongside current Java oriented workflows and the controls environment. This includes (a) creating a project template and a GUI management tool to ease and standardize our development process, (b) rewriting our previously Java-centric Expert GUI Launcher to be language-agnostic and (c) porting a selection of operational GUIs from Java to PyQt, to test the feasibility of the development process and identify bottlenecks. To conclude, the challenges we anticipate for the BI GUI developer community in adopting this new technology are also discussed.
The CERN accelerators require a large number of instruments, measuring different beam parameters like position, losses, current etc. The instruments’ associated electronics and software also produce information about their status. All these data are stored in a database for later analysis. The Beam Instrumentation group developed the Offline Analysis Framework some years ago to regularly and systematically analyze these data. The framework has been successfully used for nearly 100 different analyses that ran regularly by the end of the LHC run 2. Currently it is being updated for run 3 with modern and efficient tools to improve its usability and data analysis power. In particular, the architecture has been reviewed to have a modular design to facilitate the maintenance and the future evolution of the tool. A new web based application is being developed to facilitate the users’ access both to online configuration and to results. This paper will describe all these evolutions and outline possible lines of work for further improvements.
For more than 15 years, operational GUIs for accelerator controls and some lab applications for equipment experts have been developed in Java, first with Swing and more recently with JavaFX. In March 2018, Oracle announced that Java GUIs were not part of their strategy anymore*. They will not ship JavaFX after Java 8 and there are hints that they would like to get rid of Swing as well. This was a wakeup call for us. We took the opportunity to reconsider all technical options for developing operational GUIs. Our options ranged from sticking with JavaFX, over using the Qt framework (either using PyQt or developing our own Java Bindings to Qt), to using Web technology both in a browser and in native desktop applications. This article explains the reasons for moving away from Java as the main GUI technology and describes the analysis and hands-on evaluations that we went through before choosing the replacement.
+
diff --git a/series/haystack-2.0-series/index.xml b/series/haystack-2.0-series/index.xml
new file mode 100644
index 00000000..fc693f7e
--- /dev/null
+++ b/series/haystack-2.0-series/index.xml
@@ -0,0 +1,61 @@
+
+
+
+ Haystack 2.0 Series on Sara Zan
+ https://www.zansara.dev/series/haystack-2.0-series/
+ Recent content in Haystack 2.0 Series on Sara Zan
+ Hugo
+ en
+ Thu, 09 Nov 2023 00:00:00 +0000
+
+
+ The World of Web RAG
+ https://www.zansara.dev/posts/2023-11-09-haystack-series-simple-web-rag/
+ Thu, 09 Nov 2023 00:00:00 +0000
+ https://www.zansara.dev/posts/2023-11-09-haystack-series-simple-web-rag/
+ <p><em>Last updated: 18/01/2023</em></p>
<hr>
<p>In an earlier post of the Haystack 2.0 series, we’ve seen how to build RAG and indexing pipelines. An application that uses these two pipelines is practical if you have an extensive, private collection of documents and need to perform RAG on such data only. However, in many cases, you may want to get data from the Internet: from news outlets, documentation pages, and so on.</p>
<p>In this post, we will see how to build a Web RAG application: a RAG pipeline that can search the Web for the information needed to answer your questions.</p>
+
+
+ Indexing data for RAG applications
+ https://www.zansara.dev/posts/2023-11-05-haystack-series-minimal-indexing/
+ Sun, 05 Nov 2023 00:00:00 +0000
+ https://www.zansara.dev/posts/2023-11-05-haystack-series-minimal-indexing/
+ <p><em>Last updated: 18/01/2023</em></p>
<hr>
<p>In the <a href="https://www.zansara.dev/posts/2023-10-27-haystack-series-rag" >previous post</a> of the Haystack 2.0 series, we saw how to build RAG pipelines using a generator, a prompt builder, and a retriever with its document store. However, the content of our document store wasn’t extensive, and populating one with clean, properly formatted data is not an easy task. How can we approach this problem?</p>
<p>In this post, I will show you how to use Haystack 2.0 to create a large number of documents from a few web pages and write them to a document store that you can then use for retrieval.</p>
+
+
+ RAG Pipelines from scratch
+ https://www.zansara.dev/posts/2023-10-27-haystack-series-rag/
+ Fri, 27 Oct 2023 00:00:00 +0000
+ https://www.zansara.dev/posts/2023-10-27-haystack-series-rag/
+ <p><em>Last updated: 18/01/2023 - Read it on the <a href="https://haystack.deepset.ai/blog/rag-pipelines-from-scratch" class="external-link" target="_blank" rel="noopener">Haystack Blog</a>.</em></p>
<hr>
<p>Retrieval Augmented Generation (RAG) is quickly becoming an essential technique to make LLMs more reliable and effective at answering any question, regardless of how specific. To stay relevant in today’s NLP landscape, Haystack must enable it.</p>
<p>Let’s see how to build such applications with Haystack 2.0, from a direct call to an LLM to a fully-fledged, production-ready RAG pipeline that scales. At the end of this post, we will have an application that can answer questions about world countries based on data stored in a private database. At that point, the knowledge of the LLM will be only limited by the content of our data store, and all of this can be accomplished without fine-tuning language models.</p>
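<p>As a taste of what the post builds up to, a minimal RAG pipeline in Haystack 2.x looks roughly like the sketch below. It is a sketch only, assuming an OpenAI key in the environment and a pre-filled in-memory document store; the post itself walks through each step.</p>

```python
# Rough sketch of a minimal Haystack 2.x RAG pipeline (assumes OPENAI_API_KEY is set).
from haystack import Document, Pipeline
from haystack.components.builders import PromptBuilder
from haystack.components.generators import OpenAIGenerator
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
from haystack.document_stores.in_memory import InMemoryDocumentStore

# A toy document store; a real application would index many documents here.
store = InMemoryDocumentStore()
store.write_documents([Document(content="Portugal's capital is Lisbon.")])

template = """Answer the question using the context below.
Context:
{% for doc in documents %}{{ doc.content }}
{% endfor %}
Question: {{ question }}
Answer:"""

pipe = Pipeline()
pipe.add_component("retriever", InMemoryBM25Retriever(document_store=store))
pipe.add_component("prompt_builder", PromptBuilder(template=template))
pipe.add_component("llm", OpenAIGenerator())
pipe.connect("retriever.documents", "prompt_builder.documents")
pipe.connect("prompt_builder.prompt", "llm.prompt")

question = "What is the capital of Portugal?"
result = pipe.run({"retriever": {"query": question}, "prompt_builder": {"question": question}})
print(result["llm"]["replies"][0])
```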
+
+
+ A New Approach to Haystack Pipelines
+ https://www.zansara.dev/posts/2023-10-26-haystack-series-canals/
+ Thu, 26 Oct 2023 00:00:00 +0000
+ https://www.zansara.dev/posts/2023-10-26-haystack-series-canals/
+ <p><em>Updated on 21/12/2023</em></p>
<hr>
<p>As we have seen in <a href="https://www.zansara.dev/posts/2023-10-15-haystack-series-pipeline/" class="external-link" target="_blank" rel="noopener">the previous episode of this series</a>, Haystack’s Pipeline is a powerful concept that comes with its set of benefits and shortcomings. In Haystack 2.0, the pipeline was one of the first items that we focused our attention on, and it was the starting point of the entire rewrite.</p>
<p>What does this mean in practice? Let’s look at what Haystack Pipelines in 2.0 will be like, how they differ from their 1.x counterparts, and the pros and cons of this new paradigm.</p>
+
+
+ Haystack's Pipeline - A Deep Dive
+ https://www.zansara.dev/posts/2023-10-15-haystack-series-pipeline/
+ Sun, 15 Oct 2023 00:00:00 +0000
+ https://www.zansara.dev/posts/2023-10-15-haystack-series-pipeline/
+ <p>If you’ve ever looked at Haystack before, you must have come across the <a href="https://docs.haystack.deepset.ai/docs/pipelines" class="external-link" target="_blank" rel="noopener">Pipeline</a>, one of the most prominent concepts of the framework. However, this abstraction is by no means an obvious choice when it comes to NLP libraries. Why did we adopt this concept, and what does it bring us?</p>
<p>In this post, I go into all the details of how the Pipeline abstraction works in Haystack now, why it works this way, and its strengths and weaknesses. This deep dive into the current state of the framework is also a premise for the next episode, where I will explain how Haystack 2.0 addresses this version’s shortcomings.</p>
+
+
+ Why rewriting Haystack?!
+ https://www.zansara.dev/posts/2023-10-11-haystack-series-why/
+ Wed, 11 Oct 2023 00:00:00 +0000
+ https://www.zansara.dev/posts/2023-10-11-haystack-series-why/
+ <p>Before even diving into what Haystack 2.0 is, how it was built, and how it works, let’s spend a few words about the whats and the whys.</p>
<p>First of all, <em>what is</em> Haystack?</p>
<p>And next, why on Earth did we decide to rewrite it from the ground up?</p>
<h3 id="a-pioneer-framework">
A Pioneer Framework
<a class="heading-link" href="#a-pioneer-framework">
<i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
<span class="sr-only">Link to heading</span>
</a>
</h3>
<p>Haystack is a relatively young framework, its initial release dating back to <a href="https://github.com/deepset-ai/haystack/releases/tag/0.1.0" class="external-link" target="_blank" rel="noopener">November 28th, 2019</a>. Back then, Natural Language Processing was a field that had just started taking its first steps outside of research labs, and Haystack was one of the first libraries that promised enterprise-grade, production-ready NLP features. We were proud to enable use cases such as <a href="https://medium.com/deepset-ai/what-semantic-search-can-do-for-you-ea5b1e8dfa7f" class="external-link" target="_blank" rel="noopener">semantic search</a>, <a href="https://medium.com/deepset-ai/semantic-faq-search-with-haystack-6a03b1e13053" class="external-link" target="_blank" rel="noopener">FAQ matching</a>, document similarity, document summarization, machine translation, language-agnostic search, and so on.</p>
+
+
+ Haystack 2.0: What is it?
+ https://www.zansara.dev/posts/2023-10-10-haystack-series-intro/
+ Tue, 10 Oct 2023 00:00:00 +0000
+ https://www.zansara.dev/posts/2023-10-10-haystack-series-intro/
+      <p>December is finally approaching, and with it the release of <a href="https://github.com/deepset-ai/haystack" class="external-link" target="_blank" rel="noopener">Haystack</a> 2.0. At <a href="https://www.deepset.ai/" class="external-link" target="_blank" rel="noopener">deepset</a>, we’ve been talking about it for months, we’ve been iterating on the core concepts for what feels like a million times, and it looks like we’re finally getting ready for the deadline.</p>
<p>But what is it that makes this release so special?</p>
<p>In short, Haystack 2.0 is a complete rewrite. A huge, big-bang style change. Almost no code survived the migration unmodified: we’ve gone through the entire 100,000+ lines of the codebase and redone everything in under a year. For our small team, this is a huge accomplishment.</p>
+
+
+
diff --git a/series/haystack-2.0-series/page/1/index.html b/series/haystack-2.0-series/page/1/index.html
new file mode 100644
index 00000000..6c3659f2
--- /dev/null
+++ b/series/haystack-2.0-series/page/1/index.html
@@ -0,0 +1,10 @@
+
+
+
+ https://www.zansara.dev/series/haystack-2.0-series/
+
+
+
+
+
+
diff --git a/series/index.html b/series/index.html
new file mode 100644
index 00000000..b4de7983
--- /dev/null
+++ b/series/index.html
@@ -0,0 +1,199 @@
+
+
+
+
+ Series · Sara Zan
The slides go through the entire lifecycle of the ZanzoCam project,
+from its very inception through the market research, our decision process and the early prototypes, and
+then give a more detailed explanation of the design and implementation of the project from
+a hardware and software perspective, with some notes about our financial situation and project management.
Search should not be limited to text only. Recently, Transformers-based NLP models started crossing the boundaries of text data and exploring the possibilities of other modalities, like tabular data, images, audio files, and more. Text-to-text generation models like GPT now have their counterparts in text-to-image models, like Stable Diffusion. But what about search? In this talk we’re going to experiment with CLIP, a text-to-image search model, to look for animals matching specific characteristics in a dataset of pictures. Does CLIP know which one is “The fastest animal in the world”?
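The gist of the demo can be sketched in a few lines with the sentence-transformers library: encode the pictures and the query with the same CLIP model, then rank by cosine similarity. This is an illustrative sketch with made-up file names, not the notebook from the talk.

```python
# Minimal CLIP text-to-image search with sentence-transformers (illustrative sketch,
# not the talk's notebook). Requires: pip install sentence-transformers pillow
from PIL import Image
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("clip-ViT-B-32")

image_paths = ["cheetah.jpg", "sloth.jpg", "falcon.jpg"]  # any local pictures
image_embeddings = model.encode([Image.open(p) for p in image_paths])

query_embedding = model.encode("The fastest animal in the world")
scores = util.cos_sim(query_embedding, image_embeddings)[0]

best = scores.argmax().item()
print(f"Best match: {image_paths[best]} (score={scores[best]:.3f})")
```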
+
+
For the 7th OpenNLP meetup I presented the topic of Image Retrieval, a feature that I’ve recently added to Haystack in the form of a MultiModal Retriever (see the Tutorial).
+
The talk consists of 5 parts:
+
+
An introduction of the topic of Image Retrieval
+
A mention of the current SOTA model (CLIP)
+
An overview of Haystack
+
A step-by-step description of how image retrieval applications can be implemented with Haystack
+
A live coding session where I start from a blank Colab notebook and build a fully working image retrieval system from the ground up, to the point where I can run queries live.
+
+
Towards the end I briefly mention an even more advanced version of this image retrieval system, which I had no time to implement live. However, I later built a notebook implementing such a system, and you can find it here: Cheetah.ipynb
+
The slides were generated from the linked Jupyter notebook with jupyter nbconvert Dec_1st_OpenNLP_Meetup.ipynb --to slides --post serve.
In this Office Hours I presented for the first time to our Discord community a preview of the upcoming 2.0 release of Haystack, which has been in the works since the start of the year. As rumors started to arise about the presence of a preview module in the latest Haystack 1.x releases, we took the opportunity to share this early draft of the project and collect early feedback.
+
Haystack 2.0 is a total rewrite that rethinks many of the core concepts of the framework and makes LLM support its primary concern, while making sure to support all the use cases its predecessor enabled. The rewrite addresses some well-known, long-standing issues with the pipeline’s design and with the relationship between the pipeline, its components, and the document stores, and aims at drastically improving the developer experience and the framework’s extensibility.
+
As the main designer of this rewrite, I walked the community through a slightly re-hashed version of the slide deck I had presented internally just a few days earlier at an All Hands on the same topic.
In this Office Hours I walk through the LLM support offered by Haystack 2.0 to date: Generator, PromptBuilder, and how to connect them to different types of Retrievers to build Retrieval Augmented Generation (RAG) applications.
+
In under 40 minutes we go from a simple query to ChatGPT to a full pipeline that retrieves documents from the Internet, splits them into chunks and feeds them to an LLM to ground its replies.
+
The talk also indirectly shows how Pipelines help users compose these systems quickly, visualize them, and connect their different parts, aided by verbose error messages.
In this hour-long workshop organized by Analytics Vidhya I give an overview of what RAG is, what problems it solves, and how it works.
+
After a brief introduction to Haystack, I show in practice how to use Haystack 2.0 to assemble a Pipeline that performs RAG on a local database and then on the Web with a simple change.
+
I also mention how to use and implement custom Haystack components, and share a lot of resources on the topic of RAG and Haystack 2.0.
+
This was my most popular talk to date, with over a hundred attendees watching live and asking plenty of questions.
To wrap up 2023 in style, this episode is about LLMs, which have been a central topic of the tech scene in the year that is about to end. We invited two experts in the field, Sara Zanzottera and Stefano Fiorucci.
+
Both of our guests work at deepset as NLP Engineers. deepset is the company behind Haystack, one of the best-known open-source frameworks for LLMs, which has recently reached version 2.0 beta. Haystack itself was one of the topics we covered with our guests, trying to understand its potential.
+
But is it possible to work on a framework of this kind while also staying up to date with the news of a constantly evolving world? This is one of the many questions Sara and Stefano answered. Interested in the world of LLMs? Don’t miss this episode!
At ODSC East 2024 I talked about RAG: how it works, how it fails, and how to evaluate its performance objectively. I gave an overview of some useful open-source tools for RAG evaluation and how to use them with Haystack, and then offered some ideas on how to expand your RAG architecture beyond a simple two-step process.
Plus, shout-out to a very interesting LLM evaluation library I discovered at ODSC: continuous-eval. Worth checking out, especially if SAS or answer correctness are too vague and high-level for your domain.
At EuroPython 2024 I talked about RAG: how it works, how it fails, and how to evaluate its performance objectively. I gave an overview of some useful open-source tools for RAG evaluation such as continuous-eval and how to use them with Haystack, and then offered some ideas on how to expand your RAG architecture beyond a simple two-step process.
+
Some resources mentioned in the talk:
+
+
Haystack: open-source LLM framework for RAG and beyond.
Announcement,
+slides and
+notebook.
+All resources can also be found on ODSC’s website and in
+my archive.
+Did you miss the talk? Check out the write-up here.
+
+
(Note: this is a recording of the notebook walkthrough only. The full recording will be shared soon).
+
+
+
+
+
+
+
+
At ODSC Europe 2024 I talked about building modern and reliable voice bots using Pipecat,
+a recently released open source tool. I gave an overview of the general structure of voice bots, of the recent improvements
+in their underlying tech, and of the new challenges that developers face when implementing one of these systems.
+
The main highlight of the talk is the notebook
+where I first implement a simple Pipecat bot from scratch, and then give an overview of how to blend intent detection
+and system prompt switching to better control how LLM bots interact with users.
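The intent detection idea is framework-agnostic: one cheap LLM call classifies the user’s message, and the result decides which system prompt the main bot runs with. Below is a stripped-down sketch of the pattern using the plain OpenAI client rather than Pipecat; the intents, prompts and model name are picked arbitrarily for illustration.

```python
# Framework-agnostic sketch of intent detection + system prompt switching.
# Uses the plain OpenAI client instead of Pipecat; intents/prompts are made up.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

SYSTEM_PROMPTS = {
    "book_appointment": "You are a scheduling assistant. Only discuss available slots.",
    "small_talk": "You are a friendly receptionist. Keep replies short and polite.",
}

def detect_intent(user_message: str) -> str:
    """Classify the message into one of the known intents with a cheap LLM call."""
    reply = client.chat.completions.create(
        model="gpt-4o-mini",  # arbitrary choice for the sketch
        messages=[
            {"role": "system",
             "content": "Reply with exactly one word: " + " or ".join(SYSTEM_PROMPTS)},
            {"role": "user", "content": user_message},
        ],
    )
    intent = reply.choices[0].message.content.strip()
    return intent if intent in SYSTEM_PROMPTS else "small_talk"

def answer(user_message: str) -> str:
    # Switch the system prompt based on the detected intent before replying.
    system_prompt = SYSTEM_PROMPTS[detect_intent(user_message)]
    reply = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_message},
        ],
    )
    return reply.choices[0].message.content

print(answer("Can I come in tomorrow at 10?"))
```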
At the AMTA 2024 Virtual Tutorial Day I talked about controlling invariant translation elements with RAG. During the talk several speakers intervened on the topic, each bringing a different perspective on it.
+
Georg Kirchner introduced the concept of invariant translation elements, such as brand names, UI elements, and corporate slogans. Christian Lang gave a comprehensive overview of the challenges of handling invariant translation elements with existing tools and how LLMs can help at various stages of the translation, covering several approaches, including RAG. Building on his overview, I showed how to implement a simple RAG system to handle these invariants properly using Haystack: we ran a Colab notebook live and checked how the translation changes by introducing context about the invariants to the LLM making the translation. Last, Bruno Bitter gave an overview of how you can use Blackbird to integrate a system like this with existing CAT tools and manage the whole lifecycle of content translation.
For the Springer Nature AI Lab Opening Day I talk about LLM frameworks: what they are, when they can be useful, and how to choose and compare one framework against another.
+
After an overview of six application frameworks (LangChain, LlamaIndex, Haystack, txtai, DSPy and CrewAI), we run a notebook that uses RAGAS to compare four small RAG applications and see which one performs better.
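For reference, the RAGAS part of the comparison boils down to something like the following sketch. It assumes the metric names and dataset columns of a recent ragas release and an OpenAI key in the environment; the data is a toy example, not the notebook’s dataset.

```python
# Illustrative sketch of scoring one RAG application with RAGAS (not the actual
# notebook); assumes OPENAI_API_KEY is set and the schema of a recent ragas release.
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import answer_relevancy, context_precision, faithfulness

eval_data = Dataset.from_dict({
    "question": ["Where is the Great Pyramid of Giza?"],
    "answer": ["It is in Giza, on the outskirts of Cairo, Egypt."],
    "contexts": [["The Great Pyramid of Giza is located in Giza, Egypt."]],
    "ground_truth": ["Giza, Egypt"],
})

scores = evaluate(eval_data, metrics=[faithfulness, answer_relevancy, context_precision])
print(scores)  # repeat for each of the four apps and compare the aggregate scores
```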
+
diff --git a/talks/index.xml b/talks/index.xml
new file mode 100644
index 00000000..2df186b1
--- /dev/null
+++ b/talks/index.xml
@@ -0,0 +1,96 @@
+
+
+
+ Talks on Sara Zan
+ https://www.zansara.dev/talks/
+ Recent content in Talks on Sara Zan
+ Hugo
+ en
+ Wed, 30 Oct 2024 00:00:00 +0000
+
+
+ [UPCOMING] ODSC West: Building Reliable Voice Agents with Open Source tools
+ https://www.zansara.dev/talks/2024-10-30-odsc-west-voice-agents/
+ Wed, 30 Oct 2024 00:00:00 +0000
+ https://www.zansara.dev/talks/2024-10-30-odsc-west-voice-agents/
+
+
+
+ SNAIL Opening Day: Should I use an LLM Framework? (Private Event)
+ https://www.zansara.dev/talks/2024-10-01-snail-opening-day-should-i-use-an-llm-framework/
+ Tue, 01 Oct 2024 00:00:00 +0000
+ https://www.zansara.dev/talks/2024-10-01-snail-opening-day-should-i-use-an-llm-framework/
+ <p><a href="https://drive.google.com/file/d/1GQJ1qEY2hXQ6EBF-rtqzJqZzidfS7HfI/view?usp=sharing" class="external-link" target="_blank" rel="noopener">Slides</a>,
<a href="https://colab.research.google.com/drive/11aOq-43wEWhSlxtkdXEAwPEarC0IQ3eN?usp=sharing" class="external-link" target="_blank" rel="noopener">notebook</a>, <a href="https://huggingface.co/datasets/ZanSara/seven-wonders" class="external-link" target="_blank" rel="noopener">RAG dataset</a> and <a href="https://huggingface.co/datasets/ZanSara/seven-wonders-eval" class="external-link" target="_blank" rel="noopener">evaluation dataset</a>
All resources can also be found in
<a href="https://drive.google.com/drive/folders/1anl3adpxgbwq5nsFn8QXuofIWXX0jRKo?usp=sharing" class="external-link" target="_blank" rel="noopener">my archive</a>.</p>
<hr>
<div class='iframe-wrapper'>
<iframe src="https://drive.google.com/file/d/1AORVusaHVBqNvJ5OtctyB5TWQZSadoqT/preview" width=100% height=100% allow="autoplay"></iframe>
</div>
<p>Find the transcript <a href="https://drive.google.com/file/d/1wwnTFmGOANVmxUaVd1PC3cfztzIfSCEa/view?usp=sharing" class="external-link" target="_blank" rel="noopener">here</a>.</p>
<hr>
<p>For the <a href="https://group.springernature.com/gp/group" class="external-link" target="_blank" rel="noopener">Springer Nature</a> AI Lab Opening Day I talk about LLM frameworks: what they are, when they can be useful, and how to choose and compare one framework against another.</p>
<p>After an overview of six application frameworks (<a href="https://www.langchain.com/" class="external-link" target="_blank" rel="noopener">LangChain</a>, <a href="https://www.llamaindex.ai/" class="external-link" target="_blank" rel="noopener">LlamaIndex</a>, <a href="https://haystack.deepset.ai/" class="external-link" target="_blank" rel="noopener">Haystack</a>, <a href="https://neuml.github.io/txtai/" class="external-link" target="_blank" rel="noopener">txtai</a>, <a href="https://dspy-docs.vercel.app/" class="external-link" target="_blank" rel="noopener">DSPy</a> and <a href="https://www.crewai.com/" class="external-link" target="_blank" rel="noopener">CrewAI</a>), we run a notebook that uses <a href="https://docs.ragas.io/en/latest/" class="external-link" target="_blank" rel="noopener">RAGAS</a> to compare four small RAG applications and see which one performs better.</p>
+
+
+ AMTA 2024 Virtual Tutorial Day: Controlling LLM Translations of Invariant Elements with RAG
+ https://www.zansara.dev/talks/2024-09-18-amta-2024-controlling-invariants-rag/
+ Wed, 18 Sep 2024 00:00:00 +0000
+ https://www.zansara.dev/talks/2024-09-18-amta-2024-controlling-invariants-rag/
+ <p><a href="https://amtaweb.org/virtual-tutorial-day-program/" class="external-link" target="_blank" rel="noopener">Announcement</a>,
<a href="https://colab.research.google.com/drive/1VMgK3DcVny_zTtAG_V3QSSdfSFBWAgmb?usp=sharing" class="external-link" target="_blank" rel="noopener">notebook</a> and
<a href="https://docs.google.com/spreadsheets/d/1A1zk-u-RTSqBfE8LksZxihnp7KxWO7YK/edit?usp=sharing&ouid=102297935451395786183&rtpof=true&sd=true" class="external-link" target="_blank" rel="noopener">glossary</a>.
All resources can also be found in
<a href="https://drive.google.com/drive/folders/1Tdq92P_E_77sErGjz7jSPfJ-or9UZXvn?usp=drive_link" class="external-link" target="_blank" rel="noopener">my archive</a>.</p>
<hr>
<p><em>Recording coming soon.</em></p>
<hr>
<p>At the <a href="https://amtaweb.org/virtual-tutorial-day-program/" class="external-link" target="_blank" rel="noopener">AMTA 2024 Virtual Tutorial Day</a> I talked about controlling invariant translation elements with RAG. During the talk several speakers intervened on the topic, each bringing a different perspective on it.</p>
<p><a href="https://www.linkedin.com/in/georgkirchner/" class="external-link" target="_blank" rel="noopener">Georg Kirchner</a> introduced the concept of invariant translation elements, such as brand names, UI elements, and corporate slogans. <a href="https://www.linkedin.com/in/christian-lang-8942b0145/" class="external-link" target="_blank" rel="noopener">Christian Lang</a> gave a comprehensive overview of the challenges of handling invariant translation elements with existing tools and how LLMs can help at various stages of the translation, covering several approaches, including RAG. Building on his overview, I showed how to implement a simple RAG system to handle these invariants properly using <a href="https://haystack.deepset.ai/?utm_campaign=amta-2024" class="external-link" target="_blank" rel="noopener">Haystack</a>: we ran a <a href="https://colab.research.google.com/drive/1VMgK3DcVny_zTtAG_V3QSSdfSFBWAgmb?usp=sharing" class="external-link" target="_blank" rel="noopener">Colab notebook</a> live and checked how the translation changes by introducing context about the invariants to the LLM making the translation. Last, <a href="https://www.linkedin.com/in/brunobitter/" class="external-link" target="_blank" rel="noopener">Bruno Bitter</a> gave an overview of how you can use <a href="https://www.blackbird.io/" class="external-link" target="_blank" rel="noopener">Blackbird</a> to integrate a system like this with existing CAT tools and manage the whole lifecycle of content translation.</p>
+
+
+ ODSC Europe: Building Reliable Voice Agents with Open Source tools
+ https://www.zansara.dev/talks/2024-09-05-odsc-europe-voice-agents/
+ Fri, 06 Sep 2024 00:00:00 +0000
+ https://www.zansara.dev/talks/2024-09-05-odsc-europe-voice-agents/
+ <p><a href="https://odsc.com/speakers/building-reliable-voice-agents-with-open-source-tools-2/" class="external-link" target="_blank" rel="noopener">Announcement</a>,
<a href="https://drive.google.com/file/d/1ubk7Q_l9C7epQgYrMttHMjW1AVfdm-LT/view?usp=sharing" class="external-link" target="_blank" rel="noopener">slides</a> and
<a href="https://colab.research.google.com/drive/1NCAAs8RB2FuqMChFKMIVWV0RiJr9O3IJ?usp=sharing" class="external-link" target="_blank" rel="noopener">notebook</a>.
All resources can also be found on ODSC’s website and in
<a href="https://drive.google.com/drive/folders/1rrXMTbfTZVuq9pMzneC8j-5GKdRQ6l2i?usp=sharing" class="external-link" target="_blank" rel="noopener">my archive</a>.
Did you miss the talk? Check out the write-up <a href="https://www.zansara.dev/posts/2024-09-05-odsc-europe-voice-agents-part-1/" >here</a>.</p>
<hr>
<p><em>(Note: this is a recording of the notebook walkthrough only. The full recording will be shared soon).</em></p>
<div class='iframe-wrapper'>
<iframe src="https://drive.google.com/file/d/15Kv8THmDsnnzfVBhHAf2O11RccpzAzYK/preview" width=100% height=100% allow="autoplay"></iframe>
</div>
<p>At <a href="https://odsc.com/europe/" class="external-link" target="_blank" rel="noopener">ODSC Europe 2024</a> I talked about building modern and reliable voice bots using Pipecat,
a recently released open source tool. I gave an overview of the general structure of voice bots, of the recent improvements
in their underlying tech, and of the new challenges that developers face when implementing one of these systems.</p>
+
+
+ EuroPython: Is RAG all you need? A look at the limits of retrieval augmented generation
+ https://www.zansara.dev/talks/2024-07-10-europython-rag/
+ Wed, 10 Jul 2024 00:00:00 +0000
+ https://www.zansara.dev/talks/2024-07-10-europython-rag/
+ <p><a href="https://ep2024.europython.eu/session/is-rag-all-you-need-a-look-at-the-limits-of-retrieval-augmented-generation" class="external-link" target="_blank" rel="noopener">Announcement</a>,
<a href="https://drive.google.com/file/d/13OXMLaBQr1I_za7sqVHJWxRj5xFAg7KV/view?usp=sharing" class="external-link" target="_blank" rel="noopener">slides</a>.
Did you miss the talk? Check out the recording on <a href="https://youtu.be/9wk7mGB_Gp4?feature=shared" class="external-link" target="_blank" rel="noopener">Youtube</a>
or on my <a href="https://drive.google.com/file/d/1OkYQ7WMt63QkdJTU3GIpSxBZmnLfZti6/view?usp=sharing" class="external-link" target="_blank" rel="noopener">backup</a> (cut from the <a href="https://www.youtube.com/watch?v=tcXmnCJIvFc" class="external-link" target="_blank" rel="noopener">original stream</a>),
or read the <a href="https://www.zansara.dev/posts/2024-04-29-odsc-east-rag" >write-up</a> of a previous edition of the same talk.</p>
<hr>
<div class='iframe-wrapper'>
<iframe src="https://drive.google.com/file/d/1OkYQ7WMt63QkdJTU3GIpSxBZmnLfZti6/preview" width=100% height=100% allow="autoplay"></iframe>
</div>
<p>At <a href="https://ep2024.europython.eu/" class="external-link" target="_blank" rel="noopener">EuroPython 2024</a> I talked about RAG: how it works, how it fails, and how to evaluate its performance objectively. I gave an overview of some useful open-source tools for RAG evaluation such as <a href="https://docs.relari.ai/v0.3?utm_campaign=europython-2024" class="external-link" target="_blank" rel="noopener">continuous-eval</a> and how to use them with <a href="https://haystack.deepset.ai/?utm_campaign=europython-2024" class="external-link" target="_blank" rel="noopener">Haystack</a>, and then offered some ideas on how to expand your RAG architecture beyond a simple two-step process.</p>
+
+
+ ODSC East: RAG, the bad parts (and the good!)
+ https://www.zansara.dev/talks/2024-04-25-odsc-east-rag/
+ Thu, 25 Apr 2024 00:00:00 +0000
+ https://www.zansara.dev/talks/2024-04-25-odsc-east-rag/
+ <p><a href="https://odsc.com/speakers/rag-the-bad-parts-and-the-good-building-a-deeper-understanding-of-this-hot-llm-paradigms-weaknesses-strengths-and-limitations/" class="external-link" target="_blank" rel="noopener">Announcement</a>,
<a href="https://drive.google.com/file/d/19EDFCqOiAo9Cvx5fxx6Wq1Z-EoMKwxbs/view?usp=sharing" class="external-link" target="_blank" rel="noopener">slides</a>.
Did you miss the talk? Check out the <a href="https://www.zansara.dev/posts/2024-04-29-odsc-east-rag" >write-up</a>.</p>
<hr>
<p>At <a href="https://odsc.com/boston/" class="external-link" target="_blank" rel="noopener">ODSC East 2024</a> I talked about RAG: how it works, how it fails, and how to evaluate its performance objectively. I gave an overview of some useful open-source tools for RAG evaluation and how to use them with <a href="https://haystack.deepset.ai/?utm_campaign=odsc-east" class="external-link" target="_blank" rel="noopener">Haystack</a>, and then offered some ideas on how to expand your RAG architecture beyond a simple two-step process.</p>
<p>Some resources mentioned in the talk:</p>
+
+
+ DataHour: Optimizing LLMs with Retrieval Augmented Generation and Haystack 2.0
+ https://www.zansara.dev/talks/2023-12-15-datahour-rag/
+ Fri, 15 Dec 2023 00:00:00 +0000
+ https://www.zansara.dev/talks/2023-12-15-datahour-rag/
+ <p><a href="https://drive.google.com/file/d/1OkFr4u9ZOraJRF406IQgQh4YC8GLHbzA/view?usp=drive_link" class="external-link" target="_blank" rel="noopener">Recording</a>, <a href="https://drive.google.com/file/d/1n1tbiUW2wZPGC49WK9pYEIZlZuCER-hu/view?usp=sharing" class="external-link" target="_blank" rel="noopener">slides</a>, <a href="https://drive.google.com/file/d/17FXuS7X70UF02IYmOr-yEDQYg_gp9cFv/view?usp=sharing" class="external-link" target="_blank" rel="noopener">Colab</a>, <a href="https://gist.github.com/ZanSara/6075d418c1494e780f7098db32bc6cf6" class="external-link" target="_blank" rel="noopener">gist</a>. All the material can also be found on <a href="https://community.analyticsvidhya.com/c/datahour/optimizing-llms-with-retrieval-augmented-generation-and-haystack-2-0" class="external-link" target="_blank" rel="noopener">Analytics Vidhya’s community</a> and on <a href="https://drive.google.com/drive/folders/1KwCEDTCsm9hrRaFUPHpzdTpVsOJSnvGk?usp=drive_link" class="external-link" target="_blank" rel="noopener">my backup</a>.</p>
<hr>
<div class='iframe-wrapper'>
<iframe src="https://drive.google.com/file/d/1OkFr4u9ZOraJRF406IQgQh4YC8GLHbzA/preview" width=100% height=100% allow="autoplay"></iframe>
</div>
<p>In this hour-long workshop organized by <a href="https://www.analyticsvidhya.com/" class="external-link" target="_blank" rel="noopener">Analytics Vidhya</a> I give an overview of what RAG is, what problems it solves, and how it works.</p>
<p>After a brief introduction to Haystack, I show in practice how to use Haystack 2.0 to assemble a Pipeline that performs RAG on a local database and then on the Web with a simple change.</p>
+
+
+ Pointer[183]: Haystack, creare LLM Applications in modo facile
+ https://www.zansara.dev/talks/2023-12-15-pointerpodcast-haystack/
+ Fri, 15 Dec 2023 00:00:00 +0000
+ https://www.zansara.dev/talks/2023-12-15-pointerpodcast-haystack/
+ <p><a href="https://pointerpodcast.it/p/pointer183-haystack-creare-llm-applications-in-modo-facile-con-stefano-fiorucci-e-sara-zanzottera" class="external-link" target="_blank" rel="noopener">Episode link</a>. Backup recording <a href="https://drive.google.com/file/d/1BOoAhfvWou_J4J7RstgKAHPs3Pre2YAw/view?usp=sharing" class="external-link" target="_blank" rel="noopener">here</a>.</p>
<hr>
<p><em>The podcast was recorded in Italian for <a href="https://pointerpodcast.it" class="external-link" target="_blank" rel="noopener">PointerPodcast</a> with <a href="https://www.linkedin.com/in/luca-corbucci-b6156a123/" class="external-link" target="_blank" rel="noopener">Luca Corbucci</a>, <a href="https://www.linkedin.com/in/eugenio-paluello-851b3280/" class="external-link" target="_blank" rel="noopener">Eugenio Paluello</a> and <a href="https://www.linkedin.com/in/stefano-fiorucci/" class="external-link" target="_blank" rel="noopener">Stefano Fiorucci</a>.</em></p>
<hr>
<div>
<audio controls src="https://hosting.pointerpodcast.it/records/pointer183.mp3" style="width: 100%"></audio>
</div>
<p>To wrap up 2023 in style, this episode is about LLMs, which have been a central topic of the tech scene in the year that is about to end. We invited two experts in the field, Sara Zanzottera and Stefano Fiorucci.</p>
<p>Both of our guests work at deepset as NLP Engineers. deepset is the company behind Haystack, one of the best-known open-source frameworks for LLMs, which has recently reached version 2.0 beta. Haystack itself was one of the topics we covered with our guests, trying to understand its potential.</p>
+
+
+ Office Hours: RAG Pipelines
+ https://www.zansara.dev/talks/2023-10-12-office-hours-rag-pipelines/
+ Thu, 12 Oct 2023 00:00:00 +0000
+ https://www.zansara.dev/talks/2023-10-12-office-hours-rag-pipelines/
+ <p><a href="https://drive.google.com/file/d/1UXGi4raiCQmrxOfOexL-Qh0CVbtiSm89/view?usp=drive_link" class="external-link" target="_blank" rel="noopener">Recording</a>, <a href="https://gist.github.com/ZanSara/5975901eea972c126f8e1c2341686dfb" class="external-link" target="_blank" rel="noopener">notebook</a>. All the material can also be found <a href="https://drive.google.com/drive/folders/17CIfoy6c4INs0O_X6YCa3CYXkjRvWm7X?usp=drive_link" class="external-link" target="_blank" rel="noopener">here</a>.</p>
<hr>
<div class='iframe-wrapper'>
<iframe src="https://drive.google.com/file/d/1UXGi4raiCQmrxOfOexL-Qh0CVbtiSm89/preview" width="100%" height="100%" allow="autoplay"></iframe>
</div>
<p>In this <a href="https://discord.com/invite/VBpFzsgRVF" class="external-link" target="_blank" rel="noopener">Office Hours</a> I walk through the LLM support offered by Haystack 2.0 to date: Generator, PromptBuilder, and how to connect them to different types of Retrievers to build Retrieval Augmented Generation (RAG) applications.</p>
<p>In under 40 minutes we go from a simple query to ChatGPT to a full pipeline that retrieves documents from the Internet, splits them into chunks and feeds them to an LLM to ground its replies.</p>
+
+
+ Office Hours: Haystack 2.0
+ https://www.zansara.dev/talks/2023-08-03-office-hours-haystack-2.0-status/
+ Thu, 03 Aug 2023 00:00:00 +0000
+ https://www.zansara.dev/talks/2023-08-03-office-hours-haystack-2.0-status/
+ <p><a href="https://drive.google.com/file/d/1PyAlvJ22Z6o1bls07Do5kx2WMTdotsM7/view?usp=drive_link" class="external-link" target="_blank" rel="noopener">Recording</a>, <a href="https://drive.google.com/file/d/1QFNisUk2HzwRL_27bpr338maxLvDBr9D/preview" class="external-link" target="_blank" rel="noopener">slides</a>. All the material can also be found <a href="https://drive.google.com/drive/folders/1zmXwxsSgqDgvYf2ptjHocdtzOroqaudw?usp=drive_link" class="external-link" target="_blank" rel="noopener">here</a>.</p>
<hr>
<div class='iframe-wrapper'>
<iframe src="https://drive.google.com/file/d/1PyAlvJ22Z6o1bls07Do5kx2WMTdotsM7/preview" width="100%" height="100%" allow="autoplay"></iframe>
</div>
<p>In this <a href="https://discord.com/invite/VBpFzsgRVF" class="external-link" target="_blank" rel="noopener">Office Hours</a> I presented for the first time to our Discord community a preview of the upcoming 2.0 release of Haystack, which has been in the works since the start of the year. As rumors started to arise about the presence of a <code>preview</code> module in the latest Haystack 1.x releases, we took the opportunity to share this early draft of the project and collect early feedback.</p>
+
+
+ OpenNLP Meetup: A Practical Introduction to Image Retrieval
+ https://www.zansara.dev/talks/2022-12-01-open-nlp-meetup/
+ Thu, 01 Dec 2022 00:00:00 +0000
+ https://www.zansara.dev/talks/2022-12-01-open-nlp-meetup/
+ <p><a href="https://www.youtube.com/watch?v=7Idjl3OR0FY" class="external-link" target="_blank" rel="noopener">Youtube link</a>,
<a href="https://gist.github.com/ZanSara/dc4b22e7ffe2a56647e0afba7537c46b" class="external-link" target="_blank" rel="noopener">slides</a>, <a href="https://gist.github.com/ZanSara/9e8557830cc866fcf43a2c5623688c74" class="external-link" target="_blank" rel="noopener">Colab</a> (live coding).
All the material can also be found <a href="https://drive.google.com/drive/folders/1_3b8PsvykHeM0jSHsMUWQ-4h_VADutcX?usp=drive_link" class="external-link" target="_blank" rel="noopener">here</a>.</p>
<hr>
<div class='iframe-wrapper'>
<iframe src="https://drive.google.com/file/d/19mxD-xUJ-14G-2XAqXEVpZfqR2MsSZTn/preview" width="100%" height="100%" allow="autoplay"></iframe>
</div>
<h2 id="a-practical-introduction-to-image-retrieval">
A Practical Introduction to Image Retrieval
<a class="heading-link" href="#a-practical-introduction-to-image-retrieval">
<i class="fa fa-link" aria-hidden="true" title="Link to heading"></i>
<span class="sr-only">Link to heading</span>
</a>
</h2>
<p><em>by Sara Zanzottera from deepset</em></p>
<p>Search should not be limited to text only. Recently, Transformers-based NLP models started crossing the boundaries of text data and exploring the possibilities of other modalities, like tabular data, images, audio files, and more. Text-to-text generation models like GPT now have their counterparts in text-to-image models, like Stable Diffusion. But what about search? In this talk we’re going to experiment with CLIP, a text-to-image search model, to look for animals matching specific characteristics in a dataset of pictures. Does CLIP know which one is “The fastest animal in the world”?</p>
+
+
+ ZanzoCam: An open-source alpine web camera
+ https://www.zansara.dev/talks/2021-05-24-zanzocam-pavia/
+ Mon, 24 May 2021 00:00:00 +0000
+ https://www.zansara.dev/talks/2021-05-24-zanzocam-pavia/
+ <p>Slides: <a href="https://www.zansara.dev/talks/2021-05-24-zanzocam-pavia.pdf" >ZanzoCam: An open-source alpine web camera</a></p>
<hr>
<p>On May 24th 2021 I held a talk about the <a href="https://zanzocam.github.io/en" class="external-link" target="_blank" rel="noopener">ZanzoCam project</a>
as invited speaker for the <a href="http://hsw2021.gnudd.com/" class="external-link" target="_blank" rel="noopener">“Hardware and Software Codesign”</a> course at
<a href="https://portale.unipv.it/it" class="external-link" target="_blank" rel="noopener">Università di Pavia</a>.</p>
<p>The slides go through the entire lifecycle of the <a href="https://zanzocam.github.io/en" class="external-link" target="_blank" rel="noopener">ZanzoCam project</a>,
from its very inception through the market research, our decision process and the early prototypes, and
then give a more detailed explanation of the design and implementation of the project from
a hardware and software perspective, with some notes about our financial situation and project management.</p>
+
+
+
diff --git a/talks/page/1/index.html b/talks/page/1/index.html
new file mode 100644
index 00000000..94491636
--- /dev/null
+++ b/talks/page/1/index.html
@@ -0,0 +1,10 @@
+
+
+
+ https://www.zansara.dev/talks/
+
+
+
+
+
+
diff --git a/themes/hugo-coder/.gitignore b/themes/hugo-coder/.gitignore
deleted file mode 100644
index 2be26dd2..00000000
--- a/themes/hugo-coder/.gitignore
+++ /dev/null
@@ -1,6 +0,0 @@
-.idea
-**/themes/
-exampleSite/public/
-exampleSite/resources/
-*.lock
-public
diff --git a/themes/hugo-coder/CONTRIBUTORS.md b/themes/hugo-coder/CONTRIBUTORS.md
deleted file mode 100644
index 3b29b679..00000000
--- a/themes/hugo-coder/CONTRIBUTORS.md
+++ /dev/null
@@ -1,136 +0,0 @@
-# Contributors
-
-- [Chip Senkbeil](https://github.com/chipsenkbeil)
-- [Dale Noe](https://github.com/dalenoe)
-- [Gabor Nagy](https://github.com/Aigeruth)
-- [Harry Khanna](https://github.com/hkhanna)
-- [Ihor Dvoretskyi](https://github.com/idvoretskyi)
-- [Jacob Wood](https://github.com/jacoblukewood)
-- [Jan Baudisch](https://github.com/flyingP0tat0)
-- [Jiri Hubacek](https://github.com/qeef)
-- [Khosrow Moossavi](https://github.com/khos2ow)
-- [Maikel](https://github.com/mbollemeijer)
-- [MetBril](https://github.com/metbril)
-- [Myles Johnson](https://github.com/MylesJohnson)
-- [Niels Reijn](https://github.com/reijnn)
-- [Padraic Renaghan](https://github.com/prenagha)
-- [peterrus](https://github.com/peterrus)
-- [Philipp Rintz](https://github.com/p-rintz)
-- [Ralf Junghanns](https://github.com/rabbl)
-- [rdhox](https://rdhox.io)
-- [tobaloidee](https://github.com/Tobaloidee)
-- [Tomasz Wąsiński](https://github.com/wasinski)
-- [Vinícius dos Santos Oliveira](https://github.com/vinipsmaker)
-- [Vlad Ionescu](https://github.com/Vlaaaaaaad)
-- [Joseph Ting](https://github.com/josephting)
-- [Abner Campanha](https://github.com/abnerpc)
-- [Martin Kiesel](https://github.com/Kyslik)
-- [John Tobin](https://www.johntobin.ie/)
-- [Thomas Nys](https://thomasnys.com)
-- [Piotr Januszewski](https://piojanu.github.io)
-- [Artem Khvastunov](https://artspb.me)
-- [Gabriel Nepomuceno](https://blog.nepomuceno.me)
-- [Salvatore Giordano](https://salvatore-giordano.github.io)
-- [Jeffrey Carpenter](https://uvolabs.me)
-- [Paul Lettington](https://github.com/plett)
-- [Thomas Vochten](https://github.com/thomasvochten)
-- [Caspar Krieger](https://www.asparck.com)
-- [D_DAndrew](https://d-dandrew.github.io)
-- [Wataru Mizukami](https://github.com/tarumzu)
-- [Yudi Widiyanto](https://github.com/yudiwdynto)
-- [Łukasz Mróz](https://github.com/mrozlukasz)
-- [Jia "Jay" Tan](https://github.com/j7an)
-- [Ryan](https://github.com/alrayyes)
-- [Naim A.](https://github.com/naim94a)
-- [Alexander Rohde](https://github.com/a1x42)
-- [Shreyansh Khajanchi](https://shreyanshja.in)
-- [Lionel Brianto](https://lionel.brianto.dev)
-- [Luis Zarate](https://github.com/jlzaratec)
-- [Ariejan de Vroom](https://www.devroom.io)
-- [Bobby Lindsey](https://bobbywlindsey.com)
-- [José Mª Escartín](https://github.com/jme52)
-- [John Schroeder](https://blog.schroedernet.software)
-- [Tobias Lindberg](https://github.com/tobiasehlert)
-- [KK](https://github.com/bebound)
-- [Eli W. Hunter](https://github.com/elihunter173)
-- [Víctor López](https://github.com/viticlick)
-- [Anson VanDoren](https://github.com/anson-vandoren)
-- [Michael Lynch](https://github.com/mtlynch)
-- [FIGBERT](https://figbert.com/)
-- [Yash Mehrotra](https://yashmehrotra.com)
-- [Paolo Mainardi](https://paolomainardi.com)
-- [Ka-Wai Lin](https://github.com/kwlin)
-- [Piotr Orzechowski](https://orzechowski.tech)
-- [Glenn Feunteun](https://github.com/gfeun)
-- [Santiago González](https://github.com/netrules)
-- [Codruț Constantin Gușoi](https://www.codrut.pro)
-- [Clément Pannetier](https://clementpannetier.dev)
-- [FantasticMao](https://github.com/FantasticMao)
-- [Utkarsh Gupta](https://utkarsh2102.com)
-- [Latiif Alsharif](https://latiif.se)
-- [Endormi](https://endormi.io)
-- [Rajiv Ranjan Singh](https://iamrajiv.github.io/)
-- [Pakhomov Alexander](https://github.com/PakhomovAlexander)
-- [Rhys Perry](https://rhysperry.com)
-- [Arunvel Sriram](https://github.com/arunvelsriram)
-- [Lorenzo Cameroni](https://github.com/came88)
-- [Jared Sturdy](https://github.com/jsturdy)
-- [Daniel Monteiro](https://github.com/dfamonteiro)
-- [Dave Rolsky](https://github.com/autarch)
-- [Joseph Sanders](https://github.com/jls83)
-- [Rabin Adhikari](https://github.com/rabinadk1/)
-- [Hussaini Zulkifli](https://github.com/hussaini/)
-- [Ellison Leão](https://github.com/ellisonleao)
-- [Lucas de Oliveira](https://github.com/lucas-dOliveira)
-- [Jian Loong Liew](https://github.com/JianLoong)
-- [earnest ma](https://github.com/earnestma)
-- [TMineCola](https://github.com/tminecola)
-- [Arafat Hasan](https://github.com/arafat-hasan)
-- [YUJI](https://yuji.ne.jp/)
-- [JaeSang Yoo](https://github.com/JSYoo5B)
-- [tianheg](https://github.com/tianheg)
-- [Felix](https://github.com/lazyyz)
-- [Peter Duchnovsky](https://pduchnovsky.com)
-- [Alex Miranda](https://ammiranda.com)
-- [Alphonse Mariya](https://github.com/alfunx)
-- [Ziwei Pan](https://github.com/PanZiwei/)
-- [Viktar Patotski](https://github.com/xp-vit)
-- [cuso4-5h2o](https://www.cuso4.me)
-- [freeformz](https://icanhazdowntime.org)
-- [Roberto Gongora](https://yourfavourite.blog)
-- [kuba86](https://kuba86.com)
-- [Vladislav Matus](https://github.com/matusvla)
-- [Kirill Feoktistov](https://feoktistoff.org)
-- [leins275](https://github.com/LanskovNV)
-- [Michael Weiss](https://mweiss.ch)
-- [Simon Pai](https://github.com/simonpai)
-- [Brenton Mallen](https://github.com/brentonmallen1)
-- [Xiaoyang Luo](https://github.com/ccviolett/)
-- [Michiel Appelman](https://appelman.se)
-- [Mark Wood](https://digitalnotions.net)
-- [Sam A.](https://samsapti.dev)
-- [John Feminella](https://jxf.me)
-- [zzsqwq](https://zzsqwq.cn)
-- [George Tsiokos](https://george.tsiokos.com)
-- [Eltjo](https://github.com/eltjo)
-- [Saurmandal](https://saur.neocities.org)
-- [Jneo8](https://github.com/jneo8)
-- [Daniel Nduati](https://github.com/DanNduati)
-- [Simon Hollingshead](https://github.com/simonhollingshead)
-- [yangyangdaji](https://github.com/yangyangdaji)
-- [xiaotianxt](https://github.com/xiaotianxt)
-- [Nour Agha](https://github.com/nourkagha)
-- [Brian Lachniet](https://github.com/blachniet)
-- [ShortArrow](https://github.com/ShortArrow)
-- [Martin Hellspong](https://github.com/marhel)
-- [Robert Tucker](https://github.com/robertwtucker)
-- [Michał Pawlik](https://michalp.net)
-- [Kilian Kluge](https://github.com/ionicsolutions)
-- [Jaroslaw Rozanski](https://jarekrozanski.eu)
-- [Easton Man](https://github.com/eastonman)
-- [Yiğit Altınay](https://altinay.xyz)
-- [Fei Kong](https://github.com/alpha0422)
-- [Ahmet Enes Bayraktar](https://github.com/aeb-dev)
-- [Todor Bogosavljević](https://github.com/tbx1b)
-- [Kemal Akkoyun](https://github.com/kakkoyun)
-- [Igetin](https://github.com/Igetin)
diff --git a/themes/hugo-coder/LICENSE.md b/themes/hugo-coder/LICENSE.md
deleted file mode 100644
index 29dbd757..00000000
--- a/themes/hugo-coder/LICENSE.md
+++ /dev/null
@@ -1,20 +0,0 @@
-The MIT License (MIT)
-
-Copyright (c) 2018 Luiz F. A. de Prá
-
-Permission is hereby granted, free of charge, to any person obtaining a copy of
-this software and associated documentation files (the "Software"), to deal in
-the Software without restriction, including without limitation the rights to
-use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of
-the Software, and to permit persons to whom the Software is furnished to do so,
-subject to the following conditions:
-
-The above copyright notice and this permission notice shall be included in all
-copies or substantial portions of the Software.
-
-THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
-IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
-FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
-COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
-IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
-CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
diff --git a/themes/hugo-coder/README.md b/themes/hugo-coder/README.md
deleted file mode 100644
index 482a132f..00000000
--- a/themes/hugo-coder/README.md
+++ /dev/null
@@ -1,53 +0,0 @@
-