In RAG, attached chunks are joined without separator, potentially leading to concatenations that are misleading for the LLM #229

Open
Boscop opened this issue Jan 21, 2025 · 0 comments

Boscop commented Jan 21, 2025

Is there really no separator inserted between the retrieved chunks when they are added to the user prompt? This matters because the chunks can be split mid-sentence and are usually trimmed of whitespace.

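```rust
impl CompletionRequest {
    pub(crate) fn prompt_with_context(&self) -> String {
        if !self.documents.is_empty() {
            format!(
                "<attachments>\n{}</attachments>\n\n{}",
                self.documents
                    .iter()
                    .map(|doc| doc.to_string())
                    .collect::<Vec<_>>()
                    // The chunks are concatenated with no separator between them.
                    .join(""),
                self.prompt
            )
        } else {
            // ... (rest of the function not quoted here)
        }
    }
}
```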
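To illustrate the effect, here is a minimal standalone sketch that mimics the `.join("")` step on plain strings. It is not the library's code: it assumes each document renders to its chunk text, and the function name and the two chunk texts are hypothetical.

```rust
/// Hypothetical stand-in for the quoted `.join("")` step: trims each chunk
/// (as chunkers usually do) and concatenates them with no separator.
fn join_without_separator(chunks: &[&str]) -> String {
    chunks
        .iter()
        .map(|c| c.trim())
        .collect::<Vec<_>>()
        .join("")
}

fn main() {
    // Two hypothetical chunks from different documents, each split mid-sentence.
    let chunks = [
        "Drug X is not recommended",
        "for adults over 18. The usual dose is 200 mg.",
    ];
    println!("{}", join_without_separator(&chunks));
}
```

This prints `Drug X is not recommendedfor adults over 18. The usual dose is 200 mg.` Note that joining with a single space instead would not help much here: it would produce "Drug X is not recommended for adults over 18.", a new and false statement that neither source document makes.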

Without a separator, two incomplete sentences (the end of one chunk and the beginning of the next) can run together into a new sentence that the LLM will interpret as a single statement, which can convey semantically incorrect information!
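One possible mitigation is to mark each chunk's boundaries explicitly, in the same tag style as the existing `<attachments>` wrapper. The sketch below is a standalone approximation, not the library's code: it takes the chunks as `&str` rather than the real `documents` field, and the `<chunk id="N">` tag and the helper names are hypothetical. A plain `.join("\n\n")` would be a smaller change; per-chunk tags additionally make it unambiguous where each retrieved chunk starts and ends.

```rust
/// Hypothetical helper (not part of the library): wraps every chunk in its own
/// `<chunk id="N">` tag so chunk boundaries stay visible to the model.
fn format_attachments(chunks: &[&str]) -> String {
    chunks
        .iter()
        .enumerate()
        .map(|(i, chunk)| format!("<chunk id=\"{i}\">\n{chunk}\n</chunk>\n"))
        .collect::<String>()
}

/// Standalone analogue of `prompt_with_context`, taking plain strings.
fn prompt_with_context(chunks: &[&str], prompt: &str) -> String {
    if chunks.is_empty() {
        // No retrieved context: send the prompt unchanged.
        return prompt.to_string();
    }
    format!(
        "<attachments>\n{}</attachments>\n\n{}",
        format_attachments(chunks),
        prompt
    )
}

fn main() {
    let chunks = [
        "Drug X is not recommended",
        "for adults over 18. The usual dose is 200 mg.",
    ];
    println!("{}", prompt_with_context(&chunks, "What is the usual dose of drug X?"));
}
```

With this layout the two fragments end up in separate, numbered tags, so the model is less likely to read them as one sentence.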

#184
