File Citation for File Search (Retrieval) #160

Open
marioseixas opened this issue Jun 23, 2024 · 3 comments
Labels
enhancement New feature or request

Comments

@marioseixas

I managed to do that in the OpenAI Assistants playground by:

  • before adding the documents to the vector store, recording the network requests and saving the HAR file. From that we get the full list linking each PDF to its respective “file_id”;
  • then, before running an assistant request in the OAI playground, recording the network requests again and downloading the HAR file at the end;
  • from that last HAR file we get the “sources” shown in the output, linked to their corresponding “file_citation” and “file_id”, plus the index position of the embedding of the excerpt used;
  • then writing a script that uses regex find-and-replace to give each passage a properly formatted reference (APA, ABNT, MLA, etc.)
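The first step above (getting the “file_id” to PDF list out of a HAR file) could be sketched roughly as follows. This is only a sketch: collectFileNames is a hypothetical name, and it assumes that some captured JSON response bodies contain objects with an `id` starting with "file-" and a `filename` field. The actual playground payloads may be shaped differently.

```javascript
// Sketch: build a file_id -> filename map from a parsed HAR capture.
// Assumption: some JSON response bodies contain objects with an `id`
// like "file-..." and a `filename` string; real payloads may differ.
function collectFileNames(har) {
  const names = new Map();

  // Recursively visit every object in a parsed JSON body.
  const visit = (node) => {
    if (Array.isArray(node)) {
      node.forEach(visit);
    } else if (node && typeof node === "object") {
      if (
        typeof node.id === "string" &&
        node.id.startsWith("file-") &&
        typeof node.filename === "string"
      ) {
        names.set(node.id, node.filename);
      }
      Object.values(node).forEach(visit);
    }
  };

  for (const entry of har.log?.entries ?? []) {
    const body = entry.response?.content?.text;
    if (!body) continue;
    try {
      visit(JSON.parse(body));
    } catch {
      // Body is not plain JSON (HTML, base64, etc.); skip this entry.
    }
  }
  return names;
}
```

The HAR file itself is JSON, so it can be loaded with JSON.parse and passed in directly; the resulting map is what a later find-and-replace script would look filenames up in.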
@jackitaliano jackitaliano changed the title about “ Show the quote from the file used to generate the response when BIDARA uses knowledge retrieval. https://platform.openai.com/docs/assistants/how-it-works/message-annotations” File Citation for File Search (Retrieval) Jun 23, 2024
@jackitaliano
Contributor

Nice to haves

...

@jackitaliano
Contributor

jackitaliano commented Jun 23, 2024

File citations are found in the "thread.message.delta" event of the event stream (with streaming).

Example:

{
  "id":"msg_rn9sYJnDGgP1CIliFFOjsnPm",
  "object":"thread.message.delta",
  "delta": {
    "content": [ 
      { 
        "index":0,
        "type":"text",
        "text": {
          "value":"【8:0†source】",
          "annotations": [
            {
              "index":0,
              "type":"file_citation",
              "text":"【8:0†source】",
              "start_index":536,
              "end_index":548,
              "file_citation": {
                  "file_id":"file-y6QXl1TdKON4MIoJKLlZ4cKf",
                  "quote":"<quote from file here>"
              }
            }
          ]
        }
      }
    ]
  }
}

Currently, bidara-deep-chat is not using streaming, so the response will not have this shape. However, streaming is likely to be implemented soon (see #73), so it might be better to plan the implementation around streaming rather than having to update it again afterwards.

Assuming streaming is implemented, these objects can be accessed via:
AssistantDeepChat.svelte

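<script>
...
// Replace file-citation markers in a streamed message delta with the quoted text.
async function responseInterceptor(response) {
  if (response.object === "thread.message.delta") {
    // Each entry of delta.content is a single content part (an object, not an array).
    const newContent = response.delta.content.map((msg) => {
      // Only text parts carry a value and annotations.
      if (msg.type !== "text" || !msg.text?.value) {
        return msg;
      }

      // Annotations live on msg.text, next to the text value.
      (msg.text.annotations ?? []).forEach((annotation) => {
        if (annotation.type === "file_citation") {
          const quote = `Quote:\n"${annotation.file_citation.quote}"`;
          // replace() returns a new string, so assign the result back.
          // The replacer function keeps any "$" in the quote literal.
          msg.text.value = msg.text.value.replace(annotation.text, () => quote);
        }
      });

      return msg;
    });

    response.delta.content = newContent;
  }

  return response;
}
...
</script>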

This would also need to be implemented in thread loading via:
threadUtils.js

async function convertThreadMessagesToMessages(threadId, threadMessages) { ... }

In both cases, likely best to add something like a handleCitations(...) function.
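A rough sketch of what such a shared helper might look like, so both the interceptor and the thread-loading path could call it. The signature is an assumption: here handleCitations takes the `text` object from the example above ({ value, annotations }) and returns a copy with the markers replaced by inline quotes. The replacement format is just the inline-quote option, not a settled choice.

```javascript
// Hypothetical helper: replace each file-citation marker in a text part
// with its quote. `textPart` is assumed to have the shape of the `text`
// object in the delta example: { value, annotations }.
function handleCitations(textPart) {
  let value = textPart.value ?? "";

  for (const annotation of textPart.annotations ?? []) {
    if (annotation.type !== "file_citation") continue;

    const quote = `Quote:\n"${annotation.file_citation.quote}"`;
    // A replacer function keeps "$" sequences in the quote literal.
    value = value.replace(annotation.text, () => quote);
  }

  // Return a copy so the caller's original object is left untouched.
  return { ...textPart, value };
}
```

Returning a new object instead of mutating makes the helper easier to reuse in convertThreadMessagesToMessages, where the same message may be processed more than once.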

Not entirely sure what the best form of replacement is for these citations. Could do something like the inline quote, cite them as they are with quotes at the bottom, proper citations (MLA, APA, etc. like you mentioned) inline or at the bottom, or something else entirely if you had any ideas.

@jackitaliano jackitaliano added the enhancement New feature or request label Jun 23, 2024
@marioseixas
Author

marioseixas commented Jun 23, 2024

Enclose each citation in BibTeX citation syntax, ‘\cite{filename.pdf}’ or ‘\cite{filename}’, at the end of every sentence that has a ‘source’ mark, and at the end add a BibTeX entry with the document data keys and the quote as a BibTeX comment within.
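Something like the following might work as a starting point for the BibTeX idea. It is a sketch under assumptions: toBibtexCitations and the `fileNames` lookup (file_id to filename) are hypothetical, the annotation only carries the file_id, and the @misc entry with a single `note` field is a placeholder for whatever document data keys are actually available. The quote is written as a `%` line just above its entry rather than inside it, since BibTeX ignores text outside entries.

```javascript
// Sketch: turn file citations into \cite{key} markers plus BibTeX entries.
// `fileNames` (Map of file_id -> filename) is an assumed lookup; the
// @misc type and `note` field are placeholders for real document data.
function toBibtexCitations(textPart, fileNames) {
  let value = textPart.value ?? "";
  const sources = new Map(); // citation key -> { filename, quotes }

  for (const annotation of textPart.annotations ?? []) {
    if (annotation.type !== "file_citation") continue;

    const { file_id, quote } = annotation.file_citation;
    const filename = fileNames.get(file_id) ?? file_id;
    // Citation key: the filename without its extension.
    const key = filename.replace(/\.[^.]+$/, "");

    value = value.replace(annotation.text, () => `\\cite{${key}}`);

    if (!sources.has(key)) sources.set(key, { filename, quotes: [] });
    // Flatten whitespace so each quote stays on one comment line.
    sources.get(key).quotes.push(String(quote ?? "").replace(/\s+/g, " "));
  }

  const bibliography = [...sources]
    .map(([key, { filename, quotes }]) => {
      const comments = quotes.map((q) => `% quote: ${q}`).join("\n");
      return `${comments}\n@misc{${key},\n  note = {${filename}}\n}`;
    })
    .join("\n\n");

  return { value, bibliography };
}
```

Filenames containing spaces or other characters that are not valid in a BibTeX key would need extra sanitizing, which this sketch does not attempt.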
