RAG collection

John Pazarzis edited this page Oct 1, 2024 · 1 revision

Creating a Custom RAG Collection

Overview

A RAG collection is a fundamental component of the RAGit system. Each collection is uniquely identified by its collection name (or simply its name). This document outlines the steps for creating and managing a custom RAG collection.

Definition: RAG Collection (or simply Collection)

A RAG collection is a set of documents stored under the shared directory ragit-data. For example, the documents of a collection named mydata are stored in the following directory:

~/ragit-data/mydata/documents

Prepare the Documents Directory

To create a new RAG collection, you need to prepare the documents directory where your collection's documents will be stored.

Create a Directory

Create a directory to store your collection's documents:

mkdir -p ~/ragit-data/<collection-name>/documents

Replace <collection-name> with the desired name for your collection.
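As a concrete illustration, the sketch below creates the documents directory for the example collection name mydata used earlier on this page. The shell variables collection and docs_dir are introduced here only for readability; they are not settings that ragit reads.

```shell
# "mydata" is the example collection name from this page; substitute your own.
collection="mydata"
docs_dir="$HOME/ragit-data/$collection/documents"

# -p creates any missing parent directories and succeeds if they already exist.
mkdir -p "$docs_dir"

# Confirm that the directory is in place.
ls -ld "$docs_dir"
```

Because of the -p flag, running the command again for an existing collection is harmless.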

Copy Relevant Documents

After creating the directory, copy all relevant documents into it:

cp path/to/your/documents/* ~/ragit-data/<collection-name>/documents/
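The copy step can be sketched end to end as follows. The source directory here is a temporary stand-in for path/to/your/documents, and the two .txt files are placeholders created only so the example runs on its own; this page does not state which document formats RAGit accepts.

```shell
collection="mydata"                               # example name from this page
docs_dir="$HOME/ragit-data/$collection/documents"
src_dir="$(mktemp -d)"                            # stand-in for path/to/your/documents

# Placeholder documents, created only for this illustration.
printf 'first sample document\n'  > "$src_dir/a.txt"
printf 'second sample document\n' > "$src_dir/b.txt"

mkdir -p "$docs_dir"
cp "$src_dir"/* "$docs_dir"/

# Count the regular files now present in the documents directory.
copied=$(find "$docs_dir" -type f | wc -l)
echo "files in $docs_dir: $copied"
```

Note that a plain `cp source/*` does not copy subdirectories and that the `*` pattern skips hidden files, so only the top-level visible files of the source directory are copied.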

Process Documents and Create Index

The ragit command is available from anywhere on the VM and is used to interact with the backend of the RAGit service. It provides the following functionality:

Display Available RAG Collections

List all available RAG collections using the following command:

ragit -l

Example output:

dummy
mycode
stories

Show the Statistics for a RAG Collection

Display statistics for a specific RAG collection using the following command, replacing <collection-name> with your collection's name:

ragit -n <collection-name>

Example output:

name.....................: stories
full path................: /home/vagrant/ragit-data/stories/documents
total documents..........: 4
total documents in db....: 4
total chunks.............: 21
with embeddings..........: 21
without embeddings.......: 0
inserted to vectordb.....: 21
to insert to vector db...: 0
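If you want to check these numbers from a script, the statistics text can be parsed with standard tools. The sketch below assumes the output keeps the "label....: value" layout shown above; pending_count is a hypothetical helper defined here, not part of ragit. Reading zero for both "without embeddings" and "to insert to vector db" as "fully processed" is an interpretation based on the field names.

```shell
# pending_count LABEL: read statistics text on stdin and print the value
# of the line whose label starts with LABEL (hypothetical helper).
pending_count() {
    awk -F': *' -v label="$1" 'index($1, label) == 1 { print $2 }'
}

# Sample statistics copied from the example output on this page.
sample_stats='name.....................: stories
full path................: /home/vagrant/ragit-data/stories/documents
total documents..........: 4
total documents in db....: 4
total chunks.............: 21
with embeddings..........: 21
without embeddings.......: 0
inserted to vector db....: 21
to insert to vector db...: 0'

missing=$(printf '%s\n' "$sample_stats" | pending_count "without embeddings")
queued=$(printf '%s\n' "$sample_stats" | pending_count "to insert to vector db")

if [ "$missing" -eq 0 ] && [ "$queued" -eq 0 ]; then
    echo "collection is fully processed"
else
    echo "pending work: $missing chunks without embeddings, $queued chunks to insert"
fi
```

On the VM, the same helper could read live output, for example `ragit -n stories | pending_count "without embeddings"`.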

Process the Data for a RAG Collection

Process the available documents for a specific RAG collection using the following command, replacing <collection-name> with your collection's name:

ragit -n <collection-name> -p

Example output:

Will insert all available chunks to the database.
Inserted 0 chunks.
Will insert all available embeddings to the database.
Inserted 0 embeddings.
updating the vector db.
Totally inserted records: 0
Inserted 0 chunks to the vector db.
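The commands in this section can be chained into one short session. The sketch below uses only the -l, -n, and -p options documented on this page, with stories as the example collection name; the check for ragit on the PATH is added so the script exits cleanly when run outside the RAGit VM.

```shell
collection="stories"   # example collection name from this page

if command -v ragit >/dev/null 2>&1; then
    ragit -l                     # list all available collections
    ragit -n "$collection"       # statistics before processing
    ragit -n "$collection" -p    # process the documents and update the vector db
    ragit -n "$collection"       # statistics after processing
    status="ran"
else
    echo "ragit not found on PATH; run this on the RAGit VM" >&2
    status="skipped"
fi
```

Comparing the two statistics listings shows what the processing step changed for the collection.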

Summary

By following these steps, you can create and manage a custom RAG collection within the RAGit framework: set up the documents directory, copy the relevant documents into it, and use the ragit command-line tool to process the collection. Once processed, your data is indexed and ready for use in RAG-based applications.