An LLM is trained on a large corpus of text, but that corpus is not exhaustive. Providing specific information to the model as context yields pertinent responses without additional training. As we saw in the previous exercise, an LLM is limited in the number of tokens it can process, and RAG is a good way to work around this limitation.
In this exercise, we will implement the RAG (Retrieval-Augmented Generation) pattern.
This pattern consists of three steps:
- An Extract, Transform and Load (ETL) process that reads the document content, chunks it, and stores it in a vector database
- A similarity query against the vector database to retrieve the pieces of information relevant to the question
- A query to the LLM with the question and the retrieved information as context
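To make the three steps concrete before wiring in Spring AI and Redis, here is a minimal plain-Java sketch. It is only an illustration under strong assumptions: the class and method names are hypothetical, the sample text is made up, chunks are naive sentences, and retrieval counts shared words instead of comparing embedding vectors.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy illustration only: every name in this class is hypothetical.
final class NaiveRagSketch {

    // Step 1 (ETL): split the document into chunks, here one chunk per sentence.
    static List<String> chunk(String document) {
        List<String> chunks = new ArrayList<>();
        for (String sentence : document.split("(?<=\\.)\\s+")) {
            if (!sentence.isBlank()) {
                chunks.add(sentence.trim());
            }
        }
        return chunks;
    }

    // Lower-cased word set: a crude stand-in for an embedding vector.
    static Set<String> words(String text) {
        Set<String> result = new HashSet<>(Arrays.asList(text.toLowerCase().split("\\W+")));
        result.remove("");
        return result;
    }

    // Number of distinct words a chunk shares with the question.
    static int overlap(String chunk, Set<String> questionWords) {
        Set<String> common = words(chunk);
        common.retainAll(questionWords);
        return common.size();
    }

    // Step 2 (retrieval): keep the topK chunks that share the most words with the question.
    static List<String> retrieve(List<String> chunks, String question, int topK) {
        Set<String> questionWords = words(question);
        return chunks.stream()
                .sorted(Comparator.comparingInt((String c) -> overlap(c, questionWords)).reversed())
                .limit(topK)
                .toList();
    }

    // Step 3 (generation): build the prompt that would be sent to the LLM.
    static String buildPrompt(List<String> context, String question) {
        return "Context:\n" + String.join("\n", context) + "\nQuestion:\n" + question;
    }

    public static void main(String[] args) {
        // Made-up sample text, not taken from the workshop document.
        List<String> chunks = chunk("Cars can be rented for 30 days. Pets are not allowed. Fuel is not included.");
        String question = "How many days can cars be rented?";
        System.out.println(buildPrompt(retrieve(chunks, question, 1), question));
    }
}
```

In the exercise itself, the word-overlap score is replaced by embeddings stored in Redis, and the prompt is sent to the model through Spring AI.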
We will use the ETL and transformers features of Spring AI, with Redis as the vector database.
Uncomment the following dependencies in the pom.xml file:
- spring-ai-transformers-spring-boot-starter
- spring-ai-tika-document-reader
- spring-ai-redis
- jedis
Uncomment the following properties in the application.yml file to configure the connection to the Redis instance:
spring:
  ai:
    vectorstore:
      redis:
        uri: redis://localhost:6379
        index: "default-index"
        prefix: "default:"
Modify the RAGDataService class to complete the ETL implementation.
- Add a constructor that accepts a VectorStore object and a Resource annotated with @Value("classpath:data/rental-general-conditions.pdf"), and assign them to the corresponding attributes.
private RAGDataService(VectorStore vectorStore,
                       @Value("classpath:data/rental-general-conditions.pdf")
                       Resource document) {
    this.vectorStore = vectorStore;
    this.document = document;
}
- Complete the extract() method to read the document with a new TikaDocumentReader object and return its content.
final TikaDocumentReader reader = new TikaDocumentReader(document);
return reader.get();
- Complete the transform() method to chunk the document content with a new TokenTextSplitter object configured with the following parameters, and return the chunks.
TextSplitter textSplitter =
    new TokenTextSplitter(
        100,  // defaultChunkSize
        50,   // minChunkSizeChars
        20,   // minChunkLengthToEmbed
        100,  // maxNumChunks
        true  // keepSeparator
    );
return textSplitter.apply(documents);
- Complete the load() method to store the chunks in the vector database by calling the accept method on the vectorStore attribute.
This ETL process will only be executed during application startup.
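To illustrate what the transform() step does conceptually, here is a simplified chunker in plain Java. It is a sketch under stated assumptions, not a description of TokenTextSplitter: it counts whitespace-separated words rather than model tokens, and the class name WordChunker and its chunkSize parameter are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: splits a text into chunks of at most chunkSize words.
final class WordChunker {

    static List<String> chunk(String text, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<String> chunks = new ArrayList<>();
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return chunks; // nothing to split
        }
        String[] words = trimmed.split("\\s+");
        // Walk the word array in steps of chunkSize; the last chunk may be shorter.
        for (int start = 0; start < words.length; start += chunkSize) {
            int end = Math.min(start + chunkSize, words.length);
            chunks.add(String.join(" ", Arrays.copyOfRange(words, start, end)));
        }
        return chunks;
    }

    public static void main(String[] args) {
        // Made-up sample text, split into chunks of at most 4 words.
        chunk("The renter must return the vehicle with a full tank", 4)
                .forEach(System.out::println);
    }
}
```

Smaller chunks make each search hit more focused but carry less surrounding context, which is why the chunk size has to be tuned to the documents and questions at hand.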
Still in the RAGDataService class, complete the getContextForQuestion() method to search the vector database for chunks similar to the question and return them as a single string separated by line breaks.
public String getContextForQuestion(String question) {
    List<String> chunks = vectorStore.similaritySearch(question)
            .stream()
            .map(Document::getContent).toList();
    System.out.println(chunks.size() + " chunks found");
    return String.join("\n", chunks);
}
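The similarity search is delegated to the vector store, so the ranking logic is not visible in the method above. As a rough mental model, assuming cosine similarity as the distance metric (the source does not state which metric is configured), a search over an in-memory map of vectors could be sketched as follows. The class name, the map-based store and the tiny two-dimensional vectors are all hypothetical.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for a vector store query.
final class CosineSearchSketch {

    // Cosine similarity: dot product divided by the product of the vector norms.
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same dimension");
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0; // a zero vector has no direction, treat it as dissimilar
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Returns the keys of the k stored vectors most similar to the query, best first.
    static List<String> topK(Map<String, double[]> store, double[] query, int k) {
        return store.entrySet().stream()
                .sorted(Comparator.comparingDouble(
                        (Map.Entry<String, double[]> e) -> cosine(e.getValue(), query)).reversed())
                .limit(k)
                .map(Map.Entry::getKey)
                .toList();
    }

    public static void main(String[] args) {
        // Made-up chunk labels and vectors, for illustration only.
        Map<String, double[]> store = Map.of(
                "vehicles", new double[] {1.0, 0.0},
                "insurance", new double[] {0.0, 1.0},
                "duration", new double[] {0.7, 0.7});
        // Same joining step as getContextForQuestion(): one retrieved chunk per line.
        System.out.println(String.join("\n", topK(store, new double[] {1.0, 0.1}, 2)));
    }
}
```

In the exercise, the vectors come from the embedding model and the search itself is performed by Redis.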
Modify the RAGService class.
Add a constructor that accepts ChatClient.Builder, RAGDataService and Resource objects and sets them to the corresponding attributes. Also instantiate the promptTemplate attribute.
public RAGService(ChatClient.Builder builder, RAGDataService dataService, @Value("classpath:/prompt-system.md") Resource promptSystem) {
    this.chatClient = builder
            .defaultSystem(promptSystem)
            .build();
    this.dataService = dataService;
    promptTemplate = new PromptTemplate("""
            Context:
            {context}
            Question:
            {question}
            """);
}
Complete the getResponse() method.
- Call the getContextForQuestion method on the dataService attribute with the question as argument and store the result in a context variable.
String context = dataService.getContextForQuestion(question);
- Map the context and the question to the promptTemplate using the createMessage() method.
Message message = promptTemplate.createMessage(Map.of("context", context, "question", question));
- Call the LLM with this block of code and return the response.
Prompt prompt = new Prompt(message);
OllamaOptions options = OllamaOptions.builder()
        .model("mistral:7b")
        .temperature(0.9)
        .build();
System.out.println("Preparing the answer...");
return chatClient.prompt(prompt).options(options)
        .stream()
        .chatResponse().toStream()
        .map(ChatResponse::getResults)
        .flatMap(List::stream)
        .map(Generation::getOutput)
        .map(AssistantMessage::getText);
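To make the createMessage() step in the list above more tangible, the placeholder substitution it performs can be pictured with the plain-Java sketch below. This is an assumption-laden illustration and does not reproduce how Spring AI's PromptTemplate is implemented: the class TemplateSketch and its render method are hypothetical, and only simple {name} placeholders are handled.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: fills {name} placeholders in a template from a map.
final class TemplateSketch {

    private static final Pattern PLACEHOLDER = Pattern.compile("\\{(\\w+)\\}");

    // Single pass over the template, so inserted values are never re-scanned.
    // Placeholders without a matching key are left untouched.
    static String render(String template, Map<String, String> values) {
        return PLACEHOLDER.matcher(template).replaceAll(match ->
                Matcher.quoteReplacement(values.getOrDefault(match.group(1), match.group())));
    }

    public static void main(String[] args) {
        String template = "Context:\n{context}\nQuestion:\n{question}\n";
        // Made-up context and question, for illustration only.
        System.out.print(render(template, Map.of(
                "context", "Rentals are limited to 30 days.",
                "question", "How long can I rent a car?")));
    }
}
```

The rendered text is what ends up in the user message: the retrieved chunks first, then the question, on top of the system prompt configured in the constructor.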
If needed, the solution can be checked in the solution/exercise-4 folder.
- Make sure that the ollama container is running
- Make sure that the redis container is running
- Run the application
- In the application prompt, type the etl command (just once) to load the data
- In the application prompt, type the rag command and ask a question about the document content. Here are some examples:
rag list the vehicles categories available for rent
rag list the contract coverages
rag how long is the maximum rental duration ?
- The response can take some time to generate, so please be patient
The RAG pattern is a pragmatic way to query an LLM with specific context information. The LLM query itself is the same as in the previous exercises, but the context is narrowed by a retrieval step that runs before the LLM is called. This step relies on ETL utilities and a vector database.
The solution implemented in this exercise is the naïve approach. More efficient and scalable approaches exist and can be adapted to the use case.
Here are some points to remember about this exercise:
- A vector representation of information allows querying by similarity
- Questions must be formulated precisely to get pertinent search results
- Filters can be added to queries sent to the vector database
- Chunk size is important and must be adapted to the use case
- Data can be processed before storage to make it RAG-oriented
- Spring AI provides a complete ETL solution that handles many types of documents
- Spring AI provides an abstraction API over vector databases
- Spring AI provides the QuestionAnswerAdvisor class, which implements data retrieval from a vector store and context preparation
This is the end of this workshop. It's time to conclude.