This second exercise focuses on conversational memory. We will improve our LLM client so it can store the conversation history and pass it as context in LLM queries.
Modify the `LLMService` class.

Add a `List<Message>` attribute called `history` and instantiate it in the constructor.
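As a sketch, the new field and its initialization could look like this (the constructor parameters shown are placeholders for whatever your existing constructor already takes):

```java
// Conversation history, shared across calls to askQuestion
private final List<Message> history;

public LLMService(/* your existing constructor parameters */) {
    // ... existing initialization ...
    this.history = new ArrayList<>();
}
```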
In the `askQuestion` method, add the user question to the history before the return statement.
```java
public Stream<String> askQuestion(final String question) {
    Message userMessage = new UserMessage(question);
    history.add(userMessage);
    return getResponse(userMessage);
}
```
Create a new private method that appends an `AssistantMessage` (passed as an argument) to the history and returns this `AssistantMessage`.
```java
private AssistantMessage appendToHistory(AssistantMessage assistantMessage) {
    history.add(assistantMessage);
    return assistantMessage;
}
```
In the `getResponse` method, modify the return statement to append the response content to the conversation history, using the `stream().chatResponse()` and `appendToHistory` methods.
```java
return chatClient.prompt(prompt)
        .options(options)
        .stream()
        .chatResponse().toStream()
        .map(ChatResponse::getResults)
        .flatMap(List::stream)
        .map(Generation::getOutput)
        .map(this::appendToHistory)
        .map(AssistantMessage::getText);
```
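Note that because the response is streamed, `appendToHistory` is called once per streamed chunk: the assistant's answer ends up in the history as a sequence of partial `AssistantMessage` instances rather than a single message. This is good enough for this exercise; the advisor classes mentioned in the key points below handle this bookkeeping automatically.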
In `getResponse`, add the `history` list content to the existing `messages` list (before the user message).
```java
List<Message> messages = new ArrayList<>();
messages.addAll(history);
messages.add(userMessage);
```
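Putting the pieces together, `getResponse` could now look roughly like the sketch below (the `options` field and the `Prompt` construction are assumptions about your existing code; adapt as needed):

```java
private Stream<String> getResponse(final Message userMessage) {
    // Conversation history first, then the current user message
    List<Message> messages = new ArrayList<>();
    messages.addAll(history);
    messages.add(userMessage);

    Prompt prompt = new Prompt(messages);

    return chatClient.prompt(prompt)
            .options(options)
            .stream()
            .chatResponse().toStream()
            .map(ChatResponse::getResults)
            .flatMap(List::stream)
            .map(Generation::getOutput)
            .map(this::appendToHistory)
            .map(AssistantMessage::getText);
}
```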
If needed, the solution can be checked in the `solution/exercise-2` folder.
- Make sure the Ollama container is running
- Run the application
- In the application prompt, type `llm Give me 8 famous dishes from Japan`
- Check the response
- Try a new query: `llm Classify them into categories with meat and without meat`
- Is the response better now? (We hope so! ;)
In this exercise, we added a conversation history feature to our LLM client, making the LLM able to follow a conversation. Here are some key points:
- Conversation history can be provided as context in a query
- LLM input token volume is limited, so how to compress the history is an open question
- Conversation history is the first step toward step-by-step reasoning approaches such as few-shot prompting or Chain-of-Thought (CoT) and Tree-of-Thought (ToT) prompting, which consist in splitting the reasoning and chaining multiple queries
- Spring AI provides advisor classes to manage conversation history automatically (see bonus 1 and the sketch below)
- Model Context Protocol (MCP) can be used to optimise conversation history management and address the token volume problem
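For a taste of the advisor-based approach, here is a minimal sketch assuming Spring AI 1.x class names (`MessageWindowChatMemory`, `MessageChatMemoryAdvisor`); exact APIs vary between versions, and bonus 1 covers this properly:

```java
// Sketch: let Spring AI manage the history instead of doing it by hand.
// A message-window memory keeps the last N messages, and the advisor
// injects them into every prompt automatically.
ChatMemory chatMemory = MessageWindowChatMemory.builder()
        .maxMessages(20)
        .build();

ChatClient chatClient = ChatClient.builder(chatModel)
        .defaultAdvisors(MessageChatMemoryAdvisor.builder(chatMemory).build())
        .build();
```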
Let's add some document content as context in the next exercise!