Refactor LLMAgent to include response length controls and metrics #59
Conversation
…mat for easy analysis while making AI conversations more natural. Fixes #57
This commit refactors the `LLMAgent` class in `casa/01_politeness.py` to include response length controls and conversation metrics. The `response_limits` dictionary is added to the `ExperimentParams` class to define limits for response length: maximum tokens, minimum tokens, maximum ratio of response to prompt length, and a recovery threshold. The `_apply_length_controls` method is implemented to truncate responses that exceed the maximum ratio, the `_needs_recovery` method is added to check whether response length needs rebalancing, and the `_calculate_turn_balance` method is introduced to calculate a turn balance score from recent conversation history. The changes aim to improve the naturalness of AI conversations by controlling response length and monitoring conversation metrics. This addresses the user story: as a CASA researcher, I want to enforce natural conversation patterns through length controls and response balancing to achieve more human-like interactions. Fixes #58
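The PR description does not show `_needs_recovery` or `_calculate_turn_balance` in full, so here is a minimal sketch of what they might look like, assuming `recovery_threshold` bounds the response/prompt ratio and the balance score averages ratios over the last three turns (both the window and the scoring formula are assumptions, not the actual implementation):

def _needs_recovery(self, ratio: float) -> bool:
    """Flag a response whose length ratio drifts past the recovery threshold."""
    return ratio > self.params.response_limits["recovery_threshold"]

def _calculate_turn_balance(self) -> float:
    """Score turn balance over recent history (1.0 = perfectly balanced)."""
    recent = self.conversation_history[-3:]
    ratios = [
        len(e["response"]) / len(e["input"])
        for e in recent
        if e["input"]
    ]
    if not ratios:
        return 1.0
    avg = sum(ratios) / len(ratios)
    # A mean ratio near 1.0 means neither side dominates the exchange
    return 1.0 / (1.0 + abs(avg - 1.0))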
Reviewer's Guide by Sourcery

This PR implements response length controls and conversation metrics in the `LLMAgent` class to create more natural AI conversations. The implementation adds a `response_limits` configuration, methods for controlling response length, and comprehensive conversation metrics tracking with CSV logging. No diagrams generated, as the changes look simple and do not need a visual representation.

File-Level Changes
Hey @leonvanbokhorst - I've reviewed your changes - here's some feedback:
Overall Comments:
- Consider making the response_limits configurable through environment variables or a config file rather than hardcoding the values in ExperimentParams
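A minimal sketch of what that could look like with environment variables (the `CASA_*` variable names and default values here are hypothetical, not part of the PR):

import os

def _load_response_limits() -> dict:
    """Read response limits from the environment, falling back to defaults."""
    return {
        "max_tokens": int(os.getenv("CASA_MAX_TOKENS", "512")),
        "min_tokens": int(os.getenv("CASA_MIN_TOKENS", "10")),
        "max_ratio": float(os.getenv("CASA_MAX_RATIO", "2.5")),
        "recovery_threshold": float(os.getenv("CASA_RECOVERY_THRESHOLD", "1.5")),
    }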
Here's what I looked at during the review
- 🟡 General issues: 2 issues found
- 🟢 Security: all looks good
- 🟢 Testing: all looks good
- 🟡 Complexity: 2 issues found
- 🟢 Documentation: all looks good
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
if current_ratio > self.params.response_limits["max_ratio"]:
    # Truncate response to maintain ratio
    max_length = int(len(message) * self.params.response_limits["max_ratio"])
    response = response[:max_length].rsplit('.', 1)[0] + '.'
issue: Add fallback handling when no period is found in the response text

The current implementation assumes there is at least one period in the truncated response. If there isn't, `rsplit('.', 1)[0]` returns the entire truncated string and the appended `'.'` leaves a mid-word cut with a stray period. Consider falling back to simple truncation when no period is found.
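A minimal fallback sketch, reusing the variables from the hunk above:

max_length = int(len(message) * self.params.response_limits["max_ratio"])
truncated = response[:max_length]
if '.' in truncated:
    # Cut back to the last complete sentence
    response = truncated.rsplit('.', 1)[0] + '.'
else:
    # No sentence boundary found: fall back to a plain hard cut
    response = truncated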
self.metrics_df = pd.concat([self.metrics_df, new_row], ignore_index=True)
# Save after each update to prevent data loss
self.metrics_df.to_csv(self.metrics_file, index=False)
suggestion (performance): Consider batching metrics saves for better performance
Saving to CSV after every update could impact performance. Consider batching saves (e.g., every N turns or using a timer-based approach) while maintaining data safety.
self._metrics_updates += 1
if self._metrics_updates >= 10 or time.time() - self._last_save > 60:
    self.metrics_df.to_csv(self.metrics_file, index=False)
    self._metrics_updates = 0
    self._last_save = time.time()
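For that snippet to work, the counter and timestamp would need initializing, e.g. in `__init__` (a sketch; attribute names follow the snippet above, and `time` is assumed to be imported at module level):

import time

# In LLMAgent.__init__:
self._metrics_updates = 0
self._last_save = time.time()

A final flush on shutdown (or in a `finally` block) would preserve the data-safety guarantee of the original save-every-turn approach.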
for entry in self.conversation_history[-3:]:  # Only include last 3 turns for context
    history.append(f"Input: {entry['input']}")
    history.append(f"Response: {entry['response']}\n")
return "\n".join(history)
async def send_message(self, message: str) -> Dict:
issue (complexity): Consider splitting the send_message method into smaller focused methods for handling different responsibilities.

The `send_message` method has become overly complex by mixing message handling with metrics collection and flow control. Consider extracting these into separate methods:
async def send_message(self, message: str) -> Dict:
    """Send message to Ollama and get response with metadata."""
    metrics = self._collect_input_metrics(message)
    response_data = await self._get_ollama_response(message)
    flow_metrics = self._collect_flow_metrics(
        message,
        response_data["response"]
    )
    return {
        **response_data,
        **metrics,
        **flow_metrics
    }

def _collect_input_metrics(self, message: str) -> Dict:
    """Collect metrics about the input message."""
    return {
        "prompt_length": len(message),
        "contains_question": "?" in message,
        "references_previous": any(
            ref in message.lower()
            for ref in ["earlier", "previous", "before", "you said"]
        ),
        "social_markers": self._extract_social_markers(message)
    }

def _collect_flow_metrics(self, message: str, response: str) -> Dict:
    """Collect metrics about conversation flow."""
    ratio = len(response) / len(message) if len(message) > 0 else 0
    return {
        "ratio": ratio,
        "recovery_triggered": self._needs_recovery(ratio),
        "recovery_successful": False,
        "turn_balance_score": self._calculate_turn_balance()
    }
This refactoring:
- Separates concerns into focused methods
- Makes the main flow easier to understand
- Keeps all functionality intact
- Improves maintainability
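Callers would be unaffected by this extraction: `result = await agent.send_message("Hello")` still returns a single dict, now assembled from the response data plus the two metric groups.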
    markers.extend(word.strip("*") for word in text.split("*")[1::2])
    return markers

def _apply_length_controls(self, message: str, response: str) -> str:
issue (complexity): Consider consolidating the length control logic into a single streamlined method.
The length control system can be simplified while maintaining the same functionality. Consider consolidating the logic into a single method:
def _apply_length_controls(self, message: str, response: str) -> str:
    """Apply length controls to maintain natural conversation flow."""
    # Apply absolute length limit
    if len(response) > self.params.response_limits["max_tokens"]:
        response = response[:self.params.response_limits["max_tokens"]].rsplit('.', 1)[0] + '.'
    # Apply ratio-based limit if needed
    if len(message) > 0:
        max_ratio = self.params.response_limits["max_ratio"]
        if len(response) > len(message) * max_ratio:
            max_length = int(len(message) * max_ratio)
            response = response[:max_length].rsplit('.', 1)[0] + '.'
    return response
This simplified approach:
- Removes the complex recovery system while preserving ratio-based limits
- Consolidates length control logic in one place
- Maintains natural conversation flow with clearer code
- Eliminates need for state tracking of recovery attempts
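If this consolidation is adopted, note that the no-period fallback flagged earlier in this review would apply to both `rsplit('.', 1)` calls here as well.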
@@ -69,6 +105,10 @@ async def send_message(self, message: str) -> Dict:
        error = str(e)

    end_time = datetime.now()

    # Calculate ratio first
    ratio = len(response_text) / len(message) if len(message) > 0 else 0
issue (code-quality): Simplify comparison to string length (`simplify-str-len-comparison`)
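This rule replaces explicit length comparisons with a truthiness check on the string itself, e.g.:

# Before
ratio = len(response_text) / len(message) if len(message) > 0 else 0
# After
ratio = len(response_text) / len(message) if message else 0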
def _apply_length_controls(self, message: str, response: str) -> str:
    """Apply length controls to maintain natural conversation patterns."""
    current_ratio = len(response) / len(message) if len(message) > 0 else 0
issue (code-quality): Simplify comparison to string length (`simplify-str-len-comparison`)
Summary by Sourcery
Refactor the LLMAgent class to introduce response length controls and conversation metrics, enhancing the naturalness of AI interactions. Update ExperimentParams to support these new features.