
Refactor LLMAgent to include response length controls and metrics #59

Merged: merged 2 commits into main from leonvanbokhorst/issue57, Nov 16, 2024

Conversation

leonvanbokhorst (Owner) commented Nov 16, 2024

This commit refactors the LLMAgent class in casa/01_politeness.py to include response length controls and conversation metrics. The response_limits dictionary is added to the ExperimentParams class to define limits for response length, including maximum tokens, minimum tokens, maximum ratio of response to prompt length, and a recovery threshold. The _apply_length_controls method is implemented to truncate responses that exceed the maximum ratio, and the _needs_recovery method is added to check if response length needs rebalancing. Additionally, the _calculate_turn_balance method is introduced to calculate the turn balance score based on recent conversation history.

The changes aim to improve the naturalness of AI conversations by controlling response length and monitoring conversation metrics. Fixes #58: "As a CASA researcher, I want to enforce natural conversation patterns through length controls and response balancing to achieve more human-like interactions."
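The parameters described above can be sketched roughly as follows. The field names (max_tokens, min_tokens, max_ratio, recovery_threshold) come from the PR text; the concrete default values are illustrative assumptions, not the ones in casa/01_politeness.py:

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ExperimentParams:
    # Hypothetical defaults for illustration; the real values live in
    # casa/01_politeness.py
    response_limits: Dict[str, float] = field(default_factory=lambda: {
        "max_tokens": 512,          # hard cap on response length
        "min_tokens": 10,           # floor below which a response is too terse
        "max_ratio": 2.0,           # response at most 2x the prompt length
        "recovery_threshold": 1.5,  # sustained ratio that triggers rebalancing
    })
```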

Summary by Sourcery

Refactor the LLMAgent class to introduce response length controls and conversation metrics, enhancing the naturalness of AI interactions. Update ExperimentParams to support these new features.

New Features:

  • Introduce response length controls in the LLMAgent class to manage conversation naturalness by setting limits on response length and ratios.
  • Add conversation metrics tracking to monitor and improve the naturalness of AI interactions.

Enhancements:

  • Refactor the LLMAgent class to include methods for applying length controls and calculating conversation metrics.
  • Update the ExperimentParams class to include a response_limits dictionary for configuring response length parameters.

…mat for easy analysis while making AI conversations more natural.

Fixes #57
sourcery-ai bot (Contributor) commented Nov 16, 2024

Reviewer's Guide by Sourcery

This PR implements response length controls and conversation metrics in the LLMAgent class to create more natural AI conversations. The implementation adds a response_limits configuration, methods for controlling response length, and comprehensive conversation metrics tracking with CSV logging.

No diagrams generated as the changes look simple and do not need a visual representation.

File-Level Changes

Added response length control system to maintain natural conversation patterns
  • Added response_limits configuration with max_tokens, min_tokens, max_ratio, and recovery_threshold
  • Implemented _apply_length_controls method to truncate responses exceeding max ratio
  • Added _needs_recovery method to check if length rebalancing is needed
  • Added flow_metrics tracking for turn balance and recovery attempts
casa/01_politeness.py
Implemented comprehensive conversation metrics tracking
  • Added metrics tracking for prompt length, response length, and ratios
  • Added detection of questions and previous reference mentions
  • Implemented social markers extraction
  • Added turn balance score calculation
  • Created metrics logging system with CSV output
casa/01_politeness.py
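The turn balance score calculation is not spelled out here; one plausible sketch, assuming it averages how evenly matched prompt and response lengths are over the last few turns (1.0 = perfectly balanced), might look like:

```python
from typing import Dict, List


def calculate_turn_balance(history: List[Dict], window: int = 3) -> float:
    """Hypothetical reading of _calculate_turn_balance: score each recent
    turn by how close prompt and response are in length, then average."""
    recent = history[-window:]
    if not recent:
        return 1.0
    scores = []
    for turn in recent:
        prompt_len = len(turn["input"])
        response_len = len(turn["response"])
        longer = max(prompt_len, response_len) or 1  # avoid division by zero
        scores.append(min(prompt_len, response_len) / longer)
    return sum(scores) / len(scores)
```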
Enhanced experiment results management
  • Added dedicated results directory handling
  • Implemented structured metrics DataFrame with explicit dtypes
  • Added continuous metrics logging to prevent data loss
  • Updated Monte Carlo simulation to include new metrics
casa/01_politeness.py
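A minimal sketch of the "structured metrics DataFrame with explicit dtypes" idea; the column names are taken from the metrics listed above, but the exact schema in casa/01_politeness.py may differ:

```python
import pandas as pd

# Hypothetical column set, drawn from the metrics named in this PR
METRIC_DTYPES = {
    "prompt_length": "int64",
    "response_length": "int64",
    "ratio": "float64",
    "contains_question": "bool",
    "turn_balance_score": "float64",
}


def make_metrics_df() -> pd.DataFrame:
    """Create an empty metrics frame with explicit dtypes so appended rows
    and the CSV round-trip keep consistent types."""
    return pd.DataFrame(
        {col: pd.Series(dtype=dt) for col, dt in METRIC_DTYPES.items()}
    )
```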

Assessment against linked issues

Issue Objective Addressed Explanation
#58 Implement length controls with specified parameters (max_tokens, min_tokens, max_ratio, recovery_threshold)
#58 Implement balance recovery system to monitor ratios and force shorter responses when needed
#58 Add conversation flow metrics tracking (turn balance, sustained ratio, recovery attempts)


@sourcery-ai sourcery-ai bot changed the title @sourcery-ai Refactor LLMAgent to include response length controls and metrics Nov 16, 2024
@leonvanbokhorst leonvanbokhorst self-assigned this Nov 16, 2024
@leonvanbokhorst leonvanbokhorst added the enhancement New feature or request label Nov 16, 2024
@leonvanbokhorst leonvanbokhorst added this to the Phase 1 milestone Nov 16, 2024
sourcery-ai bot (Contributor) left a comment
Hey @leonvanbokhorst - I've reviewed your changes - here's some feedback:

Overall Comments:

  • Consider making the response_limits configurable through environment variables or a config file rather than hardcoding the values in ExperimentParams
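A hedged sketch of that suggestion; the CASA_ environment-variable prefix and the helper name are made up for illustration:

```python
import os


def limits_from_env(defaults: dict) -> dict:
    """Overlay default response limits with CASA_<NAME> environment
    variables, e.g. CASA_MAX_RATIO=1.5. Values are coerced to the type
    of the corresponding default."""
    limits = dict(defaults)
    for key in limits:
        raw = os.environ.get(f"CASA_{key.upper()}")
        if raw is not None:
            limits[key] = type(limits[key])(raw)
    return limits
```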
Here's what I looked at during the review
  • 🟡 General issues: 2 issues found
  • 🟢 Security: all looks good
  • 🟢 Testing: all looks good
  • 🟡 Complexity: 2 issues found
  • 🟢 Documentation: all looks good

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

    if current_ratio > self.params.response_limits["max_ratio"]:
        # Truncate response to maintain ratio
        max_length = int(len(message) * self.params.response_limits["max_ratio"])
        response = response[:max_length].rsplit('.', 1)[0] + '.'

issue: Add fallback handling when no period is found in the response text

The current implementation assumes the truncated slice contains at least one period. If it doesn't, `rsplit('.', 1)[0]` returns the whole slice unchanged and the appended period lands mid-word. Consider adding a fallback to simple truncation when no period is found.
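A sketch of such a fallback; the helper name and the ellipsis choice are illustrative, not the PR's actual code:

```python
def truncate_at_sentence(text: str, max_length: int) -> str:
    """Truncate to max_length, preferring a sentence boundary; fall back
    to a hard cut with an ellipsis when the slice contains no period."""
    if len(text) <= max_length:
        return text
    clipped = text[:max_length]
    head, sep, _ = clipped.rpartition(".")
    if sep:  # a period exists: keep whole sentences only
        return head + "."
    return clipped.rstrip() + "…"  # fallback: plain truncation
```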


    self.metrics_df = pd.concat([self.metrics_df, new_row], ignore_index=True)
    # Save after each update to prevent data loss
    self.metrics_df.to_csv(self.metrics_file, index=False)

suggestion (performance): Consider batching metrics saves for better performance

Saving to CSV after every update could impact performance. Consider batching saves (e.g., every N turns or using a timer-based approach) while maintaining data safety.

        # Assumes `import time` and that __init__ initializes
        # self._metrics_updates = 0 and self._last_save = time.time()
        self._metrics_updates += 1
        if self._metrics_updates >= 10 or time.time() - self._last_save > 60:
            self.metrics_df.to_csv(self.metrics_file, index=False)
            self._metrics_updates = 0
            self._last_save = time.time()

    for entry in self.conversation_history[-3:]:  # Only include last 3 turns for context
        history.append(f"Input: {entry['input']}")
        history.append(f"Response: {entry['response']}\n")
    return "\n".join(history)

    async def send_message(self, message: str) -> Dict:

issue (complexity): Consider splitting the send_message method into smaller focused methods for handling different responsibilities.

The send_message method has become overly complex by mixing message handling with metrics collection and flow control. Consider extracting these into separate methods:

async def send_message(self, message: str) -> Dict:
    """Send message to Ollama and get response with metadata."""
    metrics = self._collect_input_metrics(message)

    response_data = await self._get_ollama_response(message)

    flow_metrics = self._collect_flow_metrics(
        message, 
        response_data["response"]
    )

    return {
        **response_data,
        **metrics,
        **flow_metrics
    }

def _collect_input_metrics(self, message: str) -> Dict:
    """Collect metrics about the input message."""
    return {
        "prompt_length": len(message),
        "contains_question": "?" in message,
        "references_previous": any(
            ref in message.lower() 
            for ref in ["earlier", "previous", "before", "you said"]
        ),
        "social_markers": self._extract_social_markers(message)
    }

def _collect_flow_metrics(self, message: str, response: str) -> Dict:
    """Collect metrics about conversation flow."""
    ratio = len(response) / len(message) if len(message) > 0 else 0
    return {
        "ratio": ratio,
        "recovery_triggered": self._needs_recovery(ratio),
        "recovery_successful": False,
        "turn_balance_score": self._calculate_turn_balance()
    }

This refactoring:

  • Separates concerns into focused methods
  • Makes the main flow easier to understand
  • Keeps all functionality intact
  • Improves maintainability

    markers.extend(word.strip("*") for word in text.split("*")[1::2])
    return markers

    def _apply_length_controls(self, message: str, response: str) -> str:
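As an aside on the fragment above: `text.split("*")[1::2]` takes every odd-indexed segment after splitting on asterisks, which is exactly the spans between paired `*` markers. A self-contained sketch:

```python
from typing import List


def extract_social_markers(text: str) -> List[str]:
    """Return the spans between paired asterisks, i.e. *emphasized* words.
    Splitting on "*" puts marked spans at odd indices, so [1::2] picks them."""
    return [word.strip("*") for word in text.split("*")[1::2]]
```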

issue (complexity): Consider consolidating the length control logic into a single streamlined method.

The length control system can be simplified while maintaining the same functionality. Consider consolidating the logic into a single method:

def _apply_length_controls(self, message: str, response: str) -> str:
    """Apply length controls to maintain natural conversation flow."""
    # Apply absolute length limit
    if len(response) > self.params.response_limits["max_tokens"]:
        response = response[:self.params.response_limits["max_tokens"]].rsplit('.', 1)[0] + '.'

    # Apply ratio-based limit if needed
    if len(message) > 0:
        max_ratio = self.params.response_limits["max_ratio"]
        if len(response) > len(message) * max_ratio:
            max_length = int(len(message) * max_ratio)
            response = response[:max_length].rsplit('.', 1)[0] + '.'

    return response

This simplified approach:

  • Removes the complex recovery system while preserving ratio-based limits
  • Consolidates length control logic in one place
  • Maintains natural conversation flow with clearer code
  • Eliminates need for state tracking of recovery attempts

    @@ -69,6 +105,10 @@ async def send_message(self, message: str) -> Dict:
            error = str(e)

        end_time = datetime.now()

        # Calculate ratio first
        ratio = len(response_text) / len(message) if len(message) > 0 else 0

issue (code-quality): Simplify comparison to string length (simplify-str-len-comparison)


    def _apply_length_controls(self, message: str, response: str) -> str:
        """Apply length controls to maintain natural conversation patterns."""
        current_ratio = len(response) / len(message) if len(message) > 0 else 0
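The hint refers to Python truthiness: an empty string is falsy, so `if len(message) > 0` can be written as `if message`. For example:

```python
def length_ratio(message: str, response: str) -> float:
    """Response-to-prompt length ratio; `if message` is equivalent to
    `if len(message) > 0` because empty strings are falsy."""
    return len(response) / len(message) if message else 0.0
```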

issue (code-quality): Simplify comparison to string length (simplify-str-len-comparison)

@leonvanbokhorst leonvanbokhorst merged commit bc891a5 into main Nov 16, 2024
1 check failed
@leonvanbokhorst leonvanbokhorst deleted the leonvanbokhorst/issue57 branch November 16, 2024 10:32