Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dev->switch-to-pg #1010

Closed
wants to merge 57 commits into from
Closed

dev->switch-to-pg #1010

wants to merge 57 commits into from

Conversation

Vedantsahai18
Copy link
Member

@Vedantsahai18 Vedantsahai18 commented Jan 2, 2025

PR Type

Enhancement, Bug fix, Tests, Documentation


Description

  • Introduced MMR search and configurable parameters for enhanced document retrieval.

  • Refactored metrics with Prometheus Summary and Histogram for improved observability.

  • Added health check endpoint and integrated it into the API gateway.

  • Enhanced changelog generation with a new script and GitHub Action workflow.


Changes walkthrough 📝

Relevant files
Enhancement
16 files
gather_messages.py
Added MMR search and configurable parameters for document retrieval
+104/-31
generate_changelog.py
Added changelog generation script with Julep API integration
+189/-0 
counters.py
Refactored metrics with Prometheus Summary and Histogram 
+75/-19 
check_health.py
Added health check endpoint for API                                           
+19/-0   
generate-changelog.yml
Added GitHub Action for changelog generation                         
+83/-0   
Sessions.py
Updated RecallOptions model with new search parameters     
+33/-2   
create_session.py
Integrated RecallOptions into session creation                     
+7/-4     
utils.py
Added recall options handling and improved error handling
+10/-1   
search_docs_by_embedding.py
Updated document search logic with new indices                     
+6/-3     
update_session.py
Enhanced session update with recall options                           
+4/-3     
router.py
Added router for health check endpoint                                     
+3/-0     
get_session.py
Added transformation for session retrieval                             
+8/-1     
list_sessions.py
Enhanced session listing with updated transformations       
+7/-1     
prepare_session_data.py
Updated session data preparation logic                                     
+1/-0     
patch_session.py
Enhanced session patching with updated transformations     
+1/-1     
models.tsp
Updated RecallOptions model with additional fields             
+30/-1   
Documentation
2 files
header.html
Added HTML template for changelog header                                 
+24/-0   
CHANGELOG.md
Updated changelog with recent changes                                       
+102/-9 
Configuration changes
1 files
traefik.yml.template
Integrated health check endpoint into API gateway               
+11/-2   
Tests
1 files
test_chat_routes.py
Updated tests for chat routes with new recall options       
+12/-5   
Additional files
49 files
execute_system.py +57/-3   
prompt_step.py +3/-0     
utils.py +4/-3     
Docs.py +6/-0     
sessions.py +1/-0     
tasks.py +9/-1     
create_agent.py +2/-2     
create_or_update_agent.py +2/-2     
patch_agent.py +2/-2     
update_agent.py +2/-2     
prepare_chat_context.py +1/-0     
create_doc.py +2/-2     
search_docs_by_text.py +6/-3     
create_entries.py +2/-2     
create_execution.py +2/-2     
create_execution_transition.py +3/-3     
create_temporal_lookup.py +2/-2     
update_execution.py +3/-3     
create_file.py +2/-2     
create_or_update_session.py +2/-2     
create_or_update_task.py +2/-2     
create_task.py +2/-2     
patch_task.py +2/-2     
update_task.py +2/-2     
create_tools.py +2/-2     
patch_tool.py +2/-2     
update_tool.py +2/-2     
create_or_update_user.py +2/-2     
create_user.py +2/-2     
patch_user.py +2/-2     
update_user.py +2/-2     
__init__.py +3/-0     
web.py +2/-0     
transition.py +0/-1     
migrate_1733985509_add_columns_to_indices.py +26/-0   
test_execution_workflow.py +0/-1     
04-hook-generator-trending-reels.ipynb +1142/-336
04-hook-generator-trending-reels.py +24/-25 
README.md +1/-1     
Docs.py +6/-0     
Sessions.py +33/-2   
spider.py +1/-1     
spider.py +5/-5     
authors.md +9/-0     
changelog.yaml +111/-0 
node-sdk +1/-1     
python-sdk +1/-1     
models.tsp +4/-1     
openapi-1.0.0.yaml +75/-3   

💡 PR-Agent usage: Comment /help "your question" on any pull request to receive relevant information

Vedantsahai18 and others added 30 commits December 8, 2024 18:20
feat(agents-api): Add missing columns to indices
feat: Add prometheus summary for cozo queries
feat(agents-api): Added mmr search and get history system tool + configurable doc search params in chat.py
fix: bug fixs for github action generate changelog
x/memory-store: revert docker cozo
fix(agents-api): Fix None-valued `passed_settings` from overriding agent's settings in `prompt` step
…nsitions

fix(agents-api): Allow consecutive `finish_branch` transitions
Vedantsahai18 and others added 24 commits December 14, 2024 00:15
fix(agents-api): Miscellaneous fixes in `sessions`
fix(agents-api): Bug fixes in sessions
fix(agents-api): Session & system tool fixes
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
chore(agents-api): Add `/healthz` endpoint
Backtick in the colab link URL (decompiles as UTF8 %60 in the URL)

This created an invalid path to the Google Colab File

just a single backtick edit. that's it ⛵
Copy link
Contributor

@standard-input standard-input bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No issues flagged.
Standard Input can make mistakes. Check important info.

Copy link
Contributor

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
🧪 PR contains tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Memory Leak

The MMR algorithm implementation may cause memory issues with large document sets since it loads all document embeddings into memory at once. Consider implementing batching or streaming for large result sets.

# We shouldn't be having references without embeddings.
doc_references = [
    doc for doc in doc_references if doc.snippet.embedding is not None
]

# Apply MMR
indices = maximal_marginal_relevance(
    np.asarray(query_embedding),
    [doc.snippet.embedding for doc in doc_references],
    k=recall_options.limit,
)
# Apply MMR
doc_references = [
    doc for i, doc in enumerate(doc_references) if i in set(indices)
]
Error Handling

Process pool executor error handling needs improvement. When handler throws an exception, the process dies silently and error is not properly captured.

# FIXME: When the handler throws an exception, the process dies and the error is not captured. Instead it throws:
# "concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending."
loop = asyncio.get_running_loop()
Resource Management

The metrics implementation may lead to resource leaks since histogram buckets are unbounded (INF). Consider adding upper bounds based on expected latency ranges.

buckets = (
    0.005,
    0.01,
    0.025,
    0.05,
    0.075,
    0.1,
    0.25,
    0.5,
    0.75,
    1.0,
    2.5,
    5.0,
    7.5,
    10.0,
    15.0,
    20.0,
    25.0,
    30.0,
    INF,
)

Copy link
Contributor

PR Code Suggestions ✨

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Score
Possible issue
Add proper validation and error handling for document embeddings before applying MMR

The code assumes doc.snippet.embedding is not None without proper error handling,
which could cause runtime errors if the assumption is wrong.

agents-api/agents_api/models/chat/gather_messages.py [203-207]

-# Apply MMR
+# Validate embeddings before applying MMR
+doc_embeddings = [doc.snippet.embedding for doc in doc_references if doc.snippet and doc.snippet.embedding is not None]
+if not doc_embeddings:
+    return doc_references[:recall_options.limit]
+    
 indices = maximal_marginal_relevance(
     np.asarray(query_embedding),
-    [doc.snippet.embedding for doc in doc_references],
-    k=recall_options.limit,
+    doc_embeddings,
+    k=min(recall_options.limit, len(doc_embeddings)),
 )
  • Apply this suggestion
Suggestion importance[1-10]: 9

Why: The suggestion addresses a critical potential runtime error by adding proper validation of embeddings and graceful fallback, preventing crashes when embeddings are missing.

9
Add error handling for critical authentication steps to prevent silent failures

Add error handling for the GitHub CLI authentication step. If authentication fails,
the workflow should fail early and clearly indicate the issue.

.github/workflows/generate-changelog.yml [18-20]

 - name: Setup GitHub CLI
   run: |
-    echo "${{ secrets.GITHUB_TOKEN }}" | gh auth login --with-token
+    if ! echo "${{ secrets.GITHUB_TOKEN }}" | gh auth login --with-token; then
+      echo "Failed to authenticate with GitHub CLI"
+      exit 1
+    fi
  • Apply this suggestion
Suggestion importance[1-10]: 8

Why: The suggestion adds crucial error handling for the GitHub CLI authentication step, which is a critical part of the workflow. Without proper error handling, authentication failures could lead to silent failures and hard-to-debug issues.

8
Add existence check before file deletion to prevent potential errors

The script deletes pr_data.json unconditionally without checking if the file exists
first, which could raise a FileNotFoundError.

scripts/generate_changelog.py [174-175]

-# delete the pr_data.json file
-os.remove('pr_data.json')
-logging.info("Deleted pr_data.json file")
+# delete the pr_data.json file if it exists
+if os.path.exists('pr_data.json'):
+    os.remove('pr_data.json')
+    logging.info("Deleted pr_data.json file")
  • Apply this suggestion
Suggestion importance[1-10]: 7

Why: The suggestion prevents potential FileNotFoundError by adding a necessary existence check before file deletion, making the code more robust and reliable.

7
Validate required secrets early in the workflow to prevent runtime failures

Add validation for required secrets before running the workflow to prevent runtime
failures due to missing credentials.

.github/workflows/generate-changelog.yml [67-70]

-env:
-  JULEP_API_KEY: ${{ secrets.JULEP_API_KEY }}
-  TASK_UUID: ${{ secrets.TASK_UUID }}
-  AGENT_UUID: ${{ secrets.AGENT_UUID }}
+- name: Check required secrets
+  run: |
+    if [ -z "${{ secrets.JULEP_API_KEY }}" ] || [ -z "${{ secrets.TASK_UUID }}" ] || [ -z "${{ secrets.AGENT_UUID }}" ]; then
+      echo "Error: Required secrets are missing"
+      exit 1
+    fi
+- name: Send PR data to Python script
+  if: steps.collect_prs.outputs.pr_data != '[]'
+  id: generate_changelog
+  env:
+    JULEP_API_KEY: ${{ secrets.JULEP_API_KEY }}
+    TASK_UUID: ${{ secrets.TASK_UUID }}
+    AGENT_UUID: ${{ secrets.AGENT_UUID }}
  • Apply this suggestion
Suggestion importance[1-10]: 7

Why: Early validation of required secrets is important to fail fast and provide clear error messages. This prevents the workflow from running partially and failing later with less clear error messages.

7
Security
Protect health check endpoints from potential abuse with rate limiting

Add rate limiting middleware to the health check endpoint to prevent potential DoS
attacks.

gateway/traefik.yml.template [51-58]

 agents-api-healthz:
   entryPoints:
     - web
   rule: Path(`/api/healthz`)
   middlewares:
     - agents-api-strip-prefix-api
+    - agents-api-ratelimit
   service: service-agents-api
   priority: 3
  • Apply this suggestion
Suggestion importance[1-10]: 9

Why: Adding rate limiting to the health check endpoint is a critical security measure to prevent potential DoS attacks. Health check endpoints are often targeted for abuse, making this a high-priority security enhancement.

9
General
Improve error handling in retry mechanism to provide user-friendly error messages on failure

The retry decorator should handle exceptions more gracefully. Currently, if a retry
fails after max attempts, it will raise the raw exception. Consider wrapping the
final exception with a more user-friendly HTTPException.

agents-api/agents_api/models/utils.py [228-232]

 @retry(
     stop=stop_after_attempt(4),
     wait=wait_exponential(multiplier=1, min=4, max=10),
     retry=retry_if_exception(is_resource_busy),
-    reraise=True,
+    reraise=False,
 )
+def wrapper(*args: P.args, client=None, **kwargs: P.kwargs) -> pd.DataFrame:
+    try:
+        return func(*args, client=client, **kwargs)
+    except Exception as e:
+        raise HTTPException(status_code=503, detail="Service temporarily unavailable") from e
  • Apply this suggestion
Suggestion importance[1-10]: 8

Why: The suggestion significantly improves error handling by wrapping retried exceptions in a user-friendly HTTPException, preventing raw internal errors from reaching the client.

8
Optimize code performance by moving dictionary filtering operation outside of loop scope

The passed_settings dictionary filtering should be moved before the message
processing loop to improve performance, as it's currently unnecessarily inside the
loop scope.

agents-api/agents_api/activities/task_steps/prompt_step.py [184-192]

+# Remove None values from passed_settings (avoid overwriting agent's settings)
+passed_settings = {k: v for k, v in passed_settings.items() if v is not None}
+
 for message in prompt
     ]
-
-# Remove None values from passed_settings (avoid overwriting agent's settings)
-passed_settings = {k: v for k, v in passed_settings.items() if v is not None}
 
 # Use litellm for other models
 completion_data: dict = {
  • Apply this suggestion
Suggestion importance[1-10]: 7

Why: Valid performance optimization that moves dictionary filtering outside the loop scope to avoid redundant operations. This change would improve code efficiency without affecting functionality.

7

Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good to me! Reviewed everything up to a166d86 in 1 minute and 29 seconds

More details
  • Looked at 2580 lines of code in 68 files
  • Skipped 1 files when reviewing.
  • Skipped posting 8 drafted comments based on config settings.
1. .github/workflows/generate-changelog.yml:11
  • Draft comment:
    Consider removing 'ref: dev' if the workflow is intended to run on the default branch to avoid potential errors if the 'dev' branch is not available.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The code uses 'ref: dev' in the checkout step, which might not be necessary if the workflow is intended to run on the default branch. This could lead to confusion or errors if the 'dev' branch is not available.
2. .github/workflows/generate-changelog.yml:15
  • Draft comment:
    Check if 'gh auth login' is necessary, as GITHUB_TOKEN might already be available in the environment, making this step redundant.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The 'Setup GitHub CLI' step uses 'gh auth login' which might not be necessary if the GITHUB_TOKEN is already available in the environment. This could be redundant.
3. .github/workflows/generate-changelog.yml:45
  • Draft comment:
    Ensure all necessary dependencies are listed here to avoid runtime errors.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The 'Install dependencies' step installs 'PyYAML' and 'julep', but it's not clear if these are the only dependencies needed. This could lead to runtime errors if other dependencies are required.
4. .github/workflows/generate-changelog.yml:51
  • Draft comment:
    Ensure 'pr_data' is set as an output in the 'collect_prs' step to avoid logical errors.
  • Reason this comment was not posted:
    Comment did not seem useful.
5. .github/workflows/generate-changelog.yml:67
  • Draft comment:
    Ensure 'success()' correctly handles cases where no PRs are found to avoid unnecessary PR creation attempts.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The 'Create Pull Request' step uses 'success()' which might not correctly handle the case where no PRs are found. This could lead to unnecessary PR creation attempts.
6. agents-api/agents_api/activities/utils.py:373
  • Draft comment:
    Consider removing or implementing the 'delete_session' case to avoid confusion or errors.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The 'get_handler' function has a commented-out case for 'delete_session'. This could lead to confusion or errors if this functionality is needed.
7. agents-api/agents_api/models/agent/create_agent.py:60
  • Draft comment:
    Ensure all instances of 'increase_counter' are replaced with 'query_metrics_update' to maintain consistency and avoid errors. This applies to other files as well.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The 'query_metrics_update' decorator is used in multiple files, replacing 'increase_counter'. This is a consistent change, but ensure that all instances are correctly updated to avoid errors.
8. agents-api/agents_api/models/chat/gather_messages.py:28
  • Draft comment:
    Document the reasoning behind using 'recall_options.limit * 3' for vector and hybrid modes when MMR is enabled to avoid confusion.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The 'get_search_fn_and_params' function in 'gather_messages.py' uses 'recall_options.limit * 3' for vector and hybrid modes when MMR is enabled. This could lead to unexpected behavior if not properly documented.

Workflow ID: wflow_d2qRaksHNk5n0QSq


You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants