feat(agents-api): Add Doc sql queries #979

Vedantsahai18 · 2024-12-20T01:24:53Z

PR Type

Enhancement, Tests

Description

Implemented comprehensive document management system with:
- Creation, retrieval, listing, and deletion operations
- Advanced search capabilities including hybrid search, text search, and embedding-based search
- Maximal Marginal Relevance (MMR) algorithm for diverse results
Added extensive test coverage for document operations
Optimized database schema and indexes across multiple tables
Added document embedding metadata support in models and API
Improved database functions with caching and hybrid search capabilities

Changes walkthrough 📝

Relevant files

Tests

1 files

test_docs_queries.py `Rewrite document query tests for PostgreSQL` agents-api/tests/test_docs_queries.py Rewrote test cases for document operations using PostgreSQL Added comprehensive tests for user and agent document operations Added tests for document search functionality Improved test coverage with more assertions and edge cases	+277/-163

Enhancement

19 files

search_docs_hybrid.py `Add hybrid document search functionality` agents-api/agents_api/queries/docs/search_docs_hybrid.py Added hybrid search functionality combining text and embedding search Implemented score normalization and fusion logic Added configurable weighting between text and embedding scores Added type hints and documentation	+158/-0
create_doc.py `Implement document creation functionality` agents-api/agents_api/queries/docs/create_doc.py Implemented document creation with metadata and embeddings support Added support for single and batch content creation Added owner association logic Added error handling and validation	+181/-0
list_docs.py `Add document listing functionality` agents-api/agents_api/queries/docs/list_docs.py Added document listing with filtering and pagination Implemented metadata filtering Added sorting options and owner filtering Added error handling and input validation	+152/-0
mmr.py `Add Maximal Marginal Relevance search` agents-api/agents_api/queries/docs/mmr.py Implemented Maximal Marginal Relevance (MMR) algorithm Added cosine similarity calculation Added configurable diversity parameter Added optimization for large result sets	+109/-0
search_docs_by_text.py `Add text-based document search` agents-api/agents_api/queries/docs/search_docs_by_text.py Implemented text-based document search Added metadata filtering support Added language-specific search capabilities Added error handling and validation	+91/-0
search_docs_by_embedding.py `Add embedding-based document search` agents-api/agents_api/queries/docs/search_docs_by_embedding.py Implemented vector-based document search Added confidence threshold filtering Added metadata filtering support Added error handling and validation	+86/-0
get_doc.py `Add document retrieval functionality` agents-api/agents_api/queries/docs/get_doc.py Implemented document retrieval functionality Added support for retrieving embeddings Added content aggregation logic Added error handling	+86/-0
delete_doc.py `Add document deletion functionality` agents-api/agents_api/queries/docs/delete_doc.py Implemented document deletion with ownership validation Added cascading deletion of related records Added error handling and validation Added soft delete support	+79/-0
000018_doc_search.up.sql `Add document search database functions` memory-store/migrations/000018_doc_search.up.sql Added document search functions and triggers Implemented caching for embeddings Added hybrid search capabilities Added text and vector search functions	+71/-87
000006_docs.up.sql `Add document storage database schema` memory-store/migrations/000006_docs.up.sql Created document tables and indexes Added ownership and metadata support Added full-text search capabilities Added validation triggers	+30/-58
000008_tools.up.sql `Optimize tools table schema and constraints` memory-store/migrations/000008_tools.up.sql Removed task_version column from tools table Modified primary key to only include developer_id, agent_id, tool_id Added unique constraint for agent_id, name, task_id Removed redundant indexes	+5/-10
000009_sessions.up.sql `Optimize sessions table indexes` memory-store/migrations/000009_sessions.up.sql Removed redundant session_id index Modified session_lookup index to include participant_type Removed redundant session_lookup_by_session index	+4/-10
000003_users.up.sql `Remove redundant indexes from users table` memory-store/migrations/000003_users.up.sql Removed redundant user_id sorted index Removed redundant developer_id index Cleaned up formatting	+4/-9
openapi-1.0.0.yaml `Add document embedding metadata fields` typespec/tsp-output/@typespec/openapi3/openapi-1.0.0.yaml Added embeddings as required field with null default Added new fields: modality, language, embedding_model, embedding_dimensions	+20/-0
000011_executions.up.sql `Optimize executions table indexes` memory-store/migrations/000011_executions.up.sql Removed redundant execution_id sorted index Modified task_id index to include task_version	+2/-5
models.tsp `Add document embedding metadata model fields` typespec/docs/models.tsp Modified embeddings field to be nullable with null default Added new document metadata fields for embeddings	+18/-2
000006_docs.down.sql `Improve docs table cleanup process` memory-store/migrations/000006_docs.down.sql Added CASCADE to table drops Removed redundant index drops	+3/-4
000016_entry_relations.up.sql `Optimize entry relations indexes` memory-store/migrations/000016_entry_relations.up.sql Removed redundant components index Simplified leaf index structure	+1/-3
000015_entries.up.sql `Simplify entries table index structure` memory-store/migrations/000015_entries.up.sql Simplified entries index by removing entry_id	+1/-1

Documentation

1 files

__init__.py `Add document module initialization` agents-api/agents_api/queries/docs/init.py Added module documentation Defined public API exports Added type hints and docstrings	+35/-0

Configuration changes

1 files

000007_ann.up.sql `Update VoyageAI embedding configuration` memory-store/migrations/000007_ann.up.sql Added 'document' parameter to embedding_voyageai function	+2/-2

💡 PR-Agent usage: Comment /help "your question" on any pull request to receive relevant information

qodo-merge-pro-for-open-source · 2024-12-20T01:25:42Z

PR Reviewer Guide 🔍

(Review updated until commit `934db8a`)

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
🧪 PR contains tests
🔒 Security concerns SQL injection: The search_docs_by_embedding query manually constructs SQL strings with user-provided UUIDs using string concatenation rather than proper parameterization, which could potentially allow SQL injection attacks if the UUID validation is bypassed.
⚡ Recommended focus areas for review Performance The embedding search implementation manually constructs query strings with string concatenation for UUID arrays which could lead to performance issues with large arrays. Consider using proper parameterization. # NOTE: Manually replace uuids list coz asyncpg isnt sending it correctly owner_ids_pg_str = f"ARRAY['{'\', \''.join(owner_ids)}']" query = search_docs_by_embedding_query.replace("$UUID_LIST", owner_ids_pg_str) Error Handling The maximal marginal relevance implementation assumes valid input embeddings but lacks proper validation and error handling for edge cases like empty or malformed embeddings. def maximal_marginal_relevance( query_embedding: np.ndarray, embedding_list: list, lambda_mult: float = 0.5, k: int = 4, ) -> list[int]: """Calculate maximal marginal relevance. Args: query_embedding: The query embedding. embedding_list: A list of embeddings. lambda_mult: The lambda parameter for MMR. Default is 0.5. k: The number of embeddings to return. Default is 4. Returns: A list of indices of the embeddings to return. Raises: ImportError: If numpy is not installed. """ if min(k, len(embedding_list)) <= 0: return [] if query_embedding.ndim == 1: query_embedding = np.expand_dims(query_embedding, axis=0) similarity_to_query = _cosine_similarity(query_embedding, embedding_list)[0] most_similar = int(np.argmax(similarity_to_query)) idxs = [most_similar] selected = np.array([embedding_list[most_similar]]) while len(idxs) < min(k, len(embedding_list)): best_score = -np.inf idx_to_add = -1 similarity_to_selected = _cosine_similarity(embedding_list, selected) for i, query_score in enumerate(similarity_to_query): if i in idxs: continue redundant_score = max(similarity_to_selected[i]) equation_score = ( lambda_mult * query_score - (1 - lambda_mult) * redundant_score ) if equation_score > best_score: best_score = equation_score idx_to_add = i idxs.append(idx_to_add) selected = np.append(selected, [embedding_list[idx_to_add]], axis=0) return idxs Validation The document creation logic allows empty or invalid metadata without proper validation. Consider adding metadata schema validation to prevent invalid document states. async def create_doc( *, developer_id: UUID, doc_id: UUID \| None = None, data: CreateDocRequest, owner_type: Literal["user", "agent"], owner_id: UUID, modality: Literal["text", "image", "mixed"] \| None = "text", embedding_model: str \| None = "voyage-3", embedding_dimensions: int \| None = 1024, language: str \| None = "english", index: int \| None = 0, ) -> list[tuple[str, list, Literal["fetch", "fetchmany", "fetchrow"]]]:

qodo-merge-pro-for-open-source · 2024-12-20T01:26:11Z

PR Code Suggestions ✨

Latest suggestions up to 934db8a
Explore these optional code suggestions:

Category	Suggestion	Score
Possible issue	Missing required function parameter could cause runtime errors The function `generate_canonical_name()` is called without any arguments, but based on the removed code it appears it should receive the task name as a parameter. Add the task name parameter to avoid potential errors. agents-api/agents_api/queries/tasks/create_or_update_task.py [170] -canonical_name = data.canonical_name or generate_canonical_name() +canonical_name = data.canonical_name or generate_canonical_name(data.name) Apply this suggestion Suggestion importance[1-10]: 9 Why: The suggestion fixes a critical bug where generate_canonical_name() is called without its required parameter, which would cause a runtime error. This is a direct regression from the old code.	9
	Validate embedding vector dimensions to prevent SQL errors Add validation for query_embedding length to ensure it matches embedding_dimensions. agents-api/agents_api/queries/docs/search_docs_by_embedding.py [62-63] if not query_embedding: raise HTTPException(status_code=400, detail="Empty embedding provided") +if len(query_embedding) != 1024: # Based on vector(1024) in SQL + raise HTTPException( + status_code=400, + detail=f"Embedding must have 1024 dimensions, got {len(query_embedding)}" + ) Apply this suggestion Suggestion importance[1-10]: 9 Why: The suggestion adds crucial validation to ensure embedding dimensions match the database schema requirements (1024d), preventing SQL errors that could crash the application.	9
	Improve array validation logic to handle empty arrays and prevent potential SQL errors Add error handling for empty arrays in the owner filter validation to prevent SQL errors. memory-store/migrations/000018_doc_search.up.sql [110-112] -IF owner_types IS NOT NULL AND owner_ids IS NOT NULL AND - array_length(owner_types, 1) != array_length(owner_ids, 1) AND - array_length(owner_types, 1) <= 0 THEN +IF owner_types IS NOT NULL AND owner_ids IS NOT NULL AND ( + array_length(owner_types, 1) != array_length(owner_ids, 1) OR + array_length(owner_types, 1) <= 0 OR + array_length(owner_ids, 1) <= 0) THEN Apply this suggestion Suggestion importance[1-10]: 8 Why: The suggestion improves input validation by also checking for empty owner_ids array and using OR instead of AND for the length checks, which prevents potential SQL errors from invalid array inputs.	8
General	Add validation to prevent empty or excessively long search queries Add input validation for the query parameter to ensure it's not empty and has a reasonable length limit. agents-api/agents_api/queries/docs/search_docs_by_text.py [46-54] async def search_docs_by_text( , developer_id: UUID, owners: list[tuple[Literal["user", "agent"], UUID]], query: str, k: int = 3, metadata_filter: dict[str, Any] = {}, search_language: str \| None = "english", ) -> tuple[str, list]: + if not query or len(query.strip()) == 0: + raise HTTPException(status_code=400, detail="Search query cannot be empty") + if len(query) > 1000: + raise HTTPException(status_code=400, detail="Search query too long") Apply this suggestion* Suggestion importance[1-10]: 8 Why: The suggestion adds important input validation to prevent potential issues with empty or overly long search queries that could impact performance or cause errors. This is a critical validation for a search endpoint.	8
	Add type validation for list content elements to prevent data inconsistency Add validation to ensure that when content is provided as a list, all elements are of the same type (string) to maintain data consistency and prevent potential database errors. agents-api/agents_api/queries/docs/create_doc.py [120-125] if isinstance(data.content, list): + if not all(isinstance(c, str) for c in data.content): + raise HTTPException(status_code=400, detail="All content elements must be strings") final_params_doc = [] final_params_owner = [] for idx, content in enumerate(data.content): Apply this suggestion Suggestion importance[1-10]: 7 Why: The suggestion adds important input validation to prevent potential database errors and maintain data consistency, which is crucial for the document creation functionality.	7
	Add parameter validation to ensure interpolation weight stays within valid range Add input validation for lambda_mult parameter to ensure it stays within valid range [0,1] since it's used as an interpolation weight in the MMR calculation. agents-api/agents_api/queries/docs/mmr.py [64-69] def maximal_marginal_relevance( query_embedding: np.ndarray, embedding_list: list, lambda_mult: float = 0.5, k: int = 4, ) -> list[int]: + if not 0 <= lambda_mult <= 1: + raise ValueError("lambda_mult must be between 0 and 1") Apply this suggestion Suggestion importance[1-10]: 6 Why: The suggestion adds important validation for the lambda_mult parameter to prevent invalid calculations in the MMR algorithm, improving robustness and preventing potential errors.	6
	Replace hardcoded dummy content with proper null value until implementation Add warning or placeholder for the hardcoded dummy content in file retrieval. agents-api/agents_api/queries/files/get_file.py [58] -"content": "DUMMY: NEED TO FETCH CONTENT FROM BLOB STORAGE", +"content": None, # TODO: Implement blob storage content retrieval Apply this suggestion Suggestion importance[1-10]: 3 Why: While the suggestion improves code clarity by using None instead of a dummy string, it's a minor improvement that doesn't affect functionality and only changes a placeholder value.	3

Previous suggestions

Suggestions up to commit 6c77490

Category	Suggestion	Score
Possible issue	Add validation for vector embedding dimensions to prevent invalid configurations Add input validation for embedding_dimensions to ensure it's a positive integer when embedding_model is specified, preventing invalid vector dimensions. agents-api/agents_api/queries/docs/create_doc.py [122-123] data.embedding_model or "none", -data.embedding_dimensions or 0, +data.embedding_dimensions if data.embedding_model != "none" and data.embedding_dimensions > 0 else 0, Suggestion importance[1-10]: 8 Why: The suggestion adds crucial validation to prevent invalid vector dimensions when an embedding model is specified, which could cause serious issues in downstream vector operations.	8
	Validate diversity-relevance tradeoff parameter to prevent invalid search results Add input validation for lambda_mult parameter to ensure it's between 0 and 1, as values outside this range would produce invalid diversity-relevance tradeoffs. agents-api/agents_api/queries/docs/mmr.py [64-69] def maximal_marginal_relevance( query_embedding: np.ndarray, embedding_list: list, lambda_mult: float = 0.5, k: int = 4, ) -> list[int]: + if not 0 <= lambda_mult <= 1: + raise ValueError("lambda_mult must be between 0 and 1") Suggestion importance[1-10]: 7 Why: Adding validation for lambda_mult is important as values outside [0,1] would produce invalid diversity-relevance tradeoffs, potentially breaking the MMR algorithm's functionality.	7
	Validate hybrid search weight parameter to prevent invalid result ranking Add validation for alpha parameter to ensure it's between 0 and 1, as this weight controls the balance between text and embedding search results. agents-api/agents_api/queries/docs/search_docs_hybrid.py [107-114] async def search_docs_hybrid( developer_id: UUID, text_query: str = "", embedding: List[float] = None, k: int = 10, alpha: float = 0.5, +) -> List[Doc]: + if not 0 <= alpha <= 1: + raise ValueError("alpha must be between 0 and 1") Suggestion importance[1-10]: 7 Why: The validation ensures alpha stays within [0,1], preventing incorrect weighting between text and embedding search results that could lead to invalid rankings.	7

qodo-merge-pro-for-open-source · 2024-12-20T01:26:20Z

CI Failure Feedback 🧐

(Checks updated until commit `934db8a`)

Action: Typecheck
Failed stage: Typecheck [❌]
Failed test name: pytype
Failure summary: The pytype check failed due to multiple issues: 1. Missing dependency: Cannot find module `pycozo` which is required in `agents_api/common/utils/cozo.py` 2. Type error in `agents_api/queries/docs/search_docs_hybrid.py`: - Trying to access `model_copy` attribute on a None value - Type annotation mismatch for `embedding` variable 3. Stray type comments in `tests/test_workflow_routes.py`
Relevant error logs: 1: ##[group]Operating System 2: Ubuntu ... 1186: [9/374] check agents_api.common.utils.datetime 1187: [10/374] check agents_api.autogen.Executions 1188: [11/374] check agents_api.autogen.Entries 1189: [12/374] check agents_api.clients.__init__ 1190: [13/374] check agents_api.autogen.Agents 1191: [14/374] check agents_api.env 1192: [15/374] check agents_api.autogen.Tasks 1193: [16/374] check agents_api.common.utils.cozo 1194: FAILED: /home/runner/work/julep/julep/agents-api/.pytype/pyi/agents_api/common/utils/cozo.pyi 1195: /home/runner/work/julep/julep/agents-api/.venv/bin/python -m pytype.main --disable pyi-error --imports_info /home/runner/work/julep/julep/agents-api/.pytype/imports/agents_api.common.utils.cozo.imports --module-name agents_api.common.utils.cozo --platform linux -V 3.12 -o /home/runner/work/julep/julep/agents-api/.pytype/pyi/agents_api/common/utils/cozo.pyi --analyze-annotated --nofail --none-is-not-bool --quick --strict-none-binding /home/runner/work/julep/julep/agents-api/agents_api/common/utils/cozo.py 1196: /home/runner/work/julep/julep/agents-api/agents_api/common/utils/cozo.py:9:1: error: in <module>: Can't find module 'pycozo'. [import-error] 1197: from pycozo import Client 1198: ~~~~~~~~~~~~~~~~~~~~~~~~~ 1199: For more details, see https://google.github.io/pytype/errors.html#import-error ... 1209: [26/374] check agents_api.common.utils.types 1210: [27/374] check agents_api.common.protocol.tasks 1211: [28/374] check agents_api.common.protocol.sessions 1212: [29/374] check agents_api.common.utils.messages 1213: [30/374] check agents_api.common.nlp 1214: [31/374] check agents_api.clients.async_s3 1215: [32/374] check agents_api.common.protocol.developers 1216: [33/374] check agents_api.queries.utils 1217: ERROR:pytype.matcher Invalid type: <class 'pytype.abstract.function.ParamSpecMatch'> ... 1314: [130/374] check agents_api.rec_sum.trim 1315: [131/374] check tests.sample_tasks.test_find_selector 1316: [132/374] check tests.test_chat_routes 1317: [133/374] check agents_api.common.__init__ 1318: [134/374] check agents_api.routers.internal.__init__ 1319: [135/374] check agents_api.rec_sum.entities 1320: [136/374] check agents_api.queries.developers.__init__ 1321: [137/374] check agents_api.queries.docs.search_docs_hybrid 1322: FAILED: /home/runner/work/julep/julep/agents-api/.pytype/pyi/agents_api/queries/docs/search_docs_hybrid.pyi 1323: /home/runner/work/julep/julep/agents-api/.venv/bin/python -m pytype.main --disable pyi-error --imports_info /home/runner/work/julep/julep/agents-api/.pytype/imports/agents_api.queries.docs.search_docs_hybrid.imports --module-name agents_api.queries.docs.search_docs_hybrid --platform linux -V 3.12 -o /home/runner/work/julep/julep/agents-api/.pytype/pyi/agents_api/queries/docs/search_docs_hybrid.pyi --analyze-annotated --nofail --none-is-not-bool --quick --strict-none-binding /home/runner/work/julep/julep/agents-api/agents_api/queries/docs/search_docs_hybrid.py 1324: /home/runner/work/julep/julep/agents-api/agents_api/queries/docs/search_docs_hybrid.py:98:15: error: in fuse_results: No attribute 'model_copy' on None [attribute-error] 1325: In Optional[agents_api.autogen.Docs.Doc] 1326: doc = doc.model_copy() # or copy if you are using Pydantic 1327: ~~~~~~~~~~~~~~ 1328: /home/runner/work/julep/julep/agents-api/agents_api/queries/docs/search_docs_hybrid.py:105:1: error: in <module>: Type annotation for embedding does not match type of assignment [annotation-type-mismatch] ... 1423: # fuse them 1424: ~~~~~~~~~~~~~~~ 1425: fused = fuse_results(text_results, embed_results, alpha) 1426: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 1427: # Then pick top K overall 1428: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 1429: return fused[:k] 1430: ~~~~~~~~~~~~~~~~~~~~ 1431: For more details, see https://google.github.io/pytype/errors.html ... 1439: [145/374] check agents_api.workflows.__init__ 1440: [146/374] check agents_api.metrics.__init__ 1441: [147/374] check agents_api.dependencies.__init__ 1442: [148/374] check agents_api.common.exceptions.sessions 1443: [149/374] check agents_api.common.utils.debug 1444: [150/374] check agents_api.queries.docs.mmr 1445: [151/374] check tests.test_user_routes 1446: [152/374] check tests.test_workflow_routes 1447: /home/runner/work/julep/julep/agents-api/tests/test_workflow_routes.py:65:1: error: : Stray type comment: object [ignored-type-comment] 1448: # type: object~~~~~~~~~~~~~~~ 1449: # type: object 1450: /home/runner/work/julep/julep/agents-api/tests/test_workflow_routes.py:114:1: error: : Stray type comment: object [ignored-type-comment] 1451: # type: object~~~~~~~~~~~~~~~ 1452: # type: object 1453: For more details, see https://google.github.io/pytype/errors.html#ignored-type-comment ... 1466: [165/374] check agents_api.queries.entries.__init__ 1467: [166/374] check tests.test_agent_routes 1468: [167/374] check agents_api.routers.sessions.exceptions 1469: [168/374] check agents_api.rec_sum.summarize 1470: [169/374] check tests.test_activities 1471: [170/374] check tests.test_sessions 1472: [171/374] check agents_api.queries.agents.__init__ 1473: [172/374] check agents_api.queries.tasks.__init__ 1474: ninja: build stopped: cannot make progress due to previous errors. 1475: Computing dependencies 1476: Generated API key since not set in the environment: 51505566278392106819051827099590 1477: Sentry DSN not found. Sentry will not be enabled. 1478: Analyzing 349 sources with 0 local dependencies 1479: Leaving directory '.pytype' 1480: ##[error]Process completed with exit code 1.

✨ CI feedback usage guide:

The CI feedback tool (/checks) automatically triggers when a PR has a failed check.
The tool analyzes the failed checks and provides several feedbacks:

Failed stage
Failed test name
Failure summary
Relevant error logs

In addition to being automatically triggered, the tool can also be invoked manually by commenting on a PR:

/checks "https://github.com/{repo_name}/actions/runs/{run_number}/job/{job_number}"

where {repo_name} is the name of the repository, {run_number} is the run number of the failed check, and {job_number} is the job number of the failed check.

Configuration options

enable_auto_checks_feedback - if set to true, the tool will automatically provide feedback when a check is failed. Default is true.
excluded_checks_list - a list of checks to exclude from the feedback, for example: ["check1", "check2"]. Default is an empty list.
enable_help_text - if set to true, the tool will provide a help message with the feedback. Default is true.
persistent_comment - if set to true, the tool will overwrite a previous checks comment with the new feedback. Default is true.
final_update_message - if persistent_comment is true and updating a previous checks message, the tool will also create a new message: "Persistent checks updated to latest commit". Default is true.

See more information about the checks tool in the docs.

ellipsis-dev

❌ Changes requested. Reviewed everything up to 6c77490 in 1 minute and 30 seconds

More details

Looked at 848 lines of code in 10 files
Skipped 0 files when reviewing.
Skipped posting 8 drafted comments based on config settings.

1. agents-api/agents_api/queries/docs/search_docs_hybrid.py:13

Draft comment:
The import statement for run_concurrently is unnecessary since it's not used in the code. Consider removing it to clean up the imports.
Reason this comment was not posted:
Confidence changes required: 10%
The import statement for run_concurrently is unnecessary since it's not used in the code.

2. agents-api/agents_api/queries/docs/search_docs_hybrid.py:134

Draft comment:
Appending an empty list to tasks is incorrect. You should append a coroutine or task. Consider using None or a similar placeholder if you intend to skip adding a task.
Reason this comment was not posted:
Confidence changes required: 50%
The code uses tasks.append([]) which is incorrect for appending tasks. It should append a coroutine instead.

3. agents-api/agents_api/queries/docs/search_docs_hybrid.py:147

Draft comment:
Appending an empty list to tasks is incorrect. You should append a coroutine or task. Consider using None or a similar placeholder if you intend to skip adding a task.
Reason this comment was not posted:
Confidence changes required: 50%
The code uses tasks.append([]) which is incorrect for appending tasks. It should append a coroutine instead.

4. agents-api/agents_api/queries/docs/search_docs_hybrid.py:154

Draft comment:
Ensure that at least one valid task is added to tasks before calling gather. If both text_query and embedding are empty, tasks will contain only empty lists, leading to an error.
Reason this comment was not posted:
Confidence changes required: 50%
The code does not handle the case where both text_query and embedding are empty, which could lead to an error when calling gather.

5. agents-api/agents_api/queries/docs/search_docs_hybrid.py:110

Draft comment:
The embedding parameter should have a default value of an empty list [] instead of None to avoid type issues and simplify checks.
Reason this comment was not posted:
Confidence changes required: 50%
The embedding parameter should have a default value of an empty list instead of None to avoid type issues.

6. agents-api/agents_api/queries/docs/create_doc.py:69

Draft comment:
Consider adding more exception handling for other potential database errors, such as asyncpg.DataError or asyncpg.SyntaxOrAccessError, to make the function more robust.
Reason this comment was not posted:
Confidence changes required: 50%
The rewrap_exceptions decorator is used to handle exceptions, but it doesn't cover all possible exceptions that might occur during database operations.

7. agents-api/agents_api/queries/docs/delete_doc.py:43

Draft comment:
Consider adding more exception handling for other potential database errors, such as asyncpg.DataError or asyncpg.SyntaxOrAccessError, to make the function more robust.
Reason this comment was not posted:
Confidence changes required: 50%
The rewrap_exceptions decorator is used to handle exceptions, but it doesn't cover all possible exceptions that might occur during database operations.

8. agents-api/agents_api/queries/docs/get_doc.py:29

Draft comment:
Consider adding exception handling for potential database errors, such as asyncpg.DataError or asyncpg.SyntaxOrAccessError, to make the function more robust.
Reason this comment was not posted:
Confidence changes required: 50%
The rewrap_exceptions decorator is used to handle exceptions, but it doesn't cover all possible exceptions that might occur during database operations.

Workflow ID: wflow_0SgPnESfFL0Scwfr

Want Ellipsis to fix these issues? Tag @ellipsis-dev in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

agents-api/agents_api/queries/docs/mmr.py

ellipsis-dev

👍 Looks good to me! Incremental review on 93673b7 in 1 minute and 3 seconds

More details

Looked at 1195 lines of code in 21 files
Skipped 0 files when reviewing.
Skipped posting 5 drafted comments based on config settings.

1. agents-api/agents_api/queries/docs/get_doc.py:5

Draft comment:
The import statement for 'asyncpg' is unnecessary and can be removed to clean up the code. This is also applicable in other files like 'list_docs.py', 'search_docs_by_embedding.py', and 'search_docs_by_text.py'.
Reason this comment was not posted:
Confidence changes required: 50%
The import statement for 'asyncpg' is unnecessary in multiple files as it is not used anywhere in the code. Removing it will clean up the code.

2. agents-api/agents_api/queries/docs/list_docs.py:5

Draft comment:
The import statement for 'asyncpg' is unnecessary and can be removed to clean up the code. This is also applicable in other files like 'search_docs_by_embedding.py', and 'search_docs_by_text.py'.
Reason this comment was not posted:
Confidence changes required: 50%
The import statement for 'asyncpg' is unnecessary in multiple files as it is not used anywhere in the code. Removing it will clean up the code.

3. agents-api/agents_api/queries/docs/search_docs_by_embedding.py:6

Draft comment:
The import statement for 'asyncpg' is unnecessary and can be removed to clean up the code. This is also applicable in 'search_docs_by_text.py'.
Reason this comment was not posted:
Confidence changes required: 50%
The import statement for 'asyncpg' is unnecessary in multiple files as it is not used anywhere in the code. Removing it will clean up the code.

4. agents-api/agents_api/queries/docs/search_docs_by_text.py:6

Draft comment:
The import statement for 'asyncpg' is unnecessary and can be removed to clean up the code.
Reason this comment was not posted:
Confidence changes required: 50%
The import statement for 'asyncpg' is unnecessary in multiple files as it is not used anywhere in the code. Removing it will clean up the code.

5. agents-api/agents_api/queries/docs/create_doc.py:98

Draft comment:
The 'org' option is removed from the owner_type Literal, but this change is not reflected in the PR description. This should be documented for clarity. This change is also applicable in 'delete_doc.py', 'get_doc.py', and 'list_docs.py'.
Reason this comment was not posted:
Confidence changes required: 50%
The 'org' option is removed from the owner_type Literal in multiple files, but this change is not reflected in the PR description. This should be documented for clarity.

Workflow ID: wflow_pnbwiXEmTCJVnbrw

You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

ellipsis-dev

❌ Changes requested. Incremental review on dc0ec36 in 48 seconds

More details

Looked at 939 lines of code in 28 files
Skipped 0 files when reviewing.
Skipped posting 1 drafted comments based on config settings.

1. agents-api/agents_api/queries/users/delete_user.py:59

Draft comment:
The asyncpg.exceptions.UniqueViolationError should not be handled here as it is not relevant to delete operations. Consider removing this exception handling.
Reason this comment was not posted:
Marked as duplicate.

Workflow ID: wflow_WUrKdAU41fxw9Cd0

Want Ellipsis to fix these issues? Tag @ellipsis-dev in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

agents-api/agents_api/queries/agents/delete_agent.py

ellipsis-dev

👍 Looks good to me! Incremental review on 831e950 in 51 seconds

More details

Looked at 440 lines of code in 11 files
Skipped 0 files when reviewing.
Skipped posting 3 drafted comments based on config settings.

1. agents-api/agents_api/queries/docs/get_doc.py:36

Draft comment:
Using ast.literal_eval on d["content"] can be unsafe if the content is not guaranteed to be a valid Python literal. Consider using a safer method to parse or handle the content.
Reason this comment was not posted:
Comment was on unchanged code.

2. agents-api/agents_api/queries/docs/embed_snippets.py:10

Draft comment:
vectorizer_query is set to None. This is a placeholder and should be replaced with an actual query before deployment.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable:
The comment is essentially repeating information that's already explicitly stated in a TODO comment one line above. The TODO comment is more visible and serves the same purpose. This makes the PR comment redundant and not adding any new information or value.
Perhaps the comment is trying to emphasize the importance of not deploying with a None query, which could be a critical issue.
While deployment concerns are valid, the existing TODO comment already makes it clear this needs to be replaced, and deployment issues would be caught by basic testing since the function would fail immediately.
Delete the comment as it's redundant with the existing TODO comment and doesn't provide additional actionable value.

3. agents-api/agents_api/queries/entries/list_entries.py:88

Draft comment:
Ensure sort_by and direction are validated and sanitized to prevent SQL injection, as they are used in string interpolation for SQL queries.
Reason this comment was not posted:
Comment did not seem useful.

Workflow ID: wflow_TPwdOI1YwvBHnQ9P

You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

ellipsis-dev

👍 Looks good to me! Incremental review on 249513d in 1 minute and 4 seconds

More details

Looked at 1148 lines of code in 15 files
Skipped 0 files when reviewing.
Skipped posting 5 drafted comments based on config settings.

1. agents-api/agents_api/queries/docs/create_doc.py:153

Draft comment:
The function assumes data.content is always a list. Consider adding a check or conversion to handle cases where data.content might be a string or other type to prevent unexpected behavior.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable:
The comment suggests adding a check that already exists in the code. The code already handles both list and non-list cases appropriately through the isinstance() check and separate logic paths. The comment appears to be incorrect in stating that the function "assumes data.content is always a list" when it clearly doesn't make that assumption.
Could there be other types besides list and non-list that need handling? Could the CreateDocRequest type definition enforce the content type making this check unnecessary?
The code's else branch handles any non-list type appropriately by treating it as a single content item. The type validation would be handled by FastAPI's request validation via CreateDocRequest if needed.
The comment should be deleted because it incorrectly suggests adding a check that already exists in the code. The code already properly handles both list and non-list content.

2. agents-api/agents_api/queries/docs/delete_doc.py:25

Draft comment:
The EXISTS clause in the SQL query might be redundant since the doc_owners entry is already deleted in the deleted_owners CTE. Consider revising the logic to ensure the docs entry is only deleted if the doc_owners entry existed prior to deletion.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable:
The comment raises an interesting point about the query logic - we delete from doc_owners first, then check if that same record existed in doc_owners before deleting from docs. However, this could actually be intentional behavior to ensure atomicity and proper ordering. Without deeper knowledge of the data model and requirements, we can't be certain this is actually a problem vs a deliberate safeguard.
I may be missing important context about transaction isolation levels and race conditions that could make this pattern necessary. The EXISTS check might serve as an important guard rail.
While the comment raises an interesting point about query structure, we don't have enough context to confidently say this is incorrect or needs to be changed.
This comment is speculative and requires more context about the data model and requirements to validate. Following our rules, we should err on the side of removing speculative comments.

3. agents-api/agents_api/queries/docs/list_docs.py:135

Draft comment:
The metadata_filter is directly appended to the query string, which could lead to SQL injection if not properly handled. Ensure that metadata keys and values are safely included in the query to prevent SQL injection vulnerabilities.
Reason this comment was not posted:
Comment did not seem useful.

4. agents-api/agents_api/queries/docs/search_docs_by_text.py:19

Draft comment:
The owner_types and owner_ids are passed as JSONB arrays, which might not be correctly handled by the SQL function. Ensure that these arrays are properly converted to UUID arrays to prevent unexpected behavior.
Reason this comment was not posted:
Comment did not seem useful.

5. agents-api/tests/test_docs_queries.py:11

Draft comment:
Consider adding tests for search_docs_by_embedding and search_docs_hybrid to ensure comprehensive coverage of the search functionalities.
Reason this comment was not posted:
Confidence changes required: 50%
The test_docs_queries.py file has a test for search_docs_by_text but it lacks tests for search_docs_by_embedding and search_docs_hybrid. Adding these tests would ensure comprehensive coverage of the search functionalities.

Workflow ID: wflow_qeLRxNLHNTGhklta

You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

Signed-off-by: Diwank Singh Tomer <[email protected]>

ellipsis-dev

👍 Looks good to me! Reviewed everything up to 934db8a in 1 minute and 18 seconds

More details

Looked at 4562 lines of code in 79 files
Skipped 0 files when reviewing.
Skipped posting 6 drafted comments based on config settings.

1. agents-api/agents_api/queries/utils.py:32

Draft comment:
The generate_canonical_name function's docstring is outdated. It mentions converting a display name to a canonical name, but the function no longer takes a name parameter. Update the docstring to reflect the current functionality.
Reason this comment was not posted:
Comment looked like it was already resolved.

2. agents-api/agents_api/queries/docs/create_doc.py:181

Draft comment:
The create_doc function's docstring should mention the default value for the modality parameter, which is 'text'. This will help developers understand the default behavior of the function.
Reason this comment was not posted:
Confidence changes required: 50%
The create_doc function in create_doc.py has a potential issue with the modality parameter. It defaults to 'text', but the docstring does not mention this default value. This could lead to confusion for developers using this function.

3. agents-api/agents_api/queries/docs/delete_doc.py:79

Draft comment:
The delete_doc function's docstring should explicitly mention that the owner_type parameter is required. This will help developers understand the necessary inputs for the function.
Reason this comment was not posted:
Confidence changes required: 50%
The delete_doc function in delete_doc.py has a potential issue with the owner_type parameter. It is a required parameter, but the docstring does not explicitly mention this requirement. This could lead to confusion for developers using this function.

4. agents-api/agents_api/queries/docs/list_docs.py:152

Draft comment:
The list_docs function's docstring should mention the default value for the include_without_embeddings parameter, which is False. This will help developers understand the default behavior of the function.
Reason this comment was not posted:
Confidence changes required: 50%
The list_docs function in list_docs.py has a potential issue with the include_without_embeddings parameter. It defaults to False, but the docstring does not mention this default value. This could lead to confusion for developers using this function.

5. agents-api/agents_api/queries/docs/search_docs_by_embedding.py:86

Draft comment:
The search_docs_by_embedding function's docstring should mention the default value for the confidence parameter, which is 0.5. This will help developers understand the default behavior of the function.
Reason this comment was not posted:
Confidence changes required: 50%
The search_docs_by_embedding function in search_docs_by_embedding.py has a potential issue with the confidence parameter. It defaults to 0.5, but the docstring does not mention this default value. This could lead to confusion for developers using this function.

6. agents-api/agents_api/queries/docs/search_docs_by_text.py:91

Draft comment:
The search_docs_by_text function's docstring should mention the default value for the search_language parameter, which is 'english'. This will help developers understand the default behavior of the function.
Reason this comment was not posted:
Confidence changes required: 50%
The search_docs_by_text function in search_docs_by_text.py has a potential issue with the search_language parameter. It defaults to 'english', but the docstring does not mention this default value. This could lead to confusion for developers using this function.

Workflow ID: wflow_8XNb0cnAXVtyPRMM

You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

qodo-merge-pro-for-open-source · 2024-12-23T05:25:15Z

Persistent review updated to latest commit 934db8a

wip(agents-api): Doc queries

6c77490

qodo-merge-pro-for-open-source bot added the Review effort [1-5]: 4 label Dec 20, 2024

refactor: Lint agents-api (CI)

b427e38

ellipsis-dev bot reviewed Dec 20, 2024

View reviewed changes

agents-api/agents_api/queries/docs/mmr.py Show resolved Hide resolved

Vedantsahai18 and others added 2 commits December 20, 2024 14:43

fix: fixed the CRD doc queries + added tests

93673b7

refactor: Lint agents-api (CI)

7b0be5c

ellipsis-dev bot reviewed Dec 20, 2024

View reviewed changes

Vedantsahai18 and others added 2 commits December 20, 2024 15:25

wip: initial set of exceptions added

dc0ec36

refactor: Lint agents-api (CI)

32d67bc

ellipsis-dev bot reviewed Dec 20, 2024

View reviewed changes

agents-api/agents_api/queries/agents/delete_agent.py Show resolved Hide resolved

Vedantsahai18 and others added 2 commits December 20, 2024 16:40

chore: added embedding reading + doctrings updates

831e950

refactor: Lint agents-api (CI)

74add36

ellipsis-dev bot reviewed Dec 20, 2024

View reviewed changes

Vedantsahai18 and others added 2 commits December 21, 2024 03:12

chore: updated migrations + added indices support

249513d

refactor: Lint agents-api (CI)

d7d9cd4

ellipsis-dev bot reviewed Dec 21, 2024

View reviewed changes

Merge branch 'f/switch-to-pg' into f/doc-queries

3600a92

creatorrr marked this pull request as draft December 21, 2024 08:24

creatorrr and others added 9 commits December 21, 2024 16:37

fix(memory-store): Remove redundant indices

2900786

Signed-off-by: Diwank Singh Tomer <[email protected]>

fix: Miscellaneous fixes

c2d54a4

Signed-off-by: Diwank Singh Tomer <[email protected]>

Merge branch 'f/switch-to-pg' into f/doc-queries

65641cb

refactor: Lint agents-api (CI)

f80ff87

fix: Fix search_by_text; remove tools.task_version column

747aceb

Signed-off-by: Diwank Singh Tomer <[email protected]>

fix: Fix canonical name collisions in tests

b946119

Signed-off-by: Diwank Singh Tomer <[email protected]>

refactor: Lint agents-api (CI)

4fc4f0e

fix: Fix search by embedding

e2181fb

Signed-off-by: Diwank Singh Tomer <[email protected]>

refactor: Lint agents-api (CI)

934db8a

creatorrr changed the title ~~wip(agents-api): Add Doc sql queries~~ feat(agents-api): Add Doc sql queries Dec 23, 2024

creatorrr marked this pull request as ready for review December 23, 2024 05:23

creatorrr merged commit 791a8a9 into f/switch-to-pg Dec 23, 2024
1 check passed

creatorrr deleted the f/doc-queries branch December 23, 2024 05:23

ellipsis-dev bot reviewed Dec 23, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(agents-api): Add Doc sql queries #979

feat(agents-api): Add Doc sql queries #979

Vedantsahai18 commented Dec 20, 2024 •

edited by qodo-merge-pro-for-open-source bot

Loading

qodo-merge-pro-for-open-source bot commented Dec 20, 2024 •

edited

Loading

qodo-merge-pro-for-open-source bot commented Dec 20, 2024 •

edited

Loading

qodo-merge-pro-for-open-source bot commented Dec 20, 2024 •

edited

Loading

Configuration options

ellipsis-dev bot left a comment

ellipsis-dev bot left a comment

ellipsis-dev bot left a comment

ellipsis-dev bot left a comment

ellipsis-dev bot left a comment

ellipsis-dev bot left a comment

qodo-merge-pro-for-open-source bot commented Dec 23, 2024

feat(agents-api): Add Doc sql queries #979

feat(agents-api): Add Doc sql queries #979

Conversation

Vedantsahai18 commented Dec 20, 2024 • edited by qodo-merge-pro-for-open-source bot Loading

PR Type

Description

Changes walkthrough 📝

qodo-merge-pro-for-open-source bot commented Dec 20, 2024 • edited Loading

PR Reviewer Guide 🔍

(Review updated until commit 934db8a)

qodo-merge-pro-for-open-source bot commented Dec 20, 2024 • edited Loading

PR Code Suggestions ✨

Previous suggestions

qodo-merge-pro-for-open-source bot commented Dec 20, 2024 • edited Loading

CI Failure Feedback 🧐

(Checks updated until commit 934db8a)

Configuration options

ellipsis-dev bot left a comment

Choose a reason for hiding this comment

ellipsis-dev bot left a comment

Choose a reason for hiding this comment

ellipsis-dev bot left a comment

Choose a reason for hiding this comment

ellipsis-dev bot left a comment

Choose a reason for hiding this comment

ellipsis-dev bot left a comment

Choose a reason for hiding this comment

ellipsis-dev bot left a comment

Choose a reason for hiding this comment

qodo-merge-pro-for-open-source bot commented Dec 23, 2024

Vedantsahai18 commented Dec 20, 2024 •

edited by qodo-merge-pro-for-open-source bot

Loading

qodo-merge-pro-for-open-source bot commented Dec 20, 2024 •

edited

Loading

(Review updated until commit `934db8a`)

qodo-merge-pro-for-open-source bot commented Dec 20, 2024 •

edited

Loading

qodo-merge-pro-for-open-source bot commented Dec 20, 2024 •

edited

Loading

(Checks updated until commit `934db8a`)