-
Notifications
You must be signed in to change notification settings - Fork 898
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(agents-api): Add Doc sql queries #979
Conversation
PR Reviewer Guide 🔍(Review updated until commit 934db8a)Here are some key observations to aid the review process:
|
PR Code Suggestions ✨Latest suggestions up to 934db8a
Previous suggestionsSuggestions up to commit 6c77490
|
CI Failure Feedback 🧐(Checks updated until commit 934db8a)
✨ CI feedback usage guide:The CI feedback tool (
In addition to being automatically triggered, the tool can also be invoked manually by commenting on a PR:
where Configuration options
See more information about the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❌ Changes requested. Reviewed everything up to 6c77490 in 1 minute and 30 seconds
More details
- Looked at
848
lines of code in10
files - Skipped
0
files when reviewing. - Skipped posting
8
drafted comments based on config settings.
1. agents-api/agents_api/queries/docs/search_docs_hybrid.py:13
- Draft comment:
The import statement forrun_concurrently
is unnecessary since it's not used in the code. Consider removing it to clean up the imports. - Reason this comment was not posted:
Confidence changes required:10%
The import statement forrun_concurrently
is unnecessary since it's not used in the code.
2. agents-api/agents_api/queries/docs/search_docs_hybrid.py:134
- Draft comment:
Appending an empty list totasks
is incorrect. You should append a coroutine or task. Consider usingNone
or a similar placeholder if you intend to skip adding a task. - Reason this comment was not posted:
Confidence changes required:50%
The code usestasks.append([])
which is incorrect for appending tasks. It should append a coroutine instead.
3. agents-api/agents_api/queries/docs/search_docs_hybrid.py:147
- Draft comment:
Appending an empty list totasks
is incorrect. You should append a coroutine or task. Consider usingNone
or a similar placeholder if you intend to skip adding a task. - Reason this comment was not posted:
Confidence changes required:50%
The code usestasks.append([])
which is incorrect for appending tasks. It should append a coroutine instead.
4. agents-api/agents_api/queries/docs/search_docs_hybrid.py:154
- Draft comment:
Ensure that at least one valid task is added totasks
before callinggather
. If bothtext_query
andembedding
are empty,tasks
will contain only empty lists, leading to an error. - Reason this comment was not posted:
Confidence changes required:50%
The code does not handle the case where bothtext_query
andembedding
are empty, which could lead to an error when callinggather
.
5. agents-api/agents_api/queries/docs/search_docs_hybrid.py:110
- Draft comment:
Theembedding
parameter should have a default value of an empty list[]
instead ofNone
to avoid type issues and simplify checks. - Reason this comment was not posted:
Confidence changes required:50%
Theembedding
parameter should have a default value of an empty list instead ofNone
to avoid type issues.
6. agents-api/agents_api/queries/docs/create_doc.py:69
- Draft comment:
Consider adding more exception handling for other potential database errors, such asasyncpg.DataError
orasyncpg.SyntaxOrAccessError
, to make the function more robust. - Reason this comment was not posted:
Confidence changes required:50%
Therewrap_exceptions
decorator is used to handle exceptions, but it doesn't cover all possible exceptions that might occur during database operations.
7. agents-api/agents_api/queries/docs/delete_doc.py:43
- Draft comment:
Consider adding more exception handling for other potential database errors, such asasyncpg.DataError
orasyncpg.SyntaxOrAccessError
, to make the function more robust. - Reason this comment was not posted:
Confidence changes required:50%
Therewrap_exceptions
decorator is used to handle exceptions, but it doesn't cover all possible exceptions that might occur during database operations.
8. agents-api/agents_api/queries/docs/get_doc.py:29
- Draft comment:
Consider adding exception handling for potential database errors, such asasyncpg.DataError
orasyncpg.SyntaxOrAccessError
, to make the function more robust. - Reason this comment was not posted:
Confidence changes required:50%
Therewrap_exceptions
decorator is used to handle exceptions, but it doesn't cover all possible exceptions that might occur during database operations.
Workflow ID: wflow_0SgPnESfFL0Scwfr
Want Ellipsis to fix these issues? Tag @ellipsis-dev
in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet
mode, and more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Looks good to me! Incremental review on 93673b7 in 1 minute and 3 seconds
More details
- Looked at
1195
lines of code in21
files - Skipped
0
files when reviewing. - Skipped posting
5
drafted comments based on config settings.
1. agents-api/agents_api/queries/docs/get_doc.py:5
- Draft comment:
The import statement for 'asyncpg' is unnecessary and can be removed to clean up the code. This is also applicable in other files like 'list_docs.py', 'search_docs_by_embedding.py', and 'search_docs_by_text.py'. - Reason this comment was not posted:
Confidence changes required:50%
The import statement for 'asyncpg' is unnecessary in multiple files as it is not used anywhere in the code. Removing it will clean up the code.
2. agents-api/agents_api/queries/docs/list_docs.py:5
- Draft comment:
The import statement for 'asyncpg' is unnecessary and can be removed to clean up the code. This is also applicable in other files like 'search_docs_by_embedding.py', and 'search_docs_by_text.py'. - Reason this comment was not posted:
Confidence changes required:50%
The import statement for 'asyncpg' is unnecessary in multiple files as it is not used anywhere in the code. Removing it will clean up the code.
3. agents-api/agents_api/queries/docs/search_docs_by_embedding.py:6
- Draft comment:
The import statement for 'asyncpg' is unnecessary and can be removed to clean up the code. This is also applicable in 'search_docs_by_text.py'. - Reason this comment was not posted:
Confidence changes required:50%
The import statement for 'asyncpg' is unnecessary in multiple files as it is not used anywhere in the code. Removing it will clean up the code.
4. agents-api/agents_api/queries/docs/search_docs_by_text.py:6
- Draft comment:
The import statement for 'asyncpg' is unnecessary and can be removed to clean up the code. - Reason this comment was not posted:
Confidence changes required:50%
The import statement for 'asyncpg' is unnecessary in multiple files as it is not used anywhere in the code. Removing it will clean up the code.
5. agents-api/agents_api/queries/docs/create_doc.py:98
- Draft comment:
The 'org' option is removed from the owner_type Literal, but this change is not reflected in the PR description. This should be documented for clarity. This change is also applicable in 'delete_doc.py', 'get_doc.py', and 'list_docs.py'. - Reason this comment was not posted:
Confidence changes required:50%
The 'org' option is removed from the owner_type Literal in multiple files, but this change is not reflected in the PR description. This should be documented for clarity.
Workflow ID: wflow_pnbwiXEmTCJVnbrw
You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet
mode, and more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❌ Changes requested. Incremental review on dc0ec36 in 48 seconds
More details
- Looked at
939
lines of code in28
files - Skipped
0
files when reviewing. - Skipped posting
1
drafted comments based on config settings.
1. agents-api/agents_api/queries/users/delete_user.py:59
- Draft comment:
Theasyncpg.exceptions.UniqueViolationError
should not be handled here as it is not relevant to delete operations. Consider removing this exception handling. - Reason this comment was not posted:
Marked as duplicate.
Workflow ID: wflow_WUrKdAU41fxw9Cd0
Want Ellipsis to fix these issues? Tag @ellipsis-dev
in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet
mode, and more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Looks good to me! Incremental review on 831e950 in 51 seconds
More details
- Looked at
440
lines of code in11
files - Skipped
0
files when reviewing. - Skipped posting
3
drafted comments based on config settings.
1. agents-api/agents_api/queries/docs/get_doc.py:36
- Draft comment:
Usingast.literal_eval
ond["content"]
can be unsafe if the content is not guaranteed to be a valid Python literal. Consider using a safer method to parse or handle the content. - Reason this comment was not posted:
Comment was on unchanged code.
2. agents-api/agents_api/queries/docs/embed_snippets.py:10
- Draft comment:
vectorizer_query
is set toNone
. This is a placeholder and should be replaced with an actual query before deployment. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable:
The comment is essentially repeating information that's already explicitly stated in a TODO comment one line above. The TODO comment is more visible and serves the same purpose. This makes the PR comment redundant and not adding any new information or value.
Perhaps the comment is trying to emphasize the importance of not deploying with a None query, which could be a critical issue.
While deployment concerns are valid, the existing TODO comment already makes it clear this needs to be replaced, and deployment issues would be caught by basic testing since the function would fail immediately.
Delete the comment as it's redundant with the existing TODO comment and doesn't provide additional actionable value.
3. agents-api/agents_api/queries/entries/list_entries.py:88
- Draft comment:
Ensuresort_by
anddirection
are validated and sanitized to prevent SQL injection, as they are used in string interpolation for SQL queries. - Reason this comment was not posted:
Comment did not seem useful.
Workflow ID: wflow_TPwdOI1YwvBHnQ9P
You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet
mode, and more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Looks good to me! Incremental review on 249513d in 1 minute and 4 seconds
More details
- Looked at
1148
lines of code in15
files - Skipped
0
files when reviewing. - Skipped posting
5
drafted comments based on config settings.
1. agents-api/agents_api/queries/docs/create_doc.py:153
- Draft comment:
The function assumesdata.content
is always a list. Consider adding a check or conversion to handle cases wheredata.content
might be a string or other type to prevent unexpected behavior. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable:
The comment suggests adding a check that already exists in the code. The code already handles both list and non-list cases appropriately through the isinstance() check and separate logic paths. The comment appears to be incorrect in stating that the function "assumes data.content is always a list" when it clearly doesn't make that assumption.
Could there be other types besides list and non-list that need handling? Could the CreateDocRequest type definition enforce the content type making this check unnecessary?
The code's else branch handles any non-list type appropriately by treating it as a single content item. The type validation would be handled by FastAPI's request validation via CreateDocRequest if needed.
The comment should be deleted because it incorrectly suggests adding a check that already exists in the code. The code already properly handles both list and non-list content.
2. agents-api/agents_api/queries/docs/delete_doc.py:25
- Draft comment:
TheEXISTS
clause in the SQL query might be redundant since thedoc_owners
entry is already deleted in thedeleted_owners
CTE. Consider revising the logic to ensure thedocs
entry is only deleted if thedoc_owners
entry existed prior to deletion. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable:
The comment raises an interesting point about the query logic - we delete from doc_owners first, then check if that same record existed in doc_owners before deleting from docs. However, this could actually be intentional behavior to ensure atomicity and proper ordering. Without deeper knowledge of the data model and requirements, we can't be certain this is actually a problem vs a deliberate safeguard.
I may be missing important context about transaction isolation levels and race conditions that could make this pattern necessary. The EXISTS check might serve as an important guard rail.
While the comment raises an interesting point about query structure, we don't have enough context to confidently say this is incorrect or needs to be changed.
This comment is speculative and requires more context about the data model and requirements to validate. Following our rules, we should err on the side of removing speculative comments.
3. agents-api/agents_api/queries/docs/list_docs.py:135
- Draft comment:
Themetadata_filter
is directly appended to the query string, which could lead to SQL injection if not properly handled. Ensure that metadata keys and values are safely included in the query to prevent SQL injection vulnerabilities. - Reason this comment was not posted:
Comment did not seem useful.
4. agents-api/agents_api/queries/docs/search_docs_by_text.py:19
- Draft comment:
Theowner_types
andowner_ids
are passed as JSONB arrays, which might not be correctly handled by the SQL function. Ensure that these arrays are properly converted to UUID arrays to prevent unexpected behavior. - Reason this comment was not posted:
Comment did not seem useful.
5. agents-api/tests/test_docs_queries.py:11
- Draft comment:
Consider adding tests forsearch_docs_by_embedding
andsearch_docs_hybrid
to ensure comprehensive coverage of the search functionalities. - Reason this comment was not posted:
Confidence changes required:50%
Thetest_docs_queries.py
file has a test forsearch_docs_by_text
but it lacks tests forsearch_docs_by_embedding
andsearch_docs_hybrid
. Adding these tests would ensure comprehensive coverage of the search functionalities.
Workflow ID: wflow_qeLRxNLHNTGhklta
You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet
mode, and more.
Signed-off-by: Diwank Singh Tomer <[email protected]>
Signed-off-by: Diwank Singh Tomer <[email protected]>
Signed-off-by: Diwank Singh Tomer <[email protected]>
Signed-off-by: Diwank Singh Tomer <[email protected]>
Signed-off-by: Diwank Singh Tomer <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Looks good to me! Reviewed everything up to 934db8a in 1 minute and 18 seconds
More details
- Looked at
4562
lines of code in79
files - Skipped
0
files when reviewing. - Skipped posting
6
drafted comments based on config settings.
1. agents-api/agents_api/queries/utils.py:32
- Draft comment:
Thegenerate_canonical_name
function's docstring is outdated. It mentions converting a display name to a canonical name, but the function no longer takes aname
parameter. Update the docstring to reflect the current functionality. - Reason this comment was not posted:
Comment looked like it was already resolved.
2. agents-api/agents_api/queries/docs/create_doc.py:181
- Draft comment:
Thecreate_doc
function's docstring should mention the default value for themodality
parameter, which is 'text'. This will help developers understand the default behavior of the function. - Reason this comment was not posted:
Confidence changes required:50%
Thecreate_doc
function increate_doc.py
has a potential issue with themodality
parameter. It defaults to 'text', but the docstring does not mention this default value. This could lead to confusion for developers using this function.
3. agents-api/agents_api/queries/docs/delete_doc.py:79
- Draft comment:
Thedelete_doc
function's docstring should explicitly mention that theowner_type
parameter is required. This will help developers understand the necessary inputs for the function. - Reason this comment was not posted:
Confidence changes required:50%
Thedelete_doc
function indelete_doc.py
has a potential issue with theowner_type
parameter. It is a required parameter, but the docstring does not explicitly mention this requirement. This could lead to confusion for developers using this function.
4. agents-api/agents_api/queries/docs/list_docs.py:152
- Draft comment:
Thelist_docs
function's docstring should mention the default value for theinclude_without_embeddings
parameter, which isFalse
. This will help developers understand the default behavior of the function. - Reason this comment was not posted:
Confidence changes required:50%
Thelist_docs
function inlist_docs.py
has a potential issue with theinclude_without_embeddings
parameter. It defaults toFalse
, but the docstring does not mention this default value. This could lead to confusion for developers using this function.
5. agents-api/agents_api/queries/docs/search_docs_by_embedding.py:86
- Draft comment:
Thesearch_docs_by_embedding
function's docstring should mention the default value for theconfidence
parameter, which is 0.5. This will help developers understand the default behavior of the function. - Reason this comment was not posted:
Confidence changes required:50%
Thesearch_docs_by_embedding
function insearch_docs_by_embedding.py
has a potential issue with theconfidence
parameter. It defaults to 0.5, but the docstring does not mention this default value. This could lead to confusion for developers using this function.
6. agents-api/agents_api/queries/docs/search_docs_by_text.py:91
- Draft comment:
Thesearch_docs_by_text
function's docstring should mention the default value for thesearch_language
parameter, which is 'english'. This will help developers understand the default behavior of the function. - Reason this comment was not posted:
Confidence changes required:50%
Thesearch_docs_by_text
function insearch_docs_by_text.py
has a potential issue with thesearch_language
parameter. It defaults to 'english', but the docstring does not mention this default value. This could lead to confusion for developers using this function.
Workflow ID: wflow_8XNb0cnAXVtyPRMM
You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet
mode, and more.
Persistent review updated to latest commit 934db8a |
PR Type
Enhancement, Tests
Description
Changes walkthrough 📝
1 files
test_docs_queries.py
Rewrite document query tests for PostgreSQL
agents-api/tests/test_docs_queries.py
19 files
search_docs_hybrid.py
Add hybrid document search functionality
agents-api/agents_api/queries/docs/search_docs_hybrid.py
create_doc.py
Implement document creation functionality
agents-api/agents_api/queries/docs/create_doc.py
list_docs.py
Add document listing functionality
agents-api/agents_api/queries/docs/list_docs.py
mmr.py
Add Maximal Marginal Relevance search
agents-api/agents_api/queries/docs/mmr.py
search_docs_by_text.py
Add text-based document search
agents-api/agents_api/queries/docs/search_docs_by_text.py
search_docs_by_embedding.py
Add embedding-based document search
agents-api/agents_api/queries/docs/search_docs_by_embedding.py
get_doc.py
Add document retrieval functionality
agents-api/agents_api/queries/docs/get_doc.py
delete_doc.py
Add document deletion functionality
agents-api/agents_api/queries/docs/delete_doc.py
000018_doc_search.up.sql
Add document search database functions
memory-store/migrations/000018_doc_search.up.sql
000006_docs.up.sql
Add document storage database schema
memory-store/migrations/000006_docs.up.sql
000008_tools.up.sql
Optimize tools table schema and constraints
memory-store/migrations/000008_tools.up.sql
000009_sessions.up.sql
Optimize sessions table indexes
memory-store/migrations/000009_sessions.up.sql
000003_users.up.sql
Remove redundant indexes from users table
memory-store/migrations/000003_users.up.sql
openapi-1.0.0.yaml
Add document embedding metadata fields
typespec/tsp-output/@typespec/openapi3/openapi-1.0.0.yaml
embedding_dimensions
000011_executions.up.sql
Optimize executions table indexes
memory-store/migrations/000011_executions.up.sql
models.tsp
Add document embedding metadata model fields
typespec/docs/models.tsp
000006_docs.down.sql
Improve docs table cleanup process
memory-store/migrations/000006_docs.down.sql
000016_entry_relations.up.sql
Optimize entry relations indexes
memory-store/migrations/000016_entry_relations.up.sql
000015_entries.up.sql
Simplify entries table index structure
memory-store/migrations/000015_entries.up.sql
1 files
__init__.py
Add document module initialization
agents-api/agents_api/queries/docs/init.py
1 files
000007_ann.up.sql
Update VoyageAI embedding configuration
memory-store/migrations/000007_ann.up.sql