Oftentimes, I'll get an error when calling assign_document_to_collection because it cannot find the document in the database.
Looking at the code, the row in document_info is not created before the result of assign_document_to_collection is returned. It is created in the ingest-files workflow, so the existence check in assign_document_to_collection fails:
    document_check_query = f"""
        SELECT 1 FROM {self._get_table_name('document_info')}
        WHERE document_id = $1
    """
    document_exists = await self.fetchrow_query(
        document_check_query, [document_id]
    )
    if not document_exists:
        raise R2RException(
            status_code=404, message="Document not found"
        )
How can this be solved? Would it be possible to also pass the collection_id when calling ingest_files? I noticed that this workflow already adds the file to the default collection.
We are planning on extending the ingest_files endpoint to support exactly the behavior you outline above. Several other developers have requested this exact same functionality.
As for your other question around document info creation, there is a specific reason behind this implementation. In order to properly assign a document to a collection we must update the collection ids of the underlying chunks. It would have required non-trivial engineering work to have the implementation align with what you describe, so instead we add the document to the collection after ingestion is complete.
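Until ingest_files accepts a collection_id, one possible client-side workaround is to retry the assignment until the document_info row exists. The following is only a minimal sketch under stated assumptions: `retry_until_ready` is a hypothetical helper, not part of R2R, and both the wrapped client call and the way a "Document not found" error is detected are assumptions that depend on how your client surfaces the 404.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_until_ready(
    action: Callable[[], T],
    is_not_ready: Callable[[Exception], bool],
    attempts: int = 5,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `action`, retrying with exponential backoff while the failure
    looks like "document not ingested yet" according to `is_not_ready`."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return action()
        except Exception as exc:
            # Unrelated errors, and the final failed attempt, are re-raised.
            if not is_not_ready(exc) or attempt == attempts - 1:
                raise
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")


# Hypothetical usage (the client call and its arguments are assumptions):
#
# retry_until_ready(
#     lambda: client.assign_document_to_collection(document_id, collection_id),
#     is_not_ready=lambda exc: "Document not found" in str(exc),
# )
```

Retrying only on the "not found" condition keeps genuine failures, such as a wrong collection id or a permissions error, visible immediately instead of hiding them behind the backoff.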
But since the default collection is already added to the chunks during ingestion, couldn't the other collection_id be passed along as context, like the metadata, and added at the same time?
Describe the bug
My code is like this:
Oftentimes, I'll get an error when calling assign_document_to_collection because it cannot find the document in the database.
Looking at the code, the row in document_info is not created before the result of assign_document_to_collection is returned. It is done in the ingest-files workflow:
R2R/py/core/main/api/ingestion_router.py
Lines 164 to 174 in c9be2c5
So when calling the assign_document_to_collection, the document_info record does not exist yet:
R2R/py/core/providers/database/collection.py
Lines 451 to 462 in c9be2c5
How to solve this? Would it be possible to pass also the collection_id when calling the ingest_files? I noticed this workflow already added the file to the default collection.