graphrag CLI Update command issue #1600
We are hitting a similar issue. It looks like the code treats having no documents to add as an error:

```python
# Fail on empty delta dataset
if delta_dataset.new_inputs.empty:
    error_msg = "Incremental Indexing Error: No new documents to process."
    raise ValueError(error_msg)
```

It would be nice if the update succeeded instead and just kept the data as-is. We run indexing on scheduled jobs, and it is OK if there is nothing new since the last run -- the job can be a no-op. If helpful, I could send a PR to update the logic of that piece of code. Besides failing when there are no new documents to add, it looks like the code will also fail if there are only documents to delete, and it might not handle document deletions at all. That is likely a different problem, so I am opening a separate issue to track it.
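The change proposed above could look something like the sketch below. This is a hypothetical reworking, not graphrag's actual code: the `delta_dataset.new_inputs` shape is taken from the snippet above, and the `handle_delta` helper name and the logging call are assumptions for illustration.

```python
# Hypothetical reworking of the empty-delta check: an update run with no
# new documents becomes a clean no-op instead of raising ValueError.
import logging

logger = logging.getLogger(__name__)


def handle_delta(delta_dataset) -> bool:
    """Return True if there is new work to do, False for a safe no-op."""
    if delta_dataset.new_inputs.empty:
        # Previously: raise ValueError("Incremental Indexing Error: ...")
        logger.info("Incremental indexing: no new documents to process; exiting cleanly.")
        return False
    return True
```

A scheduled job could then skip the remaining pipeline steps when the function returns `False`, rather than failing the run.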
One workaround in our case has been to replicate the logic in our own code before running the update:

```python
import os

import pandas as pd

# Titles of input documents currently present vs. those recorded in the previous index run.
document_current = {doc for doc in os.listdir(f'{root_dir}/input') if doc.endswith('.txt')}
document_previous = set(pd.read_parquet(f'{root_dir}/output/create_final_documents.parquet')['title'].unique())
document_count = len(document_current)
document_added = len(document_current - document_previous)
document_removed = len(document_previous - document_current)
if document_added == 0:
    # Nothing new since the last run: exit the Synapse notebook as a no-op.
    mssparkutils.notebook.exit('{}')
```
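Outside a Synapse notebook, the same pre-flight check can be written with plain Python. This is a sketch under the assumption of the standard graphrag project layout (`input/` with `.txt` files and `output/create_final_documents.parquet`); the `has_new_documents` helper name is ours, not part of graphrag.

```python
from pathlib import Path

import pandas as pd


def has_new_documents(root_dir: str) -> bool:
    """True if input/ holds .txt files not yet recorded in the previous index run."""
    current = {p.name for p in Path(root_dir, "input").glob("*.txt")}
    parquet = Path(root_dir, "output", "create_final_documents.parquet")
    # No previous run means everything currently present counts as new.
    previous = set(pd.read_parquet(parquet)["title"]) if parquet.exists() else set()
    return bool(current - previous)
```

A cron or CI job can call this before invoking `graphrag update` and exit zero when it returns `False`, turning the empty-delta case into a harmless no-op.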
We've added this to the backlog - we'll make sure updates without new content can exit safely.
Discussed in #1599
Originally posted by ajain85 January 9, 2025
Hi all, I have been running the graphrag CLI `update` command against blob storage but am getting the error below. I am using Azure Blob Storage to save the parquet files and Azure AI Search to update the index. Can anyone suggest a solution? Is this a graphrag library error, or am I missing something in my settings .yml file?

Error:

```
ValueError: Incremental Indexing Error: No new documents to process.
```
This is how I have updated the `update_index_storage` section of my settings:

```yaml
update_index_storage:
  type: "blob" # or blob
  connection_string: ""
  container_name: "graphrag"
  base_dir: "output"
  storage_account_blob_url: "https://*.blob.core.windows.net/"
```
Local variables from the traceback:

```
error_msg = 'Incremental Indexing Error: No new documents to process.'
is_update_run = True
logger = <graphrag.logger.rich_progress.RichProgressLogger object at 0x000001E99D45B2D0>
progress_logger = <graphrag.logger.rich_progress.RichProgressLogger object at 0x000001E99D45B2D0>
root_dir = 'C:\Users\JAINAB\UNHCR Workspace\test_graphrag\cligraphrag'
run_id = '20250109-135013'
storage = <graphrag.storage.blob_pipeline_storage.BlobPipelineStorage object at 0x000001E99D472250>
storage_config = {
    'type': 'blob',
    'base_dir': 'output',
    'connection_string': 'DefaultEndpointsProtocol=https;AccountName=d1hcrstgenaisharedxfc;AccountKey=K9Ya'…,
    'container_name': 'graphrag',
    'storage_account_blob_url': 'https://d1hcrstgenaisharedxfc.blob.core.windows.net/',
    'cosmosdb_account_url': None
}
update_index_storage = <graphrag.storage.file_pipeline_storage.FilePipelineStorage object at 0x000001E99F017710>
update_storage_config = {
    'type': 'file',
    'base_dir': 'C:\Users\JAINAB\UNHCR Workspace\test_graphrag\cligraphrag\update_output',
    'connection_string': None,
    'container_name': None,
    'storage_account_blob_url': None,
    'cosmosdb_account_url': None
}
workflows = [
    'create_base_text_units',
    'create_final_documents',
    'extract_graph',
    'compute_communities',
    'create_final_entities',
    'create_final_relationships',
    'create_final_nodes',
    'create_final_communities',
    'create_final_text_units',
    'create_final_community_reports',
    ... +1
]

ValueError: Incremental Indexing Error: No new documents to process.
```