Do you need to file an issue?
I have searched the existing issues and this feature is not already filed.
My model is hosted on OpenAI or Azure. If not, please look at the "model providers" issue and don't file a new one here.
I believe this is a legitimate feature request, not just a question. If this is a question, please use the Discussions area.
Is your feature request related to a problem? Please describe.
Hi,
I have around 15,000 chunks stored in individual .txt files, and I've been generating graphs in batches of 500 files. Each run created a new timestamped folder in the output directory. Before processing each new batch of 500 .txt files, I removed the older .txt files from the input folder, expecting the new timestamped runs to reuse the cached data from earlier indexing. That hasn't been the case: answers about content from the earlier batches are no longer accurate.
I would like to understand how to merge all the timestamped folders into a single artifacts folder so that I can use the search API effectively.
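(For illustration only: a naive merge would just concatenate the same-named parquet artifacts across the timestamped folders, as in the sketch below. The output/<timestamp>/artifacts layout and the merge logic are assumptions on my part, and plain concatenation does not reconcile entity IDs or rebuild communities, so it is not equivalent to a proper single index.)

    # Naive artifact merge sketch. Assumptions: default output layout
    # output/<timestamp>/artifacts/*.parquet; pandas + pyarrow installed.
    # Plain concatenation does NOT reconcile entity IDs or community
    # assignments, so the result is not a true unified index.
    from pathlib import Path

    import pandas as pd

    output_root = Path("./test/output")            # hypothetical project output dir
    merged_dir = output_root / "merged_artifacts"
    merged_dir.mkdir(exist_ok=True)

    # Every timestamped run folder that contains an artifacts directory.
    runs = sorted(p / "artifacts" for p in output_root.iterdir()
                  if (p / "artifacts").is_dir())

    # Concatenate same-named parquet tables across all runs.
    names = {f.name for run in runs for f in run.glob("*.parquet")}
    for name in names:
        frames = [pd.read_parquet(run / name) for run in runs
                  if (run / name).exists()]
        pd.concat(frames, ignore_index=True).to_parquet(merged_dir / name)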
Describe the solution you'd like
No response
Additional context
No response
Is there a particular reason you need to remove the old txt files from your input folder? The cache is invoked when the system encounters content that would produce an identical request - i.e., the same LLM parameters, and the same prompt (which will be identical for text chunks that were already processed). I would not expect the cache to be utilized for your new chunks, so that behavior sounds correct. But if you leave the old files, they should utilize the cache and therefore save on LLM calls.
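(A rough illustration of that keying, not GraphRAG's actual cache code: the key is derived from the LLM parameters plus the prompt, so an already-processed chunk produces the same key and hits the cache, while a new chunk misses.)

    # Illustrative request-keyed cache, not GraphRAG's implementation.
    import hashlib
    import json

    def cache_key(llm_params: dict, prompt: str) -> str:
        # Identical params + prompt -> identical key -> cache hit.
        payload = json.dumps({"params": llm_params, "prompt": prompt},
                             sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    cache: dict[str, str] = {}

    def complete(call_llm, llm_params: dict, prompt: str) -> str:
        # call_llm is a stand-in for the real model call (hypothetical).
        key = cache_key(llm_params, prompt)
        if key not in cache:          # unseen chunk: pay for an LLM call
            cache[key] = call_llm(llm_params, prompt)
        return cache[key]             # re-seen chunk: free cache hit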
That said, the later steps will recompute, because presumably your new content will result in updates to the graph, and therefore the community composition and summaries.
There is more info on incremental indexing at #741
I was under the impression that the cache from the old input data would be used alongside the new input files, so I removed the old files. Now it's clear. I will keep the old files and continue adding new data.
Should I use the --resume flag when running the CLI, so that the previously generated cache and parquet files are taken into account?
python -m graphrag.index --root ./test --resume "20240820-143721"
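(If I understand correctly, --resume points the indexer back at one existing timestamped run under the output directory; assuming the default layout, the timestamp must match a folder like this:)

    test/
      output/
        20240820-143721/
          artifacts/   <- parquet outputs reused on resume
          reports/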