How to Handle Incremental Updates to Indexed Data? #511
-
Check out this discussion: #354
-
I will probably have to defer the final answer to someone from our GraphRAG research team, and they should correct me if I'm wrong, but I can at least offer my understanding of the BLUE/RED scenario you brought up.

Dealing with this sort of ambiguity and contradiction is actually one of the use cases for GraphRAG. Don't think of the GraphRAG indexing process as a typical RAG vector indexer. It is not indexing to facilitate look-up; it is 'reading for meaning.' As you or I take in new information, we abstract it and relate it to things we already know. When confronted with new, conflicting information, we evaluate it in the context of what we know, make meaning of it, and grow our understanding.

GraphRAG works similarly: it would add new nodes to its knowledge graph for the claim that the sky was RED, add this information to its "Sky-related information" community cluster, and then update its mental model of what the sky is. So, if the GraphRAG index has tons of information supporting the claim that the sky is BLUE, it will still tell you the sky is BLUE when asked, but it may add a caveat that there's at least one instance where someone said it was RED.

These are definitely in line with the use cases we are trying to target and improve upon with GraphRAG, so please share your findings with the team as you experiment with red-filled skies. If you've not checked out the MSR blog post, please do. It gives some examples of how GraphRAG handles conflicting information: https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/

Does this help a little?
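To make the BLUE/RED intuition concrete, here is a toy sketch in Python. It is not GraphRAG's implementation or API: `summarize_claims` and the sample data are hypothetical, and as far as I understand it the real system has an LLM summarize community clusters rather than counting claims. The sketch only shows the "majority answer plus caveat" behavior described above.

```python
from collections import Counter


def summarize_claims(claims):
    """Return the majority claim, plus a caveat naming any minority claims.

    Hypothetical helper for illustration only; not part of GraphRAG.
    """
    # most_common() orders values by how often they appear, highest first.
    ranked = Counter(claims).most_common()
    majority, _ = ranked[0]
    minority = [value for value, _ in ranked[1:]]

    summary = f"The sky is {majority}."
    if minority:
        summary += " Caveat: some sources say " + ", ".join(minority) + "."
    return summary


# Twenty sources say BLUE, one says RED.
claims = ["BLUE"] * 20 + ["RED"]
print(summarize_claims(claims))
```

With twenty BLUE claims and one RED claim, the summary still leads with BLUE and mentions RED only as a caveat.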
-
Tracking incremental indexing with #741
-
Hey GraphRAG Community,
I’m loving GraphRAG for building knowledge graphs, but I’ve hit a bit of a snag with updating already indexed data. Here’s the situation:
With standard RAG, adding new documents to the index is simple because each one is indexed independently of the others. But with GraphRAG, everything’s interconnected, so I’m wondering how to update parts of the data without re-indexing everything.
Imagine I’ve got a text file with 5 paragraphs: A, B, C, D, and E. These are all indexed, and the graph is built. Now, if the content of paragraph C changes, what’s the best way to update this without messing up the entire graph?
Specific questions:
1. Is there a way to do a partial rebuild of the index for just the changed part of the graph?
2. How do we update paragraph C without causing conflicts between the old indexed data and the new data?
3. Are there any best practices for keeping the graph consistent and accurate when only parts of the data get updated?
For example, if paragraph C had important relationships or entities, will updating it cause issues? How can we manage and resolve these conflicts effectively?
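To make the scenario concrete, this is roughly how the changed paragraph could be detected before any graph update. It is a minimal sketch that assumes paragraphs are stored by label; `fingerprint` and `changed_labels` are hypothetical helpers, not part of GraphRAG, and the sketch says nothing about how the graph itself should be patched afterwards.

```python
import hashlib


def fingerprint(paragraphs):
    """Map each paragraph label to a SHA-256 hash of its text."""
    return {
        label: hashlib.sha256(text.encode("utf-8")).hexdigest()
        for label, text in paragraphs.items()
    }


def changed_labels(old, new):
    """Return labels that were added, removed, or whose text changed."""
    old_fp, new_fp = fingerprint(old), fingerprint(new)
    return sorted(
        label
        for label in old_fp.keys() | new_fp.keys()
        if old_fp.get(label) != new_fp.get(label)
    )


# Five indexed paragraphs; only C is edited afterwards.
old = {"A": "alpha", "B": "beta", "C": "gamma", "D": "delta", "E": "epsilon"}
new = dict(old, C="gamma, revised")
print(changed_labels(old, new))  # ['C']
```

Knowing that only C changed is the easy part; the open question is what to do with the entities and relationships that were extracted from the old version of C.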
Would really appreciate any tips or pointers on this!
Thanks!