How to Handle Incremental Updates to Indexed Data? #511

silhouettehustler · 2024-07-11T14:28:55Z


Hey GraphRAG Community,

I’m loving GraphRAG for building knowledge graphs, but I’ve hit a bit of a snag with updating already indexed data. Here’s the situation:

With standard RAG, adding new documents to the index is simple since they’re unrelated. But with GraphRAG, everything’s interconnected, so I’m wondering how to update parts of the data without re-indexing everything.

Imagine I’ve got a text file with 5 paragraphs: A, B, C, D, and E. These are all indexed, and the graph is built. Now, if the content of paragraph C changes, what’s the best way to update this without messing up the entire graph?

Specific questions:

- Is there a way to do a partial rebuild of the index for just the changed part of the graph?
- How do we update paragraph C without causing conflicts between the old indexed data and the new data?
- Any best practices for keeping the graph consistent and accurate when only parts of the data get updated? For example, if paragraph C had important relationships or entities, will updating it cause issues? How can we manage and resolve these conflicts effectively?
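
To make the A-to-E scenario concrete, here is a minimal sketch of how the changed paragraph could be detected by comparing content hashes. This is only an illustration, not GraphRAG functionality: the helper names (`split_paragraphs`, `fingerprint`, `diff_paragraphs`) are hypothetical, and it assumes paragraphs are separated by blank lines. It identifies which text would need re-processing; it says nothing about how the graph itself would be patched.

```python
import hashlib


def split_paragraphs(text):
    # Assumption: paragraphs are separated by a blank line.
    return [p.strip() for p in text.split("\n\n") if p.strip()]


def fingerprint(paragraph):
    # A content hash identifies a paragraph regardless of its position.
    return hashlib.sha256(paragraph.encode("utf-8")).hexdigest()


def diff_paragraphs(old_text, new_text):
    # Hypothetical helper: returns (stale, fresh) paragraph lists.
    old = {fingerprint(p): p for p in split_paragraphs(old_text)}
    new = {fingerprint(p): p for p in split_paragraphs(new_text)}
    stale = [old[h] for h in old if h not in new]  # indexed before, now gone
    fresh = [new[h] for h in new if h not in old]  # not yet indexed
    return stale, fresh


old_doc = "A\n\nB\n\nC\n\nD\n\nE"
new_doc = "A\n\nB\n\nC (revised)\n\nD\n\nE"

stale, fresh = diff_paragraphs(old_doc, new_doc)
print(stale)  # ['C']
print(fresh)  # ['C (revised)']
```

The stale list is the part that matters for this question: whatever entities and relationships were extracted from the old paragraph C would have to be found and retired before the revised text is indexed.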

Would really appreciate any tips or pointers on this!

Thanks!

timothymeyers · 2024-07-11T17:26:48Z


Check out this discussion - #354

1 reply

silhouettehustler Jul 11, 2024
Author

Hi @timothymeyers, thank you for a very prompt reply.

I had a look at that discussion before opening a new one, but I felt it covered a different scenario from the one I'm asking about.

To simplify my question let me try and use an example:

Let's say I ran the indexing process on a text document and now have the graph built. Inside the input text document there was a paragraph stating that the sky was BLUE.

Now, if I add another document to the input folder and run the indexing process only on that document, and this document states that the sky is RED, which colour is the sky at this point?

We have not removed the previously indexed information from the graph, where we learnt the sky was BLUE; we have just added a contradicting statement. How does this conflict get resolved behind the scenes, or does it get resolved at all?

timothymeyers · 2024-07-11T19:09:18Z


I will probably have to defer the final answer to someone from our graphrag research team, and they should correct me if I'm wrong, but I can at least offer my understanding of the BLUE/RED scenario you brought up. Dealing with this sort of ambiguity and contradiction is actually one of the use cases for GraphRAG.

Don't think of the GraphRAG indexing process as a typical RAG vector indexer. It is not indexing to facilitate look-up, but 'reading for meaning.' Much like you or I, as we take in new information, we abstract it and relate it to things we already know. When confronted with new, conflicting information, we evaluate it in the context of what we know, make meaning of it, and grow our understanding of it.

GraphRAG works similarly, in that it would add new nodes to its knowledge graph about the claim that the sky was RED, add this information to its "Sky-related information" community cluster, and then update its mental model about what the sky is.

So, if the GraphRAG index has tons of information supporting the claim that the sky is BLUE, it will still tell you the sky is BLUE when asked, but it may caveat it and mention that there's at least one instance where someone said it was really RED.
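
As a rough illustration of the "majority view with a caveat" behaviour described above, here is a toy sketch that tallies conflicting claims about a subject. It is an assumption-laden stand-in, not how GraphRAG works internally: it uses a plain frequency count, and the claim tuples and the `summarize_claims` helper are hypothetical.

```python
from collections import Counter


def summarize_claims(claims, subject):
    # claims: list of (subject, value, source_id) tuples (hypothetical shape).
    values = [value for subj, value, _ in claims if subj == subject]
    ranked = Counter(values).most_common()
    majority = ranked[0][0]                     # best-supported value
    dissent = [value for value, _ in ranked[1:]]  # minority values to caveat
    return majority, dissent


claims = [
    ("sky", "BLUE", "doc1"),
    ("sky", "BLUE", "doc2"),
    ("sky", "BLUE", "doc3"),
    ("sky", "RED", "doc4"),  # the newly added, contradicting document
]

majority, dissent = summarize_claims(claims, "sky")
print(f"The sky is {majority}; at least one source says {', '.join(dissent)}.")
```

Nothing is deleted in this sketch: the RED claim stays in the data alongside the BLUE ones, which mirrors the point that a contradicting document adds information rather than overwriting it.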

These are definitely in line with the use cases we are trying to target and improve upon with GraphRAG, so please share your findings with the team as you experiment with red-filled skies.

If you've not checked out the MSR Blog Post, please do. It gives some examples of how GraphRAG handles conflicting information - https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/

Does this help a little?

1 reply

adairgj Jul 11, 2024

This might be a different question altogether, but what happens when an existing document gets updated? Does the index get recreated, or is there a partial rebuild as @silhouettehustler mentioned?

natoverse · 2024-07-26T20:24:54Z

Maintainer

Tracking incremental indexing with #741

0 replies
