Is there any documentation of what the various deduplication strategies actually do? My use case is a corpus of texts in which the documents contain certain template passages that recur with slight variations. I can't tell whether any of the strategies here fit that case, and the code examples and the existing documentation don't help. For instance, what does "only_dedup_in_index" mean for SentenceDedup?
Maybe I'm struggling with this only because I'm not very familiar with deduplication in general, so I would appreciate any help.