
Out of memory when creating customized dense index from a large collection #1238

Answered by crystina-z
dayuyang1999 asked this question in Q&A

Hi @dayuyang1999! Thanks for your interest in our framework, and sorry for keeping you waiting so long.

When encoding requires a lot of memory, we usually shard the indexing process by adding --shard-id $current_id --shard-num $total to the input arguments. We then run retrieval on each sub-index independently and finally merge the retrieval results. (You can also merge the sub-indexes themselves, but that also requires a lot of memory.)
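To see why "retrieve per shard, then merge" gives the same answer as searching one big index, here is a small pure-Python sketch. It assumes exact (brute-force, flat) inner-product search, that every shard is encoded with the same encoder, and that each shard returns at least k hits. The helper names (shard_ranges, search_shard, sharded_search) and the contiguous splitting scheme are hypothetical illustrations, not Pyserini's API; in practice --shard-id/--shard-num do the splitting for you.

```python
import heapq


def shard_ranges(n_docs: int, shard_num: int) -> list[tuple[int, int]]:
    """Hypothetical scheme: split [0, n_docs) into shard_num contiguous ranges."""
    size, rem = divmod(n_docs, shard_num)
    ranges, start = [], 0
    for shard_id in range(shard_num):
        # The first `rem` shards take one extra document each.
        end = start + size + (1 if shard_id < rem else 0)
        ranges.append((start, end))
        start = end
    return ranges


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def search_shard(query, doc_vecs, doc_ids, k):
    """Exact inner-product search over one shard; returns (score, docid) pairs, best first."""
    scored = ((dot(query, v), d) for v, d in zip(doc_vecs, doc_ids))
    return heapq.nlargest(k, scored)


def sharded_search(query, all_vecs, all_ids, shard_num, k):
    """Search each shard independently, then keep the global top-k of the union."""
    partial = []
    for start, end in shard_ranges(len(all_vecs), shard_num):
        # In a real setup each slice would be a separate sub-index on disk;
        # here we simply slice in-memory lists.
        partial.extend(search_shard(query, all_vecs[start:end], all_ids[start:end], k))
    return heapq.nlargest(k, partial)
```

Every document in the global top-k is also in its own shard's top-k, so merging the per-shard lists loses nothing. This relies on scores being comparable across shards, which holds when all shards share one encoder and similarity function.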

An example of sharded indexing can be found at https://github.com/castorini/pyserini#dense-indexes. The retrieval command is the same; just change the index path to point to each sub-index in turn.
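Once each sub-index has produced its own run file, the runs can be merged per query by keeping the k highest-scoring hits. The sketch below assumes the runs are in the standard six-column TREC run format (qid Q0 docid rank score tag) and that the shards are disjoint, so no docid appears twice. The function names, the "merged" tag, and any file names you pass in are hypothetical.

```python
import heapq
from collections import defaultdict


def read_trec_run(lines):
    """Parse TREC run lines (qid Q0 docid rank score tag) into {qid: [(score, docid), ...]}."""
    run = defaultdict(list)
    for line in lines:
        parts = line.split()
        if len(parts) != 6:
            continue  # skip blank or malformed lines
        qid, _, docid, _, score, _ = parts
        run[qid].append((float(score), docid))
    return run


def merge_runs(runs, k, tag="merged"):
    """Merge per-shard runs, keeping the top-k hits per query and re-ranking from 1."""
    pooled = defaultdict(list)
    for run in runs:
        for qid, hits in run.items():
            pooled[qid].extend(hits)
    out = []
    for qid in sorted(pooled):
        best = heapq.nlargest(k, pooled[qid])  # highest score first
        for rank, (score, docid) in enumerate(best, start=1):
            out.append(f"{qid} Q0 {docid} {rank} {score:.6f} {tag}")
    return out
```

Usage might look like `merge_runs([read_trec_run(open(p)) for p in shard_run_paths], k=1000)`, where shard_run_paths is your own list of per-shard run files. To get an exact global top-k, each per-shard retrieval should itself return at least k hits.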

Replies: 2 comments

Answer selected by lintool
Category
Q&A
2 participants

This discussion was converted from issue #1229 on July 21, 2022 21:53.