Doubt Regarding Retrieving Documents in Stemmed Version #937
Hello, one very naive approach is to index the msmarco-passage corpus with stemming applied and stop words removed. Do you have any other solutions or workarounds for the above?
Replies: 1 comment
The Lucene "Analyzer" is responsible for handling stemming, stopword removal, etc. This is where Anserini chooses its analyzer: https://github.com/castorini/anserini/blob/master/src/main/java/io/anserini/index/IndexCollection.java#L738 By default, we stem and remove stopwords.
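
To make the analyzer's role concrete, here is a minimal, standard-library-only sketch of the kind of pipeline an analyzer runs: lowercase, tokenize, drop stopwords, stem. Everything in it is an assumption made for illustration: `toy_analyze`, `toy_stem`, the short stopword list, and the crude suffix stripping are hypothetical stand-ins, not Lucene's or Anserini's actual analyzer, and the stems will differ from what a real Porter-style stemmer produces.

```python
import re

# Hypothetical, tiny stopword list for illustration only;
# the list a real Lucene analyzer uses is different.
STOPWORDS = {"a", "an", "and", "are", "is", "of", "the", "to", "in", "for"}

# Crude suffix stripping, checked in this order. This is NOT the Porter stemmer.
SUFFIXES = ("ing", "ed", "es", "s")


def toy_stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def toy_analyze(text):
    """Lowercase, split on non-alphanumerics, drop stopwords, then stem."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [toy_stem(t) for t in tokens if t not in STOPWORDS]


def toy_analyzed_text(text):
    """Return the analyzed ("stemmed") form of a passage as a single string."""
    return " ".join(toy_analyze(text))


if __name__ == "__main__":
    passage = "The retrieving of stemmed documents"
    query = "Stemmed document retrieving"
    print(toy_analyze(passage))
    print(toy_analyze(query))
```

The point of the sketch is that the same analysis is applied to both the passage and the query, so "documents" and "document" reduce to the same term and can match. Passing raw text through such a function is also one way to see what its stemmed, stopword-free form looks like without re-indexing the corpus.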