Experiment with dropping stopwords from Lucene
to enable searches like @awmarrs’ “not found attached” - since “not” is currently dropped both from source data and from query strings
joewiz committed Jul 25, 2017
1 parent 590915f commit fc3f8b3
Showing 1 changed file with 6 additions and 1 deletion.
7 changes: 6 additions & 1 deletion volumes.xconf
@@ -6,7 +6,12 @@

 <!-- Lucene index configuration -->
 <lucene>
-    <analyzer class="org.apache.lucene.analysis.standard.StandardAnalyzer"/>
+    <analyzer class="org.apache.lucene.analysis.standard.StandardAnalyzer">
+        <!-- Specify stop words - or remove them entirely -->
+        <param name="stopwords" type="org.apache.lucene.analysis.util.CharArraySet">
+            <!--<value>the</value>-->
+        </param>
+    </analyzer>
     <text qname="tei:div"/>
 </lucene>

3 comments on commit fc3f8b3

@joewiz (Member, Author) commented on fc3f8b3 on Jul 25, 2017

Here's @awmarrs' search before:

[Screenshot: search results before the change]

and after (on my local machine):

[Screenshot: search results after the change]

The first hit in the "after" results contains the exact search string, as shown in this screenshot from https://history.state.gov/historicaldocuments/frus1943v01/d888:

[Screenshot: the matching passage in frus1943v01/d888]
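
To illustrate why the results differ, here is a minimal Python sketch of stop word filtering. It is a toy model, not Lucene's StandardAnalyzer: the `analyze` and `phrase_match` helpers, the sample sentences, and the short stop word list are all hypothetical and chosen only to show the effect of dropping "not" from both the indexed text and the query.

```python
import re

# Illustrative subset of an English stop word list (an assumption for this
# sketch); "not" is the entry that matters here.
STOPWORDS = frozenset({"a", "an", "and", "not", "of", "the", "to"})


def analyze(text, stopwords):
    """Lowercase the text, split it into word tokens, and drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in stopwords]


def phrase_match(query, document, stopwords):
    """Return True if the analyzed query occurs as a consecutive run of
    tokens in the analyzed document (a simplified phrase search)."""
    q = analyze(query, stopwords)
    d = analyze(document, stopwords)
    if not q:
        return False
    return any(d[i:i + len(q)] == q for i in range(len(d) - len(q) + 1))


# Hypothetical sample sentences, not taken from the FRUS collection.
docs = [
    "The enclosure was not found attached to the despatch.",
    "The enclosure was found attached to the despatch.",
]
query = "not found attached"

# With "not" as a stop word, the query shrinks to "found attached",
# so both sentences match and the negation is lost.
hits_with_stopwords = [phrase_match(query, d, STOPWORDS) for d in docs]

# With an empty stop word set, only the sentence containing the
# exact phrase matches.
hits_without_stopwords = [phrase_match(query, d, frozenset()) for d in docs]

print(hits_with_stopwords)     # [True, True]
print(hits_without_stopwords)  # [True, False]
```

Passing an empty set plays the role of the empty `stopwords` param in the diff: nothing is filtered, so "not" survives in both the indexed text and the query.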

@joewiz (Member, Author) commented on fc3f8b3 on Jul 25, 2017

More good news: this change didn't alter the time required to index the FRUS collection. Deploying hsg-project from scratch took the same amount of time as before: 20 minutes.

@WaxCylinderRevival (Member)

Fantastic!