Update lucene to version 8.11.2 #16

Merged
merged 27 commits into from
Nov 14, 2024

@tuomas2 tuomas2 commented Jul 19, 2024

Replaces #15

This gives access to newer Lucene features, such as regular-expression search. It is a major refactor because the update spans five major Lucene versions.

I tested several languages (English, Czech, Chinese, Japanese, and Thai), and search works in all of them. I cannot judge whether the stemming is good for every language, so more testing by native speakers is needed.

@tuomas2 tuomas2 changed the base branch from master to develop July 19, 2024 16:01
@tuomas2 tuomas2 changed the base branch from develop to master July 19, 2024 16:02

tuomas2 commented Jul 19, 2024

To summarize, @JJK96, I would like us to try to:

  • Remove AbstractBookAnalyzer altogether, along with all custom analyzers based on it.
  • Use StopwordAnalyzer as the base class for our custom analyzers (KeyAnalyzer etc.).
  • Modify the properties file / factory accordingly to use classes from core and other libs.
  • Change the filter classes (used by some analyzers, like KeyAnalyzer) so that they do not store the book (as it does not seem to be used).

(related to discussion started here: #15 (comment))

JJK96 added 6 commits August 19, 2024 21:17
Also removed LuceneAnalyzer and moved its functionality into AnalyzerFactory
AnalyzerFactory now returns a real subclass of Analyzer, instead of a wrapper.

For all languages, language-specific analyzers are used, instead of Snowball Analyzers
Removed EnglishAnalyzer test in AnalyzerFactoryTest
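
To illustrate the idea of a factory choosing a language-specific analyzer from a properties-style mapping, here is a minimal sketch. It is not the code from this PR: the class name `AnalyzerLookupSketch`, the property keys, and the choice of which language maps to which class are all assumptions made for the example. The analyzer class names themselves are real Lucene 8 classes, but the sketch only returns them as strings so that it runs without Lucene on the classpath.

```java
import java.util.Properties;

/** Hypothetical sketch: maps a language name to an analyzer class name. */
final class AnalyzerLookupSketch {

    /** Fallback used when no language-specific entry exists (an assumption). */
    static final String FALLBACK = "org.apache.lucene.analysis.standard.StandardAnalyzer";

    /** Builds an illustrative mapping; the keys are made up for this example. */
    static Properties defaults() {
        Properties mapping = new Properties();
        mapping.setProperty("English", "org.apache.lucene.analysis.en.EnglishAnalyzer");
        mapping.setProperty("Czech", "org.apache.lucene.analysis.cz.CzechAnalyzer");
        mapping.setProperty("Thai", "org.apache.lucene.analysis.th.ThaiAnalyzer");
        mapping.setProperty("Japanese", "org.apache.lucene.analysis.cjk.CJKAnalyzer");
        return mapping;
    }

    /** Returns the configured analyzer class name, or the fallback if none is configured. */
    static String analyzerClassFor(Properties mapping, String language) {
        return mapping.getProperty(language, FALLBACK);
    }

    public static void main(String[] args) {
        Properties mapping = defaults();
        System.out.println(analyzerClassFor(mapping, "Czech"));
        System.out.println(analyzerClassFor(mapping, "Unknown"));
    }
}
```

A real factory would presumably instantiate the class rather than return its name; that step is left out here because it depends on the Lucene jars and on how the project loads its properties.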

JJK96 commented Oct 14, 2024

  • Remove AbstractBookAnalyzer altogether, along with all custom analyzers based on it.
  • Use StopwordAnalyzer as the base class for our custom analyzers (KeyAnalyzer etc.).
    • I used Analyzer as the base, since stopwording was not used by these classes.
  • Modify the properties file / factory accordingly to use classes from core and other libs.
  • Change the filter classes (used by some analyzers, like KeyAnalyzer) so that they do not store the book (as it does not seem to be used).

JJK96 added 3 commits October 14, 2024 21:08
Added check for index version when getting index status.
This ensures that the status correctly represents if the index is invalid.
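
A rough sketch of what such a version check could look like follows. It is an illustration only, not the commit's actual code: the class `IndexVersionCheck`, the `Status` enum, and both method names are hypothetical, and the comparison assumes versions are simple decimals such as 1.2 and 1.3. Only the `Latest.Index.Version` key is taken from the properties file shown in this PR.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Hypothetical helper; all names here are illustrative, not the project's API. */
final class IndexVersionCheck {

    /** Possible outcomes of the check (names are assumptions). */
    enum Status { DONE, INVALID }

    /** Reads the expected version from properties text such as "Latest.Index.Version=1.3". */
    static float latestVersion(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return Float.parseFloat(props.getProperty("Latest.Index.Version"));
    }

    /** Reports an index as invalid when it was built with an older index version. */
    static Status status(float installedVersion, float latestVersion) {
        return installedVersion < latestVersion ? Status.INVALID : Status.DONE;
    }

    public static void main(String[] args) {
        float latest = latestVersion("Latest.Index.Version=1.3\n");
        System.out.println(status(1.2f, latest)); // index built before the upgrade
        System.out.println(status(1.3f, latest)); // index rebuilt after the upgrade
    }
}
```

Comparing versions as floats breaks down for values like 1.10 versus 1.9, so a real implementation may want to compare version components instead.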

@tuomas2 tuomas2 left a comment

Some comments.

Will fix most of them myself in an upcoming commit.

notes.md Outdated
@@ -0,0 +1,18 @@
Functionality AbstractBookAnalyzer

this file should be deleted before merging, right?

@@ -44,6 +44,7 @@ public interface Index {
* @throws BookException
*/
Key find(String query) throws BookException;
Key find(String query, boolean full_text) throws BookException;

Suggested change
Key find(String query, boolean full_text) throws BookException;
Key find(String query, boolean fullText) throws BookException;

Field headingField = new Field(FIELD_HEADING, "", Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.NO);
Field headingStemField = new Field(FIELD_HEADING_STEM, "", Field.Store.NO, Field.Index.ANALYZED, Field.TermVector.NO);
Field morphologyField = new Field(FIELD_MORPHOLOGY , "", Field.Store.NO, Field.Index.ANALYZED, Field.TermVector.NO);
FieldType stored_not_analyzed = new FieldType(StringField.TYPE_STORED);

Use camelCase here.

return createAnalyzer(book, false);
}

public Analyzer createAnalyzer(Book book, Boolean stopwording) {

stopWording

analyzerPerField.put(LuceneIndex.FIELD_INTRO_STEM, analyzer);
analyzerPerField.put(LuceneIndex.FIELD_HEADING_STEM, analyzer);
//analyzerPerField.put(LuceneIndex.FIELD_HEADING, myNaturalLanguageAnalyzer); //heading to use same analyzer as BODY
//analyzerPerField.put(LuceneIndex.FIELD_INTRO, myNaturalLanguageAnalyzer);

unnecessary comments?

Latest.Index.Version=1.2
Lucene.Version=3.0

Latest.Index.Version=1.3

A comment should be added above (as there seems to be a version history).

@tuomas2 tuomas2 changed the base branch from master to develop November 14, 2024 12:45

tuomas2 commented Nov 14, 2024

I'll merge both of these into develop and start preparing a beta release. It looks good so far, but I haven't tested it in practice yet.

@tuomas2 tuomas2 merged commit 1445a75 into develop Nov 14, 2024
1 check passed