Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Solr textContainsPhrase & Solr Client Tokenizer Alignment #4166

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

criminosis
Copy link
Contributor


Thank you for contributing to JanusGraph!

In order to streamline the review of the contribution we ask you
to ensure the following steps have been taken:

For all changes:

  • Is there an issue associated with this PR? Is it referenced in the commit message?
  • Does your PR body contain #xyz where xyz is the issue number you are trying to resolve?
  • Has your PR been rebased against the latest commit within the target branch (typically master)?
  • Is your initial contribution a single, squashed commit?

For code changes:

  • Have you written and/or updated unit tests to verify your changes?
  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • If applicable, have you updated the LICENSE.txt file, including the main LICENSE.txt file in the root of this repository?
  • If applicable, have you updated the NOTICE.txt file, including the main NOTICE.txt file found in the root of this repository?

For documentation related changes:

  • Have you ensured that format looks appropriate for the output in which it is rendered?

Fixes #4165
Closes #4164

Closes JanusGraph#4164

Added new test document and updated assertions

Use tokenizer that mimics Solr's standardized tokenizer

Geo predicate tweaking

Implemented TextContainsPhrase for Solr

Signed-off-by: Allan Clements <[email protected]>
@janusgraph-bot janusgraph-bot added the cla: external Externally-managed CLA label Dec 4, 2023
@li-boxuan
Copy link
Member

@VladimirBogomolov Would you like to review?

Copy link
Member

@li-boxuan li-boxuan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the fix and the new pushdown rule! I have one question about the tokenizer fix:

In your issue description #4165 you mentioned that you witnessed the bug when migrating from Elasticsearch to Solr. Is it possible to write a test in JanusGraphIndexTest such that Elasticsearch backend would pass, and Solr backend wouldn't without your fix? I understand you added a new unit test for Solr backend, but it would be even better if we could have a test to ensure the consistent behaviour across index backends.

@criminosis
Copy link
Contributor Author

I believe running the test class I modified without the modified Solr client should demonstrate it. The test class I modified appeared to be executed upon all providers if I'm not mistaken.

@li-boxuan
Copy link
Member

li-boxuan commented Dec 6, 2023

The test class I modified appeared to be executed upon all providers if I'm not mistaken.

Oh yeah you are right, never mind then!

@criminosis
Copy link
Contributor Author

Restoring the original client will opt it out of the containsPhrase testing based on the support flags evaluated by the supports method. But I believe it should demonstrate the breakage with regards to the textContains predicate.

If not I woefully missed up in my preliminary investigation 😅

Copy link
Member

@li-boxuan li-boxuan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh gosh sorry I completely forgot about this PR. If there's no other review, I'll merge this PR in a week following lazy merge consensus.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla: external Externally-managed CLA
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Pre-Tokenizer Is Incongruent With Solr Default StandardTokenizer Solr Support For textContainsPhrase
3 participants