Hibernate Search ORM ElasticSearch fails search for offset larger than index.max_result_window #45440
Comments
/cc @gsmet (elasticsearch,hibernate-search), @loicmathieu (elasticsearch), @marko-bekhta (elasticsearch,hibernate-search), @yrodiere (elasticsearch,hibernate-search)
Hello, Thanks for reporting.
I'm sorry, but that's an unrealistic expectation. Hibernate Search cannot do what Elasticsearch cannot. And Elasticsearch cannot handle larger pages when searching -- except with workarounds, see below.
Increasing index.max_result_window through a custom settings file, on the other hand, is expected to work. I'd like to know more.
Relatedly, a way to work around this limitation in Elasticsearch is to use Elasticsearch's "search_after" feature, which ignores all documents with a sort key below a given one, excluding them from the result window. It's not exposed yet in Hibernate Search (HSEARCH-2601), but we gladly accept PRs ;) and worst case it can possibly be implemented in your application with custom JSON.
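For anyone unfamiliar with the idea behind "search_after", here is a minimal in-memory sketch in plain Java. It is not Hibernate Search or Elasticsearch API: the Hit record, the pageAfter method and the sample data are hypothetical, and a sorted list stands in for the index. It only illustrates the principle that each page is found from the last sort key seen rather than from an offset, so the depth of the page never enters the request.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration of keyset ("search_after"-style) paging.
// A sorted in-memory list stands in for the index; no real search API is used.
final class SearchAfterSketch {

    // Stand-in for an indexed document; the unique id doubles as the sort key.
    record Hit(long id, String title) {}

    // Returns up to pageSize hits whose sort key is strictly greater than lastSeenId.
    // Pass null as lastSeenId to get the first page. The list must be sorted by id.
    static List<Hit> pageAfter(List<Hit> sortedHits, Long lastSeenId, int pageSize) {
        List<Hit> page = new ArrayList<>();
        for (Hit hit : sortedHits) {
            if (lastSeenId != null && hit.id() <= lastSeenId) {
                continue; // at or before the cursor: skipped, like documents below the sort key
            }
            page.add(hit);
            if (page.size() == pageSize) {
                break;
            }
        }
        return page;
    }

    public static void main(String[] args) {
        List<Hit> hits = new ArrayList<>();
        for (long i = 1; i <= 25; i++) {
            hits.add(new Hit(i, "Book " + i));
        }
        hits.sort(Comparator.comparingLong(Hit::id));

        Long cursor = null; // no cursor yet: start from the beginning
        List<Hit> page;
        while (!(page = pageAfter(hits, cursor, 10)).isEmpty()) {
            cursor = page.get(page.size() - 1).id(); // remember the last sort key seen
            System.out.println(page.size() + " hits, cursor is now " + cursor);
        }
    }
}
```

The trade-off is that the client can only move forward from a known cursor; jumping straight to an arbitrary page number is not possible this way.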
By the way, to clarify the limitation: what matters is not the number of indexed entities. You can have millions of them. What matters is the number of hits of your query, and how far down that list of hits you want to go. A query with 10,000,000 hits is fine, if you only inspect the 10,000 first hits. That's where |
Oh and FYI, the best way (for you) to work around this limitation is to... not offer the feature to your users in the first place. Just propagate the limitation. That won't work if your use case is automated processing of the whole index (or a large part of it), of course. In that case I'd suggest using scrolling if possible, combined with |
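To make "propagate the limitation" concrete, here is a small sketch of a guard an application could apply before calling fetch(offset, limit). The class and method names are hypothetical. It assumes the Elasticsearch rule that offset plus limit must not exceed index.max_result_window, and the default value of 10,000 mentioned in this issue.

```java
// Hypothetical helper: checks or trims a page request against the result window.
// Assumes Elasticsearch rejects requests where offset + limit > index.max_result_window.
final class ResultWindowGuard {

    // Default value of index.max_result_window, as stated in this issue.
    static final int DEFAULT_MAX_RESULT_WINDOW = 10_000;

    // True when the requested page lies entirely inside the result window.
    static boolean fitsInWindow(long offset, int limit, int maxResultWindow) {
        return offset >= 0 && limit >= 0 && offset + limit <= maxResultWindow;
    }

    // Largest limit that can still be served at this offset;
    // 0 when the offset is already at or past the end of the window.
    static int clampLimit(long offset, int limit, int maxResultWindow) {
        if (offset < 0 || limit < 0) {
            throw new IllegalArgumentException("offset and limit must not be negative");
        }
        long remaining = maxResultWindow - offset;
        if (remaining <= 0) {
            return 0;
        }
        return (int) Math.min(limit, remaining);
    }

    public static void main(String[] args) {
        // A query may have millions of hits; only the requested depth matters.
        System.out.println(fitsInWindow(9_990, 10, DEFAULT_MAX_RESULT_WINDOW));
        System.out.println(fitsInWindow(9_991, 10, DEFAULT_MAX_RESULT_WINDOW));
        System.out.println(clampLimit(9_995, 10, DEFAULT_MAX_RESULT_WINDOW));
    }
}
```

An endpoint could use fitsInWindow to reject over-deep page requests with a clear client error, or clampLimit to return a shortened final page instead of letting the search fail.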
My hope was that either the scroll API or
This is my current solution of choice as well, thank you! Because this is not a bug but rather a limitation of the used endpoints, I will close this ticket.
Thanks for the update!
I think the scroll API has the same limitation (result window limited to 10,000 hits) so I do not think that would help. The Anyway, in either case, an implicit workaround for this limitation in Elasticsearch would incur a significant performance hit. I would personally not implement that, or only as an opt-in feature -- if only to give you an excuse to propagate the limitation :)
Works for me, thanks. Anyone reading this, you probably want https://hibernate.atlassian.net/browse/HSEARCH-2601 to be implemented -- and you can send a pull request!
Describe the bug
When using quarkus-hibernate-search-orm-elasticsearch (3.17.5) with ElasticSearch 8.17.0 and more than max_result_window (default 10000) indexed entities, a search with a large offset > 10000 (searchSession.search(...).fetch(searchOffset, searchLimit);) fails with the following exception:
Additionally, I did not manage to employ the workaround of increasing the index.max_result_window setting using a custom settings file via quarkus.hibernate-search-orm.elasticsearch.schema-management.settings-file with custom setting file content: { "max_result_window": 100000 }, yielding: Invalid value. Expected '100000', actual is 'null'
Expected behavior
Hibernate-search should return indexed entities for larger pages when searching.
Actual behavior
Hibernate-Search exception as stated above when performing paginated search with a large offset
How to Reproduce?
Take the hibernate-search-orm-elasticsearch-quickstart project, add the following function to LibraryResource:
and the following test to LibraryResourceTest:
Run the test (it takes some time to perform the insertions via HTTP), and you will encounter the same exception as described:
Output of uname -a or ver
Linux Penguin 6.11.0-13-generic #14-Ubuntu SMP PREEMPT_DYNAMIC Sat Nov 30 23:51:51 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Output of java -version
openjdk version "23.0.1" 2024-10-15
OpenJDK Runtime Environment (build 23.0.1+11-Ubuntu-1ubuntu124.10.1)
OpenJDK 64-Bit Server VM (build 23.0.1+11-Ubuntu-1ubuntu124.10.1, mixed mode, sharing)
Quarkus version or git rev
3.17.5
Build tool (ie. output of mvnw --version or gradlew --version)
Maven home: /home/dahoc/.m2/wrapper/dists/apache-maven-3.9.9/3477a4f1
Java version: 23.0.1, vendor: Ubuntu, runtime: /usr/lib/jvm/java-23-openjdk-amd64
Default locale: de_DE, platform encoding: UTF-8
OS name: "linux", version: "6.11.0-13-generic", arch: "amd64", family: "unix"
Additional information
This might be related to #45164 in that Hibernate-Search is apparently using a discouraged or outdated API (also negatively impacting memory) instead of the recommended one as stated in the exception.