Some questions for clarification #401
After some review of the code and a little experimenting with nrtsearch, I came up with a few findings and questions I would like to ask and clarify:
1) Is it correct that there is no communication between replica nodes (like forwarding search requests from clients)?
2) Is it correct that the client needs to know which index is on which replica and then connect to that replica to execute a search request?
3) It seems there is no built-in load-balancing mechanism that would execute client search requests on the replica with the lowest utilization (or in round-robin fashion), true? Is load balancing something that should be done by the surrounding infrastructure, like Kubernetes? Any plans to leverage the gRPC-LB protocol (https://grpc.io/blog/grpc-load-balancing/)? My current understanding of the client side is sketched in the first code block after this list.
4) It looks like HA (high availability) is not a primary goal (which is fine), because there seems to be no automatic fail-over mechanism (especially for primary nodes). Is this something that should be handled by the surrounding infrastructure, like Kubernetes, or by the client (queuing up index requests until the primary is back, roughly as in the second sketch after this list)?
5) It seems possible to issue search requests against primary nodes. Is this just not recommended, or are there plans to forbid it technically?
6) From what I saw in the code, it appears that there is always only one shard per index (shard0). Is it possible (or planned) to have more than one shard per index? If not, that would mean that all limits of Lucene indices also apply to "nrtsearch indices", and an index could not exceed the resources of a single physical machine (disk, max open files, memory).
7) To let an "nrtsearch index" use the resources of multiple physical machines, it would be necessary to split the index into multiple shards (= multiple Lucene indices) and distribute them across different machines. From my current understanding, that is not something nrtsearch was developed for, right? But to serve as a replacement candidate for Elasticsearch or Solr, that would be necessary functionality. I would just like to get a better understanding of the goals nrtsearch wants to reach and the use cases it is built to support.
8) Can you briefly elaborate on what "virtual sharding" is and which problem it solves? Is it related to 6)?
9) I understand the purpose of primary and replica nodes and how they interact. But there is also a "STANDALONE" mode, and I am not sure why. Is this for testing purposes? To me it looks like a single primary node would also be sufficient, given that a client can issue a search request against a primary node as well, see 5).
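To make 2) and 3) concrete, here is a minimal sketch of how I currently picture the client side: the application keeps its own mapping from index name to the replica group that hosts it, and relies on gRPC's client-side `round_robin` policy for balancing across the addresses that target resolves to. The class name, index names, ports and DNS targets are all made up for illustration; only the plain grpc-java calls are real, nothing here is nrtsearch API.

```java
import io.grpc.ManagedChannel;
import io.grpc.ManagedChannelBuilder;
import java.util.Map;

public class ReplicaChannelFactory {

  // Hypothetical mapping of index name -> DNS name of the replica group that
  // hosts it (e.g. a Kubernetes headless service resolving to all replica pods).
  private static final Map<String, String> INDEX_TO_REPLICAS = Map.of(
      "products", "dns:///nrtsearch-replicas-products.default.svc:8000",
      "reviews", "dns:///nrtsearch-replicas-reviews.default.svc:8000");

  public static ManagedChannel channelForIndex(String indexName) {
    String target = INDEX_TO_REPLICAS.get(indexName);
    if (target == null) {
      throw new IllegalArgumentException("No replicas known for index: " + indexName);
    }
    return ManagedChannelBuilder.forTarget(target)
        // gRPC's built-in client-side policy spreads calls over all addresses
        // the target resolves to; the balancing is done by the client here,
        // not by nrtsearch itself.
        .defaultLoadBalancingPolicy("round_robin")
        .usePlaintext()
        .build();
  }
}
```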
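And for 4), a rough sketch of the "client queues index requests until the primary is back" idea. `IndexWrite` and `sendToPrimary` are placeholders I invented, not nrtsearch API; the point is only that writes get buffered and retried instead of failing while the primary is unavailable.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class BufferedIndexer {

  /** Placeholder for a single indexing request (fields omitted). */
  public static final class IndexWrite { }

  private final BlockingQueue<IndexWrite> pending = new LinkedBlockingQueue<>();

  public void submit(IndexWrite write) {
    pending.add(write); // buffered even while the primary is down
  }

  /** Runs in a background thread and drains the queue with retries. */
  public void drainLoop() throws InterruptedException {
    while (true) {
      IndexWrite write = pending.take();
      while (!sendToPrimary(write)) {
        // Primary unreachable (or fail-over in progress): back off and retry.
        TimeUnit.SECONDS.sleep(5);
      }
    }
  }

  private boolean sendToPrimary(IndexWrite write) {
    // Would wrap the real gRPC indexing call and return false on UNAVAILABLE.
    return true;
  }
}
```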
Comments

Thanks for looking into the code and experimenting with nrtsearch. Below is my initial attempt to answer your questions. Do go through our blog post, it includes a lot of details about our design.

Thanks @umeshdangat for the good, clarifying answers. Adding more questions for clarification.