Multilingual text indexation [JIRA: RIAK-2439] #620
Comments
I'd suggest taking the separate-field approach, @Guibod, but make sure to distribute that search index across various bucket types/buckets. Did you try that and run into a bottleneck after 20 languages? Was it during indexing or querying? Something along the lines of http://pavelbogomolenko.github.io/multi-language-handling-in-solr.html, though we can discuss how best to tune your configuration to your needs/expectations.
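To make the separate-field suggestion concrete, here is a minimal sketch of what each stored document could look like, assuming one shared index whose schema maps a hypothetical `body_<lang>` field to a language-specific fieldType. The names `SUPPORTED_LANGS`, `lang_field` and `build_doc`, and the `body`/`lang` field names, are illustrative assumptions, not part of Riak or of the schema discussed in this issue.

```python
# Sketch of the "separate field" layout: one index, with each document's text
# stored under a language-suffixed field such as "body_fr".
# ASSUMPTION: the schema maps each "*_<lang>" field to a language-specific
# fieldType; the "body" prefix and the "lang" field are hypothetical names.

SUPPORTED_LANGS = {"en", "fr", "es"}  # hypothetical; extend to 20+ languages


def lang_field(lang: str, prefix: str = "body") -> str:
    """Return the language-specific field name, e.g. "fr" -> "body_fr"."""
    if lang not in SUPPORTED_LANGS:
        raise ValueError(f"unsupported language: {lang!r}")
    return f"{prefix}_{lang}"


def build_doc(lang: str, text: str, **extra) -> dict:
    """Build a JSON-serialisable value holding the text in its language field.

    Each document carries only the field for its own language, so adding a
    language means adding a schema entry, not a new field on every document.
    """
    doc = {"lang": lang, lang_field(lang): text}
    doc.update(extra)
    return doc
```

With the Python client the value would then be stored as usual, e.g. `bucket.new(key, data=build_doc("fr", text)).store()`.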
Thanks @zeeshanlakhani, my only issue with sharding per language is that I don't know how to search across multiple indexes for the time being. I'm pretty new to Solr and rely heavily on the Python library at the moment. See:

```python
# The Python API explicitly requires ONE index
results = bucket.search("counter:[10 TO *]", index='website',
                        sort="counter desc", rows=5)
```

Should I use map/reduce on the search results? If so, how can I do that?
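Since Riak Search queries one index per call, one way to emulate a multi-index search is to run the same query against each per-language index and merge the results in Python. This is a client-side fan-out sketch, not a Riak feature; the index names and the `search_all` helper are hypothetical, and each per-index result is assumed to be a dict with a `docs` list, as returned by the client's search calls.

```python
import heapq
from typing import Callable, Dict, Iterable, List

# Hypothetical per-language index names; Riak Search queries one index at a time.
LANG_INDEXES = ["website_en", "website_fr", "website_es"]


def search_all(search_fn: Callable[[str], Dict], indexes: Iterable[str],
               sort_field: str = "counter", rows: int = 5) -> List[Dict]:
    """Query each index separately and merge the top `rows` docs client-side.

    ASSUMPTION: search_fn(index) returns a dict with a "docs" list. Each index
    is asked for its own top `rows`, so the global top `rows` is among them.
    """
    candidates = []
    for index in indexes:
        for doc in search_fn(index).get("docs", []):
            candidates.append(dict(doc, _index=index))  # remember the source index
    # Client-side equivalent of sort="counter desc" across all indexes;
    # float() because field values may come back as strings.
    return heapq.nlargest(rows, candidates,
                          key=lambda d: float(d.get(sort_field, 0)))
```

For example, `search_all(lambda idx: bucket.search("counter:[10 TO *]", index=idx, sort="counter desc", rows=5), LANG_INDEXES)`. Because every index returns its own top five, the global top five is guaranteed to be among the candidates; paging beyond the first page would need more care.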
@Guibod you can't do a multi-index search with Riak Search, but I was wondering what your bottlenecks would look like using one search_index/core but creating a bucket type per language (associating each bucket type with that single search_index).
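The single-index, bucket-type-per-language layout could be set up roughly as follows. This assumes Riak 2.x, where a bucket type's `search_index` property associates it with an index and each indexed document carries its bucket type in the `_yz_rt` field. The helper names and the language list are hypothetical. Note that with one shared field, all languages go through the same analyzer, so this layout scopes queries by language but does not by itself give language-specific stemming.

```python
import json
from typing import List, Optional

SEARCH_INDEX = "website"      # the single shared index, as in the snippet above
LANGS = ["en", "fr", "es"]    # hypothetical: one bucket type per language


def bucket_type_props(index: str = SEARCH_INDEX) -> str:
    """JSON properties that attach a bucket type to the shared search index."""
    return json.dumps({"props": {"search_index": index}})


def create_commands(langs: List[str] = LANGS) -> List[str]:
    """riak-admin commands to create and activate one bucket type per language."""
    cmds = []
    for lang in langs:
        cmds.append(f"riak-admin bucket-type create {lang} '{bucket_type_props()}'")
        cmds.append(f"riak-admin bucket-type activate {lang}")
    return cmds


def scoped_query(query: str, lang: Optional[str] = None) -> str:
    """Restrict a query to one language's bucket type via the _yz_rt field.

    With lang=None the query spans every bucket type sharing the index,
    i.e. all languages at once.
    """
    if lang is None:
        return query
    return f"({query}) AND _yz_rt:{lang}"
```

Documents would then be written through `client.bucket_type(lang).bucket(...)` in the Python client, and `scoped_query("counter:[10 TO *]", "fr")` passed to the usual search call.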
The main issue is that I would be stuck with monolingual search. Most of the time I will aggregate data for data visualisation, and I can map/reduce results from Riak into a proper aggregation by my own means.
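Client-side aggregation of the returned documents can be as simple as a reduce over the `docs` list. A minimal sketch, assuming each doc exposes hypothetical `lang` and `counter` fields (values may come back as strings, hence the `float` conversion):

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable


def aggregate(docs: Iterable[Dict], group_field: str = "lang",
              value_field: str = "counter") -> Dict[str, Dict[str, float]]:
    """Reduce search result docs client-side: count and sum a value per group.

    ASSUMPTION: group_field/value_field are hypothetical field names; docs
    missing the group field are bucketed under "unknown".
    """
    counts: Counter = Counter()
    totals: Dict[str, float] = defaultdict(float)
    for doc in docs:
        key = doc.get(group_field, "unknown")
        counts[key] += 1
        totals[key] += float(doc.get(value_field, 0))
    return {k: {"count": counts[k], "total": totals[k]} for k in counts}
```

Applied to `results["docs"]`, this yields one count/total pair per language, ready for plotting.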
Hi guys,
Is there a proper way to define an index with language-specific stemming and tokenization on a single field and a single index?
I'm struggling to find a proper Riak-compatible solution, and nothing seems clear to me.
Here is a copy of my Stack Overflow question:
I want to store multilingual text in Riak (for illustration purposes English, French and Spanish, but in practice many more languages), and I want to use Riak Search to help me group, stem and tokenize the text values.
In my Schema.yml I have:
And:
Each fieldType enables language-specific optimisation.
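Writing one fieldType per language by hand gets tedious at 20+ languages, so the schema entries could be generated. The sketch below assumes a Solr 4.x-style schema (the Solr line bundled with Riak 2.x) and emits, per language, a `text_<lang>` fieldType using `solr.StandardTokenizerFactory`, `solr.LowerCaseFilterFactory` and `solr.SnowballPorterFilterFactory`, plus a `*_<lang>` dynamicField. The analyzer chain, names and language mapping are illustrative assumptions, not necessarily what the schema above contains; the wrapper element exists only so the fragment parses, and its children would be pasted into the real schema.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Hypothetical mapping from language code to the stemmer name accepted by
# solr.SnowballPorterFilterFactory's "language" attribute.
SNOWBALL = {"en": "English", "fr": "French", "es": "Spanish"}


def language_schema_fragment(langs: Dict[str, str] = SNOWBALL) -> str:
    """Emit one fieldType plus one "*_<lang>" dynamicField per language."""
    root = ET.Element("fragment")  # wrapper only; copy its children into the schema
    for code, stemmer in langs.items():
        ft = ET.SubElement(root, "fieldType",
                           {"name": f"text_{code}", "class": "solr.TextField",
                            "positionIncrementGap": "100"})
        analyzer = ET.SubElement(ft, "analyzer")  # used at index and query time
        ET.SubElement(analyzer, "tokenizer",
                      {"class": "solr.StandardTokenizerFactory"})
        ET.SubElement(analyzer, "filter", {"class": "solr.LowerCaseFilterFactory"})
        ET.SubElement(analyzer, "filter",
                      {"class": "solr.SnowballPorterFilterFactory",
                       "language": stemmer})
        # Any field ending in _<code> (e.g. body_fr) gets this language's analysis.
        ET.SubElement(root, "dynamicField",
                      {"name": f"*_{code}", "type": f"text_{code}",
                       "indexed": "true", "stored": "true"})
    return ET.tostring(root, encoding="unicode")
```

The complete schema could then be uploaded with the Python client's `create_search_schema(name, content)` and referenced when creating the index.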
There is no DynamicFieldType in Solr, as stated in this other Stack Overflow question: http://stackoverflow.com/questions/23747373/solr-dynamic-field-types
As suggested there, I have three options:
Separate field
This would force me to store each value in a different field of my Riak document, which doesn't scale to 20 or more languages.
Separate indexes
This is pretty simple: I can configure each Solr index for a given language and keep only one field. It's an interesting solution, since it gives me language sharding, which is pretty convenient for maintenance.
BUT it implies that I can no longer search across multiple languages, since I can't find a multi-index search feature in my Python library or in the documentation.
Custom code
I don't really understand this one; it would most probably mean writing my own Java class to handle my case. That's clearly NOT my preference.
Is there another way around this problem?
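For the separate-field option, a single query can still span all languages by OR-ing the same term across the language-specific fields of one index; the query parser applies each field's own query-time analyzer, so the term is stemmed per language. A minimal sketch, assuming hypothetical `body_<lang>` fields; `quote_term` and `multilingual_query` are illustrative helpers, not client APIs.

```python
from typing import Iterable


def quote_term(term: str) -> str:
    """Quote a term for the Solr query parser, escaping backslashes and quotes."""
    return '"' + term.replace("\\", "\\\\").replace('"', '\\"') + '"'


def multilingual_query(term: str, langs: Iterable[str], prefix: str = "body") -> str:
    """OR the same term across every language-specific field of one index.

    ASSUMPTION: fields are named <prefix>_<lang> and each maps to a
    language-specific fieldType in the schema.
    """
    return " OR ".join(f"{prefix}_{lang}:{quote_term(term)}" for lang in langs)
```

The result can be passed straight to the existing call, e.g. `bucket.search(multilingual_query("chevaux", ["en", "fr", "es"]), index='website')`, so only one index is ever queried.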