Description
Hi guys,
Is there a proper way to define an index with language-specific stemming and tokenization on a single field and a single index?
I'm struggling to find a proper, Riak-compatible solution, but nothing I have found is clear to me.
Here is a copy of my Stack Overflow question:
I want to store multilingual text in Riak (English, French and Spanish for illustration purposes, but in practice many more languages), and I want to use Riak Search to help me group, stem and tokenize the text values.
In my Schema.yml I have:
<field name="text" type="string" indexed="true" stored="true" multiValued="false"/>
And:
<fieldType name="text_en" class="solr.TextField" />
<fieldType name="text_es" class="solr.TextField" />
<fieldType name="text_fr" class="solr.TextField" />
Each fieldType enables language-specific optimisation.
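For illustration, here is a minimal Python sketch of what a language-specific fieldType could contain. The declarations above have no <analyzer> element, so the chain used here (solr.StandardTokenizerFactory, then solr.LowerCaseFilterFactory, then solr.SnowballPorterFilterFactory with its language attribute) is an assumption, not my actual config. Generating the XML from a small language table is one way to keep 20+ fieldTypes manageable.

```python
import xml.etree.ElementTree as ET

# Assumed mapping from language code to the Snowball stemmer name that
# solr.SnowballPorterFilterFactory expects (extend for more languages).
SNOWBALL_LANGUAGES = {"en": "English", "es": "Spanish", "fr": "French"}


def field_type_xml(lang: str) -> str:
    """Build a text_<lang> fieldType: tokenize, lowercase, then stem."""
    field_type = ET.Element("fieldType", {"name": f"text_{lang}",
                                          "class": "solr.TextField"})
    analyzer = ET.SubElement(field_type, "analyzer")
    ET.SubElement(analyzer, "tokenizer",
                  {"class": "solr.StandardTokenizerFactory"})
    ET.SubElement(analyzer, "filter", {"class": "solr.LowerCaseFilterFactory"})
    ET.SubElement(analyzer, "filter",
                  {"class": "solr.SnowballPorterFilterFactory",
                   "language": SNOWBALL_LANGUAGES[lang]})
    return ET.tostring(field_type, encoding="unicode")


# Print one declaration per language, ready to paste into the schema.
for code in SNOWBALL_LANGUAGES:
    print(field_type_xml(code))
```

Only the stemmer differs between languages here; languages that need their own tokenizer or stopword list would need a richer table.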
There is no DynamicFieldType in Solr, as stated in this other Stack Overflow question: http://stackoverflow.com/questions/23747373/solr-dynamic-field-types
As suggested there, I have three options:
- Separate field per language - load into separate fields (not dynamic) that have appropriate tokenizers and filters per language
- Separate index/core per language
- Everything in one field, with custom code to manage it
Separate fields
This would force me to store each value in a different field of my Riak document, which does not scale to 20 or more languages.
<field name="text_en" type="text_en" indexed="true" stored="true" multiValued="false"/>
<field name="text_es" type="text_es" indexed="true" stored="true" multiValued="false"/>
<field name="text_fr" type="text_fr" indexed="true" stored="true" multiValued="false"/>
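A minimal sketch of the client-side routing this approach needs, assuming each value arrives with an ISO 639-1 language code. The helper names (language_field, to_indexed_doc, field_declaration, cross_language_query) are hypothetical. Because every language lives in the same index, one Solr query with a clause per language field can still search across languages.

```python
SUPPORTED_LANGS = ("en", "es", "fr")  # hypothetical list; could grow to 20+


def language_field(lang: str) -> str:
    """Map a language code to its per-language field name, e.g. text_fr."""
    if lang not in SUPPORTED_LANGS:
        raise ValueError(f"unsupported language: {lang!r}")
    return f"text_{lang}"


def to_indexed_doc(text: str, lang: str) -> dict:
    """Store the value under the single field matching its language."""
    return {language_field(lang): text}


def field_declaration(lang: str) -> str:
    """Emit the <field> line; the type name equals the field name."""
    name = language_field(lang)
    return (f'<field name="{name}" type="{name}" indexed="true" '
            f'stored="true" multiValued="false"/>')


def cross_language_query(term: str) -> str:
    """OR one clause per language field; assumes a plain term with no
    Solr special characters (no escaping is done here)."""
    return " OR ".join(f"{language_field(code)}:{term}"
                       for code in SUPPORTED_LANGS)


print(to_indexed_doc("Les chats mangent", "fr"))
for code in SUPPORTED_LANGS:
    print(field_declaration(code))
print(cross_language_query("chat"))
```

Each document still carries exactly one text_* field, so the Riak object itself only changes in which key holds the text.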
Separate indexes
This is pretty simple: I can configure each Solr index for a given language and keep a single field. It is an interesting solution, since it gives me language sharding, which is convenient for maintenance.
BUT it implies that I can no longer search across multiple languages, since I can't find a multi-index search feature in my Python client library or in the documentation.
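One workaround would be to fan the query out from the client and merge the hits myself. This sketch assumes a hypothetical search_fn(index, query) that returns a list of dicts with a numeric "score" (a wrapper around whatever search call the client library provides); the index names and fake results are made up for the example. Each index keeps its own term statistics, so merging by raw score is only approximate.

```python
import heapq
from typing import Callable, Dict, Iterable, List

Doc = Dict[str, object]


def search_all_languages(search_fn: Callable[[str, str], List[Doc]],
                         indexes: Iterable[str], query: str,
                         limit: int = 10) -> List[Doc]:
    """Run the same query on each per-language index and return the
    `limit` highest-scoring hits overall, tagged with their index."""
    hits: List[Doc] = []
    for index in indexes:
        for doc in search_fn(index, query):
            hits.append({**doc, "_index": index})
    # Scores come from separate indexes, so this ranking is approximate.
    return heapq.nlargest(limit, hits, key=lambda d: d["score"])


# Hypothetical stand-in for a real per-index search call.
FAKE_RESULTS = {
    "idx_en": [{"id": "a", "score": 2.5}],
    "idx_es": [],
    "idx_fr": [{"id": "b", "score": 3.1}, {"id": "c", "score": 0.4}],
}


def fake_search(index: str, query: str) -> List[Doc]:
    return FAKE_RESULTS[index]


top = search_all_languages(fake_search, ["idx_en", "idx_es", "idx_fr"],
                           "chat", limit=2)
print([(d["id"], d["_index"]) for d in top])
```

The cost is one round trip per language index per search, which grows with the number of languages.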
Custom code
I don't fully understand this option; most probably it means writing my own Java class that can handle my case. That's clearly NOT my preference.
Is there another way around this problem?