You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
We would need to ensure in the KNNVectorFieldType we handle cases of model and legacy elegantly.
We could even take it a step further and have the engine return the index creator that the underlying index creation code could use. But maybe one step at a time
Potential benefits:
Simplification of parametrized codec code - we would not need to do branching or parsing of attributes in the index creation logic. Instead, we could properly encapsulate this somewhere else
Minute improvement in performance - because codec is created once per shard, we wouldnt constantly have to configure the index creation logic for each segment
Unknowns:
One challenge may be dynamic mapping parameters that impact indexing - we dont support this now, but may want to in the future. Itd be tricky to do this - some more research would need to be done.
Description
Right now, we add several different attributes in the Lucene fieldType in order to pass parameters from the fieldmapper (opensearch extension) to the KNN80DocValuesConsumer (lucene extension). https://github.com/opensearch-project/k-NN/blob/main/src/main/java/org/opensearch/knn/index/codec/KNN80Codec/KNN80DocValuesConsumer.java. The problem with this is that attributes have to be key-value string pairs, so additional complexity is required to properly configure (just see KNN80DocValuesConsumer).
I think we could drastically simplify this by leveraging the KNNVectorFieldType (https://github.com/opensearch-project/k-NN/blob/main/src/main/java/org/opensearch/knn/index/mapper/KNNVectorFieldMapper.java#L471 - opensearch concept). KNNVectorFieldType has the KNNMethodContext for the field and so can be used to specify pretty much everything we need in order to build the underlying index.
To get these values to the lucene extension (i.e. KNN80DocValuesConsumer and/or NativeEngines990KnnVectorsWriter), we can leverage the per-field formats that have access to the MapperService (and therefore the KnnVectorFieldType and therefore the KNNMethodContext) - https://github.com/opensearch-project/k-NN/blob/main/src/main/java/org/opensearch/knn/index/codec/KNNCodecService.java#L33. On search, we already have access to the KnnVectorFieldType and KNNMethodContext (https://github.com/opensearch-project/k-NN/blob/main/src/main/java/org/opensearch/knn/index/query/KNNQueryBuilder.java#L347).
We would need to ensure in the KNNVectorFieldType we handle cases of model and legacy elegantly.
We could even take it a step further and have the engine return the index creator that the underlying index creation code could use. But maybe one step at a time
Potential benefits:
Unknowns:
BWC concerns
On write, because the field attributes are per segment (see https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/index/FieldInfo.java#L26), they should not need to be used by us on upgrade, so no issue.
On read, in search we do lookup attributes based on field (see https://github.com/opensearch-project/k-NN/blob/main/src/main/java/org/opensearch/knn/index/query/KNNWeight.java#L222) but this info can actually be held in the KNNVectorFieldType, so I dont think we need it.
The text was updated successfully, but these errors were encountered: