Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move away from using field info to pass parameters #1914

Open
jmazanec15 opened this issue Jul 31, 2024 · 0 comments
Open

Move away from using field info to pass parameters #1914

jmazanec15 opened this issue Jul 31, 2024 · 0 comments
Labels
Refactoring Improve the design, structure, and implementation while preserving its functionality

Comments

@jmazanec15
Copy link
Member

Description

Right now, we add several different attributes in the Lucene fieldType in order to pass parameters from the fieldmapper (opensearch extension) to the KNN80DocValuesConsumer (lucene extension). https://github.com/opensearch-project/k-NN/blob/main/src/main/java/org/opensearch/knn/index/codec/KNN80Codec/KNN80DocValuesConsumer.java. The problem with this is that attributes have to be key-value string pairs, so additional complexity is required to properly configure (just see KNN80DocValuesConsumer).

I think we could drastically simplify this by leveraging the KNNVectorFieldType (https://github.com/opensearch-project/k-NN/blob/main/src/main/java/org/opensearch/knn/index/mapper/KNNVectorFieldMapper.java#L471 - opensearch concept). KNNVectorFieldType has the KNNMethodContext for the field and so can be used to specify pretty much everything we need in order to build the underlying index.

To get these values to the lucene extension (i.e. KNN80DocValuesConsumer and/or NativeEngines990KnnVectorsWriter), we can leverage the per-field formats that have access to the MapperService (and therefore the KnnVectorFieldType and therefore the KNNMethodContext) - https://github.com/opensearch-project/k-NN/blob/main/src/main/java/org/opensearch/knn/index/codec/KNNCodecService.java#L33. On search, we already have access to the KnnVectorFieldType and KNNMethodContext (https://github.com/opensearch-project/k-NN/blob/main/src/main/java/org/opensearch/knn/index/query/KNNQueryBuilder.java#L347).

We would need to ensure in the KNNVectorFieldType we handle cases of model and legacy elegantly.

We could even take it a step further and have the engine return the index creator that the underlying index creation code could use. But maybe one step at a time

Potential benefits:

  1. Simplification of parametrized codec code - we would not need to do branching or parsing of attributes in the index creation logic. Instead, we could properly encapsulate this somewhere else
  2. Minute improvement in performance - because codec is created once per shard, we wouldnt constantly have to configure the index creation logic for each segment

Unknowns:

  1. One challenge may be dynamic mapping parameters that impact indexing - we dont support this now, but may want to in the future. Itd be tricky to do this - some more research would need to be done.

BWC concerns

On write, because the field attributes are per segment (see https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/index/FieldInfo.java#L26), they should not need to be used by us on upgrade, so no issue.

On read, in search we do lookup attributes based on field (see https://github.com/opensearch-project/k-NN/blob/main/src/main/java/org/opensearch/knn/index/query/KNNWeight.java#L222) but this info can actually be held in the KNNVectorFieldType, so I dont think we need it.

@jmazanec15 jmazanec15 added the Refactoring Improve the design, structure, and implementation while preserving its functionality label Jul 31, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Refactoring Improve the design, structure, and implementation while preserving its functionality
Projects
Status: Backlog
Development

No branches or pull requests

1 participant