Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixing bug where mapping accepts both dimension and model-id #2410

Merged
merged 5 commits into from
Jan 24, 2025

Conversation

markwu-sde
Copy link
Contributor

Description

Current Problem

  • KNN indices can be created with both model_id and dimension parameters simultaneously
  • This is incorrect as dimensions should come from either:
    • The training index (when using model_id)
    • OR user-defined dimensions (when using dimension parameter)

Implemented Solution

  • Raise an exception when both model_id and dimension parameters are present in the index creation request

Version Scope

  • Applies to all indices created on or after version 2.19

Testing

wumr@88665a07f65b ~ %  curl -XPUT "http://localhost:9200/my-knn-index-1" -H 'Content-Type: application/json' -d'                                                                                                                                                           
{
  "settings": {
    "index": {
      "knn":true
    }
  },
  "mappings": {
    "properties": {
      "my_vector1": {
        "type": "knn_vector", "model_id": "1234",
        "dimension": 2
      }}}}'
{"error":{"root_cause":[{"type":"mapper_parsing_exception","reason":"Failed to parse mapping [_doc]: Dimension and model can not be both specified in the mapping: my_vector1"}],"type":"mapper_parsing_exception","reason":"Failed to parse mapping [_doc]: Dimension and model can not be both specified in the mapping: my_vector1","caused_by":{"type":"illegal_argument_exception","reason":"Dimension and model can not be both specified in the mapping: my_vector1"}},"status":400}%  

Normal input:

wumr@88665a07f65b ~ % curl -XPUT "http://localhost:9200/target-index" -H 'Content-Type: application/json' -d'
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1,
    "index.knn": true
  },
  "mappings": {
    "properties": {
      "target-field": {
        "type": "knn_vector",
        "model_id": "my-model"
      }
    }
  }
}
'
{"acknowledged":true,"shards_acknowledged":true,"index":"target-index"}% 

On 3.0:

wumr@88665a07f65b ~ % curl localhost:9200
{
  "name" : "integTest-0",
  "cluster_name" : "integTest",
  "cluster_uuid" : "pMFCf1AqRpak3k_OTJD12Q",
  "version" : {
    "distribution" : "opensearch",
    "number" : "3.0.0-SNAPSHOT",
    "build_type" : "tar",
    "build_hash" : "a609e634a348b76386fb11936bbe8c4b38ea72d0",
    "build_date" : "2025-01-14T04:33:30.338382Z",
    "build_snapshot" : true,
    "lucene_version" : "9.12.1",
    "minimum_wire_compatibility_version" : "2.19.0",
    "minimum_index_compatibility_version" : "2.0.0"
  },
  "tagline" : "The OpenSearch Project: https://opensearch.org/"
}

Related Issues

Resolves #2318

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@markwu-sde
Copy link
Contributor Author

markwu-sde commented Jan 21, 2025

There's a separate unit test(unrelated to this) that had both parameters (dimension and model-id) specified.

I rolled back my change locally and figured out what the expected failure case in this unit test was. Seems as though it's trying to reach the validation here which was why the model-id was provided. Assuming this was supposed to test the omission of compression and mode when a model is present I'm not sure why dimension was also set here. Removing the dimension field succeeded the test. I've reran the debugger to make sure the code coverage would still yield the same result.

If my mental model here isn't accurate please let me know and I can make the appropriate changes.

navneet1v
navneet1v previously approved these changes Jan 22, 2025
&& builder.modelId.get() != null
&& parserContext.indexVersionCreated().onOrAfter(Version.V_2_19_0)) {
throw new IllegalArgumentException(
String.format(Locale.ROOT, "Dimension and model can not be both specified in the mapping: %s", name)
Copy link
Member

@VijayanB VijayanB Jan 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: "Cannot specify both a modelId and dimension in the mapping"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed

Copy link
Member

@VijayanB VijayanB left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for making the changes and addressing feedback comments.

@jmazanec15 jmazanec15 merged commit d142366 into opensearch-project:main Jan 24, 2025
36 checks passed
opensearch-trigger-bot bot pushed a commit that referenced this pull request Jan 24, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] Mapping accepts both model id and dimension
4 participants