Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement sparse index support #95

Merged
merged 9 commits into from
Feb 4, 2025

Conversation

austin-denoble
Copy link
Contributor

@austin-denoble austin-denoble commented Jan 25, 2025

Problem

2025-01 includes changes to support sparse indexes. We need to adjust several types and functions to support working with these kinds of indexes.

Solution

  • Update CreateServerlessIndexRequest to allow creating sparse or dense serverless indexes.
  • Update Index struct to support VectorType and IndexEmbed. Add new IndexEmbed type.
  • Add new unit tests / update existing unit & integration tests.
package main

import (
	"context"
	"fmt"
	"github.com/pinecone-io/go-pinecone/v2/pinecone"
	"log"
	"os"
)

func main() {
	ctx := context.Background()

	clientParams := pinecone.NewClientParams{
		ApiKey: os.Getenv("PINECONE_API_KEY"),
	}

	pc, err := pinecone.NewClient(clientParams)
	if err != nil {
		log.Fatalf("Failed to create Client: %v", err)
	} else {
		fmt.Println("Successfully created a new Client object!")
	}

	indexName := "my-serverless-index"
	vectorType := "sparse"

	idx, err := pc.CreateServerlessIndex(ctx, &pinecone.CreateServerlessIndexRequest{
		Name:       indexName,
		Metric:     pinecone.Dotproduct,
		VectorType: &vectorType,
		Cloud:      pinecone.Aws,
		Region:     "us-east-1",
		Tags:       &pinecone.IndexTags{"environment": "development"},
	})

	if err != nil {
		log.Fatalf("Failed to create serverless index: %v", err)
	} else {
		fmt.Printf("Successfully created serverless index: %s", idx.Name)
	}
}

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update
  • Infrastructure change (CI configs, etc)
  • Non-code change (docs, etc)
  • None of the above: (explain here)

Test Plan

CI / CD (unit / integration tests)


… and associated methods, add some additional error validation for the different scenarios, update unit / integration tests
@austin-denoble austin-denoble marked this pull request as ready for review January 28, 2025 04:55
@@ -540,7 +540,7 @@ func (req CreatePodIndexRequest) TotalCount() int {
// fmt.Printf("Successfully created pod index: %s", idx.Name)
// }
func (c *Client) CreatePodIndex(ctx context.Context, in *CreatePodIndexRequest) (*Index, error) {
if in.Name == "" || in.Dimension == 0 || in.Metric == "" || in.Environment == "" || in.PodType == "" {
if in.Name == "" || in.Dimension <= 0 || in.Metric == "" || in.Environment == "" || in.PodType == "" {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refining the check on Dimension here - if it's negative we don't really need to make a request.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Will Dimension <= 0 throw an error message "fields Name, Dimension, Metric, Environment, and Podtype must be included in CreatePodIndexRequest" even though the Dimension is included as a part of the request?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call, I can update to mention "positive Dimension" to make it a bit more clear.

@@ -549,6 +549,7 @@ func (c *Client) CreatePodIndex(ctx context.Context, in *CreatePodIndexRequest)
pods := in.TotalCount()
replicas := in.ReplicaCount()
shards := in.ShardCount()
vectorType := "dense"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sparse cannot be selected for pod indexes so we default for the pod creation path.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice

require.ErrorContainsf(t, err, "fields Name, Metric, Cloud, and Region must be included in CreateServerlessIndexRequest", err.Error())
}

func TestCreateServerlessIndexInvalidSparseDimensionUnit(t *testing.T) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The following are all unit tests to check the validation in the serverless creation path.

@@ -7,7 +7,7 @@ import (
"google.golang.org/protobuf/types/known/structpb"
)

func TestMarshalIndexStatus(t *testing.T) {
func TestMarshalIndexStatusUnit(t *testing.T) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noticed all these tests were missing the "Unit" suffix, which is needed for them to be included when running just test-unit.

README.md Outdated
}

indexName := "my-serverless-index"
vectorType := "dense"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did you mean to show an example for vectorType = sparse?

Copy link

@rohanshah18 rohanshah18 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, left a few comments!!

@austin-denoble austin-denoble merged commit b688b60 into 2025-01 Feb 4, 2025
2 checks passed
@austin-denoble austin-denoble deleted the adenoble/implement-sparse-indexes branch February 4, 2025 04:07
Copy link
Contributor

@haruska haruska left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good

if in.Dimension != nil {
return nil, fmt.Errorf("dimension should not be specified when VectorType is 'sparse'")
} else if in.Metric != Dotproduct {
return nil, fmt.Errorf("metric should be 'dotproduct' when VectorType is 'sparse'")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

metric should be optional for sparse.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants