Change language around records --> vectors where appropriate (#288)
## Problem

We believe it's more accurate to refer to dense-vector objects as
"vectors" (vs "records") when we are talking about vectors we'd like to
upsert into an index, etc. This is because they are primarily (first and
foremost) _vectors_, with IDs, metadata, etc.

## Solution

Change instances of "record(s)" to "vector(s)" where appropriate.

Note: I opted _not_ to change instances of "record(s)" in cases where we are
talking about the entire object as a whole (as in "Update a record"), since
you could be updating the metadata, the vector values, etc.

## Type of Change

- [ ] Bug fix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
- [ ] This change requires a documentation update
- [ ] Infrastructure change (CI configs, etc)
- [x] Non-code change (docs, etc)
- [ ] None of the above: (explain here)


---
- To see the specific tasks where the Asana app for GitHub is being used, see below:
  - https://app.asana.com/0/0/1208381566824289
aulorbe authored Sep 25, 2024
1 parent 73d98b8 commit 9dcd99e
Showing 1 changed file with 50 additions and 39 deletions.
README.md

#### Create a serverless index with minimal configuration

At a minimum, to create a serverless index you must specify a `name`, `dimension`, and `spec`. The `dimension` indicates the size of the vectors you intend to store in the index. For example, if your intention was to store and query embeddings (vectors) generated with OpenAI's [text-embedding-ada-002](https://platform.openai.com/docs/guides/embeddings/second-generation-models) model, you would need to create an index with dimension `1536` to match the output of that model.
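
Because every vector must match the index's `dimension`, a quick client-side length check can catch mismatches before an upsert fails. This is a plain TypeScript sketch, not part of the Pinecone SDK; the helper name is ours.

```typescript
// Hypothetical helper (not part of the Pinecone SDK): verify that each
// vector's length matches the index dimension before upserting.
function checkDimension(vectors: number[][], dimension: number): boolean {
  return vectors.every((values) => values.length === dimension);
}

// A 3-dimensional index accepts 3-dimensional vectors only.
console.log(checkDimension([[0.236, 0.971, 0.559]], 3)); // true
console.log(checkDimension([[0.236, 0.971]], 3)); // false
```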

The `spec` configures how the index should be deployed. For serverless indexes, you define only the cloud and region where the index should be hosted. For pod-based indexes, you define the environment where the index should be hosted, the pod type and size to use, and other index characteristics. For more information on serverless and regional availability, see [Understanding indexes](https://docs.pinecone.io/guides/indexes/understanding-indexes#serverless-indexes).
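
As a sketch of the difference, here are the two `spec` shapes as plain object literals. The type aliases are local to this example (not the SDK's exported types), and the cloud, region, and pod values are placeholders:

```typescript
// Local type aliases for illustration only; the SDK exports its own types.
type ServerlessSpec = { serverless: { cloud: string; region: string } };
type PodSpec = {
  pod: { environment: string; podType: string; pods?: number };
};

// Serverless: choose only where the index is hosted.
const serverlessSpec: ServerlessSpec = {
  serverless: { cloud: 'aws', region: 'us-east-1' },
};

// Pod-based: choose the environment plus pod type and count.
const podSpec: PodSpec = {
  pod: { environment: 'us-west4-gcp', podType: 'p1.x1', pods: 1 },
};
```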


See [Use namespaces](https://docs.pinecone.io/guides/indexes/use-namespaces) for more information.

### Upsert vectors

Pinecone expects records inserted into indexes to have the following form:

```typescript
type PineconeRecord = {
  id: string;
  values: Array<number>;
  sparseValues?: {
    indices: Array<number>;
    values: Array<number>;
  };
  metadata?: object;
};
```

To upsert some vectors, you can use the client like so:

```typescript
import { Pinecone } from '@pinecone-database/pinecone';
const pc = new Pinecone();

const index = pc.index('sample-index');
// Prepare your data. The length of each array
// of vector values must match the dimension of
// the index where you plan to store them.
const vectors = [
{
id: '1',
values: [0.236, 0.971, 0.559],
  },
  // ...additional records
];

// Upsert the data into your index
await index.upsert(vectors);
```

### Seeing index statistics

When experimenting with data operations, it's sometimes helpful to know how many records/vectors are stored in each namespace. In that case, target the index and use the `describeIndexStats()` command.
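
Each namespace in the stats response carries its own record count, so totaling them is a small fold. The `IndexStats` shape below is a simplified local stand-in for illustration, not the SDK's exact return type:

```typescript
// Simplified stand-in for the shape of a describeIndexStats() response.
type IndexStats = {
  namespaces: Record<string, { recordCount: number }>;
};

function totalRecords(stats: IndexStats): number {
  return Object.values(stats.namespaces).reduce(
    (sum, ns) => sum + ns.recordCount,
    0
  );
}

const stats: IndexStats = {
  namespaces: { '': { recordCount: 50 }, 'my-namespace': { recordCount: 25 } },
};
console.log(totalRecords(stats)); // 75
```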

```typescript
import { Pinecone } from '@pinecone-database/pinecone';
const pc = new Pinecone();

const index = pc.index('my-index');
await index.describeIndexStats();
```

#### Query by record ID

```typescript
import { Pinecone } from '@pinecone-database/pinecone';
const pc = new Pinecone();

const index = pc.index('my-index');
const results = await index.query({ topK: 10, id: '1' });
```

#### Hybrid search with sparse vectors

If you are working with [sparse-dense vectors](https://docs.pinecone.io/guides/data/understanding-hybrid-search#sparse-dense-workflow), you can add sparse vector values to perform a hybrid search.

```typescript
import { Pinecone } from '@pinecone-database/pinecone';
const pc = new Pinecone();

await pc.createIndex({
  name: 'hybrid-search-index',
  metric: 'dotproduct', // Note: dot product is the only distance metric supported for hybrid search
  dimension: 2,
  spec: {
    pod: {
      environment: 'us-west4-gcp',
      podType: 'p2.x1',
    },
  },
  waitUntilReady: true,
});

const index = pc.index('hybrid-search-index');

const hybridRecords = [
{
id: '1',
values: [0.236, 0.971], // dense vectors
sparseValues: { indices: [0, 1], values: [0.236, 0.34] }, // sparse vectors
},
{
id: '2',
values: [0.685, 0.111],
sparseValues: { indices: [0, 1], values: [0.887, 0.243] },
},
];

await index.upsert(hybridRecords);

const query = 'What is the most popular red dress?';
// ... send query to dense vector embedding model and save those values in `denseQueryVector`
// ... send query to sparse vector embedding model and save those values in `sparseQueryVector`
const denseQueryVector = [0.236, 0.971];
const sparseQueryVector = { indices: [0, 1], values: [0.0, 0.34] };

// Execute a hybrid search
await index.query({
topK: 3,
vector: denseQueryVector,
sparseVector: sparseQueryVector,
});
```
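
Conceptually, with the `dotproduct` metric the dense and sparse parts each contribute a dot product to the final score. The plain TypeScript sketch below illustrates that idea; it is a simplification for intuition, not what the service literally computes:

```typescript
type Sparse = { indices: number[]; values: number[] };

// Dense dot product plus sparse dot product over matching indices.
function hybridScore(
  denseA: number[],
  sparseA: Sparse,
  denseB: number[],
  sparseB: Sparse
): number {
  const dense = denseA.reduce((sum, v, i) => sum + v * denseB[i], 0);
  const sparseMap = new Map(
    sparseB.indices.map((idx, i) => [idx, sparseB.values[i]] as [number, number])
  );
  const sparse = sparseA.indices.reduce(
    (sum, idx, i) => sum + sparseA.values[i] * (sparseMap.get(idx) ?? 0),
    0
  );
  return dense + sparse;
}
```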

### Update a record
```typescript
await index.update({ id: '1', metadata: { genre: 'comedy' } });
```

### List records

```typescript
await index.listPaginated({
  // ...
});
```
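
`listPaginated` returns one page of results at a time along with a pagination token for the next page. The loop below sketches how to drain all pages; `fetchPage` is a local stand-in for the SDK call, and the response shape here is an assumption for illustration:

```typescript
type ListPage = { ids: string[]; paginationToken?: string };

// Local stand-in for index.listPaginated(): yields two pages, then stops.
function fetchPage(token?: string): ListPage {
  if (token === undefined) return { ids: ['a', 'b'], paginationToken: 'p1' };
  return { ids: ['c'] };
}

// Follow the pagination token until no further page is available.
function listAllIds(): string[] {
  const ids: string[] = [];
  let token: string | undefined = undefined;
  do {
    const page: ListPage = fetchPage(token);
    ids.push(...page.ids);
    token = page.paginationToken;
  } while (token !== undefined);
  return ids;
}
```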

### Fetch records by ID(s)

```typescript
import { Pinecone } from '@pinecone-database/pinecone';
const pc = new Pinecone();

const index = pc.index('my-index');
await index.fetch(['1']);
```

### Delete records

#### Delete one by ID

```typescript
import { Pinecone } from '@pinecone-database/pinecone';
const pc = new Pinecone();

const index = pc.index('my-index');
await index.deleteOne('id-to-delete');
```

#### Delete many by ID

```typescript
import { Pinecone } from '@pinecone-database/pinecone';
const pc = new Pinecone();

const index = pc.index('my-index');
await index.deleteMany(['id-1', 'id-2']);
```
