Showing 2 changed files with 95 additions and 112 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,116 +1,67 @@ | ||
# Quickstart: Mastering Search in Spice | ||
|
||
Welcome to this quickstart guide! Here, you'll learn how to use Spice's powerful search capabilities, combining both SQL-based and advanced vector-based search functionalities. Whether you're new or experienced, follow these steps to get started and unlock the full potential of your data. | ||
|
||
## Introduction to Searching with Spice | ||
|
||
**Spice** integrates traditional SQL and cutting-edge vector-based search technologies to empower users with flexible and efficient data exploration. | ||
|
||
--- | ||
|
||
## Getting Started | ||
|
||
### Setting Up Your Environment | ||
|
||
1. **Install Spice:** | ||
- Ensure you have the Spice CLI installed. Follow the [Spice installation guide](link_to_installation_guide) if you haven't done so. | ||
|
||
2. **Create a New Spice Pod:** | ||
- Initialize a new spicepod to organize your datasets and configurations: | ||
```bash | ||
spice create my_first_spicepod | ||
cd my_first_spicepod | ||
``` | ||
|
||
3. **Configure Your Spicepod:** | ||
- Edit the `spicepod.yaml` configuration file: | ||
```yaml | ||
embeddings: | ||
- from: openai | ||
name: remote_service | ||
params: | ||
openai_api_key: ${ secrets:SPICE_OPENAI_API_KEY } | ||
- name: local_embedding_model | ||
from: huggingface:huggingface.co/sentence-transformers/all-MiniLM-L6-v2 | ||
``` | ||
|
||
4. **Load Sample Datasets:** | ||
- Add datasets to your Spice pod. For example, to create a dataset from GitHub issues: | ||
```yaml | ||
datasets: | ||
- from: github:github.com/spiceai/spiceai/issues | ||
name: spiceai.issues | ||
acceleration: | ||
enabled: true | ||
embeddings: | ||
- column: body | ||
use: local_embedding_model | ||
``` | ||
|
||
### **Ensure API Keys Are Set:** | ||
- Export or set necessary environment variables before proceeding (e.g., `SPICE_OPENAI_API_KEY`). | ||
|
||
--- | ||
|
||
## Performing SQL-Based Search | ||
|
||
Spice allows you to perform traditional SQL searches efficiently: | ||
|
||
### Execute a Basic SQL Query | ||
|
||
1. **Run a Query:** | ||
- Use SQL to perform keyword searches within your dataset: | ||
```sql | ||
SELECT id, text_column | ||
FROM spice.public.quickstarts | ||
WHERE | ||
LOWER(text_column) LIKE '%search_term%' | ||
AND | ||
date_published > '2021-01-01' | ||
``` | ||
|
||
Run this via your SQL interface connected to Spice. | ||
|
||
--- | ||
|
||
## Utilizing Vector-Based Search | ||
|
||
Vector-based search in Spice enables semantic and similarity-based searches, enhancing your search capabilities beyond traditional keywords. | ||
|
||
### Configure Vector Search | ||
|
||
1. **Embedding Configuration:** | ||
- Make sure your dataset column is configured for vector search in your `spicepod.yaml`. | ||
|
||
2. **Perform a Search Query:** | ||
- Execute a vector-based query using curl from the command line: | ||
```shell | ||
curl -XPOST http://localhost:8090/v1/search \ | ||
-H 'Content-Type: application/json' \ | ||
-d '{ | ||
"datasets": ["spiceai.issues"], | ||
"text": "cutting edge AI", | ||
"where": "author=\"jeadie\"", | ||
"additional_columns": ["title", "state"], | ||
"limit": 2 | ||
}' | ||
``` | ||
|
||
This command returns results based on semantic similarities in your data. | ||
# Quickstart: Searching with Spice | ||
|
||
## Prerequisites | ||
- Ensure you have the Spice CLI installed. Follow the [Spice installation guide](link_to_installation_guide) if you haven't done so. | ||
- Populate `.env` with the secrets referenced in `spicepod.yaml`: `SPICE_OPENAI_API_KEY` and `GITHUB_TOKEN`. | ||
|
||
### SQL Search | ||
1. Open the Spice SQL REPL, then run a basic SQL query to perform a keyword search within your dataset: | ||
```shell | ||
spice sql | ||
``` | ||
|
||
Then: | ||
```sql | ||
SELECT path | ||
FROM spiceai.files | ||
WHERE | ||
LOWER(content) LIKE '%errors%' | ||
AND NOT contains(path, 'docs/release_notes') | ||
``` | ||
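The same query can also be prepared outside the interactive REPL. The sketch below assumes the runtime exposes an HTTP SQL endpoint at `/v1/sql` on the same port as the search endpoint (`8090`) and accepts the query as a plain-text body; neither detail is confirmed by this quickstart, so verify both against your Spice version. The `query` variable name is illustrative.

```shell
# Keep the query in a quoted heredoc so the single quotes inside the SQL
# need no escaping.
query=$(cat <<'EOF'
SELECT path
FROM spiceai.files
WHERE
  LOWER(content) LIKE '%errors%'
  AND NOT contains(path, 'docs/release_notes')
EOF
)

# Print the query for inspection.
printf '%s\n' "$query"

# Assumption: spiced is running locally and serves POST /v1/sql.
# Uncomment to send the query:
# curl -XPOST http://localhost:8090/v1/sql --data "$query"
```

Keeping the SQL in a variable makes it easy to reuse the same text in a script or a CI check.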
|
||
### Utilizing Vector-Based Search | ||
|
||
1. In `spicepod.yaml`, uncomment the `datasets[0].embeddings` section. | ||
2. Restart `spiced`. | ||
3. Perform a basic search: | ||
```shell | ||
curl -XPOST http://localhost:8090/v1/search \ | ||
-H "Content-Type: application/json" \ | ||
-d "{ | ||
\"datasets\": [\"spiceai.files\"], | ||
\"text\": \"testing\", | ||
\"where\": \"not contains(path, 'docs/release_notes')\", | ||
\"additional_columns\": [\"download_url\"], | ||
\"limit\": 2 | ||
}" | ||
``` | ||
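The backslash-escaped quotes in the request body are easy to get wrong. As a hedged alternative, the sketch below builds the same body in a quoted heredoc so the JSON can be written literally. The `payload` variable name is illustrative, and the `curl` call is left commented out because it needs a running `spiced` on `localhost:8090`.

```shell
# A quoted heredoc ('EOF') disables shell expansion, so the JSON is taken
# exactly as written, including the single quotes inside the "where" filter.
payload=$(cat <<'EOF'
{
  "datasets": ["spiceai.files"],
  "text": "testing",
  "where": "not contains(path, 'docs/release_notes')",
  "additional_columns": ["download_url"],
  "limit": 2
}
EOF
)

# Print the body for inspection.
printf '%s\n' "$payload"

# Uncomment to send it once spiced is running:
# curl -XPOST http://localhost:8090/v1/search \
#   -H 'Content-Type: application/json' \
#   -d "$payload"
```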
|
||
### Additional Configuration - Chunking | ||
|
||
- Spice supports chunking large text fields for more precise searches. | ||
|
||
Example configuration: | ||
```yaml | ||
datasets: | ||
- ... | ||
embeddings: | ||
- column: body | ||
use: local_embedding_model | ||
chunking: | ||
enabled: true | ||
target_chunk_size: 512 | ||
1. In `spicepod.yaml`, set `datasets[0].embeddings[0].chunking.enabled: true`. | ||
2. Restart `spiced`. | ||
3. Rerun the search: | ||
```shell | ||
curl -XPOST http://localhost:8090/v1/search \ | ||
-H 'Content-Type: application/json' \ | ||
-d "{ | ||
\"datasets\": [\"spiceai.files\"], | ||
\"text\": \"errors\", | ||
\"where\": \"not contains(path, 'docs/release_notes')\", | ||
\"additional_columns\": [\"download_url\"], | ||
\"limit\": 2 | ||
}" | ||
``` | ||
|
||
4. Rerun the search and retrieve the full document by adding `content` to `additional_columns`: | ||
```shell | ||
curl -XPOST http://localhost:8090/v1/search \ | ||
-H 'Content-Type: application/json' \ | ||
-d "{ | ||
\"datasets\": [\"spiceai.files\"], | ||
\"text\": \"errors\", | ||
\"where\": \"not contains(path, 'docs/release_notes')\", | ||
\"additional_columns\": [\"download_url\" , \"content\"], | ||
\"limit\": 2 | ||
}" | ||
``` |
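The two chunked searches differ only in the columns they request. A small helper can make that difference explicit. This is a minimal sketch: `search_payload` is a hypothetical function name, not part of the Spice CLI, and it inserts the search text verbatim without JSON escaping.

```shell
# Hypothetical helper: prints a /v1/search request body.
#   $1 = search text (inserted verbatim; must not contain " or \)
#   $2 = JSON array of additional columns, e.g. '["download_url"]'
search_payload() {
  cat <<EOF
{
  "datasets": ["spiceai.files"],
  "text": "$1",
  "where": "not contains(path, 'docs/release_notes')",
  "additional_columns": $2,
  "limit": 2
}
EOF
}

# Request the matching chunks plus the download URL.
search_payload "errors" '["download_url"]'

# Also request the full document text.
search_payload "errors" '["download_url", "content"]'

# To send a body (requires a running spiced on localhost:8090):
# search_payload "errors" '["download_url"]' |
#   curl -XPOST http://localhost:8090/v1/search \
#     -H 'Content-Type: application/json' -d @-
```

Piping the body to `curl -d @-` avoids nesting quotes on the command line.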
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,32 @@ | ||
version: v1beta1 | ||
kind: Spicepod | ||
name: sharepoint-qs | ||
|
||
models: | ||
- from: openai | ||
name: remote_service | ||
params: | ||
openai_api_key: ${ secrets:SPICE_OPENAI_API_KEY } | ||
|
||
embeddings: | ||
- name: local_embedding_model | ||
from: huggingface:huggingface.co/sentence-transformers/all-MiniLM-L6-v2 | ||
|
||
datasets: | ||
- from: github:github.com/spiceai/spiceai/files/trunk | ||
name: spiceai.files | ||
params: | ||
github_token: ${secrets:GITHUB_TOKEN} | ||
include: 'docs/**/*.md' | ||
acceleration: | ||
enabled: true | ||
embeddings: | ||
- column: content | ||
use: local_embedding_model | ||
column_pk: | ||
- path | ||
chunking: | ||
enabled: true | ||
target_chunk_size: 256 | ||
overlap_size: 64 | ||
file_format: md |