Merge branch 'trunk' into graphql_update
digadeesh authored Jul 22, 2024
2 parents 68244c4 + 03c7a35 commit 861b95f
Showing 34 changed files with 236 additions and 94 deletions.
4 changes: 2 additions & 2 deletions acceleration/data-refresh/README.md
@@ -98,15 +98,15 @@ curl -i -X PATCH \
-d '{
"refresh_sql": "SELECT * FROM taxi_trips WHERE passenger_count = 3"
}' \
-localhost:3000/v1/datasets/taxi_trips/acceleration
+localhost:8090/v1/datasets/taxi_trips/acceleration
```

The updated `refresh_sql` will be applied on the _next_ refresh (as determined by `refresh_check_interval`).
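
For context, a minimal sketch of where these settings live in `spicepod.yaml` (the `from` source and interval are illustrative; the parameter names match those used above):

```yaml
datasets:
  - from: s3://spiceai-demo-datasets/taxi_trips/2024/ # illustrative source
    name: taxi_trips
    params:
      file_format: parquet
    acceleration:
      enabled: true
      refresh_check_interval: 10s # how often the runtime checks whether a refresh is due
      refresh_sql: SELECT * FROM taxi_trips WHERE passenger_count = 3
```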

Make an additional call to trigger a refresh now:

```bash
-curl -i -X POST localhost:3000/v1/datasets/taxi_trips/acceleration/refresh
+curl -i -X POST localhost:8090/v1/datasets/taxi_trips/acceleration/refresh
```

Swap to the Spice SQL REPL and enter:
4 changes: 2 additions & 2 deletions caching/README.md
@@ -27,7 +27,7 @@ spice add spiceai/tpch
The following output is shown in the Spice runtime terminal:

```bash
-2024-05-23T21:50:44.372314Z INFO spiced: Metrics listening on 127.0.0.1:9000
+2024-05-23T21:50:44.372314Z INFO spiced: Metrics listening on 127.0.0.1:9090
2024-05-23T21:50:44.372986Z INFO runtime: Initialized results cache; max size: 128.00 MiB, item ttl: 1s
2024-05-23T21:50:45.861161Z INFO runtime: Registered dataset customer
2024-05-23T21:50:47.283554Z INFO runtime: Registered dataset lineitem
@@ -82,7 +82,7 @@ spice run
The following output is shown in the Spice runtime terminal, confirming the updated in-memory caching settings (`300s`):

```bash
-2024-05-23T22:02:36.899534Z INFO spiced: Metrics listening on 127.0.0.1:9000
+2024-05-23T22:02:36.899534Z INFO spiced: Metrics listening on 127.0.0.1:9090
2024-05-23T22:02:36.900280Z INFO runtime: Initialized results cache; max size: 128.00 MiB, item ttl: 300s
2024-05-23T22:02:38.683392Z INFO runtime: Registered dataset customer
2024-05-23T22:02:40.054125Z INFO runtime: Registered dataset lineitem
48 changes: 39 additions & 9 deletions catalogs/databricks/README.md
@@ -28,30 +28,60 @@ catalogs:
  - name: databricks:<CATALOG_NAME>
    from: db_uc
    params:
-     endpoint: <instance-id>.cloud.databricks.com
      mode: spark_connect # or delta_lake
+     databricks_token: ${env:DATABRICKS_TOKEN}
+     databricks_endpoint: <instance-id>.cloud.databricks.com
      databricks_cluster_id: <cluster-id>
```
For `mode`, choose between `spark_connect` (the default) and `delta_lake`. `spark_connect` requires an [All-Purpose Compute Cluster](https://docs.databricks.com/en/compute/index.html) to be available. `delta_lake` mode queries directly against Delta Lake tables in object storage and requires Spice to have the necessary permissions to access the object storage.

-Visit the documentation for more information configuring the [Databricks Unity Catalog Connector](https://docs.spiceai.org/components/catalogs/databricks).
+Set the `DATABRICKS_TOKEN` environment variable to the Databricks personal access token created in Step 1. A `.env` file created in the same directory as `spicepod.yaml` can be used to set the variable, e.g.:

```bash
+echo "DATABRICKS_TOKEN=<token>" > .env
```

-## Step 4. Login to Databricks with `spice login`
+Visit the documentation for more information on configuring the [Databricks Unity Catalog Connector](https://docs.spiceai.org/components/catalogs/databricks).

-Using the Spice CLI, set the credentials needed to connect to Databricks.
+## Step 4. Set the object storage credentials for `delta_lake` mode

-### Using Spark Connect
-`spice login databricks --token <access-token>`
+When using `delta_lake` mode, object storage credentials must be set for Spice to access the data.

### Using Delta Lake directly against AWS S3
-`spice login databricks --token <access-token> --aws-region <aws-region> --aws-access-key-id <aws-access-key-id> --aws-secret-access-key <aws-secret-access-key>`

```yaml
+params:
+  mode: delta_lake
+  databricks_token: ${env:DATABRICKS_TOKEN}
+  databricks_aws_access_key_id: ${env:AWS_ACCESS_KEY_ID}
+  databricks_aws_secret_access_key: ${env:AWS_SECRET_ACCESS_KEY}
+  databricks_aws_region: <region> # e.g. us-east-1, us-west-2
+  databricks_aws_endpoint: <endpoint> # if using an S3-compatible service, like Minio
```

+Set the `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables to the AWS access key and secret key, respectively.
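
One way to provide these, following the same `.env` pattern as the token above (values are placeholders):

```bash
# Append the AWS credentials to the .env file next to spicepod.yaml
echo "AWS_ACCESS_KEY_ID=<access-key-id>" >> .env
echo "AWS_SECRET_ACCESS_KEY=<secret-access-key>" >> .env
```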

### Using Delta Lake directly against Azure Blob Storage
-`spice login databricks --token <access-token> --azure-storage-account-name <account-name> --azure-storage-access-key <access-key>`

```yaml
+params:
+  mode: delta_lake
+  databricks_token: ${env:DATABRICKS_TOKEN}
+  databricks_azure_storage_account_name: ${env:AZURE_ACCOUNT_NAME}
+  databricks_azure_account_key: ${env:AZURE_ACCOUNT_KEY}
```

+Set the `AZURE_ACCOUNT_NAME` and `AZURE_ACCOUNT_KEY` environment variables to the Azure storage account name and account key, respectively.

### Using Delta Lake directly against Google Cloud Storage
-`spice login databricks --token <access-token> --google-service-account-path /path/to/service-account.json`

```yaml
+params:
+  mode: delta_lake
+  databricks_token: ${env:DATABRICKS_TOKEN}
+  databricks_google_service_account: </path/to/service-account.json>
```

## Step 5. Start the Spice runtime

18 changes: 10 additions & 8 deletions catalogs/spiceai/README.md
@@ -11,22 +11,24 @@ The Spice.ai Cloud Platform Catalog Connector makes querying datasets in the Spi

Sign up for a Spice.ai Cloud Platform account at https://spice.ai.

-## Step 2. Login to the Spice.ai Cloud Platform with `spice login`

-Using the Spice CLI, login to the Spice.ai Cloud Platform. A window will open in your browser to authenticate.
+## Step 2. Create a new directory and initialize a Spicepod

```bash
-spice login
+mkdir spice-catalog-demo
+cd spice-catalog-demo
+spice init
```

-## Step 3. Create a new directory and initialize a Spicepod
+## Step 3. Login to the Spice.ai Cloud Platform with `spice login`

+Working in the `spice-catalog-demo` directory, use the Spice CLI to login to the Spice.ai Cloud Platform. A browser window will open to authenticate when executing the `spice login` command.

```bash
-mkdir spice-catalog-demo
-cd spice-catalog-demo
-spice init
+spice login
```

+After successfully authenticating, the Spice.ai Cloud Platform API key and token are stored in a `.env` file in the `spice-catalog-demo` working directory. The Spice runtime reads environment variables set in the `.env` file in the local working directory.
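
Illustratively, the generated `.env` might contain entries along these lines (the key names here are hypothetical; inspect the generated file for the actual ones):

```bash
# Hypothetical key names, for illustration only
SPICEAI_API_KEY=<api-key>
SPICEAI_TOKEN=<token>
```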

## Step 4. Add the Spice.ai Cloud Platform Catalog Connector to `spicepod.yaml`

Add the following configuration to your `spicepod.yaml`:
38 changes: 31 additions & 7 deletions catalogs/unity_catalog/README.md
@@ -25,6 +25,8 @@ Add the following configuration to your `spicepod.yaml`:
catalogs:
  - name: unity_catalog:https://<unity_catalog_host>/api/2.1/unity-catalog/catalogs/<catalog_name>
    from: uc
+   params:
+     # Configure the object store credentials here
```

The Unity Catalog connector only supports Delta Lake tables and requires specifying the object store credentials to connect to the Delta Lake tables.
@@ -33,16 +35,38 @@ Visit the documentation for more information configuring the [Unity Catalog Conn

## Step 3. Configure the object store credentials

-Using the Spice CLI, set the credentials needed to connect to the Delta Lake tables provided by the Unity Catalog Connector.
+Configure credentials for the object store backing the Delta Lake tables.

-### Using Delta Lake directly against AWS S3
-`spice login delta_lake --aws-region <aws-region> --aws-access-key-id <aws-access-key-id> --aws-secret-access-key <aws-secret-access-key>`
+### AWS S3

-### Using Delta Lake directly against Azure Blob Storage
-`spice login delta_lake --azure-storage-account-name <account-name> --azure-storage-access-key <access-key>`
```yaml
+params:
+  unity_catalog_aws_access_key_id: ${env:AWS_ACCESS_KEY_ID}
+  unity_catalog_aws_secret_access_key: ${env:AWS_SECRET_ACCESS_KEY}
+  unity_catalog_aws_region: <region> # e.g. us-east-1, us-west-2
+  unity_catalog_aws_endpoint: <endpoint> # if using an S3-compatible service, like Minio
```
+Set the `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables to the AWS access key and secret key, respectively.
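
A sketch of supplying these inline for a single run instead of a `.env` file (standard POSIX shell; values are placeholders):

```bash
export AWS_ACCESS_KEY_ID=<access-key-id>
export AWS_SECRET_ACCESS_KEY=<secret-access-key>
spice run
```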

-### Using Delta Lake directly against Google Cloud Storage
-`spice login delta_lake --google-service-account-path /path/to/service-account.json`
+### Azure Blob Storage

```yaml
+params:
+  mode: delta_lake
+  unity_catalog_azure_storage_account_name: ${env:AZURE_ACCOUNT_NAME}
+  unity_catalog_azure_account_key: ${env:AZURE_ACCOUNT_KEY}
```

+Set the `AZURE_ACCOUNT_NAME` and `AZURE_ACCOUNT_KEY` environment variables to the Azure storage account name and account key, respectively.

+### Google Cloud Storage

```yaml
+params:
+  mode: delta_lake
+  unity_catalog_google_service_account: </path/to/service-account.json>
```

## Step 4. Start the Spice runtime

14 changes: 13 additions & 1 deletion clickhouse/README.md
@@ -10,7 +10,19 @@ Follow the [quickstart guide](https://docs.spiceai.org/getting-started) to get s

See the [datasets reference](https://docs.spiceai.org/reference/spicepod/datasets) for more dataset configuration options.

-To securely store your Clickhouse password, see [Secret Stores](https://docs.spiceai.org/secret-stores)
+Set the environment variable `CLICKHOUSE_PASS` to the ClickHouse instance password. Environment variables can be specified on the command line when running the Spice runtime or in a `.env` file in the same directory as `spicepod.yaml`.

+For example, to set the password in a `.env` file:

```bash
+echo "CLICKHOUSE_PASS=<password>" > .env
```

+A `.env` file is created in the project directory with the following content:

```bash
+CLICKHOUSE_PASS=<password>
```
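
Alternatively, as noted above, the variable can be supplied on the command line for a single run (a sketch, assuming a POSIX shell):

```bash
# Set the variable only for this invocation of the runtime
CLICKHOUSE_PASS=<password> spice run
```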

**Step 2.** Run the Spice runtime with `spice run` from this directory.

2 changes: 1 addition & 1 deletion clickhouse/spicepod.yaml
@@ -9,7 +9,7 @@ datasets:
      clickhouse_db: [Database name]
      clickhouse_tcp_port: [Port]
      clickhouse_user: [Username]
-     clickhouse_pass: [Password]
+     clickhouse_pass: ${env:CLICKHOUSE_PASS}
      clickhouse_secure: [true/false] # Default to true. Set to false if not connecting over SSL
    acceleration:
      enabled: true
8 changes: 5 additions & 3 deletions databricks/README.md
@@ -7,7 +7,7 @@ Spice can read data straight from a Databricks instance. This guide will create
- A Databricks personal access token is available (as the environment variable `DATABRICKS_TOKEN`).
- A table already exists in Databricks, called `spice_data.public.awesome_table`.

-1. Initialise a Spice app
+1. Initialize a Spice app
```shell
spice init databricks_demo
cd databricks_demo
@@ -16,12 +16,12 @@ Spice can read data straight from a Databricks instance. This guide will create
1. Start the Spice runtime
```shell
>>> spice run
-2024-03-27T05:27:52.696536Z INFO runtime::http: Spice Runtime HTTP listening on 127.0.0.1:3000
+2024-03-27T05:27:52.696536Z INFO runtime::http: Spice Runtime HTTP listening on 127.0.0.1:8090
2024-03-27T05:27:52.696543Z INFO runtime::flight: Spice Runtime Flight listening on 127.0.0.1:50051
2024-03-27T05:27:52.696606Z INFO runtime::opentelemetry: Spice Runtime OpenTelemetry listening on 127.0.0.1:50052
```

-1. In another terminal, authenticate Spice with Databricks
+1. In another terminal, working in the `databricks_demo` directory, configure Spice with the Databricks credentials
```shell
spice login databricks \
--token $DATABRICKS_TOKEN \
@@ -30,6 +30,8 @@ Spice can read data straight from a Databricks instance. This guide will create
--aws-region us-east-1
```

+Executing `spice login` and successfully authenticating will create a `.env` file in the `databricks_demo` directory with the Databricks credentials.
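
Illustratively, that `.env` might map the login flags to entries along these lines (key names are hypothetical; inspect the generated file for the actual ones):

```bash
# Hypothetical key names, for illustration only
DATABRICKS_TOKEN=<token>
AWS_ACCESS_KEY_ID=<access-key-id>
AWS_SECRET_ACCESS_KEY=<secret-access-key>
AWS_REGION=us-east-1
```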

1. Configure a Databricks dataset in the spicepod. The table provided must be a reference to a table in the Databricks Unity Catalog.
```shell
>>> spice dataset configure
18 changes: 10 additions & 8 deletions dremio/README.md
@@ -1,26 +1,28 @@

## Spice.ai Quickstart Tutorial using Dremio

-This quickstart will use a demo instance of Dremio with a sample dataset. No need to set up a Dremio instance, but the same steps can be used to connect to any Dremio instance available.
+The Dremio quickstart uses a publicly accessible demo instance of Dremio loaded with sample datasets. Setting up your own Dremio instance is not required to complete the quickstart, and the same steps can be used to connect to any Dremio instance.

-**Step 1.** Set the login credentials that the Spice runtime will use when accessing Dremio.
+**Step 1.** Initialize a Spice project:

```bash
-spice login dremio -u demo -p demo1234
+spice init dremio-demo
+cd dremio-demo
```

-**Step 2.** Initialize a Spice project and start the runtime:
+**Step 2.** Set the login credentials that the Spice runtime will use when accessing Dremio. Ensure this command is run in the `dremio-demo` directory.

```bash
-spice init dremio-demo
+spice login dremio -u demo -p demo1234
```

+**Step 3.** Start the runtime.

```bash
-cd dremio-demo
spice run
```

-**Step 3.** Configure the dataset to connect to Dremio:
+**Step 4.** Configure the dataset to connect to Dremio:

```bash
spice dataset configure
@@ -83,7 +85,7 @@ The Spice runtime terminal will show that the dataset has been loaded:
2024-03-27T05:36:38.107138Z INFO runtime::dataconnector: Refreshing data for taxi_trips
```
-**Step 4.** Run queries against the dataset using the Spice SQL REPL.
+**Step 5.** Run queries against the dataset using the Spice SQL REPL.
In a new terminal, start the Spice SQL REPL
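
A sketch, assuming the Spice CLI's `spice sql` command starts the REPL:

```bash
# Start the interactive SQL REPL against the local runtime
spice sql
```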
4 changes: 2 additions & 2 deletions duckdb/README.md
@@ -61,10 +61,10 @@ Confirm in the terminal output the `tpch_customer` dataset has been loaded:

```bash
Spice.ai runtime starting...
-2024-04-29T18:23:18.055782Z INFO spiced: Metrics listening on 127.0.0.1:9000
+2024-04-29T18:23:18.055782Z INFO spiced: Metrics listening on 127.0.0.1:9090
2024-04-29T18:23:18.059972Z INFO runtime: Loaded dataset: tpch_customer
2024-04-29T18:23:18.060005Z INFO runtime::opentelemetry: Spice Runtime OpenTelemetry listening on 127.0.0.1:50052
-2024-04-29T18:23:18.062230Z INFO runtime::http: Spice Runtime HTTP listening on 127.0.0.1:3000
+2024-04-29T18:23:18.062230Z INFO runtime::http: Spice Runtime HTTP listening on 127.0.0.1:8090
2024-04-29T18:23:18.062249Z INFO runtime::flight: Spice Runtime Flight listening on 127.0.0.1:50051
```
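
With the HTTP endpoint on 8090, a quick sanity check can be issued over HTTP (a sketch, assuming the runtime exposes a `/v1/sql` endpoint that accepts a plain-text SQL body):

```bash
curl -X POST \
  -H "Content-Type: text/plain" \
  -d "SELECT c_name FROM tpch_customer LIMIT 5" \
  localhost:8090/v1/sql
```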

2 changes: 1 addition & 1 deletion federation/README.md
@@ -22,7 +22,7 @@ spice init
name: (federation)?
```

-**Step 3.** Log into the demo Dremio instance.
+**Step 3.** Log into the demo Dremio instance. Ensure this command is run in the `federation` directory.

```bash
spice login dremio -u demo -p demo1234
8 changes: 7 additions & 1 deletion ftp/README.md
@@ -8,7 +8,13 @@ Follow the [quickstart guide](https://docs.spiceai.org/getting-started) to get s

See the [datasets reference](https://docs.spiceai.org/reference/spicepod/datasets) for more dataset configuration options.

-To securely store your FTP/SFTP password, see [Secret Stores](https://docs.spiceai.org/secret-stores)
+Set the environment variable `FTP_PASS`/`SFTP_PASS` to the password for your FTP server. This can be specified on the command line when running the Spice runtime, or in a `.env` file in the same directory as `spicepod.yaml`.

+For example, to set the password in a `.env` file:

```bash
+echo "FTP_PASS=<password>" > .env
```
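
The SFTP variant (`spicepod_sftp.yaml`) reads `SFTP_PASS` instead, which can be set the same way:

```bash
echo "SFTP_PASS=<password>" > .env
```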

**Step 2.** Run the Spice runtime with `spice run` from this directory.

4 changes: 2 additions & 2 deletions ftp/spicepod_ftp.yaml
@@ -5,8 +5,8 @@ datasets:
  - from: ftp://[remote_host]/[remote_path]/
    name: [local_table_name]
    params:
-     ftp_user: [ftp_user]
-     ftp_pass: [ftp_password]
+     ftp_user: [ftp_username]
+     ftp_pass: ${env:FTP_PASS}
    acceleration:
      enabled: true
      refresh_mode: full
2 changes: 1 addition & 1 deletion ftp/spicepod_sftp.yaml
@@ -6,7 +6,7 @@ datasets:
    name: [local_table_name]
    params:
      sftp_user: [sftp_user]
-     sftp_pass: [sftp_password]
+     sftp_pass: ${env:SFTP_PASS}
    acceleration:
      enabled: true
      refresh_mode: full
4 changes: 2 additions & 2 deletions graphql/README.md
@@ -2,15 +2,15 @@

Follow these steps to get started with GraphQL as a Data Connector.

-**Step 1.** Edit the `spicepod.yaml` file in this directory and replace the params in the `graphql_quickstart` dataset with the connection parameters for your GraphQL instance, where `[local_table_name]` is your desired name for the federated table, `[graphql_endpoint]` is the url to your GraphQL endpoint, `[graphql_query]` is the query to execute, and `[json_pointer]` is the pointer to the data in the GraphQL response.
+**Step 1.** Edit the `spicepod.yaml` file in this directory and replace the `graphql_quickstart` dataset params with the connection parameters for your GraphQL instance, where `[local_table_name]` is your desired name for the federated table, `[graphql_endpoint]` is the URL of your GraphQL endpoint, `[graphql_query]` is the query to execute, and `[json_pointer]` is the pointer to the data in the GraphQL response.
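
As a hedged illustration, a filled-in dataset might look like the following (the endpoint, query, and JSON pointer are hypothetical placeholders, and the `from` value assumes the connector's `graphql:` scheme):

```yaml
datasets:
  - from: graphql:https://example.com/graphql # hypothetical endpoint
    name: my_graphql_table
    params:
      json_pointer: /data/items # pointer to the array of records in the response
      graphql_query: |
        {
          items {
            id
            name
          }
        }
```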

For authentication options, see the [GraphQL Data Connector docs](https://docs.spiceai.org/data-connectors/graphql#configuration).

Follow the [quickstart guide](https://docs.spiceai.org/getting-started) to get started with the Spice.ai runtime.

See the [datasets reference](https://docs.spiceai.org/reference/spicepod/datasets) for more dataset configuration options.

-To securely store your GraphQL auth params, see [Secret Stores](https://docs.spiceai.org/secret-stores)
+To securely store GraphQL auth params, see [Secret Stores](https://docs.spiceai.org/components/secret-stores)

**Step 2.** Run the Spice runtime with `spice run` from this directory.

2 changes: 1 addition & 1 deletion graphql/spicepod.yaml
@@ -6,7 +6,7 @@ datasets:
    name: [local_table_name]
    params:
      json_pointer: [json_pointer]
-     query: |
+     graphql_query: |
        [graphql_query]
    acceleration:
      enabled: true