Merge branch 'trunk' into graphql_update
digadeesh authored Jul 22, 2024
2 parents 68244c4 + 03c7a35 commit 861b95f
Showing 34 changed files with 236 additions and 94 deletions.
4 changes: 2 additions & 2 deletions acceleration/data-refresh/README.md
@@ -98,15 +98,15 @@ curl -i -X PATCH \
-d '{
"refresh_sql": "SELECT * FROM taxi_trips WHERE passenger_count = 3"
}' \
-localhost:3000/v1/datasets/taxi_trips/acceleration
+localhost:8090/v1/datasets/taxi_trips/acceleration
```

The updated `refresh_sql` will be applied on the _next_ refresh (as determined by `refresh_check_interval`).
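
For context, a minimal sketch of where these settings live in `spicepod.yaml` (the `from` source and interval are illustrative; the parameter names match those used above):

```yaml
datasets:
  - from: s3://spiceai-demo-datasets/taxi_trips/2024/ # illustrative source
    name: taxi_trips
    params:
      file_format: parquet
    acceleration:
      enabled: true
      refresh_check_interval: 10s # how often the runtime checks whether a refresh is due
      refresh_sql: SELECT * FROM taxi_trips WHERE passenger_count = 3
```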

Make an additional call to trigger a refresh now:

```bash
-curl -i -X POST localhost:3000/v1/datasets/taxi_trips/acceleration/refresh
+curl -i -X POST localhost:8090/v1/datasets/taxi_trips/acceleration/refresh
```

Swap to the Spice SQL REPL and enter:
4 changes: 2 additions & 2 deletions caching/README.md
@@ -27,7 +27,7 @@ spice add spiceai/tpch
The following output is shown in the Spice runtime terminal:

```bash
-2024-05-23T21:50:44.372314Z INFO spiced: Metrics listening on 127.0.0.1:9000
+2024-05-23T21:50:44.372314Z INFO spiced: Metrics listening on 127.0.0.1:9090
2024-05-23T21:50:44.372986Z INFO runtime: Initialized results cache; max size: 128.00 MiB, item ttl: 1s
2024-05-23T21:50:45.861161Z INFO runtime: Registered dataset customer
2024-05-23T21:50:47.283554Z INFO runtime: Registered dataset lineitem
@@ -82,7 +82,7 @@ spice run
The following output is shown in the Spice runtime terminal, confirming the updated in-memory caching settings (`300s`):

```bash
-2024-05-23T22:02:36.899534Z INFO spiced: Metrics listening on 127.0.0.1:9000
+2024-05-23T22:02:36.899534Z INFO spiced: Metrics listening on 127.0.0.1:9090
2024-05-23T22:02:36.900280Z INFO runtime: Initialized results cache; max size: 128.00 MiB, item ttl: 300s
2024-05-23T22:02:38.683392Z INFO runtime: Registered dataset customer
2024-05-23T22:02:40.054125Z INFO runtime: Registered dataset lineitem
48 changes: 39 additions & 9 deletions catalogs/databricks/README.md
@@ -28,30 +28,60 @@ catalogs:
  - name: databricks:<CATALOG_NAME>
    from: db_uc
    params:
-     endpoint: <instance-id>.cloud.databricks.com
      mode: spark_connect # or delta_lake
+     databricks_token: ${env:DATABRICKS_TOKEN}
+     databricks_endpoint: <instance-id>.cloud.databricks.com
      databricks_cluster_id: <cluster-id>
```
For `mode`, choose between `spark_connect` (the default) and `delta_lake`. `spark_connect` requires an [All-Purpose Compute Cluster](https://docs.databricks.com/en/compute/index.html) to be available. `delta_lake` mode queries directly against Delta Lake tables in object storage and requires Spice to have the necessary permissions to access the object storage.

-Visit the documentation for more information configuring the [Databricks Unity Catalog Connector](https://docs.spiceai.org/components/catalogs/databricks).
+Set the `DATABRICKS_TOKEN` environment variable to the Databricks personal access token created in Step 1. A `.env` file created in the same directory as `spicepod.yaml` can be used to set the variable, e.g.:

```bash
+echo "DATABRICKS_TOKEN=<token>" > .env
```

-## Step 4. Login to Databricks with `spice login`
+Visit the documentation for more information on configuring the [Databricks Unity Catalog Connector](https://docs.spiceai.org/components/catalogs/databricks).

-Using the Spice CLI, set the credentials needed to connect to Databricks.
+## Step 4. Set the object storage credentials for `delta_lake` mode

-### Using Spark Connect
-`spice login databricks --token <access-token>`
+When using `delta_lake` mode, object storage credentials must be set for Spice to access the data.

### Using Delta Lake directly against AWS S3
-`spice login databricks --token <access-token> --aws-region <aws-region> --aws-access-key-id <aws-access-key-id> --aws-secret-access-key <aws-secret-access-key>`

```yaml
+params:
+  mode: delta_lake
+  databricks_token: ${env:DATABRICKS_TOKEN}
+  databricks_aws_access_key_id: ${env:AWS_ACCESS_KEY_ID}
+  databricks_aws_secret_access_key: ${env:AWS_SECRET_ACCESS_KEY}
+  databricks_aws_region: <region> # e.g. us-east-1, us-west-2
+  databricks_aws_endpoint: <endpoint> # if using an S3-compatible service, like Minio
```

+Set the `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables to the AWS access key and secret key, respectively.
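
One way to provide these, following the same `.env` pattern as the token above (values are placeholders):

```bash
# Append the AWS credentials to the .env file next to spicepod.yaml
echo "AWS_ACCESS_KEY_ID=<access-key-id>" >> .env
echo "AWS_SECRET_ACCESS_KEY=<secret-access-key>" >> .env
```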

### Using Delta Lake directly against Azure Blob Storage
-`spice login databricks --token <access-token> --azure-storage-account-name <account-name> --azure-storage-access-key <access-key>`

```yaml
+params:
+  mode: delta_lake
+  databricks_token: ${env:DATABRICKS_TOKEN}
+  databricks_azure_storage_account_name: ${env:AZURE_ACCOUNT_NAME}
+  databricks_azure_account_key: ${env:AZURE_ACCOUNT_KEY}
```

+Set the `AZURE_ACCOUNT_NAME` and `AZURE_ACCOUNT_KEY` environment variables to the Azure storage account name and account key, respectively.

### Using Delta Lake directly against Google Cloud Storage
-`spice login databricks --token <access-token> --google-service-account-path /path/to/service-account.json`

```yaml
+params:
+  mode: delta_lake
+  databricks_token: ${env:DATABRICKS_TOKEN}
+  databricks_google_service_account: </path/to/service-account.json>
```

## Step 5. Start the Spice runtime

18 changes: 10 additions & 8 deletions catalogs/spiceai/README.md
@@ -11,22 +11,24 @@ The Spice.ai Cloud Platform Catalog Connector makes querying datasets in the Spi

Sign up for a Spice.ai Cloud Platform account at https://spice.ai.

-## Step 2. Login to the Spice.ai Cloud Platform with `spice login`

-Using the Spice CLI, login to the Spice.ai Cloud Platform. A window will open in your browser to authenticate.
+## Step 2. Create a new directory and initialize a Spicepod

```bash
-spice login
+mkdir spice-catalog-demo
+cd spice-catalog-demo
+spice init
```

-## Step 3. Create a new directory and initialize a Spicepod
+## Step 3. Login to the Spice.ai Cloud Platform with `spice login`

+Working in the `spice-catalog-demo` directory, use the Spice CLI to login to the Spice.ai Cloud Platform. A browser window will open to authenticate when executing the `spice login` command.

```bash
-mkdir spice-catalog-demo
-cd spice-catalog-demo
-spice init
+spice login
```

+After successfully authenticating, the Spice.ai Cloud Platform API key and token are stored in a `.env` file in the `spice-catalog-demo` working directory. The Spice runtime reads environment variables set in the `.env` file in the local working directory.
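
Illustratively, the generated `.env` might contain entries along these lines (the key names here are hypothetical; inspect the generated file for the actual ones):

```bash
# Hypothetical key names, for illustration only
SPICEAI_API_KEY=<api-key>
SPICEAI_TOKEN=<token>
```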

## Step 4. Add the Spice.ai Cloud Platform Catalog Connector to `spicepod.yaml`

Add the following configuration to your `spicepod.yaml`:
38 changes: 31 additions & 7 deletions catalogs/unity_catalog/README.md
@@ -25,6 +25,8 @@ Add the following configuration to your `spicepod.yaml`:
catalogs:
  - name: unity_catalog:https://<unity_catalog_host>/api/2.1/unity-catalog/catalogs/<catalog_name>
    from: uc
+   params:
+     # Configure the object store credentials here
```

The Unity Catalog connector only supports Delta Lake tables and requires specifying the object store credentials to connect to the Delta Lake tables.
@@ -33,16 +35,38 @@ Visit the documentation for more information configuring the [Unity Catalog Conn

## Step 3. Configure the object store credentials

-Using the Spice CLI, set the credentials needed to connect to the Delta Lake tables provided by the Unity Catalog Connector.
+Configure credentials for the object store backing the Delta Lake tables.

-### Using Delta Lake directly against AWS S3
-`spice login delta_lake --aws-region <aws-region> --aws-access-key-id <aws-access-key-id> --aws-secret-access-key <aws-secret-access-key>`
+### AWS S3

-### Using Delta Lake directly against Azure Blob Storage
-`spice login delta_lake --azure-storage-account-name <account-name> --azure-storage-access-key <access-key>`
```yaml
+params:
+  unity_catalog_aws_access_key_id: ${env:AWS_ACCESS_KEY_ID}
+  unity_catalog_aws_secret_access_key: ${env:AWS_SECRET_ACCESS_KEY}
+  unity_catalog_aws_region: <region> # e.g. us-east-1, us-west-2
+  unity_catalog_aws_endpoint: <endpoint> # if using an S3-compatible service, like Minio
```
+Set the `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables to the AWS access key and secret key, respectively.
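
A sketch of supplying these inline for a single run instead of a `.env` file (standard POSIX shell; values are placeholders):

```bash
export AWS_ACCESS_KEY_ID=<access-key-id>
export AWS_SECRET_ACCESS_KEY=<secret-access-key>
spice run
```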

-### Using Delta Lake directly against Google Cloud Storage
-`spice login delta_lake --google-service-account-path /path/to/service-account.json`
+### Azure Blob Storage

```yaml
+params:
+  mode: delta_lake
+  unity_catalog_azure_storage_account_name: ${env:AZURE_ACCOUNT_NAME}
+  unity_catalog_azure_account_key: ${env:AZURE_ACCOUNT_KEY}
```

+Set the `AZURE_ACCOUNT_NAME` and `AZURE_ACCOUNT_KEY` environment variables to the Azure storage account name and account key, respectively.

+### Google Cloud Storage

```yaml
+params:
+  mode: delta_lake
+  unity_catalog_google_service_account: </path/to/service-account.json>
```

## Step 4. Start the Spice runtime

14 changes: 13 additions & 1 deletion clickhouse/README.md
@@ -10,7 +10,19 @@ Follow the [quickstart guide](https://docs.spiceai.org/getting-started) to get s

See the [datasets reference](https://docs.spiceai.org/reference/spicepod/datasets) for more dataset configuration options.

-To securely store your Clickhouse password, see [Secret Stores](https://docs.spiceai.org/secret-stores)
+Set the environment variable `CLICKHOUSE_PASS` to the ClickHouse instance password. Environment variables can be specified on the command line when running the Spice runtime or in a `.env` file in the same directory as `spicepod.yaml`.

+For example, to set the password in a `.env` file:

```bash
+echo "CLICKHOUSE_PASS=<password>" > .env
```

+A `.env` file is created in the project directory with the following content:

```bash
+CLICKHOUSE_PASS=<password>
```
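
Alternatively, as noted above, the variable can be supplied on the command line for a single run (a sketch, assuming a POSIX shell):

```bash
# Set the variable only for this invocation of the runtime
CLICKHOUSE_PASS=<password> spice run
```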

**Step 2.** Run the Spice runtime with `spice run` from this directory.

2 changes: 1 addition & 1 deletion clickhouse/spicepod.yaml
@@ -9,7 +9,7 @@ datasets:
      clickhouse_db: [Database name]
      clickhouse_tcp_port: [Port]
      clickhouse_user: [Username]
-     clickhouse_pass: [Password]
+     clickhouse_pass: ${env:CLICKHOUSE_PASS}
      clickhouse_secure: [true/false] # Default to true. Set to false if not connecting over SSL
    acceleration:
      enabled: true
8 changes: 5 additions & 3 deletions databricks/README.md
@@ -7,7 +7,7 @@ Spice can read data straight from a Databricks instance. This guide will create
- A Databricks personal access token is available (as the environment variable `DATABRICKS_TOKEN`).
- A table already exists in Databricks, called `spice_data.public.awesome_table`.

-1. Initialise a Spice app
+1. Initialize a Spice app
```shell
spice init databricks_demo
cd databricks_demo
@@ -16,12 +16,12 @@ Spice can read data straight from a Databricks instance. This guide will create
1. Start the Spice runtime
```shell
>>> spice run
-2024-03-27T05:27:52.696536Z INFO runtime::http: Spice Runtime HTTP listening on 127.0.0.1:3000
+2024-03-27T05:27:52.696536Z INFO runtime::http: Spice Runtime HTTP listening on 127.0.0.1:8090
2024-03-27T05:27:52.696543Z INFO runtime::flight: Spice Runtime Flight listening on 127.0.0.1:50051
2024-03-27T05:27:52.696606Z INFO runtime::opentelemetry: Spice Runtime OpenTelemetry listening on 127.0.0.1:50052
```

-1. In another terminal, authenticate Spice with Databricks
+1. In another terminal, working in the `databricks_demo` directory, configure Spice with the Databricks credentials
```shell
spice login databricks \
--token $DATABRICKS_TOKEN \
@@ -30,6 +30,8 @@ Spice can read data straight from a Databricks instance. This guide will create
--aws-region us-east-1
```

+Executing `spice login` and successfully authenticating will create a `.env` file in the `databricks_demo` directory with the Databricks credentials.
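
Illustratively, that `.env` might map the login flags to entries along these lines (key names are hypothetical; inspect the generated file for the actual ones):

```bash
# Hypothetical key names, for illustration only
DATABRICKS_TOKEN=<token>
AWS_ACCESS_KEY_ID=<access-key-id>
AWS_SECRET_ACCESS_KEY=<secret-access-key>
AWS_REGION=us-east-1
```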

1. Configure a Databricks dataset in the spicepod. The table provided must be a reference to a table in the Databricks Unity Catalog.
```shell
>>> spice dataset configure
18 changes: 10 additions & 8 deletions dremio/README.md
@@ -1,26 +1,28 @@

## Spice.ai Quickstart Tutorial using Dremio

-This quickstart will use a demo instance of Dremio with a sample dataset. No need to set up a Dremio instance, but the same steps can be used to connect to any Dremio instance available.
+The Dremio quickstart uses a publicly accessible demo instance of Dremio loaded with sample datasets. Setting up your own Dremio instance is not required to complete the quickstart, and the same steps can be used to connect to any Dremio instance.

-**Step 1.** Set the login credentials that the Spice runtime will use when accessing Dremio.
+**Step 1.** Initialize a Spice project:

```bash
-spice login dremio -u demo -p demo1234
+spice init dremio-demo
+cd dremio-demo
```

-**Step 2.** Initialize a Spice project and start the runtime:
+**Step 2.** Set the login credentials that the Spice runtime will use when accessing Dremio. Ensure this command is run in the `dremio-demo` directory.

```bash
-spice init dremio-demo
+spice login dremio -u demo -p demo1234
```

+**Step 3.** Start the runtime.

```bash
-cd dremio-demo
spice run
```

-**Step 3.** Configure the dataset to connect to Dremio:
+**Step 4.** Configure the dataset to connect to Dremio:

```bash
spice dataset configure
@@ -83,7 +85,7 @@ The Spice runtime terminal will show that the dataset has been loaded:
2024-03-27T05:36:38.107138Z INFO runtime::dataconnector: Refreshing data for taxi_trips
```
-**Step 4.** Run queries against the dataset using the Spice SQL REPL.
+**Step 5.** Run queries against the dataset using the Spice SQL REPL.
In a new terminal, start the Spice SQL REPL
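
A sketch, assuming the Spice CLI's `spice sql` command starts the REPL:

```bash
# Start the interactive SQL REPL against the local runtime
spice sql
```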
4 changes: 2 additions & 2 deletions duckdb/README.md
@@ -61,10 +61,10 @@ Confirm in the terminal output the `tpch_customer` dataset has been loaded:

```bash
Spice.ai runtime starting...
-2024-04-29T18:23:18.055782Z INFO spiced: Metrics listening on 127.0.0.1:9000
+2024-04-29T18:23:18.055782Z INFO spiced: Metrics listening on 127.0.0.1:9090
2024-04-29T18:23:18.059972Z INFO runtime: Loaded dataset: tpch_customer
2024-04-29T18:23:18.060005Z INFO runtime::opentelemetry: Spice Runtime OpenTelemetry listening on 127.0.0.1:50052
-2024-04-29T18:23:18.062230Z INFO runtime::http: Spice Runtime HTTP listening on 127.0.0.1:3000
+2024-04-29T18:23:18.062230Z INFO runtime::http: Spice Runtime HTTP listening on 127.0.0.1:8090
2024-04-29T18:23:18.062249Z INFO runtime::flight: Spice Runtime Flight listening on 127.0.0.1:50051
```
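
With the HTTP endpoint on 8090, a quick sanity check can be issued over HTTP (a sketch, assuming the runtime exposes a `/v1/sql` endpoint that accepts a plain-text SQL body):

```bash
curl -X POST \
  -H "Content-Type: text/plain" \
  -d "SELECT c_name FROM tpch_customer LIMIT 5" \
  localhost:8090/v1/sql
```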

2 changes: 1 addition & 1 deletion federation/README.md
@@ -22,7 +22,7 @@ spice init
name: (federation)?
```

-**Step 3.** Log into the demo Dremio instance.
+**Step 3.** Log into the demo Dremio instance. Ensure this command is run in the `federation` directory.

```bash
spice login dremio -u demo -p demo1234
8 changes: 7 additions & 1 deletion ftp/README.md
@@ -8,7 +8,13 @@ Follow the [quickstart guide](https://docs.spiceai.org/getting-started) to get s

See the [datasets reference](https://docs.spiceai.org/reference/spicepod/datasets) for more dataset configuration options.

-To securely store your FTP/SFTP password, see [Secret Stores](https://docs.spiceai.org/secret-stores)
+Set the environment variable `FTP_PASS`/`SFTP_PASS` to the password for your FTP server. This can be specified on the command line when running the Spice runtime, or in a `.env` file in the same directory as `spicepod.yaml`.

+For example, to set the password in a `.env` file:

```bash
+echo "FTP_PASS=<password>" > .env
```
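
The SFTP variant (`spicepod_sftp.yaml`) reads `SFTP_PASS` instead, which can be set the same way:

```bash
echo "SFTP_PASS=<password>" > .env
```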

**Step 2.** Run the Spice runtime with `spice run` from this directory.

4 changes: 2 additions & 2 deletions ftp/spicepod_ftp.yaml
@@ -5,8 +5,8 @@ datasets:
  - from: ftp://[remote_host]/[remote_path]/
    name: [local_table_name]
    params:
-     ftp_user: [ftp_user]
-     ftp_pass: [ftp_password]
+     ftp_user: [ftp_username]
+     ftp_pass: ${env:FTP_PASS}
    acceleration:
      enabled: true
      refresh_mode: full
2 changes: 1 addition & 1 deletion ftp/spicepod_sftp.yaml
@@ -6,7 +6,7 @@ datasets:
    name: [local_table_name]
    params:
      sftp_user: [sftp_user]
-     sftp_pass: [sftp_password]
+     sftp_pass: ${env:SFTP_PASS}
    acceleration:
      enabled: true
      refresh_mode: full
4 changes: 2 additions & 2 deletions graphql/README.md
@@ -2,15 +2,15 @@

Follow these steps to get started with GraphQL as a Data Connector.

-**Step 1.** Edit the `spicepod.yaml` file in this directory and replace the params in the `graphql_quickstart` dataset with the connection parameters for your GraphQL instance, where `[local_table_name]` is your desired name for the federated table, `[graphql_endpoint]` is the url to your GraphQL endpoint, `[graphql_query]` is the query to execute, and `[json_pointer]` is the pointer to the data in the GraphQL response.
+**Step 1.** Edit the `spicepod.yaml` file in this directory and replace the `graphql_quickstart` dataset params with the connection parameters for your GraphQL instance, where `[local_table_name]` is your desired name for the federated table, `[graphql_endpoint]` is the URL of your GraphQL endpoint, `[graphql_query]` is the query to execute, and `[json_pointer]` is the pointer to the data in the GraphQL response.
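
As a hedged illustration, a filled-in dataset might look like the following (the endpoint, query, and JSON pointer are hypothetical placeholders, and the `from` value assumes the connector's `graphql:` scheme):

```yaml
datasets:
  - from: graphql:https://example.com/graphql # hypothetical endpoint
    name: my_graphql_table
    params:
      json_pointer: /data/items # pointer to the array of records in the response
      graphql_query: |
        {
          items {
            id
            name
          }
        }
```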

For authentication options, see the [GraphQL Data Connector docs](https://docs.spiceai.org/data-connectors/graphql#configuration).

Follow the [quickstart guide](https://docs.spiceai.org/getting-started) to get started with the Spice.ai runtime.

See the [datasets reference](https://docs.spiceai.org/reference/spicepod/datasets) for more dataset configuration options.

-To securely store your GraphQL auth params, see [Secret Stores](https://docs.spiceai.org/secret-stores)
+To securely store GraphQL auth params, see [Secret Stores](https://docs.spiceai.org/components/secret-stores)

**Step 2.** Run the Spice runtime with `spice run` from this directory.

2 changes: 1 addition & 1 deletion graphql/spicepod.yaml
@@ -6,7 +6,7 @@ datasets:
    name: [local_table_name]
    params:
      json_pointer: [json_pointer]
-     query: |
+     graphql_query: |
        [graphql_query]
    acceleration:
      enabled: true