|
| 1 | +--- |
| 2 | +title: "Polaris Catalog" |
| 3 | +linkTitle: "Polaris Catalog" |
| 4 | +weight: 18 |
| 5 | +description: Get started with Polaris Catalog in LocalStack for Snowflake |
| 6 | +--- |
| 7 | + |
| 8 | +{{< preview-notice >}} |
| 9 | + |
| 10 | +## Introduction |
| 11 | + |
| 12 | +[Polaris Catalog](https://github.com/apache/polaris) is an open-source catalog for Apache Iceberg tables that implements the Iceberg REST catalog API. It gives query engines, including Snowflake, a single place to discover and govern Iceberg tables, which makes it easier to find and use the right data for analytics and machine learning projects. |
| 13 | + |
| 14 | +The Snowflake emulator supports creating Iceberg tables backed by a Polaris catalog. Currently, LocalStack supports the [`CREATE CATALOG INTEGRATION`](https://docs.snowflake.com/en/sql-reference/sql/create-catalog-integration-open-catalog) statement for this purpose. LocalStack also provides a `localstack/polaris` Docker image that you can use to run a local Polaris REST catalog. |
| 15 | + |
| 16 | +## Getting started |
| 17 | + |
| 18 | +This guide is designed for users who are new to Iceberg tables with a Polaris catalog and assumes basic knowledge of SQL and Snowflake. Start your Snowflake emulator and connect to it with a SQL client to run the queries below. |
| 19 | + |
| 20 | +This guide shows how to use the Polaris REST catalog to create Iceberg tables in the Snowflake emulator, by: |
| 21 | + |
| 22 | +- Launching the Polaris Catalog service |
| 23 | +- Setting up an external volume |
| 24 | +- Creating a catalog integration |
| 25 | +- Creating an Iceberg table |
| 26 | +- Querying the Iceberg table |
| 27 | + |
| 28 | +### Start Polaris catalog container |
| 29 | + |
| 30 | +The following command starts the Polaris catalog container using the `localstack/polaris` Docker image: |
| 31 | + |
| 32 | +```bash |
| 33 | +docker run -d --name polaris-test \ |
| 34 | + -p 8181:8181 -p 8182:8182 \ |
| 35 | + -e AWS_REGION=us-east-1 \ |
| 36 | + -e AWS_ACCESS_KEY_ID=test \ |
| 37 | + -e AWS_SECRET_ACCESS_KEY=test \ |
| 38 | + -e AWS_ENDPOINT_URL=http://localhost:4566 \ |
| 39 | + -e POLARIS_BOOTSTRAP_CREDENTIALS=default-realm,root,s3cr3t \ |
| 40 | + -e polaris.realm-context.realms=default-realm \ |
| 41 | + -e quarkus.otel.sdk.disabled=true \ |
| 42 | + localstack/polaris:latest |
| 43 | +``` |
| 44 | + |
| 45 | +Check that Polaris is healthy before continuing: |
| 46 | + |
| 47 | +```bash |
| 48 | +curl -X GET http://localhost:8182/health |
| 49 | +``` |
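A single request only reports the current state, so setup scripts usually poll until the endpoint responds. The following is a minimal sketch, assuming the health endpoint returns a successful HTTP status once Polaris is ready. The function name `wait_for_polaris` and the retry parameters are illustrative and not part of the `localstack/polaris` image.

```bash
# Hypothetical helper: poll a health URL until it answers with a successful status.
# Arguments: URL, maximum attempts (default 30), delay in seconds (default 2).
wait_for_polaris() {
  url="$1"
  attempts="${2:-30}"
  delay="${3:-2}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -f makes curl exit non-zero on HTTP errors; -s silences progress output
    if curl -sf -o /dev/null "$url"; then
      echo "Polaris is healthy after $i attempt(s)"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "Polaris did not become healthy after $attempts attempts" >&2
  return 1
}
```

Call it as `wait_for_polaris http://localhost:8182/health` before running the remaining steps.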
| 50 | + |
| 51 | +### Authenticate and create Polaris catalog |
| 52 | + |
| 53 | +Set variables and retrieve an access token: |
| 54 | + |
| 55 | +```bash |
| 56 | +REALM="default-realm" |
| 57 | +CLIENT_ID="root" |
| 58 | +CLIENT_SECRET="s3cr3t" |
| 59 | +BUCKET_NAME="test-bucket-$(openssl rand -hex 4)" |
| 60 | +CATALOG_NAME="polaris" |
| 61 | + |
| 62 | +TOKEN=$(curl -s -X POST http://localhost:8181/api/catalog/v1/oauth/tokens \ |
| 63 | + -H "Polaris-Realm: $REALM" \ |
| 64 | + -d "grant_type=client_credentials&client_id=$CLIENT_ID&client_secret=$CLIENT_SECRET&scope=PRINCIPAL_ROLE:ALL" | jq -r '.access_token') |
| 65 | +``` |
| 66 | + |
| 67 | +The `TOKEN` variable will contain the access token. |
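`jq -r` prints the literal string `null` when the response has no `access_token` field (for example, when the credentials are rejected), and the variable stays empty if the request itself fails. A minimal guard, using the hypothetical helper name `require_token`, could catch both cases before the token is used:

```bash
# Hypothetical helper: fail early if the token request did not yield a usable token.
require_token() {
  # jq -r prints "null" for a missing field; a failed request leaves the value empty
  if [ -z "$1" ] || [ "$1" = "null" ]; then
    echo "Failed to obtain an access token from Polaris" >&2
    return 1
  fi
  echo "Access token retrieved"
}
```

For example, `require_token "$TOKEN" || exit 1` stops a setup script before it sends requests with an invalid bearer token.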
| 68 | + |
| 69 | +Create a catalog: |
| 70 | + |
| 71 | +```bash |
| 72 | +curl -s -X POST http://localhost:8181/api/management/v1/catalogs \ |
| 73 | + -H "Authorization: Bearer $TOKEN" \ |
| 74 | + -H "Content-Type: application/json" \ |
| 75 | + -d '{ |
| 76 | + "catalog": { |
| 77 | + "name": "'"$CATALOG_NAME"'", |
| 78 | + "type": "INTERNAL", |
| 79 | + "properties": { |
| 80 | + "default-base-location": "s3://'"$BUCKET_NAME"'/test" |
| 81 | + }, |
| 82 | + "storageConfigInfo": { |
| 83 | + "storageType": "S3_COMPATIBLE", |
| 84 | + "allowedLocations": ["s3://'"$BUCKET_NAME"'/"], |
| 85 | + "s3.roleArn": "arn:aws:iam::000000000000:role/'"$BUCKET_NAME"'", |
| 86 | + "region": "us-east-1", |
| 87 | + "s3.pathStyleAccess": true, |
| 88 | + "s3.endpoint": "http://localhost:4566" |
| 89 | + } |
| 90 | + } |
| 91 | + }' |
| 92 | +``` |
| 93 | + |
| 94 | +Grant the `TABLE_WRITE_DATA` privilege on the catalog to its `catalog_admin` role: |
| 95 | + |
| 96 | +```bash |
| 97 | +curl -s -X PUT "http://localhost:8181/api/management/v1/catalogs/$CATALOG_NAME/catalog-roles/catalog_admin/grants" \ |
| 98 | + -H "Authorization: Bearer $TOKEN" \ |
| 99 | + -H "Content-Type: application/json" \ |
| 100 | + -d '{"type": "catalog", "privilege": "TABLE_WRITE_DATA"}' |
| 101 | +``` |
| 102 | + |
| 103 | +### Create a bucket |
| 104 | + |
| 105 | +Create the S3 bucket that backs the catalog using the `awslocal` CLI: |
| 106 | + |
| 107 | +```bash |
| 108 | +awslocal s3 mb s3://$BUCKET_NAME |
| 109 | +``` |
| 110 | + |
| 111 | +### Create an external volume |
| 112 | + |
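| 113 | +In your SQL client, create an external volume using the `CREATE EXTERNAL VOLUME` statement. Replace `test-bucket` in `STORAGE_BASE_URL` and `STORAGE_AWS_ROLE_ARN` with the value of `$BUCKET_NAME` generated earlier, since the bucket name includes a random suffix: |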
| 114 | + |
| 115 | +```sql |
| 116 | +CREATE EXTERNAL VOLUME polaris_volume |
| 117 | +STORAGE_LOCATIONS = ( |
| 118 | + ( |
| 119 | + NAME = aws_s3_test |
| 120 | + STORAGE_PROVIDER = S3 |
| 121 | + STORAGE_BASE_URL = 's3://test-bucket/' |
| 122 | + STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::000000000000:role/test-bucket' |
| 123 | + ENCRYPTION = (TYPE = AWS_SSE_S3) |
| 124 | + ) |
| 125 | +) |
| 126 | +ALLOW_WRITES = TRUE; |
| 127 | +``` |
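Because the bucket name is generated in the shell, one way to avoid editing the statement by hand is to render it from `$BUCKET_NAME`. This is a sketch only: the function name `render_external_volume_sql` is hypothetical, and it simply prints the statement above with the bucket name substituted so that you can paste it into your SQL client.

```bash
# Hypothetical helper: print the CREATE EXTERNAL VOLUME statement for a given bucket.
render_external_volume_sql() {
  bucket="$1"
  # Unquoted heredoc delimiter so that ${bucket} is expanded inside the SQL text
  cat <<EOF
CREATE EXTERNAL VOLUME polaris_volume
STORAGE_LOCATIONS = (
  (
    NAME = aws_s3_test
    STORAGE_PROVIDER = S3
    STORAGE_BASE_URL = 's3://${bucket}/'
    STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::000000000000:role/${bucket}'
    ENCRYPTION = (TYPE = AWS_SSE_S3)
  )
)
ALLOW_WRITES = TRUE;
EOF
}
```

Running `render_external_volume_sql "$BUCKET_NAME"` in the same shell session that created the bucket prints a statement whose storage location matches the catalog's allowed location.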
| 128 | + |
| 129 | +### Create catalog integration |
| 130 | + |
| 131 | +Create a catalog integration using the `CREATE CATALOG INTEGRATION` statement: |
| 132 | + |
| 133 | +```sql |
| 134 | +CREATE CATALOG INTEGRATION polaris_catalog |
| 135 | +CATALOG_SOURCE = ICEBERG_REST |
| 136 | +TABLE_FORMAT = ICEBERG |
| 137 | +CATALOG_NAMESPACE = 'test_namespace' |
| 138 | +REST_CONFIG = ( |
| 139 | + CATALOG_URI = 'http://localhost:8181', |
| 140 | + CATALOG_NAME = 'polaris' |
| 141 | +) |
| 142 | +REST_AUTHENTICATION = ( |
| 143 | + TYPE = OAUTH, |
| 144 | + OAUTH_CLIENT_ID = 'root', |
| 145 | + OAUTH_CLIENT_SECRET = 's3cr3t', |
| 146 | + OAUTH_ALLOWED_SCOPES = ('PRINCIPAL_ROLE:ALL') |
| 147 | +) |
| 148 | +ENABLED = TRUE |
| 149 | +REFRESH_INTERVAL_SECONDS = 60 |
| 150 | +COMMENT = 'Polaris catalog integration'; |
| 151 | +``` |
| 152 | + |
| 153 | +### Create and query an Iceberg table |
| 154 | + |
| 155 | +Create an Iceberg table that uses the catalog integration and external volume defined above: |
| 156 | + |
| 157 | +```sql |
| 158 | +CREATE ICEBERG TABLE polaris_iceberg_table (c1 TEXT) |
| 159 | +CATALOG = 'polaris_catalog', |
| 160 | +EXTERNAL_VOLUME = 'polaris_volume', |
| 161 | +BASE_LOCATION = 'test/test_namespace'; |
| 162 | +``` |
| 163 | + |
| 164 | +Insert and query data: |
| 165 | + |
| 166 | +```sql |
| 167 | +INSERT INTO polaris_iceberg_table(c1) VALUES ('test'), ('polaris'), ('iceberg'); |
| 168 | + |
| 169 | +SELECT * FROM polaris_iceberg_table; |
| 170 | +``` |
| 171 | + |
| 172 | +The output should be: |
| 173 | + |
| 174 | +```sql |
| 175 | ++----------+ |
| 176 | +| c1 | |
| 177 | +|----------| |
| 178 | +| iceberg | |
| 179 | +| polaris  | |
| 180 | +| test | |
| 181 | ++----------+ |
| 182 | +``` |
| 183 | + |
| 184 | +The table's data and metadata files are stored in the bucket under the base location. List them with: |
| 185 | + |
| 186 | +```bash |
| 187 | +awslocal s3 ls s3://$BUCKET_NAME/test/test_namespace/ |
| 188 | +``` |
| 189 | + |
| 190 | +You will see: |
| 191 | + |
| 192 | +- `data/` with `.parquet` files |
| 193 | +- `metadata/` with Iceberg metadata files |
| 194 | + |
| 195 | +## Configuration options |
| 196 | + |
| 197 | +The following configuration options are available for the Polaris Catalog Docker image provided by LocalStack: |
| 198 | + |
| 199 | +| Environment Variable | Description | Default Value | Required | |
| 200 | +|---------------------|-------------|---------------|----------| |
| 201 | +| `AWS_REGION` | The AWS region to use | `us-east-1` | Yes | |
| 202 | +| `AWS_ACCESS_KEY_ID` | AWS access key ID for accessing AWS services | - | Yes when using AWS services | |
| 203 | +| `AWS_SECRET_ACCESS_KEY` | AWS secret access key for accessing AWS services | - | Yes when using AWS services | |
| 204 | +| `AWS_ENDPOINT_URL` | Custom endpoint URL for AWS services (e.g., for LocalStack) | - | No | |
| 205 | +| `POLARIS_BOOTSTRAP_CREDENTIALS` | Initial realm, username, and password in format: `realm,username,password` | - | Yes | |
| 206 | +| `polaris.realm-context.realms` | List of realms to create/use | - | Yes | |
| 207 | +| `quarkus.otel.sdk.disabled` | Disable OpenTelemetry SDK | `false` | No | |
| 208 | + |
| 209 | +The following logging options are available for the Polaris Catalog Docker image: |
| 210 | + |
| 211 | +| Logging Option | Description | |
| 212 | +|----------------|-------------| |
| 213 | +| `quarkus.log.level` | Sets the overall logging level (e.g., DEBUG) | |
| 214 | +| `quarkus.log.console.level` | Sets the console logging level (e.g., DEBUG) | |
| 215 | +| `quarkus.log.category."org.apache.polaris".level` | Sets the logging level specifically for the Polaris components | |
| 216 | +| `quarkus.log.category."org.apache.polaris".min-level` | Sets the minimum logging level for the Polaris components (e.g., TRACE) | |