Commit 086b349

HarshCasper and whummer authored
add docs for polaris catalog (#73)
* add docs for polaris catalog
* Update index.md
* Update content/en/user-guide/polaris-catalog/index.md
  Co-authored-by: Waldemar Hummer <[email protected]>
* Update content/en/user-guide/polaris-catalog/index.md
  Co-authored-by: Waldemar Hummer <[email protected]>
* revamp polaris catalog docs
* add some configuration options
* add a step to create a S3 bucket
* last nits

---------

Co-authored-by: Waldemar Hummer <[email protected]>
1 parent d719793 commit 086b349

2 files changed: 217 additions and 1 deletion

content/en/user-guide/init-hooks/index.md

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 ---
 title: "Initialization Hooks"
 linkTitle: "Initialization Hooks"
-weight: 17
+weight: 19
 description: Writing SQL scripts to initialize your Snowflake emulator
 ---

content/en/user-guide/polaris-catalog/index.md

Lines changed: 216 additions & 0 deletions
@@ -0,0 +1,216 @@
---
title: "Polaris Catalog"
linkTitle: "Polaris Catalog"
weight: 18
description: Get started with Polaris Catalog in LocalStack for Snowflake
---

{{< preview-notice >}}

## Introduction

[Polaris Catalog](https://github.com/apache/polaris) is an open-source catalog for Apache Iceberg tables that implements the Iceberg REST catalog API. It gives query engines such as Snowflake a single place to discover, manage, and govern Iceberg tables, making it easier to find and use the right data for your analytics and machine learning projects.

The Snowflake emulator supports creating Iceberg tables backed by a Polaris catalog. Currently, LocalStack supports the [`CREATE CATALOG INTEGRATION`](https://docs.snowflake.com/en/sql-reference/sql/create-catalog-integration-open-catalog) statement for this purpose. LocalStack also provides a `localstack/polaris` Docker image that you can use to run a local Polaris REST catalog.

## Getting started

This guide is designed for users new to Iceberg tables with Polaris catalog and assumes basic knowledge of SQL and Snowflake. Start your Snowflake emulator and connect to it using a SQL client to execute the queries below.

This guide shows how to use the Polaris REST catalog to create Iceberg tables in the Snowflake emulator by:

- Launching the Polaris Catalog service
- Setting up an external volume
- Creating a catalog integration
- Creating an Iceberg table
- Querying the Iceberg table

### Start Polaris catalog container

The following command starts the Polaris catalog container using the `localstack/polaris` Docker image:

```bash
docker run -d --name polaris-test \
  -p 8181:8181 -p 8182:8182 \
  -e AWS_REGION=us-east-1 \
  -e AWS_ACCESS_KEY_ID=test \
  -e AWS_SECRET_ACCESS_KEY=test \
  -e AWS_ENDPOINT_URL=http://localhost:4566 \
  -e POLARIS_BOOTSTRAP_CREDENTIALS=default-realm,root,s3cr3t \
  -e polaris.realm-context.realms=default-realm \
  -e quarkus.otel.sdk.disabled=true \
  localstack/polaris:latest
```

Wait for Polaris to become healthy:

```bash
curl -X GET http://localhost:8182/health
```
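
A single `curl` call only reports the state at that moment, so a script usually needs to poll until the endpoint responds. The following is a minimal sketch of such a loop; `wait_until` is a hypothetical helper (not part of LocalStack or Polaris), and the retry count and delay in the usage comment are arbitrary assumptions.

```bash
# wait_until MAX_ATTEMPTS DELAY_SECONDS CMD [ARGS...]
# Re-runs CMD until it exits 0, or fails after MAX_ATTEMPTS tries.
# Hypothetical helper for illustration only.
wait_until() {
  wu_max="$1"
  wu_delay="$2"
  shift 2
  wu_try=1
  until "$@" >/dev/null 2>&1; do
    if [ "$wu_try" -ge "$wu_max" ]; then
      echo "wait_until: gave up after $wu_try attempts" >&2
      return 1
    fi
    wu_try=$((wu_try + 1))
    sleep "$wu_delay"
  done
}

# Example: poll the health endpoint every 2 seconds, at most 30 times.
# The -f flag makes curl exit non-zero on HTTP error responses.
# wait_until 30 2 curl -sf http://localhost:8182/health
```

Passing the probe as a command keeps the helper reusable for other readiness checks in the same script.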

### Authenticate and create Polaris catalog

Set variables and retrieve an access token:

```bash
REALM="default-realm"
CLIENT_ID="root"
CLIENT_SECRET="s3cr3t"
BUCKET_NAME="test-bucket-$(openssl rand -hex 4)"
CATALOG_NAME="polaris"

TOKEN=$(curl -s -X POST http://localhost:8181/api/catalog/v1/oauth/tokens \
  -H "Polaris-Realm: $REALM" \
  -d "grant_type=client_credentials&client_id=$CLIENT_ID&client_secret=$CLIENT_SECRET&scope=PRINCIPAL_ROLE:ALL" | jq -r '.access_token')
```

The `TOKEN` variable will contain the access token.
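
If the request fails, `TOKEN` can end up empty, or it can hold the literal string `null`, which is what `jq -r` prints when the `access_token` key is missing. A small guard, sketched below, can catch both cases before later requests fail with less obvious errors; `require_token` is a hypothetical helper name.

```bash
# require_token VALUE: succeed only if VALUE looks like a usable token.
# Rejects the empty string and the literal "null" printed by jq -r
# when the access_token key is absent. Hypothetical helper.
require_token() {
  if [ -z "$1" ] || [ "$1" = "null" ]; then
    echo "No access token received; check the realm and client credentials." >&2
    return 1
  fi
}

# Example: stop the script early if authentication did not work.
# require_token "$TOKEN" || exit 1
```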

Create a catalog:

```bash
curl -s -X POST http://localhost:8181/api/management/v1/catalogs \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "catalog": {
      "name": "'"$CATALOG_NAME"'",
      "type": "INTERNAL",
      "properties": {
        "default-base-location": "s3://'"$BUCKET_NAME"'/test"
      },
      "storageConfigInfo": {
        "storageType": "S3_COMPATIBLE",
        "allowedLocations": ["s3://'"$BUCKET_NAME"'/"],
        "s3.roleArn": "arn:aws:iam::000000000000:role/'"$BUCKET_NAME"'",
        "region": "us-east-1",
        "s3.pathStyleAccess": true,
        "s3.endpoint": "http://localhost:4566"
      }
    }
  }'
```
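
The request body above splices shell variables into a single-quoted string with `'"$VAR"'`, which is easy to get wrong when editing. One possible alternative, sketched here, builds the same JSON with a heredoc; `catalog_payload` is a hypothetical helper, and the field values simply mirror the request above.

```bash
# catalog_payload NAME BUCKET: print the JSON body for the catalog
# creation request. Hypothetical helper; values mirror the request above.
catalog_payload() {
  cat <<EOF
{
  "catalog": {
    "name": "$1",
    "type": "INTERNAL",
    "properties": {
      "default-base-location": "s3://$2/test"
    },
    "storageConfigInfo": {
      "storageType": "S3_COMPATIBLE",
      "allowedLocations": ["s3://$2/"],
      "s3.roleArn": "arn:aws:iam::000000000000:role/$2",
      "region": "us-east-1",
      "s3.pathStyleAccess": true,
      "s3.endpoint": "http://localhost:4566"
    }
  }
}
EOF
}

# Example: pipe the payload to curl, which reads the body from stdin via -d @-.
# catalog_payload "$CATALOG_NAME" "$BUCKET_NAME" |
#   curl -s -X POST http://localhost:8181/api/management/v1/catalogs \
#     -H "Authorization: Bearer $TOKEN" \
#     -H "Content-Type: application/json" \
#     -d @-
```

Printing the payload on its own first also makes it easy to inspect the JSON before sending it.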

Grant the `TABLE_WRITE_DATA` privilege to the `catalog_admin` role of the catalog:

```bash
curl -s -X PUT "http://localhost:8181/api/management/v1/catalogs/$CATALOG_NAME/catalog-roles/catalog_admin/grants" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"type": "catalog", "privilege": "TABLE_WRITE_DATA"}'
```

### Create a bucket

Create the S3 bucket referenced by the catalog using the `awslocal` command:

```bash
awslocal s3 mb "s3://$BUCKET_NAME"
```

### Create an external volume

In your SQL client, create an external volume using the `CREATE EXTERNAL VOLUME` statement. Replace `test-bucket` in `STORAGE_BASE_URL` and `STORAGE_AWS_ROLE_ARN` with the value of `$BUCKET_NAME`, so that the volume points at the bucket created in the previous step:

```sql
CREATE EXTERNAL VOLUME polaris_volume
  STORAGE_LOCATIONS = (
    (
      NAME = aws_s3_test
      STORAGE_PROVIDER = S3
      STORAGE_BASE_URL = 's3://test-bucket/'
      STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::000000000000:role/test-bucket'
      ENCRYPTION = (TYPE = AWS_SSE_S3)
    )
  )
  ALLOW_WRITES = TRUE;
```
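
The bucket created earlier has a randomly generated name stored in `$BUCKET_NAME`, while the statement above hard-codes `test-bucket`. A minimal sketch for generating the statement with the actual bucket name follows; `external_volume_sql` and the output file name are hypothetical.

```bash
# external_volume_sql BUCKET: print a CREATE EXTERNAL VOLUME statement
# that points at BUCKET. Hypothetical helper for illustration.
external_volume_sql() {
  cat <<EOF
CREATE EXTERNAL VOLUME polaris_volume
  STORAGE_LOCATIONS = (
    (
      NAME = aws_s3_test
      STORAGE_PROVIDER = S3
      STORAGE_BASE_URL = 's3://$1/'
      STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::000000000000:role/$1'
      ENCRYPTION = (TYPE = AWS_SSE_S3)
    )
  )
  ALLOW_WRITES = TRUE;
EOF
}

# Example: write the statement to a file and paste it into your SQL client.
# external_volume_sql "$BUCKET_NAME" > create_volume.sql
```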

### Create catalog integration

Create a catalog integration using the `CREATE CATALOG INTEGRATION` statement:

```sql
CREATE CATALOG INTEGRATION polaris_catalog
  CATALOG_SOURCE = ICEBERG_REST
  TABLE_FORMAT = ICEBERG
  CATALOG_NAMESPACE = 'test_namespace'
  REST_CONFIG = (
    CATALOG_URI = 'http://localhost:8181',
    CATALOG_NAME = 'polaris'
  )
  REST_AUTHENTICATION = (
    TYPE = OAUTH,
    OAUTH_CLIENT_ID = 'root',
    OAUTH_CLIENT_SECRET = 's3cr3t',
    OAUTH_ALLOWED_SCOPES = (PRINCIPAL_ROLE:ALL)
  )
  ENABLED = TRUE
  REFRESH_INTERVAL_SECONDS = 60
  COMMENT = 'Polaris catalog integration';
```

### Create and query an Iceberg table

Now create the table using the Polaris catalog integration and the external volume:

```sql
CREATE ICEBERG TABLE polaris_iceberg_table (c1 TEXT)
  CATALOG = 'polaris_catalog',
  EXTERNAL_VOLUME = 'polaris_volume',
  BASE_LOCATION = 'test/test_namespace';
```

Insert and query data:

```sql
INSERT INTO polaris_iceberg_table(c1) VALUES ('test'), ('polaris'), ('iceberg');

SELECT * FROM polaris_iceberg_table;
```

The output should contain the three inserted rows (the row order may vary):

```sql
+----------+
| c1       |
|----------|
| iceberg  |
| polaris  |
| test     |
+----------+
```

The table data is persisted in the S3 bucket. List it with:

```bash
awslocal s3 ls s3://$BUCKET_NAME/test/test_namespace/
```

You will see:

- `data/` with `.parquet` files
- `metadata/` with Iceberg metadata files

## Configuration options

The following configuration options are available for the Polaris Catalog Docker image provided by LocalStack:

| Environment Variable | Description | Default Value | Required |
|---------------------|-------------|---------------|----------|
| `AWS_REGION` | The AWS region to use | `us-east-1` | Yes |
| `AWS_ACCESS_KEY_ID` | AWS access key ID for accessing AWS services | - | Yes when using AWS services |
| `AWS_SECRET_ACCESS_KEY` | AWS secret access key for accessing AWS services | - | Yes when using AWS services |
| `AWS_ENDPOINT_URL` | Custom endpoint URL for AWS services (e.g., for LocalStack) | - | No |
| `POLARIS_BOOTSTRAP_CREDENTIALS` | Initial realm, username, and password in format: `realm,username,password` | - | Yes |
| `polaris.realm-context.realms` | List of realms to create/use | - | Yes |
| `quarkus.otel.sdk.disabled` | Disable OpenTelemetry SDK | `false` | No |
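
Because `POLARIS_BOOTSTRAP_CREDENTIALS` packs three values into one comma-separated string, a stray comma or a missing field is easy to overlook. The sketch below assembles the value from its parts; `bootstrap_credentials` is a hypothetical helper, and rejecting commas inside a field is an assumption based on the `realm,username,password` format rather than documented behavior.

```bash
# bootstrap_credentials REALM USER PASSWORD: print the value in
# realm,username,password form. Rejects empty fields and fields that
# contain a comma (assumed to be unsupported by the format).
bootstrap_credentials() {
  for bc_field in "$1" "$2" "$3"; do
    case "$bc_field" in
      "" | *,*)
        echo "bootstrap_credentials: fields must be non-empty and comma-free" >&2
        return 1
        ;;
    esac
  done
  printf '%s,%s,%s\n' "$1" "$2" "$3"
}

# Example: reuse the same realm name for polaris.realm-context.realms.
# REALM="default-realm"
# docker run ... \
#   -e POLARIS_BOOTSTRAP_CREDENTIALS="$(bootstrap_credentials "$REALM" root s3cr3t)" \
#   -e polaris.realm-context.realms="$REALM" ...
```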

The following logging options are available for the Polaris Catalog Docker image:

| Logging Option | Description |
|----------------|-------------|
| `quarkus.log.level` | Sets the overall logging level (e.g., DEBUG) |
| `quarkus.log.console.level` | Sets the console logging level (e.g., DEBUG) |
| `quarkus.log.category."org.apache.polaris".level` | Sets the logging level specifically for the Polaris components |
| `quarkus.log.category."org.apache.polaris".min-level` | Sets the minimum logging level for the Polaris components (e.g., TRACE) |
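
As the number of options grows, an env file can be easier to maintain than a long list of `-e` flags. The sketch below assumes that the logging options are passed to the container as environment variables, in the same way as `quarkus.otel.sdk.disabled` in the `docker run` command of this guide; the file name `polaris.env` is arbitrary, and only the two unquoted logging options are shown.

```bash
# Write the container settings to an env file, one NAME=value per line.
# The heredoc delimiter is quoted so nothing is expanded by the shell.
cat > polaris.env <<'EOF'
AWS_REGION=us-east-1
AWS_ACCESS_KEY_ID=test
AWS_SECRET_ACCESS_KEY=test
AWS_ENDPOINT_URL=http://localhost:4566
POLARIS_BOOTSTRAP_CREDENTIALS=default-realm,root,s3cr3t
polaris.realm-context.realms=default-realm
quarkus.otel.sdk.disabled=true
quarkus.log.level=DEBUG
quarkus.log.console.level=DEBUG
EOF

# Then start the container with the env file instead of -e flags:
# docker run -d --name polaris-test \
#   -p 8181:8181 -p 8182:8182 \
#   --env-file polaris.env \
#   localstack/polaris:latest
```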
