docs/docs/api.md: 7 additions & 15 deletions
@@ -1,13 +1,5 @@
 ---
 title: "Java API"
-url: api
-aliases:
-    - "java/api"
-menu:
-    main:
-        parent: "API"
-        identifier: java_api
-        weight: 200
 ---
 <!--
  - Licensed to the Apache Software Foundation (ASF) under one or more
@@ -36,11 +28,11 @@ Table metadata and operations are accessed through the `Table` interface. This i
 
 ### Table metadata
 
-The [`Table` interface](../../../javadoc/{{% icebergVersion %}}/index.html?org/apache/iceberg/Table.html) provides access to the table metadata:
+The [`Table` interface](../../javadoc/{{ icebergVersion }}/index.html?org/apache/iceberg/Table.html) provides access to the table metadata:
 
-* `schema` returns the current table [schema](../schemas)
+* `schema` returns the current table [schema](schemas.md)
 * `spec` returns the current table partition spec
-* `properties` returns a map of key-value [properties](../configuration)
+* `properties` returns a map of key-value [properties](configuration.md)
 * `currentSnapshot` returns the current table snapshot
 * `snapshots` returns all valid snapshots for the table
 * `snapshot(id)` returns a specific snapshot by ID
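
To make the list above concrete, here is a minimal Java sketch of reading table metadata; it assumes a `Table` instance has already been loaded from a catalog (for example via `Catalog#loadTable`), and what gets printed is purely illustrative:

```java
import java.util.Map;
import org.apache.iceberg.PartitionSpec;
import org.apache.iceberg.Schema;
import org.apache.iceberg.Snapshot;
import org.apache.iceberg.Table;

// `table` is assumed to have been loaded from a catalog, e.g. catalog.loadTable(identifier)
Schema schema = table.schema();                  // current schema
PartitionSpec spec = table.spec();               // current partition spec
Map<String, String> props = table.properties();  // key-value table properties
Snapshot current = table.currentSnapshot();      // may be null for an empty table

for (Snapshot snap : table.snapshots()) {        // all valid snapshots for the table
  System.out.println(snap.snapshotId());
}
```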
@@ -108,7 +100,7 @@ where `Record` is Iceberg record for iceberg-data module `org.apache.iceberg.dat
 
 ### Update operations
 
-`Table` also exposes operations that update the table. These operations use a builder pattern, [`PendingUpdate`](../../../javadoc/{{% icebergVersion %}}/index.html?org/apache/iceberg/PendingUpdate.html), that commits when `PendingUpdate#commit` is called.
+`Table` also exposes operations that update the table. These operations use a builder pattern, [`PendingUpdate`](../../javadoc/{{ icebergVersion }}/index.html?org/apache/iceberg/PendingUpdate.html), that commits when `PendingUpdate#commit` is called.
 
 For example, updating the table schema is done by calling `updateSchema`, adding updates to the builder, and finally calling `commit` to commit the pending changes to the table:
 
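
A short sketch of the builder pattern described above; the column name, type, and property value are illustrative, and `table` is assumed to be an already-loaded `Table`:

```java
import org.apache.iceberg.Table;
import org.apache.iceberg.types.Types;

// Each change is staged on the builder and becomes visible only when commit() is called
table.updateSchema()
    .addColumn("count", Types.LongType.get())
    .commit();

// Other PendingUpdate builders follow the same pattern, e.g. table properties
table.updateProperties()
    .set("commit.retry.num-retries", "4")
    .commit();
```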
@@ -150,7 +142,7 @@ t.commitTransaction();
 
 ## Types
 
-Iceberg data types are located in the [`org.apache.iceberg.types` package](../../../javadoc/{{% icebergVersion %}}/index.html?org/apache/iceberg/types/package-summary.html).
+Iceberg data types are located in the [`org.apache.iceberg.types` package](../../javadoc/{{ icebergVersion }}/index.html?org/apache/iceberg/types/package-summary.html).
 Structs, maps, and lists are created using factory methods in type classes.
@@ ... @@
 
-Like struct fields, map keys or values and list elements are tracked as nested fields. Nested fields track [field IDs](../evolution#correctness) and nullability.
+Like struct fields, map keys or values and list elements are tracked as nested fields. Nested fields track [field IDs](evolution.md#correctness) and nullability.
 
 Struct fields are created using `NestedField.optional` or `NestedField.required`. Map value and list element nullability is set in the map and list factory methods.
 
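
A small sketch of the factory methods named above; the field IDs and field names are illustrative only:

```java
import org.apache.iceberg.types.Types;
import org.apache.iceberg.types.Types.NestedField;

// Struct fields carry explicit field IDs and nullability
Types.StructType struct = Types.StructType.of(
    NestedField.required(1, "id", Types.LongType.get()),
    NestedField.optional(2, "data", Types.StringType.get()));

// Map keys/values and list elements are nested fields with their own IDs
Types.MapType map = Types.MapType.ofOptional(3, 4, Types.StringType.get(), Types.DoubleType.get());
Types.ListType list = Types.ListType.ofRequired(5, Types.IntegerType.get());
```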
@@ -193,7 +185,7 @@ ListType list = ListType.ofRequired(1, IntegerType.get());
 
 ## Expressions
 
-Iceberg's expressions are used to configure table scans. To create expressions, use the factory methods in [`Expressions`](../../../javadoc/{{% icebergVersion %}}/index.html?org/apache/iceberg/expressions/Expressions.html).
+Iceberg's expressions are used to configure table scans. To create expressions, use the factory methods in [`Expressions`](../../javadoc/{{ icebergVersion }}/index.html?org/apache/iceberg/expressions/Expressions.html).
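
For illustration, a sketch of building a scan filter with these factory methods; the column names are assumptions, not taken from any table above:

```java
import org.apache.iceberg.TableScan;
import org.apache.iceberg.expressions.Expression;
import org.apache.iceberg.expressions.Expressions;

Expression filter = Expressions.and(
    Expressions.equal("event_type", "click"),
    Expressions.greaterThanOrEqual("event_ts", "2023-01-01T00:00:00"));

// `table` is assumed to be an already-loaded Table; the filter configures the scan
TableScan scan = table.newScan().filter(filter);
```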
@@ -142,7 +138,7 @@ an Iceberg table is stored as a [Glue Table](https://docs.aws.amazon.com/glue/la
 and every Iceberg table version is stored as a [Glue TableVersion](https://docs.aws.amazon.com/glue/latest/dg/aws-glue-api-catalog-tables.html#aws-glue-api-catalog-tables-TableVersion).
 You can start using Glue catalog by specifying the `catalog-impl` as `org.apache.iceberg.aws.glue.GlueCatalog`,
 just like what is shown in the [enabling AWS integration](#enabling-aws-integration) section above.
-More details about loading the catalog can be found in individual engine pages, such as [Spark](../spark-configuration/#loading-a-custom-catalog) and [Flink](../flink/#creating-catalogs-and-using-catalogs).
+More details about loading the catalog can be found in individual engine pages, such as [Spark](spark-configuration.md#loading-a-custom-catalog) and [Flink](flink.md#creating-catalogs-and-using-catalogs).
 
 #### Glue Catalog ID
 
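
Beyond the engine pages referenced above, the catalog can also be constructed directly from the Java API. This is a minimal sketch, assuming the AWS bundle is on the classpath; the warehouse path and catalog name are placeholders:

```java
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.conf.Configuration;
import org.apache.iceberg.CatalogUtil;
import org.apache.iceberg.catalog.Catalog;

Map<String, String> properties = new HashMap<>();
properties.put("warehouse", "s3://my-bucket/my/key/prefix");      // placeholder warehouse path
properties.put("io-impl", "org.apache.iceberg.aws.s3.S3FileIO");  // store data files in S3

// Loads org.apache.iceberg.aws.glue.GlueCatalog reflectively and calls initialize()
Catalog glue = CatalogUtil.loadCatalog(
    "org.apache.iceberg.aws.glue.GlueCatalog", "glue_catalog", properties, new Configuration());
```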
@@ -181,17 +177,17 @@ If there is no commit conflict, the operation will be retried.
 Optimistic locking guarantees atomic transaction of Iceberg tables in Glue.
 It also prevents others from accidentally overwriting your changes.
 
-{{< hint info >}}
-Please use AWS SDK version >= 2.17.131 to leverage Glue's Optimistic Locking.
-If the AWS SDK version is below 2.17.131, only in-memory lock is used. To ensure atomic transaction, you need to set up a [DynamoDb Lock Manager](#dynamodb-lock-manager).
-{{< /hint >}}
+!!! info
+    Please use AWS SDK version >= 2.17.131 to leverage Glue's Optimistic Locking.
+    If the AWS SDK version is below 2.17.131, only in-memory lock is used. To ensure atomic transaction, you need to set up a [DynamoDb Lock Manager](#dynamodb-lock-manager).
+
 
 #### Warehouse Location
 
 Similar to all other catalog implementations, `warehouse` is a required catalog property to determine the root path of the data warehouse in storage.
 By default, Glue only allows a warehouse location in S3 because of the use of `S3FileIO`.
 To store data in a different local or cloud store, Glue catalog can switch to use `HadoopFileIO` or any custom FileIO by setting the `io-impl` catalog property.
-Details about this feature can be found in the [custom FileIO](../custom-catalog/#custom-file-io-implementation) section.
+Details about this feature can be found in the [custom FileIO](custom-catalog.md#custom-file-io-implementation) section.
 
 #### Table Location
 
@@ -267,7 +263,7 @@ This design has the following benefits:
 
 Iceberg also supports the JDBC catalog which uses a table in a relational database to manage Iceberg tables.
 You can configure to use the JDBC catalog with relational database services like [AWS RDS](https://aws.amazon.com/rds).
-Read [the JDBC integration page](../jdbc/#jdbc-catalog) for guides and examples about using the JDBC catalog.
+Read [the JDBC integration page](jdbc.md#jdbc-catalog) for guides and examples about using the JDBC catalog.
 Read [this AWS documentation](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/UsingWithRDS.IAMDBAuth.Connecting.Java.html) for more details about configuring the JDBC catalog with IAM authentication.
 
 ### Which catalog to choose?
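
As a hedged sketch of pointing the JDBC catalog at an RDS instance: the endpoint, credentials, and warehouse path below are placeholders, and the `jdbc.*` properties are assumed to be passed through to the JDBC driver:

```java
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.conf.Configuration;
import org.apache.iceberg.CatalogUtil;
import org.apache.iceberg.catalog.Catalog;

Map<String, String> properties = new HashMap<>();
properties.put("warehouse", "s3://my-bucket/warehouse");                     // placeholder
properties.put("uri", "jdbc:postgresql://my-rds-endpoint:5432/iceberg_db");  // placeholder
properties.put("jdbc.user", "iceberg_user");                                 // placeholder
properties.put("jdbc.password", "iceberg_password");                         // placeholder

Catalog jdbc = CatalogUtil.loadCatalog(
    "org.apache.iceberg.jdbc.JdbcCatalog", "jdbc_catalog", properties, new Configuration());
```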
@@ -293,7 +289,7 @@ This feature requires the following lock related catalog properties:
 2. Set `lock.table` as the DynamoDB table name you would like to use. If the lock table with the given name does not exist in DynamoDB, a new table is created with billing mode set as [pay-per-request](https://aws.amazon.com/blogs/aws/amazon-dynamodb-on-demand-no-capacity-planning-and-pay-per-request-pricing).
 
 Other lock related catalog properties can also be used to adjust locking behaviors such as heartbeat interval.
-For more details, please refer to [Lock catalog properties](../configuration/#lock-catalog-properties).
+For more details, please refer to [Lock catalog properties](configuration.md#lock-catalog-properties).
 
 
 ## S3 FileIO
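
For illustration, the lock-related settings can be added to the same catalog property map; the lock manager class name and table name below are assumptions based on the description above:

```java
import java.util.HashMap;
import java.util.Map;

Map<String, String> properties = new HashMap<>();
// 1. point lock-impl at the DynamoDB lock manager (assumed class name)
properties.put("lock-impl", "org.apache.iceberg.aws.dynamodb.DynamoDbLockManager");
// 2. name of the DynamoDB table to use; created on demand if it does not exist
properties.put("lock.table", "myIcebergLockTable");
```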
@@ -347,7 +343,7 @@ Iceberg by default uses the Hive storage layout but can be switched to use the `
 With `ObjectStoreLocationProvider`, a deterministic hash is generated for each stored file, with the hash appended
 directly after the `write.data.path`. This ensures files written to s3 are equally distributed across multiple [prefixes](https://aws.amazon.com/premiumsupport/knowledge-center/s3-object-key-naming-pattern/) in the S3 bucket. Resulting in minimized throttling and maximized throughput for S3-related IO operations. When using `ObjectStoreLocationProvider` having a shared and short `write.data.path` across your Iceberg tables will improve performance.
 
-For more information on how S3 scales API QPS, check out the 2018 re:Invent session on [Best Practices for Amazon S3 and Amazon S3 Glacier](https://youtu.be/rHeTn9pHNKo?t=3219). At [53:39](https://youtu.be/rHeTn9pHNKo?t=3219) it covers how S3 scales/partitions & at [54:50](https://youtu.be/rHeTn9pHNKo?t=3290) it discusses the 30-60 minute wait time before new partitions are created.
+For more information on how S3 scales API QPS, check out the 2018 re:Invent session on [Best Practices for Amazon S3 and Amazon S3 Glacier](https://youtu.be/rHeTn9pHNKo?t=3219). At [53:39](https://youtu.be/rHeTn9pHNKo?t=3219) it covers how S3 scales/partitions & at [54:50](https://youtu.be/rHeTn9pHNKo?t=3290) it discusses the 30-60 minute wait time before new partitions are created.
 
 To use the `ObjectStorageLocationProvider` add `'write.object-storage.enabled'=true` in the table's properties.
 Below is an example Spark SQL command to create a table using the `ObjectStorageLocationProvider`:
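
For an existing table, the same property can also be set through the Java API; a small sketch, assuming the `table` handle is already loaded:

```java
import org.apache.iceberg.Table;

// Enables ObjectStoreLocationProvider for newly written data files
table.updateProperties()
    .set("write.object-storage.enabled", "true")
    .commit();
```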
@@ -378,7 +374,7 @@ However, for the older versions up to 0.12.0, the logic is as follows:
 - before 0.12.0, `write.object-storage.path` must be set.
 - at 0.12.0, `write.object-storage.path` then `write.folder-storage.path` then `<tableLocation>/data`.
 
-For more details, please refer to the [LocationProvider Configuration](../custom-catalog/#custom-location-provider-implementation) section.
+For more details, please refer to the [LocationProvider Configuration](custom-catalog.md#custom-location-provider-implementation) section.
 
 ### S3 Strong Consistency
 
@@ -539,7 +535,7 @@ The Glue, S3 and DynamoDB clients are then initialized with the assume-role cred
 Here is an example to start Spark shell with this client factory:
@@ -618,13 +614,14 @@ For versions before 6.5.0, you can use a [bootstrap action](https://docs.aws.ama
 ```sh
 #!/bin/bash
 
-ICEBERG_VERSION={{% icebergVersion %}}
+ICEBERG_VERSION={{ icebergVersion }}
 MAVEN_URL=https://repo1.maven.org/maven2
 ICEBERG_MAVEN_URL=$MAVEN_URL/org/apache/iceberg
 # NOTE: this is just an example shared class path between Spark and Flink,
 # please choose a proper class path for production.
 LIB_PATH=/usr/share/aws/aws-java-sdk/
 
+
 ICEBERG_PACKAGES=(
   "iceberg-spark-runtime-3.3_2.12"
   "iceberg-flink-runtime"
@@ -655,7 +652,7 @@ More details could be found [here](https://docs.aws.amazon.com/glue/latest/dg/aw
 ### AWS EKS
 
 [AWS Elastic Kubernetes Service (EKS)](https://aws.amazon.com/eks/) can be used to start any Spark, Flink, Hive, Presto or Trino clusters to work with Iceberg.
-Search the [Iceberg blogs](../../../blogs) page for tutorials around running Iceberg with Docker and Kubernetes.
+Search the [Iceberg blogs](../../blogs.md) page for tutorials around running Iceberg with Docker and Kubernetes.
docs/docs/branching.md: 10 additions & 18 deletions
@@ -1,13 +1,5 @@
 ---
 title: "Branching and Tagging"
-url: branching
-aliases:
-    - "tables/branching"
-menu:
-    main:
-        parent: Tables
-        identifier: tables_branching
-        weight: 0
 ---
 
 <!--
@@ -33,14 +25,14 @@ menu:
 
 Iceberg table metadata maintains a snapshot log, which represents the changes applied to a table.
 Snapshots are fundamental in Iceberg as they are the basis for reader isolation and time travel queries.
-For controlling metadata size and storage costs, Iceberg provides snapshot lifecycle management procedures such as [`expire_snapshots`](../spark-procedures/#expire-snapshots) for removing unused snapshots and no longer necessary data files based on table snapshot retention properties.
+For controlling metadata size and storage costs, Iceberg provides snapshot lifecycle management procedures such as [`expire_snapshots`](spark-procedures.md#expire-snapshots) for removing unused snapshots and no longer necessary data files based on table snapshot retention properties.
 
 **For more sophisticated snapshot lifecycle management, Iceberg supports branches and tags which are named references to snapshots with their own independent lifecycles. This lifecycle is controlled by branch and tag level retention policies.**
 Branches are independent lineages of snapshots and point to the head of the lineage.
 Branches and tags have a maximum reference age property which control when the reference to the snapshot itself should be expired.
 Branches have retention properties which define the minimum number of snapshots to retain on a branch as well as the maximum age of individual snapshots to retain on the branch.
 These properties are used when the expireSnapshots procedure is run.
-For details on the algorithm for expireSnapshots, refer to the [spec](../../../spec#snapshot-retention-policy).
+For details on the algorithm for expireSnapshots, refer to the [spec](../../spec.md#snapshot-retention-policy).
 
 ## Use Cases
 
@@ -52,7 +44,7 @@ See below for some examples of how branching and tagging can facilitate these us
 
 Tags can be used for retaining important historical snapshots for auditing purposes.
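
A hedged Java sketch of the branch and tag retention controls described above; the reference names, retention values, and the assumption that `table` already has a current snapshot are all illustrative:

```java
import org.apache.iceberg.Table;

long snapshotId = table.currentSnapshot().snapshotId();

// Tag a snapshot for audit purposes and keep the tag reference for roughly one year
table.manageSnapshots()
    .createTag("EOY-2023", snapshotId)
    .setMaxRefAgeMs("EOY-2023", 365L * 24 * 60 * 60 * 1000)
    .commit();

// Create a branch and require at least 10 snapshots to be retained on it
table.manageSnapshots()
    .createBranch("audit-branch", snapshotId)
    .setMinSnapshotsToKeep("audit-branch", 10)
    .commit();
```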
docs/docs/configuration.md: 3 additions & 11 deletions
@@ -1,13 +1,5 @@
 ---
 title: "Configuration"
-url: configuration
-aliases:
-    - "tables/configuration"
-menu:
-    main:
-        parent: Tables
-        identifier: tables_configuration
-        weight: 0
 ---
 <!--
  - Licensed to the Apache Software Foundation (ASF) under one or more
@@ -144,8 +136,8 @@ Iceberg catalogs support using catalog properties to configure catalog behaviors
 `HadoopCatalog` and `HiveCatalog` can access the properties in their constructors.
 Any other custom catalog can access the properties by implementing `Catalog.initialize(catalogName, catalogProperties)`.
 The properties can be manually constructed or passed in from a compute engine like Spark or Flink.
-Spark uses its session properties as catalog properties, see more details in the [Spark configuration](../spark-configuration#catalog-configuration) section.
-Flink passes in catalog properties through `CREATE CATALOG` statement, see more details in the [Flink](../flink/#creating-catalogs-and-using-catalogs) section.
+Spark uses its session properties as catalog properties, see more details in the [Spark configuration](spark-configuration.md#catalog-configuration) section.
+Flink passes in catalog properties through `CREATE CATALOG` statement, see more details in the [Flink](flink.md#adding-catalogs) section.
 
 ### Lock catalog properties
 
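
A minimal sketch of constructing catalog properties by hand and passing them to a catalog through `initialize`, as described above; `HadoopCatalog` and the warehouse path are used purely as an example:

```java
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.conf.Configuration;
import org.apache.iceberg.hadoop.HadoopCatalog;

Map<String, String> properties = new HashMap<>();
properties.put("warehouse", "hdfs://nn:8020/warehouse/path");  // placeholder warehouse location

HadoopCatalog catalog = new HadoopCatalog();
catalog.setConf(new Configuration());              // Hadoop configuration for the file system
catalog.initialize("hadoop_catalog", properties);  // catalog name and properties
```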
@@ -154,7 +146,7 @@ Here are the catalog properties related to locking. They are used by some catalo