
Commit ed28898

Convert Hugo versioned docs to mkdocs format (#9591)
1 parent 6bbf70a commit ed28898


45 files changed: +432 -638 lines changed

docs/java-api.md renamed to docs/docs/api.md

Lines changed: 7 additions & 15 deletions
@@ -1,13 +1,5 @@
 ---
 title: "Java API"
-url: api
-aliases:
-    - "java/api"
-menu:
-    main:
-        parent: "API"
-        identifier: java_api
-        weight: 200
 ---
 <!--
  - Licensed to the Apache Software Foundation (ASF) under one or more
@@ -36,11 +28,11 @@ Table metadata and operations are accessed through the `Table` interface. This i
 
 ### Table metadata
 
-The [`Table` interface](../../../javadoc/{{% icebergVersion %}}/index.html?org/apache/iceberg/Table.html) provides access to the table metadata:
+The [`Table` interface](../../javadoc/{{ icebergVersion }}/index.html?org/apache/iceberg/Table.html) provides access to the table metadata:
 
-* `schema` returns the current table [schema](../schemas)
+* `schema` returns the current table [schema](schemas.md)
 * `spec` returns the current table partition spec
-* `properties` returns a map of key-value [properties](../configuration)
+* `properties` returns a map of key-value [properties](configuration.md)
 * `currentSnapshot` returns the current table snapshot
 * `snapshots` returns all valid snapshots for the table
 * `snapshot(id)` returns a specific snapshot by ID
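
As a reading aid for this hunk, a minimal sketch of the accessors listed above, assuming a `Table` instance already loaded from a catalog (variable names are illustrative):

```java
import org.apache.iceberg.PartitionSpec;
import org.apache.iceberg.Schema;
import org.apache.iceberg.Snapshot;
import org.apache.iceberg.Table;

// assuming `table` was obtained from a catalog, e.g. catalog.loadTable(identifier)
Schema schema = table.schema();                 // current table schema
PartitionSpec spec = table.spec();              // current partition spec
String format = table.properties()              // key-value table properties
    .getOrDefault("write.format.default", "parquet");
Snapshot current = table.currentSnapshot();     // null for a table with no snapshots
for (Snapshot snap : table.snapshots()) {       // all valid snapshots
  System.out.println(snap.snapshotId() + " committed at " + snap.timestampMillis());
}
```
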
@@ -108,7 +100,7 @@ where `Record` is Iceberg record for iceberg-data module `org.apache.iceberg.dat
 
 ### Update operations
 
-`Table` also exposes operations that update the table. These operations use a builder pattern, [`PendingUpdate`](../../../javadoc/{{% icebergVersion %}}/index.html?org/apache/iceberg/PendingUpdate.html), that commits when `PendingUpdate#commit` is called.
+`Table` also exposes operations that update the table. These operations use a builder pattern, [`PendingUpdate`](../../javadoc/{{ icebergVersion }}/index.html?org/apache/iceberg/PendingUpdate.html), that commits when `PendingUpdate#commit` is called.
 
 For example, updating the table schema is done by calling `updateSchema`, adding updates to the builder, and finally calling `commit` to commit the pending changes to the table:
 
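The code block this sentence introduces sits outside the hunk; a sketch of what such an update typically looks like (the column name is illustrative):

```java
import org.apache.iceberg.types.Types;

// builder pattern: stage changes, then commit them as one atomic metadata update
table.updateSchema()
    .addColumn("count", Types.LongType.get())
    .commit();
```
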
@@ -150,7 +142,7 @@ t.commitTransaction();
 
 ## Types
 
-Iceberg data types are located in the [`org.apache.iceberg.types` package](../../../javadoc/{{% icebergVersion %}}/index.html?org/apache/iceberg/types/package-summary.html).
+Iceberg data types are located in the [`org.apache.iceberg.types` package](../../javadoc/{{ icebergVersion }}/index.html?org/apache/iceberg/types/package-summary.html).
 
 ### Primitives
 
@@ -166,7 +158,7 @@ Types.DecimalType.of(9, 2) // decimal(9, 2)
 
 Structs, maps, and lists are created using factory methods in type classes.
 
-Like struct fields, map keys or values and list elements are tracked as nested fields. Nested fields track [field IDs](../evolution#correctness) and nullability.
+Like struct fields, map keys or values and list elements are tracked as nested fields. Nested fields track [field IDs](evolution.md#correctness) and nullability.
 
 Struct fields are created using `NestedField.optional` or `NestedField.required`. Map value and list element nullability is set in the map and list factory methods.
 
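For reference while reviewing this hunk, a small sketch of those factory methods (field IDs and names are illustrative):

```java
import org.apache.iceberg.types.Types;

// struct with a required id and an optional data field
Types.StructType struct = Types.StructType.of(
    Types.NestedField.required(1, "id", Types.LongType.get()),
    Types.NestedField.optional(2, "data", Types.StringType.get()));

// map<string, string>: key id 3, value id 4, values required
Types.MapType map = Types.MapType.ofRequired(3, 4,
    Types.StringType.get(), Types.StringType.get());

// list<int>: element id 5, elements required
Types.ListType list = Types.ListType.ofRequired(5, Types.IntegerType.get());
```
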
@@ -193,7 +185,7 @@ ListType list = ListType.ofRequired(1, IntegerType.get());
 
 ## Expressions
 
-Iceberg's expressions are used to configure table scans. To create expressions, use the factory methods in [`Expressions`](../../../javadoc/{{% icebergVersion %}}/index.html?org/apache/iceberg/expressions/Expressions.html).
+Iceberg's expressions are used to configure table scans. To create expressions, use the factory methods in [`Expressions`](../../javadoc/{{ icebergVersion }}/index.html?org/apache/iceberg/expressions/Expressions.html).
 
 Supported predicate expressions are:
 
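A minimal sketch of using those factory methods to configure a scan (column names and values are illustrative):

```java
import org.apache.iceberg.TableScan;
import org.apache.iceberg.expressions.Expressions;

// scan only rows matching level = 'ERROR' with a recent event_time
TableScan scan = table.newScan()
    .filter(Expressions.and(
        Expressions.equal("level", "ERROR"),
        Expressions.greaterThanOrEqual("event_time", 1700000000000L)));
```
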

docs/aws.md renamed to docs/docs/aws.md

Lines changed: 18 additions & 21 deletions
@@ -1,11 +1,5 @@
 ---
 title: "AWS"
-url: aws
-menu:
-    main:
-        parent: Integrations
-        identifier: aws_integration
-        weight: 0
 ---
 <!--
  - Licensed to the Apache Software Foundation (ASF) under one or more
@@ -53,7 +47,7 @@ For example, to use AWS features with Spark 3.4 (with scala 2.12) and AWS client
 
 ```sh
 # start Spark SQL client shell
-spark-sql --packages org.apache.iceberg:iceberg-spark-runtime-3.4_2.12:{{% icebergVersion %}},org.apache.iceberg:iceberg-aws-bundle:{{% icebergVersion %}} \
+spark-sql --packages org.apache.iceberg:iceberg-spark-runtime-3.4_2.12:{{ icebergVersion }},org.apache.iceberg:iceberg-aws-bundle:{{ icebergVersion }} \
     --conf spark.sql.defaultCatalog=my_catalog \
     --conf spark.sql.catalog.my_catalog=org.apache.iceberg.spark.SparkCatalog \
     --conf spark.sql.catalog.my_catalog.warehouse=s3://my-bucket/my/key/prefix \
@@ -69,10 +63,12 @@ To use AWS module with Flink, you can download the necessary dependencies and sp
 
 ```sh
 # download Iceberg dependency
-ICEBERG_VERSION={{% icebergVersion %}}
+ICEBERG_VERSION={{ icebergVersion }}
 MAVEN_URL=https://repo1.maven.org/maven2
 ICEBERG_MAVEN_URL=$MAVEN_URL/org/apache/iceberg
+
 wget $ICEBERG_MAVEN_URL/iceberg-flink-runtime/$ICEBERG_VERSION/iceberg-flink-runtime-$ICEBERG_VERSION.jar
+
 wget $ICEBERG_MAVEN_URL/iceberg-aws-bundle/$ICEBERG_VERSION/iceberg-aws-bundle-$ICEBERG_VERSION.jar
 
 # start Flink SQL client shell
@@ -142,7 +138,7 @@ an Iceberg table is stored as a [Glue Table](https://docs.aws.amazon.com/glue/la
 and every Iceberg table version is stored as a [Glue TableVersion](https://docs.aws.amazon.com/glue/latest/dg/aws-glue-api-catalog-tables.html#aws-glue-api-catalog-tables-TableVersion).
 You can start using Glue catalog by specifying the `catalog-impl` as `org.apache.iceberg.aws.glue.GlueCatalog`,
 just like what is shown in the [enabling AWS integration](#enabling-aws-integration) section above.
-More details about loading the catalog can be found in individual engine pages, such as [Spark](../spark-configuration/#loading-a-custom-catalog) and [Flink](../flink/#creating-catalogs-and-using-catalogs).
+More details about loading the catalog can be found in individual engine pages, such as [Spark](spark-configuration.md#loading-a-custom-catalog) and [Flink](flink.md#creating-catalogs-and-using-catalogs).
 
 #### Glue Catalog ID
 
@@ -181,17 +177,17 @@ If there is no commit conflict, the operation will be retried.
 Optimistic locking guarantees atomic transaction of Iceberg tables in Glue.
 It also prevents others from accidentally overwriting your changes.
 
-{{< hint info >}}
-Please use AWS SDK version >= 2.17.131 to leverage Glue's Optimistic Locking.
-If the AWS SDK version is below 2.17.131, only in-memory lock is used. To ensure atomic transaction, you need to set up a [DynamoDb Lock Manager](#dynamodb-lock-manager).
-{{< /hint >}}
+!!! info
+    Please use AWS SDK version >= 2.17.131 to leverage Glue's Optimistic Locking.
+    If the AWS SDK version is below 2.17.131, only in-memory lock is used. To ensure atomic transaction, you need to set up a [DynamoDb Lock Manager](#dynamodb-lock-manager).
+
 
 #### Warehouse Location
 
 Similar to all other catalog implementations, `warehouse` is a required catalog property to determine the root path of the data warehouse in storage.
 By default, Glue only allows a warehouse location in S3 because of the use of `S3FileIO`.
 To store data in a different local or cloud store, Glue catalog can switch to use `HadoopFileIO` or any custom FileIO by setting the `io-impl` catalog property.
-Details about this feature can be found in the [custom FileIO](../custom-catalog/#custom-file-io-implementation) section.
+Details about this feature can be found in the [custom FileIO](custom-catalog.md#custom-file-io-implementation) section.
 
 #### Table Location
 
@@ -267,7 +263,7 @@ This design has the following benefits:
 
 Iceberg also supports the JDBC catalog which uses a table in a relational database to manage Iceberg tables.
 You can configure to use the JDBC catalog with relational database services like [AWS RDS](https://aws.amazon.com/rds).
-Read [the JDBC integration page](../jdbc/#jdbc-catalog) for guides and examples about using the JDBC catalog.
+Read [the JDBC integration page](jdbc.md#jdbc-catalog) for guides and examples about using the JDBC catalog.
 Read [this AWS documentation](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/UsingWithRDS.IAMDBAuth.Connecting.Java.html) for more details about configuring the JDBC catalog with IAM authentication.
 
 ### Which catalog to choose?
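
As context for the hunk above, a hedged sketch of pointing a Spark catalog at the JDBC catalog; the endpoint, credentials and bucket are placeholders, and the JDBC driver jar must be on the classpath:

```sh
spark-sql --conf spark.sql.catalog.my_catalog=org.apache.iceberg.spark.SparkCatalog \
    --conf spark.sql.catalog.my_catalog.catalog-impl=org.apache.iceberg.jdbc.JdbcCatalog \
    --conf spark.sql.catalog.my_catalog.uri=jdbc:postgresql://my-rds-endpoint:5432/iceberg_catalog \
    --conf spark.sql.catalog.my_catalog.jdbc.user=my_user \
    --conf spark.sql.catalog.my_catalog.jdbc.password=my_password \
    --conf spark.sql.catalog.my_catalog.warehouse=s3://my-bucket/my/key/prefix
```
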
@@ -293,7 +289,7 @@ This feature requires the following lock related catalog properties:
 2. Set `lock.table` as the DynamoDB table name you would like to use. If the lock table with the given name does not exist in DynamoDB, a new table is created with billing mode set as [pay-per-request](https://aws.amazon.com/blogs/aws/amazon-dynamodb-on-demand-no-capacity-planning-and-pay-per-request-pricing).
 
 Other lock related catalog properties can also be used to adjust locking behaviors such as heartbeat interval.
-For more details, please refer to [Lock catalog properties](../configuration/#lock-catalog-properties).
+For more details, please refer to [Lock catalog properties](configuration.md#lock-catalog-properties).
 
 
 ## S3 FileIO
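
A hedged sketch of the lock properties referenced in the hunk above, expressed as Spark catalog conf; the lock manager class is the one shipped in the iceberg-aws module, and the table name is a placeholder:

```sh
# appended to a spark-sql invocation like the ones shown earlier on this page
--conf spark.sql.catalog.my_catalog.lock-impl=org.apache.iceberg.aws.dynamodb.DynamoDbLockManager \
--conf spark.sql.catalog.my_catalog.lock.table=myGlueLockTable \
```
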
@@ -347,7 +343,7 @@ Iceberg by default uses the Hive storage layout but can be switched to use the `
 With `ObjectStoreLocationProvider`, a deterministic hash is generated for each stored file, with the hash appended
 directly after the `write.data.path`. This ensures files written to s3 are equally distributed across multiple [prefixes](https://aws.amazon.com/premiumsupport/knowledge-center/s3-object-key-naming-pattern/) in the S3 bucket. Resulting in minimized throttling and maximized throughput for S3-related IO operations. When using `ObjectStoreLocationProvider` having a shared and short `write.data.path` across your Iceberg tables will improve performance.
 
-For more information on how S3 scales API QPS, check out the 2018 re:Invent session on [Best Practices for Amazon S3 and Amazon S3 Glacier]( https://youtu.be/rHeTn9pHNKo?t=3219). At [53:39](https://youtu.be/rHeTn9pHNKo?t=3219) it covers how S3 scales/partitions & at [54:50](https://youtu.be/rHeTn9pHNKo?t=3290) it discusses the 30-60 minute wait time before new partitions are created.
+For more information on how S3 scales API QPS, check out the 2018 re:Invent session on [Best Practices for Amazon S3 and Amazon S3 Glacier](https://youtu.be/rHeTn9pHNKo?t=3219). At [53:39](https://youtu.be/rHeTn9pHNKo?t=3219) it covers how S3 scales/partitions & at [54:50](https://youtu.be/rHeTn9pHNKo?t=3290) it discusses the 30-60 minute wait time before new partitions are created.
 
 To use the `ObjectStorageLocationProvider` add `'write.object-storage.enabled'=true` in the table's properties.
 Below is an example Spark SQL command to create a table using the `ObjectStorageLocationProvider`:
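
That example command lies outside the hunk; a hedged sketch of what it typically looks like (catalog, namespace and bucket paths are placeholders):

```sql
CREATE TABLE my_catalog.my_ns.my_table (
    id       bigint,
    data     string,
    category string)
USING iceberg
PARTITIONED BY (category)
TBLPROPERTIES (
    'write.object-storage.enabled'='true',
    'write.data.path'='s3://my-table-data-bucket');
```
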
@@ -378,7 +374,7 @@ However, for the older versions up to 0.12.0, the logic is as follows:
 - before 0.12.0, `write.object-storage.path` must be set.
 - at 0.12.0, `write.object-storage.path` then `write.folder-storage.path` then `<tableLocation>/data`.
 
-For more details, please refer to the [LocationProvider Configuration](../custom-catalog/#custom-location-provider-implementation) section.
+For more details, please refer to the [LocationProvider Configuration](custom-catalog.md#custom-location-provider-implementation) section.
 
 ### S3 Strong Consistency
 
@@ -539,7 +535,7 @@ The Glue, S3 and DynamoDB clients are then initialized with the assume-role cred
 Here is an example to start Spark shell with this client factory:
 
 ```shell
-spark-sql --packages org.apache.iceberg:iceberg-spark-runtime-3.4_2.12:{{% icebergVersion %}},org.apache.iceberg:iceberg-aws-bundle:{{% icebergVersion %}} \
+spark-sql --packages org.apache.iceberg:iceberg-spark-runtime-3.4_2.12:{{ icebergVersion }},org.apache.iceberg:iceberg-aws-bundle:{{ icebergVersion }} \
     --conf spark.sql.catalog.my_catalog=org.apache.iceberg.spark.SparkCatalog \
     --conf spark.sql.catalog.my_catalog.warehouse=s3://my-bucket/my/key/prefix \
     --conf spark.sql.catalog.my_catalog.catalog-impl=org.apache.iceberg.aws.glue.GlueCatalog \
@@ -618,13 +614,14 @@ For versions before 6.5.0, you can use a [bootstrap action](https://docs.aws.ama
 ```sh
 #!/bin/bash
 
-ICEBERG_VERSION={{% icebergVersion %}}
+ICEBERG_VERSION={{ icebergVersion }}
 MAVEN_URL=https://repo1.maven.org/maven2
 ICEBERG_MAVEN_URL=$MAVEN_URL/org/apache/iceberg
 # NOTE: this is just an example shared class path between Spark and Flink,
 # please choose a proper class path for production.
 LIB_PATH=/usr/share/aws/aws-java-sdk/
 
+
 ICEBERG_PACKAGES=(
   "iceberg-spark-runtime-3.3_2.12"
   "iceberg-flink-runtime"
@@ -655,7 +652,7 @@ More details could be found [here](https://docs.aws.amazon.com/glue/latest/dg/aw
 ### AWS EKS
 
 [AWS Elastic Kubernetes Service (EKS)](https://aws.amazon.com/eks/) can be used to start any Spark, Flink, Hive, Presto or Trino clusters to work with Iceberg.
-Search the [Iceberg blogs](../../../blogs) page for tutorials around running Iceberg with Docker and Kubernetes.
+Search the [Iceberg blogs](../../blogs.md) page for tutorials around running Iceberg with Docker and Kubernetes.
 
 ### Amazon Kinesis
 
docs/branching-and-tagging.md renamed to docs/docs/branching.md

Lines changed: 10 additions & 18 deletions
@@ -1,13 +1,5 @@
 ---
 title: "Branching and Tagging"
-url: branching
-aliases:
-    - "tables/branching"
-menu:
-    main:
-        parent: Tables
-        identifier: tables_branching
-        weight: 0
 ---
 
 <!--
@@ -33,14 +25,14 @@ menu:
 
 Iceberg table metadata maintains a snapshot log, which represents the changes applied to a table.
 Snapshots are fundamental in Iceberg as they are the basis for reader isolation and time travel queries.
-For controlling metadata size and storage costs, Iceberg provides snapshot lifecycle management procedures such as [`expire_snapshots`](../spark-procedures/#expire-snapshots) for removing unused snapshots and no longer necessary data files based on table snapshot retention properties.
+For controlling metadata size and storage costs, Iceberg provides snapshot lifecycle management procedures such as [`expire_snapshots`](spark-procedures.md#expire-snapshots) for removing unused snapshots and no longer necessary data files based on table snapshot retention properties.
 
 **For more sophisticated snapshot lifecycle management, Iceberg supports branches and tags which are named references to snapshots with their own independent lifecycles. This lifecycle is controlled by branch and tag level retention policies.**
 Branches are independent lineages of snapshots and point to the head of the lineage.
 Branches and tags have a maximum reference age property which control when the reference to the snapshot itself should be expired.
 Branches have retention properties which define the minimum number of snapshots to retain on a branch as well as the maximum age of individual snapshots to retain on the branch.
 These properties are used when the expireSnapshots procedure is run.
-For details on the algorithm for expireSnapshots, refer to the [spec](../../../spec#snapshot-retention-policy).
+For details on the algorithm for expireSnapshots, refer to the [spec](../../spec.md#snapshot-retention-policy).
 
 ## Use Cases
 
@@ -52,7 +44,7 @@ See below for some examples of how branching and tagging can facilitate these us
 
 Tags can be used for retaining important historical snapshots for auditing purposes.
 
-![Historical Tags](../img/historical-snapshot-tag.png)
+![Historical Tags](assets/images/historical-snapshot-tag.png)
 
 The above diagram demonstrates retaining important historical snapshot with the following retention policy, defined
 via Spark SQL.
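
The retention policy itself sits outside the hunk; a hedged sketch of such a statement (tag name, version and retention period are illustrative):

```sql
ALTER TABLE prod.db.table CREATE TAG `historical-snapshot` AS OF VERSION 7 RETAIN 365 DAYS;
```
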
@@ -84,7 +76,7 @@ ALTER TABLE prod.db.table CREATE BRANCH `test-branch` RETAIN 7 DAYS WITH SNAPSHO
 
 ### Audit Branch
 
-![Audit Branch](../img/audit-branch.png)
+![Audit Branch](assets/images/audit-branch.png)
 
 The above diagram shows an example of using an audit branch for validating a write workflow.
 
@@ -115,9 +107,9 @@ CALL catalog_name.system.fast_forward('prod.db.table', 'main', 'audit-branch');
 
 Creating, querying and writing to branches and tags are supported in the Iceberg Java library, and in Spark and Flink engine integrations.
 
-- [Iceberg Java Library](../java-api-quickstart/#branching-and-tagging)
-- [Spark DDLs](../spark-ddl/#branching-and-tagging-ddl)
-- [Spark Reads](../spark-queries/#time-travel)
-- [Spark Branch Writes](../spark-writes/#writing-to-branches)
-- [Flink Reads](../flink-queries/#reading-branches-and-tags-with-SQL)
-- [Flink Branch Writes](../flink-writes/#branch-writes)
+- [Iceberg Java Library](java-api-quickstart.md#branching-and-tagging)
+- [Spark DDLs](spark-ddl.md#branching-and-tagging-ddl)
+- [Spark Reads](spark-queries.md#time-travel)
+- [Spark Branch Writes](spark-writes.md#writing-to-branches)
+- [Flink Reads](flink-queries.md#reading-branches-and-tags-with-SQL)
+- [Flink Branch Writes](flink-writes.md#branch-writes)
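
For a quick sense of what querying a named reference looks like in Spark, a hedged sketch (the reference name is illustrative; the Spark Reads page linked above has the full syntax):

```sql
SELECT * FROM prod.db.table VERSION AS OF 'audit-branch';
```
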

docs/configuration.md renamed to docs/docs/configuration.md

Lines changed: 3 additions & 11 deletions
@@ -1,13 +1,5 @@
 ---
 title: "Configuration"
-url: configuration
-aliases:
-    - "tables/configuration"
-menu:
-    main:
-        parent: Tables
-        identifier: tables_configuration
-        weight: 0
 ---
 <!--
  - Licensed to the Apache Software Foundation (ASF) under one or more
@@ -144,8 +136,8 @@ Iceberg catalogs support using catalog properties to configure catalog behaviors
 `HadoopCatalog` and `HiveCatalog` can access the properties in their constructors.
 Any other custom catalog can access the properties by implementing `Catalog.initialize(catalogName, catalogProperties)`.
 The properties can be manually constructed or passed in from a compute engine like Spark or Flink.
-Spark uses its session properties as catalog properties, see more details in the [Spark configuration](../spark-configuration#catalog-configuration) section.
-Flink passes in catalog properties through `CREATE CATALOG` statement, see more details in the [Flink](../flink/#creating-catalogs-and-using-catalogs) section.
+Spark uses its session properties as catalog properties, see more details in the [Spark configuration](spark-configuration.md#catalog-configuration) section.
+Flink passes in catalog properties through `CREATE CATALOG` statement, see more details in the [Flink](flink.md#adding-catalogs) section.
 
 ### Lock catalog properties
 
@@ -154,7 +146,7 @@ Here are the catalog properties related to locking. They are used by some catalo
 | Property                          | Default            | Description                                            |
 | --------------------------------- | ------------------ | ------------------------------------------------------ |
 | lock-impl                         | null               | a custom implementation of the lock manager, the actual interface depends on the catalog used |
-| lock.table                        | null               | an auxiliary table for locking, such as in [AWS DynamoDB lock manager](../aws/#dynamodb-for-commit-locking) |
+| lock.table                        | null               | an auxiliary table for locking, such as in [AWS DynamoDB lock manager](aws.md#dynamodb-lock-manager) |
 | lock.acquire-interval-ms          | 5000 (5 s)         | the interval to wait between each attempt to acquire a lock |
 | lock.acquire-timeout-ms           | 180000 (3 min)     | the maximum time to try acquiring a lock |
 | lock.heartbeat-interval-ms        | 3000 (3 s)         | the interval to wait between each heartbeat after acquiring a lock |
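
Circling back to the `CREATE CATALOG` route mentioned in the earlier hunk of this file, a hedged Flink SQL sketch (the metastore URI and warehouse path are placeholders):

```sql
CREATE CATALOG my_catalog WITH (
    'type'='iceberg',
    'catalog-type'='hive',
    'uri'='thrift://metastore-host:9083',
    'warehouse'='s3://my-bucket/my/key/prefix'
);
```
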
