Commit d6b12cc

fix_whitespace_formatting
1 parent 2512dfc commit d6b12cc

1 file changed: +8 -8 lines changed

docs/docs/spark-procedures.md

Lines changed: 8 additions & 8 deletions
@@ -407,6 +407,7 @@ Iceberg can compact data files in parallel using Spark with the `rewriteDataFile
 | `delete-ratio-threshold` | 0.3 | Minimum deletion ratio that needs to be associated with a data file for it to be considered for rewriting |
 | `output-spec-id` | current partition spec id | Identifier of the output partition spec. Data will be reorganized during the rewrite to align with the output partitioning. |
 | `remove-dangling-deletes` | false | Remove dangling position and equality deletes after rewriting. A delete file is considered dangling if it does not apply to any live data files. Enabling this will generate an additional commit for the removal. |
+| `max-files-to-rewrite` | null | This option sets an upper limit on the number of files eligible for rewrite operation. It can be useful for improving job stability, particularly when dealing with a large number of files. If this option is not specified, all files will be considered for rewriting |
 
 !!! info
 Dangling delete files are removed based solely on data sequence numbers. This action does not apply to global
@@ -533,7 +534,6 @@ Dangling deletes are always filtered out during rewriting.
 | `min-input-files` | 5 | Any file group exceeding this number of files will be rewritten regardless of other criteria |
 | `rewrite-all` | false | Force rewriting of all provided files overriding other options |
 | `max-file-group-size-bytes` | 107374182400 (100GB) | Largest amount of data that should be rewritten in a single file group. The entire rewrite operation is broken down into pieces based on partitioning and within partitions based on size into file-groups. This helps with breaking down the rewriting of very large partitions which may not be rewritable otherwise due to the resource constraints of the cluster. |
-| `max-files-to-rewrite` | null | This option sets an upper limit on the number of files eligible for rewrite operation. It can be useful for improving job stability, particularly when dealing with a large number of files. If this option is not specified, all files will be considered for rewriting |
 
 #### Output
 
@@ -867,11 +867,11 @@ that provide additional information about the changes being tracked. These colum
 Here is an example of corresponding results. It shows that the first snapshot inserted 2 records, and the
 second snapshot deleted 1 record.
 
-| id | name |_change_type | _change_ordinal | _commit_snapshot_id |
+| id | name |_change_type | _change_ordinal | _commit_snapshot_id |
 |---|--------|---|---|---|
-|1 | Alice |INSERT |0 |5390529835796506035|
-|2 | Bob |INSERT |0 |5390529835796506035|
-|1 | Alice |DELETE |1 |8764748981452218370|
+|1 | Alice |INSERT |0 |5390529835796506035|
+|2 | Bob |INSERT |0 |5390529835796506035|
+|1 | Alice |DELETE |1 |8764748981452218370|
 
 #### Net Changes
 
@@ -887,9 +887,9 @@ CALL spark_catalog.system.create_changelog_view(
 
 With the net changes, the above changelog view only contains the following row since Alice was inserted in the first snapshot and deleted in the second snapshot.
 
-| id | name |_change_type | _change_ordinal | _commit_snapshot_id |
+| id | name |_change_type | _change_ordinal | _commit_snapshot_id |
 |---|--------|---|---|---|
-|2 | Bob |INSERT |0 |5390529835796506035|
+|2 | Bob |INSERT |0 |5390529835796506035|
 
 #### Carry-over Rows
 
@@ -1056,4 +1056,4 @@ metadata files and data files to the target location.
 Lastly, the [register_table](#register_table) procedure can be used to register the copied table in the target location with a catalog.
 
 !!! warning
-Iceberg tables with partition statistics files are not currently supported for path rewrite.
+Iceberg tables with partition statistics files are not currently supported for path rewrite.
