docs/docs/spark-procedures.md
Lines changed: 8 additions & 8 deletions
@@ -407,6 +407,7 @@ Iceberg can compact data files in parallel using Spark with the `rewriteDataFile
|`delete-ratio-threshold`| 0.3 | Minimum deletion ratio that needs to be associated with a data file for it to be considered for rewriting |
|`output-spec-id`| current partition spec id | Identifier of the output partition spec. Data will be reorganized during the rewrite to align with the output partitioning. |
|`remove-dangling-deletes`| false | Remove dangling position and equality deletes after rewriting. A delete file is considered dangling if it does not apply to any live data files. Enabling this will generate an additional commit for the removal. |
+|`max-files-to-rewrite`| null | This option sets an upper limit on the number of files eligible for rewrite operation. It can be useful for improving job stability, particularly when dealing with a large number of files. If this option is not specified, all files will be considered for rewriting |

!!! info
Dangling delete files are removed based solely on data sequence numbers. This action does not apply to global
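The `max-files-to-rewrite` row added in this hunk caps how many files a single `rewrite_data_files` run will consider. In Spark SQL, procedure options like this are passed through the `options` map, e.g. `options => map('max-files-to-rewrite', '100')` (the value is illustrative). The Python sketch below is a hypothetical model of how such a cap could be applied to already-planned file groups; the function name `cap_file_groups`, the group-level granularity, and the stop-at-first-overflow rule are assumptions for illustration, not Iceberg's actual planner.

```python
from typing import List, Optional, Sequence


def cap_file_groups(
    file_groups: Sequence[Sequence[str]],
    max_files_to_rewrite: Optional[int] = None,
) -> List[List[str]]:
    """Hypothetical sketch: keep whole file groups, in planned order,
    until adding the next group would exceed the file cap.

    Not Iceberg's implementation; it only illustrates an upper limit
    on the number of files eligible for rewriting.
    """
    if max_files_to_rewrite is None:
        # No cap configured: every planned group stays eligible.
        return [list(group) for group in file_groups]
    if max_files_to_rewrite < 0:
        raise ValueError("max_files_to_rewrite must be non-negative")

    selected: List[List[str]] = []
    total = 0
    for group in file_groups:
        # Assumption: stop at the first group that would push the
        # running file count past the cap (no skipping ahead).
        if total + len(group) > max_files_to_rewrite:
            break
        selected.append(list(group))
        total += len(group)
    return selected


if __name__ == "__main__":
    groups = [["f1", "f2", "f3"], ["f4", "f5"], ["f6"]]
    print(cap_file_groups(groups, 5))  # [['f1', 'f2', 'f3'], ['f4', 'f5']]
```

Selecting whole groups keeps each group's bin-packing intact; a real planner may choose differently (for example, skipping an oversized group and trying later ones).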
@@ -533,7 +534,6 @@ Dangling deletes are always filtered out during rewriting.
|`min-input-files`| 5 | Any file group exceeding this number of files will be rewritten regardless of other criteria |
|`rewrite-all`| false | Force rewriting of all provided files overriding other options |
|`max-file-group-size-bytes`| 107374182400 (100GB) | Largest amount of data that should be rewritten in a single file group. The entire rewrite operation is broken down into pieces based on partitioning and within partitions based on size into file-groups. This helps with breaking down the rewriting of very large partitions which may not be rewritable otherwise due to the resource constraints of the cluster. |
-|`max-files-to-rewrite`| null | This option sets an upper limit on the number of files eligible for rewrite operation. It can be useful for improving job stability, particularly when dealing with a large number of files. If this option is not specified, all files will be considered for rewriting |

#### Output
@@ -867,11 +867,11 @@ that provide additional information about the changes being tracked. These colum
Here is an example of corresponding results. It shows that the first snapshot inserted 2 records, and the
With the net changes, the above changelog view only contains the following row since Alice was inserted in the first snapshot and deleted in the second snapshot.
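The net-changes behavior described here can be modeled as cancellation: inserting a row and later deleting the same row leaves no net change. The sketch below is a simplified, hypothetical model; the function `net_changes`, the `(change_type, row)` tuple format, and the sample rows are assumptions for illustration. It does not model update pre/post-images or identifier columns and is not Iceberg's implementation.

```python
from typing import Dict, Hashable, Iterable, List, Tuple

# (change_type, row), where change_type is "INSERT" or "DELETE"
Change = Tuple[str, Hashable]


def net_changes(changes: Iterable[Change]) -> List[Change]:
    """Collapse an ordered changelog into net changes (simplified sketch)."""
    # Running balance per row: +1 per INSERT, -1 per DELETE.
    # Dicts preserve insertion order, so output follows first appearance.
    balance: Dict[Hashable, int] = {}
    for change_type, row in changes:
        if change_type == "INSERT":
            delta = 1
        elif change_type == "DELETE":
            delta = -1
        else:
            raise ValueError(f"unsupported change type: {change_type!r}")
        balance[row] = balance.get(row, 0) + delta

    result: List[Change] = []
    for row, count in balance.items():
        if count == 0:
            # Inserts and deletes cancelled out: nothing to report.
            continue
        kind = "INSERT" if count > 0 else "DELETE"
        result.extend([(kind, row)] * abs(count))
    return result


if __name__ == "__main__":
    # Hypothetical rows: Alice is inserted and then deleted, Bob is only inserted.
    log = [
        ("INSERT", (1, "Alice")),
        ("INSERT", (2, "Bob")),
        ("DELETE", (1, "Alice")),
    ]
    print(net_changes(log))  # [('INSERT', (2, 'Bob'))]
```

With these sample rows, Alice's insert and delete cancel, so only Bob's insert remains, matching the outcome the sentence above describes for Alice.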