Commit b189f48

repack: add --path-walk option
Since 'git pack-objects' supports a --path-walk option, allow passing it
through in 'git repack'. This presents interesting testing opportunities
for comparing the different repacking strategies against each other.

In my copy of the Git repository, the new tests in p5313 show these
results:

Test                                    this tree
-------------------------------------------------------------
5313.10: repack                         27.88(150.23+2.70)
5313.11: repack size                    228.2M
5313.12: repack with --path-walk        134.59(148.77+0.81)
5313.13: repack size with --path-walk   209.7M

Note that the 'git pack-objects --path-walk' feature is not integrated
with threads. Look forward to a future change that will introduce
threading to improve the time performance of this feature with
equivalent space performance.

The microsoft/fluentui repo [1] had some interesting aspects for the
previous tests in p5313, so here are the repack results:

Test                                    this tree
-------------------------------------------------------------
5313.10: repack                         91.76(680.94+2.48)
5313.11: repack size                    439.1M
5313.12: repack with --path-walk        110.35(130.46+0.74)
5313.13: repack size with --path-walk   155.3M

[1] https://github.com/microsoft/fluentui

Here, we see the significant improvement of a full repack using this
strategy. The name-hash collisions in this repo cause the space
problems. Those collisions also cause the repack command to spend a lot
of cycles trying to find delta bases among files that are not actually
very similar, so the lack of threading with the --path-walk feature is
less pronounced in the process time.

For the Linux kernel repository, we have these stats:

Test                                    this tree
---------------------------------------------------------------
5313.10: repack                         553.61(1929.41+30.31)
5313.11: repack size                    2.5G
5313.12: repack with --path-walk        1777.63(2044.16+7.47)
5313.13: repack size with --path-walk   2.5G

This demonstrates that the --path-walk feature does not always present
measurable improvements, especially in cases where the name-hash has
very few collisions.

Signed-off-by: Derrick Stolee <[email protected]>
1 parent 8cd7719 commit b189f48
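
The pack sizes reported in the commit message are easier to compare as relative savings. The helper below is a small sketch, not part of the commit: `size_reduction` is a hypothetical name, and it assumes both sizes are given in the same unit.

```shell
#!/bin/sh
# Illustrative helper (not part of the commit): print how much smaller
# AFTER is than BEFORE, as a percentage with one decimal place.
# Assumes both arguments use the same unit (here: megabytes).
size_reduction () {
	awk -v before="$1" -v after="$2" \
		'BEGIN { printf "%.1f\n", (before - after) / before * 100 }'
}

size_reduction 228.2 209.7   # Git repository: prints 8.1
size_reduction 439.1 155.3   # microsoft/fluentui: prints 64.6
```

For the Linux kernel numbers, both sizes are reported as 2.5G, so at the reported precision the reduction is zero.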

3 files changed, +42 -2 lines changed

Documentation/git-repack.txt

+16 -1
@@ -9,7 +9,9 @@ git-repack - Pack unpacked objects in a repository
 SYNOPSIS
 --------
 [verse]
-'git repack' [-a] [-A] [-d] [-f] [-F] [-l] [-n] [-q] [-b] [-m] [--window=<n>] [--depth=<n>] [--threads=<n>] [--keep-pack=<pack-name>] [--write-midx]
+'git repack' [-a] [-A] [-d] [-f] [-F] [-l] [-n] [-q] [-b] [-m]
+	[--window=<n>] [--depth=<n>] [--threads=<n>] [--keep-pack=<pack-name>]
+	[--write-midx] [--path-walk]
 
 DESCRIPTION
 -----------
@@ -249,6 +251,19 @@ linkgit:git-multi-pack-index[1]).
 	Write a multi-pack index (see linkgit:git-multi-pack-index[1])
 	containing the non-redundant packs.
 
+--path-walk::
+	This option passes the `--path-walk` option to the underlying
+	`git pack-objects` process (see linkgit:git-pack-objects[1]).
+	By default, `git pack-objects` walks objects in an order that
+	presents trees and blobs in an order unrelated to the path they
+	appear relative to a commit's root tree. The `--path-walk` option
+	enables a different walking algorithm that organizes trees and
+	blobs by path. This has the potential to improve delta compression
+	especially in the presence of filenames that cause collisions in
+	Git's default name-hash algorithm. Due to changing how the objects
+	are walked, this option is not compatible with `--delta-islands`
+	or `--filter`.
+
 CONFIGURATION
 -------------
 
builtin/repack.c

+8 -1
@@ -39,7 +39,9 @@ static int run_update_server_info = 1;
 static char *packdir, *packtmp_name, *packtmp;
 
 static const char *const git_repack_usage[] = {
-	N_("git repack [<options>]"),
+	N_("git repack [-a] [-A] [-d] [-f] [-F] [-l] [-n] [-q] [-b] [-m]\n"
+	   "[--window=<n>] [--depth=<n>] [--threads=<n>] [--keep-pack=<pack-name>]\n"
+	   "[--write-midx] [--path-walk]"),
 	NULL
 };
 
@@ -58,6 +60,7 @@ struct pack_objects_args {
 	int no_reuse_object;
 	int quiet;
 	int local;
+	int path_walk;
 	struct list_objects_filter_options filter_options;
 };
 
@@ -289,6 +292,8 @@ static void prepare_pack_objects(struct child_process *cmd,
 		strvec_pushf(&cmd->args, "--no-reuse-delta");
 	if (args->no_reuse_object)
 		strvec_pushf(&cmd->args, "--no-reuse-object");
+	if (args->path_walk)
+		strvec_pushf(&cmd->args, "--path-walk");
 	if (args->local)
 		strvec_push(&cmd->args, "--local");
 	if (args->quiet)
@@ -1182,6 +1187,8 @@ int cmd_repack(int argc,
 				N_("pass --no-reuse-delta to git-pack-objects")),
 		OPT_BOOL('F', NULL, &po_args.no_reuse_object,
 				N_("pass --no-reuse-object to git-pack-objects")),
+		OPT_BOOL(0, "path-walk", &po_args.path_walk,
+				N_("pass --path-walk to git-pack-objects")),
 		OPT_NEGBIT('n', NULL, &run_update_server_info,
 				N_("do not run git-update-server-info"), 1),
 		OPT__QUIET(&po_args.quiet, N_("be quiet")),

t/perf/p5313-pack-objects.sh

+18
@@ -56,4 +56,22 @@ test_size 'big pack size with --path-walk' '
 	test_file_size out
 '
 
+test_perf 'repack' '
+	git repack -adf
+'
+
+test_size 'repack size' '
+	pack=$(ls .git/objects/pack/pack-*.pack) &&
+	test_file_size "$pack"
+'
+
+test_perf 'repack with --path-walk' '
+	git repack -adf --path-walk
+'
+
+test_size 'repack size with --path-walk' '
+	pack=$(ls .git/objects/pack/pack-*.pack) &&
+	test_file_size "$pack"
+'
+
 test_done
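
The size tests above rely on `ls .git/objects/pack/pack-*.pack` expanding to a single file, which should hold right after `git repack -adf` when no packs are kept. To try the same measurement by hand in a repository where more than one pack may remain, a sketch along these lines could sum the sizes instead. `total_pack_bytes` is a hypothetical helper and is not part of the perf suite.

```shell
#!/bin/sh
# Illustrative helper (not part of t/perf): print the combined size in
# bytes of every pack-*.pack file in the directory given as $1.
total_pack_bytes () {
	total=0
	for pack in "$1"/pack-*.pack
	do
		# When nothing matches, the glob stays unexpanded; skip it.
		test -f "$pack" || continue
		size=$(wc -c <"$pack")
		total=$((total + size))
	done
	echo "$total"
}

# Possible usage, assuming a Git build that includes this commit:
#   git repack -adf --path-walk &&
#   total_pack_bytes .git/objects/pack
```

Only `.pack` files are counted, so `.idx` and other auxiliary files in the pack directory do not affect the result.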
