Commit 98ce832

ttaylorr authored and gitster committed
midx: implement MIDX compaction
When managing a MIDX chain with many layers, it is convenient to combine
a sequence of adjacent layers into a single layer to prevent the chain
from growing too long.

While it is conceptually possible to "compact" a sequence of MIDX layers
together by running "git multi-pack-index write --stdin-packs", there
are a few drawbacks that make this less than desirable:

- Preserving the MIDX chain is impossible, since there is no way to
  write a MIDX layer that contains objects or packs found in an earlier
  MIDX layer already part of the chain. So callers would have to write
  an entirely new (non-incremental) MIDX containing only the compacted
  layers, discarding all other objects/packs from the MIDX.

- There is (currently) no way to write a MIDX layer outside of the MIDX
  chain to work around the above, such that the MIDX chain could be
  reassembled substituting the compacted layers with the MIDX that was
  written.

- The `--stdin-packs` command-line option does not allow us to specify
  the order of packs as they appear in the MIDX. Therefore, even if
  there were workarounds for the previous two challenges, any bitmaps
  belonging to layers which come after the compacted layer(s) would no
  longer be valid.

This commit introduces a way to compact a sequence of adjacent MIDX
layers into a single layer while preserving the MIDX chain, as well as
any bitmap(s) in layers which are newer than the compacted ones.

Implementing MIDX compaction does not require a significant number of
changes to how MIDX layers are written. The main changes are as follows:

- Instead of calling `fill_packs_from_midx()`, we call a new function
  `fill_packs_from_midx_range()`, which walks backwards along the
  portion of the MIDX chain which we are compacting, and adds packs one
  layer at a time.

  In order to preserve the pseudo-pack order, the concatenated pack
  order is preserved, with the exception of preferred packs which are
  always added first.

- After adding entries from the set of packs in the compaction range,
  `compute_sorted_entries()` must adjust the `pack_int_id`'s for all
  objects added in each fanout layer to match their original
  `pack_int_id`'s (as opposed to the index at which each pack appears in
  `ctx.info`).

- When writing out the new 'multi-pack-index-chain' file, discard any
  layers in the compaction range, replacing them with the newly written
  layer, instead of keeping them and placing the new layer at the end of
  the chain.

This ends up being sufficient to implement MIDX compaction in such a way
that preserves bitmaps corresponding to more recent layers in the MIDX
chain.

The tests for MIDX compaction are so far fairly spartan, since the main
interesting behavior here is ensuring that the right packs/objects are
selected from each layer, and that the pack order is preserved
regardless of whether they are sorted in lexicographic order in the
original MIDX chain.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
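
The chain-file change described in the commit message (dropping the layers in the compaction range and putting the newly written layer in their place) can be illustrated with a small standalone sketch. This is not code from the commit: `compact_chain()` is a hypothetical helper, the checksums in the usage note are placeholders, and modelling the chain as an array of checksum strings ordered oldest layer first is an assumption made only for illustration.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not part of Git): rewrite a chain of layer
 * checksums, assumed to be ordered oldest layer first, so that the
 * layers at indexes from..to (inclusive) are replaced by one new layer.
 *
 * "out" must have room for n - (to - from) entries. Returns the number
 * of entries written, or 0 if the range is invalid.
 */
size_t compact_chain(const char **chain, size_t n,
		     size_t from, size_t to,
		     const char *new_layer, const char **out)
{
	size_t i, nr = 0;

	if (from > to || to >= n)
		return 0;

	for (i = 0; i < from; i++)
		out[nr++] = chain[i];	/* layers older than the range are kept */

	out[nr++] = new_layer;		/* the whole range collapses into one layer */

	for (i = to + 1; i < n; i++)
		out[nr++] = chain[i];	/* layers newer than the range are kept */

	return nr;
}
```

With a four-layer chain `aaaa`, `bbbb`, `cccc`, `dddd`, compacting indexes 1 through 2 into a placeholder layer `eeee` yields `aaaa`, `eeee`, `dddd`. The newer layer `dddd` keeps its position relative to everything below it, which is the property the commit message relies on for keeping its bitmap valid.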
1 parent 6b7d812 commit 98ce832

6 files changed, +411 -19 lines changed


Documentation/git-multi-pack-index.adoc

Lines changed: 13 additions & 0 deletions
@@ -12,6 +12,8 @@ SYNOPSIS
 'git multi-pack-index' [<options>] write [--preferred-pack=<pack>]
 		         [--[no-]bitmap] [--[no-]incremental] [--[no-]stdin-packs]
 		         [--refs-snapshot=<path>]
+'git multi-pack-index' [<options>] compact [--[no-]incremental]
+		         <from> <to>
 'git multi-pack-index' [<options>] verify
 'git multi-pack-index' [<options>] expire
 'git multi-pack-index' [<options>] repack [--batch-size=<size>]
@@ -83,6 +85,17 @@ marker).
 	necessary.
 --
 
+compact::
+	Write a new MIDX layer containing only objects and packs present
+	in the range `<from>` to `<to>`, where both arguments are
+	checksums of existing layers in the MIDX chain.
++
+--
+	--incremental::
+		Write the result to a MIDX chain instead of writing a
+		stand-alone MIDX. Incompatible with `--bitmap`.
+--
+
 verify::
 	Verify the contents of the MIDX file.

builtin/multi-pack-index.c

Lines changed: 67 additions & 0 deletions
@@ -17,6 +17,10 @@
 	   " [--[no-]bitmap] [--[no-]incremental] [--[no-]stdin-packs]\n" \
 	   " [--refs-snapshot=<path>]")
 
+#define BUILTIN_MIDX_COMPACT_USAGE \
+	N_("git multi-pack-index [<options>] compact [--[no-]incremental]\n" \
+	   " <from> <to>")
+
 #define BUILTIN_MIDX_VERIFY_USAGE \
 	N_("git multi-pack-index [<options>] verify")
 
@@ -30,6 +34,10 @@ static char const * const builtin_multi_pack_index_write_usage[] = {
 	BUILTIN_MIDX_WRITE_USAGE,
 	NULL
 };
+static char const * const builtin_multi_pack_index_compact_usage[] = {
+	BUILTIN_MIDX_COMPACT_USAGE,
+	NULL
+};
 static char const * const builtin_multi_pack_index_verify_usage[] = {
 	BUILTIN_MIDX_VERIFY_USAGE,
 	NULL
@@ -44,6 +52,7 @@ static char const * const builtin_multi_pack_index_repack_usage[] = {
 };
 static char const * const builtin_multi_pack_index_usage[] = {
 	BUILTIN_MIDX_WRITE_USAGE,
+	BUILTIN_MIDX_COMPACT_USAGE,
 	BUILTIN_MIDX_VERIFY_USAGE,
 	BUILTIN_MIDX_EXPIRE_USAGE,
 	BUILTIN_MIDX_REPACK_USAGE,
@@ -195,6 +204,63 @@ static int cmd_multi_pack_index_write(int argc, const char **argv,
 	return ret;
 }
 
+static int cmd_multi_pack_index_compact(int argc, const char **argv,
+					const char *prefix,
+					struct repository *repo)
+{
+	struct multi_pack_index *m, *cur;
+	struct multi_pack_index *from_midx = NULL;
+	struct multi_pack_index *to_midx = NULL;
+	struct odb_source *source;
+	int ret;
+
+	struct option *options;
+	static struct option builtin_multi_pack_index_compact_options[] = {
+		OPT_BIT(0, "incremental", &opts.flags,
+			N_("write a new incremental MIDX"), MIDX_WRITE_INCREMENTAL),
+		OPT_END(),
+	};
+
+	repo_config(repo, git_multi_pack_index_write_config, NULL);
+
+	options = add_common_options(builtin_multi_pack_index_compact_options);
+
+	trace2_cmd_mode(argv[0]);
+
+	if (isatty(2))
+		opts.flags |= MIDX_PROGRESS;
+	argc = parse_options(argc, argv, prefix,
+			     options, builtin_multi_pack_index_compact_usage,
+			     0);
+
+	if (argc != 2)
+		usage_with_options(builtin_multi_pack_index_compact_usage,
+				   options);
+	source = handle_object_dir_option(the_repository);
+
+	FREE_AND_NULL(options);
+
+	m = get_multi_pack_index(source);
+
+	for (cur = m; cur && !(from_midx && to_midx); cur = cur->base_midx) {
+		const char *midx_csum = get_midx_checksum(cur);
+
+		if (!from_midx && !strcmp(midx_csum, argv[0]))
+			from_midx = cur;
+		if (!to_midx && !strcmp(midx_csum, argv[1]))
+			to_midx = cur;
+	}
+
+	if (!from_midx)
+		die(_("could not find MIDX 'from': %s"), argv[0]);
+	if (!to_midx)
+		die(_("could not find MIDX 'to': %s"), argv[1]);
+
+	ret = write_midx_file_compact(source, from_midx, to_midx, opts.flags);
+
+	return ret;
+}
+
 static int cmd_multi_pack_index_verify(int argc, const char **argv,
 				       const char *prefix,
 				       struct repository *repo UNUSED)
@@ -295,6 +361,7 @@ int cmd_multi_pack_index(int argc,
 	struct option builtin_multi_pack_index_options[] = {
 		OPT_SUBCOMMAND("repack", &fn, cmd_multi_pack_index_repack),
 		OPT_SUBCOMMAND("write", &fn, cmd_multi_pack_index_write),
+		OPT_SUBCOMMAND("compact", &fn, cmd_multi_pack_index_compact),
 		OPT_SUBCOMMAND("verify", &fn, cmd_multi_pack_index_verify),
 		OPT_SUBCOMMAND("expire", &fn, cmd_multi_pack_index_expire),
 		OPT_END(),
