git.99rst.org Git - git.git/log

SKIP_DASHED_BUILT_INS: respect `config.mak`

When `SKIP_DASHED_BUILT_INS` is specified in `config.mak`, the dashed
form of the built-ins was still generated.

By moving the `SKIP_DASHED_BUILT_INS` handling after `config.mak` was
read, this can be avoided.

Signed-off-by: Johannes Schindelin <redacted>
Signed-off-by: Junio C Hamano <redacted>

Merge branch 'en/ort-directory-rename' into en/merge-ort-perf

* en/ort-directory-rename: (28 commits)
  merge-ort: fix a directory rename detection bug
  merge-ort: process_renames() now needs more defensiveness
  merge-ort: implement apply_directory_rename_modifications()
  merge-ort: add a new toplevel_dir field
  merge-ort: implement handle_path_level_conflicts()
  merge-ort: implement check_for_directory_rename()
  merge-ort: implement apply_dir_rename() and check_dir_renamed()
  merge-ort: implement compute_collisions()
  merge-ort: modify collect_renames() for directory rename handling
  merge-ort: implement handle_directory_level_conflicts()
  merge-ort: implement compute_rename_counts()
  merge-ort: copy get_renamed_dir_portion() from merge-recursive.c
  merge-ort: add outline of get_provisional_directory_renames()
  merge-ort: add outline for computing directory renames
  merge-ort: collect which directories are removed in dirs_removed
  merge-ort: initialize and free new directory rename data structures
  merge-ort: add new data structures for directory rename detection
  merge-ort: add implementation of type-changed rename handling
  merge-ort: add implementation of normal rename handling
  merge-ort: add implementation of rename collisions
  ...

merge-ort: fix a directory rename detection bug

As noted in commit 902c521a35 ("t6423: more involved directory rename
test", 2020-10-15), when we have a case where

  * dir/subdir/ has several files
  * almost all files in dir/subdir/ are renamed to folder/subdir/
  * one of the files in dir/subdir/ is renamed to folder/subdir/newsubdir/
  * the other side of history (that doesn't do the renames) adds a
    new file to dir/subdir/

Then for the majority of the file renames, the directory rename of
   dir/subdir/ -> folder/subdir/
is actually not represented that way but as
   dir/ -> folder/
We also had one rename that was represented as
   dir/subdir/ -> folder/subdir/newsubdir/

Now, since there's a new file in dir/subdir/, where does it go?  Well,
there's only one rule for dir/subdir/, so the code previously noted that
this rule had the "majority" of the one "relevant" rename and thus
erroneously used it to place the file in folder/subdir/newsubdir/.  We
really want the heavy weight associated with dir/ -> folder/ to also be
treated as dir/subdir/ -> folder/subdir/, so that we correctly place the
file in folder/subdir/.

Add a bunch of logic to make sure that we use all relevant renamings in
directory rename detection.

Note that testcase 12f of t6423 still fails after this, but it gets
further than merge-recursive does.  There are some performance related
bits in that testcase (the region_enter messages) that do not yet
succeed, but the rest of the testcase works after this patch.
Subsequent patch series will fix up the performance side.

Signed-off-by: Elijah Newren <redacted>
Signed-off-by: Junio C Hamano <redacted>

merge-ort: process_renames() now needs more defensiveness

Since directory rename detection adds new paths to opt->priv->paths and
removes old ones, process_renames() needs to now check whether
pair->one->path actually exists in opt->priv->paths instead of just
assuming it does.

Signed-off-by: Elijah Newren <redacted>
Signed-off-by: Junio C Hamano <redacted>

merge-ort: implement apply_directory_rename_modifications()

This function roughly follows the same outline as the function of the
same name from merge-recursive.c, but the code diverges in multiple
ways due to some special considerations:
  * merge-ort's version needs to update opt->priv->paths with any new
    paths (and opt->priv->paths points to struct conflict_infos which
    track quite a bit of metadata for each path); merge-recursive's
    version would directly update the index
  * merge-ort requires that opt->priv->paths has any leading directories
    of any relevant files also be included in the set of paths.  And
    due to pointer equality requirements on merged_info.directory_name,
    we have to be careful how we compute and insert these.
  * due to the above requirements on opt->priv->paths, merge-ort's
    version starts with a long comment to explain all the special
    considerations that need to be handled
  * merge-ort can use the full data stored in opt->priv->paths to avoid
    making expensive get_tree_entry() calls to regather the necessary
    data.
  * due to messages being deferred automatically in merge-ort, this is
    the best place to handle conflict messages whereas in
    merge-recursive.c they are deferred manually so that processing of
    entries does all the printing

Signed-off-by: Elijah Newren <redacted>
Signed-off-by: Junio C Hamano <redacted>

merge-ort: add a new toplevel_dir field

Due to the string-equality-iff-pointer-equality requirements placed on
merged_info.directory_name, apply_directory_rename_modifications() will
need to have access to the exact toplevel directory name string pointer
and can't just use a new empty string. Store it in a field that we can
use.

Signed-off-by: Elijah Newren <redacted>
Signed-off-by: Junio C Hamano <redacted>

merge-ort: implement handle_path_level_conflicts()

This is copied from merge-recursive.c, with minor tweaks due to:
  * using strmap API
  * merge-ort not using the non_unique_new_dir field, since it'll
    obviate its need entirely later with performance improvements
  * adding a new path_in_way() function that uses opt->priv->paths
    instead of doing an expensive tree_has_path() lookup to see if
    a tree has a given path.

Signed-off-by: Elijah Newren <redacted>
Signed-off-by: Junio C Hamano <redacted>

merge-ort: implement check_for_directory_rename()

This is copied from merge-recursive.c, with minor tweaks due to using strmap
API and the fact that it can use opt->priv->paths to get all pathnames that
exist instead of taking a tree object.

This depends on a new function, handle_path_level_conflicts(), which
just has a placeholder die-not-yet-implemented implementation for now; a
subsequent patch will implement it.

Signed-off-by: Elijah Newren <redacted>
Signed-off-by: Junio C Hamano <redacted>

merge-ort: implement apply_dir_rename() and check_dir_renamed()

Both of these are copied from merge-recursive.c, with just minor tweaks
due to using strmap API and not having a non_unique_new_dir field.

Signed-off-by: Elijah Newren <redacted>
Signed-off-by: Junio C Hamano <redacted>

merge-ort: implement compute_collisions()

This is nearly a wholesale copy of compute_collisions() from
merge-recursive.c, and the logic remains the same, but it has been
tweaked slightly due to:

  * using strmap.h API (instead of direct hashmaps)
  * allocation/freeing of data structures were done separately in
    merge_start() and clear_or_reinit_internal_opts() in an earlier
    patch in this series
  * there is no non_unique_new_dir data field in merge-ort; that will
    be handled a different way

It does depend on two new functions, apply_dir_rename() and
check_dir_renamed() which were introduced with simple
die-not-yet-implemented shells and will be implemented in subsequent
patches.

Signed-off-by: Elijah Newren <redacted>
Signed-off-by: Junio C Hamano <redacted>

merge-ort: modify collect_renames() for directory rename handling

collect_renames() is similar to merge-recursive.c's get_renames(), but
lacks the directory rename handling found in the latter. Port that code
structure over to merge-ort. This introduces three new
die-not-yet-implemented functions that will be defined in future
commits.

Signed-off-by: Elijah Newren <redacted>
Signed-off-by: Junio C Hamano <redacted>

merge-ort: implement handle_directory_level_conflicts()

This is modelled on the version of handle_directory_level_conflicts()
from merge-recursive.c, but is massively simplified due to the following
factors:
  * strmap API provides simplifications over using direct hashmap
  * we have a dirs_removed field in struct rename_info that we have an
    easy way to populate from collect_merge_info(); this was already
    used in compute_rename_counts() and thus we do not need to check
    for condition #2.
  * The removal of condition #2 by handling it earlier in the code also
    obviates the need to check for condition #3 -- if both sides renamed
    a directory, meaning that the directory no longer exists on either
    side, then neither side could have added any new files to that
    directory, and thus there are no files whose locations we need to
    move due to such a directory rename.

In fact, the same logic that makes condition #3 irrelevant means
condition #1 is also irrelevant so we could drop this function.
However, it is cheap to check if both sides rename the same directory,
and doing so can save future computation.  So, simply remove any
directories that both sides renamed from the list of directory renames.

Signed-off-by: Elijah Newren <redacted>
Signed-off-by: Junio C Hamano <redacted>

merge-ort: implement compute_rename_counts()

This function is based on the first half of get_directory_renames() from
merge-recursive.c; as part of the implementation, factor out a routine,
increment_count(), to update the bookkeeping to track the number of
items renamed into new directories.

Signed-off-by: Elijah Newren <redacted>
Signed-off-by: Junio C Hamano <redacted>

merge-ort: copy get_renamed_dir_portion() from merge-recursive.c

Signed-off-by: Elijah Newren <redacted>
Signed-off-by: Junio C Hamano <redacted>

merge-ort: add outline of get_provisional_directory_renames()

This function is based on merge-recursive.c's get_directory_renames(),
except that the first half has been split out into a not-yet-implemented
compute_rename_counts(). The primary difference here is our lack of the
non_unique_new_dir boolean in our strmap. The lack of that field will
at first cause us to fail testcase 2b of t6423; however, future
optimizations will obviate the need for that ugly field so we have just
left it out.

Signed-off-by: Elijah Newren <redacted>
Signed-off-by: Junio C Hamano <redacted>

merge-ort: add outline for computing directory renames

Port some directory rename handling changes from merge-recursive.c's
detect_and_process_renames() to the same-named function of merge-ort.c.
This does not yet add any use or handling of directory renames, just the
outline for where we start to compute them. Thus, a future patch will
add port additional changes to merge-ort's detect_and_process_renames().

Signed-off-by: Elijah Newren <redacted>
Signed-off-by: Junio C Hamano <redacted>

fsck doc: remove ancient out-of-date diagnostics

Remove diagnostics that haven't been emitted by "fsck" or its
predecessors for around 15 years. This documentation was added in
c64b9b88605 (Reference documentation for the core git commands.,
2005-05-05), but was out-of-date quickly after that.

Notes on individual diagnostics:

- "expect dangling commits": Added in bcee6fd8e71 (Make 'fsck' able
   to[...], 2005-04-13), documented in c64b9b88605. Not emitted since
   1024932f019 (fsck-cache: walk the 'refs' directory[...],
   2005-05-18).

- "missing sha1 directory": Added in 20222118ae4 (Add first cut at
   "fsck-cache"[...], 2005-04-08), documented in c64b9b88605. Not
   emitted since 230f13225df (Create object subdirectories on demand,
   2005-10-08).

Signed-off-by: Ævar Arnfjörð Bjarmason <redacted>
Reviewed-by: Taylor Blau <redacted>
Signed-off-by: Junio C Hamano <redacted>

Doc: clarify contents of packfile sent as URI

Clarify that, when the packfile-uri feature is used, the client should
not assume that the extra packfiles downloaded would only contain a
single blob, but support packfiles containing multiple objects of all
types.

Signed-off-by: Jonathan Tan <redacted>
Signed-off-by: Junio C Hamano <redacted>

t7900: clean up some broken refs

The tests for the 'prefetch' task create remotes and fetch refs into
'refs/prefetch/<remote>/' and tags into 'refs/tags/'. These tests use
the remotes to create objects not intended to be seen by the "local"
repository.

In that sense, the incrmental-repack tasks did not have these objects
and refs in mind. That test replaces the object directory with a
specific pack-file layout for testing the batch-size logic. However,
this causes some operations to start showing warnings such as:

error: refs/prefetch/remote1/one does not point to a valid object!
error: refs/tags/one does not point to a valid object!

This only shows up if you run the tests verbosely and watch the output.
It caught my eye and I _thought_ that there was a bug where 'git gc' or
'git repack' wouldn't check 'refs/prefetch/' before pruning objects.
That is incorrect. Those commands do handle 'refs/prefetch/' correctly.

All that is left is to clean up the tests in t7900-maintenance.sh to
remove these tags and refs that are not being repacked for the
incremental-repack tests. Use update-ref to ensure this works with all
ref backends.

Helped-by: Taylor Blau <redacted>
Signed-off-by: Derrick Stolee <redacted>
Reviewed-by: Taylor Blau <redacted>
Signed-off-by: Junio C Hamano <redacted>

maintenance: set log.excludeDecoration durin prefetch

The 'prefetch' task fetches refs from all remotes and places them in the
refs/prefetch/<remote>/ refspace. As this task is intended to run in the
background, this allows users to keep their local data very close to the
remote servers' data while not updating the users' understanding of the
remote refs in refs/remotes/<remote>/.

However, this can clutter 'git log' decorations with copies of the refs
with the full name 'refs/prefetch/<remote>/<branch>'.

The log.excludeDecoration config option was added in a6be5e67 (log: add
log.excludeDecoration config option, 2020-05-16) for exactly this
purpose.

Ensure we set this only for users that would benefit from it by
assigning it at the beginning of the prefetch task. Other alternatives
would be during 'git maintenance register' or 'git maintenance start',
but those might assign the config even when the prefetch task is
disabled by existing config. Further, users could run 'git maintenance
run --task=prefetch' using their own scripting or scheduling. This
provides the best coverage to automatically update the config when
valuable.

It is improbable, but possible, that users might want to run the
prefetch task _and_ see these refs in their log decorations. This seems
incredibly unlikely to me, but users can always opt-in on a
command-by-command basis using --decorate-refs=refs/prefetch/.

Test that this works in a few cases. In particular, ensure that our
assignment of log.excludeDecoration=refs/prefetch/ is additive to other
existing exclusions. Further, ensure we do not add multiple copies in
multiple runs.

Signed-off-by: Derrick Stolee <redacted>
Reviewed-by: Taylor Blau <redacted>
Signed-off-by: Junio C Hamano <redacted>

commit: ignore additional signatures when parsing signed commits

When we create a commit with multiple signatures, neither of these
signatures includes the other. Consequently, when we produce the
payload which has been signed so we can verify the commit, we must strip
off any other signatures, or the payload will differ from what was
signed. Do so, and in preparation for verifying with multiple
algorithms, pass the algorithm we want to verify into
parse_signed_commit.

Signed-off-by: brian m. carlson <redacted>
Signed-off-by: Junio C Hamano <redacted>

ref-filter: switch some uses of unsigned long to size_t

In the future, we'll want to pass some of the arguments of find_subpos
to strbuf_detach, which takes a size_t. This is fine on systems where
that's the same size as unsigned long, but that isn't the case on all
systems. Moreover, size_t makes sense since it's not possible to use a
buffer here that's larger than memory anyway.

Let's switch each use to size_t for these lengths in
grab_sub_body_contents and find_subpos.

Signed-off-by: brian m. carlson <redacted>
Signed-off-by: Junio C Hamano <redacted>

doc: add corrected commit date info

With generation data chunk and corrected commit dates implemented, let's
update the technical documentation for commit-graph.

Signed-off-by: Abhishek Kumar <redacted>
Reviewed-by: Taylor Blau <redacted>
Reviewed-by: Derrick Stolee <redacted>
Signed-off-by: Junio C Hamano <redacted>

commit-reach: use corrected commit dates in paint_down_to_common()

091f4cf (commit: don't use generation numbers if not needed,
2018-08-30) changed paint_down_to_common() to use commit dates instead
of generation numbers v1 (topological levels) as the performance
regressed on certain topologies. With generation number v2 (corrected
commit dates) implemented, we no longer have to rely on commit dates and
can use generation numbers.

For example, the command `git merge-base v4.8 v4.9` on the Linux
repository walks 167468 commits, taking 0.135s for committer date and
167496 commits, taking 0.157s for corrected committer date respectively.

While using corrected commit dates, Git walks nearly the same number of
commits as commit date, the process is slower as for each comparision we
have to access a commit-slab (for corrected committer date) instead of
accessing struct member (for committer date).

This change incidentally broke the fragile t6404-recursive-merge test.
t6404-recursive-merge sets up a unique repository where all commits have
the same committer date without a well-defined merge-base.

While running tests with GIT_TEST_COMMIT_GRAPH unset, we use committer
date as a heuristic in paint_down_to_common(). 6404.1 'combined merge
conflicts' merges commits in the order:
- Merge C with B to form an intermediate commit.
- Merge the intermediate commit with A.

With GIT_TEST_COMMIT_GRAPH=1, we write a commit-graph and subsequently
use the corrected committer date, which changes the order in which
commits are merged:
- Merge A with B to form an intermediate commit.
- Merge the intermediate commit with C.

While resulting repositories are equivalent, 6404.4 'virtual trees were
processed' fails with GIT_TEST_COMMIT_GRAPH=1 as we are selecting
different merge-bases and thus have different object ids for the
intermediate commits.

As this has already causes problems (as noted in 859fdc0 (commit-graph:
define GIT_TEST_COMMIT_GRAPH, 2018-08-29)), we disable commit graph
within t6404-recursive-merge.

Signed-off-by: Abhishek Kumar <redacted>
Reviewed-by: Taylor Blau <redacted>
Reviewed-by: Derrick Stolee <redacted>
Signed-off-by: Junio C Hamano <redacted>

commit-graph: use generation v2 only if entire chain does

Since there are released versions of Git that understand generation
numbers in the commit-graph's CDAT chunk but do not understand the GDAT
chunk, the following scenario is possible:

1. "New" Git writes a commit-graph with the GDAT chunk.
2. "Old" Git writes a split commit-graph on top without a GDAT chunk.

If each layer of split commit-graph is treated independently, as it was
the case before this commit, with Git inspecting only the current layer
for chunk_generation_data pointer, commits in the lower layer (one with
GDAT) whould have corrected commit date as their generation number,
while commits in the upper layer would have topological levels as their
generation. Corrected commit dates usually have much larger values than
topological levels. This means that if we take two commits, one from the
upper layer, and one reachable from it in the lower layer, then the
expectation that the generation of a parent is smaller than the
generation of a child would be violated.

It is difficult to expose this issue in a test. Since we _start_ with
artificially low generation numbers, any commit walk that prioritizes
generation numbers will walk all of the commits with high generation
number before walking the commits with low generation number. In all the
cases I tried, the commit-graph layers themselves "protect" any
incorrect behavior since none of the commits in the lower layer can
reach the commits in the upper layer.

This issue would manifest itself as a performance problem in this case,
especially with something like "git log --graph" since the low
generation numbers would cause the in-degree queue to walk all of the
commits in the lower layer before allowing the topo-order queue to write
anything to output (depending on the size of the upper layer).

Therefore, When writing the new layer in split commit-graph, we write a
GDAT chunk only if the topmost layer has a GDAT chunk. This guarantees
that if a layer has GDAT chunk, all lower layers must have a GDAT chunk
as well.

Rewriting layers follows similar approach: if the topmost layer below
the set of layers being rewritten (in the split commit-graph chain)
exists, and it does not contain GDAT chunk, then the result of rewrite
does not have GDAT chunks either.

Signed-off-by: Derrick Stolee <redacted>
Signed-off-by: Abhishek Kumar <redacted>
Reviewed-by: Taylor Blau <redacted>
Reviewed-by: Derrick Stolee <redacted>
Signed-off-by: Junio C Hamano <redacted>

commit-graph: implement generation data chunk

As discovered by Ævar, we cannot increment graph version to
distinguish between generation numbers v1 and v2 [1]. Thus, one of
pre-requistes before implementing generation number v2 was to
distinguish between graph versions in a backwards compatible manner.

We are going to introduce a new chunk called Generation DATa chunk (or
GDAT). GDAT will store corrected committer date offsets whereas CDAT
will still store topological level.

Old Git does not understand GDAT chunk and would ignore it, reading
topological levels from CDAT. New Git can parse GDAT and take advantage
of newer generation numbers, falling back to topological levels when
GDAT chunk is missing (as it would happen with a commit-graph written
by old Git).

We introduce a test environment variable 'GIT_TEST_COMMIT_GRAPH_NO_GDAT'
which forces commit-graph file to be written without generation data
chunk to emulate a commit-graph file written by old Git.

To minimize the space required to store corrrected commit date, Git
stores corrected commit date offsets into the commit-graph file, instea
of corrected commit dates. This saves us 4 bytes per commit, decreasing
the GDAT chunk size by half, but it's possible for the offset to
overflow the 4-bytes allocated for storage. As such overflows are and
should be exceedingly rare, we use the following overflow management
scheme:

We introduce a new commit-graph chunk, Generation Data OVerflow ('GDOV')
to store corrected commit dates for commits with offsets greater than
GENERATION_NUMBER_V2_OFFSET_MAX.

If the offset is greater than GENERATION_NUMBER_V2_OFFSET_MAX, we set
the MSB of the offset and the other bits store the position of corrected
commit date in GDOV chunk, similar to how Extra Edge List is maintained.

We test the overflow-related code with the following repo history:

           F - N - U
          /         \
U - N - U            N
         \          /
  N - F - N

Where the commits denoted by U have committer date of zero seconds
since Unix epoch, the commits denoted by N have committer date of
1112354055 (default committer date for the test suite) seconds since
Unix epoch and the commits denoted by F have committer date of
(2 ^ 31 - 2) seconds since Unix epoch.

The largest offset observed is 2 ^ 31, just large enough to overflow.

[1]: https://lore.kernel.org/git/87a7gdspo4.fsf@evledraar.gmail.com/

Signed-off-by: Abhishek Kumar <redacted>
Reviewed-by: Taylor Blau <redacted>
Reviewed-by: Derrick Stolee <redacted>
Signed-off-by: Junio C Hamano <redacted>

commit-graph: implement corrected commit date

With most of preparations done, let's implement corrected commit date.

The corrected commit date for a commit is defined as:

* A commit with no parents (a root commit) has corrected commit date
  equal to its committer date.
* A commit with at least one parent has corrected commit date equal to
  the maximum of its commit date and one more than the largest corrected
  commit date among its parents.

As a special case, a root commit with timestamp of zero (01.01.1970
00:00:00Z) has corrected commit date of one, to be able to distinguish
from GENERATION_NUMBER_ZERO (that is, an uncomputed corrected commit
date).

To minimize the space required to store corrected commit date, Git
stores corrected commit date offsets into the commit-graph file. The
corrected commit date offset for a commit is defined as the difference
between its corrected commit date and actual commit date.

Storing corrected commit date requires sizeof(timestamp_t) bytes, which
in most cases is 64 bits (uintmax_t). However, corrected commit date
offsets can be safely stored using only 32-bits. This halves the size
of GDAT chunk, which is a reduction of around 6% in the size of
commit-graph file.

However, using offsets be problematic if a commit is malformed but valid
and has committer date of 0 Unix time, as the offset would be the same
as corrected commit date and thus require 64-bits to be stored properly.

While Git does not write out offsets at this stage, Git stores the
corrected commit dates in member generation of struct commit_graph_data.
It will begin writing commit date offsets with the introduction of
generation data chunk.

Signed-off-by: Abhishek Kumar <redacted>
Reviewed-by: Taylor Blau <redacted>
Reviewed-by: Derrick Stolee <redacted>
Signed-off-by: Junio C Hamano <redacted>

commit-graph: return 64-bit generation number

In a preparatory step for introducing corrected commit dates, let's
return timestamp_t values from commit_graph_generation(), use
timestamp_t for local variables and define GENERATION_NUMBER_INFINITY
as (2 ^ 63 - 1) instead.

We rename GENERATION_NUMBER_MAX to GENERATION_NUMBER_V1_MAX to
represent the largest topological level we can store in the commit data
chunk.

With corrected commit dates implemented, we will have two such *_MAX
variables to denote the largest offset and largest topological level
that can be stored.

Signed-off-by: Abhishek Kumar <redacted>
Reviewed-by: Taylor Blau <redacted>
Reviewed-by: Derrick Stolee <redacted>
Signed-off-by: Junio C Hamano <redacted>

commit-graph: add a slab to store topological levels

In a later commit we will introduce corrected commit date as the
generation number v2. Corrected commit dates will be stored in the new
seperate Generation Data chunk. However, to ensure backwards
compatibility with "Old" Git we need to continue to write generation
number v1 (topological levels) to the commit data chunk. Thus, we need
to compute and store both versions of generation numbers to write the
commit-graph file.

Therefore, let's introduce a commit-slab `topo_level_slab` to store
topological levels; corrected commit date will be stored in the member
`generation` of struct commit_graph_data.

The macros `GENERATION_NUMBER_INFINITY` and `GENERATION_NUMBER_ZERO`
mark commits not in the commit-graph file and commits written by a
version of Git that did not compute generation numbers respectively.
Generation numbers are computed identically for both kinds of commits.

A "slab-miss" should return `GENERATION_NUMBER_INFINITY` as the commit
is not in the commit-graph file. However, since the slab is
zero-initialized, it returns 0 (or rather `GENERATION_NUMBER_ZERO`).
Thus, we no longer need to check if the topological level of a commit is
`GENERATION_NUMBER_INFINITY`.

We will add a pointer to the slab in `struct write_commit_graph_context`
and `struct commit_graph` to populate the slab in
`fill_commit_graph_info` if the commit has a pre-computed topological
level as in case of split commit-graphs.

Signed-off-by: Abhishek Kumar <redacted>
Reviewed-by: Taylor Blau <redacted>
Reviewed-by: Derrick Stolee <redacted>
Signed-off-by: Junio C Hamano <redacted>

t6600-test-reach: generalize *_three_modes

In a preparatory step to implement generation number v2, we add tests to
ensure Git can read and parse commit-graph files without Generation Data
chunk. These files represent commit-graph files written by Old Git and
are neccesary for backward compatability.

We extend run_three_modes() and test_three_modes() to *_all_modes() with
the fourth mode being "commit-graph without generation data chunk".

Signed-off-by: Abhishek Kumar <redacted>
Reviewed-by: Taylor Blau <redacted>
Reviewed-by: Derrick Stolee <redacted>
Signed-off-by: Junio C Hamano <redacted>

commit-graph: consolidate fill_commit_graph_info

Both fill_commit_graph_info() and fill_commit_in_graph() parse
information present in commit data chunk. Let's simplify the
implementation by calling fill_commit_graph_info() within
fill_commit_in_graph().

fill_commit_graph_info() used to not load committer data from commit data
chunk. However, with the upcoming switch to using corrected committer
date as generation number v2, we will have to load committer date to
compute generation number value anyway.

e51217e15 (t5000: test tar files that overflow ustar headers,
30-06-2016) introduced a test 'generate tar with future mtime' that
creates a commit with committer date of (2^36 + 1) seconds since
EPOCH. The CDAT chunk provides 34-bits for storing committer date, thus
committer time overflows into generation number (within CDAT chunk) and
has undefined behavior.

The test used to pass as fill_commit_graph_info() would not set struct
member `date` of struct commit and load committer date from the object
database, generating a tar file with the expected mtime.

However, with corrected commit date, we will load the committer date
from CDAT chunk (truncated to lower 34-bits to populate the generation
number. Thus, Git sets date and generates tar file with the truncated
mtime.

The ustar format (the header format used by most modern tar programs)
only has room for 11 (or 12, depending on some implementations) octal
digits for the size and mtime of each file.

As the CDAT chunk is overflow by 12-octal digits but not 11-octal
digits, we split the existing tests to test both implementations
separately and add a new explicit test for 11-digit implementation.

To test the 11-octal digit implementation, we create a future commit
with committer date of 2^34 - 1, which overflows 11-octal digits without
overflowing 34-bits of the Commit Date chunks.

To test the 12-octal digit implementation, the smallest committer date
possible is 2^36 + 1, which overflows the CDAT chunk and thus
commit-graph must be disabled for the test.

Signed-off-by: Abhishek Kumar <redacted>
Reviewed-by: Taylor Blau <redacted>
Reviewed-by: Derrick Stolee <redacted>
Signed-off-by: Junio C Hamano <redacted>

revision: parse parent in indegree_walk_step()

In indegree_walk_step(), we add unvisited parents to the indegree queue.
However, parents are not guaranteed to be parsed. As the indegree queue
sorts by generation number, let's parse parents before inserting them to
ensure the correct priority order.

Signed-off-by: Abhishek Kumar <redacted>
Reviewed-by: Taylor Blau <redacted>
Reviewed-by: Derrick Stolee <redacted>
Signed-off-by: Junio C Hamano <redacted>

commit-graph: fix regression when computing Bloom filters

Before computing Bloom filters, the commit-graph machinery uses
commit_gen_cmp to sort commits by generation order for improved diff
performance. 3d11275505 (commit-graph: examine commits by generation
number, 2020-03-30) claims that this sort can reduce the time spent to
compute Bloom filters by nearly half.

But since c49c82aa4c (commit: move members graph_pos, generation to a
slab, 2020-06-17), this optimization is broken, since asking for a
'commit_graph_generation()' directly returns GENERATION_NUMBER_INFINITY
while writing.

Not all hope is lost, though: 'commit_gen_cmp()' falls back to
comparing commits by their date when they have equal generation number,
and so since c49c82aa4c is purely a date comparison function. This
heuristic is good enough that we don't seem to loose appreciable
performance while computing Bloom filters.

Applying this patch (compared with v2.30.0) speeds up computing Bloom
filters by factors ranging from 0.40% to 5.19% on various repositories [1].

So, avoid the useless 'commit_graph_generation()' while writing by
instead accessing the slab directly. This returns the newly-computed
generation numbers, and allows us to avoid the heuristic by directly
comparing generation numbers.

[1]: https://lore.kernel.org/git/20210105094535.GN8396@szeder.dev/

Signed-off-by: Abhishek Kumar <redacted>
Reviewed-by: Taylor Blau <redacted>
Reviewed-by: Derrick Stolee <redacted>
Signed-off-by: Junio C Hamano <redacted>

cache-tree: speed up consecutive path comparisons

The previous change reduced time spent in strlen() while comparing
consecutive paths in verify_cache(), but we can do better. The
conditional checks the existence of a directory separator at the correct
location, but only after doing a string comparison. Swap the order to be
logically equivalent but perform fewer string comparisons.

To test the effect on performance, I used a repository with over three
million paths in the index. I then ran the following command on repeat:

  git -c index.threads=1 commit --amend --allow-empty --no-edit

Here are the measurements over 10 runs after a 5-run warmup:

  Benchmark #1: v2.30.0
    Time (mean ± σ):     854.5 ms ±  18.2 ms
    Range (min … max):   825.0 ms … 892.8 ms

  Benchmark #2: Previous change
    Time (mean ± σ):     833.2 ms ±  10.3 ms
    Range (min … max):   815.8 ms … 849.7 ms

  Benchmark #3: This change
    Time (mean ± σ):     815.5 ms ±  18.1 ms
    Range (min … max):   795.4 ms … 849.5 ms

This change is 2% faster than the previous change and 5% faster than
v2.30.0.

Signed-off-by: Derrick Stolee <redacted>
Signed-off-by: Junio C Hamano <redacted>

cache-tree: use ce_namelen() instead of strlen()

Use the name length field of cache entries instead of calculating its
value anew.

Signed-off-by: René Scharfe <redacted>
Signed-off-by: Derrick Stolee <redacted>
Signed-off-by: Junio C Hamano <redacted>

index-format: discuss recursion of cache-tree better

The end of the cache tree index extension format trails off with
ellipses ever since 23fcc98 (doc: technical details about the index
file format, 2011-03-01). While an intuitive reader could gather what
this means, it could be better to use "and so on" instead.

Really, this is only justified because I also wanted to point out that
the number of subtrees in the index format is used to determine when the
recursive depth-first-search stack should be "popped." This should help
to add clarity to the format.

Signed-off-by: Derrick Stolee <redacted>
Signed-off-by: Junio C Hamano <redacted>

index-format: update preamble to cache tree extension

I had difficulty in my efforts to learn about the cache tree extension
based on the documentation and code because I had an incorrect
assumption about how it behaved. This might be due to some ambiguity in
the documentation, so this change modifies the beginning of the cache
tree format by expanding the description of the feature.

My hope is that this documentation clarifies a few things:

1. There is an in-memory recursive tree structure that is constructed
from the extension data. This structure has a few differences, such
as where the name is stored.

2. What does it mean for an entry to be invalid?

3. When exactly are "new" trees created?

Helped-by: Junio C Hamano <redacted>
Signed-off-by: Derrick Stolee <redacted>
Signed-off-by: Junio C Hamano <redacted>

index-format: use 'cache tree' over 'cached tree'

The index has a "cache tree" extension. This corresponds to a
significant API implemented in cache-tree.[ch]. However, there are a few
places that refer to this erroneously as "cached tree". These are rare,
but notably the index-format.txt file itself makes this error.

The only other reference is in t7104-reset-hard.sh.

Reported-by: Junio C Hamano <redacted>
Signed-off-by: Derrick Stolee <redacted>
Signed-off-by: Junio C Hamano <redacted>

cache-tree: trace regions for prime_cache_tree

Commands such as "git reset --hard" rebuild the in-memory representation
of the cache tree index extension by parsing tree objects starting at a
known root tree. The performance of this operation can vary widely
depending on the width and depth of the repository's working directory
structure. Measure the time in this operation using trace2 regions in
prime_cache_tree().

Signed-off-by: Derrick Stolee <redacted>
Signed-off-by: Junio C Hamano <redacted>

cache-tree: trace regions for I/O

As we write or read the cache tree index extension, it can be good to
isolate how much of the file I/O time is spent constructing this
in-memory tree from the existing index or writing it out again to the
new index file. Use trace2 regions to indicate that we are spending time
on this operation.

Signed-off-by: Derrick Stolee <redacted>
Signed-off-by: Junio C Hamano <redacted>

The third batch

Signed-off-by: Junio C Hamano <redacted>

Merge branch 'jc/macos-install-dependencies-fix'

Fix for procedure to building CI test environment for mac.

* jc/macos-install-dependencies-fix:
ci/install-depends: attempt to fix "brew cask" stuff

Merge branch 'tb/local-clone-race-doc'

Doc update.

* tb/local-clone-race-doc:
Documentation/git-clone.txt: document race with --local

Merge branch 'bc/doc-status-short'

Doc update.

* bc/doc-status-short:
docs: rephrase and clarify the git status --short format

Merge branch 'dl/p4-encode-after-kw-expansion'

Text encoding fix for "git p4".

* dl/p4-encode-after-kw-expansion:
git-p4: fix syncing file types with pattern

Merge branch 'ab/gettext-charset-comment-fix'

Comments update.

* ab/gettext-charset-comment-fix:
gettext.c: remove/reword a mostly-useless comment
Makefile: remove a warning about old GETTEXT_POISON flag

Merge branch 'ug/doc-lose-dircache'

Doc update.

* ug/doc-lose-dircache:
doc: remove "directory cache" from man pages

Merge branch 'ad/t4129-setfacl-target-fix'

Test fix.

* ad/t4129-setfacl-target-fix:
t4129: fix setfacl-related permissions failure

Merge branch 'jk/t5516-deflake'

Test fix.

* jk/t5516-deflake:
t5516: loosen "not our ref" error check

Merge branch 'vv/send-email-with-less-secure-apps-access'

Doc update.

* vv/send-email-with-less-secure-apps-access:
git-send-email.txt: mention less secure app access with Gmail

Merge branch 'pb/mergetool-tool-help-fix'

Fix 2.29 regression where "git mergetool --tool-help" fails to list
all the available tools.

* pb/mergetool-tool-help-fix:
mergetool--lib: fix '--tool-help' to correctly show available tools

Merge branch 'ds/for-each-repo-noopfix'

"git for-each-repo --config=<var> <cmd>" should not run <cmd> for
any repository when the configuration variable <var> is not defined
even once.

* ds/for-each-repo-noopfix:
for-each-repo: do nothing on empty config

Merge branch 'jc/sign-off'

Doc update.

* jc/sign-off:
SubmittingPatches: tighten wording on "sign-off" procedure

Merge branch 'mt/t4129-with-setgid-dir'

Some tests expect that "ls -l" output has either '-' or 'x' for
group executable bit, but setgid bit can be inherited from parent
directory and make these fields 'S' or 's' instead, causing test
failures.

* mt/t4129-with-setgid-dir:
t4129: don't fail if setgid is set in the test directory

Merge branch 'ds/maintenance-part-4'

Follow-up on the "maintenance part-3" which introduced scheduled
maintenance tasks to support platforms whose native scheduling
methods are not 'cron'.

* ds/maintenance-part-4:
  maintenance: use Windows scheduled tasks
  maintenance: use launchctl on macOS
  maintenance: include 'cron' details in docs
  maintenance: extract platform-specific scheduling

The second batch

Signed-off-by: Junio C Hamano <redacted>

Merge branch 'fc/completion-aliases-support'

Bash completion (in contrib/) update to make it easier for
end-users to add completion for their custom "git" subcommands.

* fc/completion-aliases-support:
  completion: add proper public __git_complete
  test: completion: add tests for __git_complete
  completion: bash: improve function detection
  completion: bash: add __git_have_func helper

Merge branch 'en/stash-apply-sparse-checkout'

"git stash" did not work well in a sparsely checked out working
tree.

* en/stash-apply-sparse-checkout:
  stash: fix stash application in sparse-checkouts
  stash: remove unnecessary process forking
  t7012: add a testcase demonstrating stash apply bugs in sparse checkouts

Merge branch 'ar/t6016-modernise'

Test update.

* ar/t6016-modernise:
t6016: move to lib-log-graph.sh framework

Merge branch 'zh/arg-help-format'

Clean up option descriptions in "git cmd --help".

* zh/arg-help-format:
builtin/*: update usage format
parse-options: format argh like error messages

Merge branch 'nk/perf-fsmonitor-cleanup'

Test fix.

* nk/perf-fsmonitor-cleanup:
p7519: allow running without watchman prereq

Merge branch 'ds/trace2-topo-walk'

The topological walk codepath is covered by new trace2 stats.

* ds/trace2-topo-walk:
revision: trace topo-walk statistics

Merge branch 'rs/rebase-commit-validation'

Diagnose command line error of "git rebase" early.

* rs/rebase-commit-validation:
rebase: verify commit parameter

Merge branch 'ma/sha1-is-a-hash'

Retire more names with "sha1" in it.

* ma/sha1-is-a-hash:
  hash-lookup: rename from sha1-lookup
  sha1-lookup: rename `sha1_pos()` as `hash_pos()`
  object-file.c: rename from sha1-file.c
  object-name.c: rename from sha1-name.c

Merge branch 'ma/doc-pack-format-varint-for-sizes'

Doc update.

* ma/doc-pack-format-varint-for-sizes:
pack-format.txt: document sizes at start of delta data

Merge branch 'ma/t1300-cleanup'

Code clean-up.

* ma/t1300-cleanup:
  t1300: don't needlessly work with `core.foo` configs
  t1300: remove duplicate test for `--file no-such-file`
  t1300: remove duplicate test for `--file ../foo`

Merge branch 'pb/doc-modules-git-work-tree-typofix'

Doc fix.

* pb/doc-modules-git-work-tree-typofix:
gitmodules.txt: fix 'GIT_WORK_TREE' variable name

Merge branch 'ta/doc-typofix'

Doc fix.

* ta/doc-typofix:
doc: fix some typos

Merge branch 'bc/rev-parse-path-format'

"git rev-parse" can be explicitly told to give output as absolute
or relative path with the `--path-format=(absolute|relative)` option.

* bc/rev-parse-path-format:
rev-parse: add option for absolute or relative path formatting
abspath: add a function to resolve paths with missing components

Merge branch 'ew/decline-core-abbrev'

The configuration variable 'core.abbrev' can be set to 'no' to
force no abbreviation regardless of the hash algorithm.

* ew/decline-core-abbrev:
core.abbrev=no disables abbreviations

config: allow specifying config entries via envvar pairs

While we currently have the `GIT_CONFIG_PARAMETERS` environment variable
which can be used to pass runtime configuration data to git processes,
it's an internal implementation detail and not supposed to be used by
end users.

Next to being for internal use only, this way of passing config entries
has a major downside: the config keys need to be parsed as they contain
both key and value in a single variable. As such, it is left to the user
to escape any potentially harmful characters in the value, which is
quite hard to do if values are controlled by a third party.

This commit thus adds a new way of adding config entries via the
environment which gets rid of this shortcoming. If the user passes the
`GIT_CONFIG_COUNT=$n` environment variable, Git will parse environment
variable pairs `GIT_CONFIG_KEY_$i` and `GIT_CONFIG_VALUE_$i` for each
`i` in `[0,n)`.

While the same can be achieved with `git -c <name>=<value>`, one may
wish to not do so for potentially sensitive information. E.g. if one
wants to set `http.extraHeader` to contain an authentication token,
doing so via `-c` would trivially leak those credentials via e.g. ps(1),
which typically also shows command arguments.

Signed-off-by: Patrick Steinhardt <redacted>
Signed-off-by: Junio C Hamano <redacted>

environment: make `getenv_safe()` a public function

The `getenv_safe()` helper function helps to safely retrieve multiple
environment values without the need to depend on platform-specific
behaviour for the return value's lifetime. We'll make use of this
function in a following patch, so let's make it available by making it
non-static and adding a declaration.

Signed-off-by: Patrick Steinhardt <redacted>
Signed-off-by: Junio C Hamano <redacted>

config: store "git -c" variables using more robust format

The previous commit added a new format for $GIT_CONFIG_PARAMETERS which
is able to robustly handle subsections with "=" in them. Let's start
writing the new format. Unfortunately, this does much less than you'd
hope, because "git -c" itself has the same ambiguity problem! But it's
still worth doing:

  - we've now pushed the problem from the inter-process communication
    into the "-c" command-line parser. This would free us up to later
    add an unambiguous format there (e.g., separate arguments like "git
    --config key value", etc).

  - for --config-env, the parser already disallows "=" in the
    environment variable name. So:

      git --config-env section.with=equals.key=ENVVAR

    will robustly set section.with=equals.key to the contents of
    $ENVVAR.

The new test shows the improvement for --config-env.

Signed-off-by: Jeff King <redacted>
Signed-off-by: Junio C Hamano <redacted>

config: parse more robust format in GIT_CONFIG_PARAMETERS

When we stuff config options into GIT_CONFIG_PARAMETERS, we shell-quote
each one as a single unit, like:

  'section.one=value1' 'section.two=value2'

On the reading side, we de-quote to get the individual strings, and then
parse them by splitting on the first "=" we find. This format is
ambiguous, because an "=" may appear in a subsection. So the config
represented in a file by both:

  [section "subsection=with=equals"]
  key = value

and:

  [section]
  subsection = with=equals.key=value

ends up in this flattened format like:

  'section.subsection=with=equals.key=value'

and we can't tell which was desired. We have traditionally resolved this
by taking the first "=" we see starting from the left, meaning that we
allowed arbitrary content in the value, but not in the subsection.

Let's make our environment format a bit more robust by separately
quoting the key and value. That turns those examples into:

  'section.subsection=with=equals.key'='value'

and:

  'section.subsection'='with=equals.key=value'

respectively, and we can tell the difference between them. We can detect
which format is in use for any given element of the list based on the
presence of the unquoted "=". That means we can continue to allow the
old format to work to support any callers which manually used the old
format, and we can even intermingle the two formats. The old format
wasn't documented, and nobody was supposed to be using it. But it's
likely that such callers exist in the wild, so it's nice if we can avoid
breaking them. Likewise, it may be possible to trigger an older version
of "git -c" that runs a script that calls into a newer version of "git
-c"; that new version would see the intermingled format.

This does create one complication, which is that the obvious format in
the new scheme for

  [section]
  some-bool

is:

  'section.some-bool'

with no equals. We'd mistake that for an old-style variable. And it even
has the same meaning in the old style, but:

  [section "with=equals"]
  some-bool

does not. It would be:

  'section.with=equals=some-bool'

which we'd take to mean:

  [section]
  with = equals=some-bool

in the old, ambiguous style. Likewise, we can't use:

  'section.some-bool'=''

because that's ambiguous with an actual empty string. Instead, we'll
again use the shell-quoting to give us a hint, and use:

  'section.some-bool'=

to show that we have no value.

Note that this commit just expands the reading side. We'll start writing
the new format via "git -c" in a future patch. In the meantime, the
existing "git -c" tests will make sure we didn't break reading the old
format. But we'll also add some explicit coverage of the two formats to
make sure we continue to handle the old one after we move the writing
side over.

And one final note: since we're now using the shell-quoting as a
semantically meaningful hint, this closes the door to us ever allowing
arbitrary shell quoting, like:

  'a'shell'would'be'ok'with'this'.key=value

But we have never supported that (only what sq_quote() would produce),
and we are probably better off keeping things simple, robust, and
backwards-compatible, than trying to make it easier for humans. We'll
continue not to advertise the format of the variable to users, and
instead keep "git -c" as the recommended mechanism for setting config
(even if we are trying to be kind not to break users who may be relying
on the current undocumented format).

Signed-off-by: Jeff King <redacted>
Signed-off-by: Junio C Hamano <redacted>

t4203: make blame output massaging more robust

In the "git blame --porcelain" output, lines that ends with three
integers may not be the line that shows a commit object with line
numbers and block length (the contents from the blamed file or the
summary field can have a line that happens to match). Also, the
names of the author may have more than three SP separated tokens
("git blame -L242,+1 cf6de18aabf7 Documentation/SubmittingPatches"
gives an example). The existing "grep -E | cut" pipeline is a bit
too loose on these two points.

While they can be assumed on the test data, it is not so hard to
use the right pattern from the documented format, so let's do so.

Signed-off-by: Junio C Hamano <redacted>

mailmap doc: use correct environment variable 'GIT_WORK_TREE'

gitmailmap(5) uses 'GIT_WORK_DIR' to refer to the root of the
repository, but this environment variable does not exist.

Use the correct spelling for that variable, 'GIT_WORK_TREE'.

Signed-off-by: Philippe Blain <redacted>
Signed-off-by: Junio C Hamano <redacted>

ci/install-depends: attempt to fix "brew cask" stuff

We run "git pull" against "$cask_repo"; clarify that we are
expecting not to have any of our own modifications and running "git
pull" to merely update, by passing "--ff-only" on the command line.

Also, the "brew cask install" command line triggers an error message
that says:

    Error: Calling brew cask install is disabled! Use brew install
    [--cask] instead.

In addition, "brew install caskroom/cask/perforce" step triggers an
error that says:

    Error: caskroom/cask was moved. Tap homebrew/cask instead.

Attempt to see if blindly following the suggestion in these error
messages gets us into a better shape.

Signed-off-by: Junio C Hamano <redacted>

for_each_object_in_pack(): clarify pack vs index ordering

We may return objects in one of two orders: how they appear in the .idx
(sorted by object id) or how they appear in the packfile itself. To
further complicate matters, we have two ordering variables, "i" and
"pos", and it is not clear to which order they apply.

Let's clarify this by using an unambiguous name where possible, and
leaving a comment for the variable that does double-duty.

Signed-off-by: Jeff King <redacted>
Acked-by: Taylor Blau <redacted>
Signed-off-by: Junio C Hamano <redacted>

t4203: stop losing return codes of git commands

In a pipe, only the return code of the last command is used. Thus, all
other commands will have their return codes masked. Rewrite pipes so
that there are no git commands upstream so that their failure is
reported.

Signed-off-by: Denton Liu <redacted>
Acked-by: Ævar Arnfjörð Bjarmason <redacted>
Signed-off-by: Junio C Hamano <redacted>

test-lib-functions.sh: fix usage for test_commit()

The usage comment for test_commit() shows that the --author option
should be given as `--author=<author>`. However, this is incorrect as it
only works when given as `--author <author>`. Correct this erroneous
text.

Also, for the sake of correctness, fix the description as well since we
invoke `git commit` with `--author <author>`, not `--author=<author>`.

Signed-off-by: Denton Liu <redacted>
Acked-by: Ævar Arnfjörð Bjarmason <redacted>
Signed-off-by: Junio C Hamano <redacted>

pack-write: die on error in write_promisor_file()

write_promisor_file() already uses xfopen(), so it would die
if the file cannot be opened for writing. To be consistent
with this behavior and not overlook issues, let's also die if
there are errors when we are actually writing to the file.

Suggested-by: Jeff King <redacted>
Suggested-by: Taylor Blau <redacted>
Signed-off-by: Christian Couder <redacted>
Signed-off-by: Junio C Hamano <redacted>

Merge branch 'en/ort-conflict-handling' into en/merge-ort-perf

* en/ort-conflict-handling:
  merge-ort: add handling for different types of files at same path
  merge-ort: copy find_first_merges() implementation from merge-recursive.c
  merge-ort: implement format_commit()
  merge-ort: copy and adapt merge_submodule() from merge-recursive.c
  merge-ort: copy and adapt merge_3way() from merge-recursive.c
  merge-ort: flesh out implementation of handle_content_merge()
  merge-ort: handle book-keeping around two- and three-way content merge
  merge-ort: implement unique_path() helper
  merge-ort: handle directory/file conflicts that remain
  merge-ort: handle D/F conflict where directory disappears due to merge

Merge branch 'en/diffcore-rename' into en/merge-ort-perf

* en/diffcore-rename:
  diffcore-rename: remove unnecessary duplicate entry checks
  diffcore-rename: accelerate rename_dst setup
  diffcore-rename: simplify and accelerate register_rename_src()
  t4058: explore duplicate tree entry handling in a bit more detail
  t4058: add more tests and documentation for duplicate tree entry handling
  diffcore-rename: reduce jumpiness in progress counters
  diffcore-rename: simplify limit check
  diffcore-rename: avoid usage of global in too_many_rename_candidates()
  diffcore-rename: rename num_create to num_destinations

pack-revindex.c: avoid direct revindex access in 'offset_to_pack_pos()'

To prepare for on-disk reverse indexes, remove a spot in
'offset_to_pack_pos()' that looks at the 'revindex' array in 'struct
packed_git'.

Even though this use of the revindex pointer is within pack-revindex.c,
this clean up is still worth doing. Since the 'revindex' pointer will be
NULL when reading from an on-disk reverse index (instead the
'revindex_data' pointer will be mmaped to the 'pack-*.rev' file), this
call-site would have to include a conditional to lookup the offset for
position 'mi' each iteration through the search.

So instead of open-coding 'pack_pos_to_offset()', call it directly from
within 'offset_to_pack_pos()'.

Signed-off-by: Taylor Blau <redacted>
Signed-off-by: Junio C Hamano <redacted>

pack-revindex: hide the definition of 'revindex_entry'

Now that all spots outside of pack-revindex.c that reference 'struct
revindex_entry' directly have been removed, it is safe to hide the
implementation by moving it from pack-revindex.h to pack-revindex.c.

Signed-off-by: Taylor Blau <redacted>
Signed-off-by: Junio C Hamano <redacted>

pack-revindex: remove unused 'find_revindex_position()'

Now that all 'find_revindex_position()' callers have been removed (and
converted to the more descriptive 'offset_to_pack_pos()'), it is almost
safe to get rid of 'find_revindex_position()' entirely. Almost, except
for the fact that 'offset_to_pack_pos()' calls
'find_revindex_position()'.

Inline 'find_revindex_position()' into 'offset_to_pack_pos()', and
then remove 'find_revindex_position()' entirely.

This is a straightforward refactoring with one minor snag.
'offset_to_pack_pos()' used to load the index before calling
'find_revindex_position()'. That means that by the time
'find_revindex_position()' starts executing, 'p->num_objects' can be
safely read. After inlining, be careful to not read 'p->num_objects'
until _after_ 'load_pack_revindex()' (which loads the index as a
side-effect) has been called.

Another small fix that is included is converting the upper- and
lower-bounds to be unsigned's instead of ints. This dates back to
92e5c77c37 (revindex: export new APIs, 2013-10-24)--ironically, the last
time we introduced new APIs here--but this unifies the types.

Signed-off-by: Taylor Blau <redacted>
Signed-off-by: Junio C Hamano <redacted>

pack-revindex: remove unused 'find_pack_revindex()'

Now that no callers of 'find_pack_revindex()' remain, remove the
function's declaration and implementation entirely.

Signed-off-by: Taylor Blau <redacted>
Signed-off-by: Junio C Hamano <redacted>

builtin/gc.c: guess the size of the revindex

'estimate_repack_memory()' takes into account the amount of memory
required to load the reverse index in memory by multiplying the assumed
number of objects by the size of the 'revindex_entry' struct.

Prepare for hiding the definition of 'struct revindex_entry' by removing
a 'sizeof()' of that type from outside of pack-revindex.c. Instead,
guess that one off_t and one uint32_t are required per object. Strictly
speaking, this is a worse guess than asking for 'sizeof(struct
revindex_entry)' directly, since the true size of this struct is 16
bytes with padding on the end of the struct in order to align the offset
field.

But, this is an approximation anyway, and it does remove a use of the
'struct revindex_entry' from outside of pack-revindex internals.

Signed-off-by: Taylor Blau <redacted>
Signed-off-by: Junio C Hamano <redacted>

for_each_object_in_pack(): convert to new revindex API

Avoid looking at the 'revindex' pointer directly and instead call
'pack_pos_to_index()'.

Signed-off-by: Taylor Blau <redacted>
Signed-off-by: Junio C Hamano <redacted>

unpack_entry(): convert to new revindex API

Remove direct manipulation of the 'struct revindex_entry' type as well
as calls to the deprecated API in 'packfile.c:unpack_entry()'. Usual
clean-up is performed (replacing '->nr' with calls to
'pack_pos_to_index()' and so on).

Add an additional check to make sure that 'obj_offset()' points at a
valid object. In the case this check is violated, we cannot call
'mark_bad_packed_object()' because we don't know the OID. At the top of
the call stack is do_oid_object_info_extended() (via
packed_object_info()), which does mark the object.

Signed-off-by: Taylor Blau <redacted>
Signed-off-by: Junio C Hamano <redacted>

packed_object_info(): convert to new revindex API

Convert another call of 'find_pack_revindex()' to its replacement
'pack_pos_to_offset()'. Likewise:

  - Avoid manipulating `struct packed_git`'s `revindex` pointer directly
    by removing the pointer-as-array indexing.

  - Add an additional guard to check that the offset 'obj_offset()'
    points to a real object. This should be the case with well-behaved
    callers to 'packed_object_info()', but isn't guarenteed.

    Other blocks that fill in various other values from the 'struct
    object_info' request handle bad inputs by setting the type to
    'OBJ_BAD' and jumping to 'out'. Do the same when given a bad offset
    here.

    The previous code would have segfaulted when given a bad
    'obj_offset' value, since 'find_pack_revindex()' would return
    'NULL', and then the line that fills 'oi->disk_sizep' would try to
    access 'NULL[1]' with a stride of 16 bytes (the width of 'struct
    revindex_entry)'.

Signed-off-by: Taylor Blau <redacted>
Signed-off-by: Junio C Hamano <redacted>

retry_bad_packed_offset(): convert to new revindex API

Perform exactly the same conversion as in the previous commit to another
caller within 'packfile.c'.

Signed-off-by: Taylor Blau <redacted>
Signed-off-by: Junio C Hamano <redacted>

get_delta_base_oid(): convert to new revindex API

Replace direct accesses to the 'struct revindex' type with a call to
'pack_pos_to_index()'.

Likewise drop the old-style 'find_pack_revindex()' with its replacement
'offset_to_pack_pos()' (while continuing to perform the same error
checking).

Signed-off-by: Taylor Blau <redacted>
Signed-off-by: Junio C Hamano <redacted>

rebuild_existing_bitmaps(): convert to new revindex API

Remove another instance of looking at the revindex directly by instead
calling 'pack_pos_to_index()'. Unlike other patches, this caller only
cares about the index position of each object in the loop.

Signed-off-by: Taylor Blau <redacted>
Signed-off-by: Junio C Hamano <redacted>

try_partial_reuse(): convert to new revindex API

Remove another instance of direct revindex manipulation by calling
'pack_pos_to_offset()' instead (the caller here does not care about the
index position of the object at position 'pos').

Note that we cannot just use the existing "offset" variable to store the
value we get from pack_pos_to_offset(). It is incremented by
unpack_object_header(), but we later need the original value. Since
we'll no longer have revindex->offset to read it from, we'll store that
in a separate variable ("header" since it points to the entry's header
bytes).

Signed-off-by: Taylor Blau <redacted>
Signed-off-by: Junio C Hamano <redacted>

get_size_by_pos(): convert to new revindex API

Remove another caller that holds onto a 'struct revindex_entry' by
replacing the direct indexing with calls to 'pack_pos_to_offset()' and
'pack_pos_to_index()'.

Signed-off-by: Taylor Blau <redacted>
Signed-off-by: Junio C Hamano <redacted>

show_objects_for_type(): convert to new revindex API

Avoid storing the revindex entry directly, since this structure will
soon be removed from the public interface. Instead, store the offset and
index position by calling 'pack_pos_to_offset()' and
'pack_pos_to_index()', respectively.

Signed-off-by: Taylor Blau <redacted>
Signed-off-by: Junio C Hamano <redacted>

bitmap_position_packfile(): convert to new revindex API

Replace find_revindex_position() with its counterpart in the new API,
offset_to_pack_pos().

Signed-off-by: Taylor Blau <redacted>
Signed-off-by: Junio C Hamano <redacted>

check_object(): convert to new revindex API

Replace direct accesses to the revindex with calls to
'offset_to_pack_pos()' and 'pack_pos_to_index()'.

Since this caller already had some error checking (it can jump to the
'give_up' label if it encounters an error), we can easily check whether
or not the provided offset points to an object in the given pack. This
error checking existed prior to this patch, too, since the caller checks
whether the return value from 'find_pack_revindex()' was NULL or not.

Signed-off-by: Taylor Blau <redacted>
Signed-off-by: Junio C Hamano <redacted>

write_reused_pack_verbatim(): convert to new revindex API

Replace a direct access to the revindex array with
'pack_pos_to_offset()'.

Signed-off-by: Taylor Blau <redacted>
Signed-off-by: Junio C Hamano <redacted>