git.99rst.org Git - git.git/log

t3311: make test work with SHA-256

Replace the hard-coded SHA-1 constants with the use of test_oid to look
up an appropriate constant for each hash algorithm. In addition, adjust
the fanout checks to look for either zero or one slashes in the filename
without needing to check for an explicit length.

Signed-off-by: brian m. carlson <redacted>
Signed-off-by: Junio C Hamano <redacted>

t3310: make test work with SHA-256

Replace the hard-coded SHA-1 constants with the use of test_oid to look
up an appropriate constant for each hash algorithm.

Signed-off-by: brian m. carlson <redacted>
Signed-off-by: Junio C Hamano <redacted>

t3309: make test work with SHA-256

Replace the hard-coded SHA-1 constants with the use of test_oid to look
up an appropriate constant for each hash algorithm.

Signed-off-by: brian m. carlson <redacted>
Signed-off-by: Junio C Hamano <redacted>

t3308: make test work with SHA-256

Replace the hard-coded SHA-1 constants with the use of test_oid to look
up an appropriate constant for each hash algorithm.

Signed-off-by: brian m. carlson <redacted>
Signed-off-by: Junio C Hamano <redacted>

t3206: make hash size independent

Fix the one assertion in this test that still uses SHA-1 to use test_oid
to be independent of the hash.

Signed-off-by: brian m. carlson <redacted>
Signed-off-by: Junio C Hamano <redacted>

t/lib-pack: support SHA-256

Update the support routines for generating packs to support both SHA-1
and SHA-256. Compute the trailing pack checksum and its length
correctly depending on the algorithm, and look up the object names based
on the algorithm as well. Ensure we initialize the algorithm facts so
that our callers need not do so.

Signed-off-by: brian m. carlson <redacted>
Signed-off-by: Junio C Hamano <redacted>

submodule: add newline on invalid submodule error

Since 'err' contains output for multiple submodules and is printed all
at once by fetch_populated_submodules(), errors for each submodule
should be newline separated for readability. The same strbuf is added to
with a newline in the other half of the conditional where this error is
detected, so make the two consistent.

Signed-off-by: Emily Shaffer <redacted>
Signed-off-by: Junio C Hamano <redacted>

add: change advice config variables used by the add API

advice.addNothing config variable is used to control the visibility of
two advice messages in the add library. This config variable is
replaced by two new variables, whose names are more clear and relevant
to the two cases.

Also add the two new variables to the documentation.

Signed-off-by: Heba Waly <redacted>
Signed-off-by: Junio C Hamano <redacted>

The third batch for 2.26

Signed-off-by: Junio C Hamano <redacted>

Merge branch 'mt/sparse-checkout-doc-update'

Doc update.

* mt/sparse-checkout-doc-update:
completion: add support for sparse-checkout
doc: sparse-checkout: mention --cone option

Merge branch 'pb/recurse-submodule-in-worktree-fix'

The "--recurse-submodules" option of various subcommands did not
work well when run in an alternate worktree, which has been
corrected.

* pb/recurse-submodule-in-worktree-fix:
  submodule.c: use get_git_dir() instead of get_git_common_dir()
  t2405: clarify test descriptions and simplify test
  t2405: use git -C and test_commit -C instead of subshells
  t7410: rename to t2405-worktree-submodule.sh

Merge branch 'es/fetch-show-failed-submodules-atend'

A fetch that is told to recursively fetch updates in submodules
inevitably produces reams of output, and it becomes hard to spot
error messages. The command has been taught to enumerate
submodules that had errors at the end of the operation.

* es/fetch-show-failed-submodules-atend:
fetch: emphasize failure during submodule fetch

Merge branch 'en/fill-directory-fixes-more'

Corner case bugs in "git clean" that stems from a (necessarily for
performance reasons) awkward calling convention in the directory
enumeration API has been corrected.

* en/fill-directory-fixes-more:
  dir: point treat_leading_path() warning to the right place
  dir: restructure in a way to avoid passing around a struct dirent
  dir: treat_leading_path() and read_directory_recursive(), round 2
  clean: demonstrate a bug with pathspecs

Merge branch 'bc/hash-independent-tests-part-7'

Preparation of test scripts for the day when the object names will
use SHA-256 continues.

* bc/hash-independent-tests-part-7:
  t5604: make hash independent
  t5601: switch into repository to hash object
  t5562: use $ZERO_OID
  t5540: make hash size independent
  t5537: make hash size independent
  t5530: compute results based on object length
  t5512: abstract away SHA-1-specific constants
  t5510: make hash size independent
  t5504: make hash algorithm independent
  t5324: make hash size independent
  t5319: make test work with SHA-256
  t5319: change invalid offset for SHA-256 compatibility
  t5318: update for SHA-256
  t4300: abstract away SHA-1-specific constants
  t4204: make hash size independent
  t4202: abstract away SHA-1-specific constants
  t4200: make hash size independent
  t4134: compute appropriate length constant
  t4066: compute index line in diffs
  t4054: make hash-size independent

Merge branch 'km/submodule-add-errmsg'

Improve error message generation for "git submodule add".

* km/submodule-add-errmsg:
submodule add: show 'add --dry-run' stderr when aborting

Merge branch 'am/checkout-file-and-ref-ref-ambiguity'

"git checkout X" did not correctly fail when X is not a local
branch but could name more than one remote-tracking branches
(i.e. to be dwimmed as the starting point to create a corresponding
local branch), which has been corrected.

* am/checkout-file-and-ref-ref-ambiguity:
checkout: don't revert file on ambiguous tracking branches
parse_branchname_arg(): extract part as new function

Merge branch 'js/add-p-leftover-bits'

The final leg of rewriting "add -i/-p" in C.

* js/add-p-leftover-bits:
  ci: include the built-in `git add -i` in the `linux-gcc` job
  built-in add -p: handle Escape sequences more efficiently
  built-in add -p: handle Escape sequences in interactive.singlekey mode
  built-in add -p: respect the `interactive.singlekey` config setting
  terminal: add a new function to read a single keystroke
  terminal: accommodate Git for Windows' default terminal
  terminal: make the code of disable_echo() reusable
  built-in add -p: handle diff.algorithm
  built-in add -p: support interactive.diffFilter
  t3701: adjust difffilter test

Merge branch 'js/patch-mode-in-others-in-c'

The effort to move "git-add--interactive" to C continues.

* js/patch-mode-in-others-in-c:
  commit --interactive: make it work with the built-in `add -i`
  built-in add -p: implement the "worktree" patch modes
  built-in add -p: implement the "checkout" patch modes
  built-in stash: use the built-in `git add -p` if so configured
  legacy stash -p: respect the add.interactive.usebuiltin setting
  built-in add -p: implement the "stash" and "reset" patch modes
  built-in add -p: prepare for patch modes other than "stage"

Merge branch 'dl/test-must-fail-fixes'

Test clean-up.

* dl/test-must-fail-fixes:
  t1507: inline full_name()
  t1507: run commands within test_expect_success
  t1507: stop losing return codes of git commands
  t1501: remove use of `test_might_fail cp`
  t1409: use test_path_is_missing()
  t1409: let sed open its own input file
  t1307: reorder `nongit test_must_fail`
  t1306: convert `test_might_fail rm` to `rm -f`
  t0020: use ! check_packed_refs_marked
  t0020: don't use `test_must_fail has_cr`
  t0003: don't use `test_must_fail attr_check`
  t0003: use test_must_be_empty()
  t0003: use named parameters in attr_check()
  t0000: replace test_must_fail with run_sub_test_lib_test_err()
  t/lib-git-p4: use test_path_is_missing()

parse-options: lose an unnecessary space in an error message

Signed-off-by: Jacques Bodin-Hullin <redacted>
Signed-off-by: Junio C Hamano <redacted>

name-rev: sort tip names before applying

name_ref() is called for each ref and checks if its a better name for
the referenced commit.  If that's the case it remembers it and checks if
a name based on it is better for its ancestors as well.  This in done in
the the order for_each_ref() imposes on us.

That might not be optimal.  If bad names happen to be encountered first
(as defined by is_better_name()), names derived from them may spread to
a lot of commits, only to be replaced by better names later.  Setting
better names first can avoid that.

is_better_name() prefers tags, short distances and old references.  The
distance is a measure that we need to calculate for each candidate
commit, but the other two properties are not dependent on the
relationships of commits.  Sorting the refs by them should yield better
performance than the essentially random order we currently use.

And applying older references first should also help to reduce rework
due to the fact that older commits have less ancestors than newer ones.

So add all details of names to the tip table first, then sort them
to prefer tags and older references and then apply them in this order.
Here's the performance as measures by hyperfine for the Linux repo
before:

Benchmark #1: ./git -C ../linux/ name-rev --all
  Time (mean ± σ):     851.1 ms ±   4.5 ms    [User: 806.7 ms, System: 44.4 ms]
  Range (min … max):   845.9 ms … 859.5 ms    10 runs

... and with this patch:

Benchmark #1: ./git -C ../linux/ name-rev --all
  Time (mean ± σ):     736.2 ms ±   8.7 ms    [User: 688.4 ms, System: 47.5 ms]
  Range (min … max):   726.0 ms … 755.2 ms    10 runs

Signed-off-by: René Scharfe <redacted>
Signed-off-by: Junio C Hamano <redacted>

name-rev: release unused name strings

name_rev() assigns a name to a commit and its parents and grandparents
and so on.  Commits share their name string with their first parent,
which in turn does the same, recursively to the root.  That saves a lot
of allocations.  When a better name is found, the old name is replaced,
but its memory is not released.  That leakage can become significant.

Can we release these old strings exactly once even though they are
referenced multiple times?  Yes, indeed -- we can make use of the fact
that name_rev() visits the ancestors of a commit after it set a new name
for it and tries to update their names as well.

Members of the first ancestral line have the same taggerdate and
from_tag values, but a higher distance value than their child commit at
generation 0.  These are the only criteria used by is_better_name().
Lower distance values are considered better, so a name that is better
for a child will also be better for its parent and grandparent etc.

That means we can free(3) an inferior name at generation 0 and rely on
name_rev() to replace all references in ancestors as well.

If we do that then we need to stop using the string pointer alone to
distinguish new empty rev_name slots from initialized ones, though, as
it technically becomes invalid after the free(3) call -- even though its
value is still different from NULL.

We can check the generation value first, as empty slots will have it
initialized to 0, and for the actual generation 0 we'll set a new valid
name right after the create_or_update_name() call that releases the
string.

For the Chromium repo, releasing superceded names reduces the memory
footprint of name-rev --all significantly.  Here's the output of GNU
time before:

0.98user 0.48system 0:01.46elapsed 99%CPU (0avgtext+0avgdata 2601812maxresident)k
0inputs+0outputs (0major+571470minor)pagefaults 0swaps

... and with this patch:

1.01user 0.26system 0:01.28elapsed 100%CPU (0avgtext+0avgdata 1559196maxresident)k
0inputs+0outputs (0major+314370minor)pagefaults 0swaps

It also gets faster; hyperfine before:

Benchmark #1: ./git -C ../chromium/src name-rev --all
  Time (mean ± σ):      1.534 s ±  0.006 s    [User: 1.039 s, System: 0.494 s]
  Range (min … max):    1.522 s …  1.542 s    10 runs

... and with this patch:

Benchmark #1: ./git -C ../chromium/src name-rev --all
  Time (mean ± σ):      1.338 s ±  0.006 s    [User: 1.047 s, System: 0.291 s]
  Range (min … max):    1.327 s …  1.346 s    10 runs

For the Linux repo it doesn't pay off; memory usage only gets down from:

0.76user 0.03system 0:00.80elapsed 99%CPU (0avgtext+0avgdata 292848maxresident)k
0inputs+0outputs (0major+44579minor)pagefaults 0swaps

... to:

0.78user 0.03system 0:00.81elapsed 100%CPU (0avgtext+0avgdata 284696maxresident)k
0inputs+0outputs (0major+44892minor)pagefaults 0swaps

The runtime actually increases slightly from:

Benchmark #1: ./git -C ../linux/ name-rev --all
  Time (mean ± σ):     828.8 ms ±   5.0 ms    [User: 797.2 ms, System: 31.6 ms]
  Range (min … max):   824.1 ms … 838.9 ms    10 runs

... to:

Benchmark #1: ./git -C ../linux/ name-rev --all
  Time (mean ± σ):     847.6 ms ±   3.4 ms    [User: 807.9 ms, System: 39.6 ms]
  Range (min … max):   843.4 ms … 854.3 ms    10 runs

Why is that?  In the Chromium repo, ca. 44000 free(3) calls in
create_or_update_name() release almost 1GB, while in the Linux repo
240000+ calls release a bit more than 5MB, so the average discarded
name is ca.  1000x longer in the latter.

Overall I think it's the right tradeoff to make, as it helps curb the
memory usage in repositories with big discarded names, and the added
overhead is small.

Signed-off-by: René Scharfe <redacted>
Signed-off-by: Junio C Hamano <redacted>

name-rev: generate name strings only if they are better

Leave setting the tip_name member of struct rev_name to callers of
create_or_update_name().  This avoids allocations for names that are
rejected by that function.  Here's how this affects the runtime when
working with a fresh clone of Git's own repository; performance numbers
by hyperfine before:

Benchmark #1: ./git -C ../git-pristine/ name-rev --all
  Time (mean ± σ):     437.8 ms ±   4.0 ms    [User: 422.5 ms, System: 15.2 ms]
  Range (min … max):   432.8 ms … 446.3 ms    10 runs

... and with this patch:

Benchmark #1: ./git -C ../git-pristine/ name-rev --all
  Time (mean ± σ):     408.5 ms ±   1.4 ms    [User: 387.2 ms, System: 21.2 ms]
  Range (min … max):   407.1 ms … 411.7 ms    10 runs

Signed-off-by: René Scharfe <redacted>
Signed-off-by: Junio C Hamano <redacted>

name-rev: pre-size buffer in get_parent_name()

We can calculate the size of new name easily and precisely. Open-code
the xstrfmt() calls and grow the buffers as needed before filling them.
This provides a surprisingly large benefit when working with the
Chromium repository; here are the numbers measured using hyperfine
before:

Benchmark #1: ./git -C ../chromium/src name-rev --all
  Time (mean ± σ):      5.822 s ±  0.013 s    [User: 5.304 s, System: 0.516 s]
  Range (min … max):    5.803 s …  5.837 s    10 runs

... and with this patch:

Benchmark #1: ./git -C ../chromium/src name-rev --all
  Time (mean ± σ):      1.527 s ±  0.003 s    [User: 1.015 s, System: 0.511 s]
  Range (min … max):    1.524 s …  1.535 s    10 runs

Signed-off-by: René Scharfe <redacted>
Signed-off-by: Junio C Hamano <redacted>

name-rev: factor out get_parent_name()

Reduce nesting by moving code to come up with a name for the parent into
its own function.

Signed-off-by: René Scharfe <redacted>
Signed-off-by: Junio C Hamano <redacted>

name-rev: put struct rev_name into commit slab

The commit slab commit_rev_name contains a pointer to a struct rev_name,
and the actual struct is allocated separatly.  Avoid that allocation and
pointer indirection by storing the full struct in the commit slab.  Use
the tip_name member pointer to determine if the returned struct is
initialized.

Performance in the Linux repository measured with hyperfine before:

Benchmark #1: ./git -C ../linux/ name-rev --all
  Time (mean ± σ):     953.5 ms ±   6.3 ms    [User: 901.2 ms, System: 52.1 ms]
  Range (min … max):   945.2 ms … 968.5 ms    10 runs

... and with this patch:

Benchmark #1: ./git -C ../linux/ name-rev --all
  Time (mean ± σ):     851.0 ms ±   3.1 ms    [User: 807.4 ms, System: 43.6 ms]
  Range (min … max):   846.7 ms … 857.0 ms    10 runs

Signed-off-by: René Scharfe <redacted>
Signed-off-by: Junio C Hamano <redacted>

name-rev: don't _peek() in create_or_update_name()

Look up the commit slab slot for the commit once using
commit_rev_name_at() and populate it in case it is empty, instead of
checking for emptiness in a separate step using commit_rev_name_peek()
via get_commit_rev_name().

Signed-off-by: René Scharfe <redacted>
Signed-off-by: Junio C Hamano <redacted>

name-rev: don't leak path copy in name_ref()

name_ref() duplicates the path string and passes it to name_rev(), which
either puts it into a commit slab or ignores it if there is already a
better name, leaking it. Move the duplication to name_rev() and release
the copy in the latter case.

Signed-off-by: René Scharfe <redacted>
Signed-off-by: Junio C Hamano <redacted>

name-rev: respect const qualifier

Keep the const qualifier of the first parameter of get_rev_name() even
when casting the object pointer to a commit pointer, and further for the
parameter of get_commit_rev_name(), as all these uses are read-only.

Signed-off-by: René Scharfe <redacted>
Signed-off-by: Junio C Hamano <redacted>

name-rev: remove unused typedef

The type alias became unused with bf43abc6e6 (name-rev: use sizeof(*ptr)
instead of sizeof(type) in allocation, 2019-11-12); remove it.

Signed-off-by: René Scharfe <redacted>
Signed-off-by: Junio C Hamano <redacted>

name-rev: rewrite create_or_update_name()

This code was moved straight out of name_rev(). As such, we inherited
the "goto" to jump from an if into an else-if. We also inherited the
fact that "nothing to do -- return NULL" is handled last.

Rewrite the function to first handle the "nothing to do" case. Then we
can handle the conditional allocation early before going on to populate
the struct. No need for goto-ing.

Signed-off-by: Martin Ågren <redacted>
Signed-off-by: Junio C Hamano <redacted>

index-pack: downgrade twice-resolved REF_DELTA to die()

When we're resolving a REF_DELTA, we compare-and-swap its type from
REF_DELTA to whatever real type the base object has, as discussed in
ab791dd138 (index-pack: fix race condition with duplicate bases,
2014-08-29). If the old type wasn't a REF_DELTA, we consider that a
BUG(). But as discussed in that commit, we might see this case whenever
we try to resolve an object twice, which may happen because we have
multiple copies of the base object.

So this isn't a bug at all, but rather a sign that the input pack is
broken. And indeed, this case is triggered already in t5309.5 and
t5309.6, which create packs with delta cycles and duplicate bases. But
we never noticed because those tests are marked expect_failure.

Those tests were added by b2ef3d9ebb (test index-pack on packs with
recoverable delta cycles, 2013-08-23), which was leaving the door open
for cases that we theoretically _could_ handle. And when we see an
already-resolved object like this, in theory we could keep going after
confirming that the previously resolved child->real_type matches
base->obj->real_type. But:

  - enforcing the "only resolve once" rule here saves us from an
    infinite loop in other parts of the code. If we keep going, then the
    delta cycle in t5309.5 causes us to loop infinitely, as
    find_ref_delta_children() doesn't realize which objects have already
    been resolved. So there would be more changes needed to make this
    case work, and in the meantime we'd be worse off.

  - any pack that triggers this is broken anyway. It either has a
    duplicate base object, or it has a cycle which causes us to bring in
    a duplicate via --fix-thin. In either case, we'd end up rejecting
    the pack in write_idx_file(), which also detects duplicates.

So the tests have little value in documenting what we _could_ be doing
(and have been neglected for 6+ years). Let's switch them to confirming
that we handle this case cleanly (and switch out the BUG() for a more
informative die() so that we do so).

Signed-off-by: Jeff King <redacted>
Signed-off-by: Junio C Hamano <redacted>

notes.c: fix off-by-one error when decreasing notes fanout

As noted in the previous commit, the nature of the fanout heuristic
in the notes code causes the exact point at which we increase or
decrease the notes fanout to vary with the objects being annotated.
Since the object ids generated by the test environment are
deterministic (by design), the notes generated and tested by t3305
are always the same, and we therefore happen to see the same fanout
behavior from one run to the next.

Coincidentally, if we were to change the test environment slightly
(say by making a test commit on an unrelated branch before we start
the t3305 test proper), we not only see the fanout switch happen at
different points, we also manage to trigger a _bug_ in the notes
code where the fanout 1 -> 0 switch is not applied uniformly across
the notes tree, but instead yields a notes tree like this:

  ...
  bdeafb301e44b0e4db0f738a2d2a7beefdb70b70
  bff2d39b4f7122bd4c5caee3de353a774d1e632a
  d3/8ec8f851adf470131178085bfbaab4b12ad2a7
  e0b173960431a3e692ae929736df3c9b73a11d5b
  eb3c3aede523d729990ac25c62a93eb47c21e2e3
  ...

The bug occurs when we are writing out a notes tree with a newly
decreased fanout, and the notes tree contains unexpanded subtrees
that should be consolidated into the parent tree as a consequence of
the decreased fanout):

Subtrees that happen to sit at an _even_ level in the internal notes
16-tree structure (in other words: subtrees whose path - "d3" in the
example above - is unique in the first nibble - i.e. there are no
other note paths that start with "d") are _not_ unpacked as part of
the tree writeout. This error will repeat itself in subsequent note
trees until the subtree is forced to be unpacked. In t3305 this only
happens when the d38ec8f8 note is itself removed from the tree.

The error is not severe (no information is lost, and the notes code
is able to read/decode this tree and manipulate it correctly), but
this is nonetheless a bug in the current implementation that should
be fixed.

That said, fixing the off-by-one error is not without complications:
We must take into account that the load_subtree() call from
for_each_note_helper() (that is now done to correctly unpack the
subtree while we're writing out the notes tree) may end up inserting
unpacked non-notes into the linked list of non_note entries held by
the struct notes_tree. Since we are in the process of writing out the
notes tree, this linked list is currently in the process of being
traversed by write_each_non_note_until(). The unpacked non-notes are
necessarily inserted between the last non-note we wrote out, and the
next non-note to be written. Hence, we cannot simply hold the
next_non_note to write in struct write_each_note_data (as we would
then silently skip these newly inserted notes), but must instead
always follow the ->next pointer from the last non-note we wrote.
(This part was caught by an existing test in t3304.)

Cc: Johannes Schindelin <redacted>
Cc: Brian M. Carlson <redacted>
Signed-off-by: Johan Herland <redacted>
Signed-off-by: Junio C Hamano <redacted>

t3305: check notes fanout more carefully and robustly

In short, before this patch, this test script:
- creates many notes
- verifies that all notes in the notes tree has a fanout of 1
- removes most notes
- verifies that the notes in the notes tree now has a fanout of 0

The fanout verification only happened twice: after creating all the
notes, and after removing most of them.

This patch strengthens the test by checking the fanout after _each_
added/removed note: We assert that the switch from fanout 0 -> 1
happens exactly once while adding notes (and that the switch pervades
the entire notes tree). Likewise, we assert that the switch from
fanout 1 -> 0 happens exactly once while removing notes.

Additionally, we decrease the number of notes left after removal,
from 50 to 15 notes, in order to ensure that fanout 1 -> 0 transition
keeps happening regardless of external factors[1].

[1]: Currently (with the SHA1 hash function and the deterministic
object ids of the test environment) the fanout heuristic in the notes
code happens to switch from 0 -> 1 at 109 notes, and from 1 -> 0 at
59 notes. However, changing the hash function or other external
factors will vary these numbers, and the latter may - in theory - go
as low as 15. For more details, please see the discussion at
https://public-inbox.org/git/20200125230035.136348-4-sandals@crustytoothpaste.net/

Cc: Johannes Schindelin <redacted>
Cc: Brian M. Carlson <redacted>
Signed-off-by: Johan Herland <redacted>
Signed-off-by: Junio C Hamano <redacted>

git-filter-branch.txt: wrap "maths" notation in backticks

In this paragraph, we have a few instances of the '^' character, which
we give as "\^". This renders well with AsciiDoc ("^"), but Asciidoctor
renders it literally as "\^". Dropping the backslashes renders fine
with Asciidoctor, but not AsciiDoc...

An earlier version of this patch used "{caret}" instead of "^", which
avoided these escaping problems. The rendering was still so-so, though
-- these expressions end up set as normal text, similarly to when one
provides, e.g., computer code in the middle of running text, without
properly marking it with `backticks` to be monospaced.

As noted by Jeff King, this suggests actually wrapping these
expressions in backticks, setting them in monospace.

The lone "5" could be left as is or wrapped as `5`. Spell it out as
"five" instead -- this generally looks better anyway for small numbers
in the middle of text like this.

Suggested-by: Jeff King <redacted>
Signed-off-by: Martin Ågren <redacted>
Signed-off-by: Junio C Hamano <redacted>

commit-graph.h: use odb in 'load_commit_graph_one_fd_st'

Apply a similar treatment as in the previous patch to pass a 'struct
object_directory *' through the 'load_commit_graph_one_fd_st'
initializer, too.

This prevents a potential bug where a pointer comparison is made to a
NULL 'g->odb', which would cause the commit-graph machinery to think
that a pair of commit-graphs belonged to different alternates when in
fact they do not (i.e., in the case of no '--object-dir').

Signed-off-by: Taylor Blau <redacted>
Signed-off-by: Junio C Hamano <redacted>

commit-graph.c: remove path normalization, comparison

As of the previous patch, all calls to 'commit-graph.c' functions which
perform path normalization (for e.g., 'get_commit_graph_filename()') are
of the form 'ctx->odb->path', which is always in normalized form.

Now that there are no callers passing non-normalized paths to these
functions, ensure that future callers are bound by the same restrictions
by making these functions take a 'struct object_directory *' instead of
a 'const char *'. To match, replace all calls with arguments of the form
'ctx->odb->path' with 'ctx->odb' To recover the path, functions that
perform path manipulation simply use 'odb->path'.

Further, avoid string comparisons with arguments of the form
'odb->path', and instead prefer raw pointer comparisons, which
accomplish the same effect, but are far less brittle.

This has a pleasant side-effect of making these functions much more
robust to paths that cannot be normalized by 'normalize_path_copy()',
i.e., because they are outside of the current working directory.

For example, prior to this patch, Valgrind reports that the following
uninitialized memory read [1]:

  $ ( cd t && GIT_DIR=../.git valgrind git rev-parse HEAD^ )

because 'normalize_path_copy()' can't normalize '../.git' (since it's
relative to but above of the current working directory) [2].

By using a 'struct object_directory *' directly,
'get_commit_graph_filename()' does not need to normalize, because all
paths are relative to the current working directory since they are
always read from the '->path' of an object directory.

[1]: https://lore.kernel.org/git/20191027042116.GA5801@sigill.intra.peff.net.
[2]: The bug here is that 'get_commit_graph_filename()' returns the
     result of 'normalize_path_copy()' without checking the return
     value.

Signed-off-by: Taylor Blau <redacted>
Signed-off-by: Junio C Hamano <redacted>

commit-graph.h: store object directory in 'struct commit_graph'

In a previous patch, the 'char *object_dir' in 'struct commit_graph' was
replaced with a 'struct object_directory'. This patch applies the same
treatment to 'struct commit_graph', which is another intermediate step
towards getting rid of all path normalization in 'commit-graph.c'.

Instead of taking a 'char *object_dir', functions that construct a
'struct commit_graph' now take a 'struct object_directory *'. Any code
that needs an object directory path use '->path' instead.

This ensures that all calls to functions that perform path normalization
are given arguments which do not themselves require normalization. This
prepares those functions to drop their normalization entirely, which
will occur in the subsequent patch.

Signed-off-by: Taylor Blau <redacted>
Signed-off-by: Junio C Hamano <redacted>

commit-graph.h: store an odb in 'struct write_commit_graph_context'

There are lots of places in 'commit-graph.h' where a function either has
(or almost has) a full 'struct object_directory *', accesses '->path',
and then throws away the rest of the struct.

This can cause headaches when comparing the locations of object
directories across alternates (e.g., in the case of deciding if two
commit-graph layers can be merged). These paths are normalized with
'normalize_path_copy()' which mitigates some comparison issues, but not
all [1].

Replace usage of 'char *object_dir' with 'odb->path' by storing a
'struct object_directory *' in the 'write_commit_graph_context'
structure. This is an intermediate step towards getting rid of all path
normalization in 'commit-graph.c'.

Resolving a user-provided '--object-dir' argument now requires that we
compare it to the known alternates for equality. Prior to this patch,
an unknown '--object-dir' argument would silently exit with status zero.

This can clearly lead to unintended behavior, such as verifying
commit-graphs that aren't in a repository's own object store (or one of
its alternates), or causing a typo to mask a legitimate commit-graph
verification failure. Make this error non-silent by 'die()'-ing when the
given '--object-dir' does not match any known alternate object store.

[1]: In my testing, for example, I can get one side of the commit-graph
code to fill object_dir with "./objects" and the other with just
"objects".

Signed-off-by: Taylor Blau <redacted>
Signed-off-by: Junio C Hamano <redacted>

t7400: testcase for submodule status on unregistered inner git repos

We have test coverage for "git submodule status" output in
various cases, i.e.

  1) not-init, not-cloned: status should initially be "missing"
  2) init, not-cloned: status should be "missing"
  3) not-init, cloned: status should ignore the inner git-repo
  4) init, cloned: status should be "up-to-date" after update
  4.1) + modified: status should be "modified" after submodule commit
  4.2) + modified, committed: status should be "up-to-date" after update

the case 3) is not covered yet.

Test that submodule status reports an inner git repo as unknown, while
it is not added to the superproject.  This covers case (3).

Signed-off-by: Peter Kaestle <redacted>
Signed-off-by: Junio C Hamano <redacted>

tree-walk.c: break circular dependency with unpack-trees

The unpack-trees API depends on the tree-walk API. But we've recently
introduced a dependency in tree-walk.c on MAX_UNPACK_TREES, which
doesn't otherwise care about unpack-trees at all.

Let's break that dependency by reversing the constants: we'll introduce
a new MAX_TRAVERSE_TREES which belongs to the tree-walk API. And then we
can define MAX_UNPACK_TREES in terms of that (since unpack-trees cannot
possibly work with more trees than it can traverse at once via
tree-walk).

The value for both will remain at 8. This is somewhat arbitrary and
probably more than is necessary, per ca885a4fe6 (read-tree() and
unpack_trees(): use consistent limit, 2008-03-13), but there's not
really any pressing need to reduce it.

Suggested-by: Elijah Newren <redacted>
Signed-off-by: Jeff King <redacted>
Acked-by: Elijah Newren <redacted>
Signed-off-by: Junio C Hamano <redacted>

sparse-checkout: fix cone mode behavior mismatch

The intention of the special "cone mode" in the sparse-checkout
feature is to always match the same patterns that are matched by the
same sparse-checkout file as when cone mode is disabled.

When a file path is given to "git sparse-checkout set" in cone mode,
then the cone mode improperly matches the file as a recursive path.
When setting the skip-worktree bits, files were not expecting the
MATCHED_RECURSIVE response, and hence these were left out of the
matched cone.

Fix this bug by checking for MATCHED_RECURSIVE in addition to MATCHED
and add a test that prevents regression.

Reported-by: Finn Bryant <redacted>
Signed-off-by: Derrick Stolee <redacted>
Signed-off-by: Junio C Hamano <redacted>

sparse-checkout: improve docs around 'set' in cone mode

The existing documentation does not clarify how the 'set' subcommand
changes when core.sparseCheckoutCone is enabled. Correct this by
changing some language around the "A/B/C" example. Also include a
description of the input format matching the output of 'git ls-tree
--name-only'.

Helped-by: Jeff King <redacted>
Signed-off-by: Derrick Stolee <redacted>
Signed-off-by: Junio C Hamano <redacted>

sparse-checkout: escape all glob characters on write

The sparse-checkout patterns allow special globs according to
fnmatch(3). When writing cone-mode patterns for paths containing
these characters, they must be escaped.

Use is_glob_special() to check which characters must be escaped
this way, and add a path to the tests that contains all glob
characters at once. Note that ']' is not special, since the
initial bracket '[' is escaped.

Reported-by: Jeff King <redacted>
Signed-off-by: Derrick Stolee <redacted>
Signed-off-by: Junio C Hamano <redacted>

sparse-checkout: use C-style quotes in 'list' subcommand

When in cone mode, the 'git sparse-checkout list' subcommand lists
the directories included in the sparse cone. When these directories
contain odd characters, such as a backslash, then we need to use
C-style quotes similar to 'git ls-tree'.

Signed-off-by: Derrick Stolee <redacted>
Signed-off-by: Junio C Hamano <redacted>

sparse-checkout: unquote C-style strings over --stdin

If a user somehow creates a directory with an asterisk (*) or backslash
(\), then the "git sparse-checkout set" command will struggle to provide
the correct pattern in the sparse-checkout file. When not in cone mode,
the provided pattern is written directly into the sparse-checkout file.
However, in cone mode we expect a list of paths to directories and then
we convert those into patterns.

Even more specifically, the goal is to always allow the following from
the root of a repo:

git ls-tree --name-only -d HEAD | git sparse-checkout set --stdin

The ls-tree command provides directory names with an unescaped asterisk.
It also quotes the directories that contain an escaped backslash. We
must remove these quotes, then keep the escaped backslashes.

Use unquote_c_style() when parsing lines from stdin. Command-line
arguments will be parsed as-is, assuming the user can do the correct
level of escaping from their environment to match the exact directory
names.

Signed-off-by: Derrick Stolee <redacted>
Signed-off-by: Junio C Hamano <redacted>

sparse-checkout: write escaped patterns in cone mode

If a user somehow creates a directory with an asterisk (*) or backslash
(\), then the "git sparse-checkout set" command will struggle to provide
the correct pattern in the sparse-checkout file. When not in cone mode,
the provided pattern is written directly into the sparse-checkout file.
However, in cone mode we expect a list of paths to directories and then
we convert those into patterns.

However, there is some care needed for the timing of these escapes. The
in-memory pattern list is used to update the working directory before
writing the patterns to disk. Thus, we need the command to have the
unescaped names in the hashsets for the cone comparisons, then escape
the patterns later.

Signed-off-by: Derrick Stolee <redacted>
Signed-off-by: Junio C Hamano <redacted>

sparse-checkout: properly match escaped characters

In cone mode, the sparse-checkout feature uses hashset containment
queries to match paths. Make this algorithm respect escaped asterisk
(*) and backslash (\) characters.

Create dup_and_filter_pattern() method to convert a pattern by
removing escape characters and dropping an optional "/*" at the end.
This method is available in dir.h as we will use it in
builtin/sparse-checkout.c in a later change.

Signed-off-by: Derrick Stolee <redacted>
Signed-off-by: Junio C Hamano <redacted>

sparse-checkout: warn on globs in cone patterns

In cone mode, the sparse-checkout commmand will write patterns that
allow faster pattern matching. This matching only works if the patterns
in the sparse-checkout file are those written by that command. Users
can edit the sparse-checkout file and create patterns that cause the
cone mode matching to fail.

The cone mode patterns may end in "/*" but otherwise an un-escaped
asterisk or other glob character is invalid. Add checks to disable
cone mode when seeing these values.

A later change will properly handle escaped globs.

Signed-off-by: Derrick Stolee <redacted>
Signed-off-by: Junio C Hamano <redacted>

C: use skip_prefix() to avoid hardcoded string length

We often skip an optional prefix in a string with a hardcoded
constant, e.g.

if (starts_with(string, "prefix"))
string += 6;

which is less error prone when written

skip_prefix(string, "prefix", &string);

Note that this changes a few error messages from "git reflog expire
--expire=nonsense.timestamp", which used to complain by saying

'--expire=nonsense.timestamp' is not a valid timestamp

but with this change, we say

'nonsense.timestamp' is not a valid timestamp

which is more technically correct (the string with --expire= as
a prefix obviously cannot be a valid timestamp, but the error is
about the part of the input without that prefix).

Helped-by: Jeff King <redacted>
Signed-off-by: Junio C Hamano <redacted>

submodule foreach: replace $path with $sm_path in example

f0fd0dc5c5 (submodule foreach: document '$sm_path' instead of '$path',
2018-05-08) updated the documentation to advise callers to favor
$sm_path over the deprecated synonym $path. However, the example in
that section still uses $path. Update it to use $sm_path.

Signed-off-by: Kyle Meyer <redacted>
Signed-off-by: Junio C Hamano <redacted>

t5318: don't pass non-object directory to '--object-dir'

In f237c8b6fe (commit-graph: implement git-commit-graph write,
2018-04-02) the test t5318.3 was introduced to ensure that calling 'git
commit-graph write' in a repository with no packfiles does not write any
commit-graph file(s).

To exercise more paths in 'builtin/commit-graph.c', this test passes
'--object-dir' to 'git commit-graph write', but the given argument
refers to the working copy, not the object directory.

Since the commit-graph sub-commands currently swallow these errors, this
does not result in a test failure. But, it is only lucky that the test
ends with no commit-graphs, since there were none to begin with.

In preparation for a future commit where an '--object-dir' argument that
does not match a known object directory will print out a failure, let's
fix the test to still use '--object-dir', but pass the correct location
to the object store instead of '.'.

Signed-off-by: Taylor Blau <redacted>
Signed-off-by: Junio C Hamano <redacted>

diff: move diff.wsErrorHighlight to "basic" config

We parse diff.wsErrorHighlight in git_diff_ui_config(), meaning that it
doesn't take effect for plumbing commands, only for porcelains like
git-diff itself. This is mildly annoying as it means scripts like
add--interactive, which produce a user-visible diff with color, don't
respect the option.

We could teach that script to parse the config and pass it along as
--ws-error-highlight to the diff plumbing. But there's a simpler
solution.

It should be reasonably safe for plumbing to respect this option, as it
only kicks in when color is otherwise enabled. And anybody parsing
colorized output must already deal with the fact that color.diff.* may
change the exact output they see; those options have been part of
git_diff_basic_config() since its inception in 9a1805a872 (add a "basic"
diff config callback, 2008-01-04).

So we can just move it to the "basic" config, which fixes
add--interactive, along with any other script in the same boat, with a
very low risk of hurting any plumbing users.

Signed-off-by: Jeff King <redacted>
Signed-off-by: Junio C Hamano <redacted>

sha1-file: allow check_object_signature() to handle any repo

Some callers of check_object_signature() can work on arbitrary
repositories, but the repo does not get passed to this function.
Instead, the_repository is always used internally. To fix possible
inconsistencies, allow the function to receive a struct repository and
make those callers pass on the repo being handled.

Signed-off-by: Matheus Tavares <redacted>
Signed-off-by: Junio C Hamano <redacted>

sha1-file: pass git_hash_algo to hash_object_file()

Allow hash_object_file() to work on arbitrary repos by introducing a
git_hash_algo parameter. Change callers which have a struct repository
pointer in their scope to pass on the git_hash_algo from the said repo.
For all other callers, pass on the_hash_algo, which was already being
used internally at hash_object_file(). This functionality will be used
in the following patch to make check_object_signature() be able to work
on arbitrary repos (which, in turn, will be used to fix an
inconsistency at object.c:parse_object()).

Signed-off-by: Matheus Tavares <redacted>
Signed-off-by: Junio C Hamano <redacted>

sha1-file: pass git_hash_algo to write_object_file_prepare()

Allow write_object_file_prepare() to receive arbitrary 'struct
git_hash_algo's instead of always using the_hash_algo. The added
parameter will be used in the next commit to make hash_object_file() be
able to work with arbitrary git_hash_algo's, as well.

Signed-off-by: Matheus Tavares <redacted>
Signed-off-by: Junio C Hamano <redacted>

streaming: allow open_istream() to handle any repo

Some callers of open_istream() at archive-tar.c and archive-zip.c are
capable of working on arbitrary repositories but the repo struct is not
passed down to open_istream(), which uses the_repository internally. For
now, that's not a problem since the said callers are only being called
with the_repository. But to be consistent and avoid future problems,
let's allow open_istream() to receive a struct repository and use that
instead of the_repository. This parameter addition will also be used in
a future patch to make sha1-file.c:check_object_signature() be able to
work on arbitrary repos.

Signed-off-by: Matheus Tavares <redacted>
Signed-off-by: Junio C Hamano <redacted>

pack-check: use given repo's hash_algo at verify_packfile()

At verify_packfile(), use the git_hash_algo from the provided repository
instead of the_hash_algo, for consistency. Like the previous patch, this
shouldn't bring any behavior changes, since this function is currently
only receiving the_repository.

Signed-off-by: Matheus Tavares <redacted>
Signed-off-by: Junio C Hamano <redacted>

cache-tree: use given repo's hash_algo at verify_one()

verify_one() takes a struct repository argument but uses the_hash_algo
internally. Replace it with the provided repo's git_hash_algo, for
consistency. For now, this is mainly a cosmetic change, as all callers
of this function currently only pass the_repository to it.

Signed-off-by: Matheus Tavares <redacted>
Signed-off-by: Junio C Hamano <redacted>

diff: make diff_populate_filespec() honor its repo argument

diff_populate_filespec() takes a struct repository argument but it
doesn't get passed down to read_object_file().

Signed-off-by: Matheus Tavares <redacted>
Signed-off-by: Junio C Hamano <redacted>

Sync with maint

* maint:
.mailmap: map Yi-Jyun Pan's email

The second batch

Signed-off-by: Junio C Hamano <redacted>

Merge branch 'bc/misconception-doc'

Doc updates.

* bc/misconception-doc:
docs: mention when increasing http.postBuffer is valuable
doc: dissuade users from trying to ignore tracked files

Merge branch 'bc/author-committer-doc'

Clarify documentation on committer/author identities.

* bc/author-committer-doc:
  doc: provide guidance on user.name format
  docs: expand on possible and recommended user config options
  doc: move author and committer information to git-commit(1)

Merge branch 'ss/t6025-modernize'

Test style updates.

* ss/t6025-modernize:
t6025: use helpers to replace test -f <path>
t6025: modernize style

Merge branch 'lh/bool-to-type-bool'

Replace "git config --bool" calls with "git config --type=bool" in
sample templates.

* lh/bool-to-type-bool:
templates: fix deprecated type option `--bool`

Merge branch 'ds/refmap-doc'

"git fetch --refmap=" option has got a better documentation.

* ds/refmap-doc:
fetch: document and test --refmap=""

Merge branch 'bc/actualmente'

Doc grammo fix.

* bc/actualmente:
docs: use "currently" for the present time

Merge branch 'rt/submodule-i18n'

Comments update.

* rt/submodule-i18n:
submodule.c: mark more strings for translation

Merge branch 'js/builtin-add-i-cmds'

Minor bugfixes to "git add -i" that has recently been rewritten in C.

* js/builtin-add-i-cmds:
built-in add -i: accept open-ended ranges again
built-in add -i: do not try to `patch`/`diff` an empty list of files

Merge branch 'jk/test-fixes'

Test fixes.

* jk/test-fixes:
t7800: don't rely on reuse_worktree_file()
t4018: drop "debugging" cat from hunk-header tests

Merge branch 'jk/asan-build-fix'

Work around test breakages caused by custom regex engine used in
libasan, when address sanitizer is used with more recent versions
of gcc and clang.

* jk/asan-build-fix:
Makefile: use compat regex with SANITIZE=address

Merge branch 'sg/completion-worktree'

The command line completion (in contrib/) learned to complete
subcommands and arguments to "git worktree".

* sg/completion-worktree:
  completion: list paths and refs for 'git worktree add'
  completion: list existing working trees for 'git worktree' subcommands
  completion: simplify completing 'git worktree' subcommands and options
  completion: return the index of found word from __git_find_on_cmdline()
  completion: clean up the __git_find_on_cmdline() helper function
  t9902-completion: add tests for the __git_find_on_cmdline() helper

Merge branch 'jn/test-lint-one-shot-export-to-shell-function'

The test-lint machinery knew to check "VAR=VAL shell_function"
construct, but did not check "VAR= shell_funciton", which has been
corrected.

* jn/test-lint-one-shot-export-to-shell-function:
  fetch test: mark test of "skipping" haves as v0-only
  t/check-non-portable-shell: detect "FOO= shell_func", too
  fetch test: avoid use of "VAR= cmd" with a shell function

Merge branch 'hi/gpg-mintrustlevel'

gpg.minTrustLevel configuration variable has been introduced to
tell various signature verification codepaths the required minimum
trust level.

* hi/gpg-mintrustlevel:
gpg-interface: add minTrustLevel as a configuration option

Merge branch 'am/test-pathspec-f-f-error-cases'

More tests.

* am/test-pathspec-f-f-error-cases:
t: add tests for error conditions with --pathspec-from-file

Merge branch 'ds/graph-horizontal-edges'

Rendering by "git log --graph" of ancestry lines leading to a merge
commit were made suboptimal to waste vertical space a bit with a
recent update, which has been corrected.

* ds/graph-horizontal-edges:
graph: fix collapse of multiple edges
graph: add test to demonstrate horizontal line bug

Merge branch 'am/update-pathspec-f-f-tests'

Test updates.

* am/update-pathspec-f-f-tests:
t: directly test parse_pathspec_file()
t: fix quotes tests for --pathspec-from-file

Merge branch 'ds/sparse-cone'

The code recently added in this release to move to the entry beyond
the ones in the same directory in the index in the sparse-cone mode
did not count the number of entries to skip over incorrectly, which
has been corrected.

* ds/sparse-cone:
.mailmap: fix GGG authoship screwup
unpack-trees: correctly compute result count

Merge branch 'hi/indent-text-with-tabs-in-editorconfig'

Tell .editorconfig that in this project, *.txt files are indented
with tabs.

* hi/indent-text-with-tabs-in-editorconfig:
editorconfig: indent text files with tabs

traverse_trees(): use stack array for name entries

We heap-allocate our arrays of name_entry structs, etc, with one entry
per tree we're asked to traverse. The code does a raw multiplication in
the xmalloc() call, which I find when auditing for integer overflows
during allocation.

We could "fix" this by using ALLOC_ARRAY() instead. But as it turns out,
the maximum size of these arrays is limited at compile time:

  - merge_trees() always passes in 3 trees

  - unpack_trees() and its brethren never pass in more than
    MAX_UNPACK_TREES

So we can simplify even further by just using a stack array and bounding
it with MAX_UNPACK_TREES. There should be no concern with overflowing
the stack, since MAX_UNPACK_TREES is only 8 and the structs themselves
are small.

Note that since we're replacing xcalloc(), we have to move one of the
NULL initializations into a loop.

Signed-off-by: Jeff King <redacted>
Signed-off-by: Junio C Hamano <redacted>

walker_fetch(): avoid raw array length computation

We compute the length of an array of object_id's with a raw
multiplication. In theory this could trigger an integer overflow which
would cause an under-allocation (and eventually an out of bounds write).

I doubt this can be triggered in practice, since you'd need to feed it
an enormous number of target objects, which would typically come from
the ref advertisement and be using proportional memory. And even on
64-bit systems, where "int" is much smaller than "size_t", that should
hold: even though "targets" is an int, the multiplication will be done
as a size_t because of the use of sizeof().

But we can easily fix it by using ALLOC_ARRAY(), which uses st_mult()
under the hood.

Signed-off-by: Jeff King <redacted>
Signed-off-by: Junio C Hamano <redacted>

normalize_path_copy(): document "dst" size expectations

We take a "dst" buffer to write into, but there's no matching "len"
parameter. The hidden assumption is that normalizing always makes things
smaller, so we're OK as long as "dst" is at least as big as "src". Let's
document that explicitly.

Signed-off-by: Jeff King <redacted>
Signed-off-by: Junio C Hamano <redacted>

git-p4: avoid leak of file handle when cloning

Spotted by Eric Sunshine:

https://public-inbox.org/git/CAPig+cRx3hG64nuDie69o_gdX39F=sR6D8LyA7J1rCErgu0aMA@mail.gmail.com/

Signed-off-by: Luke Diamand <redacted>
Signed-off-by: Junio C Hamano <redacted>

git-p4: check for access to remote host earlier

Check we can talk to the remote host before starting the git-fastimport
subchild.

Otherwise we fail to connect, and then exit, leaving git-fastimport
still running since we did not wait() for it.

Signed-off-by: Luke Diamand <redacted>
Signed-off-by: Junio C Hamano <redacted>

git-p4: cleanup better on error exit

After an error, git-p4 calls die(). This just exits, and leaves child
processes still running.

Instead of calling die(), raise an exception and catch it where the
child process(es) (git-fastimport) are created.

This was analyzed in detail here:

https://public-inbox.org/git/20190227094926.GE19739@szeder.dev/

This change does not address the particular issue of p4CmdList()
invoking a subchild and not waiting for it on error.

Signed-off-by: Luke Diamand <redacted>
Signed-off-by: Junio C Hamano <redacted>

git-p4: create helper function importRevisions()

This makes it easier to try/catch around this block of code to ensure
cleanup following p4 failures is handled properly.

Signed-off-by: Luke Diamand <redacted>
Signed-off-by: Junio C Hamano <redacted>

git-p4: disable some pylint warnings, to get pylint output to something manageable

pylint is incredibly useful for finding bugs, but git-p4 has never used
it, so there are a lot of warnings that while important, don't actually
result in bugs.

Let's turn those off for now, so we can get some useful output.

Signed-off-by: Luke Diamand <redacted>
Signed-off-by: Junio C Hamano <redacted>

git-p4: add P4CommandException to report errors talking to Perforce

Currently when there is a P4 error, git-p4 calls die() which just exits.

This then leaves the git-fast-import process still running, and can even
leave p4 itself still running.

As a result, git-p4 fails to exit cleanly. This is a particular problem
for people running the unit tests in regression.

Use this exception to report errors upwards, cleaning up as the error
propagates.

Signed-off-by: Luke Diamand <redacted>
Signed-off-by: Junio C Hamano <redacted>

git-p4: make closeStreams() idempotent

Ensure that we can safely call self.closeStreams() multiple times, and
can also call it even if there is no git fast-import stream at all.

Signed-off-by: Luke Diamand <redacted>
Signed-off-by: Junio C Hamano <redacted>

sha1-name: mark get_oid() error messages for translation

There are several error messages in get_oid() and its children that are
clearly intended for humans, but aren't marked for translation. E.g.:

  $ git show :1:foo
  fatal: Path 'foo' is in the index, but not at stage 1.
  Did you mean ':0:foo'?

Let's mark these for translation. While we're at it, let's switch the
style to be more like our usual error messages: start with a lowercase
letter and omit a period at the end of the line.

This does mean that multi-line messages like the one above don't have
any punctuation between the two sentences. I solved that by adding a
"hint" marker like we'd see from advise(). So the result is:

  $ git show :1:foo
  fatal: path 'foo' is in the index, but not at stage 1
  hint: Did you mean ':0:foo'?

A few tests had to be switched to test_i18ngrep and test_i18ncmp. Since
we were touching them anyway, I also simplified the ones using i18ngrep
a bit for readability.

Signed-off-by: Jeff King <redacted>
Signed-off-by: Junio C Hamano <redacted>

fetch: forgo full connectivity check if --filter

If a filter is specified, we do not need a full connectivity check on
the contents of the packfile we just fetched; we only need to check that
the objects referenced are promisor objects.

This significantly speeds up fetches into repositories that have many
promisor objects, because during the connectivity check, all promisor
objects are enumerated (to mark them UNINTERESTING), and that takes a
significant amount of time.

Signed-off-by: Jonathan Tan <redacted>
Reviewed-by: Jonathan Nieder <redacted>
Signed-off-by: Junio C Hamano <redacted>

connected: verify promisor-ness of partial clone

Commit dfa33a298d ("clone: do faster object check for partial clones",
2019-04-21) optimized the connectivity check done when cloning with
--filter to check only the existence of objects directly pointed to by
refs. But this is not sufficient: they also need to be promisor objects.
Make this check more robust by instead checking that these objects are
promisor objects, that is, they appear in a promisor pack.

Signed-off-by: Jonathan Tan <redacted>
Reviewed-by: Jonathan Nieder <redacted>
Signed-off-by: Junio C Hamano <redacted>

git: update documentation for --git-dir

git --git-dir <path> is a bit confusing and sometimes doesn't work as
the user would expect it to.

For example, if the user runs `git --git-dir=<path> status`, git
will skip the repository discovery algorithm and will assign the
work tree to the user's current work directory unless otherwise
specified. When this assignment is wrong, the output will not match
the user's expectations.

This patch updates the documentation to make it clearer.

Signed-off-by: Heba Waly <redacted>
Helped-by: Junio C Hamano <redacted>
Signed-off-by: Junio C Hamano <redacted>

.mailmap: map Yi-Jyun Pan's email

In 13185fd241 (l10n: zh_TW.po: update translation for v2.25.0 round 1,
2019-12-31), the author mistakenly used their GitHub username for
authorship information instead of their real name. However, a commit
with their real name exists prior to this: 9917eca794 (l10n: zh_TW: add
translation for v2.24.0, 2019-11-20).

Map their email to their real name so that these contributions can be
counted together.

Signed-off-by: Denton Liu <redacted>
Signed-off-by: Junio C Hamano <redacted>

grep: ignore --recurse-submodules if --no-index is given

Since grep learned to recurse into submodules in 0281e487fd
(grep: optionally recurse into submodules, 2016-12-16),
using --recurse-submodules along with --no-index makes Git
die().

This is unfortunate because if submodule.recurse is set in a user's
~/.gitconfig, invoking `git grep --no-index` either inside or outside
a Git repository results in

fatal: option not supported with --recurse-submodules

Let's allow using these options together, so that setting submodule.recurse
globally does not prevent using `git grep --no-index`.

Using `--recurse-submodules` should not have any effect if `--no-index`
is used inside a repository, as Git will recurse into the checked out
submodule directories just like into regular directories.

Helped-by: Junio C Hamano <redacted>
Signed-off-by: Philippe Blain <redacted>
Signed-off-by: Junio C Hamano <redacted>

doc: drop "explicitly given" from push.default description

The documentation for push.default mentions that it is used if no
refspec is "explicitly given". Let's drop the notion of "explicit" here,
since it's vague, and just mention that any refspec from anywhere is
sufficient to override this.

I've dropped the mention of "explicitly given" from the definition of
the "nothing" value right below, too. It's close enough to our
clarification that it should be obvious we mean the same type of "given"
here.

Signed-off-by: Jeff King <redacted>
Signed-off-by: Junio C Hamano <redacted>

obstack: avoid computing offsets from NULL pointer

As with the previous two commits, UBSan with clang-11 complains about
computing offsets from a NULL pointer. The failures in t4013 (and
elsewhere) look like this:

  kwset.c:102:23: runtime error: applying non-zero offset 107820859019600 to null pointer
  ...
  not ok 79 - git log -SF master # magic is (not used)

That line is not enlightening:

  ... = obstack_alloc(&kwset->obstack, sizeof (struct trie));

because obstack is implemented almost entirely in macros, and the actual
problem is five macros deep (I temporarily converted them to inline
functions to get better compiler errors, which was tedious but worked
reasonably well).

The actual problem is in these pointer-alignment macros:

  /* If B is the base of an object addressed by P, return the result of
     aligning P to the next multiple of A + 1.  B and P must be of type
     char *.  A + 1 must be a power of 2.  */

  #define __BPTR_ALIGN(B, P, A) ((B) + (((P) - (B) + (A)) & ~(A)))

  /* Similar to _BPTR_ALIGN (B, P, A), except optimize the common case
     where pointers can be converted to integers, aligned as integers,
     and converted back again.  If PTR_INT_TYPE is narrower than a
     pointer (e.g., the AS/400), play it safe and compute the alignment
     relative to B.  Otherwise, use the faster strategy of computing the
     alignment relative to 0.  */

  #define __PTR_ALIGN(B, P, A)                                                \
    __BPTR_ALIGN (sizeof (PTR_INT_TYPE) < sizeof (void *) ? (B) : (char *) 0, \
                  P, A)

If we have a sufficiently-large integer pointer type, then we do the
computation using a NULL pointer constant. That turns __BPTR_ALIGN()
into something like:

  NULL + (P - NULL + A) & ~A

and UBSan is complaining about adding the full value of P to that
initial NULL. We can fix this by doing our math as an integer type, and
then casting the result back to a pointer. The problem case only happens
when we know that the integer type is large enough, so there should be
no issue with truncation.

Another option would be just simplify out all the 0's from
__BPTR_ALIGN() for the NULL-pointer case. That probably wouldn't work
for a platform where the NULL pointer isn't all-zeroes, but Git already
wouldn't work on such a platform (due to our use of memset to set
pointers in structs to NULL). But I tried here to keep as close to the
original as possible.

Signed-off-by: Jeff King <redacted>
Signed-off-by: Junio C Hamano <redacted>

xdiff: avoid computing non-zero offset from NULL pointer

As with the previous commit, clang-11's UBSan complains about computing
offsets from a NULL pointer, causing some tests to fail. In this case,
though, we're actually computing a non-zero offset, which is even more
dubious. From t7810:

  xdiff-interface.c:268:14: runtime error: applying non-zero offset 1 to null pointer
  ...
  not ok 131 - grep -p with userdiff

The problem is our parsing of the funcname config. We count the number
of lines in the string, allocate an array, and then loop over our
allocated entries, parsing each line and moving our cursor to one past
the trailing newline for the next iteration.

But the final line will not generally have a trailing newline (since
it's a config value), and hence we go to one past NULL. In practice this
is OK, since our loop should terminate before we look at the value. But
even computing such an invalid pointer technically violates the
standard.

We can fix it by leaving the pointer at NULL if we're at the end, rather
than one-past. And while we're thinking about it, we can also document
the variant by asserting that our initial line-count matches the
second-pass of parsing.

Signed-off-by: Jeff King <redacted>
Signed-off-by: Junio C Hamano <redacted>

avoid computing zero offsets from NULL pointer

The Undefined Behavior Sanitizer in clang-11 seems to have learned a new
trick: it complains about computing offsets from a NULL pointer, even if
that offset is 0. This causes numerous test failures. For example, from
t1090:

  unpack-trees.c:1355:41: runtime error: applying zero offset to null pointer
  ...
  not ok 6 - in partial clone, sparse checkout only fetches needed blobs

The code in question looks like this:

  struct cache_entry **cache_end = cache + nr;
  ...
  while (cache != cache_end)

and we sometimes pass in a NULL and 0 for "cache" and "nr". This is
conceptually fine, as "cache_end" would be equal to "cache" in this
case, and we wouldn't enter the loop at all. But computing even a zero
offset violates the C standard. And given the fact that UBSan is
noticing this behavior, this might be a potential problem spot if the
compiler starts making unexpected assumptions based on undefined
behavior.

So let's just avoid it, which is pretty easy. In some cases we can just
switch to iterating with a numeric index (as we do in sequencer.c here).
In other cases (like the cache_end one) the use of an end pointer is
more natural; we can keep that by just explicitly checking for the
NULL/0 case when assigning the end pointer.

Note that there are two ways you can write this latter case, checking
for the pointer:

  cache_end = cache ? cache + nr : cache;

or the size:

  cache_end = nr ? cache + nr : cache;

For the case of a NULL/0 ptr/len combo, they are equivalent. But writing
it the second way (as this patch does) has the property that if somebody
were to incorrectly pass a NULL pointer with a non-zero length, we'd
continue to notice and segfault, rather than silently pretending the
length was zero.

Signed-off-by: Jeff King <redacted>
Signed-off-by: Junio C Hamano <redacted>