fetch: add --must-have and remote.*.mustHave#2085
Open
derrickstolee wants to merge 4 commits intogitgitgadget:masterfrom
Open
fetch: add --must-have and remote.*.mustHave#2085derrickstolee wants to merge 4 commits intogitgitgadget:masterfrom
derrickstolee wants to merge 4 commits intogitgitgadget:masterfrom
Conversation
The 'fetch follows tags by default' test sorts using 'sort -k 4', but for-each-ref output only has 3 columns. This relies on sort treating records with fewer fields as having an empty fourth field, which may produce unstable results depending on locale. Use 'sort -k 3' to match the actual number of columns in the output. Signed-off-by: Derrick Stolee <stolee@gmail.com>
Add a --must-have option to git fetch that specifies ref patterns whose tips should always be sent as "have" commits during negotiation, regardless of what the negotiation algorithm selects. Each value is either an exact ref name (e.g. refs/heads/release) or a glob pattern (e.g. refs/heads/release/*). The pattern syntax is the same as for --negotiation-tip. This is useful when certain references are important for negotiation efficiency but might be skipped by the negotiation algorithm or excluded by --negotiation-tip. Unlike --negotiation-tip which restricts the have set, --must-have is additive: the negotiation algorithm still runs and advertises its own selected commits, but the refs matching --must-have are sent unconditionally on top of those. If --negotiation-tip is used, the have set is first restricted by that option and then increased to include the tips specified by --must-have. Due to the comparision with --negotiation-tip, a previously untranslated warning around --negotiation-tip is converted into a translatable string with a swap for which option that is relevant. Getting this functionality to work requires moving these options through the transport API layer. Signed-off-by: Derrick Stolee <stolee@gmail.com>
Add a new multi-valued config option remote.<name>.mustHave that specifies ref patterns whose tips should always be sent as "have" commits during fetch negotiation with that remote. Parse the option in handle_config() following the same pattern as remote.<name>.serverOption. Store the values in a string_list on struct remote so they are available per-remote. In builtin/fetch.c, when no --must-have options are given on the command line, use the remote.<name>.mustHave config values as the default. If the user explicitly provides --must-have on the CLI, the config is not used, giving CLI precedence. Signed-off-by: Derrick Stolee <stolee@gmail.com>
When push.negotiate is enabled, send-pack spawns a 'git fetch --negotiate-only' subprocess to discover common commits. Previously this subprocess had no way to include must-have refs in the negotiation. Add a must_have field to send_pack_args, set it from the transport layer where the remote struct is available, and pass explicit --must-have arguments to the negotiation subprocess. This approach directly passes the resolved config values rather than relying on the subprocess to read remote config, which is more robust when the URL alone is used as the remote identifier. Signed-off-by: Derrick Stolee <stolee@gmail.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fetch negotiation aims to find enough information from haves and wants such that the server can be reasonably confident that it will send all necessary objects and not too many "extra" objects that the client already has. However, this can break down if there are too many references, since Git truncates the list of haves based on a few factors (a 256 count limit or the server sending an ACK at the right time).
We already have the
--negotiation-tipfeature to focus the set of references that are used in negotiation, but I feel like this is designed backwards. I'd rather that we have a way to say "this is an important set of refs, but feel free to add more refs if needed" than "only use these refs for negotiation".Here's an example that demonstrates the problem. In an internal monorepo, developers work off of the 'main' branch so there are thousands of user branches that each add a few commits different from the 'main' branch. However, there is also a long-lived 'release' branch. This branch has a first-parent history that is parallel to 'main' and each of those commits is a merge whose second parent is a commit from 'main' that had a successful CI run. There are additional changes in the 'release' branch merge commits that add some changelog data, so there is a nontrivial set of novel blob content in that branch and not just a different set of commits.
The problem we had was that our georeplication system was regularly fetching from the origin and trying to get all data from all reachable branches. When the 'release' branch updated, the client would run out of haves before advertising its copy of the 'release' branch, but it would still list the new 'release' tip as a want. The server would then think that the client had never fetched that branch before and would send all of the changelog data from the whole history of the repo. (This led to a lot of downstream problems; we mitigated by setting a refspec that stopped fetching the 'release' branch, but this is not ideal.)
What I'd like is a mechanism to say "always advertise the client's version of 'main' and 'release' but also opportunistically include some user branches".
Based on my understanding, the '--negotiation-tip' option is close but not quite what I want. I could have the client only advertise 'release' and 'main' and never advertise any user branches. But then we'd download all content from each user branch every time it updates. Perhaps this would happen even with opportunistic inclusion of more haves, but I'd like to explore this area more.
There's also an issue that the '--negotiation-tip' feature doesn't seem to have a config key that enables it without CLI arguments. This is something that we could consider independently.
This patch series adds a new '--must-have' argument in the same places that '--negotiation-tip' exists. This adds a set of references that will always be included in the negotiation as haves, but then the rest of the negotiation can proceed as normal. It also adds a new '
remote.<name>.mustHave' config option to enable this behavior without CLI arguments.The series is organized into four changes:
--must-haveoption for fetch and tests it relative to--negotiation-tip.--must-haveby default. Any use in the CLI completely overrides the config option.git push.I've CC'd some folks who have been working on or near the
--negotiation-tipfeature for feedback as to how these things fit together in the bigger picture.Thanks,
-Stolee
cc: gitster@pobox.com
cc: jonathantanmy@google.com
cc: chooglen@google.com
cc: ps@pks.im