The GitHub Blog, August 19
Highlights from Git 2.51

 

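Git 2.51 brings a number of improvements, including multi-pack index optimizations: techniques such as “cruft-free” MIDXs and “path walk” noticeably improve repository read and write performance. The release also introduces a more flexible stash interchange format that makes it easy to move stash entries between machines. In addition, git cat-file handles submodules more gracefully, changed-path Bloom filters support multiple paths, git switch and git restore are no longer experimental, git whatchanged is marked as deprecated, and the release previews some major changes planned for Git 3.0, such as defaulting to SHA-256 and the reftable backend. On the internal development side, Git has begun standardizing its use of C99 features and has relaxed its contributor identity requirements.

✨ **Cruft-free multi-pack index optimization**: Git 2.51 introduces a new repacking behavior that ensures the set of non-cruft packs is closed under reachability, along with a new `repack.MIDXMustContainCruft` configuration option. On GitHub's primary monorepo, this shrank MIDXs by about 38%, made writing them 35% faster, and improved read performance by about 5%.

🌳 **Smaller packs with path walk**: Following the name-hash v2 feature in Git 2.49, Git 2.51 adds the “path walk” way of collecting objects. It gathers and processes all objects at the same path together, which avoids relying on the name-hash heuristic and can produce smaller packfiles than before, with competitive packing times.

📦 **Stash interchange format**: Git 2.51 allows multiple stash entries to be represented as a sequence of commits by adding a parent that refers to the previous stash entry. Stashes can then be exported, pushed, and pulled like an ordinary branch or tag, which makes it much easier to migrate and sync stash contents between machines.

💡 **Other notable updates**: better submodule support in `git cat-file --batch-check`, multi-path optimizations for changed-path Bloom filters, `git switch` and `git restore` becoming stable, `git whatchanged` being marked as deprecated, and a preview of potential breaking changes in Git 3.0 (such as SHA-256 as the default hash).

⚙️ **Modernized internal development**: The Git project has started adopting C99 features more actively, for example by allowing the `bool` keyword. It has also relaxed the identity requirements for patch submissions, allowing contributions under a name other than a legal name, which brings it closer in line with the Linux kernel's practice and encourages broader community participation.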

The open source Git project just released Git 2.51 with features and bug fixes from over 91 contributors, 21 of them new. We last caught up with you on the latest in Git back when 2.50 was released.

To celebrate this most recent release, here is GitHub’s look at some of the most interesting features and changes introduced since last time.

Cruft-free multi-pack indexes

Returning readers will likely have seen our coverage of cruft packs, multi-pack indexes (MIDXs), and reachability bitmaps. In case you’re new around here or otherwise need a refresher, here’s a brief overview:

Git stores repository contents as “objects” (blobs, trees, commits), either individually (“loose” objects, e.g. $GIT_DIR/objects/08/10d6a05...) or grouped into “packfiles” ($GIT_DIR/objects/pack). Each pack has an index (*.idx) that maps object hashes to offsets. With many packs, lookups slow down to O(M*log(N)), where M is the number of packs in your repository and N is the number of objects within a given pack.

A MIDX works like a pack index but covers the objects across multiple individual packfiles, reducing the lookup cost to O(log(N)), where N is the total number of objects in your repository. We use MIDXs at GitHub to store the contents of your repository after splitting it into multiple packs. We also use MIDXs to store a collection of reachability bitmaps for some selection of commits to quickly determine which object(s) are reachable from a given commit¹.
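To make the MIDX idea concrete, here is a minimal sketch that builds a throwaway repository containing two packs and then writes a single multi-pack index over both. The temporary directory, file names, and commit messages are illustrative assumptions; `git multi-pack-index` is a long-standing command rather than something new in 2.51.

```shell
# Scratch repository; every name below is illustrative.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo" || exit 1
git config user.name "Example"
git config user.email "example@example.com"

for n in 1 2; do
    echo "content $n" > "file$n.txt"
    git add "file$n.txt"
    git commit -q -m "commit $n"
    # Pack only the objects that are currently loose, so each
    # iteration leaves behind one additional packfile.
    git repack -q -d
done

# Write one index that covers the objects in every pack, then check it.
git multi-pack-index write
git multi-pack-index verify

pack_count=$(ls .git/objects/pack/*.pack | wc -l | tr -d ' ')
echo "packs: $pack_count"
ls .git/objects/pack/
```

The listing should show a `multi-pack-index` file sitting next to the per-pack `*.idx` files; lookups can then consult that one file instead of each pack index in turn.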

However, we store unreachable objects separately in what is known as a “cruft pack”. Cruft packs were meant to exclude unreachable objects from the MIDX, but we realized pretty quickly that doing so was impossible. The exact reasons are spelled out in this commit, but the gist is as follows: if a once-unreachable object (stored in a cruft pack) later becomes reachable from some bitmapped commit, but the only copy of that object is stored in a cruft pack outside of the MIDX, then that object has no bit position, making it impossible to write a reachability bitmap.

Git 2.51 introduces a change to how the non-cruft portion of your repository is packed. When generating a new pack, Git used to exclude any object which appeared in at least one pack that would not be deleted during a repack operation, including cruft packs. In 2.51, Git now stores additional copies of objects (and their ancestors) whose only other copy is within a cruft pack. Carrying this process out repeatedly guarantees that the set of non-cruft packs does not have any object which reaches some other object not stored within that set of packs. (In other words, the set of non-cruft packs is closed under reachability.)

As a result, Git 2.51 has a new repack.MIDXMustContainCruft configuration option, which uses the new repacking behavior described above to store cruft packs outside of the MIDX. Using this at GitHub has allowed us to write significantly smaller MIDXs in a fraction of the time, resulting in faster repository read performance overall. (In our primary monorepo, MIDXs shrank by about 38%, we wrote them 35% faster, and read performance improved by around 5%.)

Give cruft-less MIDXs a try today using the new repack.MIDXMustContainCruft configuration option.
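As a rough sketch of what opting in might look like, the snippet below sets the option in a scratch repository. It assumes, based on the git-config documentation, that the option defaults to true (cruft packs are always included in the MIDX) and that setting it to false is what enables the cruft-free behavior; the temporary repository is purely illustrative.

```shell
# Scratch repository so the sketch does not touch a real one.
repo=$(mktemp -d)
git init -q "$repo"

# Assumption: true is the default; false lets repack leave cruft
# packs out of the MIDX whenever reachability closure allows it.
git -C "$repo" config repack.MIDXMustContainCruft false

# Confirm what later repacks will see.
setting=$(git -C "$repo" config --get repack.MIDXMustContainCruft)
echo "repack.MIDXMustContainCruft = $setting"
```

With the option set, a later maintenance repack that writes a MIDX (for example, `git repack -d --cruft --write-midx` on Git 2.51 or newer) would be expected to pick the setting up.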

[source]

Smaller packs with path walk

In Git 2.49, we talked about Git’s new “name-hash v2” feature, which changed the way that Git selects pairs of objects to delta-compress against one another. The full details are covered in that post, but here’s a quick gist. When preparing a packfile, Git computes a hash of all objects based on their filepath. Those hashes are then used to sort the list of objects to be packed, and Git uses a sliding window to search between pairs of objects to identify good delta/base candidates.

Prior to 2.49, Git used a single hash function based on the object’s filepath, with a heavy bias towards the last 16 characters of the path. That hash function, dating back all the way to 2006, works well in many circumstances, but can fall short when, say, unrelated blobs appear in paths whose final 16 characters are similar. Git 2.49 introduced a new hash function which takes more of the directory structure into account², resulting in significantly smaller packs in some circumstances.

Git 2.51 takes the spirit of that change and goes a step further by introducing a new way to collect objects when repacking, called “path walk”. Instead of walking objects in revision order with Git emitting objects with their corresponding path names along the way, the path walk approach emits all objects from a given path at the same time. This approach avoids the name-hash heuristic altogether and can look for deltas within groups of objects that are known to be at the same path.

As a result, Git can generate packs using the path walk approach that are often significantly smaller than even those generated with the new name hash function described above. Its timings are competitive even with generating packs using the existing revision order traversal.

Try it out today by repacking with the new --path-walk command-line option.
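A minimal sketch of trying this on a scratch repository follows. The `repack_all` helper is hypothetical (it is not part of Git): it asks for `--path-walk` first and falls back to the default traversal on a Git that does not know the option, so the sketch also runs on older installations. The repository contents are made up for illustration.

```shell
# Hypothetical helper: repack the repository at "$1" into a single pack,
# preferring path walk and falling back where the option is unavailable.
repack_all() {
    if git -C "$1" repack -a -d -q --path-walk 2>/dev/null; then
        echo "path-walk"
    else
        git -C "$1" repack -a -d -q && echo "default"
    fi
}

# Scratch repository with several versions of one path to delta against.
repo=$(mktemp -d)
git init -q "$repo"
for n in 1 2 3; do
    seq 1 $((n * 200)) > "$repo/data.txt"
    git -C "$repo" add data.txt
    git -C "$repo" -c user.name=Example -c user.email=example@example.com \
        commit -q -m "version $n"
done

mode=$(repack_all "$repo")
echo "repacked using: $mode"

packs=$(ls "$repo"/.git/objects/pack/*.pack | wc -l | tr -d ' ')
echo "packs after repack: $packs"
```

On a real repository, comparing the size of `.git/objects/pack` before and after a path-walk repack is the simplest way to see whether the approach pays off for your history.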

[source]

Stash interchange format

If you’ve ever needed to switch to another branch, but wanted to save any uncommitted changes, you have likely used git stash. The stash command stores the state of your working copy and index, and then restores your local copy to match whatever was in HEAD at the time you stashed.

If you’ve ever wondered how Git actually stores a stash entry, then this section is for you. Whenever you push something onto your stash, Git creates three³ commits behind the scenes. There are two commits generated which capture the staged and unstaged changes. The staged changes represent whatever was in your index at the time of stashing, and the working directory changes represent everything you changed in your local copy but didn’t add to the index. Finally, Git creates a third commit listing the other two as its parents, capturing the entire snapshot.

Those internally generated commits are stored in the special refs/stash ref, and multiple stash entries are managed with the reflog. They can be accessed with git stash list, and so on. Since there is only one stash entry in refs/stash at a time, it’s extremely cumbersome to migrate stash entries from one machine to another.
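If you want to look at this structure yourself, the following sketch creates a stash in a scratch repository and prints the commit that refs/stash points at together with its parents. File names and contents are illustrative. For orientation, the git-stash documentation describes the first parent as the commit HEAD pointed at when stashing and the second as the commit recording the index.

```shell
# Scratch repository; every name below is illustrative.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo" || exit 1
git config user.name "Example"
git config user.email "example@example.com"

echo "base" > tracked.txt
git add tracked.txt
git commit -q -m "base"

# One staged change and one unstaged change, then stash both.
echo "staged" >> tracked.txt
git add tracked.txt
echo "unstaged" >> tracked.txt
git stash push -q

# Print the stash commit followed by the IDs of its parents.
git rev-list --parents -n 1 refs/stash
parent_count=$(git rev-list --parents -n 1 refs/stash | awk '{ print NF - 1 }')
echo "parents of the stash commit: $parent_count"

# Older entries are reachable only through the reflog of refs/stash.
git stash list
git reflog show refs/stash
```

Because everything except the newest entry lives in a reflog, which is local to one repository, there is nothing for an ordinary push or fetch to transfer.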

Git 2.51 introduces a variant of the internal stash representation that allows multiple stash entries to be represented as a sequence of commits. Instead of using the first two parents to store changes from the index and working copy, this new representation adds one more parent to refer to the previous stash entry. That results in stash entries that contain four⁴ parents, and can be treated like an ordinary log of commits.

As a consequence of that, you can now export your stashes to a single reference, and then push or pull it like you would a normal branch or tag. Git 2.51 makes this easy by introducing two new sub-commands to git stash to import and export, respectively. You can now do something like:

$ git stash export --to-ref refs/stashes/my-stash
$ git push origin refs/stashes/my-stash

on one machine to push the contents of your stash to origin, and then:

$ git fetch origin '+refs/stashes/*:refs/stashes/*'
$ git stash import refs/stashes/my-stash

on another, preserving the contents of your stash between the two.

[source]


All that…

Now that we’ve covered some of the larger changes in more detail, let’s take a quicker look at a selection of some other new features and updates in this release.

…and a bag of chips

That’s just a sample of changes from the latest release. For more, check out the release notes for 2.51, or any previous version in the Git repository.


1 For some bit position (corresponding to a single object in your repository), a 1 means that object can be reached from that bitmap’s associated commit, and a 0 means it is not reachable from that commit. There are also four type-level bitmaps (for blobs, trees, commits, and annotated tags); the XOR of those bitmaps is the all-1s bitmap. For more details on multi-pack reachability bitmaps, check out our previous post on Scaling monorepo maintenance.

2 For the curious, each layer of the directory is hashed individually, then downshifted and XORed into the overall result. This results in a hash function which is more sensitive to the whole path structure, rather than just the final 16 characters.

3 Usually. Git will sometimes generate a fourth commit if you stashed untracked files (new files that haven’t yet been committed) or ignored files (files that match one or more patterns in a .gitignore).

4 Or five.

5 Almost to the day; Git 2.23 was released on August 16, 2019, and Git 2.51 was released on August 18, 2025.

6 It’s true; git --list-cmds=builtins | wc -l outputs “144” with Git 2.51.

7 If you are somehow a diehard git whatchanged user, please let us know by sending a message to the Git mailing list.

The post Highlights from Git 2.51 appeared first on The GitHub Blog.
