Git fscked by SHA-1 collision? Not so fast, says Linus Torvalds

About that SHA-1 collision: Linus Torvalds has taken to Google+ to emphasise that in Git, the hash's main role is error detection, so “the sky isn’t falling”.

The old algorithm is used, among other things, to provide a digital signature for software, documents like PDFs, and digital certificates. The mathematical operation should produce a unique result for any given input, but Google’s work showed it could be tricked into producing “collisions” – two different PDFs gave the same SHA-1 hash.

In the Git software repository system – authored by Torvalds – SHA-1 proves you’re fetching the repo you think you’re fetching: a collision means an attacker could insert a backdoor in a program and the victim would think they’re fetching a “safe” repo.

Not so fast, Torvalds writes. Let’s take his bullet list, point by point, with explanations.

“First off – the sky isn’t falling.

There’s a big difference between using a cryptographic hash for things like security signing, and using one for generating a ‘content identifier’ for a content-addressable system like git”

While the algorithm does have a security role even in Git, error detection is more important, Torvalds says, because trust should be a function of the community.

If you fetch a Linux kernel from Linus’ repo, it’s because that’s where you expect the authoritative kernel to be: the hash is there so if something flipped a bit in storage, the hash won’t match, and you know something’s wrong.

“In contrast, in a project like git, the hash isn’t used for ‘trust’. I don’t pull on peoples trees because they have a hash of a4d442663580. Our trust is in people, and then we end up having lots of technology measures in place to secure the actual data.

“Think of [SHA-1] like ‘parity on steroids’: it’s not able to correct for errors, but it’s really really good at detecting corrupt data.”
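
To make the “parity on steroids” point concrete, here is a minimal Python sketch of how Git derives a content identifier for a file (a “blob”): it hashes a short header followed by the raw bytes. The helper names `git_blob_id` and `flip_bit` are ours, chosen for illustration; they are not part of Git.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object ID Git assigns to a blob with this content."""
    # Git hashes a header of the form "blob <size>\0" followed by the raw bytes.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def flip_bit(content: bytes, bit_index: int) -> bytes:
    """Return a copy of content with one bit inverted (simulated corruption)."""
    corrupted = bytearray(content)
    corrupted[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(corrupted)


if __name__ == "__main__":
    original = b"hello world\n"
    damaged = flip_bit(original, 0)  # illustrative: corrupt the very first bit
    print("original:", git_blob_id(original))
    print("damaged: ", git_blob_id(damaged))
    print("corruption detected:", git_blob_id(original) != git_blob_id(damaged))
```

A single flipped bit yields a different identifier, so the damage is detected. Nothing in the hash says which bit flipped, which is why it can detect corruption but not correct it.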

“Secondly, the nature of this particular SHA1 attack means that it’s actually pretty easy to mitigate against, and there’s already been two sets of patches posted for that mitigation.”

The reason mitigation is “pretty easy”, he explained, is that to generate a collision, the attacker has to control both the “good” object and the “bad” object – and an attack is detectable on both sides of the collision.

For Git, this makes attacks hard to hide, unlike in the PDF-based proof of concept: “the pdf format acted as the ‘black box’, and what you see is the printout which has only a very indirect relationship to the pdf encoding.”

In Git, “If somebody inserts random odd generated crud in the middle of your source code, you will absolutely notice”, and even if someone found somewhere to hide a collision, git fsck “already warns about those kinds of shenanigans”.

In a lengthy mailing list discussion, he elaborates: “You also need to make the non-opaque data of the bad object be something that actually encodes valid git data with interesting hashes in it (for the parent/tree/whatever pointers)”.
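
For readers unfamiliar with what those pointers look like: a commit object is plain text whose header names its tree and parent commits by hash. The sketch below is a simplified reader for that header, not Git's own parser; `parse_commit_pointers` is a hypothetical helper, and the parent ID in the usage example is made up.

```python
import re

HEX_ID = re.compile(r"[0-9a-f]{40}")  # a full SHA-1 object ID in hex


def parse_commit_pointers(raw: bytes) -> dict:
    """Extract the tree and parent IDs from a raw commit object body.

    Raises ValueError if the header does not carry well-formed pointers.
    """
    # The header ends at the first blank line; the commit message follows.
    header, _, _message = raw.partition(b"\n\n")
    pointers = {"tree": None, "parents": []}
    for line in header.decode("utf-8", errors="replace").split("\n"):
        key, _, value = line.partition(" ")
        if key not in ("tree", "parent"):
            continue  # author, committer and other header lines are skipped
        if not HEX_ID.fullmatch(value):
            raise ValueError(f"malformed {key} pointer: {value!r}")
        if key == "tree":
            pointers["tree"] = value
        else:
            pointers["parents"].append(value)
    if pointers["tree"] is None:
        raise ValueError("commit has no tree pointer")
    return pointers


if __name__ == "__main__":
    sample = (
        b"tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n"
        b"parent 1111111111111111111111111111111111111111\n"  # made-up ID
        b"author A U Thor <author@example.com> 0 +0000\n"
        b"committer A U Thor <author@example.com> 0 +0000\n"
        b"\n"
        b"Example commit\n"
    )
    print(parse_commit_pointers(sample))
```

Random collision filler does not survive a reader like this: the parts of the object that are not opaque have to parse as pointers, and those pointers in turn have to name real objects.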

And even then, discovery is trivial – run git fsck nightly to see if anyone’s fooling around with code.
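
A nightly check of that kind could be as small as the following Python wrapper around `git fsck --full`, suitable for calling from cron or a similar scheduler. This is a sketch under assumptions: the helper names and the default repository path are ours, and what to do with the report (mail it, log it) is left open.

```python
import subprocess
import sys


def fsck_command(repo_path: str) -> list:
    """Build the command line for a full integrity check of one repository."""
    return ["git", "-C", repo_path, "fsck", "--full"]


def run_fsck(repo_path: str):
    """Run git fsck and return (ok, report); ok is False if git reported errors."""
    result = subprocess.run(fsck_command(repo_path), capture_output=True, text=True)
    # git fsck exits with a non-zero status when it finds errors.
    return result.returncode == 0, result.stdout + result.stderr


if __name__ == "__main__":
    # Assumption: the repository path is passed as the first argument.
    repo = sys.argv[1] if len(sys.argv) > 1 else "."
    ok, report = run_fsck(repo)
    print(report, end="")
    sys.exit(0 if ok else 1)
```

Exiting non-zero on failure lets the scheduler's own alerting pick up a bad run without extra plumbing.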

“And finally, there’s actually a reasonably straightforward transition to some other hash that won’t break the world – or even old git repositories.”

“There’s a plan, it doesn’t look all that nasty, and you don’t even have to convert your repository”, he notes. On the mailing list, he says SHA3-256 looks like a sensible replacement hash. ®

Source: The Register – Security @ February 26, 2017 at 02:09PM