It seems it finally happened: three Chinese researchers, Xiaoyun Wang , Yiqun Lisa Yin and Hongbo Yu announced they found a method of obtaining collisions on the full SHA-1 hash function. If this result is confirmed, it means that SHA-1 is broken.
Let us first mention that their paper has not been disclosed yet. The authors have published a note that summarizes the result they achieve, but does not give any detail on the way the attack is performed. For the moment, the crypto community is still waiting to see the attack itself. Yet, as it is the same team that broke MD5 and other hash functions this summer, there are reasons to believe their claim is true.
In their summary note, the authors announce an attack with complexity less than 2^69. This is a terrific cryptanalytic result: a real attack on an actual and widely-used standard. An attack with a complexity that is high, but not completely out of reach.
On the cryptographer’s point of view, it means that SHA-1 is broken: an attack has been found, with complexity much less than that of the intended design (2^80). Yet, this does not mean that we will see thousands of forged signatures appearing tomorrow, or that ecommerce applications are at immediate risk.
(Update: on August 16, 2005, the same team announced they could improve their attack to 2^63. This improvement is briefly commented below).
What is a hash function and what is it expected to do?
There are plenty of excellent references describing what a hash function is, and our goal is not to give a complete description here, but basically a hash function h is a function producing, from an arbitrary-length message, a fixed-length digest that will be some sort of commitment for the message itself. More precisely, the properties we expect from a hash function are:
- pre-image resistance: given a digest x, it is practically impossible to find a message m that hashes to that digest (i.e. so that h(m)=x).
- second pre-image resistance: given a message m and its digest x, it is practically impossible to find a second message m’ with the same digest (i.e. h(m)=h(m’)=x).
- collision resistance: it is practically impossible to find a pair of messages m, m’ that hash to the same digest. Note that you have an additional degree of freedom compared to previous requirement.
Additionally, hash functions are expected to behave like purely random mappings, but we will not enter that discussion here.
Hash functions are used for many purposes in cryptography. One is to build electronic signatures: rather than directly signing a message, the message is first hashed, and the signature function is applied on the (typically shorter) digest. This is one example where we can see the rationale for the security properties expected from a hash function: if the function is not second pre-image resistant, then an attacker can take a message m signed by Bob, find another message m’ with the same hash value, and claim that Bob signed m’. If the function is not collision resistant, then Bob can find two colliding messages m and m’, sign the first, and later claim that it was the second that he had signed.
Hash functions’ strength: the birthday paradox
Collision resistance is driven by the well-known birthday paradox. Roughly speaking, if you have a function with output space of size n, the number of elements you need to consider before finding two elements with same output is only proportional to the square root of n (and not to n itself). Expressed in bits, this means that a function with n-bit output will exhibit collisions after 2^(n/2) attempts.
This fact is well known to cryptographers, and basically means that you need hash functions with output size twice as large as the expected security level. The hash function MD5, for example, produces 128-bit output and was thus designed to resist “exhaustive search” attacks of complexity 2^64. With its output size of 160 bits, SHA-1 was expected to resist attacks of complexity 2^80.
The attack against SHA-1
In their summary note, Xiaoyun Wang , Yiqun Lisa Yin and Hongbo Yu announce they can find collisions with complexity less than 2^69 hash operations. This is an attack on the collision resistance property, which is much more efficient (2000 times more efficient, in fact) than birthday attack. A collision example on a reduced-round version is also provided, but very little details are given on the attack.
In the long term, SHA-1 will not be used anymore, even if it will take decades before it really disappears from all legacy systems. A more interesting question is what will be used instead: as a matter of fact, there are not so many hash functions available today, and many of them are based on the same design principles as MD5 and SHA. The National Institute of Standards and Technology (NIST) has recently standardized successors of SHA-1 (FIPS 180-2), that take more conservative options, but it is not obvious that they will not be harmed by improved versions of the published attack. Yet, if you are designing a new security application and need a hash function, SHA-256 or SHA-512 are probably among the best available choices today.
The remaining question is “in the short term, what should we do with all existing systems based on SHA-1?” In current situation, there is no need to rush for patches and upgrades: we can afford the time to study the implications of this attack on each system.
A first question to ask is whether the attack makes sense in a given context. The attack is against the collision resistance of SHA-1, and not all systems need collision resistance. For example, existing signatures are not at risk, because this would need a second pre-image attack, which is, generally speaking, more difficult.
Even if collision resistance is needed, note that, with a complexity of 2^69, this attack still leaves SHA-1 more resistant than what MD5 was expected to provide if it had not been broken. Experts have been recommending departing from MD5 for a few years, because that security level was not deemed adequate anymore, but that was not an emergency call: like AES replaced DES, it was time to launch a well-thought replacement process (which was precisely the purpose of FIPS 180-2). The new attack makes the need for new hash functions more pressing, but does not put actual systems at immediate risk.
It is not clear whether the required effort is achievable or not with today’s computing power: we know that an effort of 2^55 operations has been done several times (and becomes reasonably easy), and that 2^64 has been done at least once. Yet, this represents huge effort requiring very long distributed computing and/or costly dedicated hardware. Storage space might also be a critical factor here, but we have no information about the storage needed by the SHA-1 attack. Even if it is possible, this attack is most probably not relevant for most practical applications.
Yet, attacks only improve, so what we write today might become false tomorrow. The complexity of the attack might be reduced, or extensions of the attack might appear that apply to other contexts. For sure, this is a topic we will have to keep an eye on.
(Update: as a matter of fact, improvements did appear, reducing the attack's complexity to 2^63. This is still a huge amount of computation, but definitively means that we should expect actual SHA-1 collisions to be exhibited in the not-so-far future).
Bruce Schneier’s initial announcement:
Cryptographers’ panel discussing this issue at RSA 2005 (webcast):
Arjen Lenstra’s comments on the attack: http://cm.bell-labs.com/who/akl/hash.pdf
Ecrypt's position paper:
Practical applications of the MD5 attack
Two postscript files with the same hash: http://www.cits.rub.de/MD5Collisions
Two pieces of code with same MD5 hash: http://www.codeproject.com/dotnet/HackingMd5.asp
Two colliding X.509 Certificates based on MD5 collisions: http://www.win.tue.nl/~bdeweger/CollidingCertificates
Does the proof of the Riemann hypothesis really bring the whole of ecommerce to its knees?
Recent news argues that a proof of Riemann hypothesis has been discovered, and that this breakthrough could "bring disaster for ecommerce". What are the actual implications on cryptography? more