MD5 Collision Attack

Hi !

Today I’m going to discuss about MD5 Collision Attack. But before proceeding, I’ll just give a brief on the HAsh function. A “hash” (also called a “digest”, and informally a “checksum”) is a kind of “signature” for a stream of data that represents the contents. The closest real-life analog we can think is “a tamper-evident seal on a software package”: if you open the box (change the file), it’s detected.

So what is an MD5 Collision Attack and how is it constructed ?? Before going at this stage, let me first explain what hashing really means ! The very first point here is, to understand that hashing is NOT encryption. This is a common confusion, especially because all these words are in the category of “cryptography”, but it’s important to understand the difference. “Encryption” transforms data from a cleartext to ciphertext and back (given the right keys). Thus we see that “Encryption” is a two-way operation. Hashes, on the other hand, are stream of data into small digest, and it’s strictly a one way operation. All hashes of the same type – (this example shows the “MD5″ variety) – have the same size no matter how big the inputs are.

Now that we know the difference between “hashing” and “encryption”, it gives rise to another well known topic : Digital Signatures. A digital signature, is nothing but signing the hash of any document with private key to ensure that the message came from the intended user.

This gave rise to a potentially dangerous attack – MD5 Hash Collision Attack ! By now, we know that any message can be encrypted and a mathematical hash value (MD5, in our case) can be generated for that message. This when signed electronically, gives us the digital certificate. So, what if I can write two different messages that have the same MD5 hash !!! If such messages can be generated, then I can write two different messages with same MD5 hash and can get digital signature of victim on the innocent looking message and can prove that both messages come from the same user !

Confused ?? Let me explain this by a simple example. As I said earlier, in digital signature, Instead of signing a long message “Msg” you simply sign its hash Hash(Msg). This is useful and simplifies many issues … but if Hash(Msg) is identical to Hash(Msg’) then the signature is also valid for Msg.

STORY OF ALICE & HER BOSS BOB

Alice has working some weeks at the office of Bob. At the day Alice is supposed to leave, Bob writes a letter of recommendation for Alice on paper. The same day, Alice asks Bob to digitally sign the letter. For his convenience Alice gives an electronic copy of the document. Bob opens the document reads it, and finding nothing unusual, signs it.

As the time passed, Bob came to know that there has been a security breach with his confidential documents. Will he ever find out who tricked him and how ??

As a normal employee, Alice does not have any access to secret documents. So Alice decides to fool Bob. Because Bob is still relying on the widely used MD5 hash function, she implements the attack from Wang and Yu [WY05] to find MD5 collisions. When she receives her letter of recommendation (on paper), she prepares two postscript files with the same MD5 hash.

One displays a letter of recommendation(This is a good file), and second, an order from Bob to grant Alice some kind of a security clearance (This is an evil file). Now she asks Bob to sign the letter, who has no obvious reason to decline.

Due to the hash collision, Bob’s signature for the letter of recommendation is valid for the order, as well. Alice then presents order and digital signature to the person in charge of Bob’s files and finally gains access to Bob’s secret documents!

So do we have to worry about such attacks ?? Well… Statistically yes ! Considering the Pigeonhole principle, possibility of such attacks could not be negated completely. However, cracking the hash values and finding a collision would take long time in practical scenario. Reported attacks require an estimated work factor of 2^69 (approximately 590 billion billion) hash computations. While this is well beyond what is currently feasible using a normal computer, this is potentially feasible for attackers who have specialized hardware. For example, with 10,000 custom ASICs that can each perform 2 billion hash operations per second, the attack would take about one year.

For those who find this article interesting and would like to know more on hash collision attack, here are the links. These are the links, that would provide you with all the stuff one would require to actually construct a MD5 hash collision attack !

http://www.mscs.dal.ca/~selinger/md5collision/

http://www.codeproject.com/KB/security/HackingMd5.aspx

Enjoy !

Similar Posts you might be interested in:

7 Responses to “MD5 Collision Attack”

  1. w0lf says:

    Wow!!! Ne0 this is really quite an interesting stuff to read. But practically is this attack possible considering the computation time?

  2. Ne0 says:

    Hi w0lf ! Thanks for your comment. In practical scenario, this attack could be possible, if you have a high end system with good computational power. However, to get an idea of how this works, I would recommend you to go through the two links, that I’ve given at the end of blog. You’ll find the code and exe files that you could write to exploit this vulnerability.

  3. w0lf says:

    Ahaa!!! That was useful. But for sure this attack can be replayed on my poor configuration system :P

  4. Nirav Dave says:

    hi, Ne0,
    well, really nice article and good to know about it. keep posting such kind of informative articles.

  5. Ne0 says:

    Hi Nirav,

    Thank you very much for the input. I’ll be posting articles on regular basis… Do check them !

  6. b0yh4ck3r says:

    Ne0,

    Nice Article… Keep Rockin…

  7. Ne0 says:

    Thanks b0yh4ck3r ! U have chosen quite complicated name… I shud say !!! :)

Leave a Reply