(14 intermediate revisions by 3 users not shown) | |||
Line 1: | Line 1: | ||
− | == | + | <div style="text-align: center;"> |
− | + | == Password Hashing & The SHA Algorithm == | |
+ | Ayush Krishnamony, Michael Keeley, Dominic Nale, Arnav Mehra | ||
+ | </div> | ||
− | |||
− | SHA is a family of hashing algorithms developed by the NSA and is published by the National Institute of Standards and Technology (NIST). First SHA algorithm, SHA-0, can input 160 bit messages and outputs a | + | == What Is Hashing? == |
+ | |||
+ | In today’s world, the average person has 70-80 different passwords across multiple accounts. While users wouldn’t care if their Webkinz password got leaked, bank account details being stolen is much more detrimental to users. In order to keep them safe, banks are required by the FTC to have proper security to keep users passwords safe. The common algorithm used is hashing. Technically speaking, hashing is a method of making a finite string of characters into a fixed size. However, in this paper, we will focus on a form of hashing that attempts to scramble the input beyond repair, one-way hashing. In other words, one-way hashing is the idea that while converting input (passwords) to a hash value is computationally simple, reverting to our original input is next to impossible. | ||
+ | |||
+ | |||
+ | == Why do we Hash Passwords? == | ||
+ | When we input our password into a website, it, alongside some website-specific string is converted into a hashed value. It is this hash value that gets stored in the website's database, and later, used to verify your password when logging in. | ||
+ | |||
+ | To better understand why we use hashing for password verification, let us imagine a world without hashing. Facebook pushes an update that, unbeknownst to them, creates a vulnerability in their username/password database. This data is now vulnerable to hackers. If an adversary were to exploit this vulnerability, they could use the information to not only log into any user's Facebook, but any site a user uses a similar password on as the password is visible to the naked eye. One such adversary does exactly that, and because the passwords are not hashed, easily identifies user passwords across multiple sites, rather than just one. Many users can now kiss not only their Facebook goodbye, but also their bank details, insurance details, etc. | ||
+ | |||
+ | Now, let us redo this scenario in a world where hashing, more specifically one-way hashing, is used. The adversary, once again, exploits Facebook's vulnerability and has access to their username/password database. But now, because passwords only appear as nonsensical hashes, they have no clue as to each user's original password. At this point in time, the hacker would be left with only one option, to reverse the hashing algorithm. In our paper, we will go into detail about what makes this an almost impossible task and the most common algorithm used to create this hash in the first place, the Secure Hash Algorithm, or SHA. | ||
+ | |||
+ | |||
+ | ==What is SHA?== | ||
+ | SHA is a family of one-way hashing algorithms developed by the NSA and is published by the National Institute of Standards and Technology (NIST). First SHA algorithm, SHA-0, can input 160 bit messages and outputs a 160-bit hash value. This type of algorithm didn’t have much of a use due to the fact that it had a security flaw and was replaced with SHA-1 in 1995. There are several uses for SHA-1. We talked about the use of SHA-1 for password hashing, but there are also uses for verification. Say we want to download a file from a website other than the developers website. If we can use the SHA-1 hashed value of the file from the website and compare it to the developers, if the values don’t line up, we know that the file is not only not the same but could also contain malware. These are just some of the uses for SHA 1. But the question remains, how exactly does the SHA-1 algorithm work? | ||
+ | |||
== SHA-1: A Good Starting Point == | == SHA-1: A Good Starting Point == | ||
+ | In this section, we will be walking you though how the SHA-1 algorithm would generate a hash-value for the password/input of "Hello". | ||
+ | |||
+ | <small> | ||
===== Step 1: ===== | ===== Step 1: ===== | ||
+ | </small> | ||
Initialize 5 strings as "Hash Values" | Initialize 5 strings as "Hash Values" | ||
Line 18: | Line 38: | ||
H5 = 0xC3D2E1F0 <br/> | H5 = 0xC3D2E1F0 <br/> | ||
− | + | <small> | |
===== Step 2:===== | ===== Step 2:===== | ||
+ | </small> | ||
Take the word you want to hash (in binary), and append a 1 to the end of it. Then append as many zeros as it takes to make it divisible by 512, with the length of the message in a 64 bit integer appended at the end of the string. | Take the word you want to hash (in binary), and append a 1 to the end of it. Then append as many zeros as it takes to make it divisible by 512, with the length of the message in a 64 bit integer appended at the end of the string. | ||
Line 45: | Line 66: | ||
'''010010000110010101101100011011000110111110………0101000''' | '''010010000110010101101100011011000110111110………0101000''' | ||
− | + | <small> | |
====Step 3A:==== | ====Step 3A:==== | ||
+ | </small> | ||
Break message into 512-bit chunks | Break message into 512-bit chunks | ||
Line 52: | Line 74: | ||
Taking the example from above, since it is 512 bits, we will only have one chunk | Taking the example from above, since it is 512 bits, we will only have one chunk | ||
+ | <small> | ||
====Step 3B:==== | ====Step 3B:==== | ||
+ | </small> | ||
Break each chunk into 16, 32-bit words | Break each chunk into 16, 32-bit words | ||
Line 65: | Line 89: | ||
W<sub>15</sub> = 00000000000000000000000000101000<br/> | W<sub>15</sub> = 00000000000000000000000000101000<br/> | ||
+ | <small> | ||
====Step 4:==== | ====Step 4:==== | ||
+ | </small> | ||
Extend the first chunk to 80 32-bit words using the function | Extend the first chunk to 80 32-bit words using the function | ||
Line 117: | Line 143: | ||
(From now on we will be switching from binary representation to hex for simplicity) | (From now on we will be switching from binary representation to hex for simplicity) | ||
+ | <small> | ||
====Step 5A:==== | ====Step 5A:==== | ||
+ | </small> | ||
Initialize The "Holding Values" for the chunk with "Hash Values" | Initialize The "Holding Values" for the chunk with "Hash Values" | ||
Line 127: | Line 155: | ||
E = H5 | E = H5 | ||
+ | <small> | ||
====Step 5B:==== | ====Step 5B:==== | ||
+ | </small> | ||
For each word in the chunk, obtain an F and K value | For each word in the chunk, obtain an F and K value | ||
Line 162: | Line 192: | ||
+ | <small> | ||
====Step 5C:==== | ====Step 5C:==== | ||
+ | </small> | ||
Assign new "Holding Values" after every iteration | Assign new "Holding Values" after every iteration | ||
Line 196: | Line 228: | ||
A = 281919E97 | A = 281919E97 | ||
+ | <small> | ||
====Step 6:==== | ====Step 6:==== | ||
+ | </small> | ||
Add "Holding Values" to "Hash Values" | Add "Holding Values" to "Hash Values" | ||
Line 219: | Line 253: | ||
Repeat steps 5 and 6 for each chunk. After the last chunk, move on to step 7 | Repeat steps 5 and 6 for each chunk. After the last chunk, move on to step 7 | ||
+ | <small> | ||
====Step 7==== | ====Step 7==== | ||
+ | </small> | ||
Complete hash by appending H values back together using: | Complete hash by appending H values back together using: | ||
Line 250: | Line 286: | ||
HH = '''f7ff9e8b7bb2e09b70935a5d785e0cc5d9d0abf0''' | HH = '''f7ff9e8b7bb2e09b70935a5d785e0cc5d9d0abf0''' | ||
− | |||
− | + | == Why We Don't Still Use SHA-1: A Matter of Ethics == | |
− | + | An intuitive way to break SHA-1 relies on a simple concept in Mathematics known as the pigeonhole principle. Where there are 20 pigeon holes and 21 pigeons, if we fit all the pigeons in a box there will have to have more than one pigeon in a box. It is the case where a function maps a set m to a set n where the length of set n is less than m. The same applies for SHA-1, but also SHA-2 and all hashing functions since the inputs are [theoretically] infinite, but the output is finite. When two different inputs have the same output this is known as a “collision”. | |
− | + | The issue with collisions is that there is no way of knowing if the input given matches the original or intended input. So, if an attacker can consistently get a collision this would make SHA-1 a weak algorithm. | |
− | + | Google was successfully able to find a collision and employed techniques that reduced the complexity of the computations making it 100,000 times faster than brute forcing it. Even with this Google still used expensive hardware making this only feasible for well funded adversaries, but this attack could get easier as computers get faster. As well, the Chosen-Prefix Collision reduces the complexity from 2^80 (brute force) to 2^~61 which because it is exponential is a significant difference in terms of effort needed to accomplish this. This attack however still requires the adversaries to be well funded. Regardless, its use poses an ethical issue for those you entrust with securing your data and is proof that SHA-1 needs to be retired. | |
+ | Now it would seem that the issue of using collisions to bypass a hashing algorithm would seem universal and it is, but it is more challenging when the output is longer, as it is in SHA-2 (2^160 vs 2^224 Minimum), since the probability of getting a collision is less likely and SHA-1 collisions attacks were mostly reserved for well funded parties. In addition to SHA-2 being more complex as will be shown. Discouraging adversaries from using it since it is not practical to do, which is why organizations have moved away from SHA-1 to newer, more ethical hashing algorithms. | ||
− | + | When it comes to password cracking, an intuitive method hackers use to breach passwords involves getting the hashed version of the password from a data breach. Then hashing a series of guesses as to what the password is and if the hashes match up then the hacker has the password of the account or a collision. Either is sufficient in getting access to the account. A way switching from SHA-1 to SHA-2 corrects this is that SHA-2 takes longer and has less chances for collisions. Running through a dictionary to find a match in a database is essentially looking for needle in a haystack requiring thousands to millions of hashes. So if a hashing algorithm is even twice as long as SHA-1 it can make a big difference as to how reasonable it is to crack a hashed password. De incentivising attackers from doing it in the first place since it could take a while and not even find a match. | |
− | Due to the limitations of | + | |
+ | == The Solution: SHA-2 == | ||
+ | |||
+ | Due to the limitations of SHA-1 previously explained, the NSA designed and published SHA-2 in 2011. SHA-2 aims to solve some of the vulnerabilities of SHA-1 by performing more computation and shuffling at each iteration of the algorithm, allowing us to generate larger output hashes, and utilizing more "random" values. SHA-2 comes in a variety of output sizes, but for the purpose of this paper, we will be looking at SHA-256, which outputs a 256-bit hash value. | ||
<small> | <small> | ||
Line 269: | Line 308: | ||
</small> | </small> | ||
− | Similar to | + | Similar to SHA-1, we begin with the same preprocessing step that turns our input value into 512-bit chunks. For the string “Hello”, we will again have 1 chunk with the value: |
<div style="text-align: center;"> | <div style="text-align: center;"> | ||
Line 279: | Line 318: | ||
</small> | </small> | ||
− | For each chunk, we will run a similar extension function as | + | For each chunk, we will run a similar extension function as SHA-1, except we will only extend our original 16 32-bit words to 64 words rather than 80. While this may seem less secure, the actual extension function is a bit more complex and performs 9 additional operations at each step: |
<div style="margin-left: 50px"> | <div style="margin-left: 50px"> | ||
− | For i = 16 to 63: | + | For i = 16 to 63:<div style="margin-left: 50px"> |
− | <div style="margin-left: 50px"> | + | s0 = (w[i-15] '''rightrotate''' 7) '''xor''' (w[i-15] '''rightrotate''' 18) '''xor''' (w[i-15] '''rightshift''' 3)<br/> |
− | s0 = (w[i-15] | + | s1 = (w[i-2] '''rightrotate''' 17) '''xor''' (w[i-2] '''rightrotate''' 19) '''xor''' (w[i-2] '''rightshift''' 10)<br/> |
− | s1 = (w[i-2] | + | |
w[i] = w[i-16] + w[i-7] + s0 + s1 | w[i] = w[i-16] + w[i-7] + s0 + s1 | ||
</div> | </div> | ||
Line 303: | Line 341: | ||
</div> | </div> | ||
} | } | ||
− | + | <br/> | |
− | For i = 16 (the first iteration): | + | For i = 16 (the first iteration):<div style="margin-left: 50px"> |
− | <div style="margin-left: 50px"> | + | s0 = (w[1] '''rightrotate''' 7) '''xor''' (w[1] '''rightrotate''' 18) '''xor''' (w[1] '''rightshift''' 3)<br/> |
− | s0 = (w[1] | + | = (0x6F800000 '''rightrotate''' 7) '''xor''' (0x6F800000 '''rightrotate''' 18) '''xor''' (0x6F800000 '''rightshift''' 3)<br/> |
− | = (0x6F800000 | + | |
= 0xD2F1BE0<br/> | = 0xD2F1BE0<br/> | ||
− | s1 = (w[i-2] | + | s1 = (w[i-2] '''rightrotate''' 17) '''xor''' (w[i-2] '''rightrotate''' 19) '''xor''' (w[i-2] '''rightshift''' 10)<br/> |
− | = (w[i-2] | + | = (w[i-2] '''rightrotate''' 17) '''xor''' (w[i-2] '''rightrotate''' 19) '''xor''' (w[i-2] '''rightshift''' 10)<br/> |
= 0<br/> | = 0<br/> | ||
w[16] = w[i-16] + s0 + w[i-7] + s1<br/> | w[16] = w[i-16] + s0 + w[i-7] + s1<br/> | ||
Line 339: | Line 376: | ||
</small> | </small> | ||
− | From here, | + | From here, SHA-2 will compress these w-values onto our final hash values. Similar to SHA-1, our final hash values, or h-values, will be initialized to seemingly random numbers (the fractional portions of the roots of the first few primes in this case) and will transform as we consider each chunk. But, unlike SHA-1, SHA-2 will chew on 8 of these hash values. The pseudo-code for this step is as follows: |
<div style="margin-left: 50px"> | <div style="margin-left: 50px"> | ||
Line 349: | Line 386: | ||
</div> | </div> | ||
}<br/> | }<br/> | ||
− | For each chunk: | + | For each chunk:<div style="margin-left: 50px"> |
− | <div style="margin-left: 50px"> | + | |
// copy current h-values into an array, c<br/> | // copy current h-values into an array, c<br/> | ||
− | c = h<br/> | + | c = h<br/> |
− | // hash up c-values<br/> | + | // hash up c-values<br/>For i = 0 to 63:<div style="margin-left: 50px"> |
− | For i = 0 to 63: | + | |
− | <div style="margin-left: 50px"> | + | |
// compute 2 terms t1 and t2 to factor into hash-values<br/> | // compute 2 terms t1 and t2 to factor into hash-values<br/> | ||
− | s1 = (c[4] | + | s1 = (c[4] '''rightrotate''' 6) '''xor''' (c[4] '''rightrotate''' 11) '''xor''' (c[4] '''rightrotate''' 25)<br/> |
− | ch = (c[4] and c[5]) xor ((not c[4]) and c[6])<br/> | + | ch = (c[4] '''and''' c[5]) '''xor''' (('''not''' c[4]) '''and''' c[6])<br/> |
t1 = c[7] + s1 + ch + k[i] + w[i]<br/> | t1 = c[7] + s1 + ch + k[i] + w[i]<br/> | ||
− | s0 = (c[0] | + | s0 = (c[0] '''rightrotate''' 2) '''xor''' (c[0] '''rightrotate''' 13) '''xor''' (c[0] '''rightrotate''' 22)<br/> |
− | maj = (c[0] and c[1]) | + | maj = (c[0] '''and''' c[1]) '''rightrotate''' (c[0] '''and''' c[2]) '''rightrotate''' (c[1] '''and''' c[2])<br/> |
− | t2 = s0 + maj<br/> | + | t2 = s0 + maj<br/> |
− | // rotate hash-values<br/> | + | // rotate hash-values<br/>For i = 1 to 7:<div style="margin-left: 50px"> |
− | + | c[i] = c[i - 1] | |
− | + | ||
− | c[i] = c[i - 1] | + | |
</div> | </div> | ||
− | // integrate computed terms into hash-values | + | // integrate computed terms into hash-values<br/> |
c[4] += t1<br/> | c[4] += t1<br/> | ||
c[0] = t1 + t2 | c[0] = t1 + t2 | ||
</div> | </div> | ||
− | // modify original h-values using new c-values<br/> | + | // modify original h-values using new c-values<br/>For i = 0 to 7:<div style="margin-left: 50px"> |
− | For i = 0 to 7: | + | h[i] += c[i] |
− | <div style="margin-left: 50px"> | + | |
− | h[i] += c[i] | + | |
</div> | </div> | ||
</div> | </div> | ||
</div> | </div> | ||
− | In contrast to | + | In contrast to SHA-1, this computation differs in 3 main ways: |
* Only 64 rounds of computation are performed as opposed to 80. | * Only 64 rounds of computation are performed as opposed to 80. | ||
− | * More computation is performed at each step. | + | * More computation is performed at each step. SHA-2 utilizes 8 32-bit h-values (and therefore c-values as well) rather than 5, as SHA-1 does. When cycling c-values, SHA-2 will not only modify the first value, c[0], but the fifth as well, c[4]. |
− | * Rather than using 4 distinct k-values, | + | * Rather than using 4 distinct k-values, SHA-2 utilizes 64. In SHA-2, our k-values are the fractional portion of the cubic root of the first 64 primes. |
For our example, the first iteration would look like: | For our example, the first iteration would look like: | ||
<div style="margin-left: 50px"> | <div style="margin-left: 50px"> | ||
− | c = original 8 h-values<br/> | + | c = [original 8 h-values]<br/> |
− | For i = 0: | + | For i = 0:<div style="margin-left: 50px"> |
− | <div style="margin-left: 50px"> | + | s1 = (c[4] '''rightrotate''' 6) '''xor''' (c[4] '''rightrotate''' 11) xor (c[4] '''rightrotate''' 25)<br/> |
− | + | = (0x510E527Ff '''rightrotate''' 6) '''xor''' (0x510E527Ff '''rightrotate''' 11) xor (0x510E527Ff '''rightrotate''' 25)<br/> | |
− | + | = 0x3587272B<br/> | |
− | + | ch = (c[4] '''and''' c[5]) '''xor''' (('''not''' c[4]) '''and''' c[6])<br/> | |
− | + | = (0x510E527Ff '''and''' 0x9b05688c) '''xor''' (('''not''' 0x510e527f) and 0x1f83d9ab)<br/> | |
− | + | = 0x1F85C98C<br/> | |
− | + | t1 = c[7] + s1 + ch + k[i=0] + w[i=0]<br/> | |
− | + | = 0x5BE0CD19 + 0x3587272B + 0x1F85C98C + 0x428A2F98 + 0x48656C6C<br/> | |
− | + | = 0x3BDD59D4<br/> | |
− | s0 = (c[0] | + | s0 = (c[0] '''rightrotate''' 2) '''xor''' (c[0] '''rightrotate''' 13) '''xor''' (c[0] '''rightrotate''' 22)<br/> |
− | + | = 0xCE20B47E<br/> | |
− | + | mj = (c[0] '''and''' c[1]) '''xor''' (c[0] '''and''' c[2]) '''xor''' (c[1] '''and''' c[2])<br/> | |
− | + | = 0x3A6FE667<br/> | |
− | + | t2 = s0 + mj<br/> | |
− | + | = 0x08909AE5 | |
</div> | </div> | ||
c[7] = c[6] = 0x1F83D9AB<br/> | c[7] = c[6] = 0x1F83D9AB<br/> | ||
Line 434: | Line 464: | ||
</small> | </small> | ||
− | After all previous chunks have been compressed onto our h-values, our final hash is simply appending these | + | After all previous chunks have been compressed onto our h-values, our final hash is formed by simply appending these values as so: |
<div style="text-align: center"> | <div style="text-align: center"> | ||
Line 442: | Line 472: | ||
0x185F8DB3 2271FE25 F561A6FC 938B2E26 4306EC30 4EDA5180 7D17648 26381969 | 0x185F8DB3 2271FE25 F561A6FC 938B2E26 4306EC30 4EDA5180 7D17648 26381969 | ||
</div> | </div> | ||
+ | |||
+ | |||
+ | ==Ethics of using SHA-2 over SHA-1== | ||
+ | Organizations, especially ones that handle sensitive information such as banks, hospitals, schools, etc. have an obligation to take the security of their user’s accounts and information seriously. No one would want their money or health records stored in a vault that has decent, but not the most up to date security. The same applies for hashing passwords. As previously shown SHA-1 has issues while not to the point of being completely broken, vulnerabilities only become more of an issue as computing power increases. Our smartphones are more powerful than computers of two decades ago. | ||
+ | |||
+ | |||
+ | ==Conclusion== | ||
+ | The importance of hashing algorithms are increasingly useful as we go more and more digital. And not only are these hashing algorithms used to protect your passwords, they are also used within cryptocurrency. As time goes on, these hashing algorithms will need to get better to keep up with the ever growing need for security. In fact, a new algorithm, SHA-3, was created in 2015 as a new and better version of SHA-2. This algorithm is not only faster but also more secure against length extension hack. However, SHA-2 isn’t going to be immediately replace and still has many good uses. Highlighting SHA-3 is meant to show there is a constant need for innovation in the encryption field and this will continue to be as long as we are digital with information important to us. | ||
+ | |||
+ | |||
+ | ==References== | ||
+ | https://security.googleblog.com/2017/02/announcing-first-sha1-collision.html | ||
+ | https://shattered.io/ | ||
+ | https://www.geeksforgeeks.org/difference-between-sha1-and-sha2/ | ||
+ | https://eprint.iacr.org/2020/014.pdf | ||
+ | https://www.nottingham.ac.uk/research/beacons-of-excellence/future-food/meet-the-team/michael-pound/index.aspx (https://youtu.be/7U-RbOKanYs) | ||
+ | https://csrc.nist.gov/csrc/media/publications/fips/180/2/archive/2002-08-01/documents/fips180-2.pdf | ||
+ | https://www.comparitech.com/blog/information-security/what-is-sha-2-how-does-it-work/ |
Latest revision as of 14:31, 10 December 2022
Contents
Password Hashing & The SHA Algorithm
Ayush Krishnamony, Michael Keeley, Dominic Nale, Arnav Mehra
What Is Hashing?
In today’s world, the average person has 70-80 different passwords across multiple accounts. While users wouldn’t care if their Webkinz password got leaked, bank account details being stolen is much more detrimental to users. In order to keep them safe, banks are required by the FTC to have proper security to keep users passwords safe. The common algorithm used is hashing. Technically speaking, hashing is a method of making a finite string of characters into a fixed size. However, in this paper, we will focus on a form of hashing that attempts to scramble the input beyond repair, one-way hashing. In other words, one-way hashing is the idea that while converting input (passwords) to a hash value is computationally simple, reverting to our original input is next to impossible.
Why do we Hash Passwords?
When we input our password into a website, it, alongside some website-specific string is converted into a hashed value. It is this hash value that gets stored in the website's database, and later, used to verify your password when logging in.
To better understand why we use hashing for password verification, let us imagine a world without hashing. Facebook pushes an update that, unbeknownst to them, creates a vulnerability in their username/password database. This data is now vulnerable to hackers. If an adversary were to exploit this vulnerability, they could use the information to not only log into any user's Facebook, but any site a user uses a similar password on as the password is visible to the naked eye. One such adversary does exactly that, and because the passwords are not hashed, easily identifies user passwords across multiple sites, rather than just one. Many users can now kiss not only their Facebook goodbye, but also their bank details, insurance details, etc.
Now, let us redo this scenario in a world where hashing, more specifically one-way hashing, is used. The adversary, once again, exploits Facebook's vulnerability and has access to their username/password database. But now, because passwords only appear as nonsensical hashes, they have no clue as to each user's original password. At this point in time, the hacker would be left with only one option, to reverse the hashing algorithm. In our paper, we will go into detail about what makes this an almost impossible task and the most common algorithm used to create this hash in the first place, the Secure Hash Algorithm, or SHA.
What is SHA?
SHA is a family of one-way hashing algorithms developed by the NSA and is published by the National Institute of Standards and Technology (NIST). First SHA algorithm, SHA-0, can input 160 bit messages and outputs a 160-bit hash value. This type of algorithm didn’t have much of a use due to the fact that it had a security flaw and was replaced with SHA-1 in 1995. There are several uses for SHA-1. We talked about the use of SHA-1 for password hashing, but there are also uses for verification. Say we want to download a file from a website other than the developers website. If we can use the SHA-1 hashed value of the file from the website and compare it to the developers, if the values don’t line up, we know that the file is not only not the same but could also contain malware. These are just some of the uses for SHA 1. But the question remains, how exactly does the SHA-1 algorithm work?
SHA-1: A Good Starting Point
In this section, we will be walking you though how the SHA-1 algorithm would generate a hash-value for the password/input of "Hello".
Step 1:
Initialize 5 strings as "Hash Values"
H1 = 0x67452301
H2 = 0xEFCDAB89
H3 = 0x98BADCFE
H4 = 0x10325476
H5 = 0xC3D2E1F0
Step 2:
Take the word you want to hash (in binary), and append a 1 to the end of it. Then append as many zeros as it takes to make it divisible by 512, with the length of the message in a 64 bit integer appended at the end of the string.
Example:
The word "Hello" in binary is:
01001000 01100101 01101100 01101100 01101111 (H) (e) (l) (l) (o)
Add 1 to the end:
= 01001000011001010110110001101100011011111
Since "Hello" is less than 448 we add 0’s until the string is 448 bits long:
= 01001000011001010110110001101100011011111000…0 (len = 448 bits)
Lastly take the length of the string before processing (40 bits in this case) append that as a 64 bit integer
= ...0000101000 (length of added bits = 64)
So total length is 512 in this example
010010000110010101101100011011000110111110………0101000
Step 3A:
Break message into 512-bit chunks
Taking the example from above, since it is 512 bits, we will only have one chunk
Step 3B:
Break each chunk into 16, 32-bit words
Example:
The words would be
W0 = 01001000011001010110110001101100
W1 = 01101111100000000000000000000000
W2-W14 = 00000000000000000000000000000000
W15 = 00000000000000000000000000101000
Step 4:
Extend the first chunk to 80 32-bit words using the function
for i from 16 to 79
w[i] = (w[i-3] xor w[i-8] xor w[i-14] xor w[i-16]) leftrotate 1
Example:
Using the example we have been using, the first iteration would go like this:
w[16] = (w[16-3] xor w[16-8] xor w[16-14] xor w[16-16]) leftrotate 1 w[16] = (w[13] xor w[8] xor w[2] xor w[0]) leftrotate 1
The xor function will compare the bits of the two strings in in each index
and give an output based on the input of the the strings
A B A xor B 0 0 -> 0 0 1 -> 1 1 0 -> 1 1 1 -> 0
w[13] = 00000000000000000000000000000000 xor w[8] = 00000000000000000000000000000000 xor w[2] = 00000000000000000000000000000000 xor w[0] = 01001000011001010110110001101100 = 01001000011001010110110001101100
The leftrotate x function shifts each bit x times to the left. Any bits
that are shifted past the beginning of the string are cycled back to the end of the string.
Taking the string we got, we leftrotate the string one bit
01001000011001010110110001101100
= 10010000110010101101100011011000
So, after the first iteration W16 is added to the chunk as:
W16 = 10010000110010101101100011011000
After all of the iterations, we should have a chunk of 2560 bits
(From now on we will be switching from binary representation to hex for simplicity)
Step 5A:
Initialize The "Holding Values" for the chunk with "Hash Values"
A = H1 B = H2 C = H3 D = H4 E = H5
Step 5B:
For each word in the chunk, obtain an F and K value
First 20 words:
K = 0x5A827999 F = (B ∧ C)∨((¬B) ∧ D)
Next 20 words:
K = 0x6ED9EBA1 F = B ⊕ C ⊕ D
Next 20 words:
K = 0x8F1BBCDC F = (B ∧ C) v (B ∧ D) v (C ∧ D)
Last 20 words:
K = 0xCA62C1D6 F = B ⊕ C ⊕ D
Example:
On the first iteration,
K = 0x5A827999 F = (0xEFCDAB89 ∧ 0x98BADCFE) v ((NOT 0xEFCDAB89) ∧ 0x10325476) (B) (C) (¬B) (D)
F = 0x32327676
Step 5C:
Assign new "Holding Values" after every iteration
Temp = (A leftrotate 5) + F + E + K + w[i]
E = D
D = C
C = (B leftrotate 30)
B = A
A = Temp
Example:
After iteration 1:
Temp = (A leftrotate 5) + F + E + K + w[0]
A leftrotate 5 = E8A4602C F = 32327676 E = C3D2E1F0 K = 5A827999 W0 = 48656C6C
So we assign the "Holding Values" as:
Temp = 281919E97
E = 10325476 D = 98BADCFE C = (leftrotate 30 B) = EFCDAB89 B = 67452301 A = 281919E97
Step 6:
Add "Holding Values" to "Hash Values"
H1 = H1 + A H2 = H2 + B H3 = H3 + C H4 = H4 + D H5 = H5 + E
Examples:
(Assume "Holding Values" from the last step are those of the final iteration)
H1 = 0x67452301 + 0x281919E97 = 0x2E8D6C198 H2 = 0xEFCDAB89 + 0x67452301 = 0x15712CE8A H3 = 0x98BADCFE + 0xEFCDAB89 = 0x188888887 H4 = 0x10325476 + 0x98BADCFE = 0xA8ED3174 H5 = 0xC3D2E1F0 + 0x10325476 = 0xD4053666
Repeat steps 5 and 6 for each chunk. After the last chunk, move on to step 7
Step 7
Complete hash by appending H values back together using:
HH = (H1 leftrotate 128) V (H2 leftrotate 96) V (H3 leftrotate 64) V (H4 leftrotate 32) V H5
Example: If these are the final hashes from iterating through all chunks
H1 = B3099B3DF4683F306175ED806E58C04AFFE622F6
H2 = 372AE26109B3DE38C9E56BB84E9D6DE2FB28ABFB
H3 = 28F142FF30F862A2405D9232E85714C23C567E9B
H4 = 91F2DF811E4615B20F161B86FD746874FA139A6E
H5 = ABED8613F2D98B8B7166B11E2AF8EFEC08197000
Then after doing the leftrotates
H1 leftrotate 128 = FFE622F6B3099B3DF4683F306175ED806E58C04A
H2 leftrotate 96 = 4E9D6DE2FB28ABFB372AE26109B3DE38C9E56BB0
H3 leftrotate 64 = 405D9232E85714C23C567E9B28F142FF30F862E0
H4 leftrotate 32 = F2D98B8B7166B11E2AF8EFEC0819700C91F2DF90
H5 stays the same
Finally, adding the strings together gives us:
HH = f7ff9e8b7bb2e09b70935a5d785e0cc5d9d0abf0
Why We Don't Still Use SHA-1: A Matter of Ethics
An intuitive way to break SHA-1 relies on a simple concept in Mathematics known as the pigeonhole principle. Where there are 20 pigeon holes and 21 pigeons, if we fit all the pigeons in a box there will have to have more than one pigeon in a box. It is the case where a function maps a set m to a set n where the length of set n is less than m. The same applies for SHA-1, but also SHA-2 and all hashing functions since the inputs are [theoretically] infinite, but the output is finite. When two different inputs have the same output this is known as a “collision”.
The issue with collisions is that there is no way of knowing if the input given matches the original or intended input. So, if an attacker can consistently get a collision this would make SHA-1 a weak algorithm.
Google was successfully able to find a collision and employed techniques that reduced the complexity of the computations making it 100,000 times faster than brute forcing it. Even with this Google still used expensive hardware making this only feasible for well funded adversaries, but this attack could get easier as computers get faster. As well, the Chosen-Prefix Collision reduces the complexity from 2^80 (brute force) to 2^~61 which because it is exponential is a significant difference in terms of effort needed to accomplish this. This attack however still requires the adversaries to be well funded. Regardless, its use poses an ethical issue for those you entrust with securing your data and is proof that SHA-1 needs to be retired.
Now it would seem that the issue of using collisions to bypass a hashing algorithm would seem universal and it is, but it is more challenging when the output is longer, as it is in SHA-2 (2^160 vs 2^224 Minimum), since the probability of getting a collision is less likely and SHA-1 collisions attacks were mostly reserved for well funded parties. In addition to SHA-2 being more complex as will be shown. Discouraging adversaries from using it since it is not practical to do, which is why organizations have moved away from SHA-1 to newer, more ethical hashing algorithms.
When it comes to password cracking, an intuitive method hackers use to breach passwords involves getting the hashed version of the password from a data breach. Then hashing a series of guesses as to what the password is and if the hashes match up then the hacker has the password of the account or a collision. Either is sufficient in getting access to the account. A way switching from SHA-1 to SHA-2 corrects this is that SHA-2 takes longer and has less chances for collisions. Running through a dictionary to find a match in a database is essentially looking for needle in a haystack requiring thousands to millions of hashes. So if a hashing algorithm is even twice as long as SHA-1 it can make a big difference as to how reasonable it is to crack a hashed password. De incentivising attackers from doing it in the first place since it could take a while and not even find a match.
The Solution: SHA-2
Due to the limitations of SHA-1 previously explained, the NSA designed and published SHA-2 in 2011. SHA-2 aims to solve some of the vulnerabilities of SHA-1 by performing more computation and shuffling at each iteration of the algorithm, allowing us to generate larger output hashes, and utilizing more "random" values. SHA-2 comes in a variety of output sizes, but for the purpose of this paper, we will be looking at SHA-256, which outputs a 256-bit hash value.
STEP 1: Padding
Similar to SHA-1, we begin with the same preprocessing step that turns our input value into 512-bit chunks. For the string “Hello”, we will again have 1 chunk with the value:
0x4865 6C6C 6F80 [28 16-bit words of 0] 0028
STEP 2: Word Chunking & Extension
For each chunk, we will run a similar extension function as SHA-1, except we will only extend our original 16 32-bit words to 64 words rather than 80. While this may seem less secure, the actual extension function is a bit more complex and performs 9 additional operations at each step:
s0 = (w[i-15] rightrotate 7) xor (w[i-15] rightrotate 18) xor (w[i-15] rightshift 3)
s1 = (w[i-2] rightrotate 17) xor (w[i-2] rightrotate 19) xor (w[i-2] rightshift 10)
w[i] = w[i-16] + w[i-7] + s0 + s1
Conceptually, this step is just like a more advanced version of the Fibonacci sequence. Both iteratively generate proceeding terms by some function of previous terms, however, this algorithm utilizes 4 non-consecutive, pre-existing terms rather than 2 consecutive pre-existing terms. This relation to Fibonacci helps us better realize why this step in SHA is so good at producing seemingly random output. Much like Fibonacci, even though change may occur slowly at the start, it will quickly snowball into more and more complex terms, and this is exactly what we will see in the upcoming example.
If we apply this algorithm to the example chunk, we have:
// the 64 words/terms we are generating
w[64] = {
0x48656C6C, 0x6F800000, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x28,
...remaining 48 are not set yet
}
s0 = (w[1] rightrotate 7) xor (w[1] rightrotate 18) xor (w[1] rightshift 3)
= (0x6F800000 rightrotate 7) xor (0x6F800000 rightrotate 18) xor (0x6F800000 rightshift 3)
= 0xD2F1BE0
s1 = (w[i-2] rightrotate 17) xor (w[i-2] rightrotate 19) xor (w[i-2] rightshift 10)
= (w[i-2] rightrotate 17) xor (w[i-2] rightrotate 19) xor (w[i-2] rightshift 10)
= 0
w[16] = w[i-16] + s0 + w[i-7] + s1
= 0x68698000 + 0xD2F1BE0 + 0 + 0
= 0x5594884C
As you may notice, in this first iteration, w[16] is only 2 of the 4 numbers summed were non-zero, making for somewhat poor randomness. Think of this like the duplicate 1 in the Fibonacci sequence, because most of the original 16 terms are 0s, we will observe very little change at the start. But as we continue through iterations, terms will appear more and more complex and seemingly random. To exemplify this, after all 64 iterations, we will have:
w[64] = {
0x48656C6C, 0x6F800000, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x28,
0x5594884C, 0x6F910000, 0xD53AC55A, 0xA01BDE7A, 0x3A337E8B, 0x94DA02F9, 0xD09A76A8, 0x969B56C2,
0xE54654C3, 0x96D784A5, 0x80DCBE18, 0x6D174ADB, 0x5DC96A53, 0x1CC815A3, 0x7F1A0DCE, 0x9DB3E25F,
0x80DA5DC2, 0x9C0CA81E, 0xBEA91B8B, 0x8DA1AF1A, 0x66A32861, 0xCC59DDF4, 0xA1C28A29, 0xFB405DB4,
0x4EAACB84, 0x896EE197, 0xB4ED8DEF, 0x782C4A4A, 0xE252B57C, 0x7B6C0231, 0xDA9EE233, 0x700956F3,
0xBC709C4D, 0x1900A949, 0xE3BFB13B, 0xB581CF14, 0x980047E9, 0x6B30284B, 0x9EEF960A, 0x7C902E42,
0x27E9FFC6, 0x2AB47040, 0xB3E8DA4F, 0xF211A94, 0x30DCBA1F, 0x8CE0FD92, 0xDECD3E41, 0xA5312439
}
STEP 3: Compression
From here, SHA-2 will compress these w-values onto our final hash values. Similar to SHA-1, our final hash values, or h-values, will be initialized to seemingly random numbers (the fractional portions of the roots of the first few primes in this case) and will transform as we consider each chunk. But, unlike SHA-1, SHA-2 will chew on 8 of these hash values. The pseudo-code for this step is as follows:
// initial hash-values, or h-values
h[8] = {
0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a,
0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19
}
// copy current h-values into an array, c
c = h
For i = 0 to 63:
// compute 2 terms t1 and t2 to factor into hash-values
s1 = (c[4] rightrotate 6) xor (c[4] rightrotate 11) xor (c[4] rightrotate 25)
ch = (c[4] and c[5]) xor ((not c[4]) and c[6])
t1 = c[7] + s1 + ch + k[i] + w[i]
s0 = (c[0] rightrotate 2) xor (c[0] rightrotate 13) xor (c[0] rightrotate 22)
maj = (c[0] and c[1]) rightrotate (c[0] and c[2]) rightrotate (c[1] and c[2])
t2 = s0 + maj
For i = 1 to 7:
c[i] = c[i - 1]
// integrate computed terms into hash-values
c[4] += t1
c[0] = t1 + t2
For i = 0 to 7:
h[i] += c[i]
In contrast to SHA-1, this computation differs in 3 main ways:
- Only 64 rounds of computation are performed as opposed to 80.
- More computation is performed at each step. SHA-2 utilizes 8 32-bit h-values (and therefore c-values as well) rather than 5, as SHA-1 does. When cycling c-values, SHA-2 will not only modify the first value, c[0], but the fifth as well, c[4].
- Rather than using 4 distinct k-values, SHA-2 utilizes 64. In SHA-2, our k-values are the fractional portion of the cubic root of the first 64 primes.
For our example, the first iteration would look like:
c = [original 8 h-values]
s1 = (c[4] rightrotate 6) xor (c[4] rightrotate 11) xor (c[4] rightrotate 25)
= (0x510E527Ff rightrotate 6) xor (0x510E527Ff rightrotate 11) xor (0x510E527Ff rightrotate 25)
= 0x3587272B
ch = (c[4] and c[5]) xor ((not c[4]) and c[6])
= (0x510E527Ff and 0x9b05688c) xor ((not 0x510e527f) and 0x1f83d9ab)
= 0x1F85C98C
t1 = c[7] + s1 + ch + k[i=0] + w[i=0]
= 0x5BE0CD19 + 0x3587272B + 0x1F85C98C + 0x428A2F98 + 0x48656C6C
= 0x3BDD59D4
s0 = (c[0] rightrotate 2) xor (c[0] rightrotate 13) xor (c[0] rightrotate 22)
= 0xCE20B47E
mj = (c[0] and c[1]) xor (c[0] and c[2]) xor (c[1] and c[2])
= 0x3A6FE667
t2 = s0 + mj
= 0x08909AE5
c[7] = c[6] = 0x1F83D9AB
c[6] = c[5] = 0x9B05688C
c[5] = c[4] = 0x510E527F
c[4] = c[3] + t1 = 0xE12D4F0E
c[3] = c[2] = 0x3C6EF372
c[2] = c[1] = 0xBB67AE85
c[1] = c[0] = 0x6A09E667
c[0] = t1 + t2 = 0x446DF4B9
After all 64 iterations, our h-values become:
h[7] = h[7] + c[7] = 0x5be0cd19 + 0x1F83D9AB = 0x185F8DB3
h[6] = h[6] + c[6] = 0x1f83d9ab + 0x9B05688C = 0x2271FE25
h[5] = h[5] + c[5] = 0x9b05688c + 0x510E527F = 0xF561A6FC
h[4] = h[4] + c[4] = 0x510e527f + 0xE12D4F0E = 0x938B2E26
h[3] = h[3] + c[3] = 0xa54ff53a + 0x3C6EF372 = 0x4306EC30
h[2] = h[2] + c[2] = 0x3c6ef372 + 0xBB67AE85 = 0x4EDA5180
h[1] = h[1] + c[1] = 0xbb67ae85 + 0x6A09E667 = 0x7D17648
h[0] = h[0] + c[0] = 0x6a09e667 + 0x446DF4B9 = 0x26381969
STEP 4: Final Hash Assembly
After all previous chunks have been compressed onto our h-values, our final hash is formed by simply appending these values as so:
h0 append h1 append h2 append h3 append h4 append h5 append h6 append h7
0x185F8DB3 2271FE25 F561A6FC 938B2E26 4306EC30 4EDA5180 7D17648 26381969
Ethics of using SHA-2 over SHA-1
Organizations, especially ones that handle sensitive information such as banks, hospitals, schools, etc. have an obligation to take the security of their user’s accounts and information seriously. No one would want their money or health records stored in a vault that has decent, but not the most up to date security. The same applies for hashing passwords. As previously shown SHA-1 has issues while not to the point of being completely broken, vulnerabilities only become more of an issue as computing power increases. Our smartphones are more powerful than computers of two decades ago.
Conclusion
The importance of hashing algorithms are increasingly useful as we go more and more digital. And not only are these hashing algorithms used to protect your passwords, they are also used within cryptocurrency. As time goes on, these hashing algorithms will need to get better to keep up with the ever growing need for security. In fact, a new algorithm, SHA-3, was created in 2015 as a new and better version of SHA-2. This algorithm is not only faster but also more secure against length extension hack. However, SHA-2 isn’t going to be immediately replace and still has many good uses. Highlighting SHA-3 is meant to show there is a constant need for innovation in the encryption field and this will continue to be as long as we are digital with information important to us.
References
https://security.googleblog.com/2017/02/announcing-first-sha1-collision.html https://shattered.io/ https://www.geeksforgeeks.org/difference-between-sha1-and-sha2/ https://eprint.iacr.org/2020/014.pdf https://www.nottingham.ac.uk/research/beacons-of-excellence/future-food/meet-the-team/michael-pound/index.aspx (https://youtu.be/7U-RbOKanYs) https://csrc.nist.gov/csrc/media/publications/fips/180/2/archive/2002-08-01/documents/fips180-2.pdf https://www.comparitech.com/blog/information-security/what-is-sha-2-how-does-it-work/