generalized secure hash algorithm
The key to a good password hashing function (whether it’s for storing the hash in an authentication system or to generate a key for encryption) is that it be slow. Well, not actually slow, but difficult. Well, not actually difficult, but expensive. The progression of hashing algorithms reflects this, from crypto to bcrypt to scrypt. The most frequently (only?) cited section of the scrypt paper is the table at the end comparing the cost to build cracking machines for various algorithms. The focus is on designing an algorithm that is computationally expensive, such that it becomes financially expensive to build a cracker. Nevertheless, as expensive as it would be, it is possible to build an scrypt cracker. What if we could use an algorithm that made it impossible, not just expensive, to build a cracking machine?
choices
Let’s start by considering a simple scheme. We start by calculating the MD5 of a password. Then we check the first bit and decide to either run MD5 again, or switch to SHA1. Then we iterate.
h = MD5(pass)
if h & 1
while (count++ < 10000) h = MD5(h)
else
while (count++ < 10000) h = SHA1(h)
This seems a little silly, but let’s look at it from the perspective of an attacker who has decided to go all in, with custom hardware. Their $1 million MD5 cracking supercomputer can only possibly crack half our passwords. In order to get the other half, they need to outlay another $1 million (plus) for a SHA1 cracker.
Now that’s not such a big deal, but what if we add a few hashes into the mix? SHA256, SHA512. Maybe RIPEMD. Whirlpool. We’re making some progress, but so far we’ve still only managed to make it (moderately) more expensive. The people who can build $1 million hash crackers can probably build five, or ten, or twenty of them without much difficulty. What we need are lots more hashes. More hashes than currently exist.
MD5
Let’s back up and look inside MD5. libc has a bit of code like below. (There’s lots more, this is trimmed down to the interesting bits.)
a = state[0];
b = state[1];
c = state[2];
d = state[3];
MD5STEP(F1, a, b, c, d, in[ 0] + 0xd76aa478, 7);
MD5STEP(F1, d, a, b, c, in[ 1] + 0xe8c7b756, 12);
MD5STEP(F1, c, d, a, b, in[ 2] + 0x242070db, 17);
MD5STEP(F1, b, c, d, a, in[ 3] + 0xc1bdceee, 22);
MD5STEP(F1, a, b, c, d, in[ 4] + 0xf57c0faf, 7);
MD5STEP(F1, d, a, b, c, in[ 5] + 0x4787c62a, 12);
MD5STEP(F2, a, b, c, d, in[ 1] + 0xf61e2562, 5);
MD5STEP(F2, d, a, b, c, in[ 6] + 0xc040b340, 9);
MD5STEP(F2, c, d, a, b, in[11] + 0x265e5a51, 14);
MD5STEP(F2, b, c, d, a, in[ 0] + 0xe9b6c7aa, 20);
MD5STEP(F2, a, b, c, d, in[ 5] + 0xd62f105d, 5);
We copy our internal state into four variables (registers?), and then do some funky stuff to them. More on that in a bit. The important thing to note is that the a, b, c, d terms are slowly moving around, and that in the final chunk, we’re accessing the input nonlinearly.
#define F1(x, y, z) (z ^ (x & (y ^ z)))
#define F2(x, y, z) F1(z, x, y)
#define MD5STEP(f, w, x, y, z, data, s) \
( w += f(x, y, z) + data, w = w<<s | w>>(32-s), w += x )
In a nutshell, each “step” of MD5 adds a word of input data to one of our registers, and then mixes a few registers together with xor, plus a roll like operation.
SHA2
Similar functions exist in the implementation of many other hash functions, such as SHA256 and SHA512. The libc implementation is possibly clearer than for MD5.
do {
/* Part of the message block expansion: */
s0 = W256[(j+1)&0x0f];
s0 = sigma0_256(s0);
s1 = W256[(j+14)&0x0f];
s1 = sigma1_256(s1);
/* Apply the SHA-256 compression function to update a..h */
T1 = h + Sigma1_256(e) + Ch(e, f, g) + K256[j] +
(W256[j&0x0f] += s1 + W256[(j+9)&0x0f] + s0);
T2 = Sigma0_256(a) + Maj(a, b, c);
h = g;
g = f;
f = e;
e = d + T1;
d = c;
c = b;
b = a;
a = T1 + T2;
j++;
} while (j < 64);
Where the missing bits look like:
#define S32(b,x) (((x) >> (b)) | ((x) << (32 - (b))))
#define Sigma0_256(x) (S32(2, (x)) ^ S32(13, (x)) ^ S32(22, (x)))
#define Sigma1_256(x) (S32(6, (x)) ^ S32(11, (x)) ^ S32(25, (x)))
Rolls and xors and adds, oh my.
GSHA
This suggests that an entire family of nearly identical hash functions exist. For example, what if we roll by 5 bits instead of 7? Or 9? Or 11? We can generalize the idea of a secure hash algorithm.
Consider a hash function GSHA512, very similar to SHA512, but with slight variations on each of its constants. You could use GSHA512 #42, or GSHA512 #98765, or even GSHA512 #658743092112345678890 if there were enough variants available. 2^512 variants should be enough for anyone.
GSHA512 is a 512 bit hash function, like its namesake SHA512. It has 8 64 bit registers for internal state. It consumes input 512 bits at a time, and stirs it around. To construct a GSHA512 hash function from a 512 bit key, we consume the key 8 bits at a time. Each byte tells us what to do for each of 64 rounds of the hash. (SHA512 has 80 rounds, but we’re out of input key material.)
We always take a word of the input. Then we pick a “source” register and roll it by some number of bits. We xor those two values together, then add them into the “destination” register.
dstreg += roll(srcreg, rbits) ^ input[i]
implementation
A proof of concept implementation is available, mainly to demonstrate and clarify the idea because the above definition isn’t precise. It includes a vaguely PBKDF2 like gsha_kdf
function.
Don’t use these functions for anything but password hashing. (Don’t use them at all is even sounder advice.) Despite the name, these functions are not general purpose. Perhaps generated would be a better name. GSHA will almost certainly give up collisions easier than SHA, but collisions are not a practical concern for password hashing or KDF operations.
Updated note: It doesn’t affect the security of GSHA per se, but Malicious SHA-1 has some good info on the dangers of fiddling with the constants.
analysis
A quick back of the envelope security assessment. Have we made an uncrackable password hashing algorithm?
Safe to say we’ve defeated custom silicon. Nobody has a fab that can trace out millions of distinct custom circuits per second.
FPGA is finished too. Assuming you don’t melt it trying, you can’t reprogram an FPGA fast enough.
GPUs are harder. Without having tried it, my gut tells me you won’t be able to copy out the GSHA code to the GPU fast enough to make it worthwhile.
An attacker with lots of CPUs can still crack our password, but CPUs are very expensive. What if somebody could fab their own very cheap, very limited CPUs? Like a 100000 core CPU with only just enough cache to implement GSHA? Now we may be in trouble. The transistor count for GSHA is quite low, but they need to be the special high speed general purpose kind of transistor circuit. The scrypt paper notes that a CPU could be cheaper than RAM if stripped of all its extra functionality, but in practice it’s hard to calculate all the tradeoffs. Of course, one could also use GSHA as the core to something like scrypt instead of Salsa20.
(This part isn’t very precise. The idea is that a cracker would look less like a SHA512 cracker, capable only of performing one hash, and more like a typical CPU, capable of performing many hashes. Requiring the attacker to be adaptable in this way brings their costs in line with our costs. Maybe. Waves hands.)
I don’t plan to use GSHA, but it was fun to build.