timing attacks vs hash tables
First, start with there are no good constant-time data structures. After reading the HN thread, I wanted to see if the attack was truly viable. Can we recovery a JSESSIONID? My previous efforts attacking Lua took a slightly different tack.
To start, we need a vulnerable server. Worst case would be a simple hash table with chaining, such as one might build with the BSD queue macros. We’ll use a very simple hash function to make controlling the bucket simpler. And, of course, strcmp
.
I’m exploiting some of my own knowledge attacking this server. For instance, I know there’s only four buckets, and I know the hash function, and I know the correct token values. The purpose wasn’t to build a fully weaponized attack, just to identify what timing irregularities may exist.
#include <sys/queue.h>
#include <stdint.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
static inline uint64_t
rdtsc(void)
{
uint32_t hi, lo;
__asm volatile("rdtsc" : "=d" (hi), "=a" (lo));
return (((uint64_t)hi << 32) | (uint64_t) lo);
}
static inline uint32_t
hashstr(const char *s)
{
const unsigned char *p = s;
uint32_t hash = 5381;
while (*p)
hash = (hash << 5) + hash + *p++;
return hash;
}
struct token {
TAILQ_ENTRY(token) next;
char ident[16];
int userid;
};
struct hashtable {
TAILQ_HEAD(, token) *buckets;
int numbuckets;
};
void
printhashtable(struct hashtable *ht)
{
struct token *tok;
for (int i = 0; i < ht->numbuckets; i++)
TAILQ_FOREACH(tok, &ht->buckets[i], next)
printf("bucket %d ident %s userid %d\n", i, tok->ident, tok->userid);
}
int
finduser(struct hashtable *ht, char *ident)
{
uint32_t hash = hashstr(ident);
struct token *tok;
TAILQ_FOREACH(tok, &ht->buckets[hash % ht->numbuckets], next)
if (strcmp(tok->ident, ident) == 0)
return tok->userid;
return -1;
}
void
adduser(struct hashtable *ht, char *ident, int userid)
{
struct token *tok = malloc(sizeof(*tok));
strlcpy(tok->ident, ident, sizeof(tok->ident));
tok->userid = userid;
uint32_t hash = hashstr(tok->ident);
TAILQ_INSERT_TAIL(&ht->buckets[hash % ht->numbuckets], tok, next);
}
void
findduration(struct hashtable *ht, char *ident)
{
uint64_t duration = 0;
finduser(ht, ident);
finduser(ht, ident);
for (int i = 0; i < 500; i++) {
uint64_t before = rdtsc();
finduser(ht, ident);
uint64_t after = rdtsc();
duration += after - before;
}
printf("duration for %s (bucket %d): %lld\n", ident,
hashstr(ident) % ht->numbuckets, duration);
}
int
main(int argc, char **argv)
{
struct hashtable *ht = malloc(sizeof(*ht));
ht->numbuckets = 4;
ht->buckets = reallocarray(NULL, ht->numbuckets, sizeof(*ht->buckets));
for (int i = 0; i < ht->numbuckets; i++)
TAILQ_INIT(&ht->buckets[i]);
adduser(ht, "Abcdef123456789", 1);
adduser(ht, "Bbcdef123456789", 2);
adduser(ht, "Cacdef123456789", 3);
adduser(ht, "Dbcdef123456789", 4);
adduser(ht, "Ebcdef123456789", 5);
adduser(ht, "Fbcdef123456789", 6);
printhashtable(ht);
/* find a single entry bucket */
printf("attacking the buckets!\n");
findduration(ht, "Z00000000000002"); /* bucket 1 pop 0 */
findduration(ht, "Z00000000000003"); /* bucket 2 pop 1 */
findduration(ht, "Z00000000000000"); /* bucket 3 pop 2 */
findduration(ht, "Z00000000000001"); /* bucket 0 pop 3 */
/* learn the string in bucket 2 */
printf("attacking the string!\n");
findduration(ht, "000000000000001"); /* match 0 */
findduration(ht, "D00000000000001"); /* match 1 */
findduration(ht, "Db0000000000003"); /* match 2 */
findduration(ht, "Dbc000000000000"); /* match 3 */
findduration(ht, "Dbcdef120000002"); /* match 8 */
findduration(ht, "Dbcdef123000003"); /* match 9 */
}
Compile and run.
gcc -std=c99 -O2 -o hashattack hashattack.c && ./hashattack
bucket 0 ident Bbcdef123456789 userid 2
bucket 0 ident Cacdef123456789 userid 3
bucket 0 ident Fbcdef123456789 userid 6
bucket 2 ident Dbcdef123456789 userid 4
bucket 3 ident Abcdef123456789 userid 1
bucket 3 ident Ebcdef123456789 userid 5
attacking the buckets!
duration for Z00000000000002 (bucket 1): 111388
duration for Z00000000000003 (bucket 2): 142084
duration for Z00000000000000 (bucket 3): 176934
duration for Z00000000000001 (bucket 0): 194512
attacking the string!
duration for 000000000000001 (bucket 2): 143310
duration for D00000000000001 (bucket 2): 167790
duration for Db0000000000003 (bucket 2): 160232
duration for Dbc000000000000 (bucket 2): 168348
duration for Dbcdef120000002 (bucket 2): 194096
duration for Dbcdef123000003 (bucket 2): 216202
That’s only a single run, but it’s enough to get a picture. It’s about the same if hashattack is run again. The numbers show better (less) variability as the iteration count is increased from 500, but pounding a webserver with a half million requests to find what bucket a string hashes into is likely to be noticed. At least by anyone except Apple (ha!).
A few things are visible. Chasing down pointers as we traverse the list for a bucket takes time. I’ve skipped writing the code that determines how many buckets there are (exercise for the reader!) and jumped straight to the part where I probe each bucket in order. Each probe takes a little longer as the population increases. For reference, a difference of about 30000 represents a real time difference of about 12 microseconds. These numbers are also all the accumulated durations, not the average, fwiw.
Now that we’ve identified bucket 2 as containing a single token, let’s attack the string. Longer strings take longer to compare, but it’s harder to see any byte by byte differences. Once we’ve gotten the first letter correct, it’s hard to tell if the second or third letters are correct. If we’ve guessed eight letters, we can see a difference compared to none, but getting to that point may be difficult.
That’s with my laptop running at low speed. At turbo speed, the numbers look more like this.
attacking the buckets!
duration for Z00000000000002 (bucket 1): 39993
duration for Z00000000000003 (bucket 2): 50488
duration for Z00000000000000 (bucket 3): 51985
duration for Z00000000000001 (bucket 0): 43142
attacking the string!
duration for 000000000000001 (bucket 2): 28357
duration for D00000000000001 (bucket 2): 30691
duration for Db0000000000003 (bucket 2): 32046
duration for Dbc000000000000 (bucket 2): 32762
duration for Dbcdef120000002 (bucket 2): 70072
duration for Dbcdef123000003 (bucket 2): 70193
Now it seems we have a challenge on our hands. We’ve lost our bucket ordering! And the string timings are getting really close. Actually, that’s just one run. Here’s another.
attacking the buckets!
duration for Z00000000000002 (bucket 1): 45178
duration for Z00000000000003 (bucket 2): 48686
duration for Z00000000000000 (bucket 3): 52857
duration for Z00000000000001 (bucket 0): 61896
attacking the string!
duration for 000000000000001 (bucket 2): 46778
duration for D00000000000001 (bucket 2): 53483
duration for Db0000000000003 (bucket 2): 49679
duration for Dbc000000000000 (bucket 2): 63234
duration for Dbcdef120000002 (bucket 2): 63091
duration for Dbcdef123000003 (bucket 2): 64418
Hurray. The bucket ordering came back, though it’s still very close. The string timings are useless, now, though.
There’s something to be said about using real statistics instead of running a program twice, but even if we tune up the attack, it’s running against code written to be easy to exploit and under ideal conditions. Attacking over the internet will add more noise and require more probes to extract a signal from.
A real world attack, targeting a system using better hashing and chaining algorithms, may defeat us entirely. Maybe viable, maybe not, but not my first choice of attack against a web site.
Somebody beat me to it! Leaking information with timing attacks on hashtables describes another slightly different approach.