analysis of d2i_X509 reuse
A little while ago, Tavis Ormandy twitterated about an OpenSSL bug he reported. This didn’t sound good, so I took a look.
d2i_X509
Read the linked email for all the details and a test case. The short version is that cX509 *d2i_X509(X509 **px, const unsigned char **in, int len);
says you can reuse allocated memory, except when you can’t, but the failure mode isn’t pretty. Basically, there are three ways to call this function. With px == NULL
, it will allocate and return memory. With *px == NULL
it will allocate and return memory, and set px. With *px != NULL
, it will use the memory provided. The bug is that if you choose door number three, it needs to be freshly allocated. You can’t reuse an existing structure, or it may lead to false cert validation.
I had some ideas about the cause of this. First thing to do is crack open the source to d2i_X509
. That’s this code:
IMPLEMENT_ASN1_FUNCTIONS(X509)
Err, sorry. This code:
#define IMPLEMENT_ASN1_FUNCTIONS(stname) IMPLEMENT_ASN1_FUNCTIONS_fname(stname, stname, stname)
Err, this code:
#define IMPLEMENT_ASN1_FUNCTIONS_fname(stname, itname, fname) \
IMPLEMENT_ASN1_ENCODE_FUNCTIONS_fname(stname, itname, fname) \
IMPLEMENT_ASN1_ALLOC_FUNCTIONS_fname(stname, itname, fname)
Ok, clearly cutting this Möbius strip in half is going to take a while. Let’s skip ahead a bit:
X509 *d2i_X509(X509 **a, const unsigned char **in, long len) { return (X509 *)ASN1_item_d2i((ASN1_VALUE **)a, in, len, (&(X509_it))); } int i2d_X509(X509 *a, unsigned char **out) { return ASN1_item_i2d((ASN1_VALUE *)a, out, (&(X509_it))); } X509 *X509_new(void) { return (X509 *)ASN1_item_new((&(X509_it))); } void X509_free(X509 *a) { ASN1_item_free((ASN1_VALUE *)a, (&(X509_it))); }
One more indirection and we arrive at ASN1_item_ex_d2i
in crypto/asn1/tasn_dec.c. This function could be doing basically anything, but it’s a place to start vivisecting. (Note that ASN1_VALUE
contains the values for an object, and ASN1_ITEM
is the vtable for that object.)
root cause
Popping back up a level, the other interesting function for the test case is X509_verify_cert
. This lives in the x509 directory. Skipping ahead to the good parts, in internal_verify
:
if (!xs->valid && (xs != xi || (ctx->param->flags & X509_V_FLAG_CHECK_SS_SIGNATURE)))
{
/* do some verification stuff */
if (!ok) goto end;
}
xs->valid = 1;
Now our theory comes together. The valid flag gets set because the first cert verified, but when we parse the second cert it’s not cleared. If the first cert was valid, why shouldn’t the second one be, right?
quantum entanglement
The X509 structure is ref counted. I’m not smart enough to understand all the ways in which shallow copies might be made, but I do know what happens when you overwrite the data that those copies are pointing to. Spooky action at a distance.
documentation
To be fair, the man page does have a warning.
In some versions of OpenSSL the "reuse" behaviour of d2i_X509() when *px is valid is broken and some parts of the reused structure may persist if they are not present in the new one.
No mention is made of which versions. (The above sentence has been there for at least 12 years.) No mention as to what broken means and no indication of how one might detect this behavior. And this is only after the first paragraph of the man page tells you that reuse is ok. (In context, the word “valid” here only means well formed, not verified as in the code above. See also the warning about an uninitialized pointer being invalid.)
The broken code looks very similar to working code. It will probably pass all your unit tests because they won’t reuse the cert even while your production code does.
fix
This function is too dangerous to go on living. We decided just clearing the valid flag is insufficient because other fields may need clearing as well and it’s hard to even know what they are thanks to the Macros of Mordor. Also, that quantum entanglement thing. The easiest fix is to have d2i_X509
(and other ASN.1 functions) always allocate a new structure. If the caller provided us with data, we free it and start over. Wasteful, but safe. (It’s not actually freed in the case where it has been shared; see cif (asn1_do_lock(pval, -1, it) > 0)
in ASN1_item_ex_free
.)
UGH. DON’T ACTUALLY DO THIS. NOT ALL ASN1 OBJECTS ARE HEAP ALLOCATED. THIS API IS A POX ON HUMANITY.
diff -u -p -r1.16 tasn_dec.c
--- tasn_dec.c 18 Apr 2014 14:34:07 -0000 1.16
+++ tasn_dec.c 18 Apr 2014 14:36:47 -0000
@@ -171,6 +171,11 @@ ASN1_item_ex_d2i(ASN1_VALUE **pval, cons
if (!pval)
return 0;
+ /* always start fresh */
+ if (*pval) {
+ ASN1_item_ex_free(pval, it);
+ *pval = NULL;
+ }
if (aux && aux->asn1_cb)
asn1_cb = aux->asn1_cb;
else
Back to clearing. Seems safer, given the uncertain nature of the caller, but solves fewer cases in the worst case. Arguably, overwriting shared data is a form of use after free, and the caller’s fault, but the API makes it impossible for the caller to know that. Really, features that cannot possibly be used correctly should simply not exist.
diff -u -p -r1.19 tasn_dec.c
--- tasn_dec.c 19 Apr 2014 16:12:39 -0000 1.19
+++ tasn_dec.c 19 Apr 2014 17:36:01 -0000
@@ -388,10 +382,14 @@ ASN1_item_ex_d2i(ASN1_VALUE **pval, cons
goto err;
}
- if (!*pval && !ASN1_item_ex_new(pval, it)) {
- ASN1err(ASN1_F_ASN1_ITEM_EX_D2I,
- ERR_R_NESTED_ASN1_ERROR);
- goto err;
+ if (!*pval) {
+ if (!ASN1_item_ex_new(pval, it)) {
+ ASN1err(ASN1_F_ASN1_ITEM_EX_D2I,
+ ERR_R_NESTED_ASN1_ERROR);
+ goto err;
+ }
+ } else {
+ memset(*pval, 0, it->size);
}
if (asn1_cb && !asn1_cb(ASN1_OP_D2I_PRE, pval, it, NULL))
Ugh. Don’t do this either. Key comment from the code:
struct ASN1_ITEM_st {
long size; /* Structure size (usually)*/
};
impact
A quick scan of the OpenBSD src tree didn’t reveal any problems, but somewhere out there is a dogecoin exchange that is much vulnerable. The best thing to do is to always pass NULL as the first argument. The only way to allocate an X509 structure is with the same function that d2i will call, so just let it do the work.
As demonstrated, the two hardest problems in CS are naming things and validation caching.