flak rss random

comparative unsafety

I wrote some rust code. I used unsafe. It was unsafe. After months of contemplating this unfortunate result, I’ve found someone else to blame.

For context, I was writing an smtp server. Not necessarily a good smtp server, but the kind of smtp server you’d write in two days to see what happens. As such, any conclusions from this experiment are invalid.

gift

At the heart of the problem, and core to an smtp server, is a function to gift files from the server to the recipient.

#[link(name = "c")]
extern "C" {
    fn chown(path: *const i8, uid: i32, gid: i32) -> i32;
}

fn gift(filename: &String, userid: i32) {
    unsafe {
        let path = CString::new(filename.as_str()).unwrap().as_ptr();
        let none = -1;
        chown(path, userid, none);
    }
}

Why am I using my own ffi version of chown instead of the libc crate? The libc crate prototypes chown with unsigned uid_t and gid_t types, and I want to pass -1 because I’m not interested in changing the group. I’ll spare you the long rant about how one should never redeclare system interfaces if you can’t take the time to do so properly, because it turns out if you dig into it, uid_t boils down to uint32_t, but even so, the manual for chown says -1 should work and I want it to work. Using the libc chown and adding try_into to none causes it to panic.

In the face of adversity, I forged my own path of least resistance, and gave myself a chown prototype that worked just the way I wanted. But none of this is really relevant to the matter at hand. The example code uses ffi because the real code used ffi and that’s why.

The real problem is that path is a pointer to freed memory. The CString result burrito has already been consumed and recycled by the time we reach chown. This is bad.

This is also not a bug class I’m unfamiliar with. In the C++ days, one learned to be very careful around sloppy c_str usage lest a temporary string disappear from underneath you. So I’m aware of the disappearing temporary as a bug class, theoretically and practically. And I know rust has lots of temporary objects, and makes a big deal of object lifetimes, the whole deal. Probably could have figured this all out.

But still, I wasn’t quite expecting this to happen. Assorted chains of unwrap, as_this, as_that, over_here, what_now are everywhere. The borrow checker keeps me safe. And this is unsafe code, but... You can’t “turn off the borrow checker” in Rust. This isn’t quite what Steve had in mind, I think, but to reiterate, I had two expectations. Borrow checker doesn’t let temporaries disappear, and really, it doesn’t. Elsewhere, I’d get all sorts of compiler errors (and helpful fix it hints) whenever something bad would happen. I mostly became accustomed to writing code in this fashion.

Nevertheless, tragedy struck. I was graciously informed of the error, and looked into it a bit to see if there was a better way to avoid it.

My first inclination was that I was lazy and had made the unsafe block too large.

fn gift(filename: &String, userid: i32) {
    let path = CString::new(filename.as_str()).unwrap().as_ptr();
    let none = -1;
    unsafe {
        chown(path, userid, none);
    }
}

Surely this will elicit a compiler error? Nope. I was surprised to discover that CString::as_ptr is not unsafe. It gives you a poisoned pointer that can’t be touched except in unsafe, but the call itself is perfectly safe. ish.

The fix is to write it correctly, of course.

fn gift(filename: &String, userid: i32) {
    let path = CString::new(filename.as_str()).unwrap();
    let none = -1;
    unsafe {
        chown(path.as_ptr(), userid, none);
    }
}

What disturbs me about this solution is that it doesn’t look obviously better. If anything, let’s pack all the unwrap, as_this, as_that, over_here, what_now onto a single line so that we end up with just what we want seems like the more rusty way to do things.

The documentation does specifically call out this error. I’m pretty sure I read it. But I think there was a delay between reading the documentation, in which I wrote some other code, then came back to write this code and kinda winged it.

clippy

I was also informed that clippy will find and report this error. It does.

It also complains about a million other things, including the fact that I write numbers as 1000000 and not 1_000_000. There’s something about errors and warnings, but some of my other stylistic atrocities like single iteration loops are marked as errors as well. I used it for a while, but generally found it too tiresome.

If there’s an error which is a clear cut bug, I think it should be reported by an error detecting tool, not a linter. In the case of rust, that may be the compiler. Or a better tuned version of clippy. Given the focus of rust, it just feels like this error slipped between the cracks of should have been caught by the compiler and actually caught by the thing that tells me I’m not allowed to use w, x, y, and z variable names.

(In C, the compiler will let you get away with murder, but the compiler doesn’t purport to be a life saving device, either.)

go

And now for the inevitable comparison to go. I have not had occasion to ffi my own chown in go because it’s included in the stdlib, conveniently prototyped with signed ints so -1 just works, but I have done so to get access to functions like unveil. For consistency, I’ll just stick with writing gift to use cgo to call chown.

/*         
#include <stdlib.h>
int chown(const char *, int, int);
*/
import "C"

func gift(filename string, userid int) {
        path := C.CString(filename)
        C.free(unsafe.Pointer(path))
        C.chown(path, C.int(userid), -1)
}

This contains the same bug, freeing path before calling chown. But I think the error stands out a bit more. It should be obvious, even to casual reviewers, that calling free here is bad.

func gift(filename string, userid int) {
        path := C.CString(filename)
        defer C.free(unsafe.Pointer(path))
        C.chown(path, C.int(userid), -1)
}

The fixed code looks a lot like other go code I write. We use defer to cleanup http.Response.Body, sql.Rows, os.File, etc. And so, at a glance, I can tell the difference between the corrected version and the mistaken version. I use the same idiomatic style in both safe and unsafe code.

The go version is also susceptible to other errors, like completely failing to free the allocated string. I would like to believe this is avoidable with a little practice. The use of defer certainly makes it easy to cover every path, and it’s a familiar pattern.

thoughts

I don’t know how confident I am that I won’t make the same error again.

Reviewing the rest of the code, I did use CString without error in several other places, but in that one instance I put the as_ptr right there with the constructor. There’s no conscious reason for the difference, and if I hadn’t been aware of the trouble it can cause, I may even have tidied up the good calls to look like the bad call.

Elsewhere, there’s code like captures.get(1).unwrap().as_str().to_string() where I just kept tacking accessors onto the end until I got what I wanted. This code is admittedly hastily written and would not be my entrant for a programming pearls contest, but it mostly reads ok I think. I read rust code like this, I write code like this, it compiles, it works, everything is fine.

When I went back to read the documentation for as_ptr again, it seemed very familiar, so it’s plausible I wrote the first few instances correctly and carefully. Then when it came time to write the problematic instance, I used my own existing code as a guide, and tuned it up a bit as I retyped it to more closely resemble the code I’d written in between.

And that’s what troubles me. I learn this is normal, but then there are certain magic situations where it’s not normal and different rules apply. Or rather the same rules, but now they matter in a different way. There’s a feeling like I was lulled into false sense of security. The borrow checker works very well, I expect it to keep working, and it even does in unsafe blocks, just unexpectedly not for this function.

Even now, aware of the problem, I can think to myself that surely next time I’ll look out for this, but I also know it usually takes a few lessons before the learning is done. It would be optimistic to assume I’ll never blink and slide right past this error again.

In fairness, I should add that the error was pointed out to me by several people, in quick succession, so it’s entirely possible for reviewers attuned to it to spot it more readily.

The go approach to cgo and unsafety certainly has more failure modes. And less tooling assistance, since go vet won’t warn about using a freed pointer. Nevertheless, just glancing at the incorrect version of the function, the error pops out at me. This may be partially due to familiarity, but I don’t believe that fully explains it.

I can see the error in the go code, because the error is right there to see. The error is in the code. The error in the rust code isn’t in the code. It’s in the invisible spaces, the gaps between the code. It’s a subtle difference, but it feels significant.

ps

The error would have been detected far sooner had I not been lazy and checked the return value of chown (for a file not found error). But I was running as root, knew the file existed, and wasn’t taking things too seriously. Why even bother checking? So score one for team check every function for failure. That’s my usual team, except when it isn’t.

I’m also curious if the mistake would have stung nearly so much if I’d found it immediately instead of putting it out there for somebody else to find. I suspect I would have quickly dismissed it as just another quirk of how things are done, and failed to learn the bigger lesson that this is something that one should be vigilant in guarding against. Teaching mode requires the stove be preheated to the optimal temperature.

Posted 17 Aug 2020 13:53 by tedu Updated: 17 Aug 2020 18:24
Tagged: go programming rust