too much go misdirection
Poking through layers of indirection in go trying to recover some efficiency.
Many functions in go take an io.Reader interface as input. This is a sensible default, allowing for streaming of data, instead of loading the entirety into memory. In some cases, though, we need the bytes. Which we may already have, in which case it would be great to simply use those bytes. Alas, this can be quite difficult.
context
I’m decoding some images. I’m using libavif and libheif via C bindings. For reasons primarily motivated by simplicity, I’m using the simple memory interfaces for these libraries, which makes it much easier to get the data from go into C. The streaming interface is much more work, and anyway the libraries would then just buffer the data internally, making another copy. Not every decoder fully works in a streaming fashion.
So the primary do-the-work function takes a []byte and passes it to C, and there’s a wrapper that does things the go way with an io.Reader, which does a full read into a temporary buffer before sending it along. Now, as it happens, my application also uses []byte internally because that’s what I’m getting out of libsqlite3 (because again, the streaming interface is much trickier to wire up) and also because that’s what you get when doing RPC with encoding/gob. I think this is not an unusual scenario.
bytes
What I would like is for my image decoding function to notice that the io.Reader it has been given is in fact a bytes.Reader so we can skip the copy. Anyone who’s spent any time looking around in the go standard library has noticed that similar shortcuts are commonplace. Interfaces are type checked against specific implementations, and then optimized code paths are taken. Well, we can do the check, but it doesn’t immediately help, because bytes.Reader doesn’t expose its internal byte slice.
But it’s in there somewhere and I will not be denied.
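    if br, ok := r.(*bytes.Reader); ok {
        // Reinterpret the reader as its first field, the backing []byte.
        // This leans on the unexported layout of bytes.Reader.
        data = *(*[]byte)(unsafe.Pointer(br))
    } else {
        // Anything else gets read into a temporary buffer (the copy).
        var buf bytes.Buffer
        io.Copy(&buf, r)
        data = buf.Bytes()
    }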
This seems to work in simple tests, but not when using the image.Decode function. A copy is still made. What’s wrong?
    func Decode(r io.Reader) (Image, string, error) {
        rr := asReader(r)
        // ...
    }

    func asReader(r io.Reader) reader {
        // Readers that already have Peek pass through untouched.
        if rr, ok := r.(reader); ok {
            return rr
        }
        // Everything else gets wrapped.
        return bufio.NewReader(r)
    }

    type reader interface {
        io.Reader
        Peek(int) ([]byte, error)
    }
Turns out the go image library does its own type inspection, looking for a Peek function, and if it’s not found, wraps the reader in a bufio.Reader instead. So the bytes.Reader never makes it into our function as is.
Now, why doesn’t bytes.Reader implement Peek? It’s just a byte slice, it’s definitely possible to peek ahead without altering stream state. But it was overlooked, and instead this workaround is applied.
Just knowing that we have a bufio.Reader isn’t sufficient, because, again, it doesn’t expose the underlying reader to us. It’s fine, okay, whatever. I am a master of unception.
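    // Mirrors the leading fields of bufio.Reader; the rest are not needed.
    type bufioReader struct {
        buf []byte
        rd  io.Reader
    }

    if br, ok := r.(*bufio.Reader); ok {
        // Reinterpret the bufio.Reader to reach its unexported rd field.
        insides := (*bufioReader)(unsafe.Pointer(br))
        r = insides.rd
    }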
The new procedure is to look for a bufio.Reader and if so, unpack the inner reader. And then, as before, if it’s a bytes.Reader, we extract the bytes. The zero copy dream is alive.
trees
The bufio.Reader should probably expose the underlying reader.
The bytes.Reader should really implement Peek. I’m pretty sure the reason it doesn’t is because this is the only way of creating read-only views of slices. And a naughty user could peek at the bytes and then modify them. Sigh. People hate const poisoning, but I hate this more.
bytes.Buffer provides a Bytes function, but still not Peek, so even if you know that’s required or useful, it’s not a simple swap.
forest
I’ve said this before, but the way go does structural typing, and the way the standard library uses it, creates these shadow APIs where blessed types work better than others. It’s almost never documented what the secret requirements are. But can you blame me for wanting to join the party?
I think two interpretations are possible. Casting was added to the language, and it’s used throughout the standard library, thus proving the feature is useful. Or pessimistically, every cast is a design oversight. The approach (in general, not my specific wizardry) only scales to the extent people stick with the standard types. It’s obviously not going to be feasible to specialize on third-party types.