flak rss random

adding comefrom to luajit

After reading about a function decorator that rewrites the bytecode to enable goto in Python, I realized a very similar technique could be used to manipulate the bytecode in luajit. Of course, being the superior language, Lua already has goto, so for this example we need to add even more advanced control flow, comefrom.

comefrom

The interface is much the same. Wrap the function with a rewriting function (decorator) and add keywords by using Lua’s shortcut syntax for string function calls.

local comefromulate = require "comefrom"

local fn = comefromulate(function(x)
        comefrom "end"
        if not x then return end
        label "start"
        print("where'd it go?")
        comefrom "start"
        print(x)
        x = nil
        label "end"
end)
fn("hello")

This will print “hello”.

notes

The Lua syntax/grammar is both easy and hard to hack on. The shortcut that allows calling a function makes it look pretty clean. On the other hand, Lua’s rules to prevent errors often get in the way.

A return statement is only valid at the end of a block, which means if the control flow introduces new blocks, they may not behave as such without if true return end wrapping.

Lua draws a hard line between statements and expressions. New keywords had best look like function calls in order to qualify as statements.

The jit.bc module has some useful functions in it, particularly dump. The bytecode documentation exists, but it’s pretty spartan. I found it best to use the “official” dump function and write my own and compare results.

implementation

We need to turn the function into a string, then parse the output, and return our newly rebuilt function.

local function comefromulate(fn)
        local d = string.dump(fn)
        local r = sx.reader(d)
        local bcpos, numbc = readhdr(r)
        local newbc = patch(fn, r, numbc)
        d = d:sub(1, bcpos) .. newbc .. d:sub(1 + bcpos + numbc * 4)
        return loadstring(d)
end

The header has lots of fields in it, but we aren’t interested in most of them. It’s slightly wasteful, but we’re going to make the new bytecode exactly the same length, so we can weave it back in without patching up everything. All the offsets will remain as they were.

The key fact that makes everything work is that comefrom "there" is always turned into the same three bytecode sequence. A load of a global, comefrom, a load of a string, “there”, and then a call.

The first pass of patch just reads the bytecode stream and turns it into an array for later modification. We watch for comefrom as we go.

        for i = 1, num do
                local op = byte(r)
                local a = byte(r)
                local c = byte(r)
                local b = byte(r)
                local d = b * 256 + c
                if op == 52 then
                        local s = jutil.funck(fn, -d - 1)
                        if s == "label" then
                                waslabel = i
                        end
                        if s == "comefrom" then
                                wascomefrom = i
                        end
                end
                if op == 37 then
                        local s = jutil.funck(fn, -d - 1)
                        if waslabel == i - 1 or wascomefrom == i - 1 then
                                labels[i - 1] = s
                        end
                        if wascomefrom == i - 1 then
                                if targets[s] then
                                        error("already defined comefrom " .. s)
                                end
                                targets[s] = i + 1
                        end
                end
                newbc[i] = { op, a, c, b }
        end

We can’t do anything about a label or comefrom until we see the string, so we use a kind of backwards looking peephole whenever we see a string load. Then we check if the previous instruction was also of interest. (Could also peek ahead, but it’s a little easier this way.)

For labels, we record where they are so they can be patched. For comefrom, we map from name to location. Then we patch.

        for i, l in pairs(labels) do
                local targ = targets[l] or i + 2
                local d = targ - i + 0x8000
                local b = math.floor(d / 256)
                local c =  bit.band(d, 0xff)
                newbc[i] = { 84, 99, c, b }
        end
        for _, l in pairs(labels) do
                targets[l] = nil
        end
        for l, _ in pairs(targets) do
                error("can't comefrom " .. l)
        end

After fixing up all the labels, we even check to make sure there’s no attempt to comefrom nowhere. You are allowed to have extraneous labels, though. Note that a fair bit of the original bytecode (string load and call) is left behind. Only the first instruction gets changed to a jump.

JMP takes a free slot argument, but I’ve no idea what to use here. I picked 99.

todo

Lua’s goto is only half a keyword. In statements like goto "there" it’s actually compiled as a function, meaning it would be possible to use the label syntax above. One could even goto a label, and then immediately comefrom it.

Computed comefrom. The argument to label should support variables, which will then jump to the right comefrom.

label(x)
comefrom "good"
        rv = 1
comefrom "bad"
        rv = -1
comefrom "indifferent"
        rv = 0

It’s not as ridiculous if it’s spelled switch(x) case "val".

It may be possible to permit cross function comefrom using the debug library’s introspection to match up lonely labels with eligible comefroms.

Support for variable arguments to comefrom could also be added. Useful for implementing the devil’s control flow.

code

Don’t feed the animals.

Posted 22 Sep 2015 15:09 by tedu Updated: 22 Sep 2015 15:23
Tagged: lua programming