adding comefrom to luajit
After reading about a function decorator that rewrites the bytecode to enable goto in Python, I realized a very similar technique could be used to manipulate the bytecode in luajit. Of course, being the superior language, Lua already has goto, so for this example we need to add even more advanced control flow, comefrom.
comefrom
The interface is much the same. Wrap the function with a rewriting function (decorator) and add keywords by using Lua’s shortcut syntax for string function calls.
local comefromulate = require "comefrom"
local fn = comefromulate(function(x)
comefrom "end"
if not x then return end
label "start"
print("where'd it go?")
comefrom "start"
print(x)
x = nil
label "end"
end)
fn("hello")
This will print “hello”.
notes
The Lua syntax/grammar is both easy and hard to hack on. The shortcut that allows calling a function makes it look pretty clean. On the other hand, Lua’s rules to prevent errors often get in the way.
A return
statement is only valid at the end of a block, which means if the control flow introduces new blocks, they may not behave as such without if true return end
wrapping.
Lua draws a hard line between statements and expressions. New keywords had best look like function calls in order to qualify as statements.
The jit.bc module has some useful functions in it, particularly dump. The bytecode documentation exists, but it’s pretty spartan. I found it best to use the “official” dump function and write my own and compare results.
implementation
We need to turn the function into a string, then parse the output, and return our newly rebuilt function.
local function comefromulate(fn)
local d = string.dump(fn)
local r = sx.reader(d)
local bcpos, numbc = readhdr(r)
local newbc = patch(fn, r, numbc)
d = d:sub(1, bcpos) .. newbc .. d:sub(1 + bcpos + numbc * 4)
return loadstring(d)
end
The header has lots of fields in it, but we aren’t interested in most of them. It’s slightly wasteful, but we’re going to make the new bytecode exactly the same length, so we can weave it back in without patching up everything. All the offsets will remain as they were.
The key fact that makes everything work is that comefrom "there"
is always turned into the same three bytecode sequence. A load of a global, comefrom, a load of a string, “there”, and then a call.
The first pass of patch just reads the bytecode stream and turns it into an array for later modification. We watch for comefrom as we go.
for i = 1, num do
local op = byte(r)
local a = byte(r)
local c = byte(r)
local b = byte(r)
local d = b * 256 + c
if op == 52 then
local s = jutil.funck(fn, -d - 1)
if s == "label" then
waslabel = i
end
if s == "comefrom" then
wascomefrom = i
end
end
if op == 37 then
local s = jutil.funck(fn, -d - 1)
if waslabel == i - 1 or wascomefrom == i - 1 then
labels[i - 1] = s
end
if wascomefrom == i - 1 then
if targets[s] then
error("already defined comefrom " .. s)
end
targets[s] = i + 1
end
end
newbc[i] = { op, a, c, b }
end
We can’t do anything about a label or comefrom until we see the string, so we use a kind of backwards looking peephole whenever we see a string load. Then we check if the previous instruction was also of interest. (Could also peek ahead, but it’s a little easier this way.)
For labels, we record where they are so they can be patched. For comefrom, we map from name to location. Then we patch.
for i, l in pairs(labels) do
local targ = targets[l] or i + 2
local d = targ - i + 0x8000
local b = math.floor(d / 256)
local c = bit.band(d, 0xff)
newbc[i] = { 84, 99, c, b }
end
for _, l in pairs(labels) do
targets[l] = nil
end
for l, _ in pairs(targets) do
error("can't comefrom " .. l)
end
After fixing up all the labels, we even check to make sure there’s no attempt to comefrom nowhere. You are allowed to have extraneous labels, though. Note that a fair bit of the original bytecode (string load and call) is left behind. Only the first instruction gets changed to a jump.
JMP takes a free slot argument, but I’ve no idea what to use here. I picked 99.
todo
Lua’s goto is only half a keyword. In statements like goto "there"
it’s actually compiled as a function, meaning it would be possible to use the label syntax above. One could even goto a label, and then immediately comefrom it.
Computed comefrom. The argument to label should support variables, which will then jump to the right comefrom.
label(x)
comefrom "good"
rv = 1
comefrom "bad"
rv = -1
comefrom "indifferent"
rv = 0
It’s not as ridiculous if it’s spelled switch(x) case "val"
.
It may be possible to permit cross function comefrom using the debug
library’s introspection to match up lonely labels with eligible comefroms.
Support for variable arguments to comefrom could also be added. Useful for implementing the devil’s control flow.