enjargo - another way to generate go json encoders
Some people, when presented with a data structure, think let’s encode this to json. Now they have two problems. Encoding and decoding. In response to this dilemma, various libraries were created, such as rust serde or go encoding/json, to facilitate drama and debate about which approach is best.
Enter enjargo, another approach for go, quite different from the standard library, which exists mostly to complete the D triumvirate of drama and debate with a bit of didacticism, and not so much to be a practical option.
In go, encoding/json works by using the reflect package to iterate over struct fields at run time, encoding each one as it goes. So you don’t have to write any code, or generate any code, and it mostly just works with any type you can cook up, without much prior effort. It has some limitations, such as only working on exported (public) fields, and there are some performance considerations.
In rust, I had a vague idea how serde worked. You annotate some types, and then it creates an encoder for you. Something about macros which turn your code into more code. Didn’t think too much about how specifically it accomplished this, and apparently not many other people did either until recently. Another way to describe it would be to say that serde is a program that looks at your code, then it outputs some json encoding functions, but it must reside in its own crate with a special crate type for complex technical reasons.
Wait, so what’s the difference between a macro and a code generator? The requirement that a macro runs in the compiler seems pretty arbitrary. The more I read about how serde works, the more I thought it doesn’t sound so different from the stringer utility from the go team. I mean, I like compilers, but not quite at the level of yo dawg, I heard you like compilers. I’m okay if one of my compilers runs alongside my other compiler, instead of inside it.
And so I thought, what if I make a json encoder for go that works more like the way serde works? How different would it be?
I personally don’t use code generation because I don’t trust computers. They are not our friends. But it’s important to study them and know how they work.
The go compiler does not support generating code while you compile. You have to do that in advance. So using enjargo will require two additional lines in your Makefile, but such lines are cheap. There is also this weird go generate thing, but go build doesn’t run it automatically, so I don’t really see the point. Just run the command you want to run.
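Something like the following, as a sketch; the generator name and invocation here are assumptions for illustration, not an actual documented command line:

```make
# hypothetical two extra lines: run the generator, then build as usual
generate:
	enjargo .

build: generate
	go build
```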
On to what enjargo does: it reads your code, looks for structs, then creates a go file with WriteJson(w io.Writer) methods for them. It specifically only looks for structs with a comment containing the word enjargo. encoding/json relies on tags, which are like comments that survive into the executable and are available via reflection, but we can just use plain comments here. Then you run go build, and everything gets compiled.
To differentiate itself from encoding/json, enjargo will also output private fields. No more tagging every field with the same name, but lowercase. That’s the advantage of having access to the full parse tree, not just the information available via reflection.
It doesn’t use the same interface as encoding/json because presumably you’re a high performance commando and you live your life zero allocations at a time. Obviously, it could be made to implement the json.Marshaler interface if we wanted to be useful.
I started off using stringer as a reference. It adds String methods; I’m adding WriteJson methods; seems really similar. This got me as far as parsing the code and setting up the ast walk, but then I quickly hit a brick wall. I need more than just the names of constants passing by; I need the names and types of fields. We’ve reached the staring at the screen while awaiting lightning stage of development.
First, it’s really convenient that go includes as part of the standard library a well maintained and up to date parser and ast package. Every similar project in C would begin with, let’s write a C parser. This entire goofy project was only viable because I could hook in at almost exactly the same point as a compiler macro.
Second, expectations may vary, but the ast you get is pretty basic. The walker will call your visitor with nil after visiting children, but unless you keep very careful track of which children can have children, it’s easy to get desynced. The ast.Node type itself contains no useful information. You need to type assert everything to a concrete type, presumably in a big switch, to see what it is. Alas, there’s no complete list of node types. I spent a frustrating amount of time trying to build out a visitor function state machine that would record the needed info as things passed by. Would not recommend.
So if there’s no way to work with the ast as it’s pushed to you, what else is there? Guess and check, via ast.Print. The walker is good for finding nodes to process, but then you’re on your own to traverse it further. The ast printout helpfully includes the actual types of nodes in the tree, so you know which casts to use. In the end, it worked out, and I probably would have used this technique anyway, but I think it’s impossible to write working code from just the meager documentation.
I started out just emitting code directly, but Fprintf calls generating Fprintf calls, with quotes inside of format strings inside of format strings, got a bit weird. I switched to text templates. This removed several layers of indirection at the cost of the generated code looking bloaty. It’s possible to control whitespace generation, but it’s tedious to get just right. Conveniently, the go format package is available to us as well. I guess for some applications you’d want to create and format your own ast, but not today.
In the end, an enjargo_methods.go file gets created in about one hundredth of a second. Fast enough?
Reviewing my notes, and after another read of the documentation, I may have made things harder for myself by using ast.Inspect instead of ast.Walk. At the cost of writing some more code, implementing ast.Visitor gives you some more control. Still, hand traversal worked well enough for my purposes here.
The stringer utility uses the external golang.org/x/tools/go/packages package. I found it unnecessary, and it makes parsing much slower, so I replaced it with just go/parser.
Jason. Argo. Read a book.
I wanted to call it J’nargo, but go.mod has ridiculous rules about permitted characters.
I would not ever recommend using enjargo. It doesn’t even escape strings. That didn’t seem like a particularly interesting or relevant challenge.
The encoder covers basic types, and structs and arrays thereof. Enough to prove the concept. Decoding is an entirely separate can of worms.
I don’t have strong opinions about where a code generator runs, such as in which address space or with which parent process, or where the magic line that invokes it goes. So long as it’s possible, the end result is about the same, although some configurations are more transparent to the user about what code is running where.