Every once in a while somebody asks if they can run flak, and the answer is usually some variant of no, not right now, but maybe after I’m done rewriting it four times. Well, it’s been stuck at rewrite number 3 for quite a while, so time to push the button. Of course, putting code on the internet requires a place to put it, but Microsoft shut down Codeplex. Guess I have to build my own.
Enter humungus, a web based proxy for mercurial repos, which allows browsing, cloning, and not much else. The whole apparatus is up and running at humungus.tedunangst.com along with some code odds and ends. There’s no documentation for most of it because you’re not supposed to actually run it, just criticize my commit messages. (Haha, that was fast. Looks like somebody broke it. Well, maybe it works, maybe it doesn’t. Some turbulence expected.)
Introduction out of the way, some notes and observations on the implementation. It’s written in go, which seemed well suited to the task of wrangling a bunch of sockets and processes and connections, and I still think so. There’s a lot of potential for a web request to block while talking to a mercurial process, but everything is written in a simple synchronous style, letting go worry about concurrency.
I’m using a mix of channels and mutexes, resisting the urge to use channels for everything because they’re there, and trying not to use mutexes just because they’re familiar. I’m not entirely sold on channels, because they’re still a bit cumbersome. I’m using them for some caches, where the cache is controlled by a separate goroutine, and it works well for that purpose. Like running a mini process inside the main process.
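For the curious, the cache-behind-a-goroutine idea looks roughly like this. It's a minimal sketch, not the humungus code: the names request, cache, newCache, get and the fill callback are all made up for illustration. The point is that only one goroutine ever touches the map, so there's no mutex.

```go
package main

// request asks the cache goroutine for a key; the answer comes back on reply.
// (All names here are hypothetical, chosen for the sketch.)
type request struct {
	key   string
	reply chan string
}

// cache owns no data itself; the map lives inside the goroutine below.
type cache struct {
	requests chan request
}

// newCache starts the goroutine that owns the map. fill is a hypothetical
// loader called on a miss, for example something that asks hg for data.
func newCache(fill func(key string) string) *cache {
	c := &cache{requests: make(chan request)}
	go func() {
		entries := make(map[string]string)
		for req := range c.requests {
			val, ok := entries[req.key]
			if !ok {
				val = fill(req.key)
				entries[req.key] = val
			}
			req.reply <- val
		}
	}()
	return c
}

// get sends a request to the cache goroutine and waits for the reply.
func (c *cache) get(key string) string {
	reply := make(chan string, 1)
	c.requests <- request{key: key, reply: reply}
	return <-reply
}
```

The tradeoff in this version is that a slow fill holds up every other lookup, since requests are handled one at a time.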
The only hard part was putting together all the pieces of the mercurial protocol secret decoder ring. No fewer than three protocols are in use. And each protocol has its own, possibly nested, encoding scheme. So a few notes on that for the curious.
The programmatic way to interact with mercurial is via the command server, which lets you talk to a long running hg process over a pipe. The most interesting thing you can do is execute the runcommand command, which runs a command. Which command? The documentation says you must consult the capabilities, but that only tells you that runcommand exists, not what runcommand can run. The answer is anything you’d normally put on the command line. clone, push, commit.
The other two protocols are the web protocol and the wire protocol. They’re a bit stranger. I maybe didn’t need to deal with all this, but I wanted to support clone via the web interface but without running a zoo of hg web servers (and the requisite port wrangling that entails). Instead, the web protocol is proxied and translated to the stdin wire protocol. Thus even though the protocols are very similar, and speak the same language, humungus is up to its eyeballs in encoding and decoding.
The first thing to note is that HTTP commands are not sent with parameters in the URL. Instead they are stuck in a hidden HTTP header, X-Hgarg-1 (and maybe X-Hgarg-2 and so on). Next you URL decode that header as if it were a form to get the actual request parameters.
Now we want to feed this to the mercurial server in what they call SSH command format. There’s documentation for one command, but that’s about the only command that works as documented. Some very important commands like known and getbundle take an argument like “* 0” before the other arguments. This is some internal python dictionary magic? If you get it wrong, the server will make very angry noises about unpacking the wrong values and print stack traces. Another command that requires the star zero magic is batch, which can run several other commands. They are separated by semicolons, and their arguments are separated by spaces. So URL decode, split on semicolon, check for space, print the star maybe, print some other numbers, then arguments. Done.
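A rough sketch of the encoding side, with caveats. The layout assumed here is the command name on its own line, then each argument as a "name length" line followed by the raw value, which is my reading of the SSH command format. The "* 0" line is placed where the text above says it goes. The names arg and encodeSSHCommand are invented, and none of this is checked against the mercurial source.

```go
package main

import (
	"bytes"
	"fmt"
)

// arg is one named argument to a wire protocol command.
type arg struct{ name, value string }

// encodeSSHCommand writes the command name, an optional "* 0" line for the
// commands that want it, then each argument as "name length\n" plus value.
func encodeSSHCommand(cmd string, star bool, args []arg) []byte {
	var buf bytes.Buffer
	buf.WriteString(cmd + "\n")
	if star {
		// Assumption from the text above: known, getbundle and batch need this.
		buf.WriteString("* 0\n")
	}
	for _, a := range args {
		fmt.Fprintf(&buf, "%s %d\n%s", a.name, len(a.value), a.value)
	}
	return buf.Bytes()
}
```

The length prefix is what lets values contain newlines and other binary junk without any further escaping.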
The wire protocol doesn’t tell you how much data is coming out of getbundle, which can be problematic. If we try to read too much, we’ll get stuck. Unfortunately go’s facilities for timed blocking are really limited. I went in circles trying to do something as simple as read from a pipe, but don’t block for more than one second. It devolves into a maze of goroutines and channels and even then isn’t what you want. This is one point where the go model breaks down hard, but I think it’s just an API limitation, not anything completely fundamental.
The workaround is to note that the command server will exit, thus reliably triggering EOF, when we close the write end of the pipe. But we’re specifically trying to keep the wire server running to avoid the overhead of starting a new one every request. What seems to work is assuming that if the first read is short, that’s the end of transmission and we can reuse this server. If we have to loop, making multiple reads, then we close the write end. It’s weird. There is some documentation for the bundle format but I’ve avoided parsing it in depth.
It’s also important to note that the web protocol always zlib compresses the getbundle response (regardless of whatever the bundle itself says), but the wire protocol doesn’t. The hg client refuses to work with uncompressed data. Not sure why they didn’t use the HTTP Content-Encoding header here, but it’s important to zlib the response before sending it back.
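The compression step itself is simple with the standard compress/zlib package. A small sketch with hypothetical helper names, including the reverse direction for checking the result:

```go
package main

import (
	"bytes"
	"compress/zlib"
	"io"
)

// zlibCompress returns data wrapped in a zlib stream.
func zlibCompress(data []byte) ([]byte, error) {
	var buf bytes.Buffer
	zw := zlib.NewWriter(&buf)
	if _, err := zw.Write(data); err != nil {
		return nil, err
	}
	// Close flushes the remaining data and writes the checksum.
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// zlibDecompress reverses zlibCompress; handy for testing.
func zlibDecompress(data []byte) ([]byte, error) {
	zr, err := zlib.NewReader(bytes.NewReader(data))
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	return io.ReadAll(zr)
}
```

For a big bundle you'd probably rather wrap the http.ResponseWriter in a zlib.NewWriter and stream through it than hold the whole thing in memory.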
In the end, I think it almost works and it’s certainly a better means to share code than posting one off files to flak.