honk 1.0

It’s been four years since honk 0.1. Before that, the preview, and shortly after the followup. But finally, after a long journey, we’ve reached honk 1.0. (Narrator: honk is a microblog server that federates with other servers via ActivityPub.)

Once I needed to release 0.9.91 to fix a bug I realized we were getting very low on reserve version space. There’s no particular milestone that makes 1.0 a real release, although it does include some good stuff. But some stuff is still kinda in its 0.7 state. So this post is not really an analysis of the recent changelog entries.

Instead it’s about the many ways honk has won, and the many ways others have lost. It may also help to explain why some fediverse things work or don’t work the way they do. Totally unbiased, as you should expect from an objective critic.

wins

A quick list of things I was hoping would work, and I’m happy to say have.

Honk was built to be editable software. The user base is pretty small, but I’d say this worked out wonderfully. If you want to change something, you just go in and change it, and people generally do. There’s no sitting and watching your feature request languish in the issue tracker for years here.

The original prototype slung together is still going strong. We’ve changed the schema to eventually include every database antipattern, and probably invented some new ones, but haven’t lost any data that wasn’t originally stored.

Very little modularization. Dealing with ActivityPub there are a lot of data dependencies, and they can be not quite circular, with inboxes and keys and webfinger and the like. A normal modular approach requires a lot of dependency injection or similar techniques, which leads to pushing callbacks up and down and every which way. Much simpler if you can just call the function you want when you need it.

Honk was the first ActivityPub server I wrote, but there have been several since, which are refinements and specializations of the idea. I directly reused and modularized a lot of the other code, but not the activity code. I think you want to keep this bespoke.

SQLite is perfect. It’s caused far fewer problems (none) in deployment than anybody else has experienced with a grown up database. And it makes standing up one or a dozen test instances trivial on any number of development machines.

Using go to resize images instead of imagemagick because I don’t like strangers running code on my server. Using go to resave images because I don’t like strangers uploading code to my server. Using pledge and unveil on openbsd also would have mitigated problems that never happened.
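The resave half of that can be sketched with nothing but the standard library. This is a minimal illustration, not honk's actual code: the function name resave, the helper samplePNG, and the JPEG quality setting are all assumptions. The idea is to decode the upload to pixels and encode a fresh file, so anything that wasn't pixel data is left behind.

```go
package main

import (
	"bytes"
	"fmt"
	"image"
	"image/color"
	_ "image/gif" // register the GIF decoder
	"image/jpeg"
	"image/png"
)

// resave decodes an uploaded image and re-encodes its pixels as a new JPEG.
// Bytes that are not part of the decoded picture are never copied across.
// (Hypothetical helper; quality 80 is an arbitrary choice.)
func resave(upload []byte) ([]byte, error) {
	img, _, err := image.Decode(bytes.NewReader(upload))
	if err != nil {
		return nil, fmt.Errorf("not a decodable image: %w", err)
	}
	var out bytes.Buffer
	if err := jpeg.Encode(&out, img, &jpeg.Options{Quality: 80}); err != nil {
		return nil, err
	}
	return out.Bytes(), nil
}

// samplePNG builds a tiny PNG with junk appended, standing in for a
// hostile upload.
func samplePNG() []byte {
	src := image.NewRGBA(image.Rect(0, 0, 4, 4))
	src.Set(1, 1, color.RGBA{R: 255, A: 255})
	var buf bytes.Buffer
	if err := png.Encode(&buf, src); err != nil {
		panic(err)
	}
	buf.WriteString("<?php evil(); ?>")
	return buf.Bytes()
}

func main() {
	clean, err := resave(samplePNG())
	if err != nil {
		fmt.Println("rejected:", err)
		return
	}
	fmt.Println("junk survived:", bytes.Contains(clean, []byte("evil")))
}
```

Resizing would slot in between the decode and the encode; it is left out here to keep the sketch short.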

The long ago feature to see posts from one or more years ago is a daily delight. Honk started in 2019.

A general lack of discovery features. This was a personal preference, but it seems others like it, too. It starts off slow, but pretty quickly the few people you follow talk to others, so you follow them, and it starts to snowball.

Server side rendering. It’s very fast on first load. And every load after, too. Every six months somebody asks if it works without javascript and I tab over to an xterm to send a reply in lynx.

Started with no API besides the web interface. Then adapted the hydration endpoint into something slightly more useful for bots and alternative clients, but have avoided a tremendous amount of frustration trying to chase somebody else’s API. At least one Mastodon app crashes just viewing honk posts; I can’t imagine what would happen if it were talking directly to honk.

There are no honk apps in the store, but you can run honk on your android phone.

contribs

There’s almost as many forks as users. Though if you’re using stock honk, bless you.

CSP was great. The best part is now I get an alert about an inline style violation whenever somebody using an adblocker or some other extension visits my site. Not entirely sure all the puzzle pieces fit together as intended, but who am I to question the decisions of the browser gods.

Got an emoji picker, which I was not planning on, but it fits in pretty nicely.

A decent number of federation edge cases that I missed or had given up on.

A lot of people probably bounced off the code style, but the people who stuck around seemed to have enjoyed it. I’ll take that as a win. I don’t think the project would have seen any more success had the code been boring.

quick hacks

A lot of honk features are simple ten line functions. (Almost certainly they’re actually longer, like 20-30 lines, with the error handling, etc., but they feel short.) There may be a better way, but I’ve got running code, not a year old design doc.

Specialized microformats. Honk offers a normal web form to attach images and location data to a post, but also uses the main content text form as a general purpose data entry. It can be hard to remember all the options available; do you want to use hoot: or meme:, or maybe convoy:? But it offers a wide variety of options without cluttering the UI. A dedicated text input for every option would flow off the screen. The processing code is a little jank and kinda fragile, but it’s still pretty easy to add in new options.
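For a rough idea of what this kind of processing looks like, here is a sketch. The syntax honk really accepts isn't described above, so this assumes one option per line, written as name: value at the start of the line. The function splitOptions and the knownOptions table are hypothetical, and the table holds only the three names mentioned in the text.

```go
package main

import (
	"fmt"
	"strings"
)

// knownOptions lists only the prefixes named in the post; the real set is
// presumably larger.
var knownOptions = map[string]bool{"hoot": true, "meme": true, "convoy": true}

// splitOptions pulls "name: value" lines with a known name out of the post
// text. It returns the remaining text and the extracted options.
// Assumption: one option per line, name at the very start of the line.
func splitOptions(text string) (string, map[string]string) {
	opts := make(map[string]string)
	var kept []string
	for _, line := range strings.Split(text, "\n") {
		name, value, found := strings.Cut(line, ":")
		if found && knownOptions[name] {
			opts[name] = strings.TrimSpace(value)
			continue
		}
		kept = append(kept, line)
	}
	return strings.Join(kept, "\n"), opts
}

func main() {
	body, opts := splitOptions("look at this\nmeme: cat.jpg\nhttps://example.com/x")
	fmt.Printf("%q %v\n", body, opts)
}
```

Adding an option is one more entry in the table, which is most of the appeal. Lines like a bare https:// link are left alone because "https" is not a known name.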

The untag feature. The unimaginative design of other software means everybody gets mentioned in every reply, which is terrible. Common occurrence is posting about an unusual traffic light, and then 20 replies later there’s two dickheads arguing about whether orange is a color or a fruit, mentioning me every time, and I do not care. One response is to increasingly frantically ask to untag me, please untag me, I hereby demand you assholes stop tagging me. Or mute the entire thread in desperation, but then running the risk of missing out on keen insights regarding whether green lights are actually better than red lights. Honk includes an untag button which performs a somewhat precise surgical excision on the thread. It’s not perfect, but it works far better than not having the feature.

Access tracking. Honk keeps a little access log for each post, so a year later when I fix a typo, it can send the update to all the appropriate servers who may have accessed it in the meantime. Nothing fancy, no time restricted bearcap remote attestation scheme, just a list of hostnames. Other implementations send updates to only the current follower list, which can miss out on hundreds of servers for a once popular post. Or they blast it on broadcast to every server they’ve ever seen, which is rather wasteful for a post with no interactions. It’s not very hard to be mostly correct and mostly efficient.
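A sketch of the idea, kept in memory for brevity. The type accessLog and its methods are hypothetical names, and honk's real storage and delivery code is not shown here. The target list for an update is the union of current follower hosts and whatever hosts fetched the post.

```go
package main

import (
	"fmt"
	"net/url"
	"sort"
)

// accessLog maps a post ID to the set of hostnames that have fetched it.
// (Illustrative in-memory stand-in for a per-post access table.)
type accessLog map[string]map[string]bool

// record notes that host fetched post.
func (a accessLog) record(post, host string) {
	if a[post] == nil {
		a[post] = make(map[string]bool)
	}
	a[post][host] = true
}

// updateTargets returns the hosts that should receive an Update for post:
// the hosts of the current followers' inboxes plus every host in the
// post's access log, deduplicated and sorted.
func (a accessLog) updateTargets(post string, followerInboxes []string) []string {
	hosts := make(map[string]bool)
	for _, inbox := range followerInboxes {
		if u, err := url.Parse(inbox); err == nil && u.Host != "" {
			hosts[u.Host] = true
		}
	}
	for h := range a[post] {
		hosts[h] = true
	}
	out := make([]string, 0, len(hosts))
	for h := range hosts {
		out = append(out, h)
	}
	sort.Strings(out)
	return out
}

func main() {
	al := accessLog{}
	al.record("post1", "old.example")
	fmt.Println(al.updateTargets("post1", []string{"https://friend.example/inbox"}))
}
```

A post nobody fetched only goes to followers, and a once popular post reaches everyone who saw it, without broadcasting to every server ever seen.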

Quote support. All the popular microblogs use a limited set of recognizable url schemes, which we can match and optimistically inline. In practice this works even better than checking for implementation specific meta data, because it also works for people who just paste links into software that doesn’t know they are posting a quote.
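Something along these lines, as a sketch. The two patterns are assumptions for illustration: one for the /@user/number shape that Mastodon-style servers use, one for the /u/user/h/id shape of honk's own post URLs. The real list would be longer, and findQuotes is a hypothetical name.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// quotePatterns holds URL shapes that probably point at a microblog post.
// Both patterns are illustrative guesses, not an authoritative list.
var quotePatterns = []*regexp.Regexp{
	regexp.MustCompile(`^https://[^/\s]+/@[^/\s]+/[0-9]+$`),     // Mastodon-style
	regexp.MustCompile(`^https://[^/\s]+/u/[^/\s]+/h/[^/\s]+$`), // honk-style
}

// findQuotes returns every whitespace-separated word in text that matches
// one of the known post URL shapes. The caller can then try to fetch and
// inline each one, and shrug if the fetch fails.
func findQuotes(text string) []string {
	var quotes []string
	for _, word := range strings.Fields(text) {
		for _, re := range quotePatterns {
			if re.MatchString(word) {
				quotes = append(quotes, word)
				break
			}
		}
	}
	return quotes
}

func main() {
	fmt.Println(findQuotes("worth a read https://example.social/@alice/12345 ok"))
}
```

Because this only looks at the text, it does not matter whether the sender's software knew it was posting a quote.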

The Instagram importer skips some data that the honk schema isn’t prepared for. It was also written during one happy hour at a bar. If anybody uses it, they can determine what else is needed, but the availability of basic support is needed to find out what else can be done. The twitter import does see occasional use, because I get patches for it when somebody notices the export format has changed, again. The important thing is the feature exists, close enough to working, so that people will try it. I’m doubtful people would green field an importer.

nc honk.tedunangst.com 17

mess

Some stuff wasn’t a win, didn’t quite fail, but it’s still kind of a mess.

Javascript rehydration. A limited set of page navigations are sped up using javascript and partial refreshes. Really, I think this was a fine idea. It’s just a bit (just a bit) unfinished, and there’s some sequences that confuse its state. I jumped into this a little too early, with the intention to expand over time, but it would have benefited from a bit more high level design. There’s frameworks that are supposed to help here, but really, the 100 lines of javascript was not the hard part. It’s going through the template and finding all the links that are supposed to become something else. Would do again, but more cautiously.

Storing all the big media files in a blob.db worked fine, but it tends to confuse users, because the file size keeps growing. There’s a cleanup command which internally deletes old stuff, but it doesn’t run a vacuum because that’s very slow and IO intensive. This has been difficult to explain. It’s not hard to interact with blob.db and its one table, but I think people just like the idea of walking through the attachments directory weed whacker in hand.

The filtering and list organization features never converged. There’s one feature to hide things and one feature to show things, but really this should be a single feature. The downside is then you would end up with an interface like find.

Markdown parsing is an eternal struggle. Constant source of bugs and edge cases, where you can’t start a paragraph with a #hashtag or end a sentence with a @mention. But honestly, I’d rather fail closed than open, and honk remains one of the few programs that can post #include <stdio.h> without turning that into a hashtag link. This code started off as a basic regex just for *i* and **b** but quickly got out of hand. There weren’t a lot of alternatives that met all requirements, and it’s got a few custom integrations that are just right, but still a minefield.

Content types. The Accept header is pretty important for federation, but which value to use is a question. When I started, using the official AP type would fail with some servers. Now some servers reject requests without it. Just accept all the options, people. I’ve also been exposed to some new HTTP status codes I’ve never seen before. Like what does 202 mean in response to a GET? I’d count ignoring LD-JSON entirely and just treating everything as plain JSON as a win, but some maniacs are still pushing it. I think standards are good when they facilitate communication, but some people think standards exist to limit communication.
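Accepting all the options is not much code. A sketch of a permissive check, assuming the three media types commonly seen in the wild (the ActivityPub type, the LD-JSON type with or without its profile parameter, and plain JSON); isActivityType is a hypothetical name:

```go
package main

import (
	"fmt"
	"mime"
)

// isActivityType reports whether a Content-Type value is any of the JSON
// flavors that ActivityPub servers send. Parameters such as charset or
// profile are parsed and then ignored.
func isActivityType(contentType string) bool {
	mediaType, _, err := mime.ParseMediaType(contentType)
	if err != nil {
		return false
	}
	switch mediaType {
	case "application/activity+json", "application/ld+json", "application/json":
		return true
	}
	return false
}

func main() {
	for _, ct := range []string{
		`application/ld+json; profile="https://www.w3.org/ns/activitystreams"`,
		"application/activity+json; charset=utf-8",
		"text/html",
	} {
		fmt.Println(ct, isActivityType(ct))
	}
}
```

Whatever passes the check then gets parsed as plain JSON, in keeping with ignoring LD-JSON entirely.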

Error handling, recovery and retries. A failed delivery should be retried some number of times to improve reliability, but not infinitely, which requires some assessment of what constitutes temporary vs permanent failure, and which errors should count against a server’s liveness score. It seems simple, like 503 means try again, unless it’s all you’ve seen for a week, and then you give up and mark the server dead. But what does 400 mean? Some people have now decided to reject any ActivityPub vocabulary they don’t implement with 400, which would make it a hard fail, but you also see 400 from misconfigured proxies, which means a retry may work. And if the particular mix of outgoing activities you’ve sent this week results in all 400s, should you mark the server dead? Maybe it’s not an activitypub server anymore. Honk does what it can to be helpful and courteous, but there are limits. If I stop federating with you, it’s because I thought that’s what you wanted.
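One possible policy, sketched under assumptions. The one week cutoff comes from the paragraph above. Treating 400 and 429 as retryable, and every other 4xx as a hard fail for that single delivery, is a guess at a tolerable default rather than honk's actual rule. The names verdict, classify, and deadAfter are hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

// verdict is what to do with a delivery after seeing the response status.
type verdict int

const (
	delivered verdict = iota // success, nothing more to do
	retry                    // try this delivery again later
	drop                     // give up on this delivery only
	markDead                 // give up on the whole server
)

// deadAfter is how long a server may fail continuously before it is
// considered gone (the "all you've seen for a week" rule).
const deadAfter = 7 * 24 * time.Hour

// classify maps an HTTP status, plus how long the server has been failing
// without a single success, to a verdict.
func classify(status int, failingFor time.Duration) verdict {
	switch {
	case status >= 200 && status < 300:
		return delivered
	case failingFor >= deadAfter:
		return markDead
	case status >= 500 || status == 429 || status == 400:
		// 400 is ambiguous: could be a vocabulary rejection, could be a
		// misconfigured proxy. This sketch gives it the benefit of the doubt.
		return retry
	default:
		return drop
	}
}

func main() {
	fmt.Println(classify(503, time.Hour) == retry)
	fmt.Println(classify(503, 8*24*time.Hour) == markDead)
}
```

Note that a week of nothing but 400s still ends in markDead here, which is exactly the questionable case the paragraph worries about.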

Bespoke artisanal JSON deserialization. I think the first instinct for a real programmer, unless working in a language with a native string bag type, is to parse every ActivityPub object into some carefully defined native struct according to a rigid grammar definition. Honk forgoes this in favor of leaving everything in a floppy map type, and then extracting such fields as are necessary and understood. The disadvantage is all the usual downsides of stringly typed programming, but the advantage is it’s usually easier to see what’s gone wrong. Honk will not reject your activities with an incomprehensible HTTP 400 not good enough error. I am halfway to recommending this approach, because you’ll have to write a good deal of logic to interpret AP objects regardless, and if you do it correctly your native types will be giant everything unions anyway, but it’s not best practice, so I will keep this secret for myself.
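To make the floppy map approach concrete, here is a small sketch using only encoding/json. The helper firstID is hypothetical and is not taken from honk. It handles one common annoyance: a field such as attributedTo may arrive as a string, as an object with an id, or as an array of either.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// firstID extracts an ID from a field that may hold a string, an object
// with an "id" field, or an array of either. Anything else is reported as
// missing rather than treated as a fatal parse error.
func firstID(obj map[string]interface{}, key string) (string, bool) {
	switch v := obj[key].(type) {
	case string:
		return v, true
	case map[string]interface{}:
		id, ok := v["id"].(string)
		return id, ok
	case []interface{}:
		for _, item := range v {
			switch e := item.(type) {
			case string:
				return e, true
			case map[string]interface{}:
				if id, ok := e["id"].(string); ok {
					return id, true
				}
			}
		}
	}
	return "", false
}

func main() {
	raw := `{"type": "Note", "attributedTo": [{"type": "Person", "id": "https://example.com/u/alice"}], "surprise": 42}`
	var obj map[string]interface{}
	if err := json.Unmarshal([]byte(raw), &obj); err != nil {
		fmt.Println("bad json:", err)
		return
	}
	fmt.Println(firstID(obj, "attributedTo"))
}
```

Fields the code does not understand, like "surprise" above, just sit in the map and bother nobody.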

Federation drama will never die. It’s also true that the vast majority of people harping on unjust defederation have some very questionable motives. Anyway, I eventually settled on the Thunderdome method of fediblock adjudication. Anyone can demand anyone else be blocked at any time, and then I review both their posts, determine who I am more likely to regret never seeing again, and block the other. Judgement is swift and final. Two enter, only one leaves. This initially resulted in a burst of activity, but things quickly settled down, and the Thunderdome doesn’t see much use these days.

fails

Honk can never fail. Honk can only be failed.

Splitting the database internally by user results in weird stuff when Alice tries to view server/u/bob/h/x. It’s on the same server, so she has cookies and appears logged in, but it’s technically not in her database. This led to some confusing UI interactions where buttons and actions would be presented, but fail when executed. Fixed that, but now you end up with posts you can’t interact with, though it’s not immediately clear why. I think this was maybe the right idea, since it’s integral to honk’s promise that your data is your data. It’s much harder to accidentally leak private posts between users since there’s a prominent userid column in every single post. If I started again, I’d push the design farther, and make each user have their own domain. Multi user setups should use vhosting.

Events never caught on. Given its prominent position in the menu, this seems like a headline feature. It was! I thought it would be cool to have a shared event or conference schedule. There’s always some smaller local con or meetup I never find out about until somebody posts a selfie as it starts. Too late! I wasn’t trying to build a full evite system here, just a quick reference page where you could see upcoming events, sorted by time, etc. This seemed like a pretty natural fit, given how much focus ActivityPub/Streams puts on human activities and communication. Support for Event activities is dreadfully thin, though. The few projects trying don’t seem very popular, and don’t communicate with honk. Pretty sad this didn’t work out, but then we went through the time of no events, so it didn’t matter much.

Read activities for reply control. Every few months there’s a big round of grisbayting where we demand to control who can reply to our posts. Sometimes they’re mean, sometimes they’re just dumb, whatever, I don’t need you clogging up the thread for anyone trying to read it. Except this is the internet and you can’t control what people say, although you can control what you choose to host on your server. And so, since nearly the beginning, honk has only publicly shown approved replies. This is trivial to implement, literally one bit, yet somehow beyond reach of many projects. The next step, of course, is to tell other servers about which replies are approved. Honk sends out Read activities for this. I mean, it’s your server, so you can show what you want, but if you had some desire to display my thread my way, only including the replies that I’ve Read seems like a reasonable approach. It’s not precisely to spec, which doesn’t specify much, but you can reasonably assume some things if I leave a reply unread. Anyway, nobody does this, and I predict they’ll still be yakking about it years from now.

Read activities also solve the ghost replies problem. This is where you ask a question, get some answers, then two days later are still getting duplicate answers because nobody sees the earlier replies. To spare people the embarrassment of a late redundant reply, I inform the fediverse about all the answers I’ve already Read. Alas, many people prefer to continue using software that embarrasses them. What annoys me is I go out of my way to assist you, and then you reject my messages with a 400? Not cool.
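For reference, a Read activity is a very small object. The sketch below shows one plausible shape. Which fields honk actually sets is not stated above, so the function readActivity, the example URLs, and the omission of an id and of addressing (to, cc) are all assumptions.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// readActivity builds a minimal ActivityStreams Read activity saying that
// actor has read (here: approved) the reply at replyURL.
// Assumption: only @context, type, actor and object are included.
func readActivity(actor, replyURL string) map[string]interface{} {
	return map[string]interface{}{
		"@context": "https://www.w3.org/ns/activitystreams",
		"type":     "Read",
		"actor":    actor,
		"object":   replyURL,
	}
}

func main() {
	act := readActivity("https://honk.example/u/alice", "https://other.example/@bob/12345")
	out, err := json.MarshalIndent(act, "", "  ")
	if err != nil {
		fmt.Println("marshal failed:", err)
		return
	}
	fmt.Println(string(out))
}
```

A receiving server that cared could show only the replies it has seen a Read for, and treat everything else as unapproved.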

The most frustrating bug in honk history, where you’d get randomly logged out mid session, turned out to be an iPhone safari bug, but my god, I was losing my mind thinking I was somehow violating a secret cookie rule that everybody else knew about. If I’d used a grown up framework, I could have blamed them instead.

it’s honking time

First commit: 2019-04-09

1000th commit: 2019-11-12

1632nd commit: 2023-08-09

Happy Honking.

Posted 10 Aug 2023 14:22 by tedu Updated: 10 Aug 2023 14:22
Tagged: activitypub project web