flak

anticrawl

It’s the internet, so there’s crawlers, and it’s the future, so they’re mindless wannabe chatbot scrapers, and it’s the cyberpunk world we always dreamed of, so the cool thing to do is to write your own force field to keep the bots out. Which I did.

For background, I have not actually noticed any load from bot scrapers (other than the google go cache, different story), but found them a different way. There’s a bug in humungus which I’m too lazy to fix that causes it to 500 error whenever a file revision that doesn’t exist is requested. There’s a second bug I’m too lazy to even look for that generates these links for the bots to find. But the punchline is I have a bunch of 500 errors in my log file. The robots file excludes these URLs because I know they’re useless for a crawler. I’m trying to help you, stupid bot, but some bots are beyond help, so we need a bigger hammer. Initially, I banned netblocks in pf, and after eliminating all of BabaWei’s IP blocks, we’re down to Brazilian ISPs, which I don’t want to block at the network level.

anticrawl is a simple go http handler. Stick it in the affected service. Then configure a regex because I like problems. I don’t care if you scrape my README 700 times, that’s what it’s for, but leave the other junk alone. Also, I’d rather not bother humans, even a little bit, until they start clicking around deeper.

The challenge is super easy. If you have javascript, you have to find some 42s. (Why is it always zeroes we’re forced to search for?) If you don’t, you have to solve the riddle of the llama. Either way, it’s super trivial because the adversary isn’t exactly basilisk class AI. I was told cookies are evil, so the state is just stored server side. I’m thinking I might change the design so it’s even easier to bypass by starting at a normal entry point. So far, it appears very effective.

There’s also a standalone proxy server for people who can never run enough servers.

Posted 17 Apr 2025 16:53 by tedu Updated: 17 Apr 2025 16:53
Tagged: project

what if the poison were rust?

The OpenBSD kernel has a set of functions to help detect memory corruption, the poison subroutines. The memory management code uses these functions, but they themselves have a very simple interface, no complicated types or data structures, meaning they’re easy to replace. What if we rewrite the memory corruption detection functions in rust so it’s impossible for them to cause memory corruption?

more...

Posted 09 Apr 2025 04:48 by tedu Updated: 09 Apr 2025 04:48
Tagged: openbsd rust

dated carbon

I have a Pixelbook which Google says I need to stop using, but they’re not the boss of me, and in the process of reflashing it (long story), I needed to get out my trusty USB stick writer, a Zenbook UX305. Well, formerly trusty. After closing the lid, I noticed a small gap in the front. The laptop’s midsection has developed a serious case of the swoles. Okay, let’s get a 3rd gen Carbon X1 Thinkpad from the laptop shelf.

more...

Posted 31 Mar 2025 18:24 by tedu Updated: 31 Mar 2025 18:24
Tagged: computers

where do the bytes go?

Or perhaps more precisely, how do they get there? What happens when you call write?

more...

Posted 29 Mar 2025 10:38 by tedu Updated: 29 Mar 2025 10:38
Tagged: openbsd

dude, where are your syscalls?

The OpenBSD kernel is getting to be really old, like really, really old, mid 40s old, and consequently it doesn’t like surprises, so programs have to tell it where their syscalls are. In today’s edition of the polite programmer, we’ll learn the proper etiquette for doing so.

more...

Posted 05 Mar 2025 09:35 by tedu Updated: 12 Mar 2025 07:16
Tagged: openbsd programming

you don't link all of libc

On OpenBSD, there is a rule that you link with libc to interface with the kernel, because that’s where the syscall stubs live. This causes a great deal of consternation for partisans of other languages, because they don’t want to link “all of libc”. But when does anything link all of libc?

more...

Posted 12 Feb 2025 18:54 by tedu Updated: 12 Feb 2025 18:54
Tagged: c openbsd programming

two kindles

Because I am a glutton for exploitation, I bought another Kindle. The tiny entry level model, unlike the Scribe I currently have.

more...

Posted 03 Feb 2025 19:38 by tedu Updated: 03 Feb 2025 19:38
Tagged: gadget review

stories i refuse to believe

The internet is filled with stories that purport to teach us a valuable lesson or something about how the world works, and they’re really important because they really happened. NASA spent millions of dollars designing a space pen, which was really foolish when they could have just used a pencil like the Russians. I think not as many people believe that anymore, but it’s still floating around out there.

more...

Posted 22 Jan 2025 13:16 by tedu Updated: 22 Jan 2025 13:16
Tagged: thoughts

new year new rules new lines

We have a new edition of POSIX, which means new rules, new regulations, new red tape. But not new lines. Which is to say, POSIX at long last banishes new lines in file names.

more...

Posted 17 Jan 2025 16:11 by tedu Updated: 17 Jan 2025 16:11
Tagged: openbsd

an autoflusher

What if we want a grep that doesn’t get stuck but we don’t want to resort to wild hacks like editing the source? What if there was some way to flush stdout automatically?

The auto flusher is a very simple preload.

#include <stdio.h>
#include <unistd.h>
#include <pthread.h>

/* background thread: flush stdout every few seconds, forever */
static void *
flusher(void *arg)
{
        while (1) {
                sleep(3);
                fflush(stdout);
        }
}

/* runs when the library is loaded, before main */
__attribute__((constructor))
void
herewego(void)
{
        pthread_t thread;

        pthread_create(&thread, NULL, flusher, NULL);
}

Magic constructor attribute for initializers in C, and then we create a thread which loops around occasionally flushing stdout, so if there’s any data left lingering, it will eventually find its way out instead of waiting forever.

$ cc -shared -lpthread -o libflusher.so flusher.c
$ env LD_PRELOAD=./libflusher.so grep ...

Posted 14 Jan 2025 15:06 by tedu Updated: 14 Jan 2025 15:06
Tagged: c programming