small views of large files

Sometimes you have a large file when you want a small file. You may not be able to edit the large file, but that’s okay: you can simply read the small part you want out of it. libfdview is a proof-of-concept library that presents a smaller view of a larger file.

concept

We will LD_PRELOAD a library that overrides open and associated functions. When the application reads from the large file, it will instead be limited to a smaller view of the file.

We grab the next open function with dlsym so our override can call the real one instead of recursing into itself, and read a few settings passed via the environment.

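static int (*realopen)(const char *, int, ...);
static const char *viewfile;
static off_t viewstart, viewend;

static void init(void) __attribute__ ((constructor));
static void init(void)
{
        /* RTLD_NEXT finds the next open after this library, normally libc's */
        realopen = dlsym(RTLD_NEXT, "open");
        /* realfopen and realread are looked up the same way */
        /* ?: is a GNU extension: use the left value unless it is NULL */
        viewfile = getenv("FDVIEW_NAME")?:"";
        viewstart = strtonum(getenv("FDVIEW_START")?:"0", 0, 10987654321, NULL);
        viewend = strtonum(getenv("FDVIEW_END")?:"10987654321", viewstart, 10987654321, NULL);
}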

It’s cleaner IMO to use a constructor init function, but lazy init would work here too.
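
For comparison, a lazy-init version might look roughly like the sketch below. Everything in it is an assumption made for illustration: the _lazy names, env_offset, and ensure_init are hypothetical, and strtoll stands in for strtonum (a BSD function) so the sketch builds elsewhere. Unlike strtonum with a NULL error pointer, a bad value here falls back to the default instead of 0.

```c
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical lazy-init state; the _lazy names are illustrative only. */
static int inited_lazy;
static const char *viewfile_lazy = "";
static off_t viewstart_lazy, viewend_lazy;

#define VIEW_MAX 10987654321LL  /* same upper bound the constructor uses */

/* Parse a non-negative offset from the environment, or return fallback. */
static off_t
env_offset(const char *name, off_t fallback)
{
        const char *s = getenv(name);
        char *end;
        long long v;

        if (s == NULL || *s == '\0')
                return fallback;
        errno = 0;
        v = strtoll(s, &end, 10);
        if (errno != 0 || *end != '\0' || v < 0 || v > VIEW_MAX)
                return fallback;
        return (off_t)v;
}

/* Called at the top of every override; does the work only once. */
static void
ensure_init(void)
{
        const char *name;

        if (inited_lazy)
                return;
        name = getenv("FDVIEW_NAME");
        viewfile_lazy = name ? name : "";
        viewstart_lazy = env_offset("FDVIEW_START", 0);
        viewend_lazy = env_offset("FDVIEW_END", VIEW_MAX);
        if (viewend_lazy < viewstart_lazy)
                viewend_lazy = viewstart_lazy;
        /* the dlsym(RTLD_NEXT, ...) lookups would go here as well */
        inited_lazy = 1;
}
```

The cost is a call to ensure_init in every override, and this version is not thread safe, which is part of why the constructor is tidier.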

Whenever a file is opened, we check if it’s our target. If so, we seek forward past the part of the file that shouldn’t be visible.

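static int viewfd = -1;

int
open(const char *path, int flags, int mode)
{
        int fd = realopen(path, flags, mode);
        /* only track the descriptor if the open succeeded */
        if (fd != -1 && strcmp(path, viewfile) == 0) {
                viewfd = fd;
                /* start reading at the beginning of the view */
                lseek(viewfd, viewstart, SEEK_SET);
        }
        return fd;
}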

Var args are messy.
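
For the curious, a variadic version might look something like the sketch below. It is named view_open, a hypothetical stand-in, so it can be compiled and tried without LD_PRELOAD; in the library the function would be open itself and the final call would go through realopen. The third argument is only read when O_CREAT is set, because otherwise the caller may not have passed one.

```c
#include <fcntl.h>
#include <stdarg.h>
#include <sys/types.h>

/*
 * Hypothetical stand-in with the same shape as open(2).
 * Linux's O_TMPFILE also carries a mode; that case is ignored here.
 */
int
view_open(const char *path, int flags, ...)
{
        mode_t mode = 0;

        /* The mode argument only exists when the flags ask for creation. */
        if (flags & O_CREAT) {
                va_list ap;

                va_start(ap, flags);
                mode = (mode_t)va_arg(ap, int);
                va_end(ap);
        }
        return open(path, flags, mode);
}
```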

fopen

The first problem is that a simple test with head doesn’t work. Internally, head calls fopen, which is wired up inside libc to use the libc version of open, not any preloaded version. So we need another override.

FILE *
fopen(const char *path, const char *mode)
{
        if (strcmp(path, viewfile) == 0) {
                /* go through our open so viewfd gets set; flags 0 is O_RDONLY */
                int fd = open(path, 0, 0);
                if (fd == -1)
                        return NULL;
                return fdopen(fd, mode);
        }
        return realfopen(path, mode);
}

The mode handling is rather incomplete, but good enough for production.
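
If one did want fuller mode handling, the mode string could be translated into open flags first. The helper below is a hypothetical sketch (mode_to_flags is not part of libfdview) and only covers the standard r, w, a and + forms.

```c
#include <fcntl.h>

/*
 * Hypothetical helper: translate an fopen mode string into open flags.
 * Returns -1 for a mode it does not recognize.
 */
static int
mode_to_flags(const char *mode)
{
        int flags;

        switch (mode[0]) {
        case 'r':
                flags = O_RDONLY;
                break;
        case 'w':
                flags = O_WRONLY | O_CREAT | O_TRUNC;
                break;
        case 'a':
                flags = O_WRONLY | O_CREAT | O_APPEND;
                break;
        default:
                return -1;
        }
        /* A '+' after the first letter means read and write ("r+", "rb+"). */
        for (const char *p = mode + 1; *p != '\0'; p++) {
                if (*p == '+')
                        flags = (flags & ~O_ACCMODE) | O_RDWR;
        }
        return flags;
}
```

The fopen override could then pass mode_to_flags(mode) to open instead of 0. For a read-only view it may be more sensible to reject anything other than "r".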

read

If we don’t want to read past the end of our small view, we’ll need to fix read as well.

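ssize_t
read(int fd, void *buf, size_t n)
{
        if (fd == viewfd) {
                off_t pos = lseek(fd, 0, SEEK_CUR);
                /* at or past the end of the view: read nothing, like EOF */
                if (pos >= viewend)
                        n = 0;
                /* otherwise trim the request so it stops at viewend */
                else if (pos != -1 && n > (size_t)(viewend - pos))
                        n = viewend - pos;
        }
        return realread(fd, buf, n);
}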

At some point we should probably take care to override lseek as well to make sure that the application doesn’t itself seek out of the view.
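
One possible shape for that, as a sketch only: a helper that works out where a seek should land. view_seek_target is a hypothetical name, and it makes two assumptions the library does not currently make. Offsets from the application are treated as relative to the view, and out-of-range requests are clamped rather than rejected.

```c
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper for an lseek override. Given the view bounds, the
 * current absolute position, and the application's request, return the
 * absolute file offset to seek to, clamped to [start, end].
 * SEEK_SET 0 lands on start and SEEK_END 0 lands on end.
 * Returns -1 for an unknown whence.
 */
static off_t
view_seek_target(off_t start, off_t end, off_t cur, off_t offset, int whence)
{
        off_t target;

        switch (whence) {
        case SEEK_SET:
                target = start + offset;
                break;
        case SEEK_CUR:
                target = cur + offset;
                break;
        case SEEK_END:
                target = end + offset;
                break;
        default:
                return -1;
        }
        if (target < start)
                target = start;
        if (target > end)
                target = end;
        return target;
}
```

An lseek override would seek the real descriptor to the returned offset with SEEK_SET and hand target - start back to the application. Real lseek fails on a negative result and allows seeking past the end, so clamping is a simplification.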

results

head -1 fdview.c
#include <dlfcn.h>
LD_PRELOAD=./libfv.so FDVIEW_NAME=fdview.c FDVIEW_START=19 head -1 fdview.c
#include <stdio.h>

Yes. And with a simple test program that’s basically cp to stdout.

LD_PRELOAD=./libfv.so FDVIEW_NAME=tester.c FDVIEW_END=39 ./tester
fd: 3
#include <stdio.h>
#include <unistd.h>

There’s more to the file, but we don’t see it because we’ve been cut off from reading more.
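
The tester itself isn’t shown. A guess at its core, purely as an illustration and not the actual program, is a loop like the one below; dump_file is a hypothetical name. When the position reaches the end of the view, the overridden read asks for zero bytes, the real read returns 0, and the loop stops exactly as it would at end of file.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical reconstruction of a cp-to-stdout tester. */
static int
dump_file(const char *path)
{
        char buf[4096];
        ssize_t n;
        int fd = open(path, O_RDONLY);

        if (fd == -1)
                return -1;
        printf("fd: %d\n", fd);
        fflush(stdout);         /* keep stdio output ahead of the raw writes */
        /* read returning 0 means end of file, or end of the view */
        while ((n = read(fd, buf, sizeof(buf))) > 0) {
                if (write(STDOUT_FILENO, buf, (size_t)n) != n)
                        break;
        }
        close(fd);
        return n == 0 ? 0 : -1;
}
```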

long tail

There’s no test for tail because, uh, it starts getting more complicated, and my hard drive is big enough to store both large and small files. But it could be done given sufficient function overrides.

Posted 22 Sep 2020 20:00 by tedu Updated: 22 Sep 2020 20:00
Tagged: c programming