ZFS on OpenBSD
Yesterday I committed support for ZFS. (And the people rejoiced, ever so briefly.) What would a real port look like and what would be involved?
The short answer is too much. Too much of everything.
too many files
After finding the list of files in FreeBSD’s cvsweb, I started copying them down. Towards the end of the list I began to wonder if maybe writing a script to scrape the web page wouldn’t have been a better approach. Then I got to the end and realized I only had the C files; the headers were hidden in another directory. Oh well, in for a penny, in for a pound.
As somebody told me afterwards, “Wow, I even thought the list of filenames was part of the joke, made up, until I figured out it was the, uhh, actual list.”
The problem with injecting this much code is somebody has to understand it all. Even after we go through and wrap the Solaris rwlocks with OpenBSD rwlocks and even though this code has been tested by lots of other people, there are always bugs, and the first part of debugging is determining what you want to happen. With this much code, I have no idea what should happen.
too many features
ZFS has so many files because it has so many features. Some of these features are highly desirable, which keeps me interested in ZFS, but others I think I could live without. The problem is there’s no way to choose. It’s all or nothing. And the way it’s implemented means it basically duplicates tons of functionality that already exists. It’s like running half of a paravirtualized Solaris kernel just to get access to a filesystem. (Fun fact: I once ran Samba to export an NFS mount from an OpenBSD VMware guest to the Windows host, all because Windows NFS clients are lousy. At least I didn’t have to write any of the code involved.)
This also becomes an issue with the userland tools. We’d like OpenBSD to maintain a certain feel, a consistency (where possible) among tools. You can’t just toss on your Solaris admin hat and work on ZFS; you need to suit up with your Solaris belt and suspenders and necktie. An OpenBSD port would probably relegate the official tools to the ports tree, and include minimally functional rewrites in base. Not desirable.
too much memory
I alluded to memory consumption in the commit message. ZFS wants a lot of memory. A lot lot lot of memory. So much memory, the kernel address space has trouble wrapping its arms around ZFS. I haven’t studied it extensively, but the hack of pushing some of the cache off into higher memory and accessing it through a small window may even work. FreeBSD doesn’t attempt anything like this, even with PAE support, maybe for good reason.
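The "small window" idea can at least be illustrated in userland. The sketch below is only an analogy, not the kernel mechanism: an unlinked temporary file plays the part of the memory that doesn't fit in the address space, and one page at a time is mapped in, copied from, and unmapped. The names backing_create and window_read are made up for the example, as is the /tmp path.

```c
#include <sys/types.h>
#include <sys/mman.h>
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Create an anonymous backing object of npages pages. Returns an fd or -1. */
static int
backing_create(size_t npages)
{
	char path[] = "/tmp/himemXXXXXX";
	int fd = mkstemp(path);

	if (fd == -1)
		return -1;
	unlink(path);	/* keep the data, drop the name */
	if (ftruncate(fd, (off_t)npages * sysconf(_SC_PAGESIZE)) == -1) {
		close(fd);
		return -1;
	}
	return fd;
}

/*
 * Copy len bytes starting at off through a one-page window.
 * Assumes the requested range does not cross a page boundary.
 * Returns 0 on success, -1 on failure.
 */
static int
window_read(int fd, off_t off, void *dst, size_t len)
{
	size_t pgsz = (size_t)sysconf(_SC_PAGESIZE);
	off_t base = off - (off % (off_t)pgsz);	/* page-aligned start */
	size_t skew = (size_t)(off - base);	/* offset inside the page */
	unsigned char *win;

	assert(skew + len <= pgsz);
	win = mmap(NULL, pgsz, PROT_READ, MAP_SHARED, fd, base);
	if (win == MAP_FAILED)
		return -1;
	memcpy(dst, win + skew, len);
	return munmap(win, pgsz);
}
```

Every access pays for a map and an unmap, which hints at why a window like this is a hack of last resort rather than a design.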
Amusingly, the proposed hack would also work with the buffer cache embiggening diff. Of course, you’re unlikely to see such a hack because FFS performance doesn’t blow goats with only a measly one gigabyte of cache.
This much is certain. I do not want to have any involvement in writing a filesystem tuning guide that includes text along the lines of “You may need to increase KVA pages. If you experience random reboots, you increased it too much.”
too many words
On the less technical side, there’s the CDDL. Whatever other issues there may be with patents, copyleft, sharing is caring, and whatnot, the CDDL suffers from TMW. Too Many Words. The most important question for free software is, Can I use this? It’s a yes or no question and I strongly believe the answer should be determinable by reading an amount of text that fits on one screen of an 80x25 terminal.
too many choices
What is there besides ZFS? FreeBSD added journaling to softupdates, but having heard McKusick talk about the amount of work involved in all the edge cases, I think I’ll pass. NetBSD has WAPBL (TMC: Too Many Capitals). DragonFly has HAMMER (TMC as well), but it looks pretty cool. Somebody should totally port it.
What about FFS? We (in a rather encompassing sense of we) have addressed various shortcomings in FFS. softupdates to make it fast. dirhash to make big directories fast. It’s not done yet. Just yesterday I also found out about a new FFS layout policy that can reduce fsck times considerably. For that matter, SSD storage is the great performance equalizer for all filesystems.
There are some things about ZFS I really like. The consistency checks where every data block read is hashed and verified should be a part of any new filesystem.
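The verify-on-read idea fits in a few lines. This is a rough sketch, not ZFS's code: the checksum loop is in the style of a Fletcher-4 sum over 32-bit words, and struct blkptr here is a made-up stand-in for "a pointer that also remembers what the block should hash to". All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Four running sums, in the style of a Fletcher-4 checksum. */
struct cksum {
	uint64_t w[4];
};

static struct cksum
block_cksum(const uint32_t *words, size_t nwords)
{
	uint64_t a = 0, b = 0, c = 0, d = 0;
	struct cksum ck;
	size_t i;

	for (i = 0; i < nwords; i++) {
		a += words[i];
		b += a;
		c += b;
		d += c;
	}
	ck.w[0] = a; ck.w[1] = b; ck.w[2] = c; ck.w[3] = d;
	return ck;
}

/* Hypothetical block pointer: where the data lives and what it should sum to. */
struct blkptr {
	const uint32_t *data;
	size_t nwords;
	struct cksum expect;
};

/*
 * Read a block, refusing to hand back data that fails verification.
 * Returns 0 on success, -1 on checksum mismatch.
 */
static int
block_read(const struct blkptr *bp, uint32_t *out)
{
	struct cksum got;

	assert(bp != NULL && out != NULL);
	got = block_cksum(bp->data, bp->nwords);
	if (memcmp(&got, &bp->expect, sizeof(got)) != 0)
		return -1;
	memcpy(out, bp->data, bp->nwords * sizeof(uint32_t));
	return 0;
}
```

The important design point is that the expected checksum is stored with the pointer, not with the block, so a block that was silently corrupted or misdirected on disk cannot vouch for itself.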