python 3k17

New year, time for a new python, right? I’ve been sticking with python2, but two related events led me to try python3. The first was python3.6, which has a bunch of new features, notably finalized async support. No plans to actually use said support myself, but it seems like the kind of landmark feature that will convince other people to switch, so I figured I would hop on board. The second thing was python3.6 being available as an OpenBSD package. The scene was set for a day spent updating code. If you don’t use python, this will probably not be of much interest.

I don’t have that much python to begin with. A few utilities, somewhat larger than scripts, but much smaller than anything you’d call an application. The most important libraries I use are lxml, feedparser, and pygments. All are available in python3 flavors. So how much trouble can I have? Let’s start with the mechanical 2to3 conversion tool.

-for (k, v) in d.items():
-       print("%s %s" % (k, v))
+for (k, v) in list(d.items()):
+       print(("%s %s" % (k, v)))

Why are you sticking a list around items? OK, I get that strictly speaking, that’s the equivalent code, but it’s unnecessary. I’m switching to python3, in theory, because not returning a list should be faster. It’s like running a code pessimizer.
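A minimal sketch of when the list() wrapper actually matters, using a made-up dictionary d: in python3, items() returns a view, so a plain loop needs no copy, and the copy only earns its keep when the loop body changes the size of the dictionary.

```python
d = {"a": 1, "b": 2}

# python3: items() is a view, so iterating it directly copies nothing.
for k, v in d.items():
    print("%s %s" % (k, v))

# The list() copy is only needed when the loop adds or removes keys;
# deleting while iterating the bare view raises RuntimeError.
for k, v in list(d.items()):
    if v > 1:
        del d[k]

remaining = sorted(d)
```

2to3 can't tell which of these two cases it is looking at, so it presumably picks the conservative one everywhere.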

Not sure what the deal with the extra parens for print is about. Yo dawg, I heard you like parens?

One other change that the 2to3 tool didn’t cope with is that socket.makefile only supports “r” and “w” modes. Impossible to create a bidirectional file, despite the fact the underlying socket is bidirectional. Why, why, why? In my case, this was merely annoying as I was able to create one file to read a request, and then a second file to write the reply. But in a larger, more complicated application that interleaved read/write calls? Refactor from hell pushing two files down through all the plumbing.
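The two file workaround might look roughly like this. It is a sketch, not my actual code: socket.socketpair() stands in for a real client connection so the example runs on its own, and the ping/pong protocol is invented for illustration.

```python
import socket

# socketpair() stands in for an accepted client connection (assumption).
server, client = socket.socketpair()

client.sendall(b"ping\n")

# One file object to read the request...
rfile = server.makefile("rb")
request = rfile.readline()

# ...and a second file object on the same socket to write the reply.
wfile = server.makefile("wb")
wfile.write(b"pong: " + request)
wfile.flush()

# Read the reply back on the client side.
creader = client.makefile("rb")
reply = creader.readline()

for f in (rfile, wfile, creader):
    f.close()
server.close()
client.close()
```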

Minor quibbles. The big issue, as expected, is strings versus byte strings. I happen to like my all-in-one unified string type, as found in lua. flak doesn’t have any trouble storing jpeg images and ümlaut strings in the same data type. It’s quite effortless, in fact. But now I suddenly need to care a great deal about this, and I hate it with a passion. (Incidentally, my nascent ruby career ended for exactly this reason. Code for 1.8.7 simply could not be made to run on later versions because of string encoding issues.)

A brief interlude while I ponder the difficulty of supporting strings and bytes in a natural fashion. I also have some go code which works on strings or bytes, and if the dataflow graph were collapsed, it would look like this:

output = string([]byte(string([]byte(input))))

Kinda silly, but it happens. At least on the bright side, go is statically typed and the compiler warns me when a conversion is missing. No such luck with python3. I’ve got strings and bstrings running around, with errors only detectable at runtime. If at all.

Of particular note, "hello" == b"hello" simply evaluates to False. It’s not an error of any sort. One can also freely mix strings and bstrings as dictionary keys, leading to missing lookups. Gah! Of course, for many of my dictionaries, keys are optional, so the code would run, but incorrectly. Test, fix, test, fix, over and over.
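A small sketch of the failure mode, with a made-up headers dictionary: nothing here raises, the lookups just quietly miss.

```python
# No exception: a str and a bytes object simply compare unequal.
same = ("hello" == b"hello")

# Both flavors can live in one dict as two distinct keys.
headers = {b"host": "example.org"}
headers["host"] = "other.example"

# A lookup with the wrong flavor silently comes back empty.
missing = {b"host": "example.org"}.get("host")
```

Running the interpreter with the -b option makes it warn about str versus bytes comparisons (and -bb turns them into errors), which is still a runtime check, but at least a loud one.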

Finally, all that done, it was time to spend some time enjoying the fruits of my labor. Except, what? Is it slower? Despite the promises that python3 would be faster, the only library whose performance I care about, pygments, runs nearly 50% slower. Thought it could be my janky code, but a quick test revealed the slowdown was entirely within pygments.

In the end, reverted everything. The python3 version will still be there, safe in the attic, if I need it. Maybe next year will be the year of python 3k18.

Posted 05 Jan 2017 17:30 by tedu Updated: 05 Jan 2017 17:30
Tagged: python