battery consuming battery software
This is a little tour I took today through some software. One of the topics that consistently comes up when people discuss what operating system to run on their laptop is how much battery life to expect, and the answers are all over the map. The focus always seems to be on the kernel and how advanced its scheduling algorithm is, and the minutiae of interrupt controllers. We throw around terms like "race to sleep". But rarely do I see anyone mention the impact on battery life of the software they choose to run, which may be spending millions of CPU cycles on trivial tasks. That’s especially ironic if the software in question is what we’re running to monitor how much battery is left.
What’s involved in determining how much battery is left? On any modern laptop, there’s a bit of firmware running on the battery itself, which measures discharge rate, does some smoothing, and estimates remaining charge based on the voltage. There’s a nonlinear relationship between voltage and remaining charge, but fortunately the battery works that out for us. It just presents us with a few ready-to-use numbers, which are accessed via some I2C/SMBus joinery. The specific incantation comes in the form of ACPI AML bytecode. So reading the register is probably a few dozen instructions, but finding the register requires running an AML interpreter, possibly a few thousand instructions.
That’s the requisite background info. Now we’ve got these numbers in the kernel. 30%, 147 minutes remaining. How many more instructions to put this information on the screen where you can see it?
Our journey begins with a note that OpenBSD apm support was added to the tmux battery plugin. My first thought was that they did this the inefficient way, by running apm in a loop. That’s exactly what I did in a dwm status bar updater before noticing it was very wasteful. Click through to the tmux battery plugin commit and sure enough, apm | awk.
To be fair, this may be the best way to do it. It’s the obvious approach, it glues together existing tools, and it’s not obviously incorrect. And the tmux status bar only updates every 15 seconds or so, so it’s probably not a measurable drain. But that’s also what makes this approach so pernicious. Each subcommand execution is so brief, there’s no easy way to measure the cumulative effect. You won’t see such commands in top. Creating new address spaces, tearing them down, context switching, flushing TLBs. We’re talking more than a few thousand instructions here. Even just the code required for awk to parse '{printf "- %s left", $1}' is nontrivial.
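For contrast, a long-lived status updater can read the number itself and skip the fork, exec, and pipe entirely. Here is a minimal sketch in Python, assuming a Linux machine that exposes the battery under /sys/class/power_supply. This is not how OpenBSD's apm works, and the BAT0 name and both function names are illustrative assumptions.

```python
from pathlib import Path

# Assumption: Linux sysfs layout. BAT0 is a common battery name, not a universal one.
DEFAULT_BATTERY_DIR = Path("/sys/class/power_supply/BAT0")


def battery_percent(battery_dir=DEFAULT_BATTERY_DIR):
    """Return the remaining charge as an int, or None if it cannot be read."""
    try:
        text = (Path(battery_dir) / "capacity").read_text()
        return int(text.strip())
    except (OSError, ValueError):
        # Missing battery, unreadable file, or unexpected contents.
        return None


def format_status(percent):
    """Format the value for a status bar: '30%' style, or '?' when unknown."""
    return "?" if percent is None else f"{percent}%"
```

Called from a process that is already running, each refresh costs one small file read rather than two process launches and an awk parse.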
What really caught my eye in the diff was an existing line:
termux-battery-status | jq -r '.percentage' | awk '{printf("%d%%", $1)}'
I didn’t know what termux-battery-status was; it turns out to be an Android thing. It outputs JSON, which requires another nontrivial parser to select out one field. I was curious if anybody had ever noticed or complained about the battery plugin using too much battery.
Turns out they had, but the complaints centered around the upower command instead. There’s even a note in the README advising users to use the acpi command instead of upower. A little more digging turned up a link to a Launchpad bug for upower with some interesting comments. One part of the problem seems to be that you can get flooded with events:
[08:40:48.351] device changed: /org/freedesktop/UPower/devices/battery_BAT0
[08:40:48.362] device changed: /org/freedesktop/UPower/devices/battery_BAT0
[08:40:48.372] device changed: /org/freedesktop/UPower/devices/battery_BAT0
I mean, I like my battery status to be up to date as much as anyone, but refreshing every hundredth of a second seems a bit much. Maybe if you have a 120 Hz monitor? This is clearly a bug somewhere, but I think it’s worth considering how we get such bugs. For infrequently changing data, without any particular urgency, we might add some hysteresis. But even better, we might write less code and simply poll at a comfortable interval. It’s possible to write a runaway poll loop, but it’s also easy to write a provably bounded slow poll loop.
(I think we’ve taken our fascination with hyper-efficient, edge-triggered push events a little too far. Sometimes you don’t care about all the transitions in between, just whatever the current state happens to be from time to time.)
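A slow poll loop of that sort might look like the following sketch. The names poll, read_state, and on_change are hypothetical, and the 15 second default just mirrors the tmux refresh interval mentioned earlier. The sleep function is injectable only so the loop can be exercised without actually waiting.

```python
import time


def poll(read_state, on_change, interval=15.0, max_polls=None, sleep=time.sleep):
    """Call read_state() once per interval and report only when it changes.

    max_polls=None means run forever; a number bounds the loop (handy for tests).
    """
    if interval <= 0:
        # A zero or negative interval is how you get a runaway loop.
        raise ValueError("interval must be positive")
    last = object()  # sentinel that compares unequal to any real state
    polls = 0
    while max_polls is None or polls < max_polls:
        current = read_state()
        if current != last:
            on_change(current)
            last = current
        polls += 1
        sleep(interval)
```

Because the loop sleeps for a fixed interval after every read, it cannot do more than one read per interval, however many events the source would have fired in between.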
Then there’s another comment, which refers to another bug.
It seems that the issue is caused by iPad/iPhone pretending to offer Ethernet connection and "upowerd" automatically trying to enable that connection. This operation fails and "upowerd" is stupid enough to repeat the process forever.
This isn’t really about power management, but while we’re here... How should a program handle an attempted operation that fails? Abort the operation? Or retry in a loop? Do we notify the user, or carry on regardless because maybe things will get better?
But really, this is about power management. How much software out there is silently mindlessly failing in a loop? How many electrons die in vain as a result?
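One middle ground between aborting and retrying forever is to retry a bounded number of times with growing delays, then give up and say so. What follows is a hedged sketch with hypothetical names and arbitrary defaults. It is not what upowerd does or did, just an illustration of the shape.

```python
import time


def retry_with_backoff(operation, attempts=5, first_delay=1.0, max_delay=300.0,
                       sleep=time.sleep):
    """Try operation() up to `attempts` times, doubling the wait after each failure.

    Returns the operation's result, or re-raises the last OSError so the
    caller (or the user) finds out that things did not get better.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    delay = first_delay
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except OSError:
            if attempt == attempts:
                raise  # give up loudly instead of spinning forever
            sleep(delay)
            delay = min(delay * 2, max_delay)
```

With the defaults, a permanently failing operation costs five attempts and fifteen seconds of sleeping in total, rather than an unbounded amount of CPU.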
Adding support for low power mode to wifi drivers gets all the attention, but for the average user I suspect that’s really in the long tail of tiny incremental improvements. There’s some tooling available to detect power consumers (powertop, etc.), although I wonder if the environments it’s run in reflect typical laptops. You set up your new laptop, run powertop to check it’s all efficient, but do you run it again after 80 hours of uptime when seven different desktop notification daemons have gone haywire?
I linked to a few specific issues, but only because I happened to come across them today. I don’t think they’re unusually bad. They are not exceptional, but just the opposite.