miniwebproxy
Imagine, if you can, a smaller version of the web. A web without dickbars, or scroll jacking, or chum boxes, or popup video, but still a web filled with informative articles about the 27 blockchains you need to be using right now. The good news is this web exists, but unfortunately your browser doesn’t connect to it by default. For that, you need the miniwebproxy.
I’ve been refining a previous experiment in HTML rewriting to handle more sites and situations. Unfortunately, I don’t always remember to use it, and the workflow is clunky: copy a URL, open a new tab, paste, etc. Also, it doesn’t work at all for sites that require logins and cookies. I’d like things to be transparent. Hence, a proxy. miniwebproxy is an HTTP proxy that intercepts TLS connections and proxies the actual request instead of simply forwarding traffic. This gives it the opportunity to modify the response content while reusing the browser’s own request headers.
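The key move is that the proxy terminates the browser’s TLS connection, reads the real request, and then makes that request itself. Here is a minimal Python sketch of the request side only. It assumes the TLS interception (a local CA that mints per-host certificates the browser trusts) is handled elsewhere, and the function names are hypothetical. The hop-by-hop list is the conventional one HTTP proxies strip. Dropping Accept-Encoding is also an assumption, made so the response arrives uncompressed and ready to rewrite.

```python
import urllib.request

# Hop-by-hop headers describe the browser-to-proxy connection only,
# so they must not be copied onto the upstream request.
HOP_BY_HOP = {
    "connection", "keep-alive", "proxy-authenticate", "proxy-authorization",
    "proxy-connection", "te", "trailer", "transfer-encoding", "upgrade",
}


def forward_headers(browser_headers):
    """Return the headers to send upstream, reusing the browser's own.

    Cookie, User-Agent, Accept-Language and friends pass through
    unchanged, which is what keeps logged-in pages working.
    """
    # Any header named inside the Connection header is hop-by-hop too.
    listed = {
        token.strip().lower()
        for name, value in browser_headers.items()
        if name.lower() == "connection"
        for token in value.split(",")
    }
    out = {}
    for name, value in browser_headers.items():
        lname = name.lower()
        if lname in HOP_BY_HOP or lname in listed:
            continue
        # Assumption: request an uncompressed body so it can be rewritten
        # without decompressing it first.
        if lname == "accept-encoding":
            continue
        out[name] = value
    return out


def build_upstream_request(method, url, browser_headers, body=None):
    """Build the request the proxy makes on the browser's behalf."""
    return urllib.request.Request(
        url, data=body, headers=forward_headers(browser_headers), method=method
    )
```

Sending it is then an ordinary `urllib.request.urlopen(req)`; the body that comes back is what gets rewritten before it is returned to the browser.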
This has several benefits. Web pages look nice again. They load quickly. They don’t strain the browser. Strange hovering artifacts don’t follow me about. I’ll mention loading quickly again, because not only is the page processed faster, but much less data is transferred. Fewer requests are made. On mobile, or hotel wifi, this makes a tremendous difference.
Some sites do some weird stuff. Ever read a Medium post with some code snippets? Know how that works? First there’s an iframe. The iframe interior is an empty shell that sources some JavaScript. Then the JavaScript rewrites the iframe with love. miniwebproxy doesn’t believe in love, so it digs the code snippet out of the JSON and simply inserts it into the page. There are some other custom rules to handle JavaScript-only Blogspot themes, lazy loading images, etc. The goal isn’t to embed an actual JavaScript interpreter; a few ad hoc rules get most of the way there. Some embeds aren’t handled, like Giphy. But in a review of all the Giphy-featuring Medium posts I could find in my browser history, 100% were bandwidth-wasting memes; zero were informative or even relevant to the accompanying article.
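Two of those ad hoc rules, sketched in Python under stated assumptions. The JSON key holding the snippet (`content` below) is hypothetical, since the embed’s JSON layout isn’t described here, and fetching the iframe’s JSON is not shown. The lazy-image rule assumes the common convention of parking the real image URL in a `data-src` attribute for a script to swap in later.

```python
import html
import json
from html.parser import HTMLParser


def snippet_from_embed_json(payload, key="content"):
    """Turn an embed's JSON payload into a plain <pre> block.

    Hypothetical: the key holding the code is an assumption; each embed
    service defines its own JSON layout.
    """
    code = json.loads(payload)[key]
    return "<pre><code>" + html.escape(code) + "</code></pre>"


class LazyImageFixer(HTMLParser):
    """Re-emit HTML unchanged, except promote data-src to src on <img>."""

    def __init__(self):
        # Keep entity and character references exactly as written.
        super().__init__(convert_charrefs=False)
        self.out = []

    def _tag(self, tag, attrs, close=""):
        attrs = dict(attrs)
        if tag == "img" and attrs.get("data-src"):
            # Replace the placeholder src (or add one) with the real URL.
            attrs["src"] = attrs.pop("data-src")
        rendered = "".join(
            f" {name}" if value is None else f' {name}="{html.escape(value)}"'
            for name, value in attrs.items()
        )
        return f"<{tag}{rendered}{close}>"

    def handle_starttag(self, tag, attrs):
        self.out.append(self._tag(tag, attrs))

    def handle_startendtag(self, tag, attrs):
        self.out.append(self._tag(tag, attrs, " /"))

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def fix_lazy_images(page):
    parser = LazyImageFixer()
    parser.feed(page)
    parser.close()
    return "".join(parser.out)
```

For example, `<img src="blank.gif" data-src="cat.jpg">` comes out as `<img src="cat.jpg">`, so the image loads without any script running.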
Results are pretty good. It’s not yet up to the task of reducing the Bloomberg, WSJ, NYT, or WaPo home pages, which remain gnarly grids, but individual articles are much improved. Those stupid Blogspot posts that take many seconds to load now appear instantly, without bizarro animation effects. Numbers? Previously, one such post took 79 requests, 293 KB, and 7.5 seconds to finish loading. Now it is one request, 3.4 KB, in under half a second. Pretty good.
All content is subject to a very simple embedded style sheet. It works well enough for reading on my phone, has sufficient margins to avoid neck strain on a wide screen, and looks amazing on my Surface in portrait mode.
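To make the “very simple embedded style sheet” concrete, here is a sketch of what the final output stage of such a filter might look like: keep a small allow-list of content tags, drop scripts, iframes, and navigation wholesale, and emit the result under one small style sheet. The tag lists, CSS values, and the `simplify` name are illustrative assumptions, not miniwebproxy’s actual rules.

```python
import html
from html.parser import HTMLParser

# Hypothetical rules; a real filter needs many more cases.
DROP = {"script", "style", "noscript", "iframe", "svg", "form",
        "nav", "aside", "footer", "button"}
KEEP = {"p", "h1", "h2", "h3", "h4", "a", "img", "pre", "code", "ul",
        "ol", "li", "blockquote", "em", "strong", "br", "figure",
        "figcaption"}
KEEP_ATTRS = {"href", "src", "alt"}
VOID = {"img", "br"}

# One small embedded style sheet replaces whatever the site shipped:
# a readable column with margins on wide screens, full width on phones.
STYLE = """body { max-width: 40em; margin: 0 auto; padding: 0 1em;
  font-family: serif; line-height: 1.5; }
img { max-width: 100%; height: auto; }
pre { overflow-x: auto; }"""


class Simplifier(HTMLParser):
    def __init__(self):
        super().__init__()  # convert_charrefs=True: data arrives unescaped
        self.out = []
        self.skip = 0  # nesting depth inside dropped elements
        self.title = []
        self.in_title = False

    def handle_starttag(self, tag, attrs):
        if tag in DROP:
            self.skip += 1
        elif tag == "title" and not self.skip:
            self.in_title = True
        elif not self.skip and tag in KEEP:
            # Keep only a few meaningful attributes; styles and classes go.
            kept = "".join(
                f' {k}="{html.escape(v)}"'
                for k, v in attrs
                if k in KEEP_ATTRS and v is not None
            )
            self.out.append(f"<{tag}{kept}>")
        # Any other tag (div, span, ...) is unwrapped: its text survives.

    def handle_endtag(self, tag):
        if tag in DROP:
            self.skip = max(0, self.skip - 1)
        elif tag == "title":
            self.in_title = False
        elif not self.skip and tag in KEEP and tag not in VOID:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.in_title:
            self.title.append(data)
        elif not self.skip:
            self.out.append(html.escape(data, quote=False))


def simplify(page):
    parser = Simplifier()
    parser.feed(page)
    parser.close()
    title = html.escape("".join(parser.title).strip(), quote=False)
    body = "".join(parser.out).strip()
    return (
        '<!DOCTYPE html>\n<html><head><meta charset="utf-8">\n'
        '<meta name="viewport" content="width=device-width, initial-scale=1">\n'
        f"<title>{title}</title>\n<style>{STYLE}</style></head>\n"
        f"<body>\n{body}\n</body></html>\n"
    )
```

The viewport meta tag is what lets the same page read well on a phone while the `max-width` column keeps lines short on a wide screen.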
There are other approaches to this problem, but I didn’t write them. Why not Privoxy? I don’t want to see the regex that fetches an iframe and then replaces it with decoded JSON content. Also, HTTPS rewrite support is mandatory. Why not readability browser extensions? They don’t actually prevent downloading the garbage, so they don’t save me any bandwidth. Why not an ad blocker? miniwebproxy actually does a very good job of eliminating ads on the pages it filters, but it doesn’t filter every page. It works best in conjunction with an ad blocker.
Another, somewhat idealistic, way to look at miniwebproxy is as a cloud-based remote browser. It fetches web pages and tells you what’s on them. For convenience and familiarity, it presents a web-page-like interface through which the user operates it. To use the technical term, it is a user agent. Compare with your local browser, which is a site agent.
Side note: it’s weird watching Chrome in real time. Opening a new blank tab makes several requests back to the googship. Each and every new tab, boom, boom, boom, another wave of requests. Because maybe the newtab-serviceworker.js I downloaded thirty seconds ago has already expired?