I’ve been slowly refining a previous experiment in HTML rewriting to handle more sites and situations. Unfortunately, I don’t always remember to use it, and the workflow is more complicated than it should be, involving copying a URL, opening a new tab, pasting, etc. Also, it doesn’t work at all for sites that require logins and cookies. I’d like for things to be transparent. Hence, a proxy. miniwebproxy is an HTTP proxy that intercepts TLS connections, proxying the actual request instead of simply forwarding traffic. This gives it the opportunity to modify response content while reusing the same request headers as the browser.
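To make the "proxy the actual request" idea concrete, here is a minimal sketch in Python. It is not miniwebproxy's code: the function names, the hop-by-hop header list, and the crude script/iframe filter are all assumptions made for illustration, and the TLS interception step itself is left out. It only shows the two halves described above: reuse the browser's headers (cookies included) on the upstream request, then rewrite the response before handing it back.

```python
import re
import urllib.request

# Headers that describe a single connection and should not be forwarded.
HOP_BY_HOP = {
    "connection", "keep-alive", "proxy-authenticate", "proxy-authorization",
    "te", "trailer", "transfer-encoding", "upgrade",
}

def build_upstream_request(url, browser_headers):
    """Build the proxy's own request, reusing the browser's headers."""
    headers = {
        name: value
        for name, value in browser_headers.items()
        if name.lower() not in HOP_BY_HOP
    }
    # Ask for an uncompressed body so it can be rewritten as text.
    headers["Accept-Encoding"] = "identity"
    return urllib.request.Request(url, headers=headers)

def rewrite_html(html):
    """Hypothetical filter: drop script and iframe elements.

    A regex is a stand-in here; a real filter would want an HTML parser.
    """
    return re.sub(r"(?is)<(script|iframe)\b.*?</\1\s*>", "", html)

def proxy_fetch(url, browser_headers, opener=urllib.request.urlopen):
    """Fetch url on the browser's behalf and return the rewritten page."""
    request = build_upstream_request(url, browser_headers)
    with opener(request) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        body = response.read().decode(charset, errors="replace")
    return rewrite_html(body)
```

Because the cookies ride along on the upstream request, a logged-in site sees the same session it would have seen from the browser directly, which is what the copy-and-paste approach could not offer.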
This has several benefits. Web pages look nice again. They load quickly. They don’t strain the browser. Strange hovering artifacts don’t follow me about. I’ll mention loading quickly again, because not only is the page processed faster, but much less data is transferred. Fewer requests are made. On mobile, or hotel wifi, this makes a tremendous difference.
Results are pretty good. It’s not yet up to the task of reducing the Bloomberg or WSJ or NYT or WAPO home pages, which remain gnarly grids, but individual articles are now much improved. Those stupid blogspot posts that take many seconds to load appear instantly, without bizarro animation effects. Numbers? Previously that link was 79 requests, 293 KB, 7.5 seconds to finish. Now it is one request, 3.4 KB, in under half a second. Pretty good.
All content is subject to a very simple embedded style sheet. It works well enough to read on my phone, has sufficient margins to not cause neck strain on a wide screen, and looks amazing on my Surface in portrait mode.
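For a sense of what "very simple embedded style sheet" can mean, here is a sketch of wrapping filtered content in a page that carries its own style. The CSS values and the `wrap_page` helper are hypothetical, picked only to illustrate a narrow centered column; they are not the actual miniwebproxy style.

```python
import html

# Hypothetical style: a centered column with side padding, readable on a
# phone and not stretched edge to edge on a wide screen.
STYLE = """
body { max-width: 40em; margin: 0 auto; padding: 0 1em; line-height: 1.5; }
img { max-width: 100%; height: auto; }
"""

def wrap_page(title, body_html):
    """Return a complete page with the style embedded, not linked."""
    return (
        "<!DOCTYPE html>\n"
        '<html><head><meta charset="utf-8">'
        '<meta name="viewport" content="width=device-width, initial-scale=1">'
        f"<title>{html.escape(title)}</title>"
        f"<style>{STYLE}</style></head>"
        f"<body>{body_html}</body></html>"
    )
```

Embedding the style keeps the page to a single request, since there is no separate stylesheet to fetch.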
There are other approaches to this problem, but I didn’t write them. Why not privoxy? I don’t want to see the regex that fetches an iframe and then replaces it with decoded json content. Also, https rewrite support is mandatory. Why not readability browser extensions? They don’t actually prevent downloading garbage, so they don’t save me any bandwidth. Why not an ad blocker? miniwebproxy actually does a very good job of eliminating ads on the pages it filters, but it doesn’t filter every page. It works best in conjunction with an ad blocker.
Another, somewhat idealistic, way to look at miniwebproxy is that it is a cloud-based remote browser. It fetches web pages and tells you what’s on the page. For convenience and familiarity, it presents a web-page-like interface, which allows the user to operate it. To use the technical term, it is a user agent. Compare with your local browser, which is a site agent.
Side note: it’s weird watching chrome in real time. Opening a new blank tab makes several requests back to the googship. Each and every new tab, boom, boom, boom, another wave of requests. Because maybe the newtab-serviceworker.js I downloaded thirty seconds ago has expired already?