
tweet compression

Sometimes you’ve got something really important to tweet, but it doesn’t quite fit in 140 characters. There are several techniques that can help in this situation.

One option is to use another platform that allows longer posts, but like I said, this is really important. Twitter or bust. Or write the post on a napkin, then upload a picture of it, but let’s pretend we have an irrational preference for textual information conveyed as text.

English text has a lot of redundancy. Certain digraphs are particularly common. In fact, many digraphs are actually descended from single letters. We can shave some characters from our tweets by winding back the clock.
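To see which digraphs actually dominate a given text, something like the following rough sketch would do. The name `countDigraphs` and the sample sentence are made up for illustration; neither is part of the tool further down.

```javascript
// Hypothetical helper: tally adjacent letter pairs (digraphs) in a string.
function countDigraphs(text) {
  const counts = new Map()
  const lower = text.toLowerCase()
  for (let i = 0; i + 1 < lower.length; i++) {
    const pair = lower.slice(i, i + 2)
    // Only count pairs of two ASCII letters; skip spaces and punctuation.
    if (/^[a-z]{2}$/.test(pair)) {
      counts.set(pair, (counts.get(pair) || 0) + 1)
    }
  }
  // Most frequent first.
  return [...counts.entries()].sort((a, b) => b[1] - a[1])
}

// Sample sentence chosen for illustration only.
console.log(countDigraphs("this or that").slice(0, 3))
```

Run it over a larger sample of your own tweets to decide which substitutions are worth the trouble.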

If instead of writing “This or that” we write “Þis or þat” using thorns, thankfully preserved in Unicode by our Viking friends, that represents a savings of about 17% (12 characters down to 10). It’s the same number of bytes in UTF-8, but I don’t make the absurd rules.
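The characters versus bytes arithmetic can be checked with a few lines. This is only a sketch: `utf8Bytes` is a made-up helper name, and `.length` counts UTF-16 code units, which happens to match the character count here because both thorns sit in the Basic Multilingual Plane.

```javascript
// Compare character count and UTF-8 byte count for both spellings.
const plain = "This or that"
const thorny = "\u00DEis or \u00FEat" // "Þis or þat"

// Hypothetical helper: number of bytes the string occupies in UTF-8.
function utf8Bytes(s) {
  return new TextEncoder().encode(s).length
}

console.log(plain.length, utf8Bytes(plain))   // 12 12
console.log(thorny.length, utf8Bytes(thorny)) // 10 12

const saved = 1 - thorny.length / plain.length
console.log((saved * 100).toFixed(1) + "%")   // 16.7%
```

Each thorn takes two bytes in UTF-8, exactly what the two letters it replaces took, so the savings only exist if you count characters.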

Besides the old standbys like æ, we can steal a trick or two from German and turn ue into ü. Unicode also includes some deprecated ligatures like fi. Mostly we want the compressed text to be recognizable even to those without a decoding tool. “Please fix the shoot.” -> “Plêse fix þe ʃōt.” Maybe.
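Here is a minimal sketch of that example as a round trip, using only the four substitutions it needs. The names `pairs`, `squeeze`, and `unsqueeze` are invented for this illustration and are not what the tool below uses.

```javascript
// Illustrative subset of substitutions: [digraph, single character].
const pairs = [
  ["th", "þ"],
  ["sh", "ʃ"],
  ["ea", "ê"],
  ["oo", "ō"],
]

// Replace every occurrence of each digraph with its single character.
function squeeze(text) {
  return pairs.reduce((s, [from, to]) => s.split(from).join(to), text)
}

// Reverse the substitutions.
function unsqueeze(text) {
  return pairs.reduce((s, [from, to]) => s.split(to).join(from), text)
}

const original = "Please fix the shoot."
const encoded = squeeze(original)
console.log(encoded)                          // Plêse fix þe ʃōt.
console.log(encoded.length, original.length)  // 17 21
console.log(unsqueeze(encoded) === original)  // true
```

The round trip only holds when the original text contains none of the replacement characters. A tweet that already has a real þ in it will come back with “th” instead.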

<center>
Enter text here:<br>
<textarea id="carne" rows=6 cols=80>
</textarea>
<br>
<button onclick="burrito()">Click this button</button>


Encoded: (<span id=wraplen>0</span>)
<div id=wrapped>
</div>

Decoded:
<div id=unwrapped>
</div>
</center>

<script>
var fillings = [
[ "Th", "Þ" ],
[ "th", "þ" ],
[ "ch", "č" ],
[ "sh", "ʃ" ],
[ "wh", "ʍ" ],
[ "ae", "æ" ],
[ "ea", "ê" ],
[ "ee", "ē" ],
[ "ei", "ė" ],
[ "ie", "ə" ],
[ "oa", "ô" ],
[ "oe", "œ" ],
[ "oo", "ō" ],
[ "ue", "ü" ],
[ "ss", "ß" ],
[ "fi", "fi" ],
[ "fl", "fl" ],
[ "early bird gets the worm", "ebgtw" ]
]
function burrito() {
        // Read the input once, then encode and decode it independently.
        var c = document.getElementById("carne").value
        var w = c       // wrapped: digraphs replaced by single characters
        var u = c       // unwrapped: single characters expanded back to digraphs
        // Declare i locally so it does not leak into the global scope.
        for (var i = 0; i < fillings.length; i++) {
                w = w.replace(new RegExp(fillings[i][0], "g"), fillings[i][1])
                u = u.replace(new RegExp(fillings[i][1], "g"), fillings[i][0])
        }
        // textContent inserts plain text, so no manual HTML escaping is needed.
        document.getElementById("wrapped").textContent = w
        document.getElementById("unwrapped").textContent = u
        document.getElementById("wraplen").textContent = w.length
}
</script>
Posted 27 Aug 2016 19:00 by tedu Updated: 14 Sep 2016 14:37