sometimes the dependencies are useful
I ripped out a dependency and then I found out what it did. I wrote an RSS parser for a very simple project, and then figured, how hard could it be to use in a real feed reader? Well, not very hard, but it was somewhat time consuming, and offers another perspective on using other people’s code.
I started with my own RSS types to generate feeds with the Go encoding/xml library. By flipping the polarity and using it to parse XML, I was able to extract one field from one feed. Hey, this works, so let’s use it everywhere. Sure, there will be a few more changes required, but I can handle that.
This post ended up being delayed by several days because every time I was about to write it, I found a new fix was required. Opportunities for learning abound, but now my brain is full.
The first required task was to add types for atom feeds. This wasn’t so difficult. Grab a sample feed from the reader, run it through the decoder until the right fields are populated. Ship it.
The first set of bugs all had to do with links.
I knew that gofeed supports a feedburner extension to expose the real link instead of tracking crap, because I was using it before, but I wasn’t parsing it now because I didn’t have a sample feed. Once one of my feeds refreshed, I knew which one to inspect. An easy fix, and one I knew about in advance; I was just waiting until it appeared.
One of the atom feeds had links that were all wrong. Turns out the <link> element can be repeated, so I needed to change the type to an array and look for the rel="alternate" one.
A different atom feed broke. Some feeds only have a single <link> with the rel attribute. So we need a backup plan for that case.
Lots of trouble getting the proper content.
I started by parsing <summary> from atom feeds. But frequently that’s not enough; you need to parse <content>. Okay, append that too.
Oops, appending looks bad. Just prefer content over summary if present. That’s better.
Noticed that even some RSS feeds look wrong. Some people use an <encoded> element. Add that to my type so we pick it up.
I thought I was done, but there’s still an atom feed with no content. Turns out people make feeds where the <content> element includes XHTML. Utterly deranged. I really can’t figure out a use case where I would want the parsed content tree just sitting there in my parsed feed. This was discouraging, but there’s a way to ask for just the string contents inside an element, so I tell the decoder to give me that.
Now I need to handle trimming off <![CDATA[]]> myself.
Oh, right, some feeds don’t use CDATA, so we need to unescape the contents manually.
And finally I think I’ve settled into a good spot. Was any of this difficult? No, but it shows the value of using code that’s already been tested. I certainly could have avoided some trouble by downloading all my followed feeds and testing them before push, but I wanted to do a little experiment. We don’t always know what cases are handled by existing code. What happens if we naively replace code based only on our initial understanding of the problem?