sometimes the dependencies are useful
I ripped out a dependency and then I found out what it did. I wrote an RSS parser for a very simple project, and then figured, how hard could it be to use in a real feed reader? Well, not very hard, but it was somewhat time consuming, and offers another perspective on using other people’s code.
I started with my own RSS types to generate feeds with the Go encoding/xml library. By flipping the polarity and using it to parse XML, I was able to extract one field from one feed. Hey, this works, so let’s use it everywhere. Sure, there will be a few more changes required, but I can handle that.
This post ended up being delayed by several days because every time I was about to write it, I found a new fix was required. Opportunities for learning abound, but now my brain is full.
The first required task was to add types for atom feeds. This wasn’t so difficult. Grab a sample feed from the reader, run it through the decoder until the right fields are populated. Ship it.
The first set of bugs all had to do with links.
I knew that gofeed supports a feedburner extension to expose the real link instead of tracking crap, because I was using it before, but I wasn’t parsing it now because I didn’t have a sample feed. Once one of my feeds refreshed, I knew which one to inspect. An easy fix, and one I knew about in advance; I was just waiting until it appeared.
One of the atom feeds had links that were all wrong. Turns out the <link> element can be repeated, so I needed to change the type to an array and look for the rel="alternate" one.
A different atom feed broke. Some feeds only have a single <link> with the rel attribute. So we need a backup plan for that case.
Lots of trouble getting the proper content.
I started by parsing <summary> from atom feeds. But frequently that’s not enough; you need to parse <content>. Okay, append that too.
Oops, appending looks bad. Just prefer content over summary if present. That’s better.
Noticed that even some RSS feeds look wrong. Some people use an <encoded> element. Add that to my type so we pick it up.
I thought I was done, but there’s still an atom feed with no content. Turns out people make feeds where the <content> element includes XHTML. Utterly deranged. I really can’t figure out a use case where I would want the parsed content tree just sitting there in my parsed feed. This was discouraging, but there’s a way to ask for just the string contents inside an element, so I tell the decoder to give me that.
Now I need to handle trimming off <![CDATA[]]> myself.
Oh, right, some feeds don’t use CDATA, so we need to unescape the contents manually.
And finally I think I’ve settled into a good spot. Was any of this difficult? No, but it shows the value of using code that’s already been tested. I certainly could have avoided some trouble by downloading all my followed feeds and testing them before push, but I wanted to do a little experiment. We don’t always know what cases are handled by existing code. What happens if we naively replace code based only on our initial understanding of the problem?