RSS Hacking with Feed43 and YQL

My most recent project has been using free tools to create RSS feeds. I’m an avid user of Google Reader (since 2007 I’ve read something like 133k items), and I don’t want to, for example, open a page just to view an image when it could appear in the feed itself, or manually visit a regularly updated website just because it doesn’t have a feed. I’ll demonstrate two tools that help with that: Feed43 and YQL. If you have any particularly fun ideas for using them, feel free to leave them in the comments! All of the fun is after the cut.

Making a feed from a webpage

So the first example is the webcomic Ctrl+Alt+Delete. Its RSS feed only links to the site and doesn’t embed the actual comic. Well, we can fix that by making a feed that targets the actual daily comic image. This is also the method you’d use for a comic (or other webpage) that doesn’t have a feed at all. Go to http://feed43.com/ and click “Create your own feed” to get started. The comic’s address is http://www.cad-comic.com/cad/, so paste that into the address field and click “Reload.”

We’re interested in grabbing the comic image. With some trial and error you can figure out where it is, mostly by reading the page source that Feed43 displays and trying variations of the part we want. I found:

{*}img{*}src="{%}" alt="{%}" title="{%}" style="width: {%}; height: {%}" />{*}

works pretty well for me. Let’s break this string down. We’re describing to Feed43 which parts of the page matter, so the pattern has to match the page’s markup. Each {%} tells Feed43 to capture that bit of text for use in your feed; each {*} stands for a bit of text that varies but that you don’t care about. When you hit the “Extract” button, it shows you what it found using your pattern. Once you’ve got the fields you need (in our case, the image URL), you can start building the feed.
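To make the two placeholders concrete, here is a rough Python sketch that turns a pattern like the one above into a regular expression. It is only an approximation for illustration: Feed43's real matching engine may behave differently, and the function names and the sample HTML fragment (with its example.com URL) are made up.

```python
import re

def pattern_to_regex(pattern):
    """Translate a Feed43-style pattern into a compiled regex (approximation)."""
    pieces = []
    # Split on the two placeholders, keeping them in the resulting list.
    for part in re.split(r"(\{%\}|\{\*\})", pattern):
        if part == "{%}":
            pieces.append("(.*?)")          # text we want to keep: capture it
        elif part == "{*}":
            pieces.append(".*?")            # text that varies but is ignored
        else:
            pieces.append(re.escape(part))  # literal text must match exactly
    return re.compile("".join(pieces), re.DOTALL)

def extract(pattern, page):
    """Return one tuple of captured fields per match."""
    return [m.groups() for m in pattern_to_regex(pattern).finditer(page)]

PATTERN = ('{*}img{*}src="{%}" alt="{%}" title="{%}" '
           'style="width: {%}; height: {%}" />{*}')

# Hypothetical page fragment, not the real comic page.
SAMPLE = ('<div id="content"><img src="http://example.com/comic.png" '
          'alt="Comic alt text" title="Comic title" '
          'style="width: 625px; height: 900px" /></div>')

for fields in extract(PATTERN, SAMPLE):
    print(fields)
```

Each tuple holds the captures in order, so the first element plays the role of {%1}, the second of {%2}, and so on.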

The title and description of your feed can be anything you feel describes it; I’d put the name of the comic in mine. The RSS Item Properties are more interesting, though; here, I put {%2} as the title, {%1} (the image URL) as the link, and <img src="{%1}" title="{%2}" alt="{%3}"/> as the content (which displays the comic image in the feed, as discussed above). Hit preview to see your work and drop the resulting file into your favorite feed reader.
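The numbered placeholders work like simple string substitution. Here is a minimal Python sketch of that idea, assuming the captured fields arrive as a tuple in capture order; the render function and the field values are hypothetical, not part of Feed43.

```python
import re

def render(template, fields):
    """Replace {%1}, {%2}, ... with the matching captured field (1-based)."""
    return re.sub(r"\{%(\d+)\}",
                  lambda m: fields[int(m.group(1)) - 1],
                  template)

# Made-up captures: image URL, alt text, title text.
fields = ("http://example.com/comic.png", "Comic alt text", "Comic title")

item = {
    "title": render("{%2}", fields),
    "link": render("{%1}", fields),
    "content": render('<img src="{%1}" title="{%2}" alt="{%3}"/>', fields),
}
print(item["content"])
```

The same field can be reused in several properties, which is why {%1} can serve as both the item link and the image source.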

Working example of this one: feed43.com/8532302726676705.xml

Making a feed from another feed

So my college’s newspaper is The Technique, and they’ve got an RSS feed with the story headlines at http://www.nique.net/feed/, but I want an RSS feed with the entire story. Looking at their RSS feed, the full story is actually included in a CDATA block that doesn’t show up in Google Reader. Well, we can fix that. Here’s the item search pattern I used:

<item>{*}<title>{%}</title>{*}<link>{%}</link>{*}<content:encoded><![CDATA[{%}]]></content:encoded>{*}</item>

The RSS Item Properties (title, link, and content) then become {%1}, {%2}, and {%3}, respectively. Snazzy! Here's mine: feed43.com/7706512666248645.xml
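If you'd rather check what is hiding in that CDATA block before building the Feed43 pattern, a small Python sketch like this one can pull the same three fields out of a feed. The sample feed below is made up; the only real detail is the namespace URI of the RSS content module, which is what the content: prefix normally points to.

```python
import xml.etree.ElementTree as ET

# Namespace URI of the RSS "content" module that defines content:encoded.
CONTENT_NS = "http://purl.org/rss/1.0/modules/content/"

# Made-up feed with a single item; the real feed will look different.
SAMPLE_FEED = """<rss version="2.0"
     xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Example paper</title>
    <item>
      <title>Example headline</title>
      <link>http://example.com/story</link>
      <description>Short teaser</description>
      <content:encoded><![CDATA[<p>The entire story.</p>]]></content:encoded>
    </item>
  </channel>
</rss>"""

def full_text_items(feed_xml):
    """Return (title, link, full story HTML) for every item in the feed."""
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.iter("item"):
        items.append((
            item.findtext("title"),
            item.findtext("link"),
            # The parser hands back CDATA content as ordinary text.
            item.findtext("{%s}encoded" % CONTENT_NS),
        ))
    return items

for title, link, story in full_text_items(SAMPLE_FEED):
    print(title, link, story)
```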

Using YQL

YQL is a tool by Yahoo that amalgamates a variety of data sources and analysis tools. Let's go for something basic and make an RSS feed of the weather. Go to the YQL Console and look down the data tables on the right side of the page until you find weather. Click it, then click weather.forecast, and it'll give you an example query. We want the weather for wherever you are, so change the zip code and click the Test button underneath. Okay, we've got some XML to play with, but to make things easier later on, change the query to begin with select item instead of select *, so that only the information we're interested in shows up. Grab the URL at the bottom of the page ("The REST Query") and stuff that into Feed43.
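The REST query is just the YQL statement URL-encoded into a query string. Here is a minimal Python sketch of that encoding. The endpoint address is an assumption (the console shows the real one for your query and should be treated as authoritative), and the where clause and zip code are placeholders.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Assumed public YQL endpoint; copy the real one from the console.
YQL_ENDPOINT = "http://query.yahooapis.com/v1/public/yql"

def yql_rest_url(query, fmt="xml"):
    """Build a REST URL that carries the YQL statement in the q parameter."""
    return YQL_ENDPOINT + "?" + urlencode({"q": query, "format": fmt})

# Placeholder statement: the where clause and zip code are illustrative only.
QUERY = "select item from weather.forecast where location=30332"
url = yql_rest_url(QUERY)
print(url)

# Decoding the query string gives the original statement back.
print(parse_qs(urlsplit(url).query)["q"][0])
```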

Making a feed from YQL

Now we're going to parse this just like we did the webpage above. One tweak you can use is specifying that the global search pattern is <channel>{%}</channel>. For my item search pattern, I used the following:

<item><title>{%}</title>{*}
<link>{*}*{%}</link>{*}
<pubDate>{%}</pubDate>{*}
<yweather:condition {*} temp="{%}" text="{%}"/>{*}
<description><![CDATA[{%}({*}]]></description>{*}
</item>

The most important part of that is the description, which has a nicely formatted weather display. To grab that, you set the Item Content Template to {%6} (or whatever it happens to be on yours). Preview it, and you've got an RSS feed of the weather near you! If you look over the available YQL tables, you might think of another example.
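Here is a rough Python sketch of how the global pattern and the item pattern above work together: the global pattern first narrows the page down to one region, and the item pattern is then applied only inside that region. The regex translation is an approximation (treating whitespace in the pattern as loosely matching is my assumption, not documented Feed43 behavior), and the sample XML is invented.

```python
import re

def pattern_to_regex(pattern):
    """Approximate a Feed43-style pattern as a regex."""
    pieces = []
    for part in re.split(r"(\{%\}|\{\*\}|\s+)", pattern):
        if part == "{%}":
            pieces.append("(.*?)")          # captured field
        elif part == "{*}":
            pieces.append(".*?")            # ignored text
        elif part.isspace():
            pieces.append(r"\s*")           # assumption: loose whitespace
        else:
            pieces.append(re.escape(part))  # literal markup
    return re.compile("".join(pieces), re.DOTALL)

GLOBAL_PATTERN = "<channel>{%}</channel>"
ITEM_PATTERN = """<item><title>{%}</title>{*}
<link>{*}*{%}</link>{*}
<pubDate>{%}</pubDate>{*}
<yweather:condition {*} temp="{%}" text="{%}"/>{*}
<description><![CDATA[{%}({*}]]></description>{*}
</item>"""

# Invented sample shaped like the query result; values are placeholders.
SAMPLE = """<query><results><channel>
<item><title>Conditions for Exampleville</title>
<link>http://example.com/redirect/*http://example.com/forecast.html</link>
<pubDate>Thu, 01 Jul 2010 1:52 pm EDT</pubDate>
<yweather:condition code="30" temp="88" text="Partly Cloudy"/>
<description><![CDATA[<b>Current Conditions:</b> Partly Cloudy, 88 F (note)]]></description>
</item>
</channel></results></query>"""

def extract_items(page):
    """Apply the global pattern first, then the item pattern inside it."""
    scope = pattern_to_regex(GLOBAL_PATTERN).search(page)
    if scope is None:
        return []
    return [m.groups()
            for m in pattern_to_regex(ITEM_PATTERN).finditer(scope.group(1))]

items = extract_items(SAMPLE)
print(items[0][5])  # the sixth capture, i.e. what {%6} would hold
```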

Here's mine: feed43.com/0376374670181450.xml

Making a feed from a YQL query from a feed

This is where we're going to put everything together. Let's say that I just want a feed of images in Ars Technica articles. I can't think of a better example that works at the moment, but this technique is useful when you need some content from each page linked to by an RSS feed. I'd try this YQL query:

select * from html where url in (select link from feed where url='http://feeds.arstechnica.com/arstechnica/everything') and xpath="//div[@id='story']/div[2][@class='story-image CenteredImage']/img"

The most complicated thing here is the xpath, which specifies what part of the page YQL should return. I've found that the Chrome extension xpathOnClick works pretty well to help you with this, although there are other extensions and methods to determine that as well.
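To get a feel for what that xpath selects, here is a small Python sketch that runs the same path against a made-up page fragment. It uses the limited XPath subset in Python's ElementTree rather than YQL's own engine, so treat it as an illustration of the selection logic only; the markup and URLs are invented.

```python
import xml.etree.ElementTree as ET

# Invented, well-formed stand-in for an article page.
PAGE = """<html><body>
<div id="story">
  <div class="story-header"><h1>Headline</h1></div>
  <div class="story-image CenteredImage"><img src="http://example.com/a.png" alt="A"/></div>
  <div class="story-body"><img src="http://example.com/inline.png" alt="B"/></div>
</div>
</body></html>"""

def story_images(page):
    """Return the src of each img under the second div inside div#story."""
    root = ET.fromstring(page)
    # ".//" makes the search relative to the root element, as ElementTree requires.
    path = (".//div[@id='story']"
            "/div[2][@class='story-image CenteredImage']/img")
    return [img.get("src") for img in root.findall(path)]

print(story_images(PAGE))
```

Only the image in the second div is returned; the inline image in the story body is skipped because its parent div fails both the position and the class test.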

From here, we stuff the resulting REST query into Feed43, and set our search pattern to <img {*} src="{%}" {*}/>.

Set the item title and link to {%1}, and then set the content template to be

<img src="{%1}">

And it works! Link: feed43.com/8713878217633741.xml

Other examples

Dropbox

Dropbox has an RSS feed of things that you've put into it. You can turn that into an image blog by extracting the URL of each item you've added and putting an image tag in the item content template. The item search pattern is:

<item>{*}
<title>You added the file{%}</title>{*}
<pubDate>{%}</pubDate>{*}
<description><![CDATA[{%}<a href='{%}'>{%}</a>{*}]]></description>{*}
<guid>{%}</guid>{*}
</item>

The item content template is:

<img src="{%4}"/>

AutoHotKey_L Changelog

I can't go a post without mentioning AutoHotKey in some way. Here's how to get a feed of the changelog:
url: http://www.autohotkey.net/~Lexikos/AutoHotkey_L/docs/AHKL_ChangeLog.htm
global template: <body>{%}</body>
item template:

<h2>{%}</h2>
{%}

Item title: {%1}
Item content template: {%2}

Mine: feed43.com/1651418770472811.xml

The Technique classifieds

Maybe you're looking for a new apartment?
url: http://ad2adnetwork.biz/ad2ad/index.php?tpsb=2500406&action=showallads
item template:

<SPAN class={*}>{%}</SPAN><SPAN class={*}>{%}</SPAN>

Item title: {%1}
Item content template: {%2}

Mine: feed43.com/6173010227672288.xml

Andrew Guyton
http://www.disavian.net/

Comments

okomi

Hello, sir. I have a question I hope you don’t mind answering: imagine that what you capture in {%1} is a URL to a different page (all the {%1} values are links to different pages, for example to different posts on a blog). How would you open that link and show its content in the Feed43 “Item Content Template” field?

Incredible tutorial thank you very much.

Andrew Guyton

That sounds like you’d have to use YQL to get the content of the page and the url, and then put that YQL query into Feed43.
