Term extractor

There are a lot of little coding things I’d like to play around with. I have a whole Ta-da list of ideas to investigate and rummage through. Unfortunately, real life tends to get in the way, sucking away all my available time so that few, if any, of these ideas actually get implemented.

Rather than simply letting them wither and die, I thought it would be better to at least drone on about them here so that someone else (with more time than me) can take something and run with it.

One of the things I keep meaning to investigate further is a Web Service from Yahoo called Term Extraction. This little REST request is tucked away on the Yahoo Developer Network. Don’t let the innocuousness fool you. This looks like one powerful API.

You pass it a string of text like, oh say, the contents of a blog post. It then returns either XML or JSON (whichever you prefer) containing the keywords it extracted from the text. In other words, it sifts out all the "noise words" and brings back the terms that appear relevant. Apparently, it’s using the very algorithm that the Yahoo search engine uses to rank pages.
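
Here’s a rough sketch in Python of what talking to the service might look like. Treat it as a guess rather than gospel: the endpoint URL, the parameter names (appid, context, output), and the shape of the JSON response are all my assumptions, and the helper names are made up for illustration. It only builds the request body and parses a hand-written sample response, so it never touches the network.

```python
import json
from urllib.parse import urlencode

# Assumed endpoint: this post does not give the URL, so check the
# Yahoo Developer Network documentation before relying on it.
ENDPOINT = "http://search.yahooapis.com/ContentAnalysisService/V1/termExtraction"


def build_request_body(text, app_id, output="json"):
    """Return a form-encoded POST body for a term extraction request.

    The parameter names (appid, context, output) are assumptions.
    """
    return urlencode({"appid": app_id, "context": text, "output": output})


def parse_terms(response_text):
    """Pull the list of terms out of a JSON response.

    Assumes a shape like {"ResultSet": {"Result": ["term", ...]}}.
    """
    data = json.loads(response_text)
    return list(data.get("ResultSet", {}).get("Result", []))


# Hand-written sample response, not real output from the service.
sample = '{"ResultSet": {"Result": ["api", "yahoo"]}}'
print(parse_terms(sample))
```

To make a real request you would POST the output of build_request_body to ENDPOINT (with urllib.request, say) and hand the response text to parse_terms.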

Now, I was thinking that it would be fun to mash this up with APIs from other Web Services. What if you treated each returned term as a tag? You could then pass those tags to any number of tag-based services, like Flickr, Del.icio.us, or Technorati.

So, instead of the simple "here’s my Technorati profile" or "here are my Flickr pics" on a blog, you could have links that were specific to each individual blog post. If I sent the text of this post to the term extractor, it would return a list of terms like "api", "yahoo", etc. By passing those terms as tags to a service like Technorati or Del.icio.us, readers could be pointed to other blog posts and articles that are (probably) related.
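
The terms-as-tags step could be sketched like this in Python. The URL patterns for each service’s tag pages are my assumptions, as is the decision to squash a multi-word term into a single lowercase tag; the function names are purely illustrative.

```python
from urllib.parse import quote

# Assumed URL patterns for each service's tag pages.
TAG_URLS = {
    "Technorati": "http://technorati.com/tag/{tag}",
    "Del.icio.us": "http://del.icio.us/tag/{tag}",
    "Flickr": "http://www.flickr.com/photos/tags/{tag}/",
}


def term_to_tag(term):
    """Lowercase a term and remove whitespace so it reads as one tag."""
    return "".join(term.lower().split())


def tag_links(terms):
    """Map each extracted term to a dict of {service name: tag URL}."""
    links = {}
    for term in terms:
        # Percent-encode anything that is not safe inside a URL path.
        tag = quote(term_to_tag(term), safe="")
        links[term] = {
            name: pattern.format(tag=tag) for name, pattern in TAG_URLS.items()
        }
    return links


for term, services in tag_links(["api", "yahoo"]).items():
    for name, url in services.items():
        print(term, name, url)
```

Each blog post would then get its own little batch of links, generated from whatever terms came back for that post.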

That’s the idea anyway. Like I said, I just don’t seem to have much time these days to investigate this kind of thing further. But, at the very least, I can point others to the term extractor and infect people with my meme-ish ravings.
