Google have added a new “recommendations” feature to Google Reader. At first I thought, “Oh great, they’ve stolen my idea.” But actually, it’s not even close to the goal of increasing the precision of my Google Reader inbox. Recommendations does not appear to use any kind of classification (as StumbleUpon does); instead it just clumps all users’ “likes” together in one big naive popularity contest.
The interface is simple. You can click an “I like this” button for each item. This is the most important UI feature Reader has introduced to date. Using the shortcut keys, I can read articles with acceptable speed and use the L key to quickly flag interesting items. However, what Google does with this “user X likes item Y” training data needs a lot of work.

Here are some improvements that need to be made, ASAP:
- It shouldn’t show me items from feeds I subscribe to and have already read (syntactic duplication).
- It shouldn’t show me reposts of news stories I have already read (semantic duplication). If a story is deemed relevant, show me the most authoritative reporting of it.
- It shouldn’t show me useless no-content feeds that require you go to the original site to view the story.
- If it’s going to recommend YouTube videos, then it should use the mountain of data it already has on the YouTube network, not just recommend based on popularity.
- Recommendations need to have much higher precision. Currently, I estimate it’s no better than 0.1 (for every 10 items I read, at most 1 is relevant).
- It should apply the relevance filtering to posts in my existing subscriptions, most of which have similarly low precision.
- However, there are some feeds such as web comics which should not be filtered. I want to read every single XKCD whether I find it funny or not. If a system could predict which I find funny before I read them I’d be thoroughly impressed!
- Ranking of items (by “magic”? please…) is NOT important. I want to read stories from oldest to newest. I want recall of 1.0 and precision of at least 0.8 or I’m not interested.
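
To make the precision and recall figures in the list above concrete, here is a minimal sketch of how they could be computed for one reading session. The item IDs and the `precision_recall` helper are made up for illustration; this is not anything Google Reader exposes.

```python
def precision_recall(shown, relevant):
    """Return (precision, recall) for the items a reader was shown.

    precision = relevant items shown / all items shown
    recall    = relevant items shown / all relevant items
    Both default to 0.0 when the denominator is empty (an assumption of
    this sketch; other conventions exist).
    """
    shown = set(shown)
    relevant = set(relevant)
    hits = shown & relevant
    precision = len(hits) / len(shown) if shown else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical session: 10 items shown, only one of which I care about.
shown = [f"item{i}" for i in range(10)]
relevant = ["item3"]
p, r = precision_recall(shown, relevant)
print(p, r)
```

In this made-up session nothing relevant was missed, so recall is perfect, but nine of the ten items shown were noise. The target of precision 0.8 would mean that at least 8 of every 10 items shown are ones I actually want.
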
To be successful, it needs to merge StumbleUpon’s classification system (which has the logic right) with the Google Reader framework (which has the interface right).
To draw a parallel with Gmail and spam classification, the reason Gmail’s anti-spam shits all over other spam classifiers is that Google added a simple “This is Spam” button to the web interface, effectively outsourcing the labelling of spam messages (and so the training of its classifier) to its enormous user base. Similar techniques can be applied to Google Reader, but on an individualised basis.
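
As a rough illustration of what “individualised” training could look like, here is a toy per-user naive Bayes classifier that learns from “I like this” clicks on item titles. It is only a sketch under my own assumptions (titles as the only feature, add-one smoothing, the hypothetical `LikeClassifier` name), not how Gmail or Reader actually work.

```python
import math
from collections import Counter


class LikeClassifier:
    """Toy per-user naive Bayes over title words; labels are 'like' and 'skip'."""

    def __init__(self):
        self.word_counts = {"like": Counter(), "skip": Counter()}
        self.item_counts = Counter()

    def train(self, title, liked):
        # Each click (or non-click) on an item is one training example.
        label = "like" if liked else "skip"
        self.item_counts[label] += 1
        self.word_counts[label].update(title.lower().split())

    def score(self, title, label):
        # Log-probability of the label given the title, with add-one smoothing.
        vocab = set(self.word_counts["like"]) | set(self.word_counts["skip"])
        total_items = sum(self.item_counts.values())
        logp = math.log((self.item_counts[label] + 1) / (total_items + 2))
        total_words = sum(self.word_counts[label].values())
        for word in title.lower().split():
            count = self.word_counts[label][word]
            logp += math.log((count + 1) / (total_words + len(vocab) + 1))
        return logp

    def predicts_like(self, title):
        return self.score(title, "like") > self.score(title, "skip")


# Hypothetical training data for one user.
clf = LikeClassifier()
clf.train("python machine learning tutorial", liked=True)
clf.train("new python classifier library", liked=True)
clf.train("celebrity gossip roundup", liked=False)
clf.train("celebrity wedding photos", liked=False)
print(clf.predicts_like("python classifier tutorial"))
print(clf.predicts_like("celebrity photos"))
```

A real system would need far more signal than title words, but the point stands: every press of the L key is a free training label for that user’s own model.
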
Key to the success of such a classifier is social analysis, which is used by both StumbleUpon and Last.fm. Last.fm, for example, recommends music I might like based on what people with similar taste listen to.
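
The “people with similar taste” idea can be sketched as simple user-based collaborative filtering. The example below assumes each user’s likes are a set of feed names and uses Jaccard overlap as the similarity measure. Both are my choices for illustration, as are the user names and the `recommend` function; I have no idea what StumbleUpon or Last.fm use internally.

```python
def jaccard(a, b):
    """Overlap between two sets of likes, from 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(likes, user, top_n=3):
    """Suggest items liked by users whose taste overlaps with `user`.

    Each candidate item is scored by the summed similarity of the users
    who liked it; users with no overlap contribute nothing.
    """
    mine = likes[user]
    scores = {}
    for other, theirs in likes.items():
        if other == user:
            continue
        sim = jaccard(mine, theirs)
        if sim == 0:
            continue
        for item in theirs - mine:
            scores[item] = scores.get(item, 0.0) + sim
    # Highest score first; ties broken alphabetically for stable output.
    ranked = sorted(scores, key=lambda item: (-scores[item], item))
    return ranked[:top_n]


# Hypothetical users and feeds.
likes = {
    "me": {"xkcd", "lwn", "hacker-news"},
    "alice": {"xkcd", "lwn", "ars"},
    "bob": {"gossip", "sports"},
    "carol": {"xkcd", "ars", "slashdot"},
}
print(recommend(likes, "me"))
```

Note that bob’s likes never reach me, however popular they might be overall, because we share nothing. That is the difference between this and a naive popularity contest.
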
