Saving RSS: Why Meta-feeds will triumph over Tags
It’s pretty clear that RSS has now become the de facto standard for web content syndication. Just take a look at the numbers. The total number of RSS feeds tracked by Syndic8.com has grown from about 2,500 in the middle of 2001, to 50,000 at the beginning of 2004, to 286,000 as of the middle of this month. That’s total growth of over 11,300% in just the past 3.5 years!
Feed Overload Syndrome
However, as I wrote at the beginning of last year, the very growth of RSS threatens to sow the seeds of its own failure by creating such a wealth of data sources that it becomes increasingly difficult for users to sift through all the “noise” to find the information that they actually need.
Just ask any avid RSS user about how their use of RSS has evolved and they will likely tell you the same story: When they first discovered RSS it was great because it allowed them to subscribe to relevant information “feeds” from all of their favorite sites and have that information automatically aggregated into one place (usually an RSS reader like Sharpreader or NewsGator). However, as they began to add more and more feeds (typically newly discovered blogs), the number of posts they had to review started rising quickly, so much so that they often had hundreds, if not thousands of unread posts sitting in their readers. Worse yet, many of these posts ended up either being irrelevant (especially random postings on personal blogs) or duplicative. Suffering from a serious case of “feed overload”, many of these users ultimately had to cut back on the number of feeds that they subscribed to in order to reduce the amount of “noise” in their in-box and give them at least a fighting chance of skimming all of their unread posts each day.
The First Step: Recognizing That You Have a Problem
Many in the RSS community recognize that “Feed Overload Syndrome” is indeed becoming a big problem and have begun initiatives to try and address it.
Perhaps the most obvious way to address the problem is to create keyword-based searches that filter posts. The results from such searches can themselves be syndicated as an RSS feed. This approach has several problems, though. First, many sites syndicate only summaries of their posts, not the complete posts, which makes it difficult to index the full text. Second, keyword-based searches become less and less effective the more data you index, as the average query starts to return more and more results. Third, keywords often have multiple contexts, which in turn produces significant “noise” in the results. For example, a keyword search for “Chicago” would produce information about the city, the music group, and the movie, among other things. That said, many “feed aggregation sites” such as Technorati and Bloglines currently offer keyword-based searching/feeds, and for most folks these are better than nothing. However, it’s pre-ordained that as the number of feeds increases, these keyword filtering techniques will prove less and less useful.
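To make the “Chicago” problem concrete, here is a minimal sketch of a keyword filter over syndicated posts. The Post class, the keyword_filter function, and the example.com URLs are all made up for illustration; real aggregators certainly do something more sophisticated, but the ambiguity problem is the same.

```python
from dataclasses import dataclass


@dataclass
class Post:
    url: str
    title: str
    summary: str  # many feeds syndicate only a summary, not the full post


def keyword_filter(posts, keyword):
    """Return posts whose title or summary contains the keyword (case-insensitive)."""
    needle = keyword.lower()
    return [
        p for p in posts
        if needle in p.title.lower() or needle in p.summary.lower()
    ]


# Hypothetical sample posts, invented for this example.
posts = [
    Post("http://example.com/1", "Chicago city council vote", "Local politics update."),
    Post("http://example.com/2", "Chicago reunion tour dates", "The band announces new shows."),
    Post("http://example.com/3", "Review: Chicago on DVD", "The musical film revisited."),
    Post("http://example.com/4", "Weekend recipes", "Deep-dish pizza at home."),
]

matches = keyword_filter(posts, "Chicago")
for p in matches:
    print(p.url, "-", p.title)
```

All three “Chicago” posts come back, even though they concern the city, the band, and the movie respectively. The filter has no way to tell them apart. Meanwhile the pizza post, arguably about Chicago, is missed because the word never appears in its summary.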
Tag, You’re Categorized
Realizing the shortcomings of keyword-based searching, many people are embracing the concept of “tagging”. Tagging is simply adding some basic metadata to an RSS post, usually just a simple keyword “tag”. For example, the RSS feed on this site effectively “tags” my posts by using the dc:subject property (a Dublin Core element that RSS feeds can carry). Using such keywords, feed aggregators (such as Technorati, PubSub and Del.icio.us) can sort posts into different categories, and subscribers can then subscribe to these categorized RSS feeds instead of the “raw” feeds from the sites themselves. Alternatively, RSS readers can sort the posts into user-created folders based on tags (although mine doesn’t offer this feature yet).
Tagging is a step in the right direction, but it is ultimately a fundamentally flawed approach to the issue. The problem at the core of tagging is the same problem that has bedeviled almost all efforts at collective categorization: semantics. In order to assign a tag to a post, one must make some inherently subjective determinations, including: 1) what’s the subject matter of the post and 2) what topics or keywords best represent that subject matter. In the information retrieval world, this process is known as categorization. The problem with tagging is that there is no assurance that two people will assign the same tag to the same content. This is especially true in the diverse “blogosphere”, where one person’s “futbol” is undoubtedly another’s “football” or another’s “soccer”.
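The consistency problem is easy to see in a small sketch. Assume four bloggers write about the same sport and each picks their own tag. The group_by_tag function and the sample data below are hypothetical, and the only normalization applied is trimming and lowercasing, which is about as much as an aggregator can safely do without knowing what the tags mean.

```python
from collections import defaultdict


def group_by_tag(tagged_posts):
    """Group post URLs by their author-assigned tag (trimmed and lowercased)."""
    groups = defaultdict(list)
    for url, tag in tagged_posts:
        groups[tag.strip().lower()].append(url)
    return dict(groups)


# Hypothetical posts, all about the same sport, tagged by different authors.
tagged_posts = [
    ("http://example.com/a", "futbol"),
    ("http://example.com/b", "football"),
    ("http://example.com/c", "Soccer"),
    ("http://example.com/d", "soccer"),
]

groups = group_by_tag(tagged_posts)
for tag, urls in groups.items():
    print(tag, urls)
```

Case differences are trivially fixed, but the four posts still land in three separate buckets. A subscriber to the “soccer” category never sees the “futbol” or “football” posts.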
Beyond a fatal lack of consistency, tagging efforts also suffer from a lack of context. As any information retrieval specialist will tell you, categorized documents are most useful when they are placed into a semantically rich context. In the information retrieval world, such context is provided by formalized taxonomies. Even though the RSS standard provides for taxonomies, tagging as it is currently executed lacks any concept of taxonomies and thus lacks context.
Deprived of consistency and context, tagging threatens to become a colossal waste of time as it merely adds a layer of incoherent and inconsistent metadata on top of an already unmanageable number of feeds.
While tagging may be doomed to confusion, there are some other potential approaches that promise to bring order to RSS’s increasingly chaotic situation. The most promising approach involves something called a Meta-feed. Meta-feeds are RSS feeds composed solely of metadata about other feeds. Combining meta-feeds with the original source feeds enables RSS readers to display consistently categorized posts within rich and logically consistent taxonomies. The process of creating a meta-feed looks a lot like that needed to create a search index. First, crawlers must scour RSS feeds for new posts. Once they have located new posts, the posts are categorized and placed into a taxonomy using advanced statistical processes such as Bayesian analysis and natural language processing. This metadata is then associated with the URL of the original post and put into its own RSS meta-feed. In addition to the categorization data, the meta-feed can also contain taxonomy information, as well as information about such things as exact/near duplicates and related posts.
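As a rough illustration of the categorization step, here is a toy version of the pipeline: a tiny naive Bayes classifier assigns each crawled post to a category, and the result is emitted as a meta-feed entry keyed by the post’s URL. Everything here is an assumption made for the sketch: the NaiveBayes class, the build_meta_feed function, the slash-separated taxonomy paths, and the training text. No meta-feed format has been standardized, and a real service would train on far more data and emit actual RSS rather than Python dictionaries.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and keep purely alphabetic words."""
    return [w for w in text.lower().split() if w.isalpha()]


class NaiveBayes:
    """Minimal multinomial naive Bayes with add-one smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> word frequencies
        self.doc_counts = Counter()              # category -> training documents
        self.vocab = set()

    def train(self, text, category):
        words = tokenize(text)
        self.word_counts[category].update(words)
        self.doc_counts[category] += 1
        self.vocab.update(words)

    def classify(self, text):
        total_docs = sum(self.doc_counts.values())
        best, best_score = None, -math.inf
        for cat in self.doc_counts:
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.doc_counts[cat] / total_docs)
            total_words = sum(self.word_counts[cat].values())
            for w in tokenize(text):
                score += math.log(
                    (self.word_counts[cat][w] + 1) / (total_words + len(self.vocab))
                )
            if score > best_score:
                best, best_score = cat, score
        return best


def build_meta_feed(posts, classifier):
    """Produce one metadata entry per crawled post, keyed by the post's URL."""
    return [{"url": url, "category": classifier.classify(text)} for url, text in posts]


# Hypothetical training data; category names double as taxonomy paths.
clf = NaiveBayes()
clf.train("city council mayor election budget", "Places/Cities")
clf.train("band album tour concert guitar", "Arts/Music")

# Hypothetical crawled posts as (url, text) pairs.
posts = [
    ("http://example.com/1", "mayor proposes new city budget"),
    ("http://example.com/2", "band announces reunion tour and album"),
]

meta_feed = build_meta_feed(posts, clf)
for entry in meta_feed:
    print(entry)
```

The point of the sketch is that the same classifier and the same taxonomy are applied to every post, no matter who wrote it, which is precisely the consistency that author-assigned tags cannot offer.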
RSS readers can then request both the original raw feeds and the meta-feeds. They then use the meta-feed to appropriately and consistently categorize and relate each raw post.
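On the reader side, combining the two feeds amounts to a join on the post URL. The sketch below assumes both feeds have already been parsed into lists of dictionaries. The apply_meta_feed function, the field names, and the “Uncategorized” fallback folder are all hypothetical.

```python
def apply_meta_feed(raw_posts, meta_entries):
    """Sort raw posts into category folders using metadata keyed by post URL."""
    meta_by_url = {m["url"]: m for m in meta_entries}
    folders = {}
    for post in raw_posts:
        meta = meta_by_url.get(post["url"])
        # Posts the meta-feed has not covered yet fall into a catch-all folder.
        category = meta["category"] if meta else "Uncategorized"
        folders.setdefault(category, []).append(post["title"])
    return folders


# Hypothetical parsed feeds.
raw_posts = [
    {"url": "http://example.com/1", "title": "Mayor proposes new budget"},
    {"url": "http://example.com/2", "title": "Reunion tour announced"},
    {"url": "http://example.com/3", "title": "Untitled musings"},
]
meta_entries = [
    {"url": "http://example.com/1", "category": "Places/Cities"},
    {"url": "http://example.com/2", "category": "Arts/Music"},
]

folders = apply_meta_feed(raw_posts, meta_entries)
for name, titles in folders.items():
    print(name, titles)
```

Because the raw feed is left untouched, a reader that knows nothing about meta-feeds keeps working as before, and one that does can layer the categories on top.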
For end users, meta-feeds will enable a wealth of features and innovations. Users will be able to easily find related documents and eliminate duplicates of the same information (such as two newspapers reprinting the same wire story). Users will also be able to create their own custom taxonomies and category names (as long as they relate them back to the meta-feed). Users can even combine two different meta-feeds so long as one of the meta-feed publishers creates an RDF file that relates the two sets of categories and taxonomies (to the extent practical). Of course, the biggest benefit to users will be that information is consistently sorted and grouped into meaningful categories, allowing them to greatly reduce the amount of “noise” created by duplicate and non-relevant posts.
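Two of these features, dropping duplicates and renaming categories, can be sketched in a few lines. The sketch assumes the meta-feed marks a duplicate with a duplicate_of field pointing at the original post’s URL, and that the user’s custom names are a simple mapping from meta-feed categories. Both the field and the personalize function are invented for illustration.

```python
def personalize(meta_entries, custom_names):
    """Build a user's view: skip flagged duplicates, rename categories."""
    view = {}
    for entry in meta_entries:
        if entry.get("duplicate_of"):
            continue  # the meta-feed says this post repeats another one
        # Fall back to the publisher's category when the user has no custom name.
        name = custom_names.get(entry["category"], entry["category"])
        view.setdefault(name, []).append(entry["url"])
    return view


# Hypothetical meta-feed entries; two papers reprint the same wire story.
meta_entries = [
    {"url": "http://paper-one.example/wire-story", "category": "News/World"},
    {
        "url": "http://paper-two.example/wire-story",
        "category": "News/World",
        "duplicate_of": "http://paper-one.example/wire-story",
    },
    {"url": "http://example.com/2", "category": "Arts/Music"},
]

# The user's own category names, related back to the meta-feed's categories.
custom_names = {"News/World": "Headlines"}

view = personalize(meta_entries, custom_names)
for name, urls in view.items():
    print(name, urls)
```

Since the custom names are defined in terms of the meta-feed’s categories, the user’s personal taxonomy stays consistent even as new posts arrive.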
At a higher level, the existence of multiple meta-feeds, each with its own distinct taxonomy and categories, will in essence create multiple “views” of the web that are not predicated on any single person’s semantic orientation (as is the case with tagging). In this way it will be possible to view the web through unique editorial lenses that transcend individual sites and instead present the web for what it is: a rich and varied collective enterprise that can be wildly different depending on your perspective.
The Road Yet Traveled
Unfortunately, the road to this nirvana is long and, as of yet, largely un-traveled. While it may be possible for services like PubSub and Technorati to put together their own proprietary end-to-end implementations of meta-feeds, in order for such feeds to become truly accepted, standards will have to be developed that incorporate meta-feeds into readers and allow for interoperability between meta-feeds.
If RSS fails to address “Feed Overload Syndrome”, it will admittedly not be the end of the world. RSS will still provide a useful, albeit highly limited, “alert” service for new content at a limited number of sites. However, for RSS to reach its potential of dramatically expanding the scope, scale, and richness of individuals’ (and computers’) interaction with the web, innovations such as meta-feeds are desperately needed in order to create a truly scalable foundation.