« "Low-end" EAI Is Where The Action's At | Main | Social Networking: Partying Like It's 1999 »
02/17/2004
RSS: A Big Success In Danger of Failure
There’s a lot of hoopla out these days about RSS. RSS is an XML-based standard for summarizing and ultimately syndicating web-site content. Adoption and usage of RSS has taken off in the past few years leading some to suggest that it will play a central role in transforming the web from a random collection of websites into a paradise of personalized data streams. However the seeds of RSS’s impending failure are being sown by its very success and only some serious improvements in the standard will save it from a pre-mature death.
Push 2.0?
The roots of RSS go all the way back to the infamous “push” revolution of late 1996/ early 1997. At that point in time, Pointcast captured the technology world’s imagination with a vision of the web in which relevant, personalized content would be “pushed” to end users freeing them from the drudgery of actually having to visit individual websites. The revolution reached its apex in February of 1997 when Wired Magazine published a “Push” cover story in which they dramatically declared the web dead and “push” the heir apparent. Soon technology heavyweights such as Microsoft were pushing their own “push” platforms and for a brief moment in time the “push revolution” actually looked like it might happen. Then, almost as quickly as took off, the push revolution imploded. There doesn’t appear to be one single cause of the implosion (outside of Wired’s endorsement), some say it was the inability to agree on standards while others finger clumsy and proprietary “push” software, but whatever the reasons “push” turned out to be a big yawn for most consumers. Like any other fad, they toyed with it for a few months and then moved on the big next thing. Push was dead.
Or was it? For while Push, as conceived of by PointCast, Marimba and Microsoft had died an ugly and (most would say richly deserved) public death, the early seeds of a much different kind of push, one embodied by RSS, had been planted in the minds of its eventual creators. From the outset, RSS was far different from the original “push” platforms. Instead of a complicated proprietary software platform designed to capture revenue from content providers, RSS was just a simple text-based standard. In fact, from a technical perspective RSS was actually much more “pull” than “push” (RSS clients must poll sites to get the latest content updates) but from the end-user’s perspective, the effect was basically the same. As an unfunded, collective effort RSS lacked huge marketing and development budgets, and so, outside of a few passionate advocates, it remained relatively unknown many years after its initial creation.
Recently though, RSS has emerged from its relative obscurity, thanks in large part to the growing popularity of RSS “readers” such as Feedemon, Newsgator, and Sharpreader. These readers allows users to subscribe to several RSS “feeds” at once, thereby consolidating information from around the web into one highly efficient, highly personalized, and easy-to-use interface. With it’s newfound popularity, proponents of RSS have begun hailing it as the foundation for creating a much more personalized and relevant web experience which will ultimately transform the web from an impenetrable clutter of passive websites, into a constant, personalized stream of highly relevant data that can reach a user no matter where they are or what device they are using.
Back to the Future?
Such rhetoric is reminiscent of the “push” craze, but this time it may have a bit more substance. The creators of RSS clearly learned a lot from push’s failures and they have incorporated a number of features which suggest that RSS will not suffer the same fate. Unlike “push”, RSS is web friendly. It uses the many of same protocols and standards the power the web today and uses them in the classic REST-based “request/response” architecture that underpins web. RSS is also an open standard that anyone is free to use in whatever way they see fit. This openness is directly responsible for the large crop of diverse RSS readers and the growing base of RSS friendly web sites and applications. Thus, by embracing the web instead of attempting to replace it, RSS has been able to leverage the web to help spur its own adoption.
One measure of RSS’s success is the number of RSS compliant, feeds or channels available on the web. At Syndicat8.com, a large aggregator of RSS feeds, the total number of feeds listed has grown over 2000% in just 2.5 years from about 2,500 in the middle of 2001 to almost 53,000 in February of 2004. The growth rate also appears to be accelerating as a record 7,326 feeds were added in January of 2004, which is 2X the previous monthly record.
A Victim of Its Own Success
The irony of RSS’s success though is that this same success may ultimately contribute to its failure. To understand why this might be the case, it helps to imagine the RSS community as a giant Cable TV operator. From this perspective, RSS has now has tens of thousands of channels and will probably hundreds of thousands of channels by the end of the year. While some of the channels are branded, most are little known blogs and websites. Now imagine that you want to tune into channels about, let’s say, Cricket. Sure there will probably be a few channels with 100% of their content dedicated to Cricket, but most of the Cricket information will inevitably be spread out in bits and pieces across the 100,000’s of channels. Thus, in order to get all of the Cricket information you will have to tune into hundreds, if not thousands, of channels and then try to filter out all the “noise” or irrelevant programs that have nothing to do with Cricket. That’s a lot of channel surfing!
The problem is only going to get worse. Each day as the number of RSS channels grows, the “noise” created by these different channels (especially by individual blogs which often have lots of small posts on widely disparate topics) also grows, making it more and more difficult for users to actually realize the “personalized” promise of RSS. After all, what’s the point of sifting through thousands of articles with your reader just to find the ten that interest you? You might as well just go back to visiting individual web sites.
Searching In Vain
What RSS desperately needs are enhancements that will allow users to take advantage of the breadth of RSS feeds without being buried in irrelevant information. One potential solution is to apply search technologies, such as key word filters, to incoming articles (such as pubsub.com is doing). This approach has two main problems: 1) The majority of RSS feeds include just short summaries, not the entire article, which means that 95% of the content can’t even be indexed. 2) While key-word filters can reduce the number of irrelevant articles, they will still become overwhelmed given a sufficiently large number of feeds. This “information overload” problem is not unique to RSS but one of the primary problems of the search industry where the dirty secret is that the quality of search results generally declines the more documents you have to search.
Classification and Taxonomies to the Rescue
While search technology may not solve the “information overload” problem, its closely related cousins, classification and taxonomies, may have just what it takes. Classification technology uses advanced statistical models to automatically assign categories to content. These categories can be stored as meta-data with the article. Taxonomy technology creates detailed tree structures that establish the hierarchical relationships between different categories. A venerable example of these two technologies working together is Yahoo!’s Website Directory. Here Yahoo has created a taxonomy, or hierarchical list of categories, of Internet sites. Yahoo has then used classification technology to assign each web site one or more categories within the taxonomy. With the help of these two technologies, a user can sort through millions of internet sites to find just those websites that deal with say, Cricket, in just a couple of clicks.
It’s easy to see how RSS could benefit from the same technology. Assigning articles to categories and associating them with taxonomies will allow users to subscribe to “Meta-feeds” that are based on categories of interest, not specific sites. With such a system in place, users will be able to have their cake and eat it to as they will effectively be subscribing to all RSS channels at once, but due to the use of categories they will only see those pieces of information that are personally relevant. Bye-bye noise!
In fact, the authors of the RSS anticipated the importance of categories and taxonomies early on and the standard actually supports including both category and taxonomy information within an RSS message, so the good news is that RSS is already “category and taxonomy ready”.
What Do You Really Mean?
But there’s a catch. Even though RSS supports the inclusion of categories and taxonomies, there’s no standard for how to determine what category an article should be in or which taxonomy to use. Thus there’s no guarantee that that two sites with very similar articles will categorize them the same way or use the same taxonomy. This raises the very real prospect that, for example, the “Football” category will contain a jumbled group of articles including articles on both the New England Patriots and Manchester United. Such as situation leads us back to an environment filled with “noise” and thus no better off when we started.
The theoretical solution to this problem is get everyone in a room and agree on a common way to establish categories and on a universal taxonomy. Unfortunately, despite the best efforts of academics around the world, this has so far proven impossible. Another idea might be to try and figure out a way to map relationships between different concepts and taxonomies and then provide some kind secret decoder ring that enables computers to infer how everything is interrelated. This is basically what the Semantic Web movement is trying to do. This sounds great, but it will likely be a long time before the Semantic Web is perfected and everyone will easily lose patience with RSS before then. (There is actually a big debate within the RSS community over how Semantic-web centric RSS should be.)
Meta-Directories And Meta-Feeds
The practical solution will likely be to create a series of meta-directories that collect RSS feeds and then apply their own classification tools and taxonomies to those feeds. These intermediaries would then either publish new “meta-feeds” based on particular categories or they would return the category and taxonomy meta-data to the original publisher which would then incorporate the metadata into their own feeds.
There actually is strong precedent for such intermediaries. In the publishing world, major information services like Reuters and Thompson have divisions that aggregate information from disparate sources, classify the information and then resell those classified news feeds. There are also traditional syndicators, such as United Media, who collect content and then redistribute it to other publications. In addition to these establish intermediaries, some RSS-focused start-ups such as Syndic8 and pubsub.com also looked poised to fulfill these roles should they choose to do so.
Even if these meta-directories are created, it’s not clear that the RSS community will embrace them as they introduce a centralized intermediary into an otherwise highly decentralized and simplistic system. However, it is clear that without the use of meta-directories and their standardized classifications and taxonomies the RSS community is in danger of collapsing under the weight of its own success and becoming the “push” of 2004. Let’s hope they learned from the mistakes of their forefathers.
February 17, 2004 in Blogs, Content Managment, Database, RSS | Permalink
Legal Disclaimer
The thoughts and opinions on this blog are mine and mine alone and not affiliated in any way with Inductive Capital LP, San Andreas Capital LLC, or any other company I am involved with. Nothing written in this blog should be considered investment, tax, legal,financial or any other kind of advice. These writings, misinformed as they may be, are just my personal opinions.
Comments