Persistent Search: Search’s Next Big Battleground
What do you get when you marry ping servers and RSS with stored queries? A whole new type of search that is destined to become the search industry’s next big battleground: Persistent Search. While Persistent Search presents search companies with difficult new technical challenges and potentially higher infrastructure costs, it also gives them powerful mechanisms for building much stronger user relationships which may increase the value of their advertising services.
The Search That Never Stops
Simply put, Persistent Search allows users to enter a search query just once and then receive constant, near real-time, automatic updates whenever new content that meets their search criteria is published on the web. For example, let’s say you are a stock trader and you want to know whenever one of the stocks in your portfolio is mentioned on the web. By using a persistent search query, you can be assured that you will receive a real-time notification whenever one of your stocks is mentioned. Or perhaps you are a teenager who is a rabid fan of a rock group. Wouldn’t it be nice to have a constant stream of updates on band gossip, upcoming concerts, and new albums flowing to your mobile phone? Or maybe you are just looking to rent the perfect apartment or buy a specific antique. Wouldn’t it be nice to get notified as soon as new items which roughly matched your criteria were listed on the web so that you were able to respond before someone else beat you to the punch? Persistent search makes all of this possible for end users with very little incremental effort.
Something Old, Something New
While the technical infrastructure required for Persistent Search services leverages existing search technology, there are several new elements that must be added to existing technology to make Persistent Search a reality. These elements include:
- Ping Servers: Most blogs and an increasing number of other sites now send special “pings” to so-called “Ping Servers” every time they publish new content. The ping servers do things such as cue crawlers at a search engine to re-index a site or provide a summarized list of recently published information to other web sites. Because ping servers are the first to know about newly published content, they are critical to enabling the real-time nature of Persistent Search.
- RSS: RSS feeds can be used both to feed raw information into Persistent Search platforms (in a similar fashion to what Google Base does) and to take processed query results out. RSS is a polling-based mechanism, so it does not provide real-time notification, but it is good enough in most cases.
- Stored Queries: Stored queries are simply search queries that are “saved” for future use. Ideally, the stored query is constantly running in the background and it flags any new piece of content that meets the search criteria. While this is a simple concept, it presents some very difficult technical challenges. The easiest way for a search engine to do a stored query would be to execute the stored query against its existing index at some regular interval, say once an hour. However, executing each unique stored query 24 times a day could become very expensive if Persistent Search takes off. One could easily imagine search companies in the near future executing billions of incremental stored queries an hour. Processing these added queries will take lots of extra resources, but will not generate the same amount of revenue that traditional ad-hoc search queries generate because stored queries will often return no new results. One alternative would be for search companies to use the “stream database” query techniques pioneered by start-ups such as StreamBase. These techniques would allow them to query the new content as it flows into their index, not only reducing overall query load but also improving latency. However, changing their query approach is a huge step for existing search companies and therefore one that is unlikely to be undertaken. A more likely approach might be to use special algorithms to combine stored queries into “master queries”. This reduces the number of queries that need to be executed and uses simple post-query filters to “personalize” the results. Given their critical importance to the overall quality of Persistent Search, the design of “stored query architectures” is likely to become one of the key technical battlegrounds for search companies, much the way query result relevancy has been for the past few years.
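To make the “master query” idea a little more concrete, here is a minimal Python sketch. It is only an illustration under loud assumptions, not how any search company actually does it: the names `StoredQuery`, `build_master_queries`, and `run_master_queries` are hypothetical, the “index” is just a list of strings, every stored query is treated as an AND of keywords, and the grouping rule (share the alphabetically first term) is a stand-in for whatever “special algorithms” a real engine would use.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredQuery:
    user: str
    terms: frozenset  # every term must appear in a document (AND semantics)


def tokenize(text):
    """Naive tokenizer: lowercase and split on whitespace."""
    return set(text.lower().split())


def build_master_queries(stored_queries):
    """Group stored queries under one shared term (here: the alphabetically first)."""
    masters = defaultdict(list)
    for query in stored_queries:
        masters[min(query.terms)].append(query)
    return masters


def run_master_queries(masters, documents):
    """Run each master term once, then post-filter the hits per stored query."""
    results = defaultdict(list)
    for master_term, queries in masters.items():
        # One pass over the documents per master query, not per stored query.
        candidates = [doc for doc in documents if master_term in tokenize(doc)]
        for query in queries:
            for doc in candidates:
                # The cheap post-query filter that "personalizes" the results.
                if query.terms <= tokenize(doc):
                    results[query.user].append(doc)
    return dict(results)
```

With three stored queries such as “acme earnings”, “acme merger”, and “loft rent”, this grouping produces only two master queries (“acme” and “loft”), so the expensive step runs twice instead of three times. The saving grows as more users store overlapping queries.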
Once these three pieces are put together, search companies will be in position to provide rich Persistent Search services. The results of those services will be distributed to end users via e-mail, RSS, IM, SMS, or some pub-sub standard, depending on their preferences and priorities.
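As a rough picture of how the three pieces might fit together, the following Python sketch matches newly pinged content against stored queries as it arrives (in the spirit of the stream approach) and routes each hit to the user's preferred channel. Everything here is hypothetical: the `PersistentSearch` class, its method names, the in-memory `outbox`, and the whitespace tokenizer are assumptions made for illustration, and a real service would fetch the pinged URL and hand messages to actual e-mail, IM, or SMS gateways.

```python
from collections import defaultdict


class PersistentSearch:
    def __init__(self):
        self.queries = {}                # query_id -> (user, terms, channel)
        self.by_term = defaultdict(set)  # term -> ids of queries containing it
        self.outbox = defaultdict(list)  # channel -> [(user, url), ...]

    def subscribe(self, query_id, user, query, channel):
        """Store a keyword query (AND semantics) and index it by term."""
        terms = frozenset(query.lower().split())
        self.queries[query_id] = (user, terms, channel)
        for term in terms:
            self.by_term[term].add(query_id)

    def on_ping(self, url, text):
        """Handle a ping for new content; `text` is the already-fetched body."""
        words = set(text.lower().split())
        # Only stored queries sharing at least one word with the document
        # are examined, so most stored queries are never touched.
        candidates = set()
        for word in words:
            candidates |= self.by_term.get(word, set())
        matched = []
        for query_id in sorted(candidates):
            user, terms, channel = self.queries[query_id]
            if terms <= words:
                self.outbox[channel].append((user, url))
                matched.append(query_id)
        return matched
```

The design choice worth noting is that the stored queries are indexed and the documents are the “queries”, which inverts the usual search arrangement. Each new item is checked once on arrival, so a stored query costs nothing while no relevant content is being published.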
The Business of Persistent Search
From a business and competitive perspective, Persistent Search has a number of very attractive aspects relative to traditional ad-hoc queries. Traditional ad-hoc search queries tend to result in very tenuous user relationships, with each new query theoretically a competitive “jump ball”. Indeed, the history of search companies, with no fewer than five separate search “leaders” in 10 years, suggests that search users are not very loyal.
Persistent Search presents search companies with the opportunity to build rich, persistent relationships with their users. The search engine that captures a user’s persistent searches will not only have regular, automatic exposure to that user, but will also be able to build a much better understanding of the unique needs and interests of that user, which should theoretically enable it to sell more relevant ads and services at higher prices. It will also stand a much better chance of capturing all or most of that user’s ad-hoc queries because it will already be in regular contact with the user.
It is this opportunity to build a long-term, rich relationship directly with a uniquely identifiable consumer that will make Persistent Search such an important battleground between the major search companies. Persistent Search may be especially important to Google and Yahoo as they attempt to fight Microsoft’s efforts to embed MSN Search into Windows Vista.
It should also be noted that enterprise-based Persistent Search offers corporations the opportunity to improve both internal and external communications. For example, it’s not hard to imagine the major media companies offering persistent search services or “channels” to consumers for their favorite actor, author, or singer.
The State of Play
As it stands, Persistent Search is in its infancy. The world leader in Persistent Search is most likely the US government, specifically the NSA; however, the extent of its capabilities appears to be a closely guarded secret. Some of the commercial players in the space include:
- Pub-Sub is in many ways the commercial pioneer of the space. It operates one of the largest ping servers and has been offering customized keyword-based “real time” persistent searches for some time. Some start-ups are even building vertical persistent search services on top of Pub-Sub’s infrastructure. However, Pub-Sub lacks the infrastructure and resources of a complete search engine, as well as the consumer awareness and multi-channel distribution capabilities needed to reach critical consumer mass.
- Google itself offers Google Alerts, a service that enables users to be e-mailed the results of stored queries. However, this service relies on Google’s general index, so it could take days for new content to appear. It also does not appear to directly incorporate ping servers and does not offer any distribution mechanisms beyond e-mail. In addition, as a beta service, it’s not clear that it is capable of scaling efficiently should demand take off.
- Real Time Matrix is a start-up focused on aggregating RSS feeds and re-broadcasting them based on user preferences. The net effect of their technology is to deliver real-time Persistent Search-based RSS feeds to consumers.
- Technorati, a popular blog search engine, allows users to create “watchlists” of persistent searches. However, Technorati limits its index to blogs and so does not offer a comprehensive search service.
- Yahoo and MSN offer their own alert services. While these services provide very good distribution options (e-mail, IM, SMS), they are not integrated with the companies’ search engines and just offer canned updates for things such as news stories and the weather.
- WebSite-Watcher and Trackengine offer services that track specific sites and/or RSS feeds for changes, but they do not allow fine-grained free-text queries and are focused only on the site level.
Despite this activity, no one has yet put together an end-to-end Persistent Search offering that enables consumer-friendly, comprehensive, real-time, automatic updates across multiple distribution channels at a viable cost. That said, the opportunity is clear and the competitive pressures are real, so I expect to see rapid progress towards this goal in the near future. It will be interesting to see how it plays out.
The thoughts and opinions on this blog are mine and mine alone and not affiliated in any way with Inductive Capital LP, San Andreas Capital LLC, or any other company I am involved with. Nothing written in this blog should be considered investment, tax, legal, financial or any other kind of advice. These writings, misinformed as they may be, are just my personal opinions.