Persistent Search: Search’s Next Big Battleground
What do you get when you marry ping servers and RSS with stored queries? A whole new type of search that is destined to become the search industry’s next big battleground: Persistent Search. While Persistent Search presents search companies with difficult new technical challenges and potentially higher infrastructure costs, it also gives them powerful mechanisms for building much stronger user relationships which may increase the value of their advertising services.
The Search That Never Stops
Simply put, Persistent Search allows users to enter a search query just once and then receive constant, near real-time, automatic updates whenever new content that meets their search criteria is published on the web. For example, let’s say you are a stock trader and you want to know whenever one of the stocks in your portfolio is mentioned on the web. With a persistent search query, you can be assured of a real-time notification whenever one of your stocks is mentioned. Or perhaps you are a teenager who is a rabid fan of a rock group. Wouldn’t it be nice to have a constant stream of updates on band gossip, upcoming concerts, and new albums flowing to your mobile phone? Or maybe you are just looking to rent the perfect apartment or buy a specific antique. Wouldn’t it be nice to be notified as soon as new items that roughly match your criteria are listed on the web, so that you can respond before someone else beats you to the punch? Persistent Search makes all of this possible for end users with very little incremental effort.
Something Old, Something New
While the technical infrastructure required for Persistent Search services leverages existing search technology, there are several new elements that must be added to existing technology to make Persistent Search a reality. These elements include:
- Ping Servers: Most blogs and an increasing number of other sites now send special “pings” to so-called “Ping Servers” every time they publish new content. The ping servers do things such as prompt a search engine’s crawlers to re-index a site or provide a summarized list of recently published information to other web sites. Because ping servers are the first to know about newly published content, they are critical to enabling the real-time nature of Persistent Search.
- RSS: RSS feeds can be used both to feed raw information into Persistent Search platforms (in a similar fashion to what Google Base does) and to carry processed query results out. RSS is a polling-based mechanism, so it does not provide true real-time notification, but it is good enough in most cases.
- Stored Queries: Stored queries are simply search queries that are “saved” for future use. Ideally, the stored query is constantly running in the background, flagging any new piece of content that meets the search criteria. While this is a simple concept, it presents some very difficult technical challenges. The easiest way for a search engine to support a stored query would be to execute it against its existing index at some regular interval, say once an hour. However, executing each unique stored query 24 times a day could become very expensive if Persistent Search starts to take off. One could easily imagine search companies in the near future executing billions of incremental stored queries an hour. Processing these added queries will take lots of extra resources, but will not generate the same amount of revenue as traditional ad-hoc queries, because stored queries will often return no new results. One alternative would be for search companies to use the “stream database” query techniques pioneered by start-ups such as StreamBase. These techniques would allow them to query new content as it flows into the index, not just reducing overall query load but also improving latency. However, changing their query approach is a huge step for existing search companies and therefore one that is unlikely to be undertaken. A more likely approach might be to use special algorithms to combine stored queries into “master queries”, which reduces the number of queries that need to be executed, and then apply simple post-query filters to “personalize” the results. Given their critical importance to the overall quality of Persistent Search, the design of “stored query architectures” is likely to become one of the key technical battlegrounds for search companies, much the way query result relevancy has been for the past few years.
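The RSS polling mechanism described above can be sketched in a few lines. This is a minimal illustration only, not any vendor's actual implementation; the feed contents and function names are hypothetical. The key point is that polling forces the client to de-duplicate against what it has already seen:

```python
import xml.etree.ElementTree as ET

# A hypothetical RSS 2.0 feed, inlined for illustration. A real poller
# would fetch this over HTTP on an interval.
SAMPLE_FEED = """<rss version="2.0"><channel>
  <item><guid>post-1</guid><title>AAPL earnings preview</title></item>
  <item><guid>post-2</guid><title>Concert tickets on sale</title></item>
</channel></rss>"""

def new_items(feed_xml, seen_guids):
    """Parse an RSS 2.0 feed and return (guid, title) pairs not yet seen,
    marking them as seen. This is the de-duplication step every
    polling-based notifier needs."""
    root = ET.fromstring(feed_xml)
    fresh = []
    for item in root.iter("item"):
        guid = item.findtext("guid")
        if guid not in seen_guids:
            seen_guids.add(guid)
            fresh.append((guid, item.findtext("title")))
    return fresh

seen = set()
print(new_items(SAMPLE_FEED, seen))  # both items are new on the first poll
print(new_items(SAMPLE_FEED, seen))  # -> [] on the second poll: nothing new
```

The gap between polls is exactly the latency penalty RSS pays relative to a ping-driven push model.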
Once these three pieces are put together, search companies will be in position to provide rich Persistent Search services. The results of those services will be distributed to end users via e-mail, RSS, IM, SMS, or some pub-sub standard, depending on their preferences and priorities.
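To make the stored-query challenge concrete, here is a toy sketch (all names hypothetical) of the stream-style approach: instead of re-running every stored query against the index on a schedule, each new document is matched against all stored queries as it arrives, with a shared term index playing the role of a crude "master query" that narrows the candidates:

```python
from collections import defaultdict

class PersistentSearchMatcher:
    """Toy stream-style matcher for stored queries. Rather than executing
    each stored query against the full index every hour, we invert the
    problem and match each incoming document against the stored queries."""

    def __init__(self):
        self.queries = {}                    # query_id -> set of required terms
        self.term_index = defaultdict(set)   # term -> ids of queries using it

    def store_query(self, query_id, terms):
        required = set(t.lower() for t in terms)
        self.queries[query_id] = required
        for term in required:
            self.term_index[term].add(query_id)

    def match(self, document):
        """Return ids of stored queries whose terms all appear in the doc."""
        doc_terms = set(document.lower().split())
        # Only queries sharing at least one term with the document are
        # candidates -- the same idea as combining stored queries into
        # "master queries" and personalizing with post-query filters.
        candidates = set()
        for term in doc_terms:
            candidates |= self.term_index.get(term, set())
        return [q for q in candidates if self.queries[q] <= doc_terms]

matcher = PersistentSearchMatcher()
matcher.store_query("stocks", ["AAPL", "earnings"])
matcher.store_query("band", ["concert", "tickets"])
print(matcher.match("AAPL earnings beat estimates"))  # -> ['stocks']
```

A production system would of course need ranked retrieval, phrase handling, and distribution across machines, but the inversion (documents probe queries, not the reverse) is what makes per-item latency independent of how often each stored query "runs".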
The Business of Persistent Search
From a business and competitive perspective, Persistent Search has a number of very attractive aspects relative to traditional ad-hoc queries. Traditional ad-hoc search queries tend to result in very tenuous user relationships, with each new query theoretically a competitive “jump ball”. Indeed, the history of search companies, with no fewer than five separate search “leaders” in 10 years, suggests that search users are not very loyal.
Persistent Search presents search companies with the opportunity to build rich, persistent relationships with their users. The search engine that captures a user’s persistent searches will not only have regular, automatic exposure to that user, but will also be able to build a much better understanding of that user’s unique needs and interests, which should theoretically enable it to sell more relevant ads and services at higher prices. It will also stand a much better chance of capturing all or most of that user’s ad-hoc queries because it will already be in regular contact with the user.
It is this opportunity to build a long-term, rich relationship directly with a uniquely identifiable consumer that will make Persistent Search such an important battleground for the major search companies. Persistent Search may be especially important to Google and Yahoo as they attempt to fight Microsoft’s efforts to embed MSN Search into Windows Vista.
It should also be noted that enterprise-based Persistent Search offers corporations the opportunity to improve both internal and external communications. For example, it’s not hard to imagine the major media companies offering persistent search services or “channels” to consumers for their favorite actor, author, or singer.
The State of Play
As it stands, Persistent Search is in its infancy. The world leader in persistent search is most likely the US government, specifically the NSA; however, the extent of its capabilities appears to be a closely guarded secret. Some of the commercial players in the space include:
- PubSub is in many ways the commercial pioneer of the space. It operates one of the largest ping servers and has been offering customized keyword-based “real time” persistent searches for some time. Some start-ups are even building vertical persistent search services on top of PubSub’s infrastructure. However, PubSub lacks the infrastructure and resources of a complete search engine, as well as the consumer awareness and multi-channel distribution capabilities needed to reach critical consumer mass.
- Google itself offers Google Alerts, a service that e-mails users the results of stored queries. However, the service relies on Google’s general index, so it can take days for new content to appear. It also does not appear to incorporate ping servers directly and does not offer any distribution mechanisms beyond e-mail. In addition, as a beta service, it’s not clear that it is capable of scaling efficiently should demand take off.
- Real Time Matrix is a start-up focused on aggregating RSS feeds and re-broadcasting them based on user preferences. The net effect of their technology is to deliver real-time Persistent Search-based RSS feeds to consumers.
- Technorati, a popular blog search engine, allows users to create “watchlists” of persistent searches. However Technorati limits its index to blogs and so does not offer a comprehensive search service.
- Yahoo and MSN offer their own alert services, and while these services provide very good distribution options (e-mail, IM, SMS), they do not integrate with their search engines; they just offer canned updates for things such as news stories and the weather.
- WebSite-Watcher and Trackengine offer services that track specific sites and/or RSS feeds for changes, but they do not allow fine-grained free text queries and are focused only at the site level.
Despite this activity, no one has yet put together an end-to-end Persistent Search offering that enables consumer-friendly, comprehensive, real-time, automatic updates across multiple distribution channels at a viable cost. That said, the opportunity is clear and the competitive pressures are real, so I expect to see rapid progress toward this goal in the near future. It will be interesting to see how it plays out.
Real Estate, Cars, Jobs: Watch Out World, Google Base Has Only Begun To Stir
There has been a lot of commentary about Google Real Estate’s beta launch earlier this week. It turns out that Google is quietly testing similar services for cars and jobs as well. Both the real estate launch and the car launch take data from Google Base and integrate it with Google Maps, providing a consumer-friendly front end to the database. (My guess is that the appearance of both services this week probably has something to do with the release of the Google Maps 2.0 API.)
With the launch of these Google Base front-ends, Google is clearly putting into place the major pieces required to support its vertical search platform. Broadly speaking, such a platform requires 4 major pieces:
- A big, highly scalable database that can handle lots of queries. This, of course, is what Google Base was all about.
- Consumer-friendly front ends to access these databases. The auto and real estate front ends are obviously the first of these.
- A large, robust, crawling farm. This is obviously Google’s crown jewel.
- A set of intelligent algorithms to find, classify, and flag listings. We have yet to see this from Google.
Most people remain unimpressed by Google Base because it doesn’t seem to contain a lot of data. That’s because what you are seeing is a work in progress that is being purposely hobbled to reduce load during the testing phase. Google has now built beta versions of pieces #1 and #2. We will undoubtedly soon see pieces #3 and #4. Only when those pieces are in place will Google Base fulfill its potential.
In terms of piece #3, Google will likely have to make changes and updates to its core crawler code in order to accomplish this. This is a non-trivial task and not something undertaken lightly. Piece #4 requires a decent number of Google’s PhDs to build and test algorithms for recognizing listings within unstructured data and then structuring them, also a non-trivial task. However, as someone who watched Google hire some of the brightest minds in unstructured data management, I can tell you that they have more than enough firepower to accomplish the mission.
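To give a flavor of what piece #4 involves, here is a deliberately crude sketch of recognizing a real estate listing inside unstructured text and structuring it. The regexes and field names are hypothetical; a real system would rely on trained classifiers and entity extractors, not hand-written patterns:

```python
import re

# Toy patterns for two fields a real estate listing tends to contain.
PRICE = re.compile(r"\$([\d,]+)")
BEDS = re.compile(r"(\d+)\s*(?:bedrooms?|bed|br)\b", re.IGNORECASE)

def structure_listing(text):
    """Return a structured record if the text looks like a real estate
    listing, else None (i.e. find, classify, and flag in one pass)."""
    price = PRICE.search(text)
    beds = BEDS.search(text)
    if not price:
        return None  # no price -> probably not a listing at all
    record = {"price": int(price.group(1).replace(",", ""))}
    if beds:
        record["bedrooms"] = int(beds.group(1))
    return record

print(structure_listing("Sunny 2BR apartment near park, $1,850/month"))
# -> {'price': 1850, 'bedrooms': 2}
```

The hard part, and the reason this takes PhDs rather than regexes, is doing this reliably across millions of pages with wildly inconsistent formats, languages, and spam.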
Once Google hooks up pieces #3 and #4 (likely at the same time), a flood of information will cascade into Google Base and from there into the front ends that it recently launched. If you want to see a good approximation of what Google Base will look like once it is finished, go look at Vast.com.
Losers and Bigger Losers
There will be two sets of losers in all of this. The first and most immediate set will be the start-up vertical search players (indeed, one can only imagine the long faces at Trulia, and at their VC backers, when they got their first look at Google Real Estate). Of course, “losers” may not be an accurate term, as the correct response to Google Base from these companies should be to pick up the phone, call Yahoo, Microsoft, IAC, and AOL, and say “you guys need to buy us because Google is going to clean your clock.” Who knows, some of the big boys might just hit the panic button and write a few big checks.
The problem for these vertical search players is that Google has set a very high bar by integrating its vertical search seamlessly into its free text query engine, crawler farm, and database. If a Yahoo or Microsoft were to buy several different vertical search start-ups to respond to Google Base (and they must respond one way or another), they would inherit a huge integration headache and face a massive back-end restructuring. Faced with this headache, some of them may well decide to follow in Google’s footsteps and build it from scratch. Alternatively, they may prefer to acquire a more “horizontal” vertical search play, such as Vast, which has already built a multi-vertical crawl-and-classify engine. Either way, if I were running a vertical search engine I would be putting a sign on my front door right about now reading “No reasonable offers refused”.
The second set of losers are the well-established listings-focused Walled Gardens of the Internet. As I have outlined before in detail, these Walled Gardens face a fundamental threat from search. A fully functioning Google Base will make that threat more real than ever.
I don’t know when pieces #3 and #4 will be launched, but for me that will be the single most interesting day in the short life of Google Base and far more deserving of hoopla than the launch of a few simple front ends.
Virtual Stock Portfolio: March 2006
The Burnham's Beat Virtual Stock Portfolio was up modestly in March. The average pick in the portfolio was up 2.5% while the overall portfolio was up 1.7%. This underperformed the NASDAQ, though, which was up 2.6%. This month's underperformance was almost entirely attributable to one stock, BankRate, which I will get to later.
For the 1st quarter overall, the portfolio was up 13.8%, which is well ahead of the average quarterly return since inception of 8.9%. This also compares favorably to the NASDAQ's 6.1% quarterly gain. Short positions were flat (+0.0%) in the quarter, and all the gains were generated by long positions. Overall, the quarter went very well, with 8 of the 11 positions profitable. It would have been even better had I not pulled the trigger too early on a couple of short picks (CNVR, RATE).
As it happens, this will be the last month I post the Burnham's Beat Virtual Stock Portfolio. It's been a lot of fun publicly picking stocks again and letting the chips fall where they may at the end of each month. I'd love to keep doing it but I have to stop publishing this kind of stuff for some professional reasons.
For the record, the virtual portfolio makes its final close up 109.3% from the day I started it (1/26/04). That compares to the NASDAQ's 8.6% gain during the same period. This equates to an annualized return of 36%, despite the fact that the portfolio's average market exposure was just 3.8%. Of course, this was a "virtual" portfolio on a blog, so it doesn't count, but it was fun to test myself against the market again. For those who have followed my ups and downs, thanks for all your comments and criticisms.
Company: MicroStrategy Ticker: MSTR
Sub-sector: Business Intelligence
Investment Thesis: I like the BI space in general and have been keeping my eye on MicroStrategy. This has recently been one of the cheaper stocks in the space, yet it also has one of the better product portfolios and market positions. Businesses are still spending big bucks on BI and MSTR should be a big beneficiary.
Performance: Since 3/31/05: +94%, Mar. vs. Feb.: +14.8%
Comments: Great month, which means it is finally trading back in range with the rest of the BI players. It may have a bit more room to grow here because the street estimates remain too low, but most of the easy money is out of this stock now and it's probably a good time to move on. This stock is a good case study in how post-SARBOX accounting problems can cloud the street's judgment of companies with good products in strong markets.
Company: Actuate Ticker: ACTU
Sub-sector: Business Intelligence
Investment Thesis: Actuate is a business intelligence company with a particular focus on enterprise reporting. I had a long position in ACTU in 2004 and lost money on it, but I think the stock is back on the upswing now thanks to an improved product line and focus. ACTU trades at a healthy discount to the rest of the BI group (kind of like SPSS did at one point) and every penny of upside in its EPS could really move the stock.
Performance: Since 9/30/05: +68.0% Mar. vs. Feb.: +9.5%
Comments: Another strong month and with the stock trading at 19X 06, still some more room to grow. I think the street is wrong on both top and bottom lines due to an acquisition at the beginning of the year. We will see!
Company: OpenText Ticker: OTEX
Sub-sector: Content Management
Investment Thesis: OpenText is a content management company that went on an acquisition binge in 2003 and 2004. The stock suffered from all the M&A related charges and fallout but management now claims that they are going to resolutely focus on EPS growth. OTEX trades at a healthy discount to the rest of the content management group and has a broad product portfolio. Integration snafus could trip them up, but the low multiple on the stock should limit any potential damage.
Performance: Since 9/30/05: +17.5% Mar. vs. Feb.: -6.1%
Comments: Only long position to lose money this month, which concerns me. I also don't like the deferred revenue trend in the business. This is probably the weakest long in the portfolio right now in terms of Q1 exposure.
Company: Cryptologic Ticker: CRYP
Sub-sector: Gaming Software
Investment Thesis: Cryptologic is a provider of gambling software to online casinos and poker rooms. They license their software to numerous companies in return for a cut of the take. About 70% of their revenues are from casino related software sales and about 30% from poker related sales. Since they are a technology provider and not an operator they actually are listed in the US and do not appear to be in danger of violating any online gambling laws.
Performance: Since 9/30/05: +47.0% Mar. vs. Feb.: +2.2%
Comments: Trading in line with the other online gambling comps now. A competitor, Playtech, went public this month and its growth rates may make Cryptologic look relatively unattractive. Playtech may be the better way to play this trend as it is cheaper on a PEG basis.
Company: Party Gaming Ticker: PRTY.L
Sub-sector: Online Gambling
Investment Thesis: Party Gaming is the largest online gambling company in the world, with a focus on poker but a very quickly growing casino operation as well. Some may recall that I had PRTY long in a successful pair trade in Q4 05. After seeing Party's Q4 report and doing some modeling, I feel compelled to add it to the portfolio as a pure long bet. Party not only showed good growth in poker in Q4, but had an absolute blow-out quarter in its casino business thanks to cross-selling into its poker base. By my calculations, the stock is currently trading at 11X 2006 EPS even though it should grow 30%-40% on the top/bottom line without adding any new businesses. Oh, and there's a 3% dividend payment coming in May.
Performance: Since 1/31/06: -2.1% Mar. vs. Feb.: +2.8%
Comments: Concerns about online gambling legislation continue to buffet the biggest online gambling stock. 2.5% dividend payment in May makes this easier to hold on to.
Company: Agile Software Ticker: AGIL
Sub-sector: Supply Chain
Investment Thesis: The supply chain sector has been a complete disaster the last few years and Agile's stock has been no exception. However, AGIL has actually grown revenue over the last four years, and while it's still GAAP negative it actually seems to have turned the corner in terms of generating positive operating cash flow. It's only trading at about 1.2X EV/Sales, which is low given its potential leverage once it gets its expense base in order.
Performance: Since 1/31/06: +18.7% Mar. vs. Feb.: +9.9%
Comments: M&A speculation in the supply chain/PLM market is starting to heat up after MatrixOne's purchase, and Agile is the primary beneficiary of this speculation.
Company: Wave Systems Ticker: WAVX
Investment Thesis: I first encountered Wave when I wrote my initial analyst report on Wall Street in the mid-1990s. Wave has remained in business largely by claiming that it is developing revolutionary security technologies, kind of like a bio-tech company that never gets out of trials. With a grand total of $1.4M in revenues over the last 3.5 years, a $4M/quarter cash burn rate and only $4M or so in the bank, a day of reckoning is fast approaching.
Performance: Since 10/1/04: +34.1% Mar. vs. Feb.: +7.6%
Comments: While it was a good month, I remain amazed the stock is not down more. Delisting should be confirmed in April, but appeals will likely push that off for at least a month or so. My guess is that they will reverse split the stock to stave off delisting, but you never know. Another financing needs to take place by the end of May, so it looks like a no-brainer to hold this for at least a couple more months.
Company: Convera Ticker: CNVR
Sub-sector: Content Management
Investment Thesis: Some may recall that I was short Convera the first half of last year on the theory that the management team would not deliver on their much hyped enterprise web search product. That turned out to be a bad short as the hype around search was just too big of a reality distortion field. Well, reality has begun to settle in and I am back for another beating.
Performance: Since 1/31/06: -22.0% Mar. vs. Feb.: +2.2%
Comments: After a disastrous first month in the portfolio, CNVR seems to have settled down a bit, but it remains a very jumpy stock. There may be more PR-driven pain in the coming months, but at some point people will inevitably realize it's crazy to pay a 66% premium to Google and a 233% premium to an already inflated Autonomy on a price-to-sales basis for a company with 0.2% of Google's revenues.
Company: BankRate Ticker: RATE
Sub-sector: Internet Content
Investment Thesis: I spent a lot of time at one point consulting to Fannie Mae, and a lot of time analyzing financial services-related Internet companies. BankRate is a web content site focused on financial services, but its growth is largely being driven by mortgage-related advertising and referral fees. With interest rates rising, I don't think they will have trouble hitting their Q4 #s, but I can't imagine they aren't going to have to talk the analysts down a bit off their pretty aggressive 06 growth #s.
Performance: Since 1/31/06: -14.6% Mar. vs. Feb.: -20.5%
Comments: Ugh. After a decent first month, this stock killed me this month, despite announcing a big secondary offering of shares. It would be funny to watch all the retail investors piling in while the insiders bail out, except for the fact that this one stock killed my overall performance. I was obviously way too early on this and need to cover and wait for long-term rates to really start to put the squeeze on. I can't see that happening this quarter and don't want to stick around for the report.
Internet Stocks Update: March 2006
Internet stocks slightly lagged the overall market in March, with the Internet Stock Index up 2.4% compared to the NASDAQ's 2.6% gain. The average stock was actually up 7.5%, though, indicating continued strength in small cap stocks. The big winner this month was hosting provider NaviSite, which soared 114% on the back of a good Q4 report and some balance sheet restructuring, while the big loser was online payments provider FireOne (-16.3%).
There was only one Internet-related IPO in March: online gambling technology provider Playtech went public on AIM and now has a cool $1BN valuation, making it the 5th public online gambling company with a $1BN+ market cap.
For a detailed breakdown of all the stock statistics including a record of all of the M&A in the space, click here to download an Excel spreadsheet with the data and click here to get Microsoft's automatic stock quote downloading plug-in for Excel if you don't already have it. The spreadsheet has been improved lately with detailed fundamental financial data and ratios for almost all of the stocks.
Software Stocks Update: March 2006
Software stocks modestly outperformed the rest of the market in March, with the Software Stock Index up 3.3% vs. the NASDAQ's 2.6% gain. The strength was largely due to strong performances from several large cap stocks, especially Oracle (+10.1%) and SAP (+6.9%). The average stock was up 4.5% thanks to even stronger small cap performance. The best performing sector was Wireless (+12.3%), due to strength at Openwave and InfoSpace, while the worst performing sector was, once again, Clinical Apps (-4.6%).
There were, yet again, no software IPOs this month. In fact, there were no software IPOs in the 1st quarter. The two M&A transactions this month were both private buyouts, with Silver Lake buying Serena Software and Golden Gate buying business intelligence player GEAC.
For a detailed breakdown of all the stock statistics including a record of all of the M&A in the space, click here to download an Excel spreadsheet with the data and click here to get Microsoft's automatic stock quote downloading plug-in for Excel if you don't already have it.