02/13/2008
SkyGrid and the Emergence of Flow-Based Search
GigaOm had a post today on a company called SkyGrid and its official company launch. As an investor, advisor, and beta-user of the platform, I thought I would chime in with my own self-serving post mostly because I wanted to talk about the advanced technology and architecture behind SkyGrid and why it makes the company such an interesting case study in the evolution of search technology.
Simply put, SkyGrid represents a massive and exciting departure from traditional search architectures and technologies. If I had to sum it up in a phrase, I would say that SkyGrid is one of the first "flow-based" search architectures, while traditional search engines are "crawl-based" architectures.
Old Search: Crawl/Index/Query
While the technical departure was necessitated by the leading-edge demands of investment professionals, it was these needs, and traditional search's inability to meet them, that exposed some of the most glaring weaknesses of traditional search technology. Specifically, traditional search technology and architectures suffer from several weaknesses:
- Crawl-based: Current search architectures collect information to index primarily by employing massive farms of "crawlers" that systematically follow links from page to page across the web. The benefit of crawling is that it is exhaustive; the drawback is that it is time-consuming and expensive.
- One-off: Search platforms are designed around rapidly processing one-off queries. This makes search engines highly useful and adept at finding "the needle in the haystack" but very cumbersome to use in situations where one just wants to get new results for the same old query.
- Batch-based: PageRank and the other "secret sauce" algorithms behind most search engines today require a very expensive and complicated indexing process to be performed on "snapshots" of data. It can be days or even weeks before newly published content is crawled and properly indexed, meaning that most search engines fail to provide "real time" results for all but the most popular content sources (which they crawl very frequently).
- Unabridged: Search engines are exhaustive in that they return every URL that mentions a string. This is good if you are looking for a needle in the haystack, but bad if you are trying to search on a common term such as "Google" or "Microsoft". While ranking algorithms do a great job of ordering results according to likely relevancy, they don't reduce the number of results. Since most users don't go past the first page of results, it is quite easy to miss relevant information that for some reason doesn't rank in the top 10 results.
- Unstructured: Search engines typically present query results as a simple list without context or analytics, beyond, say, separating them by simple criteria such as text versus images. While some progress has been made in clustering results or helping users filter them, by and large users still just get an unprocessed, unanalyzed data dump when they do a search.
- Retrospective: Search today is focused on determining what has happened in the past: who wrote what, who said what, etc. However, this does little to help people figure out what will happen in the future.
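To make the crawl/index/query pattern above concrete, here is a deliberately tiny sketch in Python. It is not how any real search engine is built; the class name `SnapshotIndex`, its methods, and the example URLs are all hypothetical, chosen only to show why content crawled after the last batch rebuild stays invisible to queries until the next rebuild.

```python
from collections import defaultdict


class SnapshotIndex:
    """Toy batch index (hypothetical): pages become searchable only after rebuild()."""

    def __init__(self):
        self.snapshot = {}              # url -> text, everything crawled so far
        self.index = defaultdict(set)   # term -> set of urls, as of the last rebuild

    def crawl(self, url, text):
        # Crawling only stores the page; it is not searchable yet.
        self.snapshot[url] = text

    def rebuild(self):
        # Expensive batch step: re-index the whole snapshot in one pass.
        self.index = defaultdict(set)
        for url, text in self.snapshot.items():
            for term in text.lower().split():
                self.index[term].add(url)

    def query(self, term):
        # One-off query: returns every matching URL, with no filtering.
        return sorted(self.index.get(term.lower(), set()))


engine = SnapshotIndex()
engine.crawl("http://example.com/a", "Google ships update")
engine.rebuild()
engine.crawl("http://example.com/b", "Google earnings surprise")

stale = engine.query("google")   # page b was crawled but is not indexed yet
engine.rebuild()
fresh = engine.query("google")   # page b appears only after the batch rebuild
```

Note that the user also has to re-run the query by hand to see the new page, which is the "one-off" weakness in miniature.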
Without giving away the farm, SkyGrid represents an exciting departure from the search technologies and architectures of the past. This change has been made possible by several factors, including the widespread deployment and adoption of ping servers and RSS/Atom feeds, dramatic improvement in several areas of artificial intelligence and unstructured data analytics, and new stream-based methods of database and query design.
SkyGrid Search: Flow/Filter/Analyze
When you put all of these technologies together, along with a laser-like focus on solving some of the unique high-end demands of investment professionals, you get a radical new search architecture and technology that not only solves some very pressing and pragmatic problems facing investors, but holds the potential to actually predict the pattern and influence of idea/meme propagation throughout the internet and from there into the financial markets and beyond.
Specifically, SkyGrid's search architecture differs from traditional search engines in that it is:
- Flow-based: SkyGrid treats the web as a giant pub-sub system, or at least it does to the extent that the rapidly growing RSS/ping server infrastructure makes that possible. It does not crawl the web; rather, the web "flows" to it.
- Persistent: SkyGrid persists queries over time so that incremental results are delivered with no additional action by the user. One can easily see how this would be valuable in the case of something like, oh say, a stock, which persists from day to day.
- Real-time: Rather than using batch-based indexing, SkyGrid uses a real-time, stream-like query system that queries (and analyzes) new content as it flows into the system. This is particularly useful in situations, such as investing, where a few minutes or seconds can make a huge economic difference.
- Filtered: Rather than presenting results as a data dump, SkyGrid uses advanced analytics in the form of entity extraction, meta-data analytics, and rules-based AI to quickly analyze and append additional meta-data to incoming information. This enables users to easily filter data according to a number of criteria, which greatly lessens the chance of "data overload" and greatly improves the chance of "data discovery".
- Analytical: By applying highly advanced artificial intelligence, such as natural language processing, entity extraction, etc., SkyGrid is able to analyze and assess the actual content of a URL, thus enabling it to make determinations such as the sentiment (positive/negative) of information, its "velocity" and its "authority". This goes a step beyond simple meta-data filtering to creating real insights into the content.
- Predictive: SkyGrid's flow-based architecture and advanced analytics enable it to view the web as a living, breathing, changing entity. By observing the propagation of information over time and across downstream nodes, SkyGrid is in a position not only to assess the "authority" and "influence" of individual nodes, but ultimately to make reasonable predictions about which information will flow where on the web. By correlating this observed "flow" over time with observed movements in things such as, oh say, stock markets, company sales, etc., it can not only assess how sensitive real-world changes have historically been to changes on the web, but it should ultimately be able to predict, with reasonable accuracy, many of those changes. Yes, I said it: SkyGrid and its new search architecture may ultimately predict the future.
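As a rough illustration of the flow/filter/analyze idea in the list above, here is a minimal Python sketch of persistent queries evaluated against items as they arrive. This is not SkyGrid's implementation, which I am not describing here; `FlowEngine`, `StandingQuery`, and the naive keyword matching that stands in for real entity extraction are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class StandingQuery:
    """A persistent query (hypothetical): results accumulate with no user action."""
    user: str
    term: str
    results: list = field(default_factory=list)


class FlowEngine:
    """Toy pub-sub engine: content is pushed in, tagged, and matched immediately."""

    def __init__(self, known_entities):
        self.known_entities = {e.lower() for e in known_entities}
        self.queries = []

    def subscribe(self, user, term):
        query = StandingQuery(user, term.lower())
        self.queries.append(query)
        return query

    def annotate(self, item):
        # Stand-in for entity extraction: plain keyword lookup, nothing smarter.
        words = set(item["text"].lower().split())
        item["entities"] = sorted(words & self.known_entities)
        return item

    def publish(self, item):
        # Called when a feed or ping delivers a new item; no crawl, no batch rebuild.
        item = self.annotate(dict(item))
        for query in self.queries:
            if query.term in item["entities"]:
                query.results.append(item)
        return item


engine = FlowEngine(known_entities=["Google", "Microsoft"])
watch = engine.subscribe("analyst", "Google")

engine.publish({"url": "http://example.com/1", "text": "Google beats estimates"})
engine.publish({"url": "http://example.com/2", "text": "Weather is nice today"})
engine.publish({"url": "http://example.com/3", "text": "Microsoft and Google spar"})

urls = [item["url"] for item in watch.results]
```

The design point is the inversion of control: the query sits still and the content moves past it, so each new item is checked once on arrival rather than rediscovered by a later crawl.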
I realize that the last point is at the very least hyperbolic and at worst disingenuous, but as an early beta-user I can tell you firsthand that once you see it in action and understand the architecture, predicting the future, in some very specific, limited, yet potentially highly valuable ways, is certainly not beyond the realm of reason and indeed seems quite possible given the progress to date. That said, SkyGrid is still a beta platform and many features have yet to be implemented in part or in full, but the promise and potential are undeniably there.
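To give a feel for what "correlating flow with market movements" could mean in the simplest possible case, here is a small Python sketch of a lagged correlation. It is only an assumption about one way such an analysis might start, not a description of SkyGrid's analytics, and the two series below are made-up toy numbers constructed so that the second trails the first by one step.

```python
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0  # a constant series carries no usable signal
    return cov / (spread_x * spread_y)


def lagged_correlation(signal, outcome, lag):
    """Correlate signal[t] with outcome[t + lag] for equal-length series."""
    if len(signal) != len(outcome):
        raise ValueError("series must have the same length")
    if lag < 0 or lag > len(signal) - 2:
        raise ValueError("lag must leave at least two overlapping points")
    return pearson(signal[: len(signal) - lag], outcome[lag:])


# Synthetic data for illustration only: daily mention counts and price moves.
mentions = [1, 3, 2, 5, 4, 6]
moves = [0.0, 0.1, 0.3, 0.2, 0.5, 0.4]

same_day = lagged_correlation(mentions, moves, 0)
next_day = lagged_correlation(mentions, moves, 1)
```

With these toy series the next-day correlation is stronger than the same-day one, which is the kind of pattern one would look for. A historical correlation like this is of course not a prediction by itself, and real data would be far noisier.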
Google Roadkill?
Why won't SkyGrid simply be put out of business by the big players like so many other search-oriented start-ups? First and foremost, because SkyGrid is delivering a premium product to a group of users who will pay significant sums for something that not only dramatically improves their daily productivity but holds out the promise of providing insightful, market-oriented analytics that they simply can't get elsewhere. Second, the existing search engines cannot compete effectively against SkyGrid because to do so would require reengineering their basic search architectures to address all of their shortcomings relative to SkyGrid. Moving from a traditional crawl/index/query architecture to a flow/filter/analyze one is a decidedly non-trivial undertaking, one that would require a complete re-architecture of their core services and is thus highly unlikely to happen.
Well then, does that mean that SkyGrid will put the "legacy" search engines out of business? Not at all. The current search engines are optimized to deal incredibly well with the vast majority of queries from the vast majority of users, and they will likely continue to do so for some time. Next-generation flow-based platforms such as SkyGrid are, by design, tackling a subset of the available queries, but arguably a very valuable subset. Indeed, that's why SkyGrid can charge $500/seat/month for its services while the existing search engines must give away their services for free and make their money on advertising.
Now I can see a lot of people being skeptical after reading this about both my ability to impartially judge SkyGrid's next-generation search technology and its market potential. To them I would say: just keep an eye out for some announcements over the next month, as I think they will conclusively demonstrate that a number of people far more knowledgeable and accomplished than I am see the same potential.
February 13, 2008 in Content Management, Internet, Wall Street
Legal Disclaimer
The thoughts and opinions on this blog are mine and mine alone and not affiliated in any way with Inductive Capital LP, San Andreas Capital LLC, or any other company I am involved with. Nothing written in this blog should be considered investment, tax, legal, financial or any other kind of advice. These writings, misinformed as they may be, are just my personal opinions.