SkyGrid and the Emergence of Flow-Based Search
GigaOm had a post today on a company called SkyGrid and its official company launch. As an investor, advisor, and beta-user of the platform, I thought I would chime in with my own (admittedly self-serving) post, mostly because I wanted to talk about the advanced technology and architecture behind SkyGrid and why it makes the company such an interesting case study in the evolution of search technology.
Simply put, SkyGrid represents a massive and exciting departure from traditional search architectures and technologies. If I had to sum it up in a phrase, I would say that SkyGrid represents one of the first "flow-based" search architectures, while traditional search engines are "crawl-based" architectures.
Old Search: Crawl/Index/Query
While SkyGrid's technical departure was necessitated by the leading-edge demands of investment professionals, it was these needs, and traditional search's inability to meet them, that exposed the most glaring weaknesses of traditional search technology. Specifically, traditional search technologies and architectures suffer from several weaknesses:
- Crawl-based: Current search architectures collect information to index primarily by employing massive farms of "crawlers" that systematically crawl IP address spaces. The benefit of crawling is that it is exhaustive; the drawback is that it is time-consuming and expensive.
- One-off: Search platforms are designed around rapidly processing one-off queries. This makes search engines highly useful and adept at finding "the needle in the haystack" but very cumbersome to use in situations where one just wants to get new results to the same old query.
- Batch-based: PageRank and the other "secret sauce" algorithms behind most search engines today require a very expensive and complicated indexing process to be performed on "snapshots" of data. It can be days or even weeks before newly published content is crawled and properly indexed, meaning that most search engines fail to provide "real time" results for all but the most popular content sources (which they crawl very frequently).
- Unabridged: Search engines are exhaustive in that they return every URL that mentions a string. This is good if you are looking for a needle in the haystack, but bad if you are trying to search on a common term such as "Google" or "Microsoft". While ranking algorithms do a great job of ordering results according to likely relevance, they don't filter down the number of results. Since most users don't go past the first page of results, it is quite easy to miss relevant information that for some reason doesn't rank in the top 10 results.
- Unstructured: Search engines typically present query results as a simple list without context or analytics, beyond, say, separating them by simple criteria such as text and images. While some progress has been made in terms of trying to cluster results or help users filter them, by and large users still just get an unprocessed, unanalyzed data dump when they do a search.
- Retrospective: Search today is focused on determining what has happened in the past: who wrote what, who said what, etc. However, this does little to help people figure out what will happen in the future.
Without giving away the farm, SkyGrid represents an exciting departure from the search technologies and architectures of the past. This change has been made possible by several factors, including the widespread deployment and adoption of ping servers and RSS/Atom feeds, dramatic improvements in several areas of artificial intelligence and unstructured data analytics, and new stream-based methods of database and query design.
SkyGrid Search: Flow/Filter/Analyze
When you put all of these technologies together, along with a laser-like focus on solving some of the unique high-end demands of investment professionals, you get a radical new search architecture and technology that not only solves some very pressing and pragmatic problems facing investors, but holds the potential to actually predict the pattern and influence of idea/meme propagation throughout the internet, and from there into the financial markets and beyond.
Specifically, SkyGrid's search architecture differs from traditional search engines in that it is:
- Flow-based: SkyGrid treats the web as a giant pub-sub system, or at least it does to the extent that the rapidly growing RSS/ping-server infrastructure allows. It does not crawl the web; rather, the web "flows" to it.
- Persistent: SkyGrid persists queries over time so that incremental results are delivered with no additional action by the user. One can easily see how this would be valuable in the case of something like, oh say, a stock, which persists from day to day.
- Real-time: Rather than using batch-based indexing, SkyGrid uses a real-time, stream-like query system that queries (and analyzes) new content as it flows into the system. This is particularly useful in situations, such as investing, where a few minutes or even seconds can make a huge economic difference.
- Filtered: Rather than presenting results as a data dump, SkyGrid uses advanced analytics in the form of entity extraction, meta-data analytics, and rules-based AI to quickly analyze and append additional meta-data to incoming information. This enables users to easily filter data according to a number of criteria, which greatly lessens the chance of "data overload" and greatly improves the chance of "data discovery".
- Analytical: By applying highly advanced artificial intelligence, such as natural language processing and entity extraction, SkyGrid is able to analyze and assess the actual content of a URL, enabling it to make determinations such as the sentiment (positive/negative) of information, its "velocity", and its "authority". This goes a step beyond simple meta-data filtering to create real insights into the content.
- Predictive: SkyGrid's flow-based architecture and advanced analytics enable it to view the web as a living, breathing, changing entity. By observing the propagation of information over time and across downstream nodes, SkyGrid is in a position not only to assess the "authority" and "influence" of individual nodes, but ultimately to make reasonable predictions about which information will flow where on the web. By correlating this observed "flow" over time with observed movements in things such as, oh say, stock markets and company sales, it can not only assess how sensitive real-world outcomes have historically been to changes on the web, but should theoretically be able to predict, with reasonable accuracy, many of those changes. Yes, I said it: SkyGrid and its new search architecture may ultimately predict the future.
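To make the flow/filter/analyze idea concrete, here is a minimal sketch of a flow-based engine with persistent queries. Everything here is illustrative and assumed, not SkyGrid's actual API: queries are registered once and then matched against each item as it "flows" in, so incremental results arrive with no re-querying by the user.

```python
# Sketch of a flow/filter/analyze pipeline: standing queries are registered
# once and matched against every item as it flows in, instead of re-running
# one-off queries against a batch-built index. All names are illustrative.
from dataclasses import dataclass, field

@dataclass
class StandingQuery:
    name: str                       # e.g. a ticker symbol the user watches
    keywords: set                   # terms that should appear in the item
    results: list = field(default_factory=list)

class FlowEngine:
    def __init__(self):
        self.queries = []

    def register(self, query):
        """Persist a query; it keeps matching until removed."""
        self.queries.append(query)

    def publish(self, item):
        """Called as each new document 'flows' in (e.g. from a ping server)."""
        words = set(item.lower().split())
        for q in self.queries:
            if q.keywords & words:      # filter step
                q.results.append(item)  # incremental delivery, no re-query

engine = FlowEngine()
goog = StandingQuery("GOOG", {"google"})
engine.register(goog)

engine.publish("Google announces new ad platform")
engine.publish("Unrelated story about weather")
print(goog.results)   # only the matching item was delivered
```

The inversion is the whole point: in a crawl/index/query world the user repeatedly asks the index; here the content comes to the standing query.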
I realize that the last point is at the very least hyperbolic and at worst disingenuous, but as an early beta-user I can tell you first hand that once you see it in action and understand the architecture, predicting the future, in some very specific, limited, yet potentially highly valuable ways, is certainly not something beyond the realm of reason and indeed something that seems quite possible given the progress to date. That said, SkyGrid is still a beta platform and many features have yet to be implemented in part or in full, but the promise and potential is undeniably there.
Why won't SkyGrid simply be put out of business by the big players, like so many other search-oriented start-ups? First and foremost, because SkyGrid is delivering a premium product to a group of users that will pay significant sums for something that not only dramatically improves their daily productivity but holds out the promise of providing insightful, market-oriented analytics that they simply can't get elsewhere. Second, the existing search engines cannot compete effectively against SkyGrid because doing so would require reengineering their basic search architectures to address all of their shortcomings relative to SkyGrid. Moving from a traditional crawl/index/query architecture to a flow/filter/analyze one is a decidedly non-trivial undertaking, one that would require a complete re-architecting of their core services and thus one highly unlikely to be made.
Well then, does that mean that SkyGrid will put the "legacy" search engines out of business? Not at all. The current search engines are optimized to deal incredibly well with the vast majority of queries from the vast majority of users, and they will likely continue to do so for some time. Next-generation flow-based platforms such as SkyGrid are, by design, tackling a subset of the available queries, but arguably a very valuable subset. Indeed, that's why SkyGrid can charge $500/seat/month for its services while the existing search engines must give away their services for free and make their money on advertising.
Now I can see a lot of people being skeptical after reading this about both my ability to impartially judge SkyGrid's next generation search technology as well as its market potential. To them I would say: just keep your eyes out for some announcements over the next month as I think they will conclusively demonstrate that a number of people far more knowledgeable and accomplished than I see the same potential.
Saving RSS: Why Meta-feeds will triumph over Tags
It’s pretty clear that RSS has now become the de facto standard for web content syndication. Just take a look at the numbers. The total number of RSS feeds tracked by Syndic8.com has grown from about 2,500 in the middle of 2001, to 50,000 at the beginning of 2004, to 286,000 as of the middle of this month. That’s total growth of over 11,300% in just the past 3.5 years!
Feed Overload Syndrome
However, as I wrote at the beginning of last year, the very growth of RSS threatens to sow the seeds of its own failure by creating such a wealth of data sources that it becomes increasingly difficult for users to sift through all the “noise” to find the information that they actually need.
Just ask any avid RSS user about how their use of RSS has evolved and they will likely tell you the same story: When they first discovered RSS it was great because it allowed them to subscribe to relevant information “feeds” from all of their favorite sites and have that information automatically aggregated into one place (usually an RSS reader like SharpReader or NewsGator). However, as they began to add more and more feeds (typically newly discovered blogs), the number of posts they had to review started rising quickly, so much so that they often had hundreds, if not thousands, of unread posts sitting in their readers. Worse yet, many of these posts ended up being either irrelevant (especially random postings on personal blogs) or duplicative. Suffering from a serious case of “feed overload”, many of these users ultimately had to cut back on the number of feeds they subscribed to in order to reduce the amount of “noise” in their in-boxes and give themselves at least a fighting chance of skimming all of their unread posts each day.
The First Step: Recognizing That You Have a Problem
Many in the RSS community recognize that “Feed Overload Syndrome” is indeed becoming a big problem and have begun initiatives to try and address it.
Perhaps the most obvious way to address the problem is to create keyword-based searches that filter posts based on keywords. The results from such searches can themselves be syndicated as an RSS feed. This approach has several problems, though. First, many sites only syndicate summaries of their posts, not the complete posts, making it difficult to index the entire post. Second, keyword-based searches become less and less effective the more data you index, as the average query starts to return more and more results. Third, keywords often have multiple contexts, which in turn produce significant “noise” in the results. For example, a keyword search on “Chicago” would produce information about the city, the music group, and the movie, among other things. That said, many “feed aggregation sites” such as Technorati and Bloglines currently offer keyword-based searching/feeds, and for most folks these are better than nothing. However, it’s pre-ordained that as the number of feeds increases, these keyword filtering techniques will prove less and less useful.
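The context problem is easy to demonstrate. Below is a bare-bones keyword filter of the kind the aggregation sites offered; the post titles are invented for illustration. Note how "Chicago" matches the movie, the city, and the band alike, because a keyword carries no semantics:

```python
# A minimal keyword filter over feed items. Titles are made up; the point
# is that a keyword match says nothing about which *sense* of the word
# the post is actually about.
posts = [
    "Chicago wins Best Picture at the Oscars",
    "New transit plan for Chicago commuters",
    "Chicago the band announces reunion tour",
    "Red Sox beat Yankees in extra innings",
]

def keyword_feed(posts, keyword):
    """Return every post containing the keyword, case-insensitively."""
    kw = keyword.lower()
    return [p for p in posts if kw in p.lower()]

for post in keyword_feed(posts, "Chicago"):
    print(post)   # three hits, three unrelated meanings
```

Three of the four posts match, and all three are about different things, which is exactly the "noise" problem described above.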
Tag, You’re Categorized
Realizing the shortcomings of keyword-based searching, many people are embracing the concept of “tagging”. Tagging is simply adding some basic metadata to an RSS post, usually just a simple keyword “tag”. For example, the RSS feed on this site effectively “tags” my posts by using the dc:subject property from the RSS standard. Using such keywords, feed aggregators (such as Technorati, PubSub and Del.icio.us ) can sort posts into different categories and subscribers can then subscribe to these categorized RSS feeds, instead of the “raw” feeds from the sites themselves. Alternatively, RSS readers can sort the posts into user-created folders based on tags (although mine doesn’t offer this feature yet).
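For the curious, here is roughly what reading those tags looks like on the consumer side. The feed XML below is a made-up example (the dc namespace URI is the standard Dublin Core one); a reader extracts each item's dc:subject values and sorts posts into folders:

```python
# Sketch of a reader pulling dc:subject "tags" out of RSS 2.0 items and
# sorting posts into folders. The feed content is invented for illustration.
import xml.etree.ElementTree as ET
from collections import defaultdict

RSS = """<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <item><title>Saving RSS</title><dc:subject>RSS</dc:subject></item>
    <item><title>EMC + Documentum</title><dc:subject>Storage</dc:subject></item>
  </channel>
</rss>"""

NS = {"dc": "http://purl.org/dc/elements/1.1/"}
folders = defaultdict(list)

for item in ET.fromstring(RSS).iter("item"):
    title = item.findtext("title")
    for tag in item.findall("dc:subject", NS):
        folders[tag.text].append(title)

print(dict(folders))   # {'RSS': ['Saving RSS'], 'Storage': ['EMC + Documentum']}
```

Mechanically this works fine; the trouble, as the next paragraphs argue, is that nothing forces two publishers to pick the same subject string for the same content.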
Tagging is a step in the right direction, but it is ultimately a fundamentally flawed approach to the issue. The problem at the core of tagging is the same problem that has bedeviled almost all efforts at collective categorization: semantics. In order to assign a tag to a post, one must make some inherently subjective determinations, including: 1) what’s the subject matter of the post and 2) what topics or keywords best represent that subject matter. In the information retrieval world, this process is known as categorization. The problem with tagging is that there is no assurance that two people will assign the same tag to the same content. This is especially true in the diverse “blogosphere”, where one person’s “futbol” is undoubtedly another’s “football” or another’s “soccer”.
Beyond a fatal lack of consistency, tagging efforts also suffer from a lack of context. As any information retrieval specialist will tell you, categorized documents are most useful when they are placed into a semantically rich context. In the information retrieval world, such context is provided by formalized taxonomies. Even though the RSS standard provides for taxonomies, tagging as it is currently executed lacks any concept of taxonomies and thus lacks context.
Deprived of consistency and context, tagging threatens to become a colossal waste of time as it merely adds a layer of incoherent and inconsistent metadata on top of an already unmanageable number of feeds.
While tagging may be doomed to confusion, there are some other potential approaches that promise to bring order to RSS’s increasingly chaotic situation. The most promising approach involves something called a meta-feed. Meta-feeds are RSS feeds comprised solely of metadata about other feeds. Combining meta-feeds with the original source feeds enables RSS readers to display consistently categorized posts within rich and logically consistent taxonomies. The process of creating a meta-feed looks a lot like that needed to create a search index. First, crawlers must scour RSS feeds for new posts. Once they have located new posts, the posts are categorized and placed into a taxonomy using advanced statistical processes such as Bayesian analysis and natural language processing. This metadata is then appended to the URL of the original post and put into its own RSS meta-feed. In addition to the categorization data, the meta-feed can also contain taxonomy information, as well as information about such things as exact/near duplicates and related posts.
RSS readers can then request both the original raw feeds and the meta-feeds. They then use the meta-feed to appropriately and consistently categorize and relate each raw post.
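The reader-side merge described above is essentially a join on the post URL. Here is a sketch under assumed, invented feed structures (real meta-feeds would need a standardized schema, as discussed later): the raw feed and the meta-feed are keyed by link, so each post picks up its centrally assigned category and duplicate flag.

```python
# Sketch of a reader joining a raw feed with a meta-feed on the post URL.
# The dict shapes and field names here are invented for illustration.
raw_feed = [
    {"link": "http://a.example/1", "title": "Patriots win opener"},
    {"link": "http://b.example/9", "title": "Wire story: Patriots win opener"},
]

meta_feed = [
    {"link": "http://a.example/1", "category": "Sports/Football/NFL",
     "dup_of": None},
    {"link": "http://b.example/9", "category": "Sports/Football/NFL",
     "dup_of": "http://a.example/1"},   # flagged as a near-duplicate
]

# Index the metadata by URL, then decorate each raw post with it.
meta_by_link = {m["link"]: m for m in meta_feed}

for post in raw_feed:
    meta = meta_by_link.get(post["link"], {})
    post["category"] = meta.get("category", "Uncategorized")
    if meta.get("dup_of"):
        post["hidden"] = True           # reader can collapse duplicates

visible = [p["title"] for p in raw_feed if not p.get("hidden")]
print(visible)   # ['Patriots win opener']
```

Because the categorization lives in the meta-feed rather than in each publisher's feed, every subscriber sees the same consistent labels regardless of where a post originated.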
For end users, meta-feeds will enable a wealth of features and innovations. Users will be able to easily find related documents and eliminate duplicates of the same information (such as two newspapers reprinting the same wire story). Users will also be able to create their own custom taxonomies and category names (as long as they relate them back to the meta-feed). Users can even combine meta-feeds from two different sources, so long as one of the meta-feed publishers creates an RDF file that relates the two sets of categories and taxonomies (to the extent practical). Of course, the biggest benefit to users will be that information is consistently sorted and grouped into meaningful categories, allowing them to greatly reduce the amount of “noise” created by duplicate and non-relevant posts.
At a higher level, the existence of multiple meta-feeds, each with its own distinct taxonomy and categories, will in essence create multiple “views” of the web that are not predicated on any single person’s semantic orientation (as is the case with tagging). In this way it will be possible to view the web through unique editorial lenses that transcend individual sites and instead present the web for what it is: a rich and varied collective enterprise that can be wildly different depending on your perspective.
The Road Yet Traveled
Unfortunately, the road to this nirvana is long and, as yet, largely untraveled. While it may be possible for services like PubSub and Technorati to put together their own proprietary end-to-end implementations of meta-feeds, for such feeds to become truly accepted, standards will have to be developed that incorporate meta-feeds into readers and allow for interoperability between meta-feeds.
If RSS fails to address “Feed Overload Syndrome”, it will admittedly not be the end of the world. RSS will still provide a useful, albeit highly limited, “alert” service for new content at a limited number of sites. However, for RSS to reach its potential of dramatically expanding the scope, scale, and richness of individuals’ (and computers’) interaction with the web, innovations such as meta-feeds are desperately needed in order to create a truly scalable foundation.
RSS: A Big Success In Danger of Failure
There’s a lot of hoopla these days about RSS. RSS is an XML-based standard for summarizing and ultimately syndicating web-site content. Adoption and usage of RSS has taken off in the past few years, leading some to suggest that it will play a central role in transforming the web from a random collection of websites into a paradise of personalized data streams. However, the seeds of RSS’s impending failure are being sown by its very success, and only some serious improvements in the standard will save it from a premature death.
The roots of RSS go all the way back to the infamous “push” revolution of late 1996/early 1997. At that point in time, PointCast captured the technology world’s imagination with a vision of the web in which relevant, personalized content would be “pushed” to end users, freeing them from the drudgery of actually having to visit individual websites. The revolution reached its apex in February of 1997 when Wired Magazine published a “Push” cover story in which it dramatically declared the web dead and “push” the heir apparent. Soon technology heavyweights such as Microsoft were pushing their own “push” platforms, and for a brief moment in time the “push revolution” actually looked like it might happen. Then, almost as quickly as it took off, the push revolution imploded. There doesn’t appear to be one single cause of the implosion (outside of Wired’s endorsement); some say it was the inability to agree on standards, while others finger clumsy and proprietary “push” software, but whatever the reasons, “push” turned out to be a big yawn for most consumers. Like any other fad, they toyed with it for a few months and then moved on to the next big thing. Push was dead.
Or was it? For while push, as conceived of by PointCast, Marimba, and Microsoft, had died an ugly and (most would say richly deserved) public death, the early seeds of a much different kind of push, one embodied by RSS, had been planted in the minds of its eventual creators. From the outset, RSS was far different from the original “push” platforms. Instead of a complicated proprietary software platform designed to capture revenue from content providers, RSS was just a simple text-based standard. In fact, from a technical perspective RSS was actually much more “pull” than “push” (RSS clients must poll sites to get the latest content updates), but from the end user’s perspective the effect was basically the same. As an unfunded, collective effort, RSS lacked huge marketing and development budgets, and so, outside of a few passionate advocates, it remained relatively unknown for many years after its initial creation.
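The "pull that feels like push" point is worth making concrete. An RSS reader simply polls each feed on a schedule and diffs the item identifiers against what it has already seen; here is a minimal sketch, where fetch_feed() is a stand-in for an HTTP GET of a feed URL:

```python
# Why RSS is technically "pull": the client polls on a schedule and keeps
# only items it hasn't seen before. fetch_feed() is a placeholder for a
# real HTTP fetch-and-parse of the feed.
seen_guids = set()

def fetch_feed():
    # In a real reader this would fetch and parse the feed over HTTP;
    # here it just returns static items for illustration.
    return [{"guid": "post-1", "title": "First post"},
            {"guid": "post-2", "title": "Second post"}]

def poll_once():
    """One polling pass: return only the titles of items not seen before."""
    new_items = []
    for item in fetch_feed():
        if item["guid"] not in seen_guids:
            seen_guids.add(item["guid"])
            new_items.append(item["title"])
    return new_items

print(poll_once())   # both items are new on the first pass
print(poll_once())   # nothing changed: the "push" feeling is just re-pulling
```

From the user's chair, new posts simply "arrive"; under the hood, the client is doing all the asking.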
Recently though, RSS has emerged from its relative obscurity, thanks in large part to the growing popularity of RSS “readers” such as FeedDemon, NewsGator, and SharpReader. These readers allow users to subscribe to several RSS “feeds” at once, thereby consolidating information from around the web into one highly efficient, highly personalized, and easy-to-use interface. With its newfound popularity, proponents of RSS have begun hailing it as the foundation for a much more personalized and relevant web experience, one that will ultimately transform the web from an impenetrable clutter of passive websites into a constant, personalized stream of highly relevant data that can reach users no matter where they are or what device they are using.
Back to the Future?
Such rhetoric is reminiscent of the “push” craze, but this time it may have a bit more substance. The creators of RSS clearly learned a lot from push’s failures, and they have incorporated a number of features which suggest that RSS will not suffer the same fate. Unlike “push”, RSS is web friendly. It uses many of the same protocols and standards that power the web today, and uses them in the classic REST-based “request/response” architecture that underpins the web. RSS is also an open standard that anyone is free to use in whatever way they see fit. This openness is directly responsible for the large crop of diverse RSS readers and the growing base of RSS-friendly web sites and applications. Thus, by embracing the web instead of attempting to replace it, RSS has been able to leverage the web to help spur its own adoption.
One measure of RSS’s success is the number of RSS-compliant feeds, or channels, available on the web. At Syndic8.com, a large aggregator of RSS feeds, the total number of feeds listed has grown over 2,000% in just 2.5 years, from about 2,500 in the middle of 2001 to almost 53,000 in February of 2004. The growth rate also appears to be accelerating, as a record 7,326 feeds were added in January of 2004, 2X the previous monthly record.
A Victim of Its Own Success
The irony of RSS’s success, though, is that this same success may ultimately contribute to its failure. To understand why, it helps to imagine the RSS community as a giant cable TV operator. From this perspective, RSS now has tens of thousands of channels and will probably have hundreds of thousands of channels by the end of the year. While some of the channels are branded, most are little-known blogs and websites. Now imagine that you want to tune into channels about, let’s say, Cricket. Sure, there will probably be a few channels with 100% of their content dedicated to Cricket, but most of the Cricket information will inevitably be spread out in bits and pieces across the hundreds of thousands of channels. Thus, in order to get all of the Cricket information, you will have to tune into hundreds, if not thousands, of channels and then try to filter out all the “noise”, the irrelevant programs that have nothing to do with Cricket. That’s a lot of channel surfing!
The problem is only going to get worse. Each day as the number of RSS channels grows, the “noise” created by these different channels (especially by individual blogs which often have lots of small posts on widely disparate topics) also grows, making it more and more difficult for users to actually realize the “personalized” promise of RSS. After all, what’s the point of sifting through thousands of articles with your reader just to find the ten that interest you? You might as well just go back to visiting individual web sites.
Searching In Vain
What RSS desperately needs are enhancements that will allow users to take advantage of the breadth of RSS feeds without being buried in irrelevant information. One potential solution is to apply search technologies, such as keyword filters, to incoming articles (as pubsub.com is doing). This approach has two main problems: 1) the majority of RSS feeds include just short summaries, not the entire article, which means that 95% of the content can’t even be indexed; 2) while keyword filters can reduce the number of irrelevant articles, they will still become overwhelmed given a sufficiently large number of feeds. This “information overload” problem is not unique to RSS but is one of the primary problems of the search industry, where the dirty secret is that the quality of search results generally declines the more documents you have to search.
Classification and Taxonomies to the Rescue
While search technology may not solve the “information overload” problem, its closely related cousins, classification and taxonomies, may have just what it takes. Classification technology uses advanced statistical models to automatically assign categories to content. These categories can be stored as meta-data with the article. Taxonomy technology creates detailed tree structures that establish the hierarchical relationships between different categories. A venerable example of these two technologies working together is Yahoo!’s Website Directory. Here Yahoo has created a taxonomy, or hierarchical list of categories, of Internet sites. Yahoo has then used classification technology to assign each web site one or more categories within the taxonomy. With the help of these two technologies, a user can sort through millions of internet sites to find just those websites that deal with, say, Cricket, in just a couple of clicks.
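To give a flavor of the statistical models involved, here is a toy naive Bayes classifier, one of the standard techniques for this job, trained on a few hand-labelled headlines and used to place a new post into a taxonomy category. The training data and category names are invented; a production classifier would be trained on far more data.

```python
# Toy naive Bayes text classifier: assigns a taxonomy category to a post
# based on word statistics from a small hand-labelled training set.
import math
from collections import Counter, defaultdict

training = [
    ("Sports/Cricket", "india wins cricket test match"),
    ("Sports/Cricket", "cricket world cup squad announced"),
    ("Tech/Search",    "search engine index crawl update"),
    ("Tech/Search",    "new ranking algorithm for search results"),
]

# Count word frequencies per category and build the vocabulary.
word_counts = defaultdict(Counter)
cat_counts = Counter()
vocab = set()
for cat, text in training:
    cat_counts[cat] += 1
    for w in text.split():
        word_counts[cat][w] += 1
        vocab.add(w)

def classify(text):
    """Pick the category with the highest (smoothed) log-probability."""
    best_cat, best_score = None, -math.inf
    for cat in cat_counts:
        score = math.log(cat_counts[cat] / len(training))  # log prior
        total = sum(word_counts[cat].values())
        for w in text.split():
            # Add-one smoothing so unseen words don't zero out the score.
            score += math.log((word_counts[cat][w] + 1) / (total + len(vocab)))
        if score > best_score:
            best_cat, best_score = cat, score
    return best_cat

print(classify("cricket match report"))   # Sports/Cricket
```

The category string doubles as a path in the taxonomy tree ("Sports/Cricket"), which is how the classifier's output and the taxonomy's hierarchy work together.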
It’s easy to see how RSS could benefit from the same technology. Assigning articles to categories and associating them with taxonomies will allow users to subscribe to “meta-feeds” that are based on categories of interest, not specific sites. With such a system in place, users will be able to have their cake and eat it too, as they will effectively be subscribing to all RSS channels at once, but thanks to the use of categories they will only see those pieces of information that are personally relevant. Bye-bye, noise!
In fact, the authors of the RSS standard anticipated the importance of categories and taxonomies early on, and the standard actually supports including both category and taxonomy information within an RSS message. The good news, then, is that RSS is already “category and taxonomy ready”.
What Do You Really Mean?
But there’s a catch. Even though RSS supports the inclusion of categories and taxonomies, there’s no standard for determining what category an article should be in or which taxonomy to use. Thus there’s no guarantee that two sites with very similar articles will categorize them the same way or use the same taxonomy. This raises the very real prospect that, for example, the “Football” category will contain a jumbled group of articles covering both the New England Patriots and Manchester United. Such a situation leads us back to an environment filled with “noise”, leaving us no better off than when we started.
The theoretical solution to this problem is to get everyone in a room and agree on a common way to establish categories and on a universal taxonomy. Unfortunately, despite the best efforts of academics around the world, this has so far proven impossible. Another idea might be to try to figure out a way to map relationships between different concepts and taxonomies and then provide some kind of secret decoder ring that enables computers to infer how everything is interrelated. This is basically what the Semantic Web movement is trying to do. It sounds great, but it will likely be a long time before the Semantic Web is perfected, and everyone will easily lose patience with RSS before then. (There is actually a big debate within the RSS community over how Semantic Web-centric RSS should be.)
Meta-Directories And Meta-Feeds
The practical solution will likely be to create a series of meta-directories that collect RSS feeds and then apply their own classification tools and taxonomies to those feeds. These intermediaries would then either publish new “meta-feeds” based on particular categories or they would return the category and taxonomy meta-data to the original publisher which would then incorporate the metadata into their own feeds.
There actually is strong precedent for such intermediaries. In the publishing world, major information services like Reuters and Thomson have divisions that aggregate information from disparate sources, classify the information, and then resell those classified news feeds. There are also traditional syndicators, such as United Media, who collect content and then redistribute it to other publications. In addition to these established intermediaries, some RSS-focused start-ups such as Syndic8 and pubsub.com also look poised to fulfill these roles should they choose to do so.
Even if these meta-directories are created, it’s not clear that the RSS community will embrace them, as they introduce a centralized intermediary into an otherwise highly decentralized and simple system. However, it is clear that without meta-directories and their standardized classifications and taxonomies, the RSS community is in danger of collapsing under the weight of its own success and becoming the “push” of 2004. Let’s hope they have learned from the mistakes of their forefathers.
EMC + Documentum = War for Control of Unstructured Data
One of the most interesting recent acquisitions in the software space was EMC’s purchase of Documentum. Not because it was a particularly large acquisition in terms of dollar size or premium paid but because of the strategic implications it has for much of the software industry, especially for companies in the content management, storage software, and database markets.
Documentum isn’t the only acquisition EMC has made recently. It also acquired storage software maker Legato Systems and virtualization leader VMware. However, both of these acquisitions can be seen as incremental expansions of EMC’s existing focus on storage and storage management (and probably a competitive response to some of the moves storage players like Veritas have been making), whereas the Documentum acquisition represents a major leap “up the stack”, right into the midst of the classic enterprise software space.
By boldly jumping into the enterprise software space, EMC appears to be, as a fighter pilot might say, “going vertical”. It is making a bet that customers will want to buy the “whole enchilada” from one vendor, including not just platters and storage management software, but high-level content management and workflow software as well. Indeed, one can reasonably expect that the logical extension of this strategy will be a series of vertical solutions targeted at specific applications such as claims processing, image management, content publishing, and e-mail management.
By providing the entire solution (no doubt delivered by its services division), EMC should theoretically be able to improve margins by focusing its customers on the value of the entire bundled solution as opposed to simply the cost/gigabyte of its storage products.
Even more important than this solutions focus, though, is the fact that EMC is trying to stake a claim to the entire unstructured data management space. EMC’s drive to do this has no doubt been influenced by its customers, who are increasingly buying additional storage not to supplement existing databases or information warehouses, but to store and manage unstructured data such as e-mails, PowerPoint presentations, and web pages.
That EMC can stake a claim to the unstructured data management space without alienating some of its biggest ISV partners (the database and warehouse vendors) has much to do with the fact that traditional RDBMS vendors have been surprisingly reluctant to make major commitments to the unstructured data management space. Many of these players, led by Oracle, continue to hold on to the outdated belief that all of an enterprise’s information will be managed by RDBMSs and therefore they have made few attempts to expand into unstructured data management. This abdication has in turn opened the door for EMC to make a move without causing massive near term channel problems.
That’s not to say that EMC’s move into the unstructured data management space won’t ruffle more than a few feathers. Those that will feel the brunt of this entry are the remaining content management players such as FileNet, Interwoven, Stellent, and Vignette. These firms must now contend with a very large, aggressive competitor selling hardware/software bundles. In addition, EMC’s traditional storage software rivals must now consider whether or not they will respond by making their own forays into the unstructured data management space; Veritas in particular will now have some difficult decisions to make. Finally, by letting such a large, aggressive company as EMC into their back yard, the traditional RDBMS players are going to have to decide whether to continue holding on to their anti-file-system views or to respond by delivering their own unstructured data management solutions.
Caught in the crossfire between all of these behemoths will be the existing unstructured data players in content management, search, categorization, taxonomy, and workflow. These relatively small players (many of them still start-ups) will have to decide if it is better to sell out to one of the big players moving into their space or to soldier on and attempt to carve out a defensible niche. In this sense, EMC’s acquisition of Documentum represents just the first shot by a major player in what is likely to be a long conflict for control of the unstructured data management space.