

The Data Centric Web

A revolution is quietly brewing on the Internet. It’s a revolution that will ultimately expand the web into something far more useful and productive than it is today and one that will likely undermine much of the conventional wisdom regarding the evolution of web services. At the heart of this revolution is the Data-Centric Web.

Since its creation in the early 1990s, the web has been a document-centric system, and all of its core technologies have been designed to support that vision. HTTP’s simple GET, PUT, and POST methods were developed to facilitate the sharing of documents across a network. HTML was created to format those documents, URLs (aka URIs) were created to easily identify and access them, and DNS was created to translate the host names in those URLs into the addresses of specific servers on the network. While these technologies have evolved dramatically over the years, they remain fundamentally document-centric.
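To make that division of labor concrete, here is a small Python sketch that splits a URL into the parts each technology handles: the scheme names the protocol, DNS resolves the host name, and the path picks out a document on that server. The URL is a hypothetical example, and the sketch stops short of an actual DNS lookup or network request.

```python
from urllib.parse import urlsplit

# A hypothetical document URL, used only for illustration.
url = "http://www.example.com/reports/annual-2003.html"

parts = urlsplit(url)

# The scheme names the protocol (HTTP) used to fetch the document.
print(parts.scheme)    # http
# The host name is what DNS would resolve to a server address (no lookup is done here).
print(parts.hostname)  # www.example.com
# The path identifies the document on that server.
print(parts.path)      # /reports/annual-2003.html

# Roughly the request line a browser would send to that server for this URL.
request_line = f"GET {parts.path} HTTP/1.1"
print(request_line)    # GET /reports/annual-2003.html HTTP/1.1
```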

The Data-Centric Web is different. While the document-centric web revolves around documents and assumes that humans are the main consumers of information, the Data-Centric Web revolves around individual data elements and assumes that computers are its main consumers.

This shift in focus from documents to data and from humans to computers is simple and yet profound. Just imagine a world in which every piece of data is immediately and automatically accessible from any computer via the web using a simple, universal set of protocols and formats. Indeed, such a vision has long represented the “holy grail” of Enterprise Application Integration (EAI) and yet attempts to realize this vision have been woefully inadequate to date.

However, it appears increasingly likely that the Data-Centric Web will become a reality thanks to the introduction of a few new technologies as well as the “hijacking” of some existing ones. The most important new technology for the Data-Centric Web is eXtensible Markup Language, or XML. XML is a close cousin of HTML (both descend from SGML), but unlike HTML, which is designed to format and present text, XML is designed to identify and structure data. Using XML, programmers can identify individual data elements and put those elements into context within a larger taxonomy (e.g., this data element is the company’s zip code, and it is part of its shipping address), thus turning plain text into data.
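The zip-code example can be made concrete. Below is a minimal Python sketch using the standard library’s ElementTree parser on a hypothetical customer record (the element names are made up for illustration). The point is that the path `shippingAddress/zip` identifies not just “a zip code” but the zip code that belongs to the shipping address, which is the context XML adds to plain text.

```python
import xml.etree.ElementTree as ET

# Hypothetical customer record; element names are illustrative only.
record = """<customer>
  <name>Acme Corp</name>
  <shippingAddress>
    <street>1 Main Street</street>
    <city>Springfield</city>
    <zip>12345</zip>
  </shippingAddress>
  <billingAddress>
    <zip>67890</zip>
  </billingAddress>
</customer>"""

root = ET.fromstring(record)

# The same tag name (zip) means different things depending on where it sits;
# the element path puts each value into context.
shipping_zip = root.find("shippingAddress/zip").text
billing_zip = root.find("billingAddress/zip").text

print(shipping_zip)  # 12345
print(billing_zip)   # 67890
```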

But turning text into data is only half the battle. Once text is turned into data, there still needs to be a way to find and share this data. That’s where the Data-Centric Web starts hijacking. Specifically, the Data-Centric Web hijacks HTTP, URLs, and DNS. These existing technologies have been stalwarts of the document-centric web, seamlessly interconnecting billions of pages of text across the Internet, and the Data-Centric Web simply hijacks them to interconnect data instead of text.

For example, if you type the Yahoo! Finance quote URL for IBM into your web browser, you will, via the magic of HTTP, URLs, and DNS (among other things), be taken to a web page that gives you IBM’s current stock quote as well as a bunch of other related information. While this page looks great to a human being, to a computer it looks like a random jumble of text and images. However, suppose there were a URL for IBM’s stock price alone and that, instead of returning a large page filled with information, a GET request to it simply returned “99” in XML. Now imagine that every data element in the world is formatted in XML and has its own URL. The web has just become a giant library of data, not just documents.
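Here is a minimal, self-contained Python sketch of that idea: a tiny local web server that gives a single data element its own URL and answers a plain HTTP GET with a small XML document containing just that value. The `/quotes/IBM` URL scheme, the `QUOTES` table, and the `<price>` element are hypothetical; nothing here reflects how Yahoo! Finance actually works.

```python
import threading
import urllib.request
import xml.etree.ElementTree as ET
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical data store: one value per data element.
QUOTES = {"IBM": "99"}


class QuoteHandler(BaseHTTPRequestHandler):
    """Serves each quote at its own URL, e.g. /quotes/IBM (hypothetical scheme)."""

    def do_GET(self):
        prefix = "/quotes/"
        symbol = self.path[len(prefix):]
        if not self.path.startswith(prefix) or symbol not in QUOTES:
            self.send_error(404)
            return
        body = f"<price symbol='{symbol}'>{QUOTES[symbol]}</price>".encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/xml")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch_price(url):
    """A plain HTTP GET; the response body is a tiny XML document."""
    with urllib.request.urlopen(url) as resp:
        return ET.fromstring(resp.read()).text


# Port 0 asks the OS for any free port on localhost.
server = HTTPServer(("127.0.0.1", 0), QuoteHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

url = f"http://127.0.0.1:{server.server_port}/quotes/IBM"
price = fetch_price(url)
print(price)  # 99

server.shutdown()
server.server_close()
```

The client needs nothing beyond an HTTP library and an XML parser; the URL alone says which data element it wants.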

Once the web becomes a library of data elements, easily and universally accessible via URLs, the very nature of data exchange and integration will be transformed. Need to integrate the current temperature in New York City into that Java program you are writing? Just paste in its URL. Need to get a stock price into Excel? Just type in a URL from Yahoo! Finance. A website wants your shipping address? Just type in a URL (and when you change your shipping address, all the businesses that have your “shipping” URL will automatically get that change). Need to give a business partner access to your inventory levels? Just e-mail them a URL and let them figure out how they want to integrate it.
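One way to read the shipping-address example is that a business stores the URL rather than a copy of the address and dereferences it whenever it needs the value. The Python sketch below models that with a plain dictionary standing in for the web; `WEB`, `dereference`, and `Merchant` are hypothetical names. It shows only this pull-on-use behavior, so a business that wanted to be told about changes the moment they happen would need some additional notification mechanism not covered here.

```python
# Hypothetical in-memory stand-in for the web: URL -> current value.
WEB = {"http://example.com/alice/shipping": "1 Main St, Springfield"}


def dereference(url):
    """Stand-in for an HTTP GET on a data-element URL."""
    return WEB[url]


class Merchant:
    def __init__(self, name):
        self.name = name
        self.copied_address = None  # old model: keep a private copy
        self.address_url = None     # data-centric model: keep the URL

    def register(self, url):
        self.copied_address = dereference(url)
        self.address_url = url

    def ship_to(self):
        # Resolve the URL at the moment the address is actually needed.
        return dereference(self.address_url)


shop = Merchant("shop")
shop.register("http://example.com/alice/shipping")

# Alice moves and updates the one authoritative copy of her address.
WEB["http://example.com/alice/shipping"] = "9 Elm St, Shelbyville"

print(shop.copied_address)  # 1 Main St, Springfield  (stale copy)
print(shop.ship_to())       # 9 Elm St, Shelbyville   (follows the URL)
```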

This last example highlights a crucial advantage of the Data-Centric Web. Because the Data-Centric Web relies on the web to provide common API, transport, and naming, the complexity of bilateral data integration efforts declines dramatically. Businesses can simply define their data once and then let their partners pick and choose how they want to access it.

But wait, you might say, isn’t this what web services are all about? Aren’t RPC-based web services supposed to transform the web from a document-centric model to a service-centric model? And since you can usually access data via a service, aren’t web services essentially the same thing as the Data-Centric Web?

Yes, RPC-based web services can accomplish the same end goal of accessing data over the Internet, but they lack several aspects of the Data-Centric Web.

1) Web services do not make use of a universal API such as HTTP; rather, they let users define their own API and, if they want to, describe that API in a standardized way (via WSDL). The power of the Data-Centric Web is that it uses a universal API (HTTP) and a standardized naming convention (URLs). I know it’s kind of strange to say that web services standards aren’t standardized, but they aren’t!

2) Web services are not integrated into the infrastructure of the web. Over the past 10 years, the document-centric web has integrated itself not only into the Internet but into applications and databases around the world. Thousands of interfaces have been written to the web, and most of today’s popular applications are, if not web-centric, fully web-aware. For example, Microsoft Office has a powerful set of components that enable users to access the web from within its applications. By being fully “web-compliant,” the Data-Centric Web can take advantage of this massive pre-existing infrastructure as well as all of the skills and knowledge of those who maintain and contribute to it. True, Web Services are rapidly being adopted and integrated into the infrastructure, but it will be many years before they approach the level of integration that the web itself has already achieved.

3) Web Services are much less efficient than the Data-Centric Web. They are great for complex queries and business operations with lots of parameters, but they are overkill for simple data access operations. What can take thousands of lines of code and lots of cycles to accomplish with Web Services can be accomplished with one URL and a few cycles in the Data-Centric Web.
In fact, the web-unfriendly nature of RPC-based Web Services has set off a debate within the Web Services community and there are many efforts underway to make Web Services more web-friendly, mostly by using the concepts embodied within the Data-Centric Web. (This debate is often referred to as the REST vs. RPC debate, something I hope to write more about in the future.)
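The difference in weight is easy to see at the wire level. This Python sketch builds, as raw bytes, a SOAP 1.1 style request for a hypothetical `GetQuote` operation and a plain HTTP GET for a hypothetical `/quotes/IBM` URL; the service name, namespace, host, and paths are made up for illustration. It only compares the requests themselves and does not measure client-side tooling such as WSDL processing or stub generation. One structural point it does show: the RPC request POSTs to a single service endpoint and hides which data it wants inside the body, while the GET names the data element in the URL itself, where caches and other web intermediaries can see it.

```python
# Hypothetical operation name, namespace, host, and paths; for comparison only.
SOAP_ENVELOPE = """<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <GetQuote xmlns="urn:example:quotes">
      <symbol>IBM</symbol>
    </GetQuote>
  </soap:Body>
</soap:Envelope>"""


def soap_request(host, path, envelope):
    """RPC style: POST an XML envelope naming the operation and its arguments."""
    body = envelope.encode("utf-8")
    headers = (
        f"POST {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Content-Type: text/xml; charset=utf-8\r\n"
        f"Content-Length: {len(body)}\r\n"
        'SOAPAction: "urn:example:quotes#GetQuote"\r\n'
        "\r\n"
    )
    return headers.encode("ascii") + body


def rest_request(host, path):
    """Data-centric style: the URL itself names the data element."""
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "\r\n"
    ).encode("ascii")


rpc = soap_request("quotes.example.com", "/QuoteService", SOAP_ENVELOPE)
rest = rest_request("quotes.example.com", "/quotes/IBM")
print(len(rpc), len(rest))
```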

So how close are we to realizing the promise of the Data-Centric Web? Closer than you might think. Given that the Data-Centric Web hijacks much of the infrastructure built for the document-centric web, the key components are already in place. Indeed, many programmers have been employing the principles of the Data-Centric Web in their own applications for some time. What is needed for wide adoption of the Data-Centric Web is just a few “bridges” from the data world to the web world. These bridges would enable databases and applications to easily publish data elements to web servers, which would in turn assign those elements URLs and thus make them accessible to the entire web. Some of these bridges will probably be built by the open source community, others by enterprising young start-ups. (Castbridge is a particularly interesting start-up focused on this area.)
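A “bridge” of this kind could be quite thin. The Python sketch below gives every row of an in-memory SQLite table its own URL path and answers each lookup with a live query that returns a small XML fragment, so an update in the database shows up at the same URL. The table, the `/inventory/<sku>/quantity` naming scheme, and the `url_for`/`resolve` helpers are all hypothetical, and the web server itself is left out; `resolve` is the part a server’s GET handler would call.

```python
import sqlite3
from urllib.parse import quote, unquote

# Hypothetical inventory table standing in for an existing database.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE inventory (sku TEXT PRIMARY KEY, quantity INTEGER)")
db.executemany("INSERT INTO inventory VALUES (?, ?)", [("widget-1", 40), ("gadget-7", 0)])


def url_for(sku):
    """Assign each data element a stable URL path (hypothetical naming scheme)."""
    return f"/inventory/{quote(sku, safe='')}/quantity"


def resolve(conn, path):
    """Map /inventory/<sku>/quantity to a live query; return (status, body)."""
    parts = path.strip("/").split("/")
    if len(parts) != 3 or parts[0] != "inventory" or parts[2] != "quantity":
        return 404, None
    sku = unquote(parts[1])
    row = conn.execute("SELECT quantity FROM inventory WHERE sku = ?", (sku,)).fetchone()
    if row is None:
        return 404, None
    return 200, f"<quantity>{row[0]}</quantity>"


# Every row gets its own URL.
urls = [url_for(sku) for (sku,) in db.execute("SELECT sku FROM inventory ORDER BY sku")]
print(urls)  # ['/inventory/gadget-7/quantity', '/inventory/widget-1/quantity']

before = resolve(db, url_for("widget-1"))
db.execute("UPDATE inventory SET quantity = 35 WHERE sku = 'widget-1'")
after = resolve(db, url_for("widget-1"))
print(before)  # (200, '<quantity>40</quantity>')
print(after)   # (200, '<quantity>35</quantity>')
print(resolve(db, "/inventory/nope/quantity"))  # (404, None)
```

Because each lookup runs a fresh query, the data is defined once in the database and partners simply hold URLs to it.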

No matter who builds them, these bridges will unleash a new spurt of innovation and productivity on the web. Data elements will be integrated into applications and documents around the world with the ease and simplicity of typing in a URL and the whole system will work with the robustness and availability that we have come to expect from the web. I may be a geek, but I can’t wait!

January 27, 2004 in EAI, Middleware


Legal Disclaimer

The thoughts and opinions on this blog are mine and mine alone and not affiliated in any way with Inductive Capital LP, San Andreas Capital LLC, or any other company I am involved with. Nothing written in this blog should be considered investment, tax, legal, financial or any other kind of advice. These writings, misinformed as they may be, are just my personal opinions.