Yes, it’s all about data but…

There has been a lot of commentary lately about “Web 2.0”, the future of the PC (thin vs fat clients), software as a service, etc. Many of our technology thought leaders rave about the ease with which new ideas are developed on top of Google’s, MSN’s, Yahoo!’s and other similar Web-based services, the rapid adoption of RSS and Atom as the means for content distribution, and the focus on XML-based technologies and open formats as the basis for a services-for-all model for distributed applications. They keep telling us that it’s all about data now, all about knowledge representation and information models, all about service integration. We’ve had our fun with the Web as merely a means of accessing information; it’s now time for the Web to morph into a huge data and service integration platform on top of which even more new services are built and delivered.

I have no reason to doubt that the future will look exactly as the visionaries out there have been describing it. In fact, I agree with all of the above.

However, I feel the need to ask: in this new, data-is-the-emphasis world, who is in control of that data? We are already seeing the software-as-a-service world forming and it seems, at least to me, that the service providers are more concerned with attracting as many users as possible, by offering more and more space for their data (mail, photos, videos, blogs) and by building better management interfaces, better integration with desktop applications, etc., than with building the future of the Web.

While free services are great for consumers, one has to wonder how long they will last. What will happen next? What is the ulterior motive behind such great offerings? It has to be financial gain, of course. In most cases, it is advertising that makes it all worthwhile for those behind the services. But then, what happens if the advertising industry moves on to other media, or if the cost of maintaining a service can no longer be covered by the ad-related income? What happens when a free service decides to move to a subscription model? Look what happened after Yahoo! bought flickr: even the requirement to use Yahoo! IDs angered flickr’s existing users. If Yahoo! or any other service decides to enforce new policies, introduce payment, or even discontinue its offerings, there is no one to stop it because it is in control.

“Move to another service” I hear you say. Indeed, in an open market with healthy competition we should have alternatives. But is it that easy? What happens to our data? Yes, I can move all my photographs to another service. I can probably do the same with all my email and perhaps my blog entries. But is it really that easy?

Once we publish our data and make it available for others to access, we are making it part of this global network of knowledge and information that is being built around us. Once our data is out there, it can be linked and used as part of other data or even referred to in other media (e.g. references in research papers). I think that with the ability to publish data comes great responsibility. Once data is shared and consumed, it should always be available.

Permanent HTTP links (permalinks) are supposed to be the current solution to this problem. There is a one-to-one association between the published data and a link. However, data published through a service is usually associated with a link whose structure is controlled by that service (e.g. a company-controlled domain name). And even if the service allows us to use our own domain names, it is still very difficult to move to a new service because there are no standard formats for sharing this published information. I can’t just say to my photo-hosting service, “please give me back all my data in a format that I can take to your competitor and host there”.

This is the reason I continue to host my own blog under a domain name that I control. Once MSN Spaces or Blogger.com or any other service allows me to use ITS service to publish MY data under my own terms, then I will think about moving. I am sure there are services out there that offer this kind of functionality but I am really talking about the big guys and the practices that are in place. I am willing to pay for a data hosting service if I am in full control of my data and its identity.

Of course, another solution is to move away from the ‘address is the identity’ practice we seem to have adopted because of the Web. We could start using PURLs or, even better, URNs to identify our data. Our data identities wouldn’t be coupled to a particular transport/application protocol, as is mostly the case now (i.e. HTTP). Of course, resolution services will have to be deployed throughout the Internet for the solution to work. But wait, this is supposed to be a service-oriented world, right? 🙂
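
To make the idea a little more concrete, here is a minimal sketch in Python of what such a resolution service might look like at its core. Everything in it is an assumption made for illustration: the `UrnResolver` class, the `urn:example:...` names and the host URLs are all hypothetical, and the URN check is a loose simplification rather than a full validator. The point is only that the identity (the URN) stays fixed while the location behind it can change.

```python
import re

# Loose check for the general shape "urn:<namespace>:<name>".
# This is a simplification for illustration, not a complete URN validator.
URN_SHAPE = re.compile(r"^urn:[a-z0-9][a-z0-9-]{0,31}:\S+$", re.IGNORECASE)


class UrnResolver:
    """Hypothetical in-memory resolver: maps stable names to current locations."""

    def __init__(self):
        self._locations = {}

    @staticmethod
    def _normalize(urn):
        # The "urn" prefix and the namespace are treated as case-insensitive;
        # the namespace-specific part is left untouched.
        prefix, namespace, name = urn.split(":", 2)
        return f"{prefix.lower()}:{namespace.lower()}:{name}"

    def register(self, urn, url):
        """Bind a URN to a location, replacing any earlier binding."""
        if not URN_SHAPE.match(urn):
            raise ValueError(f"not a URN: {urn!r}")
        self._locations[self._normalize(urn)] = url

    def resolve(self, urn):
        """Return the current location for a URN (KeyError if unknown)."""
        return self._locations[self._normalize(urn)]


resolver = UrnResolver()

# The photo is first published through one (hypothetical) hosting service.
resolver.register("urn:example:photos:2005-sunset",
                  "https://photos.old-host.example/u/42/sunset.jpg")

# Later the owner moves to a competitor; only the binding changes.
resolver.register("urn:example:photos:2005-sunset",
                  "https://gallery.new-host.example/me/sunset.jpg")

# Anyone who linked to the URN still finds the photo at its new home.
current_location = resolver.resolve("URN:EXAMPLE:photos:2005-sunset")
```

A real deployment would of course need distributed, trusted resolvers rather than a dictionary in memory, which is exactly the infrastructure question raised above.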

What do you think?