New York Semantic Web meetup

The LinkedData Planet 2008 conference is now over. I think it’s the first industry conference on the Semantic Web (or at least among the first few). I was invited by the good folks of the New York Semantic Web Meetup Group to participate in a panel discussion. Tim Berners-Lee was a surprise addition to the panel and was sitting next to me.

You can always count on Savas to disagree in public with major figures in the community 🙂 It was an interesting discussion, which I very much enjoyed. I do hope the folks who stayed till 8.30pm to hear what we had to say enjoyed it too.

I think that the Semantic Web community has a huge hammer, which happens to be a good one (e.g. RDF, OWL), and everything looks like a nail. I am personally more interested in the “semantic” part of the “Semantic Web” space. I see the latter as an ecosystem of technologies to enable the representation and management of information/knowledge facts. They are just technologies. Through the work I am currently focusing on with people like Evelyne Viegas, I am trying to showcase the value of semantic computing (internally to Microsoft and externally).

It’s fantastic that the W3C is working towards standards, which are a great way to achieve interoperability. However, the world won’t switch overnight. The Microsoft Research team, of which I am part, will do its absolute best to support as many of those standards as possible in our work, within the limits of the resources we have available.

There is already a lot of information out there; some of it is already structured (e.g. microformats, data adhering to domain-specific vocabularies and captured in XML, etc.) but most of it is unstructured. We can make use of the structured information directly and apply techniques such as machine learning, latent semantics, and entity extraction to create structure where none exists. We could then demonstrate the value of what is possible if all information were represented in a structured manner (e.g. information inferencing, reasoning, better machine learning, arbitrary information correlation, etc. over the Web), effectively realizing Tim’s vision of the Giant Global Graph :-), or what I’ve been calling over time “Web Overlays”, “Data Networking”, “Data Mesh”; they all refer to the same thing… a huge graph of knowledge facts that spans the Web, the graph of all computer-representable, interconnected information.

Over time, more and more information and knowledge will be structured, perhaps with the W3C standards being used as the pervasive formats. I think there is more value to be gained by trying to demonstrate the value of semantics-oriented computing rather than trying to persuade everyone out there to use RDF; that will come over time.

I think Tim misunderstood the example I tried to give during the panel discussion. I am a believer that a little semantics can go a long way. So I tried to make the argument that Google’s PageRank algorithm makes use of the global graph of hypermedia links as part of its service. If you think about it, Google has created a huge graph based on facts expressed as triples (URI links-to URI) and applies various relevance algorithms on top of that. Does it matter that the graph is not represented in RDF? Nope! The important thing is that information is inferred from what’s out there and a great service is offered as a result. It’s the linking semantics captured in HTML that are used in Google’s algorithms. And that’s just a simple example.
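To make that argument a bit more concrete, here is a minimal sketch of the idea: pull (URI links-to URI) triples straight out of plain HTML and run a simplified, textbook-style PageRank over them. This is only an illustration and certainly not how Google actually does it; the three example pages, the `LinkExtractor` class and the `link_triples` and `pagerank` functions are all names I made up for the purpose.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the href targets of <a> tags found in an HTML document."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.targets = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page's own URI.
                self.targets.append(urljoin(self.base_url, href))


def link_triples(pages):
    """Turn {url: html} into (source URI, "links-to", target URI) triples."""
    triples = []
    for url, html in pages.items():
        parser = LinkExtractor(url)
        parser.feed(html)
        triples.extend((url, "links-to", target) for target in parser.targets)
    return triples


def pagerank(triples, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration (an assumption, not Google's code)."""
    nodes = sorted({s for s, _, _ in triples} | {t for _, _, t in triples})
    if not nodes:
        return {}
    out_links = {n: [] for n in nodes}
    for source, _, target in triples:
        out_links[source].append(target)

    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Rank held by pages with no outgoing links is spread evenly.
        dangling = sum(rank[n] for n in nodes if not out_links[n])
        base = (1 - damping + damping * dangling) / len(nodes)
        new_rank = {n: base for n in nodes}
        for source, targets in out_links.items():
            for target in targets:
                new_rank[target] += damping * rank[source] / len(targets)
        rank = new_rank
    return rank


# Hypothetical three-page web, made up for illustration.
PAGES = {
    "http://example.org/a": '<a href="/b">B</a> <a href="/c">C</a>',
    "http://example.org/b": '<a href="/c">C</a>',
    "http://example.org/c": '<a href="/a">A</a>',
}

if __name__ == "__main__":
    triples = link_triples(PAGES)
    for url, score in sorted(pagerank(triples).items(), key=lambda kv: -kv[1]):
        print(f"{score:.3f}  {url}")
```

Note that no RDF is involved anywhere: the triples are plain Python tuples, and the only semantics used are the links already present in the HTML.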

My position is that we should take advantage of all the information and representations we already have out there before trying to persuade everyone to encode their data in RDF. That will come over time, when the value of representing information in a structured manner becomes too obvious, when it’d be just silly to do otherwise. Also, I think that service, application, and tool vendors need to see the value in adopting common knowledge representations on the wire. For example, could Facebook export all its data as RDF? Of course they could… they have a graph and they can represent it using any technology they want. Why don’t they use RDF then? Well, there is no benefit (yet) to their business to do so. The data is not a commodity; they have a competitive advantage in keeping that data closed or providing access to it only through their platform, in the way that is easiest for them. Over time, this will certainly change, especially given the emergence of data sharing platforms like Open Social.

In general, I have the cynical view that companies are interested in standards and interoperability only when it’s no longer to their advantage to do otherwise*. I am lucky to be working with people who are supporters of open standards. As technologists, we need to be always demonstrating the potential business opportunities around open, structured data; I am a huge fan!

* This is my personal view of course and should not be taken as my company’s position on standards.