New York Semantic Web meetup

The LinkedData Planet 2008 conference is now over. I think it’s the first industry conference on the Semantic Web (or at least amongst the first few). I was invited by the good folks of the New York Semantic Web Meetup Group to participate in a panel discussion. Tim Berners-Lee was a surprise addition to the panel and was sitting next to me.

You can always count on Savas to disagree in public with major figures in the community 🙂 It was an interesting discussion, which I very much enjoyed. I do hope the folks who stayed till 8:30 pm to hear what we had to say enjoyed it too.

I think that the Semantic Web community has a huge hammer, which happens to be a good one (e.g. RDF, OWL), and everything looks like a nail. I am personally more interested in the “semantic” part of the “Semantic Web” space. I see the latter as an ecosystem of technologies that enable the representation and management of information and knowledge facts. They are just technologies. Through the work I am currently focusing on with people like Evelyne Viegas, I am trying to showcase the value of semantic computing (internally to Microsoft and externally).

It’s fantastic that the W3C is working towards standards, which are a great way to achieve interoperability. However, the world won’t switch overnight. The Microsoft Research team, of which I am part, will do its absolute best to support as many of those standards as possible in our work, within the limits of the resources we have available.

There is already a lot of information out there; some of it is already structured (e.g. microformats, data adhering to domain-specific vocabularies and captured in XML, etc.) but most of it is unstructured. We can make use of the structured information directly and apply techniques such as machine learning, latent semantics, and entity extraction to create structure where none exists. We could then demonstrate what is possible if all information were represented in a structured manner (e.g. inferencing, reasoning, better machine learning, arbitrary information correlation, etc. over the Web), effectively realizing Tim’s vision of the Giant Global Graph :-), or what I’ve been calling over time “Web Overlays”, “Data Networking”, “Data Mesh”. They all refer to the same thing… a huge graph of knowledge facts that spans the Web, the graph of all computer-representable, interconnected information.

Over time, more and more information and knowledge will be structured, perhaps with the W3C standards being used as the pervasive formats. I think there is more value to be gained by trying to demonstrate the value of semantics-oriented computing rather than trying to persuade everyone out there to use RDF; that will come over time.

I think Tim misunderstood the example I tried to give during the panel discussion. I am a believer that a little semantics can go a long way. So I tried to make the argument that Google’s PageRank algorithm makes use of the global graph of hypermedia links as part of its service. If you think about it, Google has created a huge graph based on facts expressed as triples (URI links-to URI) and applies various relevance algorithms on top of that. Does it matter that the graph is not represented in RDF? Nope! The important thing is that information is inferred from what’s out there and a great service is offered as a result. It’s the linking semantics captured in HTML that are used in Google’s algorithms. And that’s just a simple example.
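To make the point concrete, here is a minimal sketch of ranking pages from nothing more than (URI links-to URI) triples. It is the textbook power-iteration form of PageRank, not a description of what Google actually runs; the example.org URIs, the "links-to" label, and the damping factor of 0.85 are assumptions chosen for illustration.

```python
from collections import defaultdict


def pagerank(triples, damping=0.85, iterations=50):
    """Textbook power-iteration PageRank over (source, "links-to", target) triples."""
    out_links = defaultdict(set)
    nodes = set()
    for source, predicate, target in triples:
        if predicate != "links-to":
            continue  # ignore any other kind of fact
        out_links[source].add(target)
        nodes.update((source, target))

    n = len(nodes)
    if n == 0:
        return {}

    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by pages without outgoing links is spread evenly over all pages.
        dangling = sum(rank[node] for node in nodes if node not in out_links)
        new_rank = {node: (1 - damping) / n + damping * dangling / n for node in nodes}
        for source, targets in out_links.items():
            share = damping * rank[source] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical link facts, written as plain tuples rather than RDF.
LINKS = [
    ("http://example.org/a", "links-to", "http://example.org/b"),
    ("http://example.org/a", "links-to", "http://example.org/c"),
    ("http://example.org/b", "links-to", "http://example.org/c"),
    ("http://example.org/c", "links-to", "http://example.org/a"),
]

if __name__ == "__main__":
    ranks = pagerank(LINKS)
    for uri in sorted(ranks, key=ranks.get, reverse=True):
        print(f"{ranks[uri]:.3f}  {uri}")
```

Nothing here depends on how the triples are serialized: the same function would work whether the links were scraped from HTML anchors or read from an RDF store.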

My position is that we should take advantage of all the information and representations we already have out there before trying to persuade everyone to encode their data in RDF. That will come over time, when the value of representing information in a structured manner becomes too obvious, when it’d be just silly to do otherwise. Also, I think that service, application, and tool vendors need to see the value in adopting common knowledge representations on the wire. For example, could Facebook export all its data as RDF? Of course they could… they have a graph and they can represent it using any technology they want. Why don’t they use RDF then? Well, there is no benefit (yet) to their business to do so. The data is not a commodity; they have a competitive advantage in keeping that data closed or providing access to it only through their platform, in the way that is easiest for them. Over time, this will certainly change, especially given the emergence of data sharing platforms like OpenSocial.
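As a rough illustration of how small the technical step is, the sketch below writes a friendship graph out as N-Triples using the FOAF "knows" property. The people URIs and the edge list are hypothetical, and this says nothing about how Facebook actually stores or exposes its data.

```python
# FOAF property for "this person knows that person".
FOAF_KNOWS = "http://xmlns.com/foaf/0.1/knows"


def to_ntriples(edges, predicate=FOAF_KNOWS):
    """Serialize (subject_uri, object_uri) pairs as N-Triples, one statement per line."""
    return "\n".join(f"<{subject}> <{predicate}> <{obj}> ." for subject, obj in edges)


# Hypothetical social graph held as plain pairs of profile URIs.
FRIENDSHIPS = [
    ("http://example.org/people/alice", "http://example.org/people/bob"),
    ("http://example.org/people/bob", "http://example.org/people/carol"),
]

if __name__ == "__main__":
    print(to_ntriples(FRIENDSHIPS))
```

The hard part is not the serialization; it is the business decision to publish the graph at all.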

In general, I have the cynical view that companies are interested in standards and interoperability only when it is no longer to their advantage to do otherwise*. I am lucky to be working with people who are supporters of open standards. As technologists, we always need to demonstrate the potential business opportunities around open, structured data; I am a huge fun!

 

* This is my personal view of course and should not be taken as my company’s position on standards.

5 responses to “New York Semantic Web meetup”

  1. You are indeed a huge fun! 😀

    I think you’re only partially right about the standards thing though – I have an even more pessimistic view of standards adoption than you do – I don’t think RDF will come over time unless there’s a compelling reason for the end-user to want their data exposed in that way.

    And as most people don’t care, there’s very little reason for companies to expose their data for free so anyone can compete (witness the “opening” of Word, Excel, etc document formats in your own venerable institution..)

    Cheers,

    Einar

    p.s. How was Ithaca?

  2. Savas,

    Have you looked into the simple correlation of the “Linked Data Web” and Data Source Naming à la ODBC, but with the additional benefits of:

    1. Naming records in addition to Tables, Views, and Stored Procedures

    2. Use of HTTP in naming scheme so that named record access expands to the Web

    I think points 1&2 are things you and Microsoft already understand. What is sometimes missing is the fact that we are all talking about the same thing in different ways.

    The Web just needs to add the ability to see the “Data Sources” behind “Web Pages” , and the best way to deliver this is via the Web’s universal client: the Web Browser. Once Web Browsers add the “View Data Behind” feature alongside “View Page Source” we are there!

    I suspect our paths never crossed at Linked Data Planet, as I demonstrated a Firefox extension that exposes the Linked Data Webs behind existing Web pages.

  3. Hi Kingsley, I am sorry I missed you. This sounds interesting.

    Hey Einar… Ithaca was fun but it rained a lot 🙁

  4. Savas,

    I’m sorry we didn’t have more time to talk during LinkedData Planet.

    BTW, if you’ve extended Firefox with Semantic Radar and Piggy Bank, you can see the RDF behind the LinkedDataSummit.com web pages that discuss the LinkedData Planet program:

    http://www.linkeddatasummit.com/events/LDP2008/LinkedDataPlanet_NY.htm

    and the pages on the menu on the left.

  5. Hi Savas, regrettably I couldn’t make L.D.P. (although a couple of colleagues – Paul Miller and Ian Davis were there).

    I’m very pleased to hear of your interest in the Semantic Web – in the past MS (Research) have come *so close* to this approach with WinFS (and its descendants) and especially Project Astoria (Pablo Castro & co).

    As I believe you’re UK-based, perhaps it would be possible for you to visit our HQ in Birmingham for a chat on the subject. (In this context right now we’re very much developers/enthusiasts/researchers, certainly not salesmen, so nothing to worry about there 🙂

    http://www.talis.com/platform/

    Anyhow, feel free to ping me – I’d love to hear more on your thoughts around this tech: danny.ayers@gmail.com