Web overlays: Some thoughts about machine-processable knowledge representation

(I have been pondering whether I should post this for some time now. Let’s see what happens.)

A while back I read the “Google exec challenges Berners-Lee” article and tried hard to find any real challenge to Tim Berners-Lee’s position. I am very much a believer in machine-processable knowledge representation (perhaps influenced by the books I’ve been reading lately). So what if people/companies put false information out there by accident or try to misdirect us on purpose? So what if knowledge is not captured as accurately as we might have liked? It’s not the fault of the technology or the thinking behind it. Deception, misinformation, wrong facts, etc. are all part of our daily lives (electronic or not), part of our society, an unfortunate reality of our times. In the same way that we educate ourselves to form a well-rounded opinion about current affairs and learn to be critical of news/fact sources in the real world, we should arm ourselves with the best tools technology has to offer in the electronic world. We could use advances in software/architecture to automate the process of keeping the culprits cornered in our electronic world, to filter them out. I guess junk filters are already doing something very similar (the >300 messages per day automatically removed by my mail provider are testament to the way automation helps me). We should use the technology to our advantage rather than attack technological advances because of their possible misuses.

I related the above article to some of the thinking I have been doing over the last few months around the Web and knowledge representation (aka ‘Semantic Web’). I have hinted in past posts at the concept of ‘Web overlays’ that has been torturing my brain for a while. The following is a short summary of what the concept represents (if there is anything there at all, that is).

What are Web overlays?

Going into the future, I see hubs of information on the Web which will be collecting and processing representations of knowledge from all around the world. The next search engine will not just do text-based search but will also interpret information; it will give us semantically-rich results; it will be able to automatically reason about the information it collects, interpret it within different contexts, and tailor its answers to our domain-specific needs. Actually, we are already seeing examples of collaborative or semantics-based search engines.

I believe that social networks, tagging, and various semantic annotation technologies are hints towards a “knowledge inference” future: webs of information overlaid on top of the same data, interpreted in different contexts, knowledge or new information automatically inferred and delivered, an electronic world where people on the Web are as much producers of information as they are consumers (with the latter being mostly the case today).

The syndication paradigm (feeds and permalinks) in combination with Semantic Web concepts/ideas can be used to introduce new information layers on top of the traditional Web; these are called ‘Web overlays’. The concept doesn’t really represent anything new; it’s nothing radical. It’s about using microformats to ontologically capture information, URIs to correlate instances of the captured information, and syndication technologies to consume the produced information. Web overlays are created by combining well-known tags or URIs to create relationships between lists (directly or indirectly). URIs here do not necessarily mean HTTP (i.e. a simple pointer to some resource); instead, a URI represents a relationship: it is an ontology-backed reference to some representation, a resource, a person, some knowledge, a concept, etc. For example, “this document contains the list of books I am about to read”, with each book identified as an ISBN URI [1]. Or, “this is the list of emotions I went through while watching this movie”, with both the emotions and the movie identified as URIs. An Overlay Web is not just about following links as REST teaches us; it’s not about state transfer and it’s definitely not about a hypermedia state machine. It’s all about resource representations, correlations between concepts and captured information, an expanding network of facts/statements about all aspects of our lives, a distributed network for captured knowledge.
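
To make the correlation idea a little more concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `Statement` shape, the `to-read` vocabulary URI, the example.org feed URIs, and the ISBN URNs (which are placeholders, not real ISBNs). It only shows how two independently published lists end up related because they mention the same URI.

```python
from collections import defaultdict
from typing import NamedTuple


class Statement(NamedTuple):
    source: str    # URI of the list/feed that made the statement
    relation: str  # URI naming the relationship (hypothetical vocabulary)
    target: str    # URI of the thing the statement is about


# Hypothetical vocabulary term and placeholder identifiers (not real ISBNs).
TO_READ = "http://example.org/vocab/to-read"
ALICE = "http://example.org/alice/reading-list"
BOB = "http://example.org/bob/reading-list"

statements = [
    Statement(ALICE, TO_READ, "urn:isbn:0000000001"),
    Statement(ALICE, TO_READ, "urn:isbn:0000000002"),
    Statement(BOB, TO_READ, "urn:isbn:0000000001"),
]


def build_overlay(statements, relation):
    """Group the sources that share a target URI under one relationship."""
    overlay = defaultdict(set)
    for s in statements:
        if s.relation == relation:
            overlay[s.target].add(s.source)
    return dict(overlay)


overlay = build_overlay(statements, TO_READ)
# The shared ISBN URI is what correlates Alice's and Bob's lists.
print(sorted(overlay["urn:isbn:0000000001"]))
```

Neither list links to the other; the relationship between them exists only in the overlay, which is built from the URIs they have in common.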

Future aggregators will be responsible for processing and trying to make sense of all the captured knowledge and for helping us manage it, reason about it, and infer new facts based on it. For example, we should be able to mine information on the Web in order to answer questions like “what was the most popular book over the last month?”, “how do teenagers feel about this movie?”, “who else has photographs of this building?”, “what do people of this country feel about the potential introduction of a new law on subject X?”, etc. Some of these questions can already be answered to a certain extent using today’s technologies. I have a suspicion that social networking is going to be the motivating factor for the adoption of Semantic Web related ideas, like the Web overlays. The processing of the information people produce is going to be at the centre of the next evolution on the Web rather than the consumption of the information which is already out there.
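
As a rough illustration of the simplest of these questions, “what was the most popular book over the last month?”, the following Python sketch counts mentions of book URIs in syndicated entries. It rests on several assumptions: that an aggregator has already reduced feed entries to `(published, book_uri)` pairs, that “popular” just means “mentioned most often”, and that a month is 30 days. The dates and ISBN URNs are invented placeholders.

```python
from collections import Counter
from datetime import datetime, timedelta

# Hypothetical, already-parsed feed entries: (publication date, book URI).
entries = [
    (datetime(2006, 6, 20), "urn:isbn:0000000001"),
    (datetime(2006, 6, 25), "urn:isbn:0000000002"),
    (datetime(2006, 6, 28), "urn:isbn:0000000002"),
    (datetime(2006, 3, 1), "urn:isbn:0000000001"),  # older than the window
    (datetime(2006, 3, 2), "urn:isbn:0000000001"),  # older than the window
]


def most_popular(entries, now, window=timedelta(days=30)):
    """Return (uri, mentions) for the most-mentioned URI in the window, or None."""
    cutoff = now - window
    counts = Counter(uri for published, uri in entries if published >= cutoff)
    return counts.most_common(1)[0] if counts else None


now = datetime(2006, 7, 1)
print(most_popular(entries, now))
```

The questions about how people feel would need far more than counting, but the starting point is the same: statements from many sources, keyed by shared URIs.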

I am thinking of writing a paper on this topic (as always, working together with Jim) and submitting it for consideration to WWW07. This is an unconventional way of going about it. I am effectively announcing the intention to write a paper rather than starting to talk about it after it has been written and peer-reviewed 🙂 I am hoping to further describe the idea in various blog posts (if people think that it makes sense and it’s not something completely wacky) and also release specs and technology to support it. I am counting on the community’s participation and feedback. If people are interested in contributing, they should contact me. Let’s see if this is going to work. It could all be a disaster (i.e. there is no merit or anything new in the ‘Web overlays’ concept) or something interesting could come out of it (if nothing else, it might at least attract attention to the Semantic Web). No matter what, I am hoping that interesting discussions/food for thought will emerge.

–

[1] The URIs could indeed be HTTP ones as per the discussion on identifiers/links that took place on this blog a few weeks back.

Savas Parastatidis

Savas Parastatidis works at Amazon as a Sr. Principal Engineer in Alexa AI. Previously, he worked at Microsoft where he co-founded Cortana and led the effort as the team's architect. While at Microsoft, Savas also worked on distributed data storage and high-performance data processing technologies. He was involved in various e-Science projects while at Microsoft Research where he also investigated technologies related to knowledge representation & reasoning. Savas also worked on language understanding technologies at Facebook. Prior to joining Microsoft, Savas was a Principal Research Associate at Newcastle University where he undertook research in the areas of distributed, service-oriented computing and e-Science. He was also the Chief Software Architect at the North-East Regional e-Science Centre where he oversaw the architecture and the application of Web Services technologies for a number of large research projects. Savas worked as a Senior Software Engineer for Hewlett Packard where he co-led the R&D effort for the industry's Web Service transactions service and protocol. You can find out more about Savas at https://savas.me/about