savas parastatidis

Merging data graphs – myExperiment & Research-Output Repository Platform

2008-04-03

In a recent post, I briefly discussed our "research-output" repository platform and linked to a video of a demo visualization of the data graph stored in the repository. I had previously used the same WPF technology to visualize the myExperiment data graph. Since I've been talking about "data networks" a lot, I thought it would be a nice demonstration of the ideas if I were to merge the myExperiment data graph with our repository's. Rather than take a copy of all the data and put it in one store, the idea is to merge the graphs dynamically. The myExperiment data is accessed through their Web API (thanks to Jits Bhagat and Don Cruickshank for implementing the Web API and for their help). Our platform does not have a Web API yet, apart from RSS/Atom feed and OAI-PMH support, since we are concentrating on the performance of the core for our Milestone 1. A Web API will be available in Milestone 2, though, most probably based on ADO.NET Data Services (Astoria).

There is nothing really new in the idea of joining graphs. Social networking sites and applications built on top of social graphs do this type of thing a lot. However, I believe that if we take advantage of infrastructures like those that Google, Yahoo, and Microsoft have built, we could do this type of data networking at Internet scale. Combined with technologies such as machine learning, entity extraction, collective intelligence, and statistics, the future of data processing and analysis looks very interesting :-)

What you see in the screencast is a query against our research-output repository (e.g. "Goble" for Carole Goble). If an entry is found with a URI pointing to myExperiment (e.g. "http://www.myexperiment.org/..."), then the data for that user is retrieved through the myExperiment Web API (that's why you'll notice some delay in the queries). Whenever a node is expanded, a new query is issued against both our repository and the myExperiment data store. This way, a person appears to have connections with other users (from myExperiment), with workflows (myExperiment), articles (our repository), publications (our repository), tags, lectures, etc.
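To make the "merge on expansion" idea a bit more concrete, here is a minimal Python sketch of how it could work. This is not the code behind the demo (which is WPF-based); the class name, the lookup functions, the "repo:" identifiers, and the sample URIs are all made up for illustration, and the two dictionaries simply stand in for our repository and the myExperiment Web API.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Set, Tuple

# Assumption: entries that live in myExperiment are recognised by URI prefix.
MYEXPERIMENT_PREFIX = "http://www.myexperiment.org/"

Edge = Tuple[str, str, str]  # (source URI, relation label, target URI)
Lookup = Callable[[str], Iterable[Edge]]


@dataclass
class MergedGraph:
    """A graph filled in lazily from two sources; nothing is copied up front."""
    edges: Set[Edge] = field(default_factory=set)
    expanded: Set[str] = field(default_factory=set)

    def expand(self, uri: str, local: Lookup, remote: Lookup) -> List[Edge]:
        """Query the sources for one node and return the edges that are new."""
        if uri in self.expanded:
            return []  # already expanded, so no further queries are issued
        self.expanded.add(uri)
        found = list(local(uri))
        # Only nodes that point at myExperiment trigger the (slow) remote call.
        if uri.startswith(MYEXPERIMENT_PREFIX):
            found.extend(remote(uri))
        new_edges = [edge for edge in found if edge not in self.edges]
        self.edges.update(new_edges)
        return new_edges


# Hypothetical sample data standing in for the two stores.
PERSON = "repo:person/example"
USER_URI = MYEXPERIMENT_PREFIX + "users/example"
WORKFLOW_URI = MYEXPERIMENT_PREFIX + "workflows/example"

LOCAL_DATA = {
    PERSON: [(PERSON, "sameAs", USER_URI), (PERSON, "authorOf", "repo:article/example")],
}
REMOTE_DATA = {
    USER_URI: [(USER_URI, "uploaded", WORKFLOW_URI)],
}


def local_lookup(uri: str) -> List[Edge]:
    """Stand-in for a query against the repository."""
    return LOCAL_DATA.get(uri, [])


def remote_lookup(uri: str) -> List[Edge]:
    """Stand-in for a call to the myExperiment Web API."""
    return REMOTE_DATA.get(uri, [])


graph = MergedGraph()
first = graph.expand(PERSON, local_lookup, remote_lookup)
second = graph.expand(USER_URI, local_lookup, remote_lookup)
```

Expanding the person node only touches the local store; expanding the myExperiment user node is what triggers the remote lookup, which is where the delay in the screencast comes from.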

You'll notice that for "lecture" resources, the video of the presentation is automatically played (ok... I had to use videos from the Web in my sample data rather than real presentations; however, in our internal tests we are planning to use MSR's HUGE database of lectures). BTW... the video is truly retrieved from the repository dynamically (I don't cheat by providing a direct link to the media file, or anything like that).

A Silverlight version of the same functionality will also be available once a newer beta of ADO.NET Data Services is released. Until then, however, I will continue experimenting with our current infrastructure and Silverlight.

Update: Corrected some typos and grammar (was wayyyy too tired when I wrote the original entry :-)