savas parastatidis

Relationships can have properties as well

2008-03-26

I was asked a very good question by the people who are going to be using our "research-output" platform. They want to be able to capture information like this: "Paper P was authored by Author A while A was a Microsoft employee". The use case is obvious. At Microsoft, as in any other organization, people come and go. It is important to be able to capture whether a researcher's stored work was undertaken while they were an employee of the company.

Our "research-output" platform does not support this type of information explicitly. This is because we are not building an identity system. Instead, we expect that information about people is stored somewhere else (e.g. Active Directory, LDAP, etc.). Applications built on top of us make use of our API to capture the necessary information so that they can relate the information in our store with that residing elsewhere (through the use of URIs, for example). So how can we support the above use case? Well, there are a few approaches. Please allow me to expand on my favorite one.

As I said in my post Microsoft and "Research-Output" Repositories, our "research-output" platform stores relationships between resources. I also suggested that our model is extensible: additional information can be attached to a relationship. For example, a triple of the form

    <Subject, Predicate, Object>

can have additional information associated with it in the form of name-value pairs (any number of them). This is our way of enabling developers to associate extra information with a relationship. In our initial release, the 'value' in the pair can only be a string. We are going to think about how we can support typed values as well (not very easy).

    <Subject, Predicate, Object, [<name, value>]*>

The above scenario can now be represented through the following tuple:

    <Paper P, Authored By, Author A, <While at Microsoft, True>>
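To make the shape of such a tuple concrete, here is a minimal sketch in Python. It is only an illustration under assumptions: the `Relationship` class and its field names are hypothetical and are not the platform's actual API. The one detail taken from the description above is that property values are plain strings.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Relationship:
    """A <Subject, Predicate, Object> triple plus optional name-value pairs.

    Hypothetical illustration only; not the platform's actual API.
    """
    subject: str
    predicate: str
    obj: str
    # Values are plain strings, mirroring the string-only restriction
    # described for the initial release.
    properties: Dict[str, str] = field(default_factory=dict)


# <Paper P, Authored By, Author A, <While at Microsoft, True>>
authored = Relationship(
    subject="Paper P",
    predicate="Authored By",
    obj="Author A",
    properties={"While at Microsoft": "True"},
)

print(authored.properties["While at Microsoft"])
```

Because the value is a string, an application reading "True" has to interpret it itself, which is one reason typed values would be useful.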

Another scenario is the ordering of authors. Imagine you have the following triples:

    <Paper P, Authored By, Author A>
    <Paper P, Authored By, Author B>

What is the order of the authors? We can add a name-value pair to indicate the relative ordering between Objects when the Subject and the Predicate are the same.

    <Paper P, Authored By, Author A, <order, 2>, <While at Microsoft, True>>
    <Paper P, Authored By, Author B, <order, 1>, <While at Microsoft, False>>

Actually, we thought that the ordering scenario was common enough that we made it a typed property of our "Relationship" class.
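As a rough sketch of how an application might use such an ordering, the Python below models the order as a typed (integer) attribute and sorts the Objects that share a Subject and Predicate. Everything here is an assumption made for illustration: the `Relationship` class, the `order` field name, and the `ordered_objects` helper are hypothetical and do not describe the platform's real "Relationship" class.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Relationship:
    """Hypothetical relationship with a typed ordering attribute."""
    subject: str
    predicate: str
    obj: str
    order: Optional[int] = None  # assumed name for the typed property
    properties: Dict[str, str] = field(default_factory=dict)


def ordered_objects(
    relationships: List[Relationship], subject: str, predicate: str
) -> List[str]:
    """Return the Objects for one Subject/Predicate pair, sorted by order.

    Relationships without an order are placed last.
    """
    matching = [
        r for r in relationships
        if r.subject == subject and r.predicate == predicate
    ]
    matching.sort(key=lambda r: (r.order is None, r.order or 0))
    return [r.obj for r in matching]


rels = [
    Relationship("Paper P", "Authored By", "Author A", order=2,
                 properties={"While at Microsoft": "True"}),
    Relationship("Paper P", "Authored By", "Author B", order=1,
                 properties={"While at Microsoft": "False"}),
]

print(ordered_objects(rels, "Paper P", "Authored By"))
# prints ['Author B', 'Author A']
```

Keeping the order as an integer rather than a string avoids the usual pitfall of "10" sorting before "2".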

Comments and feedback are always more than welcome.

6 responses to “Relationships can have properties as well”

Kent

2008-03-26

Rather than having additional information associated with a triple using a different convention (comma separated name/value pairs), why not just “reify” the original triple as a subject and record predicates and objects against that reified subject (such as {{reified authored-by-triple}, Author’s institution, Microsoft})? (BTW, I’m sure “While at Microsoft” is just for explanatory purposes as the “While at ..” list could get really long in the real world!)
Savas Parastatidis

2008-03-26

Hi Kent,

You are absolutely right. This is one of the approaches. I thought of writing about it and then explaining why we can’t really support it, but I decided to focus on the solution above.

Unfortunately, this is where we are paying the penalty of flexibility vs performance by adopting a hybrid “relational <-> triple” store model. Our relationships are not stored as resources.

There is a way to do this by creating resources to capture the relationships, since relationships are uniquely identified, but it requires some more involvement by the programmer, so I preferred to showcase the above solution.

But you are absolutely correct. Your way is another possible way of doing it (less performant than the one I presented but more flexible).

And yes… “while at Microsoft” was for illustration purposes. I should have used something else 🙂

thanks,

.savas.
Tim Berners-Lee

2008-03-28

Just use Notation3. You don’t need to add lots more columns to your triple, once subgraphs become first class objects. Then they can be the subject (and/or object) of arbitrary statements.

<21401781-24ec-4b36-8dc7-b9fca72c2e3d.aspx>

dc:creator
Danny

2008-03-28

I’m not going to disagree with Tim (no sir!) but just to add that even without using N3 as-is, the general named graphs approach does make for neat solutions.

seeAlso: http://www.hpl.hp.com/techreports/2004/HPL-2004-57R1.html

(I’m pretty sure Carroll/Bizer have a few other papers around on named graphs)

I think the specific example you gave here might lend itself to expressing as an n-ary relation while staying within the triples model.

http://www.w3.org/TR/swbp-n-aryRelations/
Savas Parastatidis

2008-03-28

Hi Tim, Danny,

I presented the example the way I did for the benefit of those who aren’t very familiar with semantic computing technologies. We do not really create new columns internally. We do not have a triple store either since we try to optimize performance as much as possible through the relational model.

We can totally serialize the information using N3 or any other approach. Our triples get a unique identifier automatically, so it would be possible to identify them as resources and make them part of a named graph.

At the moment, we want to make sure that we are efficient in managing the data for the specific domain we are addressing and then we’ll focus on how to serialize the information using interop solutions that exist out there. We are also looking at OAI-ORE.

Cheers,

.savas.
Robert Barta

2008-03-28

The RDF-folks will probably look for their rotten tomatoes to throw at me ;-), but with Topic Maps attaching semantic information to an association is already a first class concept.

And there is also something called “scope” there, which allows to say “while someone was employed somewhere, this statement is true”. It can be argued whether it covers the use case outlined above.

Ad Kent.Fitch: “reification in RDF” is not a scalable solution, because you would need both, the original statement and the reified one with the meta information attached.

Let the tomatoes fly.