“Names and addresses” – a different view

I have discussed the issue of identity, naming, and addresses for large-scale distributed systems in the past (e.g. “Loose-coupling through the relaxation of endpoint assumptions”, “Comparing S-O and O-O as design principles and not as implementation technologies”, “WS-Web (The Web using SOAP:-)”, and more). It’s a topic that keeps coming up. It was part of our WS-GAF arguments back in my Newcastle days. I think it’s at the heart of every discussion/argument between service-oriented and resource/object-oriented advocates, even if it’s not explicitly identified as such.

I recently read Norman Walsh’s post on “Names and addresses” where he advocates the use of HTTP URIs as names. I totally agree with his statement that “URIs are names”. Great! However, I disagree with the argument that HTTP URIs could serve as a general-purpose naming mechanism.

URIs as names

URIs could indeed be treated as opaque strings for naming purposes. However, URI schemes were invented so that structure and meaning could be attached to such opaque strings, making them processable and interpretable by both machines and humans. Such strings can then be used as names or identifiers (depending on the context) according to the needs of particular application domains. The string “123 456 7890” could indeed be used as a name. But in real life we usually associate some context with that number (in some cases we even need to say that it’s a “number” rather than a series of Unicode characters). Similarly, in computer systems we associate prefixes (if we use URIs) like “tel” (Telephone) or “ssn” (Social Security Number) and as a result the numbers take on a completely different meaning. Of course one can try to dial my SSN, or I could try to charge my book order to my telephone number. That would be an indication that the semantics of the context within which the number is used are not understood. The prefixes give such context to the number. It’s still a name, but one which can be used within particular application domains to convey meaning (provided, of course, that there is a specification defining the semantics of the prefix and that the specification is understood).
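To make the point about prefixes concrete, here is a minimal sketch in Python. It assumes a tiny application that only understands the “tel” and “ssn” prefixes used above; the handler functions and their return strings are hypothetical and stand in for whatever a real specification would define. Any other prefix is left as an opaque name.

```python
def split_name(name):
    """Split 'prefix:value' into (prefix, value); the prefix is case-insensitive."""
    scheme, sep, value = name.partition(":")
    if not sep or not scheme:
        raise ValueError(f"no prefix found in {name!r}")
    return scheme.lower(), value


# Hypothetical handlers: each one encodes what this application
# believes a given prefix means.
def dial(value):
    return "dialing " + value


def look_up_taxpayer(value):
    return "looking up taxpayer " + value


HANDLERS = {"tel": dial, "ssn": look_up_taxpayer}


def interpret(name):
    """Apply the semantics attached to the prefix, if this application knows them."""
    scheme, value = split_name(name)
    handler = HANDLERS.get(scheme)
    if handler is None:
        # Unknown prefix: the string is still a perfectly good name,
        # we just cannot do anything with it beyond comparing it.
        return None
    return handler(value)
```

The same digits end up in two different code paths purely because of the prefix, and a prefix the application has no specification for simply stays a name.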

Similarly, URIs with the ‘http’ prefix suggest the use of a particular application/protocol (depending on the way one uses it) and the interaction semantics associated with that protocol. There are other URI schemes (or URN schemes… let’s not forget that URNs are URIs too) which are equally useful. For example, Life Science Identifiers (LSIDs) (OMG LSR), Amazon Standard Identification Numbers (ASINs), and UUIDs serve different purposes. LSIDs in particular have embraced the use of DNS names as the mechanism of choice for the management of authorities in a distributed environment. However, they associate retrieval semantics for both HTTP and SOAP with the ‘lsid’ prefix. I don’t see anything wrong with that. The ‘http’ scheme is not a panacea and it certainly does not mean an instant solution to all our distributed application needs.
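As a rough illustration of a non-‘http’ name that still carries a DNS-managed authority, the sketch below splits an LSID into its parts. It assumes the commonly documented layout `urn:lsid:<authority>:<namespace>:<object>[:<revision>]`; the `Lsid` type and `parse_lsid` function are hypothetical names, and the sketch says nothing about how retrieval over HTTP or SOAP would actually be performed.

```python
from typing import NamedTuple, Optional


class Lsid(NamedTuple):
    authority: str            # usually a DNS name owned by the issuing authority
    namespace: str
    object_id: str
    revision: Optional[str] = None


def parse_lsid(name):
    """Parse 'urn:lsid:authority:namespace:object[:revision]' (assumed layout)."""
    parts = name.split(":")
    if (
        len(parts) not in (5, 6)
        or parts[0].lower() != "urn"
        or parts[1].lower() != "lsid"
    ):
        raise ValueError(f"not an LSID: {name!r}")
    return Lsid(*parts[2:])
```

For example, `parse_lsid("urn:lsid:example.org:proteins:P12345")` (a made-up identifier) yields an authority of `example.org`, which an application could resolve through DNS without the name itself being an ‘http’ URI.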

Names vs identifiers vs addresses

The use of a URI as a name, identifier, or address is contextual. For example, while the US government may use “ssn:123-456-7890” to uniquely identify me within the country, this is by no means a universal identification scheme. In Greece, a different number is associated with me. Even in the US, my “alien” number (green card) is different. It’s still an identifier as far as the immigration services are concerned but it’s not the only one associated with me. Within a particular context those names (because they are all names) can be considered unique identifiers.

Since I move a lot, I have changed addresses quite a few times, but some names/identifiers associated with me have remained the same despite the moves, at least for a period of time (e.g. SSN, my full name, some of my credit card numbers, some of my phone numbers, etc.). An address is another name. It cannot always be correlated with the same person, and it may even be linked with more than one person at a time. Coupling an address with a person for all time would be difficult in the real world; the address would effectively have become a universal identifier. So why do we want to do it with ‘http’ URI schemes? (Having said that, such coupling may sometimes be convenient even though it is more expensive, hence the use of PO Boxes).
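The difference between identifiers and addresses can be sketched as a small data model. This is only an illustration under my own assumptions: the `Person` and `Registry` classes are hypothetical, and prefixes such as “alien:” and “addr:” are invented here for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Person:
    full_name: str
    # Names that identify the person within some context (SSN, alien number, ...).
    identifiers: set = field(default_factory=set)
    # An address is just another name, and it may change or be shared.
    address: str = ""


class Registry:
    def __init__(self):
        self._people = []

    def register(self, person):
        self._people.append(person)

    def resolve(self, identifier):
        """Return the person a context-specific identifier refers to, if any."""
        for person in self._people:
            if identifier in person.identifiers:
                return person
        return None

    def residents(self, address):
        """An address may map to zero, one, or several people at a time."""
        return [p.full_name for p in self._people if p.address == address]
```

Changing `address` on a `Person` leaves every entry in `identifiers` untouched, so lookups by identifier keep working after a move, while a lookup by the old address no longer finds that person.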

Application domains can define their own interpretation of the URI schemes they use. This way we are not running into the danger of enforcing a coupled naming/identification/addressing scheme which attempts to meet all possible requirements for all possible applications. What is wrong with “ssn:123-456-7890” sent to a Web Service or HTTP POX endpoint as part of a message’s payload? Why does it have to be an ‘http’ name?
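A minimal sketch of that last question, assuming a plain-XML (POX) order message: the endpoint has an ‘http’ address, but the names carried inside the payload use whatever scheme suits the application domain. The endpoint URL, element names, and function names below are all hypothetical, and nothing is actually sent over the network.

```python
import xml.etree.ElementTree as ET

# Hypothetical endpoint address: this is where the message would be POSTed.
ENDPOINT = "http://example.org/orders"


def build_order(customer_name, item_name):
    """Build a POX payload whose contents are names, not addresses."""
    order = ET.Element("order")
    ET.SubElement(order, "customer").text = customer_name
    ET.SubElement(order, "item").text = item_name
    return ET.tostring(order, encoding="unicode")


def read_customer(xml_text):
    """What the receiving service would do: pull the name out of the payload."""
    return ET.fromstring(xml_text).findtext("customer")
```

Calling `build_order("ssn:123-456-7890", "urn:isbn:0000000000")` produces a message that can travel to `ENDPOINT` over HTTP while the customer and item names remain independent of the transport; the receiving service interprets them according to its own understanding of the “ssn” and “isbn” schemes.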

Is a world with multiple identification schemes good? I believe it is OK to define new schemes where it makes sense (e.g. for a credit card, an ISBN, a US SSN, etc., once a standard is in place). Through the use of semantic technologies we should be able to automate the processing of the named/identified/addressed information represented out there, interpret it, correlate it across application domains, and reason about it.