I have discussed the issue of identity, naming, and addresses for large-scale distributed systems in the past (e.g. "Loose-coupling through the relaxation of endpoint assumptions", "Comparing S-O and O-O as design principles and not as implementation technologies", "WS-Web (The Web using SOAP:-)", and more). It's a topic that keeps coming up. It was part of our WS-GAF arguments back in my Newcastle days. I think it's at the heart of every discussion/argument between service-oriented and resource/object-oriented advocates, even if it's not explicitly identified as such.
I recently read Norman Walsh's post on "Names and addresses" where he advocates the use of HTTP URIs as names. I totally agree with his statement that "URIs are names". Great! However, I disagree with the argument that HTTP URIs could serve as a general-purpose naming mechanism.
URIs as names
URIs could indeed be treated as opaque strings for naming purposes. However, URI schemes were invented so that structure and meaning could be attached to such opaque strings in an attempt to make them machine/human processable/interpretable. Such strings can now be used as names or identifiers (depending on the context) according to the needs of particular application domains. The string "123 456 7890" could indeed be used as a name. But in real life we usually associate some context with that number (in some cases we even need to say that it's a "number" rather than a series of unicode characters). Similarly, in computer systems we associate prefixes (if we use URIs) like "tel" (Telephone) or "ssn" (Social Security Number) and as a result the numbers take a completely different meaning. Of course one can try to dial my SSN or I could try to charge my book order to my telephone number. That would be an indication that the semantics of the context within which the number is used are not understood. The prefixes give such context to the number. It's still a name but one which can be used within particular application domains to convey meaning (provided that there is a specification to define the semantics of the prefix of course and that specification is understood).
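To make the point concrete, here is a minimal sketch in Python of how a prefix supplies the context for an otherwise opaque number. The handler table and the 'ssn' scheme are assumptions made purely for illustration; nothing here implies that such a scheme is registered or that any specification defines it this way.

```python
from urllib.parse import urlsplit

# Hypothetical interpretations keyed by URI scheme (illustrative only).
INTERPRETATIONS = {
    "tel": "telephone number",
    "ssn": "US Social Security Number",
}

def interpret(name: str) -> str:
    """Describe a name using only the context its scheme prefix provides."""
    parts = urlsplit(name)
    meaning = INTERPRETATIONS.get(parts.scheme)
    if meaning is None:
        # Without an understood scheme the string is just an opaque name.
        return f"opaque name: {name}"
    return f"{meaning}: {parts.path}"

print(interpret("tel:123-456-7890"))
print(interpret("ssn:123-456-7890"))
print(interpret("123 456 7890"))
```

The same digits come back with a different meaning depending on the prefix, and with no meaning at all when the prefix is missing or not understood.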
Similarly, URIs with the 'http' prefix suggest the use of a particular application/protocol (depending on the way one uses it) and the interaction semantics associated with that protocol. There are other URI (or URN) schemes (let's not forget that URNs are URIs too) which are equally useful. For example, Life Science Identifiers (LSIDs) (OMG LSR), Amazon's Standard Identification Number, and UUIDs serve different purposes. LSIDs in particular have embraced the use of DNS names as the mechanism of choice for the management of authorities in a distributed environment. However, they have associated retrieval semantics for both HTTP and SOAP with the 'lsid' prefix. I don't see anything wrong with that. The 'http' prefix is not a panacea and it certainly does not mean an instant solution to all our distributed application needs.
Names vs identifiers vs addresses
The use of a URI as a name, identifier, or address is contextual. For example, while the US government may use "ssn:123-456-7890" to uniquely identify me within the country, this is by no means a universal identification scheme. In Greece, a different number is associated with me. Even in the US, my "alien" number (green card) is different. It's still an identifier as far as the immigration services are concerned but it's not the only one associated with me. Within a particular context those names (because they are all names) can be considered unique identifiers.
Since I move a lot, I have changed addresses quite a few times, but some names/identifiers associated with me have remained the same for a period of time (e.g. SSN, my full name, some of my credit card numbers, some of my phone numbers, etc.). An address is another name. It cannot always be correlated with the same person, and it may even be linked with more than one person at a time. Coupling an address with a person for all time would be difficult in our real world; the address would have to act as a universal identifier. So why do we want to do it with 'http' URI schemes? (Having said that, such coupling may sometimes be convenient even though it is more expensive, hence the use of PO Boxes).
Application domains can define their own interpretation of the URI schemes they use. This way we are not running into the danger of enforcing a coupled naming/identification/addressing scheme which attempts to meet all possible requirements for all possible applications. What is wrong with "ssn:123-456-7890" sent to a Web Service or HTTP POX endpoint as part of a message's payload? Why does it have to be an 'http' name?
Is a world with multiple identification schemes good? I believe it is ok to define new schemes where it makes sense (e.g. a credit card, an ISBN, a US SSN, etc. once a standard is in place). Through the use of semantic technologies we should be able to automate the processing of the named/identified/addressed information represented out there, interpret it, correlate it across application domains, and reason about it.
15 responses to ““Names and addresses” – a different view”
Perhaps if we renamed “http” to “void” (as in a C “void *”), there’d be less confusion about why the http scheme is special.
Hmmm… RFC 2616, section 3.2.2 makes it clear that there is a special connection between the ‘http’ scheme and the HTTP protocol 🙂
I’m not sure that I signed the comment I just submitted, so here goes – Jacek. 😎
Oh, not submitted… Here goes again:
Savas, do you mean to say that every application that uses identifiers for a demonstrably different purpose (think ssn vs. isbn) should register a new URI scheme? In this case many pieces of the internet would not be able to do anything useful with coke: apart from treating it as a prefix for unique identifiers (I’m not even considering the use of unregistered schemes…)
Similarly, a user has to go through serious hoops to get the meaning of ssn:123-456-7890 (find registration for ssn: scheme, find its description) whereas everybody already knows what could be done with http://ssn.gov/123-456-7890#person . What disadvantages do you see in using this (admittedly longer) form?
What Jacek said .. I much prefer to see http://ssn.gov.uk/blah than urn:ssn:blah, and Amazon ASINs belong on http://www.amazon.com. Just because a URI starts with http doesn’t mean your browser has to open a socket to the dns address, it’s just a handle to identify what you’re after: http://blog.whatfettle.com/archives/000244.html
I didn’t say HTTP wasn’t also special! 😎 The specialness of http is related to the specialness of HTTP too; HTTP’s uniform interface gives http its ability to identify anything (or more accurately, it prevents it from losing that ability in practice). But that doesn’t mean that one needs to use HTTP to make use of http URIs. Imagine some new, better-than-HTTP resource-oriented protocol which a client and server can negotiate to use; HTTP would only be used to enable that negotiation and to bootstrap the new protocol, but all the resource-manipulation stuff wouldn’t be HTTP.
Jacek, I am not advocating for all application domains to define their own schemes/IDs. They should reuse everything that makes sense and they should only create new ones when the existing schemes don’t meet their needs. An ‘ssn’ scheme has specific semantics which the ‘http’ scheme cannot capture. Every application dealing with SSNs should adopt it. The same goes for LSIDs, etc.
All, the use of the http prefix does not give me any semantics on the interpretation of the information captured by the URI. If my automated system wants to understand that http://ssn.gov/123-456-7889#person is the SSN for a person, it needs to associate special semantics behind “ssn.gov” or “ssn.gov.uk”. How’s that different from reading and understanding the specification of ‘ssn’?
Also, I know very well that one could use an http://example.org/bla/bla URI with other protocols. You treat it as an opaque string or you only adopt the semantics of the part capturing the DNS authority (a selective application of the semantics of the URI scheme). Why carry the ‘http’ prefix if you don’t want to use HTTP?
“it needs to associate special semantics behind “ssn.gov” or “ssn.gov.uk””
No, the association is made via the data retrieved when GET’s invoked on that URI. The use of the http prefix just hooks you into a pervasively deployed infrastructure that is able to do that. If you mint a new scheme, you lose that, and it would take you years (if you can convince the world) to get to the point where the Web’s already at.
“Why carry the ‘http’ prefix if you don’t want to use HTTP?”
Because it allows you to bootstrap. If you just used MEST-TP (!) today, how many servers would respond? None, of course. If you used HTTP first, and requested an upgrade to MEST-TP, then that helps MEST-TP roll out incrementally because the servers that don’t support it could still use HTTP.
Savas, what semantics does ssn: give you that http://ssn.gov/ doesn’t? We seem to agree that an app would have to learn about the special meaning of ssn:*, so it could very similarly learn about special meaning of http://ssn.gov/*. In practice, usually schemes have specific handlers, would you like a specific handler to be invoked upon resolution of ssn: URIs? What would it do?
And the thing with http://ssn.gov/ and http://ssn.gov.uk/ etc. is that the governments wouldn’t have to agree on, let’s say, a UN-PersonID quite yet. They can, and the ID space can live in http://ssn.un.org/, but they don’t have to; however with ssn: there would be big pressure to standardize UN-PersonID right now (plus HTTP redirects installed after ssn.un.org gets live in 2012 would ease the transition).
And of course I value the bonus ability of a user right after the introduction of the SSN URIs to type http://ssn.gov/123-456-7890#person (upon encountering it somewhere, as URIs have a tendency to leak out of their applications) and getting useful information about the meaning of this ID.
Mark, I understand your point about the semantics being associated with the retrieved representation. But in order to do this, one needs to retrieve that representation via HTTP (or some other protocol). However, for certain types of information (e.g. credit card numbers) there can be a shared understanding without the need for bootstrap. That shared understanding is captured in the representation of the identifier/name itself (e.g. a possible ‘creditcard’ prefix that everyone understands). I can use such an identifier with any protocol my application chooses to use (e.g. plain HTTP, SOAP/HTTP, SOAP/TCP, etc.).
Jacek,
I don’t want to carry the semantics of the ‘http’ prefix when I don’t need them. I can send a ‘ssn:123-456-7890’ using any protocol I want. I can store it in a database as a name and I’ll know its semantics for decades to come, no matter what the preferred application/transfer/transport protocol of choice is going to be then. And if one wishes to use a distributed authority scheme for ‘ssn’, they could do so: ‘ssn:uk:123-456-7890’ would suggest that the UK government is responsible for this SSN.
I do get the point about the goodness of being able to retrieve some representation behind the URI. There is no reason that ssn:uk:number couldn’t define a retrieval of the form http://gov.uk/ssn/number but also a SOAP message to tcp-ip:gov.uk with a request for the resource representation of ssn:uk:number.
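As a rough sketch of what such a mapping could look like (in Python; the function name and the exact mapping rule are assumptions modelled on the http://gov.uk/ssn/number shape above, not part of any specification), the name stays protocol-neutral and the HTTP form is derived only when retrieval is wanted:

```python
def http_retrieval_url(name: str) -> str:
    """Map a hypothetical 'ssn:<authority>:<number>' name to an HTTP URL."""
    # Raises ValueError if the name does not have three colon-separated parts.
    scheme, authority, number = name.split(":", 2)
    if scheme != "ssn":
        raise ValueError(f"not an ssn name: {name}")
    # Assumed mapping rule: ssn:uk:number -> http://gov.uk/ssn/number
    return f"http://gov.{authority}/ssn/{number}"

print(http_retrieval_url("ssn:uk:123-456-7890"))
```

A second function could derive the SOAP request from the same stored name in the same way, without the name itself changing.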
I understand Savas, but the cost of deploying the shared understanding around a “creditcard” scheme is *extremely* high (for the sake of argument, I’ll say O($10 million)), since this shared understanding is actually new software that is required to be deployed *pervasively*… because you can’t anticipate who you’ll want to have a conversation about credit cards with.
As for the use of other protocols, you can always negotiate that using HTTP. But you’ve got to start somewhere, and HTTP is that “somewhere” because it’s already deployed pervasively, ferchristsakes! 😎
Are HTTP URIs addresses?
Or is it that because HTTP URIs are so ubiquitous we simply think of them as addresses?
It is still necessary to resolve HTTP URIs to addresses to be able to send messages. http://savas.parastatidis.name requires that savas.parastatidis.name be looked up in DNS, it also requires that the client knows the http URI scheme and that port 80 is the default port.
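A minimal Python sketch of those steps, for illustration only: the helper names `endpoint` and `addresses` are made up here, and the actual DNS lookup is left in a separate function because it needs network access.

```python
import socket
from urllib.parse import urlsplit

# Knowledge the client must already have about the scheme.
DEFAULT_PORTS = {"http": 80}

def endpoint(uri):
    """Return the (host, port) pair a client would need to contact."""
    parts = urlsplit(uri)
    if parts.scheme not in DEFAULT_PORTS:
        raise ValueError(f"unknown scheme: {parts.scheme!r}")
    # Use the explicit port if the URI carries one, else the scheme default.
    port = parts.port or DEFAULT_PORTS[parts.scheme]
    return parts.hostname, port

def addresses(uri):
    """Look the host up in DNS (requires network access)."""
    host, port = endpoint(uri)
    return socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)

print(endpoint("http://savas.parastatidis.name"))
```

Only the result of `addresses` is something a message can actually be sent to; the URI itself has to pass through the scheme rules and DNS first.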
What is the context of an HTTP URI? All the people and software that recognise the HTTP URI scheme? Global names lead to global network effects.
The second item, after the LSID homepage, in a search on Google for LSID is a Web Resolver for LSID: “This web based LSID resolution service allows you to view the data and metadata of an LSID with your web browser.” To me this shows that being integrated into the Web is very important to the people who actually use LSIDs.
Mark,
The LSID folks have defined resolvers for both SOAP and HTTP-based services. I think this shows their reluctance to base their identifiers on a protocol-specific scheme. And yes, ‘http’ is protocol specific since it’s defined as part of the HTTP specification. It is defined to convey specific semantics: “this endpoint understands the HTTP protocol”.