URI Issues
This version: http://dfdf.inesc-id.pt//tr/doc/uri-issues/20081016
Editor: Xiaoshu Wang (xiao kdbio.inesc-id.pt)
Background
An identifier has only one issue: whether the referential realm of its symbol space is sufficient to cover the desired modelling need. In this sense, the URI is sufficient, because it offers an unlimited space for describing everything in the world.
But the current URI specification is not yet syntactically complete, because URI's referential realm does not allow us to easily (1) distinguish between a URI as the Symbol and a URI as the Referent, or (2) denote a document, which requires two URIs: one for the Referent (Resource) and the other for the type of Information (document). Hence, we are here making a proposal to remedy this with two shorthand notations, "?" and "??", suggested in the ARK identifier scheme.
As a side note: I do not agree with ARK completely. ARK raised a very good point: persistence is a service issue, not an identifier issue. But the concept of metadata used in ARK is ambiguously defined. Metadata is commonly known as data about data. But to know the metadata presupposes some knowledge about the data. On the Web, we do not have the latter. Hence, it is useless to talk about the metadata (or description) of a resource. From the definition of the Web, the only knowledge we have is that (1) a resource has a URI and (2) possibly many kinds of documents may be associated with it. Hence, it is only sensible to define metadata along these two lines. This is what we are doing here.
First Proposal
- The use of ?
- A URI ending with one question mark "?" denotes the URI itself as a resource.
- The use of ??
- A URI ending with two question marks "??" denotes the available document types that the resource might serve.
For example,
- "http://dfdf.inesc-id.pt/tr/web-issues" denotes a resource: the one that you have just acquired some information about.
- "http://dfdf.inesc-id.pt/tr/web-issues?" denotes the URI as a resource. The "?", therefore, is syntactic sugar that allows us to describe the composition of a URI without raising ambiguous identity issues with its referent.
- "http://dfdf.inesc-id.pt/tr/web-issues??" denotes the possible formats associated with the resource "http://dfdf.inesc-id.pt/tr/web-issues". This allows us to move the content of transparent content negotiation into the Semantic Web as opposed to the physical web.
- "http://dfdf.inesc-id.pt/tr/web-issues??Document-type" would denote a particular class of documents retrieved from the referenced resource.
- By extension, "http://dfdf.inesc-id.pt/tr/web-issues???" denotes all document types associated with "http://dfdf.inesc-id.pt/tr/web-issues?", that is, with the URI (as a resource) in question.
- And "http://dfdf.inesc-id.pt/tr/web-issues????" denotes all document types associated with "http://dfdf.inesc-id.pt/tr/web-issues??". I don't believe anything beyond four question marks makes any sense.
Document-type should be a URI that denotes a MIME type, a proposal that I made in our manuscript to WWW2009.
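To make the reading of the question-mark suffixes concrete, here is a minimal sketch in Python. The names parse_marked_uri, Parsed and MEANINGS are mine and purely illustrative; nothing here is part of any URI specification. I also assume that an ordinary query such as "?a=1" falls outside the proposal and is rejected, and I write the document type as a bare MIME type for brevity rather than as a URI.

```python
import re
from typing import NamedTuple, Optional


class Parsed(NamedTuple):
    base: str                     # the URI without any trailing marks
    marks: int                    # number of consecutive "?" after the base
    document_type: Optional[str]  # only meaningful after exactly "??"
    meaning: str


# Hypothetical labels for the four readings described in the list above.
MEANINGS = {
    0: "the referent resource",
    1: "the URI itself, as a resource",
    2: "document types available for the resource",
    3: "document types available for the URI-as-resource",
    4: "document types available for the '??' resource",
}


def parse_marked_uri(uri: str) -> Parsed:
    """Split a URI into its base, its run of '?' marks and an optional type."""
    match = re.fullmatch(r"([^?]*)(\?*)(.*)", uri)
    if match is None:
        raise ValueError("not a single-line URI")
    base, marks, rest = match.group(1), len(match.group(2)), match.group(3)
    if marks > 4:
        raise ValueError("more than four question marks has no defined meaning")
    if rest:
        # Assumption: text after the marks is only defined for "??Document-type".
        if marks != 2:
            raise ValueError("a document type may only follow '??'")
        return Parsed(base, marks, rest, "documents of one type from the resource")
    return Parsed(base, marks, None, MEANINGS[marks])


example = parse_marked_uri(
    "http://dfdf.inesc-id.pt/tr/web-issues??application/xhtml+xml"
)
print(example.base)           # http://dfdf.inesc-id.pt/tr/web-issues
print(example.document_type)  # application/xhtml+xml
```

The sketch only classifies the notation; how a server or a Semantic Web agent would answer for each reading is left open.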
- The use of "!"
- The "!" indicates that a special notation syntax is used to refer to a fragment of a piece of information.
The "!" is mostly intended for clients that use a short-form notation to denote a specific document. Hence, "#!" implies that some additional processing is needed to understand the fragment identifier. This does not conflict with the standard fragment identifier syntax, but offers an extension mechanism that lets users provide short and meaningful notations.
The "!" could also be used after "?". Hence, "http://dfdf.inesc-id.pt/tr/uridl?!/1", for instance, can be used to refer to the first path component of the URI.
Of course, anyone who uses the shebang notation must point to a URI where the notation is specified.
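As an illustration only, the "?!/1" example could be processed along the following lines. Since the notation would have to be specified somewhere, the reading used here is my assumption: "?!/n" selects the n-th path segment, counted from 1 and ignoring empty segments. The function name path_component is hypothetical.

```python
from urllib.parse import urlsplit


def path_component(marked_uri: str) -> str:
    """Return the path segment selected by a trailing '?!/n' (assumed syntax)."""
    base, separator, selector = marked_uri.partition("?!/")
    if not separator or not selector.isdigit():
        raise ValueError("expected a URI ending in '?!/<number>'")
    index = int(selector)
    # Split the path of the base URI into its non-empty segments.
    segments = [segment for segment in urlsplit(base).path.split("/") if segment]
    if not 1 <= index <= len(segments):
        raise IndexError("the URI has no such path component")
    return segments[index - 1]


print(path_component("http://dfdf.inesc-id.pt/tr/uridl?!/1"))  # tr
```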
Second Proposal - Scheme-less URI
Let's create a scheme-less URI and let it be the http-URI sans the "http:". I believe this will help to settle any debate about creating a new URI scheme, because the scheme-less URI is essentially a URN and we only need one set of URNs in the (ideal) world. So, any new URI scheme issue becomes "do we need a new URL scheme?" as opposed to "do we need a new URI scheme?". The first one is very easy to answer: as long as there is a new transport protocol, then let it be.
In addition, it makes any schemed URI make more sense. Hence, the scheme-less URI (URN) of this resource should be: "//dfdf.inesc-id.pt/tr/web-issues". And "http://dfdf.inesc-id.pt/tr/web-issues" is an information path to this resource. What you see is actually an instance of "http://dfdf.inesc-id.pt/tr/web-issues??application/xhtml+xml", which is a document.
I believe the scheme-less URI with the "?" and "??" will be able to settle any new debate about creating a new URI scheme, because most of the proposed functionality of XRI is, in fact, about giving each of a URI's components a separate context, i.e., making a URI composed of URIs. This can be achieved with a URI Description Language (URIDL).
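The relation between the scheme-less URI and its information path can be sketched as a pair of string operations. The function names to_schemeless and to_information_path are hypothetical, and the sketch assumes, as the text does, that the scheme-less form is simply the http-URI without the "http:" prefix.

```python
def to_schemeless(http_uri: str) -> str:
    """Drop the leading 'http:' so that only '//authority/path' remains."""
    prefix = "http:"
    if not http_uri.lower().startswith(prefix + "//"):
        raise ValueError("expected an http URI of the form http://...")
    return http_uri[len(prefix):]


def to_information_path(schemeless_uri: str, scheme: str = "http") -> str:
    """Impose a scheme on a scheme-less URI to obtain an information path."""
    if not schemeless_uri.startswith("//"):
        raise ValueError("a scheme-less URI is expected to start with '//'")
    return f"{scheme}:{schemeless_uri}"


urn = to_schemeless("http://dfdf.inesc-id.pt/tr/web-issues")
print(urn)                       # //dfdf.inesc-id.pt/tr/web-issues
print(to_information_path(urn))  # http://dfdf.inesc-id.pt/tr/web-issues
```

Under this reading, a new transport protocol only adds another value for the scheme argument; the scheme-less name of the resource stays the same.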
About userinfo and port
As a scheme-less URI is intended to denote a resource, I think a scheme-less URI should not use the concepts of "userinfo" and "port". The former is concerned with access (who is doing it), whereas the latter is a concept of the physical web.
Information about the "port" should be inserted when a scheme-less URI is converted to a URL. That is, it should be bound when the "scheme" is imposed on a scheme-less URN.
Information about the "userinfo" is a matter of "authorization" and "authentication", and it is not the concern of a URI. Obviously, this suggests a design problem in the mailto URI, which follows the convention of "userinfo@host". However, here the "userinfo" is not used in the same way as in, e.g., "http://user@host". The former in fact denotes a mailbox at the host, whereas the latter is intended to suggest that the "user" is trying to access "http://host". To cater for backward compatibility, we may create a convention that converts "mailto:user@host" into "//host/mailto/user". Thus, a "mailto" protocol can be layered on top of the scheme-less URI in a consistent manner. I have more thoughts about this, but this is not the place to elaborate on them.
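A rough sketch of these conventions in Python follows. All three function names are hypothetical. The sketch relies on urllib.parse.urlsplit from the standard library, whose hostname attribute drops userinfo and port and lower-cases the host; it drops any query or fragment, and IPv6 literals and percent-encoding are ignored.

```python
from typing import Optional
from urllib.parse import urlsplit


def strip_access_details(uri: str) -> str:
    """Reduce a URI to a scheme-less form without userinfo and port."""
    parts = urlsplit(uri)
    if parts.hostname is None:
        raise ValueError("the URI has no host")
    return f"//{parts.hostname}{parts.path}"


def bind_scheme(schemeless: str, scheme: str, port: Optional[int] = None) -> str:
    """Bind the port only at the moment a scheme is imposed."""
    if not schemeless.startswith("//"):
        raise ValueError("a scheme-less URI is expected to start with '//'")
    host, slash, rest = schemeless[2:].partition("/")
    authority = host if port is None else f"{host}:{port}"
    return f"{scheme}://{authority}{slash}{rest}"


def mailto_to_schemeless(mailto_uri: str) -> str:
    """Apply the suggested convention mailto:user@host -> //host/mailto/user."""
    scheme, colon, address = mailto_uri.partition(":")
    user, at, host = address.rpartition("@")
    if scheme.lower() != "mailto" or not colon or not (user and at and host):
        raise ValueError("expected a URI of the form mailto:user@host")
    return f"//{host}/mailto/{user}"


print(strip_access_details("http://user@host:8080/a/b"))  # //host/a/b
print(bind_scheme("//host/a/b", "http", 8080))            # http://host:8080/a/b
print(mailto_to_schemeless("mailto:user@host"))           # //host/mailto/user
```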
About query
In general, I do not think it is good practice to use URIs with a query component to denote a resource. The reason is the concern about URI equivalence. A URI's path component is implicitly an array. Hence, it is obvious that "http://example.com/a/b" is not the same as "http://example.com/b/a", and this is consistent with what we get when the two URIs are compared with each other. But a URI's query is often used as a hash. Usually, "http://example.com?a=1&b=2" will be handled in the same way as "http://example.com?b=2&a=1". But by URI equivalence, they would nevertheless denote two resources. Thus, I think that as a best practice, we should consider a URI's path component to be stable and its query part to be volatile.
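The mismatch can be shown with two URIs that differ only in the order of their query parameters. In this small sketch, the function names are mine, and "handled in the same way" is approximated by parsing the query into a dictionary with the standard library's urllib.parse, which is roughly what many server frameworks do.

```python
from urllib.parse import parse_qsl, urlsplit


def equal_as_identifiers(a: str, b: str) -> bool:
    """Simple string comparison: the two URIs are the same symbol or not."""
    return a == b


def equal_as_requests(a: str, b: str) -> bool:
    """Compare the way a typical server would: query treated as a hash."""
    pa, pb = urlsplit(a), urlsplit(b)
    same_location = (pa.scheme, pa.netloc, pa.path) == (pb.scheme, pb.netloc, pb.path)
    return same_location and dict(parse_qsl(pa.query)) == dict(parse_qsl(pb.query))


first = "http://example.com?a=1&b=2"
second = "http://example.com?b=2&a=1"
print(equal_as_identifiers(first, second))  # False
print(equal_as_requests(first, second))     # True

# Path segments are ordered, so both comparisons agree for paths.
print(equal_as_requests("http://example.com/a/b", "http://example.com/b/a"))  # False
```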
Should a URI have a syntactic notion for a person?
This should be a practical and social issue, not a technical one, because conceptually any desired semantics in a URI raises potential inconsistencies of meaning later. But the Web is the Web for us (humans). Hence, a person might need a special place in the URI syntax. The same issue can be seen in the design of the HTTP protocol. In a technical sense, natural language should not be separately negotiated, because the concept of representation has nothing to do with human languages. It should deal exclusively with byte streams, character encodings, clients and other HTTP endpoints.

