July 30, 2010
Namespaces in SXML, part 1
Namespaces in SXML are tricky. The SSAX SXML parser takes a list of namespace prefix to URI associations; the SXML document itself contains *NAMESPACES* nodes mapping prefixes to URIs; and the SXML serializer takes a list mapping prefixes to URIs as well! How does it all fit together?
In the discussion below I will be using the sxml-serializer egg for Chicken. I also define the following helper function which transforms SXML to XML and prints it to stdout:
(define (->xml doc . opts) (print (apply serialize-sxml doc opts)))
In SXML, element names usually consist of a qualifying URI and a local name, separated by a colon. This is similar to the universal names described in XML namespaces:
<{http://www.cars.com/xml}part /> <!-- XML universal name -->
(http://www.cars.com/xml:part) ;; SXML name
XML 1.0 cannot handle such identifiers, because they contain illegal characters. So it does prefix mapping instead:
<cars:part xmlns:cars="http://www.cars.com/xml" />
However, SXML can handle these identifiers directly, as shown above, without any need for prefix mapping. To find the local name, you take everything right of the rightmost colon, in this case part.
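The rightmost-colon rule is easy to sketch in plain Scheme. The helper below is hypothetical (split-sxml-name is a name made up for illustration, not part of SSAX or the sxml-serializer egg), and it assumes the element name is an ordinary symbol:

```scheme
;; Hypothetical helper: split an SXML element name at its rightmost colon.
;; Returns two values: the qualifier (a URI or shortcut) as a string, or #f
;; when the name contains no colon, and the local name as a string.
(define (split-sxml-name sym)
  (let* ((s (symbol->string sym))
         (len (string-length s)))
    ;; Scan from the right so that colons inside the URI are skipped.
    (let loop ((i (- len 1)))
      (cond ((< i 0) (values #f s))
            ((char=? (string-ref s i) #\:)
             (values (substring s 0 i)
                     (substring s (+ i 1) len)))
            (else (loop (- i 1)))))))

(call-with-values
  (lambda () (split-sxml-name 'http://www.cars.com/xml:part))
  (lambda (qualifier local)
    (display qualifier) (newline)   ; http://www.cars.com/xml
    (display local) (newline)))     ; part
```

Scanning from the right matters here: the URI itself contains a colon after http, so splitting at the first colon would give the wrong answer.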
Now, the application developer might not like dealing with these long URIs when querying or typing in a document, so SXML provides a way to define a shortcut name for the URI. These shortcuts are defined in the document inside *NAMESPACES* administrative nodes, usually at the top level.
> (->xml '(*TOP* (@ (*NAMESPACES* (cars "http://www.cars.com/xml"))) (cars:part)))
<cars:part xmlns:cars="http://www.cars.com/xml" />
Now you can use cars to mean http://www.cars.com/xml anywhere in the document. Also, individual elements may have their own local associations:
> (->xml '(*TOP* (cars:part (@ (@ (*NAMESPACES* (cars "http://www.cars.com/xml"))))) (cars:part)))
<cars:part xmlns:cars="http://www.cars.com/xml" />
<prfx1:part xmlns:prfx1="cars" />
There, in the first cars:part element and any children, cars stands for http://www.cars.com/xml. Outside of it, the association does not exist.
What's with the second cars:part element, though? It's been rendered as the XML element prfx1:part with the namespace URI cars! Well, that's because we gave no association for the shortcut cars, so the serializer treats cars itself as the qualifying URI. Remember, when SXML elements include a colon, the left side is the full qualifying URI unless you explicitly specify a shortcut association. And when URIs don't have an associated XML prefix, one is generated for you, such as prfx1.
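The lookup rule can be written down as a tiny sketch. This is only an illustration of the behaviour described above, not the serializer's actual code; resolve-qualifier is a hypothetical name, and the association list is assumed to have the same shape as the contents of a *NAMESPACES* node:

```scheme
;; Hypothetical helper: resolve the left-hand side of an SXML name.
;; namespaces is an association list shaped like the body of a
;; *NAMESPACES* node, e.g. ((cars "http://www.cars.com/xml")).
;; If the qualifier is a known shortcut, return the URI it stands for;
;; otherwise the qualifier is taken to be the URI itself.
(define (resolve-qualifier qualifier namespaces)
  (let ((entry (assq (string->symbol qualifier) namespaces)))
    (if entry
        (cadr entry)
        qualifier)))

;; Inside the first cars:part element the shortcut is in scope:
(display (resolve-qualifier "cars" '((cars "http://www.cars.com/xml"))))
(newline)   ; http://www.cars.com/xml

;; Outside of it there is no association, so cars is its own URI:
(display (resolve-qualifier "cars" '()))
(newline)   ; cars
```

The second call is the case that produced prfx1:part above: with no shortcut in scope, the URI is literally cars, and the serializer has to invent an XML prefix for it.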
This brings us to our first insight, which is that XML prefixes are not SXML shortcut names. They may look similar, and even overlap sometimes in naming, but the distinction is critical.
Next time, more fun with namespaces.