3e8.org

In EBCDIC we trust.

July 31, 2010

Namespaces in SXML, part 2

Last time, we talked about the distinction between SXML shortcut names and XML prefixes, and particularly about the *NAMESPACES* node. This time let's talk about the prefixes you pass to the parser and the serializer.

The default SSAX parser in Chicken is ssax:xml->sxml. It accepts an alist that maps user namespace shortcuts (symbols) to namespaces (URI strings). Let's see what happens if we pass it a null list:

> (ssax:xml->sxml
   (open-input-string
    "<cars:part xmlns:cars=\"http://www.cars.com/xml\" />")
   '())
(*TOP* (http://www.cars.com/xml:part))

Because we didn't give it an association between shortcuts and URIs, the parser returns fully-qualified names. It completely discards the XML prefix in the original document, because the prefix is only meaningful for languages that cannot qualify element names with URIs (i.e. XML 1.0). An important consequence is that you cannot recreate the original XML document prefixes without additional information provided by the user.
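To make the shape of these fully-qualified names concrete, here is a minimal sketch of how one can be pulled apart again. An XML local name can't contain a colon, so the last colon in the symbol separates the namespace part from the local name, even though the URI itself contains a colon. Note that split-sxml-name is a hypothetical helper written for illustration; it is not part of SSAX.

```scheme
;; Hypothetical helper, not part of SSAX: split an SXML name at its
;; last colon into (namespace-id . local-name), both as strings.
;; Returns (#f . name) when the name has no colon at all.
(define (split-sxml-name sym)
  (let* ((s (symbol->string sym))
         (len (string-length s)))
    ;; Scan backwards so that colons inside the URI are skipped.
    (let loop ((i (- len 1)))
      (cond ((< i 0) (cons #f s))
            ((char=? (string-ref s i) #\:)
             (cons (substring s 0 i)
                   (substring s (+ i 1) len)))
            (else (loop (- i 1)))))))

(split-sxml-name 'http://www.cars.com/xml:part)
;; => ("http://www.cars.com/xml" . "part")
(split-sxml-name 'part)
;; => (#f . "part")
```

Nothing in the result says cars, which is the point: once the name is qualified by its URI, the original XML prefix is gone.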

Now let's associate the shortcut prefix cs with the namespace URI for the XML prefix cars, and parse the same document:

> (ssax:xml->sxml
   (open-input-string
    "<cars:part xmlns:cars=\"http://www.cars.com/xml\" />")
   '((cs . "http://www.cars.com/xml")))
(*TOP* (@ (*NAMESPACES* (cs "http://www.cars.com/xml")))
 (cs:part))

I used cs here so that you could see that the prefix cars is completely irrelevant to the output, except on the XML side in determining the universal name. It is not even retained in the SXML document. The fully qualified SXML name http://www.cars.com/xml:part is now, via the user's association, changed to the short name cs:part. This association is also recorded in the *NAMESPACES* node so that the namespace URI can be reconstructed when transformed back to XML. Notice that although each alist entry is written as (shortcut . URI), the parser consults it in the opposite direction during parsing: it looks up the URI to find the shortcut symbol.
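The direction of that lookup is easy to get backwards, so here is a rough sketch of it. This is only a model of the behavior shown in the two transcripts above, not the SSAX source; uri->shortcut and sxml-name are hypothetical names.

```scheme
;; Hypothetical helper: given a namespace URI and a parser-style alist
;; of (shortcut . uri) pairs, return the shortcut symbol for that URI,
;; or #f if the user supplied none.
(define (uri->shortcut uri ns-alist)
  (cond ((null? ns-alist) #f)
        ((string=? uri (cdar ns-alist)) (caar ns-alist))
        (else (uri->shortcut uri (cdr ns-alist)))))

;; Hypothetical helper: build the SXML element name from a namespace
;; URI and a local name, using the shortcut when one is known and the
;; full URI otherwise.
(define (sxml-name uri local ns-alist)
  (let ((shortcut (uri->shortcut uri ns-alist)))
    (string->symbol
     (string-append (if shortcut (symbol->string shortcut) uri)
                    ":" local))))

(sxml-name "http://www.cars.com/xml" "part" '())
;; => http://www.cars.com/xml:part
(sxml-name "http://www.cars.com/xml" "part"
           '((cs . "http://www.cars.com/xml")))
;; => cs:part
```

The XML prefix cars never appears as an argument; by the time a name is built, only the URI and the local name are left.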

Let's now consider transforming the two documents above back to XML. In the previous post, we saw the erroneous case where we tried to use a shortcut namespace without defining that association, so the shortcut was treated as a full namespace URI:

> (->xml '(*TOP* (cars:part)))
<prfx1:part xmlns:prfx1="cars" />

However, the first SXML document we produced above is valid, and will produce valid output:

> (->xml '(*TOP* (http://www.cars.com/xml:part)))
<prfx1:part xmlns:prfx1="http://www.cars.com/xml" />

There's no association to an XML prefix, though, so prfx1 is generated for you. This is a perfectly legal XML document, and is readable by any conformant parser, even though it looks strange.

We can pass another association list, this time to the serializer, to map URI strings to XML prefixes. Below, the namespace URI http://www.cars.com/xml is mapped to the XML prefix cars, instead of the auto-generated prfx1.

> (->xml '(*TOP* (http://www.cars.com/xml:part))
         ns-prefixes: '((cars . "http://www.cars.com/xml")))
<cars:part xmlns:cars="http://www.cars.com/xml" />

It's important to remember that the alist you pass to the serializer in ns-prefixes does not map shortcut prefixes to URIs. That's the job of the alist passed to the parser--or more accurately, the job of the *NAMESPACES* node. ns-prefixes only maps URIs to XML prefixes. Returning to the erroneous example above,

> (->xml '(*TOP* (cars:part))
         ns-prefixes: '((cars . "http://www.cars.com/xml")))
<prfx1:part xmlns:prfx1="cars" />

we can see ns-prefixes has no effect. There is no such namespace URI as http://www.cars.com/xml in that document, nor does ns-prefixes create a shortcut mapping to that URI.

Turning to the second example, which was generated by the SSAX parser,

> (->xml '(*TOP* (@ (*NAMESPACES* (cs "http://www.cars.com/xml")))
           (cs:part)))
<cs:part xmlns:cs="http://www.cars.com/xml" />

we can see there is a namespace association in *NAMESPACES* from the shortcut cs to the URI http://www.cars.com/xml. Recall this association was created by the user when calling the parser, not by the XML document. During serialization, this association causes the shortcut namespace cs to be translated to the URI http://www.cars.com/xml. However, we still need to determine the resulting XML prefix. By default, the serializer conveniently uses the shortcut name cs as the XML prefix!

That was, however, just the default. If we actually provide a mapping from URI to XML prefix in ns-prefixes, that will override the default:

> (->xml '(*TOP* (@ (*NAMESPACES* (cs "http://www.cars.com/xml")))
           (cs:part))
         ns-prefixes: '((cars . "http://www.cars.com/xml")))
<cars:part xmlns:cars="http://www.cars.com/xml" />

So the shortcut cs is translated to the namespace URI http://www.cars.com/xml via *NAMESPACES*, and the URI is then mapped to the XML prefix cars via ns-prefixes. Again, notice how ns-prefixes contains no reference to the shortcut cs, because it never sees the shortcut names; it only sees the namespace after expansion into a URI.

Confused yet? Let me sum up.

  • When parsing XML to SXML, XML prefixed names are mapped to universal names (local names qualified with URIs), based on the xmlns attributes in the document. Optionally, these URIs are mapped to shortcut names based on an alist passed by the user, and this mapping is also stored in the *NAMESPACES* node.
  • When serializing SXML to XML, shortcut names are mapped back to URI namespace strings using the associations stored in the *NAMESPACES* node, and all URIs are then mapped to XML prefixes based on a user-provided alist. Since universal names are illegal in XML 1.0, all URIs must be mapped to prefixes; therefore, automatic prefixes are created if you do not provide a mapping.
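The serializing half of that summary can be modeled in a few lines. The sketch below is not the serializer's actual code; resolve-namespace and uri->prefix are hypothetical names, and the model only covers the cases shown in the transcripts in this post. It takes the part of an element name before the last colon, the entries of a *NAMESPACES* node, and an ns-prefixes alist, and returns the XML prefix and URI to emit.

```scheme
;; Hypothetical helper: look up the XML prefix (a symbol) for a URI in
;; an ns-prefixes alist of (xml-prefix . uri) pairs; #f if absent.
(define (uri->prefix uri ns-prefixes)
  (cond ((null? ns-prefixes) #f)
        ((string=? uri (cdar ns-prefixes)) (caar ns-prefixes))
        (else (uri->prefix uri (cdr ns-prefixes)))))

;; Hypothetical model of prefix selection during serialization.
;;   ns-id:       the namespace part of the element name, as a string
;;   namespaces:  (shortcut uri) lists, as found in a *NAMESPACES* node
;;   ns-prefixes: (xml-prefix . uri) pairs, as passed to the serializer
;; Returns (xml-prefix . uri); xml-prefix is #f when the serializer
;; would have to generate one (prfx1 and so on).
(define (resolve-namespace ns-id namespaces ns-prefixes)
  (let* ((entry (assq (string->symbol ns-id) namespaces))
         ;; Step 1: shortcut -> URI. An id with no *NAMESPACES* entry
         ;; is taken to be the URI itself.
         (uri (if entry (cadr entry) ns-id))
         ;; Step 2: URI -> XML prefix. Only the URI is consulted here.
         (user (uri->prefix uri ns-prefixes)))
    (cons (cond (user (symbol->string user)) ; explicit mapping wins
                (entry ns-id)                ; default: reuse the shortcut
                (else #f))                   ; no mapping: auto-generate
          uri)))

(define cars-uri "http://www.cars.com/xml")

(resolve-namespace "cs" `((cs ,cars-uri)) '())
;; => ("cs" . "http://www.cars.com/xml")
(resolve-namespace "cs" `((cs ,cars-uri)) `((cars . ,cars-uri)))
;; => ("cars" . "http://www.cars.com/xml")
(resolve-namespace "cars" '() `((cars . ,cars-uri)))
;; => (#f . "cars")
```

The last call is the erroneous example again: with no *NAMESPACES* entry, cars is treated as the URI, so the ns-prefixes entry for http://www.cars.com/xml never matches.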

In the next installment, we'll take a look at a more complex example.