[xml] Possible bug in canonicalization



I've encountered a likely bug in libxml's canonicalization. I am using libxml 2.6.5, the latest released version, on Red Hat Linux 8, though I probably don't have the latest release of libxslt. I can't be certain the bug is in libxml itself, because I am only accessing canonicalization through xmlstarlet rather than through my own code. It is possible the problem is in xmlstarlet's interface to libxml2. Either way, the nature of the bug means it should be easy to spot in whichever code base it lives.

In brief, the canonicalization is placing a line feed at the end of the canonicalized document, at least when there are no comments or PIs in the epilog (I haven't tested what happens when the epilog does contain such nodes). According to section 4.3 of the Canonical XML spec, this is wrong:

The C14N-20000119 <http://www.w3.org/TR/2001/REC-xml-c14n-20010315#C14N-20000119> Canonical XML draft placed a #xA after each PI outside of the document element as well as a #xA after the end tag of the document element. The method in this specification performs the same function except for omitting the final #xA after the last PI (or comment or end tag of the document element). This technique ensures that PI (and comment) children of the root are separated from markup by a line feed even if root node or the document element are omitted from the output node-set.
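For anyone who wants to check their own output, here is a minimal sketch of the test I have in mind. It only inspects bytes that have already been produced (say, canonicalizer output saved to a file); it does not call libxml or xmlstarlet. The function name and the sample documents are hypothetical, made up purely for illustration.

```python
def trailing_line_feeds(canonical: bytes) -> int:
    """Count the #xA bytes at the very end of a canonicalized document.

    Per section 4.3 of the Canonical XML spec, nothing follows the last
    PI, comment, or document element end tag, so this should return 0.
    """
    return len(canonical) - len(canonical.rstrip(b"\n"))


# Hypothetical samples, not actual tool output.
spec_form = b"<doc></doc>"                    # what the spec describes
observed_form = b"<doc></doc>\n"              # the behaviour reported above
epilog_pi_form = b"<doc></doc>\n<?pi data?>"  # #xA separates the PI, none after it

for name, data in [("spec", spec_form),
                   ("observed", observed_form),
                   ("epilog PI", epilog_pi_form)]:
    print(name, trailing_line_feeds(data))
```

A non-zero count for a document with an empty epilog would match the behaviour described above.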

--
Elliotte Rusty Harold





