Hello,
the following simple test case
================================================================
#include <stdio.h>
#include <stdlib.h>
#include <libxml/parser.h>
#include <libxml/parserInternals.h>

int main(int argc, char **argv) {
    xmlDocPtr doc;
    xmlNodePtr root;
    xmlChar *str;

    fprintf(stdout, "%s: using libxml version %s\n\n",
            "Test xmlNode[Set/Add]Content()", xmlParserVersion);

    doc = xmlNewDoc(BAD_CAST "1.0");
    root = xmlNewNode(NULL, BAD_CAST "root");
    xmlDocSetRootElement(doc, root);

    /* Escape the user input once, then pass the same string to both calls. */
    str = xmlEncodeEntitiesReentrant(doc, BAD_CAST " X&Y ");
    xmlNodeSetContent(root, str);
    /* Alternative: pass the raw string instead (then skip xmlFree() below). */
    //str = BAD_CAST " X&Y ";
    xmlNodeAddContent(root, str);

    xmlDocDump(stdout, doc);

    xmlFree(str);       /* the encoded copy is owned by the caller */
    xmlFreeDoc(doc);
    xmlCleanupParser();
    exit(0);
}
================================================================
produces the following (unexpected) output. I ran it against several
libxml2 versions, including 2.6.26 on WinXP SP2, and this
doesn't seem to be a platform issue:
================================================================
Test xmlNode[Set/Add]Content(): using libxml version 20626

<?xml version="1.0"?>
<root> X&amp;Y  X&amp;amp;Y </root>
================================================================
You'll notice the "double encoded" entity "&amp;amp;".
In the actual application, I send all user input through
xmlEncodeEntitiesReentrant(), which seemed the proper way
(omitting it will result in an error when calling
xmlNodeSetContent(), e.g.
"error : unterminated entity reference Y" in the test case).
I've traced this down to the point where the difference between the
two calls seems to arise:
xmlNodeSetContent() calls xmlStringGetNodeList() to build a list
of text nodes from the given user input, which in turn
uses xmlGetDocEntity() to (re-)replace the encoded entity with
the original character.
xmlDocDump() then converts the ampersand back to its
entity (which is fine).
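
To picture the (re-)replacement step, here is a toy model in plain C. It is
not libxml2 code: model_decode() is a hypothetical helper that knows only
"&amp;" and rejects any other '&', roughly the way the test case fails with
"unterminated entity reference" when the input is not encoded first.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model only, not libxml2 code: copy `in` to `out` (capacity `cap`),
 * turning each "&amp;" reference back into a single '&'.
 * Returns 0 on success, -1 on a '&' that does not start "&amp;" or when
 * `out` is too small. */
static int model_decode(const char *in, char *out, size_t cap)
{
    size_t n = 0;

    assert(in != NULL && out != NULL && cap > 0);
    out[0] = '\0';
    while (*in != '\0') {
        char c = *in;

        if (c == '&') {
            if (strncmp(in, "&amp;", 5) != 0)
                return -1;      /* e.g. " X&Y ": reference never terminated */
            in += 5;            /* consume the whole reference */
        } else {
            in += 1;
        }
        if (n + 1 >= cap)
            return -1;          /* no room left for char plus terminator */
        out[n++] = c;
    }
    out[n] = '\0';
    return 0;
}
```

In this model the encoded input " X&amp;Y " is stored as " X&Y ", so a later
dump that escapes the ampersand once gives back " X&amp;Y ", while the raw
input " X&Y " is rejected.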
OTOH, xmlNodeAddContent() (via xmlNodeAddContentLen()) calls
xmlNewTextLen(), which simply does an xmlStrndup() on the
given content to create a new text node.
xmlDocDump() now treats the ampersand of the already encoded
entity as a literal ampersand and encodes it again (not so fine).
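
The second path can be pictured the same way. Again a toy model in plain C,
not libxml2 code: model_dump_text() is a hypothetical stand-in for the dump
step, and it assumes the text was stored verbatim, as xmlStrndup() would
leave it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model only, not libxml2 code: serialize stored text by escaping
 * every '&' as "&amp;", treating the stored bytes as literal characters.
 * Returns 0 on success, -1 if `out` (capacity `cap`) is too small. */
static int model_dump_text(const char *text, char *out, size_t cap)
{
    size_t n = 0;

    assert(text != NULL && out != NULL && cap > 0);
    for (; *text != '\0'; text++) {
        if (*text == '&') {
            if (n + 5 >= cap)
                return -1;
            memcpy(out + n, "&amp;", 5);
            n += 5;
        } else {
            if (n + 1 >= cap)
                return -1;
            out[n++] = *text;
        }
    }
    out[n] = '\0';
    return 0;
}
```

Fed the already encoded " X&amp;Y ", this model yields " X&amp;amp;Y ", which
matches the second half of the output above. Fed the raw " X&Y ", it yields
" X&amp;Y ". Whether passing the raw string is the intended way to use
xmlNodeAddContent() is exactly what I'm asking.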
So, the question: Am I simply wrong in using these calls,
or is it really an issue within libxml2?
If it is, I'd happily provide a patch, but this seems to be quite a
libxml2 internal matter (I'd fear side effects), and I'm not
sure which of the two behaviours should be changed.
Any advice would be greatly appreciated.
Ciao, Markus
Kind regards
Markus Keim
________________________Addressed by:________________________
ORDAT GmbH & Co. KG - Serversystems / eCom
Dipl.-Inf. (FH) Markus Keim Fon: +49 (641) 7941-0
Rathenaustr. 1 Fax: +49 (641) 7941-132
35394 Gießen mailto:markus_keim ordat com
See: http://www.ordat.com
_____________________________________________________________
I love deadlines. I like the whooshing sound they make as
they fly by. -- Douglas Adams
Attachment: addNodeContent.c