Re: [xml] Adding an entity in a textnode



Hi,

Although &uuml; has some meaning in HTML, it is not a predefined entity in XML.
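
A quick way to see this for yourself is sketched below. It uses Python's standard xml.etree.ElementTree (an expat-based parser, not libxml) purely as a convenient stand-in, and the DOCTYPE with its internal subset is my own illustrative example, not something from your project.

```python
import xml.etree.ElementTree as ET

# Without a declaration, &uuml; is an undefined entity and parsing fails.
try:
    ET.fromstring("<doc>german umlaut &uuml;</doc>")
    undeclared_ok = True
except ET.ParseError:
    undeclared_ok = False

# Declaring the entity in an internal DTD subset makes it known to the parser.
declared = (
    '<!DOCTYPE doc [<!ENTITY uuml "&#xfc;">]>'
    "<doc>german umlaut &uuml;</doc>"
)
declared_text = ET.fromstring(declared).text

print(undeclared_ok)                               # False
print(declared_text == "german umlaut \u00fc")     # True
```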

Either declare the German characters as entities yourself, or use numeric character references. Here is an XML document that contains the Unicode code points of all the German special characters:

  [astaroth:~]$ cat test.xml
  <doc>
    &#xc4; &#xe4;
    &#xd6; &#xf6;
    &#xdc; &#xfc;
    &#xdf;
  </doc>

Here is what happens when you process that document and request the output in an appropriate encoding (your mail agent must support ISO-8859-1 to see this correctly):

  [astaroth:~]$ xmllint --encode iso-8859-1 test.xml
  <?xml version="1.0" encoding="iso-8859-1"?>
  <doc>
    Ä ä
    Ö ö
    Ü ü
    ß
  </doc>
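
The same behaviour can be reproduced in a few lines of script. This is only a sketch using Python's xml.etree.ElementTree as a stand-in for xmllint, so the exact serialization details are those of that library, not of libxml. It assumes the same content as test.xml, flattened onto one line.

```python
import xml.etree.ElementTree as ET

# Same content as test.xml: numeric character references for the German letters.
source = "<doc>&#xc4; &#xe4; &#xd6; &#xf6; &#xdc; &#xfc; &#xdf;</doc>"

root = ET.fromstring(source)
# The parser has already resolved the references into real characters.
parsed_text = root.text

# ISO-8859-1 can represent every one of these characters as a single byte.
latin1 = ET.tostring(root, encoding="iso-8859-1")

# US-ASCII cannot, so the serializer falls back to numeric references.
ascii_out = ET.tostring(root, encoding="us-ascii")

print(b"\xdc \xfc" in latin1)      # True: raw bytes for the two u-umlauts
print(b"&#220; &#252;" in ascii_out)  # True: decimal character references
```

In other words, the document stores characters; whether they show up as bytes or as references is decided only when you serialize.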

Note that your resulting HTML files will never contain &uuml; and the like. They will contain either the numeric character reference, as in my test.xml, or the character itself, depending on the encoding you use for the resulting HTML.
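
As for the &amp; you are seeing: a text node holds characters, not markup, so a serializer has to escape the "&". A minimal sketch of the difference follows, again with Python's xml.etree.ElementTree standing in for the PHP/libxml calls (which I am not reproducing here). Put the real character into the text node and let the output encoding decide how it is written.

```python
import xml.etree.ElementTree as ET

# Text that merely looks like an entity is plain text, so "&" gets escaped.
as_text = ET.Element("doc")
as_text.text = "german umlaut &uuml;"
escaped = ET.tostring(as_text, encoding="unicode")

# Storing the real character lets the serializer choose the representation.
as_char = ET.Element("doc")
as_char.text = "german umlaut \u00fc"
ascii_bytes = ET.tostring(as_char, encoding="us-ascii")
latin1_bytes = ET.tostring(as_char, encoding="iso-8859-1")

print(escaped)       # <doc>german umlaut &amp;uuml;</doc>
print(ascii_bytes)   # b'<doc>german umlaut &#252;</doc>'
```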

Ciao
Igor


christoph riedl wrote:
Hello everyone.
I'm basically working on a PHP project, but as PHP uses libxml, I guess this
is the place to get a solution to my problem. Unfortunately I don't know where
my "problem" originates, so I can only describe the symptoms in the hope that
one of you can help me out.

I'm building up an XML document from scratch. Then I create a text node with the
following content: "german umlaut &uuml;". When I then dumpmem the whole thing,
the result looks like "german umlaut &amp;uuml;".
The whole project is intended for generating HTML files in the end (after
additional processing via XSLT). So a text node containing an entity would be
perfectly OK.
Is there a way to prevent this replacement of & with &amp; in the
text node?
My current workaround for this dilemma looks something like:
ereg_replace ( "&amp;", "&", result_from_dump_mem() );
As you might guess, this is very inefficient.

I would be very thankful for any note that points me toward a solution
or otherwise clears up where this comes from and why it is the way it is.

Regards,
Christoph Riedl




