RE: FW: [xml] How can I parse XML using ISO 8859-1 decoding?



Hi Andrew,

I'll try to give something more useful. This seems to be a point many users
have difficulty with.

I tried and it works.
However, is it possible to enable this decoding globally?

The short answer is "no". But please continue to read.

The background for this feature (as I prefer to view it) is the
fact that an XML file logically always contains Unicode characters.
Which obvious or not so obvious encoding is used to represent this
stream of Unicode characters as a byte stream is a technical detail
which should be of no interest to applications. The same goes for some
other details, most often cited:
- attribute order
- single or double quotes
- how the angle brackets are escaped

So what to do if your world really consists only of ISO-8859-1
and your application really only wants to handle ISO-8859-1?

A) Layers
Systematically translate in a specific layer between your application
code and libxml2.

B) Decisions
You must decide what to do if a character outside ISO-8859-1 arrives
at your translation layer:
- abort
- ignore
- translate
- escape
Because different applications need different decisions here, it is
best to write the translation layer yourself. This is one of the reasons
it is not included in libxml2.
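
As an illustration of A) and B) together, here is a minimal sketch of
such a translation layer in plain C. It relies on the fact that libxml2
hands strings to the application as UTF-8. Everything else is an
assumption made for the example: the names latin1_policy, utf8_decode
and utf8_to_latin1 are hypothetical and not part of libxml2, '?' is an
arbitrary choice of replacement for "translate", and a numeric character
reference is only one possible form of "escape". The decoder does not
reject overlong sequences or surrogates, on the assumption that the
input is well-formed UTF-8.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical policy names, one per decision listed under B). */
typedef enum {
    LATIN1_ABORT,     /* stop and report an error */
    LATIN1_IGNORE,    /* drop the character silently */
    LATIN1_TRANSLATE, /* substitute a replacement, here '?' */
    LATIN1_ESCAPE     /* emit a numeric character reference, e.g. &#8364; */
} latin1_policy;

/* Decode one UTF-8 sequence starting at s and store the code point in *cp.
 * Returns the number of bytes consumed, or 0 if the sequence is malformed.
 * No check for overlong forms; the input is assumed to be well-formed. */
static size_t utf8_decode(const unsigned char *s, unsigned long *cp)
{
    size_t n, i;

    if (s[0] < 0x80) { *cp = s[0]; return 1; }
    else if ((s[0] & 0xE0) == 0xC0) { *cp = s[0] & 0x1F; n = 2; }
    else if ((s[0] & 0xF0) == 0xE0) { *cp = s[0] & 0x0F; n = 3; }
    else if ((s[0] & 0xF8) == 0xF0) { *cp = s[0] & 0x07; n = 4; }
    else return 0;

    for (i = 1; i < n; i++) {
        /* A non-continuation byte (including the NUL terminator) is an error. */
        if ((s[i] & 0xC0) != 0x80) return 0;
        *cp = (*cp << 6) | (s[i] & 0x3F);
    }
    return n;
}

/* Convert a NUL-terminated UTF-8 string to NUL-terminated ISO-8859-1.
 * Returns 0 on success. Returns -1 on malformed input, on a character
 * outside ISO-8859-1 under LATIN1_ABORT, or if out is too small; in that
 * case the contents of out are unspecified. */
int utf8_to_latin1(const char *in, char *out, size_t outsize,
                   latin1_policy policy)
{
    const unsigned char *p = (const unsigned char *)in;
    size_t o = 0;

    assert(in != NULL && out != NULL && outsize > 0);

    while (*p != '\0') {
        unsigned long cp;
        char buf[16];
        size_t len;
        size_t n = utf8_decode(p, &cp);

        if (n == 0) return -1;
        p += n;

        if (cp <= 0xFF) {
            /* ISO-8859-1 is exactly the first 256 Unicode code points. */
            buf[0] = (char)(unsigned char)cp;
            len = 1;
        } else {
            switch (policy) {
            case LATIN1_ABORT:     return -1;
            case LATIN1_IGNORE:    len = 0; break;
            case LATIN1_TRANSLATE: buf[0] = '?'; len = 1; break;
            default:               /* LATIN1_ESCAPE */
                len = (size_t)sprintf(buf, "&#%lu;", cp);
                break;
            }
        }

        if (o + len >= outsize) return -1;  /* keep room for the NUL */
        memcpy(out + o, buf, len);
        o += len;
    }
    out[o] = '\0';
    return 0;
}
```

The application would call utf8_to_latin1() on every string it reads
from libxml2 and do the reverse conversion on every string it passes in,
so that the rest of its code sees only ISO-8859-1. Whether "escape" is
safe depends on where the result ends up: a character reference is only
meaningful if the consumer interprets it as markup again.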

Regards,
Peter Jacobi




