libxml - utf8 / 8bit charsets.
- From: Michael Meeks <michael ximian com>
- To: Daniel Veillard <veillard redhat com>
- Cc: gnome-hackers gnome org
- Subject: libxml - utf8 / 8bit charsets.
- Date: Mon, 26 Mar 2001 05:09:13 -0500 (EST)
Hi Daniel,
        When I looked into this issue, it seemed to me that libxml was
being too clever for its own good :-) but first - let me assume that the
only significant user of libxml1 is now the GNOME project - is that fair?
        So - there are a lot of possible char-sets that we could support,   
however looking at parser.c (xmlSwitchEncoding), it seems that we flag
errors on all encodings except ENCODING_NONE and ENCODING_UTF8.
        So - given that mixed-charset XML files exist, why can we not get
libxml to simply return an exact representation of what was in the input
string - regardless of encoding? And similarly on write, we just assume
the application is going to get it right.
        I think what screwed me up using 8-bit data was code that started
examining a byte stream as chars, assuming that it was UTF-8, and trying to
validate it to ensure that no chars in a certain range were present. If
this is the breakage, that validation is of very limited use to us.
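
        To make the failure mode concrete, here is a rough sketch of
that kind of check. It is not libxml code: looks_like_utf8 is a made-up
helper, and it only tests the lead-byte / continuation-byte structure
(it does not catch every overlong form or surrogate). The point is just
that perfectly ordinary 8-bit data fails it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, NOT part of libxml: returns 1 if buf[0..len) has
 * valid UTF-8 lead/continuation byte structure, 0 otherwise. */
int looks_like_utf8(const unsigned char *buf, size_t len)
{
    size_t i = 0;

    assert(buf != NULL);
    while (i < len) {
        unsigned char c = buf[i];
        size_t extra, k;

        if (c < 0x80)
            extra = 0;                 /* plain ASCII */
        else if (c >= 0xC2 && c <= 0xDF)
            extra = 1;                 /* lead byte of a 2-byte sequence */
        else if (c >= 0xE0 && c <= 0xEF)
            extra = 2;                 /* lead byte of a 3-byte sequence */
        else if (c >= 0xF0 && c <= 0xF4)
            extra = 3;                 /* lead byte of a 4-byte sequence */
        else
            return 0;                  /* stray continuation or bad lead */

        if (extra > len - i - 1)
            return 0;                  /* sequence runs off the buffer */

        for (k = 1; k <= extra; k++)
            if ((buf[i + k] & 0xC0) != 0x80)
                return 0;              /* expected a 10xxxxxx byte */

        i += extra + 1;
    }
    return 1;
}
```

Feeding it "caf" followed by the single byte 0xE9 (e-acute in
ISO-8859-1) gets rejected, because 0xE9 reads as the start of a 3-byte
UTF-8 sequence that never arrives; the same word with 0xC3 0xA9 passes.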
  
        Of course, it's entirely possible that I just mis-remembered
everything.
  
        Regards,
  
                Michael.
-- 
 mmeeks gnu org  <><, Pseudo Engineer, itinerant idiot