Re: XML libs (was Re: gconf backend)



On Sun, 2003-09-28 at 06:08, Daniel Veillard wrote:
>   libxml2 is designed to be able to report multiple errors when parsing
> a resource. And your API style does not allow this. It's critical for
> a lot of work to be able to know that you have different problems
> lines 100, 120 and 134. I understand your viewpoint and will try to
> carry it on the list.

That makes sense, in a context where someone is human-editing the XML
and wants to see all the errors in the document at once.

Rather than "exceptions", the other thing that would work is to
reliably _always_ call the error callback and set an error code on the
context whenever a function fails (returns NULL or whatever). Right now
a function can fail without the callback having been called. This is
perhaps a more realistic change to libxml.

If you _always_ call the error callback on error, then it's possible in
a wrapper or convenience library to convert the error callbacks into
exceptions (in fact config-loader-libxml.c in dbus tries to do this
already).
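
A minimal sketch of that conversion, using only standard C. Everything
here is hypothetical: WrapError, capture_first_error(), fake_parse() and
wrapped_parse() are illustrative names, not libxml or dbus API, and
fake_parse() merely stands in for a library call that honors the
"always call the callback on failure" contract.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical error record; none of these names are libxml API. */
typedef struct {
    int  is_set;        /* nonzero once an error has been captured */
    int  line;          /* line number reported by the parser */
    char message[128];  /* copy of the first error message */
} WrapError;

/* Shape of the error callback the stand-in parser invokes. */
typedef void (*ErrorFunc)(void *user_data, int line, const char *msg);

/* Keep only the first error, which is what an exception-style
 * caller wants to see. */
static void capture_first_error(void *user_data, int line, const char *msg)
{
    WrapError *err = user_data;
    assert(err != NULL);
    if (err->is_set)
        return;
    err->is_set = 1;
    err->line = line;
    snprintf(err->message, sizeof err->message, "%s", msg);
}

/* Stand-in for a library function. The contract being asked for is
 * that it never returns failure without having called on_error. */
static int fake_parse(const char *text, ErrorFunc on_error, void *user_data)
{
    if (text == NULL || text[0] != '<') {
        on_error(user_data, 1, "document does not start with '<'");
        return -1;
    }
    return 0;
}

/* Wrapper: returns 0 on success, -1 with *err filled in on failure. */
static int wrapped_parse(const char *text, WrapError *err)
{
    err->is_set = 0;
    int rc = fake_parse(text, capture_first_error, err);
    /* Holds only if the library always calls the callback on failure. */
    assert((rc != 0) == (err->is_set != 0));
    return rc;
}
```

The assert in wrapped_parse() is the whole point: if a function can
fail without the callback firing, the wrapper has nothing to put in the
error it hands back.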

Introducing exceptions to the current API at this stage is basically a
bad idea, since you have too many old functions that don't use them and
you don't want to double the API. So perhaps the always-call-callback
approach is right.

The xmlTextReader error callback API is good, as long as the provided
error callback with xmlTextReaderSetErrorHandler() is _always_ called if
a function fails.

>   See you complain on d-d-l but did not subscribe to the friggin xml gnome org
> list to discuss the issue. On the other hand I'm pretty sure you expect
> API details and work for gtk+ or dbus to be carried on their respective
> list, isn't that unfair ?

Well I am only providing feedback because you asked. I'm not going to go
on the libxml list and provide some huge unsolicited list of requests:

 - I don't have time to write patches and that is normally what you'd
   reasonably ask if I show up with a huge list of requests.

 - all I have is feedback from my use-cases, and it should not be taken 
   as the only use case. Remember my whole point is that it may well 
   make sense to have distinct XML APIs, and so who am I to say that 
   your API should be targeted to me?

 - I'm not sure expanding the libxml API further and providing more 
   ways to do it is right _at all_; that just makes the lib larger
   and more confusing, so I need a better proposal than that.

 - I don't think this issue is the highest priority for GNOME by any 
   means, as one of expat/libxml/gmarkup basically works fine for 
   any given situation.

Perhaps the interesting thing to do is develop a tinyxml alternate lib
_or_ a wrapper API. If you or someone does that though, again, please,
do not ABI freeze it as soon as you implement it. It needs to be used in
real life by several apps and iterated through rounds of improvement
based on that.

I think that may be the wrong direction, though, and xmlTextReader may
be the API to go with. It's the one I started using in
config-loader-libxml.c and it looks essentially reasonable.

I was on the libxml mailing list for a long time, btw. I just wasn't
able to keep up with the mail volume.

> [1] http://mail.gnome.org/archives/xml/2003-September/msg00146.html

Most of the APIs in this mail essentially would not be used in my use
cases, because I don't want to load an xmlDocPtr, and I want to do my
own I/O. I would want to feed libxml the already-loaded bytes. The way
that mail provides for this is xmlReadMemory(), but that has the
limitation that you have to load the whole file at once.

What I really want is:

 context = context_new ();
 context_add_bytes (context, buffer, len);

Where you can provide the document in incremental chunks, so I could
call context_add_bytes() repeatedly appending more bytes until the
document was complete. At the end you call context_finished() or
something and the parser complains if the document isn't complete.
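
To make the shape concrete, here is a toy sketch of such an interface
in standard C. It is not a parser and not libxml API: the Context type
and the context_free() helper are assumptions added for illustration,
and the only thing it checks is that element tags nest and close. It
does not compare tag names or handle '>' inside comments. What it
shows is that the state lives in the context, so a tag can be split
across two chunks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef enum { ST_TEXT, ST_TAG_OPEN, ST_START_TAG, ST_END_TAG, ST_OTHER } State;

typedef struct {
    State state;      /* where the previous chunk left off */
    int   depth;      /* number of currently open elements */
    int   seen_root;  /* nonzero once any element was opened */
    char  prev;       /* last byte seen, to recognise "/>" */
    int   failed;     /* sticky error flag */
} Context;

Context *context_new(void)
{
    return calloc(1, sizeof(Context));  /* zeroed: ST_TEXT, depth 0 */
}

/* Feed the next chunk; returns 0, or -1 once an error was detected. */
int context_add_bytes(Context *ctx, const char *buf, size_t len)
{
    assert(ctx != NULL);
    for (size_t i = 0; i < len && !ctx->failed; i++) {
        char c = buf[i];
        switch (ctx->state) {
        case ST_TEXT:
            if (c == '<') ctx->state = ST_TAG_OPEN;
            break;
        case ST_TAG_OPEN:
            if (c == '/') {
                ctx->state = ST_END_TAG;
            } else if (c == '?' || c == '!') {
                ctx->state = ST_OTHER;      /* PI, comment, doctype */
            } else {
                ctx->state = ST_START_TAG;
                ctx->depth++;
                ctx->seen_root = 1;
            }
            break;
        case ST_START_TAG:
            if (c == '>') {
                if (ctx->prev == '/') ctx->depth--;  /* <empty/> */
                ctx->state = ST_TEXT;
            }
            break;
        case ST_END_TAG:
            if (c == '>') {
                if (ctx->depth == 0) ctx->failed = 1;  /* unmatched close */
                else ctx->depth--;
                ctx->state = ST_TEXT;
            }
            break;
        case ST_OTHER:
            if (c == '>') ctx->state = ST_TEXT;
            break;
        }
        ctx->prev = c;
    }
    return ctx->failed ? -1 : 0;
}

/* Returns 0 if the document seen so far is complete, -1 otherwise. */
int context_finished(const Context *ctx)
{
    assert(ctx != NULL);
    int complete = !ctx->failed && ctx->state == ST_TEXT
                   && ctx->depth == 0 && ctx->seen_root;
    return complete ? 0 : -1;
}

void context_free(Context *ctx) { free(ctx); }
```

Feeding "<a><b" and then "/></a>" leaves the context complete, while
calling context_finished() between the two chunks reports an incomplete
document.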

> [2] http://xmlsoft.org/xmlreader.html#Walking

I like the reader API. So here are the nodes I know what to do with:

    XML_READER_TYPE_ELEMENT = 1,
    XML_READER_TYPE_ATTRIBUTE = 2,
    XML_READER_TYPE_TEXT = 3,
    XML_READER_TYPE_COMMENT = 8,
    XML_READER_TYPE_DOCUMENT_TYPE = 10,
    XML_READER_TYPE_END_ELEMENT = 15,

Here are the nodes that I would just skip if I wrote code:

    XML_READER_TYPE_NONE = 0,
    XML_READER_TYPE_CDATA = 4,
    XML_READER_TYPE_ENTITY_REFERENCE = 5,
    XML_READER_TYPE_ENTITY = 6,
    XML_READER_TYPE_PROCESSING_INSTRUCTION = 7,
    XML_READER_TYPE_DOCUMENT = 9,
    XML_READER_TYPE_DOCUMENT_FRAGMENT = 11,
    XML_READER_TYPE_NOTATION = 12,
    XML_READER_TYPE_WHITESPACE = 13,
    XML_READER_TYPE_SIGNIFICANT_WHITESPACE = 14,
    XML_READER_TYPE_END_ENTITY = 16,
    XML_READER_TYPE_XML_DECLARATION = 17

Is my resulting application going to be compliant, assuming I asked for
entity substitution? Or will my app fall over?
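
The handle/skip split above could be sketched as a single dispatch
function. The NODE_* names and node_is_handled() are hypothetical; only
the numeric values come from the two lists. Whether skipping is safe
for every type (CDATA, for one, carries character data) is exactly the
open question.

```c
#include <assert.h>

/* Numeric values copied from the "know what to do with" list;
 * the NODE_* names themselves are made up for this sketch. */
enum {
    NODE_ELEMENT       = 1,
    NODE_ATTRIBUTE     = 2,
    NODE_TEXT          = 3,
    NODE_COMMENT       = 8,
    NODE_DOCUMENT_TYPE = 10,
    NODE_END_ELEMENT   = 15
};

/* Returns 1 if the application acts on this node type, 0 if it
 * silently skips it. */
static int node_is_handled(int node_type)
{
    assert(node_type >= 0 && node_type <= 17);
    switch (node_type) {
    case NODE_ELEMENT:
    case NODE_ATTRIBUTE:
    case NODE_TEXT:
    case NODE_COMMENT:
    case NODE_DOCUMENT_TYPE:
    case NODE_END_ELEMENT:
        return 1;
    default:
        return 0;   /* everything in the second list */
    }
}
```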

Again, the gmarkup API and the part of expat used in
dbus/bus/config-loader-expat.c demonstrate essentially how my apps view
an XML document.

Havoc



