XML libs (was Re: gconf backend)



On Sat, 2003-09-27 at 11:23, Daniel Veillard wrote:
> 
> > So you could move the new
> > backend to libxml SAX-style API and still have the same advantage.
> 
>   Are you suggesting "if you want the gconf code back on top of libxml2
> then do it yourself" ? Using "you" instead of "one" suggests this but this
> could be some misunderstanding on my part ...

"you" = "one", no I'm not saying you should do it. ;-) I'm just saying
that the xml->markup thing is not the justification for this change.

>  But gconf as well as gmarkup is your code, so do what you 
> prefer! It's just a bit of a shame that libxml2 wasn't used for your rewrite.
> I find that surprising since you're trying to promote standardization
> of tools, formats and, if possible, codebases. It still puzzles me that I
> have an easier job getting people outside of GNOME to reuse libxml2 and
> share improvements and bugfixes, but that I have such a hard time within
> the GNOME project.
> 

Well, as the gconf backends link to gmarkup anyway, it's kind of "free"
while libxml2 adds 800K. And as gmarkup parses an XML subset you can
always move to full XML later.

It's only an hour of work, if that, to switch between libxml2, expat, and
gmarkup; I abstracted that in dbus, for example, to make everyone happy.
(Essentially by making all of them convert the XML into three functions:
start element with attributes, end element, content text.)
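
As a rough illustration of that three-function abstraction, here is a
minimal sketch in Python on top of expat. It is not the dbus code (which
is C); the names Recorder and parse_with_expat are made up for this
example, and a libxml2 or gmarkup adapter would simply be another
function that drives the same three callbacks.

```python
import xml.parsers.expat


class Recorder:
    """Hypothetical backend-neutral handler: just three callbacks."""

    def __init__(self):
        self.events = []

    def start_element(self, name, attrs):
        self.events.append(("start", name, dict(attrs)))

    def end_element(self, name):
        self.events.append(("end", name))

    def text(self, data):
        self.events.append(("text", data))


def parse_with_expat(data, handler):
    """Adapter for one backend; other parsers would get their own adapter."""
    parser = xml.parsers.expat.ParserCreate()
    parser.buffer_text = True  # deliver adjacent character data in one call
    parser.StartElementHandler = handler.start_element
    parser.EndElementHandler = handler.end_element
    parser.CharacterDataHandler = handler.text
    parser.Parse(data, True)  # True marks this as the final chunk


recorder = Recorder()
parse_with_expat('<msg to="world">hello<br/></msg>', recorder)
```

The application code only ever sees the Recorder-style interface, so
swapping the parser underneath means rewriting one small adapter.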

I can try to explain what my dream XML library would be like if you
promise not to get offended by it:

I don't really believe that one XML library can be right for all
situations; they can have very large differences in API, code size, and
behavior in cases like error handling, namespaces, etc.
So it's very possible to not use libxml2 always while still thinking
libxml2 is an excellent library.

For the applications in GNOME my guess is that people imagine that XML
is approximately the gmarkup subset and don't figure anything else out.
I bet the apps using libxml2 get confused if libxml2 hands them anything
other than elements, attributes, and content (in UTF-8 encoding); or at
best they silently ignore other things. I know this is how I've used
libxml2 in the past. PI_NODE, XINCLUDE_START, NAMESPACE_DECL: I don't
know how to use these things; I just hope that libxml2 will not return
any of them, and I write code to skip over those nodes.
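
The "skip over those nodes" pattern looks roughly like the sketch below.
It uses Python's xml.dom.minidom as a stand-in for a libxml2 tree, so the
node-type constants are minidom's, not libxml2's, and element_children is
a hypothetical helper name.

```python
from xml.dom import minidom, Node


def element_children(node):
    """Return only element children; comments, PIs, and text are skipped."""
    return [c for c in node.childNodes if c.nodeType == Node.ELEMENT_NODE]


doc = minidom.parseString(
    "<root><!-- note --><?pi data?><item>a</item> <item>b</item></root>"
)
names = [e.tagName for e in element_children(doc.documentElement)]
```

Anything that is not an element is dropped without the application ever
deciding what it meant, which is exactly the silent-ignore behavior
described above.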

My ideal situation would be an XML-spec-compliant library that
canonicalized everything to approximately the gmarkup subset prior to
passing it to the application. Ideally the parser would be a small
library and would have robust error reporting (never print to stderr).
It would contain no I/O code at all, not even to read files; the
application should do that. It would have only one API, probably either
expat/gmarkup-like or the .NET-like "pull" style. It might have some way
of ensuring
that applications automatically get doctype checking and namespace
handling right. The total library size would be in the 200K range, or
perhaps less if it used GLib functionality for portability/unicode. The
library should be threadsafe in the sense that two separate parse
contexts don't share any global unprotected data.

Yes, there are many XML features you couldn't use with a library like
that. But people aren't using those features anyway for most apps; the
apps just want to see a tree of elements with attributes and content,
and immediately convert that tree into an application-specific data
structure. They occasionally want to be able to load/save without losing
comments in the XML file. This parser would not be used to implement an
XSLT engine or build a DOM tree.

So the ideal lib for the cases where I've used XML files:

 - parses any well-formed XML that the app is going to be able to 
   handle
 - only one small API; expat.h is larger than I have in mind
 - API assumes conversion to app data structure, so SAX or Reader, not 
   DOM
 - text always converted to UTF-8
 - no I/O code of any kind; no error printing or LoadFile() or network
   access
 - all state is in a per-parse context object (app must do its own 
   thread locks around the context object if it wants to use it 
   from multiple threads)
 - freeing all context objects should result in the library using 
   0 bytes
 - if an error occurs, it is reported immediately to application 
   code using consistent conventions, and parsing at least optionally 
   aborts
 - application can "throw" an error itself if it doesn't like the
   elements/content it sees
 - nonvalidating, but strict about well-formedness
 - no larger than around 200K (but significantly less should be 
   possible)
 - GLib contains a GLib-native wrapper API for the library, perhaps 
   in a separate libglib-xml.so much as gobject is separate
 - has no saving code, other than a function to escape a string
 - while I'm dreaming: has "make check" covering 100% of
   basic blocks
 - is fairly fast, but it doesn't have to be the fastest ever
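
To make a few of those points concrete (per-parse context, no I/O,
one error convention, application-raised errors), here is a small sketch
in Python layered on expat. ParseContext, ParseError, feed, and fail are
hypothetical names for illustration only; this is not an existing
library, and it covers only part of the list above.

```python
import xml.parsers.expat


class ParseError(Exception):
    """Single error type for parser errors and application errors alike."""

    def __init__(self, message, line):
        super().__init__(f"line {line}: {message}")
        self.line = line


class ParseContext:
    """All state lives here; the caller does its own I/O and locking."""

    def __init__(self, start, end, text):
        self._parser = xml.parsers.expat.ParserCreate()
        self._parser.buffer_text = True
        self._parser.StartElementHandler = start
        self._parser.EndElementHandler = end
        self._parser.CharacterDataHandler = text

    def feed(self, chunk, final=False):
        """Parse bytes the application read itself; errors surface at once."""
        try:
            self._parser.Parse(chunk, final)
        except xml.parsers.expat.ExpatError as err:
            message = xml.parsers.expat.ErrorString(err.code)
            raise ParseError(message, err.lineno) from None

    def fail(self, message):
        """Let a callback abort the parse with the same error type."""
        raise ParseError(message, self._parser.CurrentLineNumber)


def load_settings(data):
    """Example application: accept only <settings> and <key> elements."""
    keys = []

    def start(name, attrs):
        if name not in ("settings", "key"):
            ctx.fail("unexpected element " + name)
        if name == "key":
            keys.append(attrs.get("name"))

    ctx = ParseContext(start, lambda name: None, lambda data: None)
    ctx.feed(data, final=True)
    return keys
```

Calling load_settings(b'<settings><key name="a"/></settings>') returns
the key names, while malformed input or an unknown element raises
ParseError instead of printing anything.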

Something like that; surely some of the details here are wrong.
Clearly an XML library like this would suck for someone implementing
XML-intensive processing, but for just loading application data files
and parsing small strings as with GtkLabel this is IMHO the right
approach and would _in practice_ maximize the number of well-formed XML
documents our applications would handle correctly.

If we had this I do not think it would replace libxml2, because there
are instances where you need full XML details, validation, and so forth.

However, I don't think it's really a high priority to go off and write
another XML library right now; one of gmarkup/expat/libxml2 is 'close
enough' for most applications. That's why you don't see me going around
lobbying for someone to write my dream XML library; it's just not a big
issue so far. But maybe we can think about it in the 2-4 year timeframe.

Havoc



