[xml] Re: XML libs (was Re: gconf backend)



[ Cc'ing to xml gnome org list, not that I like that, but it seems
  totally impossible to get the GNOME core developers to actually
  go to the libxml2-related list to discuss libxml2-related problems :-( ]

On Sun, Sep 28, 2003 at 02:07:52AM -0400, Havoc Pennington wrote:
On Sat, 2003-09-27 at 19:51, Daniel Veillard wrote:
  You think in terms of cases where you control the input and output. The
error is that your next big client is gonna use an Oracle back-end for
your XML data, and suddenly you don't control the production anymore,
and if you use a non-conformant parser you made a promise that you just
can't keep, and that kind of thing has serious long-term costs.

Slow down, I'm not advocating gmarkup. That's why I described my ideal
XML lib, and said it would be conformant.

  Okay, sorry about that. Let's stay on track. There are ways to make
progress, I hope!

  This has nothing to do with web development versus data-oriented
development. You have a spec: either you're compliant or not. It's a
contract. And all it costs you to comply with that contract is mostly
to correctly reuse a compliant library instead of trying to roll your
own.

But that isn't true. To be XML compliant in terms of handling the stuff
not found in the gmarkup-like subset, you not only have to use the
library, you have to use it properly. Or you have to let it do things
that probably break many apps.

Say I just cut metacity over to a library that handles includes and
DTDs. Suddenly themes would probably be able to cause the WM to lock up
by creating unexpected I/O during theme loading. There are probably
security issues as well since themes can be untrusted. To switch to the
library then, I need a detailed understanding of what it is going to be
doing, and then I have to figure out how to turn off the I/O; but when I
turn it off, metacity's theme parser isn't XML compliant anymore, as I
understand it. The features of XML that require nonlocal or even local
I/O seem very browser-centric and problematic for a lot of apps.

   In the new set of APIs I'm currently writing, you can pass two options:
   doc = xmlReadFile(filename, NULL, XML_PARSE_NOENT | XML_PARSE_NONET);
Then the libxml2 parser will substitute entities and forbid access to
remote resources.
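To make the idea concrete without depending on libxml2's headers, here is a minimal, hypothetical sketch of the kind of check an application (or a parser option in the spirit of XML_PARSE_NONET) would apply before fetching an external resource, such as a DTD referenced from an untrusted theme. The `LOAD_ALLOW_*` flags, `is_network_uri()` and `resource_allowed()` are invented for illustration; this is not how libxml2 implements XML_PARSE_NONET.

```c
#include <string.h>

/* Hypothetical policy bits for illustration only; these are NOT libxml2 flags. */
enum {
    LOAD_ALLOW_LOCAL = 1 << 0,  /* files on disk, e.g. a DTD shipped with a theme */
    LOAD_ALLOW_NET   = 1 << 1   /* anything that would be fetched over the network */
};

/* Treat "scheme://..." as remote unless the scheme is "file". */
static int is_network_uri(const char *uri)
{
    const char *sep = strstr(uri, "://");
    if (sep == NULL)
        return 0;                               /* plain path: local */
    if (sep - uri == 4 && strncmp(uri, "file", 4) == 0)
        return 0;                               /* file:// URI: local */
    return 1;
}

/* Decide whether an external resource referenced by a document may be loaded. */
static int resource_allowed(const char *uri, unsigned policy)
{
    if (is_network_uri(uri))
        return (policy & LOAD_ALLOW_NET) != 0;
    return (policy & LOAD_ALLOW_LOCAL) != 0;
}
```

With a policy of `LOAD_ALLOW_LOCAL` only, `resource_allowed("http://example.com/theme.dtd", policy)` returns 0 while `resource_allowed("metacity-theme.dtd", policy)` returns 1. A policy of 0 refuses every external load, which is exactly the trade-off described above: the app stays safe, but it no longer processes everything a conformant parser would.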

  Well, libxml2 uses callbacks for errors; that's the model everybody
uses, and I'm not sure it was ever questioned by the relatively large
user base. Since your model seems to impose asynchronous processing,
I think this will need some discussion on the mailing list. I cannot
radically change to a new model without at least a bit of explanation.

Basically I want to write a function:

 MyAppDataStructure *load_xml_file (const char *filename, GError **error);

So the question is how to do that. The problem is that functions such as
xmlLoadACatalog() (totally random example) don't return any explanation

  Well, that is a mostly internal function. But okay, if you take
the xmlReadFile() example, it's the same.

of the error; you can look at errno, but you don't know if the errno is
for stat() or open() or read() or there could be a parse error or
out-of-memory and errno is junk. So the only possible error to display

  libxml2 is designed to be able to report multiple errors when parsing
a resource, and your API style does not allow this. For a lot of work
it's critical to be able to know that you have different problems at
lines 100, 120 and 134. I understand your viewpoint and will try to
carry it to the list.
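To illustrate the callback model described here, a minimal, hypothetical sketch in plain C: the checker keeps going after each problem and hands every diagnostic (line plus message) to an application-supplied callback, which in this case simply collects them. The `error_cb` type, `collect_error()` and the toy `check_lines()` function (which only flags lines with unbalanced angle brackets) are invented for illustration; they are not libxml2's error-handler API.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical callback type, called once per problem found.
   This is NOT libxml2's actual error-handler signature. */
typedef void (*error_cb)(void *user_data, int line, const char *message);

/* One recorded diagnostic. */
typedef struct {
    int  line;
    char message[128];
} diag;

/* Fixed-size collector that the application passes as user_data. */
typedef struct {
    diag   items[16];
    size_t count;    /* diagnostics stored */
    size_t dropped;  /* diagnostics that did not fit */
} diag_list;

/* Callback implementation: append each report to the diag_list. */
static void collect_error(void *user_data, int line, const char *message)
{
    diag_list *list = user_data;
    if (list->count == sizeof list->items / sizeof list->items[0]) {
        list->dropped++;
        return;
    }
    diag *d = &list->items[list->count++];
    d->line = line;
    snprintf(d->message, sizeof d->message, "%s", message);
}

/* Toy checker standing in for a real parser: flags every line whose '<'
   and '>' counts differ, and keeps going so that all problems are reported
   in one pass. Returns the number of problems reported. */
static int check_lines(const char *text, error_cb report, void *user_data)
{
    int line = 1, opens = 0, closes = 0, errors = 0;
    for (const char *p = text; ; p++) {
        if (*p == '<') {
            opens++;
        } else if (*p == '>') {
            closes++;
        } else if (*p == '\n' || *p == '\0') {
            if (opens != closes) {
                report(user_data, line, "unbalanced angle brackets");
                errors++;
            }
            if (*p == '\0')
                break;
            line++;
            opens = closes = 0;
        }
    }
    return errors;
}
```

Running `check_lines("<a>\n<b\n<c/>\n</d\n", collect_error, &list)` reports problems on lines 2 and 4 in a single pass. A caller that wants a single-error style could still turn `list.items[0]` and `list.count` into one message afterwards; the callback model leaves that choice to the application.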

to the user is "failed to load catalog" or something, with no further
diagnostic. Also, sometimes on failure it looks to me like
xmlGenericError was called and sometimes it wasn't.

What you want to display for a parse error is the line where the error
happened and a problem description; for an I/O error you want strerror
(errno). GError/DBusError/CORBA_environment/C++exceptions are a way to
propagate this detailed information.
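A minimal, hypothetical sketch of this out-parameter style in plain C, assuming an invented `app_error` record loosely modeled on GError (it is not GLib's API): the I/O step saves errno immediately and stores the `strerror()` text, while the parse step records the file name, the line and a description. A `load_xml_file()` like the one above could call these helpers at the point of failure and return NULL.

```c
#include <errno.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical GError-like record (not GLib's API): where the failure came
   from, a machine-readable code and a message ready to show the user. */
typedef enum { APP_ERROR_IO, APP_ERROR_PARSE } app_error_domain;

typedef struct {
    app_error_domain domain;
    int code;            /* errno value for I/O errors, line number for parse errors */
    char message[256];
} app_error;

/* Allocate and fill *error, but only if the caller passed a location for it. */
static void set_error(app_error **error, app_error_domain domain, int code,
                      const char *fmt, ...)
{
    if (error == NULL)
        return;
    app_error *e = malloc(sizeof *e);
    if (e == NULL)
        return;                         /* out of memory: nothing to report with */
    e->domain = domain;
    e->code = code;
    va_list ap;
    va_start(ap, fmt);
    vsnprintf(e->message, sizeof e->message, fmt, ap);
    va_end(ap);
    *error = e;
}

/* I/O step: on failure, record strerror(errno) while errno is still fresh. */
static FILE *open_input(const char *filename, app_error **error)
{
    FILE *f = fopen(filename, "r");
    if (f == NULL) {
        int saved = errno;              /* later calls (even malloc) may clobber errno */
        set_error(error, APP_ERROR_IO, saved, "Could not open \"%s\": %s",
                  filename, strerror(saved));
    }
    return f;
}

/* Parse step: report the line and a description of the problem. */
static void report_parse_error(app_error **error, const char *filename,
                               int line, const char *what)
{
    set_error(error, APP_ERROR_PARSE, line, "%s:%d: %s", filename, line, what);
}
```

Passing NULL instead of `&err` lets callers ignore the details, the same convention GError-based APIs follow. On POSIX systems, opening a path inside a missing directory yields an `APP_ERROR_IO` record whose code is ENOENT, so the user sees the actual `strerror()` text rather than a bare "failed to load".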

Not that I really advocate doing this for libxml2; it seems like it
would basically double your API size by adding
xmlLoadACatalogWithError() and so forth. I _don't_ think this is a good
idea, for the record.

  Those are not functions you're expected to use for this. I'm designing
a new layer of APIs for simple purposes, as I already pointed out in a previous
mail. If there is a way to get them in line with your needs, then let's do it; I don't
want a bunch of grumpy GNOME users for a couple of years until I do
some more API refactoring.
  See, you complain on d-d-l but did not subscribe to the friggin xml gnome org
list to discuss the issue. On the other hand, I'm pretty sure you expect
API details and work for gtk+ or dbus to be carried out on their respective
lists; isn't that unfair?

Though I am still hoping a conformant lib can be small and avoid some of
the problematic things like doing I/O behind the app's back.

  If you want to make progress, and not blindly ditch libxml2 over
non-issues, there is a window of opportunity now. The new xmlRead-based
functions and the xmlReader could be made closer to your ideal API,
but as usual it will get there *if you help*. You can do that, or
dream that somewhere, someone, will have the free time, money and energy
to build your dream conformant XML 1.0 library, which will save you 600KB of
on-disk space, and maintain that code for the next 10 years.
  So let's be realistic: if there are ways in which the proposed
new APIs [1] or the xmlReader [2] could be made closer to what you need,
read the damn resources and comment, preferably on the list. This will
take three times less email work, and will get somewhere, dammit!

Daniel

[1] http://mail.gnome.org/archives/xml/2003-September/msg00146.html
[2] http://xmlsoft.org/xmlreader.html#Walking

-- 
Daniel Veillard      | Red Hat Network https://rhn.redhat.com/
veillard redhat com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/


