Re: alternative gmarkup parser



Havoc Pennington <hp redhat com> writes:

> Hi,
> 
> I reimplemented GMarkup a different way, using a state machine type of
> thing. The new implementation allows incremental parsing; I also
> swapped in an expat-style vtable API instead of the thing with the
> nodes. This API is more general-purpose/flexible and has fewer entry
> points. For many applications I think it's also simpler to use, though
> it may not seem that way in theory; the test program was easier to
> write using it, and I expect it to be easier to write the rich text
> parser thing this way also. The nodes result in a lot of tedious list
> walking and switching on node type that can be avoided with the
> vtable; also the vtable allows you to signal errors as the markup is
> parsed, instead of having to walk the nodes after the fact and
> validate them.

This looks OK; I think you are right that it is probably as
easy to use as the node thing and certainly easier to language-bind.

It's sort of crying out to be an object, but aside from the
impossibility of that dependency, that wouldn't be very convenient
in C.

As far as I'm concerned, it's OK if you go ahead and commit, though
a test suite for glib/tests would probably be a good idea.
 
> Owen this is where I'm wanting the is_first_byte_in_char() macro, grep
> for 0x80, see what you think about alternative ways to do the code.

Very easy:

Step a) fix g_utf8_validate() to properly handle a trailing incomplete
        character (I think I already did this in the tree where I was
        working on utf-16 handling)
Step b) Use g_utf8_validate()

The reason I suggested adding the **end argument to g_utf8_validate()
was so this could be handled; on an incomplete character at the
end of the buffer, it is supposed to return TRUE with *end pointing
to the start of the incomplete character.
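
To make the idea concrete, here is a rough standalone sketch of the
boundary check an incremental parser needs before it consumes a chunk.
It does not use GLib, and complete_prefix_len() is a hypothetical name
invented for this illustration; it only looks for a truncated character
at the tail and leaves real validation to something like
g_utf8_validate().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of GLib: returns how many leading bytes of
 * buf[0..len) end on a UTF-8 character boundary. Bytes past that point
 * belong to a character whose remaining bytes have not arrived yet.
 * Only the tail is inspected; the text is not validated. */
static size_t
complete_prefix_len (const char *buf, size_t len)
{
  size_t i = len;
  size_t need;
  unsigned char lead;

  assert (buf != NULL || len == 0);

  /* Step back over at most three continuation bytes (10xxxxxx). */
  while (i > 0 && len - i < 3 && ((unsigned char) buf[i - 1] & 0xC0) == 0x80)
    i--;

  if (i == 0)
    return len;   /* no lead byte found; let the validator complain */

  lead = (unsigned char) buf[i - 1];
  if (lead < 0x80)
    need = 1;
  else if ((lead & 0xE0) == 0xC0)
    need = 2;
  else if ((lead & 0xF0) == 0xE0)
    need = 3;
  else if ((lead & 0xF8) == 0xF0)
    need = 4;
  else
    return len;   /* not a lead byte; let the validator complain */

  /* The last character starts at i - 1; hold it back if it is truncated. */
  return (len - (i - 1) < need) ? i - 1 : len;
}
```

The caller would parse the first complete_prefix_len() bytes and carry
the remainder over to the front of the next chunk.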

> Yes, this code contains a 750-line function. ;-) I'll split it up...
> 
> If we keep the old API with the nodes, I still want to swap in this
> implementation instead of the old implementation.
> typedef enum
> {
>   /* Hmm, can't think of any at the moment */
>   G_MARKUP_FOO = 1 << 0
>   
> } GMarkupParseFlags;

:-) Are empty enumerations valid C?
 
> GMarkupParseContext *g_markup_parse_context_new   (const GMarkupParser *parser,
>                                                    GMarkupParseFlags    flags,
>                                                    gpointer             user_data,
>                                                    GDestroyNotify       user_data_dnotify);
>
> void                 g_markup_parse_context_free  (GMarkupParseContext *context);
> gboolean             g_markup_parse_context_parse (GMarkupParseContext *context,
>                                                    const gchar         *text,
>                                                    gint                 text_len,
>                                                    GError             **error);

Is there a reason to have both the error() callback and the error result here?
When would you use one or the other?

(One thing that I might suggest if you are going to have both is that
you should pass the character/line into the error() callback in
machine-readable form.)
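
Something along these lines is what I have in mind; treat it as a
sketch only. The Sketch* names are hypothetical and are not the
GMarkup API from the patch. The point is just that the callback gets
the line and character numbers as integers rather than having to dig
them out of a message string.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types for illustration; not taken from the patch. */
typedef struct
{
  int         line_number;   /* 1-based */
  int         char_number;   /* 1-based, counted in characters, not bytes */
  const char *message;
} SketchParseError;

typedef void (*SketchErrorFunc) (const SketchParseError *err,
                                 void                   *user_data);

/* Convert a byte offset into line/character numbers and hand them to the
 * callback. UTF-8 continuation bytes (10xxxxxx) do not advance the
 * character count. */
static void
sketch_report_error (const char      *text,
                     size_t           offset,
                     const char      *message,
                     SketchErrorFunc  func,
                     void            *user_data)
{
  SketchParseError err = { 1, 1, message };
  size_t i;

  assert (text != NULL && func != NULL);

  for (i = 0; i < offset; i++)
    {
      unsigned char c = (unsigned char) text[i];

      if (c == '\n')
        {
          err.line_number++;
          err.char_number = 1;
        }
      else if ((c & 0xC0) != 0x80)
        err.char_number++;
    }

  func (&err, user_data);
}

/* Example callback: copies the error into the struct passed as user_data. */
static void
sketch_store_error (const SketchParseError *err, void *user_data)
{
  *(SketchParseError *) user_data = *err;
}
```

An application could then highlight the offending position in an
editor, for instance, without parsing the error text.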
                                                    
Regards,
                                        Owen




