[xml] libxml newbie question on htmlParseChunk function

From: Van H Tran <tvhoang1980 yahoo com>
To: xml gnome org
Subject: [xml] libxml newbie question on htmlParseChunk function
Date: Fri, 2 Jun 2006 01:07:00 -0700 (PDT)

Hi all,
My very first post in this mailing list :)

OK, I'm trying to unhtmlize some text, using the SAX
model.

Here is how I initialize the parser:

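void unhtmlizeHandleCharacters(void *user_data, const xmlChar *string,
                               int length)
{
    // string is not NUL-terminated, so print exactly length bytes
    fprintf(stderr, "string = %.*s", length, (const gchar *)string);
    // process string here...
}

void unhtmlize(const char *text)
{
    htmlSAXHandler *sax_p;
    htmlParserCtxtPtr ctxt;

    sax_p = g_new0(htmlSAXHandler, 1);
    sax_p->characters = unhtmlizeHandleCharacters;
    // buffer (the user data handed to the callback) is defined elsewhere
    ctxt = htmlCreatePushParserCtxt(sax_p, buffer, text, strlen(text),
                                    "", XML_CHAR_ENCODING_UTF8);
    // size 0 with terminate = 1 marks the end of the input
    htmlParseChunk(ctxt, text, 0, 1);
}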
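
A minimal sketch of what the "process string here" step could look like, assuming the goal is to collect the text chunks into one buffer. TextBuffer and collectCharacters are hypothetical names, not part of libxml2, and unsigned char stands in for xmlChar so that the sketch builds on its own. The point it illustrates is that each chunk has to be copied using its length, since it is not NUL-terminated.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical accumulator, passed to the callback as user_data. */
typedef struct {
    char  *data;   /* NUL-terminated collected text, or NULL if empty */
    size_t len;    /* number of bytes in data, excluding the NUL */
} TextBuffer;

/* Same shape as a SAX characters callback; unsigned char stands in
 * for xmlChar (an assumption made to avoid the libxml2 headers). */
void collectCharacters(void *user_data, const unsigned char *string,
                       int length)
{
    TextBuffer *buf = user_data;
    char *grown;

    assert(buf != NULL && length >= 0);
    if (length == 0)
        return;                 /* nothing to append */

    /* Room for the old text, the new chunk, and a terminating NUL. */
    grown = realloc(buf->data, buf->len + (size_t)length + 1);
    if (grown == NULL)
        return;                 /* out of memory: keep what we have */

    /* Copy exactly length bytes; the chunk is not NUL-terminated. */
    memcpy(grown + buf->len, string, (size_t)length);
    buf->len += (size_t)length;
    grown[buf->len] = '\0';
    buf->data = grown;
}
```

With something like this, a TextBuffer initialized to { NULL, 0 } would be passed as the user data, and its data field would hold the collected text (to be released with free) once parsing has finished.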


What's interesting is that this works with 'normal' text.
However, if
text = "abc < xyz"

then I see in the debug output of unhtmlizeHandleCharacters
that it only receives "abc " as the string; everything from
the '<' character onward is omitted.

So unhtmlize("abc < xyz") gives "abc " as the
result.

How can I overcome this? Any reply is much appreciated.

Thanks in advance
TranVan Hoang,
