Re: [xml] Cleaning the Web - Implementing HTML 5 parsing in libxml2
- From: Karl Dubost <karl w3 org>
- To: Andi Sidwell <andi takkaria org>
- Cc: xml gnome org, "Michael (tm) Smith" <mike w3 org>, Nick Kew <nick webthing com>
- Subject: Re: [xml] Cleaning the Web - Implementing HTML 5 parsing in libxml2
- Date: Tue, 26 Aug 2008 09:36:37 +0900
On 20 August 2008 at 23:34, Andi Sidwell wrote:
FWIW, I've spent the summer working on a C HTML5 parser called Hubbub[1],
which is approaching stability.  It's about half as fast as libxml2 at
parsing the HTML 5 spec with an O(1) treebuilder, and it's fairly easy to
bind to the libxml2 interfaces (it is being used in lieu of the libxml2
HTML parser in the development branch of a small Web browser,
NetSurf[2]).  Note that it a) is not buildable as a shared library and
b) has not had a formal release, but if someone wants an HTML5 parser in
C, then it's probably not a bad bet.
Excellent news. The HTML 5 spec goes beyond the usual event-driven
parsing, since the parser may retrospectively modify the tree (à la
tidy). I wonder how much modification that would require in libxml2, and
whether it is in fact a better strategy to provide an interface than to
include the code directly in the library.
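To make the "retrospectively modifying the tree" point concrete, here is a minimal sketch of one such case, the "foster parenting" step of HTML 5 tree construction: content found directly inside a table, outside any cell, is placed just before the table in the table's parent, although the table's start tag has already been processed. A forward-only event stream cannot express that insertion. Everything below (Node, node_init, node_append, node_unlink, node_insert_before, foster_parent) is a hypothetical hand-rolled illustration, not the libxml2 or Hubbub API, and it simplifies the spec's actual algorithm.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical minimal tree node; NOT a libxml2 or Hubbub type. */
typedef struct Node {
    const char *name;
    struct Node *parent;
    struct Node *first_child;
    struct Node *last_child;
    struct Node *prev;
    struct Node *next;
} Node;

/* Reset a node to an unlinked state with the given name. */
void node_init(Node *n, const char *name)
{
    *n = (Node){0};
    n->name = name;
}

/* Append an unlinked node as the last child of parent. */
void node_append(Node *parent, Node *n)
{
    assert(n->parent == NULL);
    n->parent = parent;
    n->prev = parent->last_child;
    n->next = NULL;
    if (parent->last_child)
        parent->last_child->next = n;
    else
        parent->first_child = n;
    parent->last_child = n;
}

/* Detach a node from its parent; a no-op if it is already unlinked. */
void node_unlink(Node *n)
{
    if (n->parent == NULL)
        return;
    if (n->prev)
        n->prev->next = n->next;
    else
        n->parent->first_child = n->next;
    if (n->next)
        n->next->prev = n->prev;
    else
        n->parent->last_child = n->prev;
    n->parent = n->prev = n->next = NULL;
}

/* Insert an unlinked node as the previous sibling of ref. */
void node_insert_before(Node *ref, Node *n)
{
    assert(ref->parent != NULL && n->parent == NULL);
    n->parent = ref->parent;
    n->next = ref;
    n->prev = ref->prev;
    if (ref->prev)
        ref->prev->next = n;
    else
        ref->parent->first_child = n;
    ref->prev = n;
}

/* Simplified foster parenting: put n immediately before the table in
 * the table's parent, i.e. at a position the parser has already
 * walked past. */
void foster_parent(Node *table, Node *n)
{
    node_unlink(n);
    node_insert_before(table, n);
}
```

For input along the lines of <body><table><b>, a caller would append the table node to body and then call foster_parent(&table, &b), leaving b as the first child of body and the table as its next sibling. A binding to libxml2 would need an equivalent "insert before an existing node" operation on the tree, which is why a pure SAX-style interface is probably not enough on its own.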
[1] http://www.netsurf-browser.org/projects/hubbub/
[2] http://www.netsurf-browser.org/
--
Karl Dubost - W3C
http://www.w3.org/QA/
Be Strict To Be Cool