Re: EggRegex
- From: Marco Barisione <barisione gmail com>
- To: gtk-devel-list gnome org
- Subject: Re: EggRegex
- Date: Thu, 20 Jul 2006 11:57:16 +0200
Matthias Clasen wrote:
> When I was last looking at regular expressions for GLib (which
> resulted in the current eggregex code), the first decision was to
> go for Perl regular expressions rather than POSIX. That naturally
> leads to PCRE. The main gripe with PCRE was (and is) that it
> had (and probably still has) relatively limited Unicode support.
The version of eggregex in libegg uses the three-year-old PCRE 4.5. The
current PCRE 6.7 has better Unicode support.
Now PCRE:
- handles UTF-8
- knows that, in a caseless match, à matches À
- has generic character types for non-ASCII characters, so \p{Lt}
  matches a titlecase letter, \p{Sc} matches a currency symbol, and so on
Extended properties such as "Greek" or "InMusicalSymbols" are not supported.
> And it brings its own implementation of the necessary Unicode
> data, instead of using the GLib one.
Yes, but it shouldn't be too difficult to port PCRE to use GLib for its
Unicode data. I can't do it myself because my knowledge of Unicode is very
limited. However, this would mean always using the internal copy of PCRE
instead of the system-supplied one.
--
Marco Barisione
http://www.barisione.org/