Re: Performance implications of GRegex structure
- From: mark mark mielke cc
- To: Owen Taylor <otaylor redhat com>
- Cc: gtk-devel-list gtk org, Morten Welinder <mortenw gnome org>
- Subject: Re: Performance implications of GRegex structure
- Date: Thu, 15 Mar 2007 12:13:43 -0400
On Thu, Mar 15, 2007 at 10:56:57AM -0400, Owen Taylor wrote:
> The compiled form of a regular expression is not altered during matching,
> so the same compiled pattern can safely be used by several threads at once.
> ...
> Well, I could imagine (maybe, barely) that someone could show me numbers
> that showed that with a variety of long and complicated regular
> expressions, compiling them was still 10x as fast as matching them
> against very short strings.
To answer Owen: I expect this is because the base regcomp()/regexec()
libraries do not make this distinction. Emulating the higher-performing
libraries that separate the Pattern from the Matcher would require
jumping through some hoops.
There are two cases I see. The first is multithreaded scalability. If
this were important, it could be simulated on top of these older
libraries using a pool of pre-compiled regular expression objects. For
example, a "give me a new matcher object" call would pull a compiled
regular expression from the pool (or compile a new one if none is
available), and once the match is complete it would return the regular
expression to the pool. At some point the pool would reach a steady
state where no new compilation is required. I expect its size would
roughly line up with the number of threads using it.
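A minimal sketch of such a pool in C, assuming POSIX threads and the plain regcomp()/regexec() interface. The names regex_pool, regex_pool_acquire(), regex_pool_release() and POOL_MAX are hypothetical, not part of GRegex or POSIX. Each borrowed regex_t is used by only one thread at a time, and compilation happens outside the lock so a thread that has to compile does not block the others.

```c
#include <assert.h>
#include <pthread.h>
#include <regex.h>
#include <stdlib.h>
#include <sys/types.h>

#define POOL_MAX 64  /* hypothetical cap on idle compiled patterns */

/* Hypothetical pool of compiled copies of one POSIX pattern. */
typedef struct {
    pthread_mutex_t lock;
    const char *pattern;
    int cflags;
    regex_t *idle[POOL_MAX];  /* compiled patterns not currently borrowed */
    int n_idle;
    int n_compiled;           /* total regcomp() calls, to watch the steady state */
} regex_pool;

void regex_pool_init(regex_pool *p, const char *pattern, int cflags)
{
    pthread_mutex_init(&p->lock, NULL);
    p->pattern = pattern;
    p->cflags = cflags;
    p->n_idle = 0;
    p->n_compiled = 0;
}

/* Borrow a compiled pattern; compile a new one only if none is idle.
 * Returns NULL if allocation or compilation fails. */
regex_t *regex_pool_acquire(regex_pool *p)
{
    regex_t *re = NULL;

    pthread_mutex_lock(&p->lock);
    if (p->n_idle > 0)
        re = p->idle[--p->n_idle];
    pthread_mutex_unlock(&p->lock);
    if (re != NULL)
        return re;

    /* Compile outside the lock so other threads are not blocked. */
    re = malloc(sizeof *re);
    if (re == NULL)
        return NULL;
    if (regcomp(re, p->pattern, p->cflags) != 0) {
        free(re);
        return NULL;
    }
    pthread_mutex_lock(&p->lock);
    p->n_compiled++;
    pthread_mutex_unlock(&p->lock);
    return re;
}

/* Hand a borrowed pattern back; free it if the pool is already full. */
void regex_pool_release(regex_pool *p, regex_t *re)
{
    assert(re != NULL);
    pthread_mutex_lock(&p->lock);
    if (p->n_idle < POOL_MAX) {
        p->idle[p->n_idle++] = re;
        re = NULL;
    }
    pthread_mutex_unlock(&p->lock);
    if (re != NULL) {
        regfree(re);
        free(re);
    }
}

/* Free all idle patterns; assumes no thread still holds one. */
void regex_pool_destroy(regex_pool *p)
{
    while (p->n_idle > 0) {
        regex_t *re = p->idle[--p->n_idle];
        regfree(re);
        free(re);
    }
    pthread_mutex_destroy(&p->lock);
}
```

A thread would call regex_pool_acquire(), run regexec() on the borrowed pattern, then call regex_pool_release(). In this sketch n_compiled should stop growing once the pool holds about one compiled pattern per concurrently matching thread.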
The second case is the ability to re-use a compiled pattern from the
same thread. I believe this is already possible with the provided
interface, although the freedom to use more than one Matcher at the
same time might be convenient.
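For the same-thread case, a sketch of what a separate Matcher could look like on top of POSIX regexec(). The compiled regex_t is shared and only read, while each hypothetical matcher struct (matcher_init() and matcher_next() are illustrative names, not an existing API) carries its own subject string and resume offset, so two searches can be interleaved. REG_NOTBOL tells regexec() that a resumed search does not start at the beginning of a line; stepping one character past an empty match is an assumption of this sketch to avoid looping forever.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical "matcher": per-search state kept apart from the compiled
 * pattern, so several matchers can share one regex_t in the same thread. */
typedef struct {
    const regex_t *re;    /* compiled once, only read by regexec() */
    const char *subject;  /* string being searched */
    size_t offset;        /* where the next search resumes */
} matcher;

void matcher_init(matcher *m, const regex_t *re, const char *subject)
{
    assert(re != NULL && subject != NULL);
    m->re = re;
    m->subject = subject;
    m->offset = 0;
}

/* Find the next match. On success, store its absolute start/end offsets
 * and return 1; return 0 once no further match exists. */
int matcher_next(matcher *m, size_t *start, size_t *end)
{
    regmatch_t pm[1];
    /* A resumed search does not begin at the start of a line. */
    int eflags = m->offset > 0 ? REG_NOTBOL : 0;

    if (m->offset > strlen(m->subject))
        return 0;
    if (regexec(m->re, m->subject + m->offset, 1, pm, eflags) != 0)
        return 0;
    *start = m->offset + (size_t)pm[0].rm_so;
    *end = m->offset + (size_t)pm[0].rm_eo;
    /* Resume after this match, stepping past empty matches. */
    m->offset = (*end > *start) ? *end : *end + 1;
    return 1;
}
```

Two matchers initialised with the same regex_t but different subject strings can then be advanced alternately, and the pattern is compiled only once.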
To illustrate the cost of compile-every-time vs compile-once (19X slower!):
Using the regcomp()/regexec() that comes with my FC6 system, compiling
on every iteration:
-- CUT --
$ cat r.c
#include <sys/types.h>
#include <regex.h>
int main ()
{
    regex_t regex;
    int i;
    for (i = 0; i < 1000000; i++) {
        regcomp(&regex, "constant", 0);
        regexec(&regex, "text that contains constant somewhere", 0, 0, 0);
        regfree(&regex);
    }
    return 0;
}
$ gcc -O3 -o r r.c
$ time ./r
./r 15.04s user 0.04s system 99% cpu 15.223 total
-- CUT --
Using the regcomp()/regexec() that comes with my FC6 system, compiling
once:
-- CUT --
$ cat r2.c
#include <sys/types.h>
#include <regex.h>
int main ()
{
    regex_t regex;
    int i;
    regcomp(&regex, "constant", 0);
    for (i = 0; i < 1000000; i++) {
        regexec(&regex, "text that contains constant somewhere", 0, 0, 0);
    }
    regfree(&regex);
    return 0;
}
$ gcc -O3 -o r2 r2.c
$ time ./r2
./r2 0.77s user 0.00s system 100% cpu 0.773 total
-- CUT --
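Working the numbers from the two runs: 15.04 s / 1,000,000 is about 15.0 µs per iteration when compiling every time, versus 0.77 µs when compiling once, so regcomp() plus regfree() cost roughly 14.3 µs, about 18.5x a single regexec() of this pattern. Below is a sketch of an in-process variant that times both loops with clock_gettime(CLOCK_MONOTONIC), which leaves process start-up out of the measurement. time_matches() is a hypothetical helper, and in the compile-once case the pattern is compiled before the timer starts.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <sys/types.h>
#include <time.h>

/* Hypothetical helper: time `iterations` matches of the r.c/r2.c pattern.
 * compile_each_time != 0 mirrors r.c (regcomp/regfree inside the loop);
 * otherwise the pattern is compiled once before the timer starts, as in
 * r2.c. Returns elapsed seconds, or -1.0 if compilation fails. */
double time_matches(int compile_each_time, long iterations)
{
    const char *pattern = "constant";
    const char *text = "text that contains constant somewhere";
    struct timespec t0, t1;
    regex_t regex;
    long i;

    assert(iterations >= 0);
    if (!compile_each_time && regcomp(&regex, pattern, 0) != 0)
        return -1.0;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (i = 0; i < iterations; i++) {
        if (compile_each_time && regcomp(&regex, pattern, 0) != 0)
            return -1.0;
        /* The result is ignored, as in the original benchmarks. */
        regexec(&regex, text, 0, NULL, 0);
        if (compile_each_time)
            regfree(&regex);
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    if (!compile_each_time)
        regfree(&regex);
    return (double)(t1.tv_sec - t0.tv_sec)
         + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

Calling time_matches(1, 1000000) and time_matches(0, 1000000) from a small main() should give the same comparison as the two programs above; the absolute figures will depend on the machine and the C library.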
--
mark mielke cc / markm ncf ca / markm nortel com __________________________
. . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder
|\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ |
| | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada
One ring to rule them all, one ring to find them, one ring to bring them all
and in the darkness bind them...
http://mark.mielke.cc/