Rudimentary automated duplicate finding



In the next step towards taking over the world, I've[1] written a CGI
script that, given a bug number, searches the database for bugs with
matching stack traces.
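
For anyone curious what "matching stack traces" might look like in practice, here is a rough sketch in Python. It is not the actual CGI: the regexp, the function names (extract_symbols, find_matches), the default depth of 5, and the assumption that traces look like ordinary gdb backtraces are all mine, for illustration only.

```python
import re

# Assumed gdb-style frame lines, e.g.
#   "#3  0x40123456 in g_free () from /usr/lib/libglib.so"
#   "#7  main (argc=1) at main.c:10"
# This pattern is illustrative; it is not the regexp the CGI uses.
FRAME_RE = re.compile(
    r"^#\d+\s+(?:0x[0-9a-fA-F]+\s+in\s+)?([A-Za-z_]\w*|\?\?)\s*\(",
    re.MULTILINE,
)


def extract_symbols(trace_text, depth=5):
    """Return the first `depth` function names found in a trace."""
    return FRAME_RE.findall(trace_text)[:depth]


def find_matches(target_trace, other_bugs, depth=5):
    """Return the ids of bugs whose leading symbols equal the target's.

    `other_bugs` is a hypothetical {bug_id: trace_text} mapping standing
    in for whatever the real database query returns.
    """
    wanted = extract_symbols(target_trace, depth)
    if not wanted:
        return []
    return sorted(
        bug_id
        for bug_id, text in other_bugs.items()
        if extract_symbols(text, depth) == wanted
    )
```

Addresses and arguments are thrown away and only the function names are compared, so two crashes at different addresses can still match.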

It's a long way from perfect, but I tested it a bit tonight by clearing
out some rhythmbox and galeon 1.2 bugs. I'd appreciate it if others would
try it (esp. on 1.x bugs) and send comments, feedback, and long, long
lists of bugs it totally botches.

The URL[4]:
	http://bugzilla.gnome.org/simple-dup-finder.cgi

Known bugs:
      * does weird things with traces like the one in bug 86835
      * matching is not symmetric: 83904 is found as a dup of 78700
        but not vice versa <- big one, not sure how to fix/handle.

Improvements I intend to add:
      * ignore common functions like glog*. I'm taking suggestions for
        other things to drop. Note that I already 'skip' everything up
        to and including sigaction, killpg, and <signal handler called>,
        so please don't suggest the libgnomeui stuff in the signal
        handler.
      * if no dups are found, do a 'looser' search: only 3 symbols,
        maybe? Not sure what the algorithm is for this; again, taking
        suggestions.
      * if 'no symbol table is found' shows up repeatedly in a trace,
        automatically return a 'this crash isn't useful' note.
      * When we upgrade to 2.16, take advantage of 2.16's better DB
        layout to figure out which bugs already have 'high' duplicate
        scores. [i.e., right now, when I search on a duplicate of bug
        59500, I get several hundred dups including 59500; with the 2.16
        layout it becomes simple to 'highlight' 59500 as the one the
        rest of those are marked a duplicate of.]
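
To make the first three items concrete, here is a hedged sketch in Python of how the skipping, the noise filter, the looser fallback, and the 'not useful' check could fit together. None of this is the real code: the names (useful_symbols, looks_useless, find_dups), the depths of 5 and 3, the threshold of 3, and reading "glog*" as GLib's g_log family are all assumptions for illustration.

```python
import re

# Illustrative frame pattern for gdb-style lines; not the CGI's regexp.
FRAME_RE = re.compile(
    r"^#\d+\s+(?:0x[0-9a-fA-F]+\s+in\s+)?([A-Za-z_]\w*|\?\?)\s*\("
)
# Frames up to and including the last one mentioning any of these are skipped.
SKIP_THROUGH = ("sigaction", "killpg", "<signal handler called>")
# Assumption: "glog*" means g_log, g_logv and friends.
NOISE_PREFIXES = ("g_log",)


def useful_symbols(trace_text):
    """Function names below the crash handler, minus noise functions."""
    lines = [ln.strip() for ln in trace_text.splitlines()
             if ln.strip().startswith("#")]
    last_marker = -1
    for i, ln in enumerate(lines):
        if any(marker in ln for marker in SKIP_THROUGH):
            last_marker = i
    symbols = []
    for ln in lines[last_marker + 1:]:
        m = FRAME_RE.match(ln)
        if m and not m.group(1).startswith(NOISE_PREFIXES):
            symbols.append(m.group(1))
    return symbols


def looks_useless(trace_text, threshold=3):
    """True if the trace complains about missing symbols repeatedly."""
    return trace_text.lower().count("no symbol table") >= threshold


def find_dups(target, others, depth=5, loose_depth=3):
    """Try a strict match first, then fall back to fewer symbols.

    `others` is a hypothetical {bug_id: trace_text} mapping.
    Returns (label, bug_ids) where label is "strict", "loose" or "none".
    """
    wanted = useful_symbols(target)
    if not wanted:
        return "none", []
    for label, n in (("strict", depth), ("loose", loose_depth)):
        hits = sorted(bug_id for bug_id, text in others.items()
                      if useful_symbols(text)[:n] == wanted[:n])
        if hits:
            return label, hits
    return "none", []
```

Returning the label along with the hits lets the page say when a result came from the looser pass, since those matches deserve more suspicion.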

I hope this helps everyone who deals with stack traces, and I hope we
can eventually make it robust enough to have it /automatically/ parse
and mark all bug-buddy stack traces. We'll see about that :)

Luis

[1] Well, Ben Liblit and Ben Frantzdale did the hard part[2] and I just
    tagged along :)[3]
[2] wrote a one-line regexp
[3] wrote the other easy 100 lines :)
[4] Drum roll, please



