On Tue, 31 Oct 2006 10:26:41 -0800, Carl Worth wrote:
> On Tue, 31 Oct 2006 15:26:35 +0100 (CET), Tim Janik wrote:
> > i.e. using averaging, your numbers include uninteresting outliers
> > that can result from scheduling artefacts (like measuring a whole
> > second for copying a single pixel), and they hide the interesting
> > information, which is the fastest possible performance encountered
> > for your test code.
>
> If computing an average, it's obviously very important to eliminate
> the slow outliers, because they will otherwise skew it radically.
> What cairo-perf is currently doing for outliers is really cheesy,
> (ignoring a fixed percentage of the slowest results). One thing I
> started on was to do adaptive identification of outliers based on
> the "> Q3 + 1.5 * IQR" rule as discussed here:
>
> http://en.wikipedia.org/wiki/Outlier

For reference (or curiosity): in cairo's performance suite, I've now
changed the cairo-perf program, (which does "show me the performance
for the current cairo revision"), to report minimum (and median)
times, and it does do the adaptive outlier detection mentioned above.
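For the curious, the rule is simple enough to sketch in a few lines.
Here's a toy version in Python, (not cairo-perf's actual C
implementation, and every name in it is invented for illustration):

    # Illustrative sketch only, not cairo-perf's C code.
    def quartiles(samples):
        # First and third quartiles, with simple linear interpolation.
        s = sorted(samples)
        def q(p):
            i = p * (len(s) - 1)
            lo = int(i)
            hi = min(lo + 1, len(s) - 1)
            return s[lo] + (i - lo) * (s[hi] - s[lo])
        return q(0.25), q(0.75)

    def reject_slow_outliers(samples):
        # Keep only samples at or below Q3 + 1.5 * IQR. Only the slow
        # tail is trimmed: a scheduling artefact can make a timing
        # arbitrarily slow, but nothing can make it faster than the
        # code's true best case.
        q1, q3 = quartiles(samples)
        cutoff = q3 + 1.5 * (q3 - q1)
        return [t for t in samples if t <= cutoff]

    times = [10.2, 10.3, 10.1, 10.4, 980.0, 10.2]  # one scheduling blip
    kept = reject_slow_outliers(times)             # drops the 980.0
    print(min(kept))                               # minimum: 10.1
    print(sorted(kept)[len(kept) // 2])            # middle sample: 10.2

(The minimum, of course, needs no trimming at all, which is part of
its appeal as a summary.)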
But when I take two of these reports generated separately and compare
them, I'm still seeing more noise than I'd like to see, (things like
a 40% change when I _know_ that nothing in that area has changed).

I think one problem here is that even though we're doing many
iterations of any given test, we're doing them all back-to-back, so
some system-wide condition can affect all of them and get captured in
the summary.

So I've now taken a new approach which is working much better. What
I'm doing now for cairo-perf-diff, (which does "show me the
performance difference between two different revisions of cairo"), is
to save the raw timing for every iteration of every test. Statistics
are then generated only just before the comparison, which makes it
easy to go back and append additional data if some of the results
look off, (see the sketch after the list below). This has several
advantages:

 * I can append more data only for the tests where the results look
   bad, which is much faster.

 * I can run fewer iterations in the first place, since I can always
   append more later as needed. This makes the whole process much
   faster.

 * Appending data later means that runs for the same test and library
   version are temporally separated, so I'm more immune to random
   system-wide disturbances.

 * When re-running the suite with only a small subset of the tests,
   the two versions of the library are compared at very close to the
   same time, so system-wide changes are less likely to make a
   difference in the result.
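The bookkeeping behind all this is nothing fancy. Here's a rough
Python sketch of the idea, (again with invented names: samples_file()
and friends are not cairo-perf-diff's real layout): raw per-iteration
times get appended to a file keyed by revision and test, and
statistics happen only when two revisions are compared:

    # Hypothetical sketch of the bookkeeping; names and file layout
    # are invented, not cairo-perf-diff's actual implementation.
    import os

    def samples_file(cache_dir, rev, test):
        # One raw-sample file per (revision, test) pair.
        return os.path.join(cache_dir, "%s-%s.times" % (rev, test))

    def append_samples(cache_dir, rev, test, times):
        # Append, never overwrite: later runs accumulate alongside
        # earlier ones, which is what temporally separates the samples.
        with open(samples_file(cache_dir, rev, test), "a") as f:
            for t in times:
                f.write("%r\n" % t)

    def load_samples(cache_dir, rev, test):
        with open(samples_file(cache_dir, rev, test)) as f:
            return [float(line) for line in f]

    def compare(cache_dir, old, new, test):
        # Statistics happen here, at comparison time, over every run
        # ever appended; the minimum is immune to slow outliers.
        t_old = min(load_samples(cache_dir, old, test))
        t_new = min(load_samples(cache_dir, new, test))
        return (t_new - t_old) / t_old   # fractional change

Re-running a suspicious test is then just another append_samples()
followed by a fresh compare(); nothing already measured gets thrown
away.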
I'm really happy with the net result now. I don't even bother
worrying about not using my laptop while the performance suite is
running anymore, since it's quick and easy to correct problems later.
And if some of the results look funny, I re-run just those tests;
sure enough, either the goofy stuff just disappears, (validating my
assumption that it was bogus), or it sticks around no matter how many
times I re-run it, (leading me to investigate and learn about some
unexpected performance impact).

And it caches all of those timing samples, so it doesn't have to
rebuild or re-run the suite to compare against something it has seen
before, (the fact that git has hashes just sitting there for the
content of every directory made this easy and totally free).
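To make that last point concrete: git will hand you the content hash
of any directory at any revision, and that hash makes a perfect cache
key. A little Python sketch, (assuming, just for illustration, that
the sources whose changes should invalidate the cache live under
src/):

    # Sketch of the cache-key idea; the src/ path is an assumption.
    import subprocess

    def tree_hash(rev, path="src"):
        # "rev:path" names the tree object for that directory at that
        # revision; identical contents always hash identically, so two
        # revisions that don't touch the path share one cache key.
        out = subprocess.check_output(
            ["git", "rev-parse", "%s:%s" % (rev, path)])
        return out.decode().strip()

    key = tree_hash("HEAD")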
The interface looks like this:

    # What's the performance impact of the latest commit?
    cairo-perf-diff HEAD

    # How has performance changed from 1.2.0 to 1.2.6? From 1.2.6 to now?
    cairo-perf-diff 1.2.0 1.2.6
    cairo-perf-diff 1.2.6 HEAD

    # As above, but force a re-run even though there's cached data:
    cairo-perf-diff -f 1.2.6 HEAD

    # As above, but only re-run the named tests:
    cairo-perf-diff -f 1.2.6 HEAD -- stroke fill

The same ideas could be implemented for any library's performance
suite, and with pretty much any revision control system. It is handy
that git makes it so easy to name ranges of commits. So, if I wanted
a commit-by-commit report of every change that is unique to some
branch, (let's say, what's on HEAD since 1.2 split off), I could do
something like this:

    for rev in $(git rev-list 1.2.6..HEAD); do
        cairo-perf-diff $rev
    done

-Carl

PS. Yes, it is embarrassing that no matter what the topic is, I end
up plugging git eventually.