Re: Some performance notes



Soeren Sandmann <sandmann daimi au dk> writes:

> Owen Taylor <otaylor redhat com> writes:
> 
> > So, without including hard data (I don't have much), here are
> > my observations:
> 
> I think it is important to also do some measurements with real
> applications to see which parts of Gtk they spend their time in.
> 
> >  * With debugging turned off, the bulk of time was spent in
> >    the signal emission code and its GValue handling (40-50%)
> 
> Speeding up signals may be a matter of caching.  I can easily imagine
> time-critical code emitting the same signal over and over.  I don't
> know if caching makes any sense in the code, though.

In general, the signal emissions aren't retrieving information
so I don't think caching is relevant. An exception to this is
size-request. There might be some noticeable improvements
for resizing if we:

 - Add a private flag for the widget "has requisition"
 - Clear the flag initially and on queue-resize
 - Set the flag after ::size-request.
 - Use widget->requisition instead of emitting ::size-request
   when the flag is set.

Another emit-less-signals approach is that we could avoid
calling size-allocate if:

 a) queue_resize wasn't called on the widget itself
 b) The allocation of the widget is the same as before 
    
Unfortunately, widgets have been known to store things other than
their actual allocation in widget->allocation, which poses a problem
for this.

A lot of signals are also emitted at setup time (parent-set,
hierarchy-changed, notify, show, map, etc.), and setup time is
something we also really need to work on - perhaps more than
resizing. So, I don't think "emit less signals" is the full solution.
 
> >  * When opaque-resizing, another ~10 percent of non-debug time
> >    was spent maintaining invalid regions. This is probably quite 
> >    optimizable - some extra region copies are made, and it looks like 
> >    that adding a "completely invalid" flag to GdkWindow might allow bypassing
> >    a bunch of computations, since it seems like windows were
> >    getting invalidated repeatedly.
> 
> The tests I have been doing are concerned with snappiness or
> interactive performance. 'Snappiness' is really measured in how many
> events we can respond to when 'something is going on', like opaquely
> resizing something.
> 
> To experiment with interactive performance, I made panes resize
> opaquely and resizes take effect immediately. 
> Then I put a VPane in the toplevel window in testgtk and ran the
> whole thing under a profiler and tried to extract from the
> call-graph where the time was spent /when responding to events/.
> 
> The result was that by far the most time was spent redrawing (

I'm curious what machine you were doing this testing on. 
(CPU, video card, and amount of ram on the video card)
I never got any noticeable performance problems with your opaque
paned resize patch on a:

 celeron 400mhz, 32meg matrox g400, 1280x1024 16bpp

So I assume your setup must be considerably slower.

> and a non-negligible amount on gtk_widget_style_get()).  

Yes, I saw this too. I think g_param_spec_pool_lookup() may
need a good kick in the rear. 

> This is good, because drawing means something entertaining is going on
> on the screen, which is another way of saying that it feels snappy.
> It also means that if resizes are handled this way, speedups to the
> drawing code immediately translate into snappiness.  In fact, it
> doesn't even have to be speed-ups - just moving time-consuming stuff
> out of the interaction loop helps a lot.  

The current idle approach for drawing and resizing should work almost
exactly as well as compression as long as the toolkit can handle
events as fast as they come in. As long as you push expensive handling
to the idles, there should be no reason GTK+ can't handle events as
fast as they come in.

  Queue configure event 1    
  Handle event 2
  Handle event 3
  Queue configure event 4 
  Handle queued configures

  Compress configure event 1 and 4 and handle it   
  Handle event 2
  Handle event 3
   
These two sequences aren't significantly different performance-wise as long as
the event handling is much faster than handling the 
configures, which I believe is the case for GTK+, or 
at least should be the case.

If you, with your opaque paned resize patch, can get to the 
point where it never catches up with the mouse and
sticks at the old location, I'd be very interested to
know what GTK+ is doing while it is sticking.

Advantages queueing has over compression are:

 - safe, better compression - you frequently get the situation
   where you can't safely compress event type A across event
   type B (for example, you can't compress exposes across configures
   safely), but queueing allows you to compress as much as makes sense.

 - less complicated code

 - reading forward in the event queue in X is really inefficient.

Expose compression in GTK+-1.2 is O(n^2) in the size of the queue
and has to avoid a lot of meaningful compression because of 
safety issues with intervening events.

> (At some point I introduced a low priority idle loop to take care of
> calls to g_object_unref - this really made a difference).

I assume you mean by this "to take care of freeing objects"  -
if g_object_unref() is slower than adding something to a queue,
we have a real big problem.

Even queued destruction sounds like a dubious idea - I keep
resizing my window and the memory usage keeps going up? 
 
> One speedup that is applied in the attached patch was not at the
> drawing code itself, but rather in avoiding a lot of unnecessary
> drawing.  Every resize currently results in a total redraw of
> everything in the widget tree.  This happens because queue_resize()
> calls queue_clear() and again because size_allocate calls queue_draw()
> on all widgets that changed size which invalidates a widget *and all
> of its child widgets*.
> 
> The attached patch only redraws widgets that actually changed size (or
> moved in the case of NO_WINDOW widgets).  The problem with this is
> that windows that are exposed are not redrawn until the server sends an
> expose event, and that can take a while.  The patch attends to this
> problem with a cheesy hack that fakes two expose events (yes, it is
> bad, and yes, I know it really doesn't work).

I think some fixes in this area are definitely possible - I
also was noticing the extraneous call to queue_clear() in
the resize-a-toplevel case. But the details here don't 
sound quite right.

(GTK+-2.0 does some neat tricks to make sure that it never has
to wait for responses from the server for expose events it 
knows it is generating, and that's something I would be 
very unhappy to give up.)

Note also that if someone calls queue_resize on a widget 
explicitly, we guarantee that the widget will be redrawn
completely, and we can't change that without breaking 
compatibility. 

> In general, the patch is certainly not production code (don't opaque
> resize the windows too much - it handles every single ConfigureNotify
> event.  And don't look at the changes to gtkwidget.h or at the
> gtk_widget_create_window()).  However, it does demonstrate that
> improvements to interactive performance are possible.
> 
> >    A lot of creation and destruction of graphics contexts and 
> >    setting of clip rectangles could be avoided. This probably
> >    would cut down the client side overhead to very little. 
> 
> That was my observation too.  Caching the GCs instead of creating and
> destroying them made a significant difference.  Also caching of the
> backing store pixmaps made a difference.

I don't think caching backing store pixmaps is a good idea - 
that consumes a lot of server resources we don't need permanently.

(In one test I noticed mozilla keeping around a huge pixmap
even when it was not viewable, and slowing down GTK+-2.0 opaque
resize frame-rates by a factor of 2 because GTK+ couldn't get
pixmaps in video ram, so drawing was unaccelerated.)

Caching GCs is more friendly and might make sense, though my tests
seemed to indicate that it would be, at best, a 5% or so speedup in
drawing-heavy operations.
 
> >  * Most of the other time spent looked pretty spread out - though
> >    once we tackle the obvious stuff, more bottlenecks may
> >    be apparent.
> > 
> > Generally, on my tests on a 400mhz celeron I felt fairly good about
> > the overall performance with debugging off. 
> 
> > Opaque resizing was a little more sluggish than I would like, but
> > other operations seemed pretty snappy, and it would definitely have
> > been useable on a slower machine.
> 
> Opaque window resizing sucks currently because resizes are queued up
> and not handled until the idle loop - essentially, Gtk is throwing
> most of the ConfigureNotify events away instead of handling them.  My
> guess would be that configure compression plus immediate resizes would
> help, but I haven't tried it yet.

I couldn't see this effect at all on my machine. I did see (with my
slower, earlier tests with debugging enabled) that I was triggering a
pathology in the XFree86 scheduler on my machine. Try starting your
server with -dumbSched and this behavior may go away.

Also, --disable-debug makes a big difference. Since we'll be
shipping with debugging turned off unless we can fix the overhead,
it may make sense to do performance testing with --disable-debug.

Regards,
                                        Owen




