Re: Speeding up thumbnail generation (like multi threaded). Thoughts please.



On Sat, Aug 29, 2009 at 1:04 AM, Christian Hergert <chris dronelabs com> wrote:
>
>> On Fri, Aug 28, 2009 at 11:49 PM, Christian Hergert <chris dronelabs com> wrote:
>>>
>>> Hi,
>>>
>>> What you mentioned is good information to start hunting. Was the CPU
>>> time related to IO wait at all? Always get accurate numbers before
>>> performance tuning. "Measure, measure, measure," or so the mantra goes.
>>
>> Perhaps a stupid question, but what is a good way of profiling IO? CPU
>> is easy, but I've never done IO.
>> In this case my HDD is certainly able to do more than 10 thumbnails
>> per second. However, I could see a potential issue when someone with a
>> slower HDD and a faster CPU than mine is thumbnailing a lot of images.
>> There the HDD will likely be the bottleneck.
>
> You can do something really crude by reading from /proc/<pid>/* (see man
> proc for more info), or you could try tools like sysstat, OProfile,
> SystemTap, etc. We really need a generic profiling tool that can do all of
> this from a single interface. For now, though, I've been most successful
> just writing one-off graphing for the specific problem. For example, put
> in some g_print() lines, grep for those, and then graph them using your
> favorite plotter or cairo goodness.
>
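
A minimal sketch of the "crude" /proc approach, written in Python for brevity and Linux-only. The helper names (parse_proc_io, read_proc_io, io_delta) are made up for illustration: take a snapshot of /proc/<pid>/io before and after a piece of work and subtract. Per man proc, read_bytes counts bytes the process actually caused to be fetched from the storage layer, so reading a file that is already in the page cache should grow rchar but add little or nothing to read_bytes.

```python
import os


def parse_proc_io(text):
    """Turn the "name: value" lines of /proc/<pid>/io into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep:
            counters[name.strip()] = int(value)
    return counters


def read_proc_io(pid="self"):
    """Snapshot the IO counters of a process (Linux only)."""
    with open(f"/proc/{pid}/io") as f:
        return parse_proc_io(f.read())


def io_delta(before, after):
    """Counter growth between two snapshots, e.g. around one thumbnail."""
    return {name: after[name] - before[name] for name in before if name in after}


if __name__ == "__main__" and os.path.exists("/proc/self/io"):
    before = read_proc_io()
    with open("/proc/self/status", "rb") as f:  # stand-in for loading an image
        f.read()
    after = read_proc_io()
    # rchar grows with every read(); read_bytes only when storage is hit.
    print(io_delta(before, after))
```

Passing another process's pid to read_proc_io() lets you watch a running nautilus instead, subject to the usual /proc permission checks.
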
>>> Unfortunately, the symptom you see regarding IO will very likely change
>>> under a different processing model. If the problem is truly CPU bound,
>>> then you will only be starting IO requests after you are done
>>> processing. This means valuable time is wasted while waiting for the
>>> pages to be loaded into the buffers; the code just blocks while this is
>>> going on.
>>
>> And how can I test that?
>
> ltrace works for simple non-threaded applications. Basically, you should
> see in the profiling timings that each work item happens sequentially
> after the previous one (load, process, load, process, ...).
>
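
One way to check for the load, process, load, process pattern in your own timings is to log each stage as a time interval and test whether any two intervals overlap. This is a Python sketch with hypothetical names (Timeline, any_overlap, run_sequential); it plays the role of the grep-able g_print() lines. A plain sequential loop produces no overlap, while a loader that prefetches the next file should make load intervals overlap process intervals.

```python
import time
from contextlib import contextmanager


class Timeline:
    """Records (item, stage, start, end) tuples for later plotting."""

    def __init__(self):
        self.records = []

    @contextmanager
    def span(self, item, stage):
        start = time.perf_counter()
        try:
            yield
        finally:
            self.records.append((item, stage, start, time.perf_counter()))

    def dump(self):
        # One tab-separated line per interval, easy to feed to a plotter.
        for item, stage, start, end in self.records:
            print(f"{item}\t{stage}\t{start:.6f}\t{end:.6f}")


def any_overlap(records):
    """True if some interval starts before an earlier-starting one ends.

    After sorting by start time, if any later interval starts before
    interval i ends, then so does interval i + 1, so checking neighbours
    is enough."""
    spans = sorted((start, end) for _, _, start, end in records)
    return any(b_start < a_end for (_, a_end), (b_start, _) in zip(spans, spans[1:]))


def run_sequential(paths, load, process, timeline):
    """The plain single-threaded loop: load, process, load, process, ..."""
    for path in paths:
        with timeline.span(path, "load"):
            data = load(path)
        with timeline.span(path, "process"):
            process(data)
```

After a run, timeline.dump() prints the intervals and any_overlap(timeline.records) tells you whether IO and processing ever ran at the same time.
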
> I would hate to offer conjecture about the proper design until we have
> more measurements. It is a good idea to optimize the single-threaded
> approach before the multi-core approach, since that has to be done anyway
> and is likely a less complex problem before additional threads are
> involved.
>
>>> What could be done easily is that every time an item starts processing,
>>> it could asynchronously begin loading the next image using gio. This
>>> means the kernel can start paging that file into the VFS cache while you
>>> are processing the image. This would of course still limit you to a
>>> single processor doing the scaling. But if the problem is in fact CPU
>>> bound, the next image will almost always be loaded by the time you finish
>>> the scale, meaning you've maximized the processing potential per core.
>>
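
A sketch of the prefetch idea, assuming a single background thread is an acceptable stand-in for gio's asynchronous loading (in C one would presumably use an async call such as g_file_load_contents_async() instead). CPython releases the GIL while a file read is blocked in the kernel, so reading file n + 1 can overlap the CPU work on file n. The names process_with_prefetch and read_file are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def read_file(path):
    """Load a whole file; stands in for an async gio load."""
    with open(path, "rb") as f:
        return f.read()


def process_with_prefetch(paths, process, load=read_file):
    """Call process(path, data) for each path in order, loading the next
    file on a background thread while the current one is being processed."""
    results = []
    if not paths:
        return results
    with ThreadPoolExecutor(max_workers=1) as loader:
        pending = loader.submit(load, paths[0])
        for i, path in enumerate(paths):
            data = pending.result()  # blocks only if the read is not done yet
            if i + 1 < len(paths):
                # Kick off the next read before doing the CPU-heavy work.
                pending = loader.submit(load, paths[i + 1])
            results.append(process(path, data))
    return results
```

For thumbnails, process would be the decode, scale and save step. Only one file is ever read ahead, which keeps memory use bounded while still hiding most of the IO latency when the work is CPU bound.
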
>> That sounds like a nice way to optimize it for one core. But is there
>> any optimization possible in my case, since I already have 100% CPU
>> usage on one core with just the benchmark?
>
> You can't properly optimize for the multi-core scenario until the
> single-core scenario is fixed.
>
>>> To support multiple cores, as it sounds like you want, a queue could be
>>> used to store the upcoming work items. A worker per core, for example,
>>> can get its next file from that queue. FWIW, I wrote a library, iris[1],
>>> built specifically for doing work like this while using threads
>>> efficiently with minimal lock contention. It would allow for scaling
>>> threads up to the number of cores and back down when they are no longer
>>> needed.
>>>
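
A sketch of the one-queue, one-worker-per-core layout, with hypothetical names. It uses Python threads to stay short; note that CPython threads do not execute Python bytecode in parallel, so for CPU-bound scaling in Python you would use processes, whereas in C (GLib's GThreadPool, or a library such as iris) the same layout can keep every core busy. Error handling is omitted: if work() raises, the worker that hit the error exits and some result slots may stay None.

```python
import os
import queue
import threading

_STOP = object()  # sentinel telling a worker to exit


def run_workers(items, work, num_workers=None):
    """Apply work() to every item using num_workers threads that all pull
    from one shared queue; results come back in input order."""
    num_workers = num_workers or os.cpu_count() or 1
    tasks = queue.Queue()
    results = [None] * len(items)

    def worker():
        while True:
            job = tasks.get()
            if job is _STOP:
                return
            index, item = job
            results[index] = work(item)  # each index is written by one worker

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for job in enumerate(items):
        tasks.put(job)
    for _ in threads:
        tasks.put(_STOP)  # one stop marker per worker, queued after all jobs
    for t in threads:
        t.join()
    return results
```

Because idle workers simply block on tasks.get(), a fast core naturally takes more files than a slow one; there is no up-front partitioning of the file list.
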
>> That sounds very interesting.
>> Just one question about the queue: would it be better to thread the
>> application (nautilus) or the library (glib)? If your answer is the
>> library, then the queue has to be passed from nautilus to glib. I would
>> say glib, because all applications would benefit from it without
>> adjusting their code.
>
> I haven't looked at this code in detail yet, so I cannot confirm or deny.
> My initial assumption would be that the thumbnailing API (again, I have no
> experience with it yet) should be restructured around an asynchronous
> design (begin/end methods), with the synchronous implementation built on
> top of that. And of course, nobody should use the synchronous version
> unless they *really* have a reason to.
>
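
A sketch of what "asynchronous first, synchronous as a thin wrapper" could look like. The Thumbnailer class, its begin()/end() names and the make_thumbnail callable are all hypothetical, not GNOME's thumbnailing API; GIO itself expresses the same idea as pairs of *_async() and *_finish() functions.

```python
from concurrent.futures import ThreadPoolExecutor


class Thumbnailer:
    """Hypothetical API shape: async begin/end is primary, sync is a wrapper."""

    def __init__(self, make_thumbnail, max_workers=None):
        self._make = make_thumbnail  # the actual load/scale/save work (assumed)
        self._pool = ThreadPoolExecutor(max_workers=max_workers)

    def begin(self, path, callback=None):
        """Start thumbnailing path and return a handle immediately.
        If given, callback(handle) runs once the work has finished."""
        handle = self._pool.submit(self._make, path)
        if callback is not None:
            handle.add_done_callback(callback)
        return handle

    def end(self, handle):
        """Return the result of a begin() call, re-raising any error."""
        return handle.result()

    def thumbnail_sync(self, path):
        """Blocking convenience built on the async pair."""
        return self.end(self.begin(path))

    def close(self):
        self._pool.shutdown(wait=True)
```

The point of the shape is that there is only one code path: the synchronous call is just begin() followed by end(), so callers that can cope with callbacks never pay for blocking.
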
> FWIW, I would be willing to help hack on this, but I'm swamped for at least
> the next few weeks.
>
> -- Christian
>

I guess the next thing for me is to get more accurate benchmarks.
Right now my benchmarks are timings: how long it takes to make a pixbuf
from an image, how long the scaling takes (surprisingly short!), and how
long it takes to save the result. I guess I need to expand that a bit
with IO timings as well. I will just give it a try.
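
A sketch of per-stage timing that also addresses the earlier question of whether the time is CPU or IO wait: record both wall-clock and CPU time for each stage (load, scale, save); the gap between them is time spent waiting, typically on IO. StageTimer is a hypothetical helper. time.process_time() counts CPU time across all threads of the process, so the gap is only meaningful for a single-threaded benchmark.

```python
import time
from collections import defaultdict


class StageTimer:
    """Accumulates wall-clock and CPU seconds per named stage."""

    def __init__(self):
        self.wall = defaultdict(float)
        self.cpu = defaultdict(float)

    def run(self, stage, func, *args):
        """Call func(*args), charging its wall and CPU time to stage."""
        wall_start, cpu_start = time.perf_counter(), time.process_time()
        try:
            return func(*args)
        finally:
            self.wall[stage] += time.perf_counter() - wall_start
            self.cpu[stage] += time.process_time() - cpu_start

    def waiting(self, stage):
        """Wall time not spent on the CPU: usually IO wait (or sleeping)."""
        return max(self.wall[stage] - self.cpu[stage], 0.0)

    def report(self):
        for stage in self.wall:
            print(f"{stage:<8} wall={self.wall[stage]:.3f}s "
                  f"cpu={self.cpu[stage]:.3f}s waiting={self.waiting(stage):.3f}s")
```

Wrapping each step as timer.run("load", load_image, path), where load_image stands for whatever builds the pixbuf, and calling timer.report() at the end gives one line per stage. A large "waiting" figure for the load stage points at the disk; a load stage that is mostly CPU points at decoding.
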

