Performance using multiple cpus

From: Stefan Westerfeld <stefan space twc de>
To: Beast Liste <beast gnome org>
Cc: Tim Janik <timj gnu org>
Subject: Performance using multiple cpus
Date: Wed, 25 Jul 2018 18:17:05 +0200

   Hi!

If there is more than one cpu available, BEAST automatically uses more than one
cpu for computing the actual audio output. I tried to figure out whether this
actually makes things faster.

The test: on my system (8 threads maximum), render party monster as quickly as
possible, with 1-8 cpus, and measure the real (wall-clock) time it takes to
complete the job. I also tested different block sizes, from 64 to 1024.

Ideal result: using more cpus should speed up computation. Ideally, using two
cpus would make things twice as fast, using three cpus would make things three
times as fast, and so on.

Actual result: adding more cpus makes things go slower, for all block sizes
(first column is the number of cpus, second column is the time in seconds,
lower is better):

# BSE_BLOCK_SIZE=64
1 9.57
2 26.02
3 30.65
4 35.51
5 40.28
6 44.47
7 47.67
8 50.53
# BSE_BLOCK_SIZE=128
1 6.69
2 14.75
3 16.50
4 19.25
5 21.79
6 23.80
7 25.43
8 26.73
# BSE_BLOCK_SIZE=256
1 5.40
2 9.63
3 9.95
4 11.19
5 12.34
6 13.30
7 14.07
8 14.68
# BSE_BLOCK_SIZE=512
1 4.59
2 7.25
3 6.75
4 7.26
5 7.59
6 7.91
7 8.23
8 8.59
# BSE_BLOCK_SIZE=1024
1 4.52
2 7.18
3 6.73
4 7.25
5 7.58
6 7.90
7 8.23
8 8.61

Why is it slower with more cpus? This is difficult to guess from this test
alone, but I believe that using two cpus (or more) leads to more overhead
from synchronization. Also, if we use just one cpu, that cpu can keep the
memory needed for the synthesis in its own cache, whereas if we use more than
one cpu, memory touched by one cpu has to be transferred to another cpu.

For smaller block sizes the slowdown caused by using multiple cpus is more
extreme than for large block sizes, but using only one cpu is always the
fastest option.
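
To put numbers on this, here is a small sketch that computes the slowdown
relative to one cpu from the table above. It is not part of the attached test
script; the names TIMES, slowdown and efficiency are made up for this
illustration, and the only input is the data already listed in this mail.

```python
# Wall-clock render times in seconds, copied from the table above.
# Keys are BSE_BLOCK_SIZE values; list index i holds the time for i+1 cpus.
TIMES = {
    64:   [9.57, 26.02, 30.65, 35.51, 40.28, 44.47, 47.67, 50.53],
    128:  [6.69, 14.75, 16.50, 19.25, 21.79, 23.80, 25.43, 26.73],
    256:  [5.40, 9.63, 9.95, 11.19, 12.34, 13.30, 14.07, 14.68],
    512:  [4.59, 7.25, 6.75, 7.26, 7.59, 7.91, 8.23, 8.59],
    1024: [4.52, 7.18, 6.73, 7.25, 7.58, 7.90, 8.23, 8.61],
}


def slowdown(times):
    """Return T(n) / T(1) for n = 1..len(times); values above 1 mean slower."""
    return [t / times[0] for t in times]


def efficiency(times):
    """Return T(1) / (n * T(n)); ideal linear scaling would give 1.0."""
    return [times[0] / (n * t) for n, t in enumerate(times, start=1)]


if __name__ == "__main__":
    for block_size, times in TIMES.items():
        s = slowdown(times)
        e = efficiency(times)
        print(f"block size {block_size}: 8 cpus take {s[-1]:.2f}x "
              f"the 1-cpu time, efficiency {e[-1]:.3f}")
```

With these numbers, 8 cpus take about 5.3 times as long as one cpu at block
size 64, but only about 1.9 times as long at block size 1024.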

If I had to guess what should be done to improve this, I'd say we could group
related computations, and use the same cpu for each group. The simplest thing
that could possibly work would be to use one cpu per track. This may not be
ideal in all situations (the user could have only one track), but it should
work relatively well for many cases. And, unlike our current strategy, it
should at least not slow things down.

We could also use one cpu per voice for more fine-grained grouping that
still preserves locality.
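
The grouping idea could look roughly like the sketch below. This is only an
illustration of the assignment step, not BEAST code: the module dictionaries,
the function assign_groups_to_cpus and the use of module count as a stand-in
for computation cost are all assumptions made for this example.

```python
from collections import defaultdict


def assign_groups_to_cpus(modules, n_cpus, group_key):
    """Map each cpu index to a list of module names, keeping groups intact.

    `modules` is a list of dicts such as
    {"name": "osc1", "track": 0, "voice": 3} (hypothetical layout).
    `group_key` is a function returning the group of a module; all modules
    with the same group end up on the same cpu.
    """
    if n_cpus < 1:
        raise ValueError("n_cpus must be at least 1")

    groups = defaultdict(list)
    for module in modules:
        groups[group_key(module)].append(module["name"])

    # Greedy balancing: place the largest groups first, each on the cpu
    # that currently holds the fewest modules (module count is only a
    # rough proxy for the real computation cost).
    assignment = {cpu: [] for cpu in range(n_cpus)}
    for _, names in sorted(groups.items(), key=lambda kv: -len(kv[1])):
        target = min(assignment, key=lambda cpu: len(assignment[cpu]))
        assignment[target].extend(names)
    return assignment


modules = [
    {"name": "bass-osc", "track": 0, "voice": 0},
    {"name": "bass-filter", "track": 0, "voice": 0},
    {"name": "lead-osc", "track": 1, "voice": 0},
    {"name": "lead-osc2", "track": 1, "voice": 1},
    {"name": "drums", "track": 2, "voice": 0},
]

# One cpu per track: everything belonging to a track stays together.
per_track = assign_groups_to_cpus(modules, 2, lambda m: m["track"])

# Finer grouping: one group per (track, voice) pair.
per_voice = assign_groups_to_cpus(modules, 2, lambda m: (m["track"], m["voice"]))
```

If the project has only one track, per-track grouping puts everything on a
single cpu, which is the one-track limitation mentioned above; grouping per
voice gives the scheduler more groups to spread out in that case.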

   Cu... Stefan
-- 
Stefan Westerfeld, http://space.twc.de/~stefan

Attachment: multi-cpu-test.diff
Description: Text Data

Attachment: multi-cpu-test.sh
Description: Bourne shell script
