   Hi!

If there is more than one cpu available, BEAST automatically uses more than one cpu for computing the actual audio output. I tried to figure out whether this in fact helps to make things faster.

The test: on my system (8 threads maximum), render party monster as quickly as possible, with 1-8 cpus, and measure the real time it takes to complete the job. I also tested different block sizes, from 64 to 1024.

Ideal result: using more cpus should speed up computation. Ideally, using two cpus would make things twice as fast, using three cpus would make things three times faster, and so on.

Actual result: adding more cpus makes things go slower (all times in seconds, less is better), for all block sizes:

# BSE_BLOCK_SIZE=64
1 9.57
2 26.02
3 30.65
4 35.51
5 40.28
6 44.47
7 47.67
8 50.53

# BSE_BLOCK_SIZE=128
1 6.69
2 14.75
3 16.50
4 19.25
5 21.79
6 23.80
7 25.43
8 26.73

# BSE_BLOCK_SIZE=256
1 5.40
2 9.63
3 9.95
4 11.19
5 12.34
6 13.30
7 14.07
8 14.68

# BSE_BLOCK_SIZE=512
1 4.59
2 7.25
3 6.75
4 7.26
5 7.59
6 7.91
7 8.23
8 8.59

# BSE_BLOCK_SIZE=1024
1 4.52
2 7.18
3 6.73
4 7.25
5 7.58
6 7.90
7 8.23
8 8.61

Why is it slower with more cpus? This is difficult to guess from this test alone, but I believe that using two cpus (or more) will lead to more overhead from synchronization. Also, if we just use one cpu, this cpu will have all the memory needed to do the synthesis in its cache, whereas if we use more than one cpu, the memory touched by one cpu will have to be transferred to another cpu.

For smaller block sizes the slowdown caused by using multiple cpus is more extreme than for large block sizes, but using only one cpu is always the fastest option.

If I had to guess what should be done to improve this, I'd say we could group related computations, and use the same cpu for each group. The simplest thing that could possibly work would be to use one cpu per track.
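
To make the grouping idea a bit more concrete, here is a rough sketch in Python. It is not BEAST code: the function name assign_by_track, the (track, module) pairs and the module names are all made up for illustration, and handing out tracks round-robin is just one possible policy. The only point is that everything belonging to one track ends up on the same cpu.

```python
from collections import defaultdict


def assign_by_track(modules, n_cpus):
    """Return a dict mapping cpu index -> list of module names.

    modules is an iterable of (track_id, module_name) pairs; this layout
    is a hypothetical stand-in for the real engine data structures.
    All modules that share a track_id are assigned to the same cpu.
    """
    track_to_cpu = {}
    plan = defaultdict(list)
    for track_id, module_name in modules:
        if track_id not in track_to_cpu:
            # First module of a new track: hand out cpus round-robin.
            track_to_cpu[track_id] = len(track_to_cpu) % n_cpus
        plan[track_to_cpu[track_id]].append(module_name)
    return dict(plan)


# Hypothetical song with three tracks, rendered on two cpus.
modules = [
    ("drums", "osc1"), ("drums", "filter1"),
    ("bass", "osc2"), ("bass", "amp2"),
    ("lead", "osc3"),
]
plan = assign_by_track(modules, n_cpus=2)
for cpu, names in sorted(plan.items()):
    print(cpu, names)
```

With only one track in the song, this plan puts all the work on a single cpu, however many cpus are available.
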
Grouping by track may not be ideal in all situations (the user could have only one track), but it should work relatively well in many cases. And at least it should not slow things down the way our current strategy does. We could also use one cpu per voice, for more fine-grained grouping that still preserves locality.

   Cu... Stefan
-- 
Stefan Westerfeld, http://space.twc.de/~stefan
Attachment:
multi-cpu-test.diff
Description: Text Data
Attachment:
multi-cpu-test.sh
Description: Bourne shell script