Re: Fixing gnome-speech



Bill Haneman wrote on Wed, 28. 06. 2006 at 17:36 +0100:
> I'm not sure I agree that speech engines should not do their own audio
> output.  While I think you have identified some real problems with that
> approach, it's not clear that the ".wav file" approach has a low enough
> latency.  If tests show that latency is not a problem, then passing the
> synthesized audio bits to the driver for processing (perhaps via
> multiplexing/mixing in most situations, or for pre-emptive audio in
> others) does seem to have advantages.

Hello Bill,

I did some simple tests of socket performance a few months ago. I only
tried them on my home system (AMD Athlon 1800, kernels 2.6.8 and
2.6.16). I ran the test between two single-threaded test processes on
the same machine, whose only task was to set up the communication and
write/read data through the given socket.
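
For the curious, here is a minimal sketch of the kind of test I ran
(not my exact code): a forked child writes data through a socketpair
in 4 KB blocks and the parent reads it and measures the elapsed time.

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/time.h>

#define BLOCK 4096                 /* block size; 4 KB worked best for me */
#define TOTAL (100 * 1024 * 1024)  /* transfer 100 MB in total */

int main(void)
{
    int sv[2];
    char buf[BLOCK];
    struct timeval t0, t1;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
        perror("socketpair");
        return 1;
    }

    if (fork() == 0) {             /* child: writer */
        close(sv[0]);
        memset(buf, 0, sizeof(buf));
        for (long sent = 0; sent < TOTAL; sent += BLOCK)
            if (write(sv[1], buf, BLOCK) != BLOCK) {
                perror("write");
                _exit(1);
            }
        _exit(0);
    }

    close(sv[1]);                  /* parent: reader, timed */
    gettimeofday(&t0, NULL);
    long got = 0;
    ssize_t n;
    while (got < TOTAL && (n = read(sv[0], buf, BLOCK)) > 0)
        got += n;
    gettimeofday(&t1, NULL);

    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1e6;
    printf("%ld bytes in %.3f s (%.1f MB/s, %.2f ms/MB)\n",
           got, secs, got / secs / (1024 * 1024),
           secs * 1000 / (got / (1024.0 * 1024)));
    return 0;
}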

The speed of data transfer depends very much on the size of the blocks
read and written. The best I could get was with 4 KB blocks; increasing
the block size past this value gave little improvement. The speed of
data transfer was about 10 ms/MB, within about 10% (or 100 MB/s, if you
prefer). I found the latency to be negligible, about three orders of
magnitude below the transfer time for 1 MB of data (it is somewhere
around 10 us). I also tried raising the priority of the two processes,
but this only improved the results by about 10%. A very simple graph,
just to give a quick idea, is here:
http://www.freebsoft.org/~hanke/socket-times.ps

I'm no expert on IPC, so it is quite possible that there is more I
could do to improve the results.

Assuming 44 kHz, 16 bits per sample, uncompressed data, it takes about
0.5 MB to store 5 seconds of speech, corresponding to a transfer time
of about 5 ms under my setup.
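
To spell out the arithmetic (assuming mono audio; stereo would roughly
double the figures):

  44100 samples/s * 2 bytes/sample * 5 s = 441,000 bytes ~= 0.44 MB
  0.44 MB * 10 ms/MB ~= 4.4 ms

which is where the "about 5 ms" above comes from.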

Gary Cramblitt pointed me to another simple test which seems to confirm
my results and also gives figures for pipes:
http://rikkus.info/sysv-ipc-vs-pipes-vs-unix-sockets.html#conclusion


I'd say this speaks in favor of transferring audio from the
synthesizers to higher levels, where it can be played more
conveniently. If the audio is processed asynchronously (played as soon
as some data are available), the delays introduced by socket
communication do not seem to be a significant problem. Even if playback
did not start until a full 5 seconds of audio (about one full sentence)
were available, a transfer time of 5 ms would still be very well
acceptable.
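
Just to illustrate what I mean by asynchronous processing, a minimal
sketch of the playback loop (synth_fd and audio_fd are hypothetical
descriptors, e.g. a socket from the synthesizer and an opened OSS
device such as /dev/dsp):

#include <unistd.h>

void play_stream(int synth_fd, int audio_fd)
{
    char buf[4096];             /* 4 KB blocks, as in the test above */
    ssize_t n;

    /* Write each block to the device as soon as it arrives, so the
       socket transfer overlaps with playback instead of preceding it. */
    while ((n = read(synth_fd, buf, sizeof(buf))) > 0) {
        ssize_t off = 0;
        while (off < n) {       /* handle short writes to the device */
            ssize_t w = write(audio_fd, buf + off, n - off);
            if (w < 0)
                return;         /* device error; give up */
            off += w;
        }
    }
}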

We must also take into account that not many current voices provide
the quality of speech I assumed, and that hardware is getting faster.
On the other hand, I neglected the effect of other threads running
inside the same process.

With regards,
Hynek Hanke



