Re: [g-a-devel] [u-a-dev] gnome-speech, and audio output, moving forward.

From: Bill Haneman <gnome billhaneman ie>
To: Willie Walker <William Walker Sun COM>
Cc: Ubuntu Accessibility development discussions <ubuntu-accessibility-devel lists ubuntu com>, Orca screen reader developers <orca-list gnome org>, Gnome Accessibility List <gnome-accessibility-list gnome org>, GNOME Accessibility Developers <gnome-accessibility-devel gnome org>
Subject: Re: [g-a-devel] [u-a-dev] gnome-speech, and audio output, moving forward.
Date: Tue, 18 Sep 2007 18:57:43 +0100

Hi Luke, Will, and all:

For what it's worth, I agree with the bulk of what's been said already. It will be fantastic to get some more sanity in the speech/audio arena.

As for the first item Will identifies as a 'proposal', namely relying on the TTS engine to return digital sound samples rather than doing the output itself, I think this is a great idea but I would just suggest looking carefully at the potential latency issues there.

Also, key requirements of any speech/audio integration API(s) include the ability to know, at least roughly, two pieces of information: what is currently in the output queue and approximately how close to completion it is, and the ability to "sync up" and actually know, at some point in time, exactly what has been spoken. These are subtly different, in that the second one requires information about completion as opposed to "approximate progress". I think the second one implies at least some degree of interrupt capability in the audio output stream as well. Use cases include audio/voice synchronization, braille synchronization, and (perhaps more importantly), the ability to reliably break an utterance into pieces and restart output at a known point.
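
To make the two requirements concrete, here is a minimal Python sketch of the bookkeeping they imply. It is only an illustration under stated assumptions, not a proposed gnome-speech interface: the names Utterance, SpeechQueue, advance(), progress() and interrupt() are all hypothetical, and the audio layer is assumed to report how many frames it has actually played.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Utterance:
    """Hypothetical record for one queued utterance (total_frames assumed > 0)."""
    text: str
    total_frames: int                           # length of the synthesized audio
    marks: dict = field(default_factory=dict)   # mark name -> frame offset
    played_frames: int = 0


class SpeechQueue:
    """Hypothetical queue tracking approximate progress and exact completion."""

    def __init__(self):
        self._pending = deque()
        self.completed = []   # texts known to have been spoken in full

    def enqueue(self, utt):
        self._pending.append(utt)

    def advance(self, frames):
        """Assumed callback from the audio side as frames are really played."""
        while frames > 0 and self._pending:
            head = self._pending[0]
            step = min(frames, head.total_frames - head.played_frames)
            head.played_frames += step
            frames -= step
            if head.played_frames == head.total_frames:
                # Completion information: this text is now fully spoken.
                self.completed.append(self._pending.popleft().text)

    def progress(self):
        """Approximate view: (text, fraction played) for each queued item."""
        return [(u.text, u.played_frames / u.total_frames)
                for u in self._pending]

    def interrupt(self):
        """Drop the queue; return (text, last mark reached) for the current
        utterance so output can be restarted from a known point."""
        if not self._pending:
            return None
        head = self._pending[0]
        by_offset = sorted(head.marks.items(), key=lambda kv: kv[1])
        reached = [name for name, off in by_offset if off <= head.played_frames]
        self._pending.clear()
        return head.text, (reached[-1] if reached else None)
```

In this sketch progress() answers the "approximately how close to completion" question, while completed and the value returned by interrupt() carry the stronger "exactly what has been spoken" information that braille synchronization or restarting at a known point would need.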

As for moving away from Bonobo Activation (note: not the same as "Bonobo" in the broad sense), I think this makes sense. I also think moving away from the use of CORBA for gnome-speech IPC is a good idea; the speech APIs seem like excellent candidates for D-Bus migration and we have very few, if any, platform bincompat guarantees to deal with as long as the consumers of the speech interfaces are kept in the loop.


Best regards,

Bill

Willie Walker wrote:

Hi Luke:

First of all, I say "Hear, hear!"  The audio windmill is something
people have been charging at for a long time.  Users who rely upon
speech synthesis working correctly and integrating well with the rest of
their environment are among those that need reliable audio support most
critically.

I see two main proposals in the below:

1) Modify gnome-speech drivers to obtain samples from their
   speech engines and then handle the audio playing themselves.
   This is different from the current state where the
   gnome-speech driver expects the speech engine to do all the
   audio management.

   This sounds like an interesting proposal.  I can tell you
   for sure, though, that the current gnome-speech maintainer
   has his hands full with other things (e.g., leading Orca).
   So, the work would need to come from the community.

2) As part of #1, move to an API that is pervasive on the system.
   The proposed API is GStreamer.

   Moving to a pervasive API is definitely very interesting, and
   I would encourage looking at a large set of platforms:  Linux
   to Solaris, GNOME to KDE, etc.  An API of recent interest is
   Pulse Audio (https://wiki.ubuntu.com/PulseAudio), which might
   be worth watching.  I believe there might be many significant
   improvements in the works for OSS as well.
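
As a rough illustration of proposal 1, the following Python sketch shows a driver that asks the engine for samples and pushes them to an audio sink itself. Everything in it is an assumption made for the example: fake_engine_synthesize() stands in for a real TTS engine, MemorySink stands in for whatever audio API is eventually chosen (GStreamer, Pulse Audio, OSS, ...), and the audio is assumed to be 16-bit mono PCM.

```python
BYTES_PER_FRAME = 2  # assumption: 16-bit mono PCM


def fake_engine_synthesize(text, rate=16000):
    """Hypothetical stand-in for a TTS engine: returns PCM bytes (silence),
    pretending each character lasts 50 ms."""
    n_frames = len(text) * rate // 20
    return bytes(BYTES_PER_FRAME * n_frames)


class MemorySink:
    """Hypothetical audio sink that only records what it is given."""

    def __init__(self):
        self.chunks = []

    def write(self, data):
        self.chunks.append(data)


def speak(text, sink, rate=16000, chunk_frames=1024):
    """Driver-side playback: fetch samples from the engine, then hand them
    to the sink in fixed-size chunks. Returns the number of frames written."""
    pcm = fake_engine_synthesize(text, rate)
    chunk_bytes = chunk_frames * BYTES_PER_FRAME
    for start in range(0, len(pcm), chunk_bytes):
        sink.write(pcm[start:start + chunk_bytes])
    return len(pcm) // BYTES_PER_FRAME


def chunk_latency_ms(chunk_frames, rate):
    """Duration of one chunk in milliseconds: chunk_frames / rate seconds."""
    return 1000.0 * chunk_frames / rate
```

With these assumed numbers, each 1024-frame chunk at 16 kHz holds 64 ms of audio, so chunk size is one knob that trades responsiveness against the risk of underruns.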

In the bigger scheme of things, however, there is discussion of
deprecating Bonobo.  Bonobo is used by gnome-speech to activate
gnome-speech drivers.  As such, one might consider alternatives to
gnome-speech.  For example, SpeechDispatcher
(http://www.freebsoft.org/speechd) or TTSAPI
(http://www.freebsoft.org/tts-api-provider) might be something to
consider.  They are not without issue, however.  Some of the issues
include cumbersome configuration, reliability, etc.  I believe that's
all solvable with work.  The harder issue in my mind is that they will
introduce an external dependency for things like GNOME, and I've also
not looked at what their licensing scheme is.

Will

References:
- [g-a-devel] gnome-speech, and audio output, moving forward.
  - From: Luke Yelavich
- Re: [g-a-devel] [u-a-dev] gnome-speech, and audio output, moving forward.
  - From: Willie Walker
