Re: [u-a-dev] gnome-speech, and audio output, moving forward.
- From: Bill Haneman <gnome billhaneman ie>
- To: Willie Walker <William Walker Sun COM>
- Cc: Ubuntu Accessibility development discussions <ubuntu-accessibility-devel lists ubuntu com>, Orca screen reader developers <orca-list gnome org>, Gnome Accessibility List <gnome-accessibility-list gnome org>, GNOME Accessibility Developers <gnome-accessibility-devel gnome org>
- Subject: Re: [u-a-dev] gnome-speech, and audio output, moving forward.
- Date: Tue, 18 Sep 2007 18:57:43 +0100
Hi Luke, Will, and all:
For what it's worth, I agree with the bulk of what's been said already.
It will be fantastic to get some more sanity in the speech/audio arena.
As for the first item Will identifies as a 'proposal', namely relying on
the TTS engine to return digital sound samples rather than doing the
output itself, I think this is a great idea but I would just suggest
looking carefully at the potential latency issues there.
Also, key requirements of any speech/audio integration API(s) include
two capabilities: the ability to know, at least roughly, what is
currently in the output queue and approximately how close to completion
it is; and the ability to "sync up" and actually know, at some point in
time, exactly what has been spoken. These are subtly different, in that
the second one requires information about completion as opposed to
"approximate progress". I think the second one also implies at least
some degree of interrupt capability in the audio output stream. Use
cases include audio/voice synchronization, braille synchronization, and
(perhaps more importantly) the ability to reliably break an utterance
into pieces and restart output at a known point.
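To make the distinction concrete, here is a minimal sketch in Python of a queue that exposes both kinds of information. It is not the gnome-speech API or any existing API: the names SpeechQueue, Utterance, say, queue_status, chunk_done and interrupt are all hypothetical. It also assumes the audio layer can call back each time a chunk has finished playing.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    """Hypothetical record for one queued utterance, split into chunks."""
    chunks: list      # text pieces, e.g. words or sentences
    spoken: int = 0   # number of chunks confirmed as fully played

    @property
    def progress(self):
        # Approximate progress: fraction of chunks confirmed as played.
        return self.spoken / len(self.chunks) if self.chunks else 1.0


class SpeechQueue:
    """Hypothetical output queue; not an existing gnome-speech interface."""

    def __init__(self):
        self._pending = []

    def say(self, chunks):
        utterance = Utterance(list(chunks))
        self._pending.append(utterance)
        return utterance

    def queue_status(self):
        # Requirement 1: what is queued, and roughly how far along it is.
        return [(" ".join(u.chunks), u.progress) for u in self._pending]

    def chunk_done(self):
        # Assumed callback from the audio layer: one chunk finished playing.
        if not self._pending:
            return
        head = self._pending[0]
        head.spoken += 1
        if head.spoken >= len(head.chunks):
            self._pending.pop(0)

    def interrupt(self):
        # Requirement 2: stop the current utterance and report exactly
        # which chunks were not yet confirmed as spoken, so that output
        # can be restarted from a known point.
        if not self._pending:
            return []
        head = self._pending.pop(0)
        return head.chunks[head.spoken:]
```

For example, after say(["Hello", "world", "again"]) and one chunk_done() call, interrupt() returns the two chunks that were never confirmed, which a caller could pass back to say() to resume at that point.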
As for moving away from Bonobo Activation (note: not the same as
"Bonobo" in the broad sense), I think this makes sense. I also think
moving away from the use of CORBA for gnome-speech IPC is a good idea;
the speech APIs seem like excellent candidates for D-Bus migration, and
we have very few, if any, platform binary-compatibility guarantees to
deal with as long as the consumers of the speech interfaces are kept in
the loop.
Best regards,
Bill
Willie Walker wrote:
Hi Luke:
First of all, I say "Hear, hear!" The audio windmill is something
people have been charging at for a long time. Users who rely upon
speech synthesis working correctly and integrating well with the rest of
their environment are among those that need reliable audio support most
critically.
I see two main proposals in the below:
1) Modify gnome-speech drivers to obtain samples from their
speech engines and then handle the audio playing themselves.
This is different from the current state where the
gnome-speech driver expects the speech engine to do all the
audio management.
This sounds like an interesting proposal. I can tell you
for sure, though, that the current gnome-speech maintainer
has his hands full with other things (e.g., leading Orca).
So, the work would need to come from the community.
2) As part of #1, move to an API that is pervasive on the system.
The proposed API is GStreamer.
Moving to a pervasive API is definitely very interesting, and
I would encourage looking at a large set of platforms: Linux
to Solaris, GNOME to KDE, etc. An API of recent interest is
PulseAudio (https://wiki.ubuntu.com/PulseAudio), which might
be worth watching. I believe there might be many significant
improvements in the works for OSS as well.
In the bigger scheme of things, however, there is discussion of
deprecating Bonobo. Bonobo is used by gnome-speech to activate
gnome-speech drivers. As such, one might consider alternatives to
gnome-speech. For example, SpeechDispatcher
(http://www.freebsoft.org/speechd) or TTSAPI
(http://www.freebsoft.org/tts-api-provider) might be something to
consider. They are not without issue, however. Some of the issues
include cumbersome configuration, reliability, etc. I believe that's
all solvable with work. The harder issue in my mind is that they will
introduce an external dependency for things like GNOME, and I've also
not looked at what their licensing scheme is.
Will