Re: Fixing gnome-speech
- From: Bill Haneman <Bill Haneman Sun COM>
- To: Hynek Hanke <hanke brailcom org>
- Cc: Enrico Zini <enrico enricozini org>, gnome-accessibility-list gnome org
- Subject: Re: Fixing gnome-speech
- Date: Wed, 28 Jun 2006 17:36:40 +0100
Hi Hynek, All:
I'm not sure I agree that speech engines should not do their own audio
output. While I think you have identified some real problems with that
approach, it's not clear that the ".wav file" approach has a low enough
latency. If tests show that latency is not a problem, then passing the
synthesized audio bits to the driver for processing (perhaps via
multiplexing/mixing in most situations, or for pre-emptive audio in
others) does seem to have advantages.
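As a rough illustration of the mixing case, here is a minimal Python sketch. It assumes both engines deliver mono, 16-bit signed PCM in native byte order at the same sample rate (a real mixer would also have to convert formats), and the name `mix_pcm16` is hypothetical. Pre-emptive output would instead drop the rest of the current buffer and start the new one.

```python
from array import array


def mix_pcm16(a: bytes, b: bytes) -> bytes:
    """Mix two mono 16-bit signed PCM buffers (native byte order).

    Samples are summed and clipped to the int16 range; the shorter
    buffer is treated as if it were padded with silence.
    """
    sa = array("h")
    sb = array("h")
    sa.frombytes(a)
    sb.frombytes(b)
    n = max(len(sa), len(sb))
    out = array("h", [0]) * n  # n samples of silence
    for i in range(n):
        s = (sa[i] if i < len(sa) else 0) + (sb[i] if i < len(sb) else 0)
        out[i] = max(-32768, min(32767, s))  # clip instead of wrapping around
    return out.tobytes()
```

Clipping keeps loud overlaps from wrapping into noise; scaling each stream down before summing is the usual alternative when clipping becomes audible.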
Hynek, I think you've also identified a good reason for one of the "many
layers" in our architecture... we don't really want a bug in the speech
engine to crash our TTS service. Using a C API, even when licenses
permit, usually means sharing process space with the driver, and since
the code of many engines is closed-source, diagnosis and recovery are
very difficult indeed. In such a situation we probably need to
implement the process-space separation in our own TTS architecture, so
that we can restart the engine when things go badly wrong.
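A minimal Python sketch of that separation, assuming the engine can be run as a separate program (the command line passed in is a placeholder, and `run_supervised` is a hypothetical helper). The engine runs in a child process, so a segfault kills only the child, and the supervisor restarts it a bounded number of times.

```python
import subprocess
from typing import Sequence, Tuple


def run_supervised(cmd: Sequence[str], max_restarts: int = 3) -> Tuple[int, int]:
    """Run an engine in its own process and restart it if it dies abnormally.

    A crash (non-zero exit status, or a negative status on POSIX when the
    child is killed by a signal such as SIGSEGV) only takes down the child.
    Returns (last exit status, number of restarts performed).
    """
    restarts = 0
    while True:
        status = subprocess.run(list(cmd)).returncode
        if status == 0:
            # Clean shutdown: nothing to recover from.
            return status, restarts
        if restarts >= max_restarts:
            # Give up rather than loop forever on an engine that keeps crashing.
            return status, restarts
        restarts += 1
```

In a real service the child would be a long-running engine server, and the driver would also have to decide whether to replay or drop the request that was in progress when it crashed.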
regards
Bill
On Wed, 2006-06-28 at 16:11, Hynek Hanke wrote:
> > Festival is free software, so this is of course fixable. Having looked
> > at the code, it's simple code and it wouldn't break if it were stretched
> > a bit. But that's not improving a driver: that's improving festival (if
> > the authors allow) and then having to depend on a very new version of
> > it.
>
> Hi Enrico,
>
> also the problem with speech engines doing their own audio output
> (apart from what you said about Festival) is that this audio output
> needs to be configured in several places if several engines are used,
> and there are many places where code needs to be updated when a new
> audio technology comes along, etc.
>
> > [...]
> > So the proper way to implement a festival driver seems to me to use the
> > text-to-wave function and then do a proper handling of playing the
> > resulting wave, hopefully using the audio playing technology that's
> > trendy at the moment.
>
> Yes, I agree. Actually this is what both Speech Dispatcher and KTTSD are
> doing, and I think I've heard that Gnome Speech would also like to go this way
> in the future.
>
> > I looked into esd without understanding if it is
> > trendy anymore, and I looked at gstreamer without understanding if it
> > isn't a bit too complicated as a default way to play a waveform.
>
> This is fairly complicated. I've investigated the possibilities for
> audio output and ended up summarizing our requirements for such a
> technology, should one eventually appear, and writing my own small
> library for output to OSS, ALSA and NAS. Please see
> http://lists.freedesktop.org/archives/accessibility/2005-April/000049.html
> and feel free to comment. One of the problems is the latency we
> need. That ruled out both ESD and GStreamer at that time; I'm not sure
> what the state of GStreamer is now. Another thing is that if we are
> aiming for a desktop-independent speech technology, we need
> desktop-independent audio output.
>
> > I don't know much about the APIs of other speech engines. If they all
> > had a text-to-wave function
>
> Most of the engines do. Some don't, but that is their drawback (what if
> I want to have the audio synthesized and saved to a file?). As you said,
> it is very desirable to retrieve the audio from those engines that
> support it.
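Once an engine returns the synthesized audio instead of playing it, saving it to a file is straightforward. A minimal Python sketch using the standard `wave` module; it assumes the engine hands back raw PCM and that the caller knows its format (the 16 kHz, mono, 16-bit defaults are placeholders, not properties of any particular engine), and `save_wave` is a hypothetical name.

```python
import wave


def save_wave(path: str, pcm: bytes, sample_rate: int = 16000,
              channels: int = 1, sample_width: int = 2) -> None:
    """Write raw PCM returned by a text-to-wave call to a .wav file."""
    with wave.open(path, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(sample_width)  # bytes per sample
        w.setframerate(sample_rate)
        w.writeframes(pcm)
```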
>
> > , then it can be a wise move to implement a
> > proper audio scheduler to share among TTS drivers, which could then
> > (reliably) support proper integration with the audio system of the day,
> > progress reporting, interruption and whatever else is needed. This would
> > ensure that all TTS drivers would have the same (hopefully high) level
> > of reliability with respect to audio output.
>
> Yes, that is my dream too! Would you be willing to help with this?
> I think we would first have to see what is new and consider the options
> again.
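A minimal Python sketch of the shared scheduler Enrico describes, under stated assumptions: every driver hands over raw PCM, the audio backend (OSS, ALSA, NAS, ...) is hidden behind a hypothetical `AudioSink` interface with a single `write` method, and playback is split into chunks so that progress can be reported and an interruption takes effect between chunks. The class and method names are illustrative, not an existing API.

```python
import threading
from typing import Callable, Optional, Protocol


class AudioSink(Protocol):
    """Hypothetical backend interface (an OSS, ALSA or NAS writer, say)."""

    def write(self, pcm: bytes) -> None: ...


class AudioScheduler:
    """Plays audio handed over by any TTS driver, one chunk at a time."""

    def __init__(self, sink: AudioSink, chunk_bytes: int = 4096) -> None:
        self.sink = sink
        self.chunk_bytes = chunk_bytes
        self._stop = threading.Event()

    def stop(self) -> None:
        """Interrupt the current playback; takes effect before the next chunk."""
        self._stop.set()

    def play(self, pcm: bytes,
             on_progress: Optional[Callable[[int, int], None]] = None) -> bool:
        """Return True if played to the end, False if interrupted."""
        self._stop.clear()
        total = len(pcm)
        for start in range(0, total, self.chunk_bytes):
            if self._stop.is_set():
                return False  # interrupted: the remaining audio is dropped
            chunk = pcm[start:start + self.chunk_bytes]
            self.sink.write(chunk)
            if on_progress is not None:
                on_progress(start + len(chunk), total)  # bytes played so far
        return True
```

The chunk size trades responsiveness against overhead: smaller chunks make interruption faster, which matters given the latency requirements discussed above.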
>
> > > Now, one of the big problems is that Festival doesn't offer proper logs.
> > > It would often refuse connection for a stupid typo in the configuration
> > > file and not give any clue to the user. This is something which should
> > > be fixed.
> > This can probably be fixed: festival can be told not to load any config
> > file
>
> This is not really useful. Configuration is really needed.
>
> > , and log can be implemented adding a couple of printfs before calls
> > to the C++ API.
>
> That is logging on the side of the speech API provider (Gnome Speech
> etc.). This already exists in Dispatcher and, as I said, comes
> automatically with a TCP API. I was talking about logs on the side of Festival.
>
> You will never be able to discover why a particular voice was not
> loaded or doesn't work, why a sound icon is not playing, what the typo
> in your configuration files is, why it is not finding a module (wrong
> path) and so on just by talking to Festival via its API (be it C++ or TCP).
>
> Currently the only way for users to fix such problems is to run
> Festival from the command line and hope it will write some cryptic
> message to stderr. After that, all that is left are guesses, past
> experience with similar problems and black magic. We must be able to
> diagnose problems.
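Provider-side logging of the kind Enrico suggests could be a thin wrapper around each engine call, sketched here in Python with the standard `logging` module (the logger name `tts.engine` and the decorator name `logged` are hypothetical). It records which commands were sent and which one failed, but, as Hynek points out, it cannot tell you what went wrong inside Festival itself.

```python
import functools
import logging

log = logging.getLogger("tts.engine")


def logged(func):
    """Log each call to an engine API function before it is made, and any failure."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        log.debug("calling %s args=%r kwargs=%r", func.__name__, args, kwargs)
        try:
            return func(*args, **kwargs)
        except Exception:
            # Record which call failed (with traceback), then re-raise.
            log.exception("%s failed", func.__name__)
            raise
    return wrapper
```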
>
> >> [from my earlier post]
> >> Now, one of the big problems is that Festival doesn't offer proper
> >> logs.
>
> You say you find the Festival code clear and modifications not
> difficult. If this could be fixed, that would be superb. I don't think
> Alan would object to including the patch, and it would not introduce
> a dependency for us. However, I don't know how soon it could get
> into an official release. But I think it is worth looking into.
>
> > And something like a TTS driver which becomes the main
> > form of access to the computer should be designed to properly restart in
> > case of segfaults in its own code, be it festival or whatever else.
>
> Yes, this is something we tried in Speech Dispatcher, but it doesn't
> always work. We should get this part right in the TTS API. However, the
> point remains that with the TCP API it is easier to see which part is
> crashing, and after which commands exactly.
>
> With regards,
> Hynek Hanke
>
>
> _______________________________________________
> gnome-accessibility-list mailing list
> gnome-accessibility-list gnome org
> http://mail.gnome.org/mailman/listinfo/gnome-accessibility-list