Re: Fixing gnome-speech

From: Bill Haneman <Bill Haneman Sun COM>
To: Hynek Hanke <hanke brailcom org>
Cc: Enrico Zini <enrico enricozini org>, gnome-accessibility-list gnome org
Subject: Re: Fixing gnome-speech
Date: Wed, 28 Jun 2006 17:36:40 +0100

Hi Hynek, All:

I'm not sure I agree that speech engines should not do their own audio
output.  While I think you have identified some real problems with that
approach, it's not clear that the ".wav file" approach has low enough
latency.  If tests show that latency is not a problem, then passing the
synthesized audio data to the driver for processing (perhaps
multiplexing/mixing it with other audio in most situations, or playing
it pre-emptively in others) does seem to have advantages.

Hynek, I think you've also identified a good reason for one of the "many
layers" in our architecture: we don't want a bug in the speech
engine to crash our TTS service.  Using a C API, even when licenses
permit, usually means sharing process space with the driver, and for
many engines the code is closed-source, making diagnosis and recovery
very difficult indeed.  In such a situation we probably need to
implement process-space separation in our own TTS architecture, so
that we can restart the engine when things go badly wrong.

regards

Bill

On Wed, 2006-06-28 at 16:11, Hynek Hanke wrote:
> > Festival is free software, so this is of course fixable.  Having looked
> > at the code, it's simple code and it wouldn't break if it were stretched
> > a bit.  But that's not improving a driver: that's improving festival (if
> > the authors allow) and then having to depend on a very new version of
> > it.
> 
> Hi Enrico,
> 
> Another problem with speech engines doing their own audio output
> (apart from what you said about Festival) is that the audio output
> needs to be configured in several places if several engines are used,
> and code needs to be updated in many places whenever a new audio
> technology comes along, etc.
> 
> > [...]
> > So the proper way to implement a festival driver seems to me to use the
> > text-to-wave function and then do a proper handling of playing the
> > resulting wave, hopefully using the audio playing technology that's
> > trendy at the moment.
> 
> Yes, I agree. Actually this is what both Speech Dispatcher and KTTSD are
> doing and I think I've heard Gnome Speech would also like to go this way
> in the future.
> 
> > I looked into esd without understanding if it is
> > trendy anymore, and I looked at gstreamer without understanding if it
> > isn't a bit too complicated as a default way to play a waveform.
> 
> This is fairly complicated. I've investigated the possibilities for
> audio output and ended up summarizing our requirements, in case such a
> technology eventually appears, and writing my own small library for
> output to OSS, ALSA and NAS. Please see
> http://lists.freedesktop.org/archives/accessibility/2005-April/000049.html
> and feel free to comment. One of the problems is the low latency we
> need. That ruled out both ESD and GStreamer at the time; I'm not sure
> what the state of GStreamer is now. Another thing is that if we are
> aiming for a desktop-independent speech technology, we need
> desktop-independent audio output.
> 
> > I don't know much about the APIs of other speech engines.  If they all
> > had a text-to-wave function
> 
> Most of the engines do. Some don't, but this is their drawback (what if
> I want to have the audio synthesized and saved to a file?). As you said,
> it is very desirable to retrieve the audio for those engines that
> support it.
> 
> > , then it can be a wise move to implement a
> > proper audio scheduler to share among TTS drivers, which could then
> > (reliably) support proper integration with the audio system of the day,
> > progress report, interruption and whatever else is needed.  This would
> > ensure that all TTS drivers would have the same (hopefully high) level
> > of reliability wrt audio output.
> 
> Yes, that is my dream too! Would you be willing to help with this?
> I think we would first have to see what is new and consider the options
> again.
> 
> > > Now, one of the big problems is that Festival doesn't offer proper logs.
> > > It would often refuse connections because of a stupid typo in the configuration
> > > file and not give any clue to the user. This is something which should
> > > be fixed.
> > This can probably be fixed: festival can be told not to load any config
> > file
> 
> This is not really useful. Configuration is really needed.
> 
> > , and log can be implemented adding a couple of printfs before calls
> > to the C++ API. 
> 
> That would be logging on the side of the speech API provider (Gnome
> Speech etc.). This already exists in Speech Dispatcher and, as I said,
> comes automatically with a TCP API. I was talking about logs on the
> side of Festival.
> 
> You will never be able to discover why a particular voice was not
> loaded or doesn't work, why a sound icon is not playing, where the typo
> in your configuration files is, why it is not finding a module (wrong
> path) and such just by talking to Festival via its API (be it C++ or TCP).
> 
> Currently the only way for users to fix such problems is to run
> Festival from the command line and hope it will write some cryptic
> message to stderr. After that, all that is left is guesswork, past
> experience with similar problems and black magic. We must be able to
> diagnose problems.
> 
> >> [from my earlier post]
> >> Now, one of the big problems is that Festival doesn't offer proper
> >> logs.
> 
> You say you find the Festival code clear and modifications not
> difficult. If this could be fixed, that would be superb. I don't think
> Alan would object to including the patch, and it would not introduce
> a dependency for us. I don't know, however, how soon it could get
> into an official release. But I think it is worth looking into.
> 
> >  And something like a TTS driver which becomes the main
> > form of access to the computer should be designed to properly restart in
> > case of segfaults in its own code, be it festival or whatever else.
> 
> Yes, this is something we tried in Speech Dispatcher, but it doesn't
> always work. We should get this part right in TTS API. The objection
> remains, however, that with the TCP API it is easier to see which part
> is crashing, and after exactly which commands.
> 
> With regards,
> Hynek Hanke
> 
