Re: [orca-list] Punctuation, capital letters, exchange of characters and strings, generally error in the design of Orca



Thanks Milan.

So...I have a dilemma. The imperfect gnome-speech-based solution in Orca exists and generally works. The emergence of PulseAudio is also helping to address one of the major issues (audio device contention). While not perfect, the current solution provides emulation for missing TTS features in a very expedient and controllable way, giving users what they want *today*. We can also quickly make adjustments to gnome-speech to provide support for features enabled by the speech engine (e.g., verbalized punctuation, capitalization, etc.), and we can quickly adjust Orca to pass things on to the speech engine rather than emulate them at the Orca layer. In addition, all of this is encapsulated in GNOME, making it easy to manage from the release and packaging standpoints.

What I'm getting from Brailcom is a proposed solution that, when implemented, seems like it could address a number of problems. It will eliminate the need for Orca to do emulation of missing features. It will provide features that are on the Orca requirements list, but which are not currently implemented (e.g., verbalized capitalization, audio icons, etc.). It will also act as a system service that many apps can use, which will run on a large number of platforms, and which does not require a desktop to be running.

That's great. As a result of this promise, I permitted the Speech Dispatcher code into Orca as a means to provide a proving ground. It is still interesting to me, but it does not come without issues: it is incomplete, it is not an accepted dependency for GNOME, there is a dogmatic pursuit of purism, etc.

What I didn't expect was inflexible opposition from Brailcom to the practical solutions provided by Orca, such as the notion of the user specifying pronunciation definitions at a higher level. Until the unsophisticated user has a convenient mechanism for doing things such as tweaking pronunciations, Orca is going to provide a means to do this. Until verbalized punctuation is guaranteed to be supported by the lower layers, Orca is going to provide a means to emulate this. As the Orca project lead, this is my decision, and it is based upon user requirements. I hear Brailcom loud and clear - you don't like this. Please, let's agree to disagree and let's focus on SpeechDispatcher.
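
For those unfamiliar with what this emulation amounts to in practice, it is essentially a substitution pass applied before text is handed to the engine. A minimal sketch, with hypothetical names (this is not Orca's actual code):

    import re

    # Hypothetical user-specified pronunciation definitions, e.g. as
    # entered in a settings dialog.
    pronunciations = {
        "ASAP": "as soon as possible",
        "GNOME": "guh nome",
    }

    def apply_pronunciations(text):
        """Replace whole-word matches with their user-defined spoken form."""
        for written, spoken in pronunciations.items():
            text = re.sub(r'\b%s\b' % re.escape(written), spoken, text)
        return text

    # The substituted text is then handed to the speech layer as usual, e.g.:
    # speech.speak(apply_pronunciations(line))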

Until it is complete, stable, and we're sure it helps us meet the user requirements, I cannot make SpeechDispatcher a supported part of Orca. We have at least gotten to the point where we've identified the API that will be exposed to Orca, which is the speechd Python bindings. With the exception of a few things, it seems like a viable API, though I need to dig into it a little deeper.
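
For the curious, my current understanding of the basic shape of those bindings is roughly the following; the exact method names and constants should be double-checked against the speechd module itself:

    import speechd

    client = speechd.SSIPClient('orca')        # connect and identify the client
    client.set_output_module('espeak')         # choose a synthesizer, if desired
    client.set_punctuation(speechd.PunctuationMode.SOME)
    client.set_rate(50)                        # SSIP rate range is -100..100
    client.speak('Hello from Orca')            # queue text for speaking
    client.close()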

Assuming the API is workable as is, do you have an estimate for the amount of work (cost and timeframe) needed to complete the implementation and provide complete support for at least eSpeak, Festival, Cepstral, DECtalk, and IBMTTS? What is your support model and release schedule going to be once the implementation is done? What is your community model going to be (e.g., can others outside Brailcom contribute patches/enhancements to SpeechDispatcher)?

Will

Milan Zamazal wrote:
"WW" == Willie Walker <William Walker Sun COM> writes:

    WW> One of the questions I have right now is the ability for a
    WW> client to programmatically configure various things in
    WW> SpeechDispatcher, such as pronunciations for words.  In looking
    WW> at the existing API, I'm not sure I see a way to do this.  Nor
    WW> am I sure if this is something that a speech dispatcher user
    WW> needs to do on an engine-by-engine basis or if there is a
    WW> pronunciation dictionary that speech dispatcher provides for all
    WW> output modules to use.

SSIP supports SSML, so in theory it is possible to pass pronunciation
etc. through it.  In practice, SSML is probably only marginally
supported, if at all, in most TTS systems, so it wouldn't work.  But
that's not a fault of Speech Dispatcher; it's a missing feature
elsewhere -- preferably it should be present in the TTS systems, or at
least in some frontend to them.  I think the new Speech Dispatcher TTS
driver library should provide a means of parsing SSML, and the drivers
should handle it in some way when the corresponding TTS system can't.
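
To illustrate, passing a special pronunciation through SSIP as SSML might look roughly like this on the client side (I am assuming the Python bindings expose an SSML mode switch; the exact call may differ or not exist yet):

    import speechd

    client = speechd.SSIPClient('ssml-demo')
    # Assumption: a data-mode switch telling Speech Dispatcher the text is SSML.
    client.set_data_mode(speechd.DataMode.SSML)
    # Standard SSML <sub> markup supplying a spoken form for an abbreviation;
    # the TTS driver would interpret or strip the markup if the engine cannot.
    client.speak('<speak>Meeting at <sub alias="headquarters">HQ</sub> today.</speak>')
    client.close()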

Just one remark on pronunciation, as a typical representative of some
of these problems: it is important to distinguish between special
pronunciation and regular pronunciation.  In the first case, e.g. when
a word should be pronounced in a non-standard way for some reason, it
is completely valid to pass pronunciation information from the client
to the engine.  But in the latter case, e.g. when an engine
mispronounces some words, the client should in no way attempt to "fix"
it; this would only make the situation worse.  The proper solution is
to fix the pronunciation in the engine.  TTS drivers may attempt to
work around it when fixing the engine is not possible, but that should
be considered an extreme approach, applied only when really nothing
else works.  As for common pronunciation dictionaries, I doubt they
can be handled at a common level, because different synthesizers use
different phoneme sets and representations.

On the other hand, it seems reasonable to handle some other features,
such as signalling capitalization, punctuation, sound icons, etc., on
a common basis in the TTS drivers.  But beware: this may require
language-dependent text analysis and may interfere with the TTS
processing of some synthesizers.  So it shouldn't be applied
universally, and each of the TTS drivers must have a free choice in
how to handle such things -- whether to leave it to the synthesizer or
(when the synthesizer is unable to handle the requirements) to use the
TTS driver's own means.  When one thinks about it more, it becomes
clear that it would be very useful to have a single common text
analysis frontend for free speech synthesizers, so that the individual
synthesizers start their own work only after the phonetic
transcription of the input is available.  But that is another issue.
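
To make the idea concrete, a driver-level fallback for capital letters might look something like the following sketch (all names here are hypothetical, and real output modules are not written like this; it only shows where the decision is made):

    class ExampleDriver:
        """Hypothetical TTS driver that emulates capital-letter signalling
        only when the underlying synthesizer cannot do it itself."""

        def __init__(self, engine, cap_mode='icon'):
            self.engine = engine        # wrapper around some real synthesizer
            self.cap_mode = cap_mode    # 'none', 'spell' or 'icon'

        def say_character(self, ch):
            if ch.isupper() and not self.engine.supports('capital_signalling'):
                if self.cap_mode == 'icon':
                    self.engine.play_sound('capital')   # audio icon fallback
                elif self.cap_mode == 'spell':
                    self.engine.speak('capital')        # spoken prefix fallback
            self.engine.speak(ch)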

[...]

    WW> I wasn't sure how to interpret "No", but my interpretation was
    WW> that emulation was NOT done, and this seems to match my
    WW> interpretation of "Right" above.  But, maybe "No" meant
    WW> something like "No, speech dispatcher itself doesn't do
    WW> emulation, but that can be done at a lower layer in the speech
    WW> dispatcher internals."  If that's the case, from the client's
    WW> point of view, it's still speech dispatcher, and the client can
    WW> now depend upon speech dispatcher to do the emulation.

Yes, I think there is some terminology confusion here.  The new
Speech Dispatcher contains a TTS API and drivers as part of it, while
the current implementation is focused basically just on message
dispatching.  I'd suggest naming the parts explicitly in the
discussion (dispatching, interface, output modules, TTS API, TTS
drivers, configuration) to avoid confusion.

In my opinion, it's basically as you write above.  Neither clients
nor any of the Speech Dispatcher parts, with the exception of the TTS
drivers, should care about emulation of missing TTS features.  They
should perform their own jobs and rely on the TTS systems and their
TTS drivers to ensure proper speech output.  The presence of a common
TTS API should guarantee that the emulation work is done only once, in
a single place behind the TTS API, i.e. in the speech synthesizers
(preferably) or in the TTS drivers (when doing it in the TTS system is
not possible).  The possible creation of the common TTS processing
frontend to speech synthesizers mentioned above comes into play here,
but considering the current state of things, it would be premature to
get too distracted by that idea.

    WW> Let me try to rephrase this question: from Orca's point of view,
    WW> if text is handed off to speech dispatcher via speechd, will we
                                                       ^^^^^^^
                                                       SSIP?
    WW> be guaranteed that the appropriate emulation will be provided
    WW> for features that are not supported by a speech engine?  For
    WW> example, if an audio cue is desired for capital letters, will
    WW> the Orca user be guaranteed that something in Speech Dispatcher
    WW> will play an audio icon for capitalization if the engine doesn't
    WW> support this directly?  Or, if verbalized punctuation is not
    WW> supported by the engine, will the Orca user be guaranteed that
    WW> something in Speech Dispatcher will emulate the support if the
    WW> engine does not support this directly?

My simple answer is Yes (the detailed answer is above).
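
In other words, from the client side the intended usage reduces to requesting the behaviour and relying on the layers below the TTS API to provide or emulate it, roughly like this (method names as I understand the speechd bindings; please verify against the actual module):

    import speechd

    client = speechd.SSIPClient('orca')
    client.set_cap_let_recogn('icon')                    # audio cue for capitals
    client.set_punctuation(speechd.PunctuationMode.ALL)  # verbalized punctuation
    client.speak('Read THIS, please!')   # any needed emulation happens below the TTS API
    client.close()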

I'm not sure I'd agree with everyone here on particular details, but I
hope the basic ideas and explanations outlined above might be acceptable
to all members of the Speech Dispatcher team, as well as to Orca and
other client development teams.

Thanks for your questions, which are helping to clarify things!

Regards,

Milan Zamazal