[g-a-devel] gnome-speech festival driver problems explained (sort of)

Hi.

After studying the (uncommented) festival driver source code, I came to
some conclusions:

 - the Italian speech synthesis doesn't like speaking nothingness:

   $ festival
   Festival Speech Synthesis System 1.4.3:release Jan 2003
   Copyright (C) University of Edinburgh, 1996-2003. All rights reserved.
   For details type `(festival_warranty)'
   festival> (SayText "")
   #<Utterance 0xa72bbcc8>
   festival> (SayText " ")
   #<Utterance 0xa72d3d08>
   festival> (voice_lp_diphone)
   lp_diphone
   festival> (SayText " ")
   SIOD ERROR: wrong type of argument to get_c_val
   festival> (SayText "")
   SIOD ERROR: wrong type of argument to get_c_val
   festival>

 - the festival driver sends LOTS of nothingness:

   festivalsynthesisdriver.c:945:
   festival_synthesis_driver_say_raw (d, "(SayText \"");
   festival_synthesis_driver_say_raw (d, escaped_string);
   festival_synthesis_driver_say_raw (d, "\")\r\n");

   festival_synthesis_driver_say_raw (d, "(SayText \"\")\r\n");

   Basically, after every string that is sent, an empty string is sent.
   Why?  No idea.  A comment explaining why would really have helped.

 - festival_synthesis_driver_is_speaking is broken: when festival has
   only one wave in the audio spooler, it says that the queue is empty.
   I enabled debugging info in the driver, and every single time it
   queried the audio queue, the queue was reported empty, even while
   festival was actually speaking.

 - I tried simplifying the interaction a bit: I got rid of all the
   (useless) audio queue enquiries and I simplified the way text is
   sent, building the whole command and sending it in one go:

   escaped_string = g_malloc (strlen (text)*2+1+20);
   strcpy(escaped_string, "(SayText \"");
   ptr1 = text;
   ptr2 = escaped_string + strlen(escaped_string);
   while (ptr1 && *ptr1)
   {
           if (*ptr1 == '\"' || *ptr1 == '\\')
               *ptr2++ = '\\';
           *ptr2++ = *ptr1++;
   }
   *ptr2++ = '"';
   *ptr2++ = ')';
   *ptr2++ = '\r';
   *ptr2++ = '\n';
   *ptr2 = 0;
   [...]
   festival_synthesis_driver_say_raw (d, escaped_string);

   Oh, and I also escaped \ characters, which weren't escaped before
   (a possible security risk?  I didn't investigate).
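
For reference, here is the same escaping idea as a self-contained
function, without the glib dependency.  This is only a sketch:
build_saytext_command is a hypothetical name, not something that exists
in gnome-speech, and the buffer is sized for the worst case where every
input character needs a backslash in front of it.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of gnome-speech.
 * Returns a newly allocated "(SayText \"...\")\r\n" command with '"'
 * and '\\' escaped, or NULL if text is NULL or allocation fails.
 * The caller releases the result with free(). */
char *
build_saytext_command (const char *text)
{
    static const char prefix[] = "(SayText \"";
    static const char suffix[] = "\")\r\n";
    const char *in;
    char *cmd, *out;

    if (text == NULL)
        return NULL;

    /* Worst case: each input byte is escaped, so it takes two bytes.
     * sizeof suffix already counts the terminating NUL. */
    cmd = malloc (sizeof prefix - 1 + 2 * strlen (text) + sizeof suffix);
    if (cmd == NULL)
        return NULL;

    memcpy (cmd, prefix, sizeof prefix - 1);
    out = cmd + sizeof prefix - 1;

    for (in = text; *in != '\0'; in++) {
        if (*in == '"' || *in == '\\')
            *out++ = '\\';
        *out++ = *in;
    }

    /* Copies the closing quote, parenthesis, CRLF and the NUL. */
    memcpy (out, suffix, sizeof suffix);
    return cmd;
}
```

The returned string would then be passed to
festival_synthesis_driver_say_raw in a single call and freed afterwards.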

The result was that things were a little bit more stable, but not much.
Sound would stop (reliably reproducible by hitting ALT+F1 to go to the
panel menu), but switching windows with ALT+Tab would usually bring it
back.  However, sometimes gnopernicus wouldn't read its own menu entries
unless I played with ALT+Tab a bit more.  So, it seems that window
switching has a therapeutic influence here.

When sound stops, what happens is that gnopernicus doesn't send data to
the speech driver at all.  I suspect that what happens is that the
driver status reporting is confusing gnopernicus somehow.

I tried to rewrite the festival driver using the festival C++ API
instead of the pipeline to a festival server, but got stuck on the
audio output: the festival audio scheduler's status reporting is
unreliable, so I'd have to implement a queryable and interruptible audio
scheduler, which would take me days because I'm not familiar with
glib event loops and esd/gstreamer programming.

So, problems identified so far:

 - Italian voices hate empty/blank strings
 - the driver sends lots of empty strings.  The errors should be ignored
   by the server, though.
 - the is_speaking report is unreliable
 - gnopernicus tends to stop sending data to the festival driver,
   possibly because of getting confused by the driver status reports.
   Switching windows seems to shake gnopernicus back to normal.
 - the way SayText strings are constructed should be improved, see the
   code snippet above.
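
One possible workaround for the first two items would be for the driver
to drop empty or blank text before it ever reaches festival.  A minimal
sketch follows; text_is_blank is a hypothetical helper, not an existing
driver function, and I don't know whether the driver relies on the empty
SayText for some side effect, so treat this as an assumption to test
rather than a fix.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper, not part of gnome-speech.
 * Returns 1 if text is NULL, empty, or made only of whitespace
 * (as classified by isspace), and 0 otherwise. */
int
text_is_blank (const char *text)
{
    if (text == NULL)
        return 1;

    for (; *text != '\0'; text++) {
        /* Cast avoids undefined behaviour for bytes above 127. */
        if (!isspace ((unsigned char) *text))
            return 0;
    }
    return 1;
}
```

The driver could call this right before building the SayText command
and simply return when it yields 1, so that voices like lp_diphone never
see "" or " ".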

I'm sorry I didn't pinpoint the problems better, but the festival
driver's code is full of enqueuing callbacks into lists and glib event
queues, and is hard for me to follow.  I would have liked to write to
the author and work on it together, but it's basically anonymous (it just
says "Sun Microsystem").

I tried with orca as well, which seems to be more reliable in sending
data to the speech driver, until at some point it caused my whole desktop
to hang, to the point of needing a CTRL+ALT+Backspace to restart X.

I'll now try to work out how to make the festival voices work fine with
empty/blank strings.  I would be happy if someone could tell me which
door to knock on regarding the gnome-speech festival driver internals.


Ciao,

Enrico

-- 
GPG key: 1024D/797EBFAB 2000-12-05 Enrico Zini <enrico debian org>
