Re: Festival vs. F-lite disk space requirements
- From: Willie Walker <William Walker Sun COM>
- To: Gnome Accessibility <gnome-accessibility-list gnome org>
- Cc: Bill Haneman <Bill Haneman Sun COM>
- Subject: Re: Festival vs. F-lite disk space requirements
- Date: Fri, 24 Feb 2006 09:24:08 -0500
Hi:
If one looks at the Venn diagram of Festival/Flite/FreeTTS, the
common logic is quite similar, but Festival's circle is much much
larger than the other two. The common logic from this Venn diagram
basically consists of a process of preprocessing input text,
selecting units to play, combining them, and then playing them.
Unfortunately, the data files are typically the largest hunk of data,
and each engine uses its own format (Festival == Binary/ASCII EST
files, Flite == C code, FreeTTS == Binary/ASCII of its own ilk). In
retrospect, if we had understood the EST file format and Festival
signal processing code better, FreeTTS probably could have used
Festival data directly instead of duplicating the RELP-based approach
in Flite.
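To make the shared pipeline described above concrete, here is a minimal toy sketch of the four stages (preprocess text, select units, combine them, play them). It is not code from Festival, Flite, or FreeTTS. The function names, the letter-pair "units", and the integer "samples" are all hypothetical stand-ins for real text analysis, diphone selection, and waveform data.

```python
from typing import Dict, List


def preprocess(text: str) -> List[str]:
    """Normalize raw text into lowercase word tokens (stand-in for text analysis)."""
    cleaned = "".join(ch if ch.isalpha() or ch.isspace() else " " for ch in text)
    return cleaned.lower().split()


def select_units(tokens: List[str]) -> List[str]:
    """Split each token into adjacent-letter pairs, a toy stand-in for diphones."""
    units: List[str] = []
    for tok in tokens:
        padded = "_" + tok + "_"  # "_" marks silence at the word boundaries
        units.extend(padded[i:i + 2] for i in range(len(padded) - 1))
    return units


def combine(units: List[str], inventory: Dict[str, List[int]]) -> List[int]:
    """Concatenate the stored samples for each unit; unknown units become one zero sample."""
    samples: List[int] = []
    for unit in units:
        samples.extend(inventory.get(unit, [0]))
    return samples


def play(samples: List[int]) -> int:
    """Pretend to play the audio; a real engine would write to a sound device here."""
    return len(samples)


def synthesize(text: str, inventory: Dict[str, List[int]]) -> List[int]:
    """Run the first three stages in order and return the combined samples."""
    return combine(select_units(preprocess(text)), inventory)
```

In this sketch the `inventory` dictionary plays the role of the voice data, which is the part each engine stores in its own format.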
Festival has a core set of logic (Scheme interpreter, scheme files,
signal processing code, etc.) to deal with voice data, and one can
download voice data separately for it. I think the bulk of the data
consists mostly of pronunciation lexicons and usually some processed
form of the actual voice recordings (i.e., the "group" file). To get
a small set for perhaps en_US only, one could take a look at using
the kal_diphone voice data and attempt to discover only the bare
stuff needed to make it work. You'd probably also want to keep the
MBROLA support in there as it tends to be small and can interface
with MBROLA (a separate download) to get you better sounding voices.
In any case, I *think* you're looking at about 22Meg total for a
minimal kal_diphone-based en_US festival install, though I think you
might be able to get 4Meg or so smaller if it's possible to prune
some what-I-think-might-be-redundant lexicon data. These numbers are
based upon my install of festival on my FC4 machine, and may actually
be extra large because I've been goofing with the ARCTIC and HTS
voice support. As an aside, there's some odd interaction between the
festival server and gnome-speech on Ubuntu, which pegs the CPU at
100%. I took a quick poke at this at one time, and it
looks like there's something bogus happening on the socket/pipe
communication between the two. My current commitments and pressures
haven't afforded me the time to really dig into and solve the
problem, though. :-(
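For anyone wanting to check size estimates like these against their own install, a rough sketch along the following lines could total up the bytes under a directory and break them down by top-level subdirectory. The helper names are made up for illustration, and the install path in the usage note is an assumption that will vary by distribution.

```python
import os
from typing import Dict


def dir_size_bytes(root: str) -> int:
    """Sum the sizes of all regular files under root, skipping symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def size_by_subdir(root: str) -> Dict[str, int]:
    """Return the total size of each immediate subdirectory of root."""
    sizes: Dict[str, int] = {}
    for entry in sorted(os.listdir(root)):
        path = os.path.join(root, entry)
        if os.path.isdir(path) and not os.path.islink(path):
            sizes[entry] = dir_size_bytes(path)
    return sizes
```

Calling `size_by_subdir("/usr/share/festival")` (assuming that is where the data lives on your system) should show which pieces, such as lexicons or voices, dominate the total.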
Flite (a C-based engine) is based on data imported from Festival.
Last I knew, its voice data files are compiled in as source code and
you get what you get. I'm not sure there is opportunity for pruning,
though you could get rid of the unit selection voice that's good for
only speaking the time. But...it still tends to be what it is: a
small, fast, runtime engine. :-) It's been a long time since I
looked at the code, so I don't remember sizing information, but I
think it is the smallest of the bunch. There's also no direct
gnome-speech support for it, other than indirectly through the
recently added speech-dispatcher driver for gnome-speech (thanks Hynek!).
Given resources, one probably could write a gnome-speech driver for
flite and bypass this indirection.
FreeTTS (a Java-based engine) is based on logic from Festival and
Flite, though it really is mostly a Flite clone in Java. Like
Festival, it consists of core logic that can operate on voice data.
To get a small set, you could ship only what's needed for the kevin
voices, but you'd probably also want to keep the MBROLA support
because it has similar benefits as what you get with Festival. I
think the total would be about 6.5Meg or so, but then you will also
need the Java virtual machine.
Hope this helps, and please let me know if you have any more questions,
Will
I think Flite uses the same file format. ( Will, please correct me if
I'm wrong). It also requires a Java JRE, are you planning to include
Java in the live CD?
BTW, in the past, Java was required for OpenOffice.org accessibility,
but that's not true of the latest version.
Bill
On Fri, 2006-02-24 at 12:25, Henrik Nilsen Omma wrote:
Hello,
We are working on packaging screen reader support for the Ubuntu Live
CD, but have gotten ourselves a little confused regarding file
sizes ...
Being a Live CD we are quite limited on disk space. We were thinking
that we should use the smaller F-lite, rather than the full Festival,
assuming it had smaller speech files. However, because gnome-speech
doesn't have direct support for F-lite we also needed to include
speech dispatcher (and gnome-speech from CVS), so it begins to grow.
Can anyone shed some light on the relative space requirements of
Festival vs. F-lite? Does Festival include all its supported
languages by default, or are they packaged separately (as packaged in
Debian)? We would be happy to settle for English-only support this
time around.
Thank you. Any advice will be greatly appreciated.
- Henrik
_______________________________________________
gnome-accessibility-list mailing list
gnome-accessibility-list gnome org
http://mail.gnome.org/mailman/listinfo/gnome-accessibility-list