open translations database
- From: Aoife Dunne - Sunsoft ELC <Aoife Dunne ireland sun com>
- To: whampton staffnet com, gnome-i18n gnome org,StefanRieken SoftHome net
- Subject: open translations database
- Date: Wed, 1 Nov 2000 12:16:43 +0000 (GMT)
Dear Stephen & All
My name is Aoife Dunne and I am the project manager responsible
for the GNOME Localisation at Sun.
I am writing this mail in the hope that I can take Stephen's
suggestions one step further and help the open source community
provide localised versions of GNOME and, thereafter, of similar
open source products. I work for Sun Microsystems, which is
planning to ship GNOME with the next marketing release of
Solaris, so I am writing this mail with the GNOME project
in mind. However, we want any solution to be for the general
benefit of free and open source software, and I would be very
interested in offering our team's assistance across all localised
open source software.
How can Sun help:
Stefan mentioned it would be nice to have a web-accessible
"database" (or just a simple file) which would contain one or more
sets of standard English words/terms and their associated
translations. Developers and translators of software and
documentation could use the terminology listings as reference.
Terming Tool
------------
We have a script which extracts terms from the English software
files, providing suitable terms for the initial database/file. A
term is defined as no more than one or two words. The script
extracts terms from the strings, removes duplicates and ignores
items such as "the", "is", numbers, etc. It is not possible to
extract the associated translated terms, so translators would
have to provide them. Once this is done, the terminology
listings can be posted to a web site, where they can be
updated/modified as development of the applications progresses.
It is preferred that the suite of applications within a product
use the same terminology, ensuring consistency; however, by
recording the application against each entry it is possible to use
different terms when appropriate.
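The extraction step described above could be sketched roughly as follows. This is only an illustration, not Sun's terming tool (whose source is not available here): the function name `extract_terms`, the stop list, and the rule of treating each whole string of one or two words as a candidate term are all assumptions.

```python
# Hypothetical stop list; the real tool's ignore list is not known.
STOP_WORDS = {"the", "is", "a", "an", "of", "to", "and", "or", "in"}


def extract_terms(strings, max_words=2):
    """Return the sorted, de-duplicated candidate terms found in `strings`.

    Assumption: a string counts as a term when it has at most `max_words`
    words and is not made up only of stop words or numbers.
    """
    terms = set()  # a set removes duplicates for us
    for s in strings:
        # Drop trailing punctuation such as "..." or ":" from menu labels.
        cleaned = s.strip().rstrip(".:!?").strip()
        words = cleaned.split()
        if not words or len(words) > max_words:
            continue
        # Ignore entries consisting solely of stop words or numbers.
        if all(w.lower() in STOP_WORDS or w.isdigit() for w in words):
            continue
        terms.add(cleaned)
    return sorted(terms)


if __name__ == "__main__":
    sample = ["Print", "Save", "Print", "the", "42", "Save-To",
              "Open the selected file"]
    for term in extract_terms(sample):
        print(term)
```

The output of such a script is only the English column of the listing; the translated column still has to be filled in by translators.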
Sample
English Term    English Definition    Translated    Application
Print
Save
Save-To
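To make the role of the "Application" column concrete, here is a minimal sketch of how a listing with these four columns might be looked up. The `TermEntry` type, the `lookup` function, the application name `example-editor` and the sample rows are all hypothetical; the Dutch words for "edit" are borrowed from the examples in Stefan's mail quoted below.

```python
from typing import NamedTuple, Optional


class TermEntry(NamedTuple):
    english: str
    definition: str
    translated: str
    application: Optional[str]  # None means "default for the whole suite"


# Illustrative rows only, not an actual terminology listing.
ENTRIES = [
    TermEntry("file", "A named document on disk", "bestand", None),
    TermEntry("edit", "Modify the current item", "Bewerken", None),
    TermEntry("edit", "Modify the current item", "Bewerk", "example-editor"),
]


def lookup(entries, english, application=None):
    """Prefer an application-specific translation, else the suite default."""
    default = None
    for entry in entries:
        if entry.english != english:
            continue
        if application is not None and entry.application == application:
            return entry.translated
        if entry.application is None:
            default = entry.translated
    return default


if __name__ == "__main__":
    print(lookup(ENTRIES, "edit"))
    print(lookup(ENTRIES, "edit", "example-editor"))
```

With this layout the suite shares one translation by default, and an application deviates only where a row names it explicitly.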
Initially it may not be possible for me to supply the source of
the terming tool due to licensing problems; however, I can help
immediately by supplying a simple text file with the English
terms. Would this be of help?
Translation Memory
------------------
We are currently in the process of developing a translation memory
(TM) system which runs on Unix. How it works: basically, TM is
all about recycling your previous translations in order to retain
quality and consistency and to save time and money. TM is based
on whole strings, whereas terming is based on terms consisting of
maybe one or two words. When a translation has been completed,
the English and the associated translated .po files are run through
a .po file parser which splits the files up into strings. The files are then
run through an alignment tool which generates files containing
string pairs. Each string pair consists of an English string and
its corresponding translated string. These aligned files are then
imported into a database or translation memory (we are using
Oracle for this). When an updated version of the .po files comes
along, the English files are run against the database using a
translation memory tool. The TM tool searches the database for
matches for each English string. If it finds an exact match (it
always compares English against English), it inserts the
corresponding translation into the file, leaving the translator to
translate only what's new or what has changed. However, the
translator still has the freedom to overwrite or correct a
database translation if he/she so wishes. Generally the database
is kept at a central location and is populated with new
translations once the translation of a new application/product is
complete. Obviously, it is necessary to monitor the quality of
what goes into the database. Otherwise it's "garbage in, garbage
out".
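The workflow above (parse, pair up, import, then exact-match lookup) can be sketched in a few lines. This is not Sun's TM system: it assumes SQLite as a stand-in for Oracle, handles only single-line `msgid`/`msgstr` entries, and relies on the fact that a gettext .po entry already carries the English string and its translation side by side, so parsing and alignment collapse into one step. The function names and the Dutch sample strings are illustrative.

```python
import re
import sqlite3

# Single-line entries only; multi-line strings, plural forms and comments
# are ignored in this sketch.
ENTRY_RE = re.compile(r'^msgid "(.*)"\nmsgstr "(.*)"$', re.MULTILINE)


def parse_po(text):
    """Split .po text into (english, translation) string pairs."""
    return [(m.group(1), m.group(2)) for m in ENTRY_RE.finditer(text)]


def import_pairs(db, pairs):
    """Load completed pairs into the memory, skipping empty ones."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS tm "
        "(english TEXT PRIMARY KEY, translated TEXT)"
    )
    db.executemany(
        "INSERT OR REPLACE INTO tm VALUES (?, ?)",
        [(eng, tra) for eng, tra in pairs if eng and tra],
    )
    db.commit()


def pretranslate(db, english_strings):
    """Map each English string to its stored translation, or None.

    Only exact English-to-English matches are used; anything new or
    changed is left for the translator.
    """
    result = {}
    for s in english_strings:
        row = db.execute(
            "SELECT translated FROM tm WHERE english = ?", (s,)
        ).fetchone()
        result[s] = row[0] if row else None
    return result


if __name__ == "__main__":
    finished = 'msgid "Print"\nmsgstr "Afdrukken"\n\nmsgid "Save"\nmsgstr "Opslaan"\n'
    db = sqlite3.connect(":memory:")
    import_pairs(db, parse_po(finished))
    print(pretranslate(db, ["Print", "Save", "Save As"]))
```

Because `import_pairs` overwrites an existing entry for the same English string, whatever is imported last wins, which is one reason the quality of what goes into the database has to be monitored.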
The TM system is still in development but is coming close to
completion. We may be able to help by providing you with a .po
file parser. However, we would need to look into possible
licensing issues.
Style Guides
------------
We have some localised versions of our style guidelines. These
guidelines are used to aid the translators. For example, the
French guide specifies how date and time formats should be
localised for France. In many countries several formats for such
data are correct; the style guide settles on one preferred format
for the sake of consistency. Our style guides could be used as
reference and updated to create a GNOME-specific style guide for
all languages. Let me know if you are interested and I will send
you a copy of our country-specific style guides.
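As a small illustration of what "one preferred format per country" means in practice, a style guide's date rule can be thought of as a lookup table. The table below is hypothetical and is not taken from Sun's style guides; the formats shown are merely common conventions for those locales.

```python
from datetime import date

# Hypothetical style-guide table: exactly one preferred date format per
# locale, even where several formats would be acceptable.
PREFERRED_DATE_FORMAT = {
    "en_US": "%m/%d/%Y",
    "fr_FR": "%d/%m/%Y",
    "de_DE": "%d.%m.%Y",
}


def format_date(d, locale_name):
    """Format `d` using the single format the style guide prescribes."""
    return d.strftime(PREFERRED_DATE_FORMAT[locale_name])


if __name__ == "__main__":
    sent = date(2000, 11, 1)
    for name in PREFERRED_DATE_FORMAT:
        print(name, format_date(sent, name))
```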
How else can Sun help:
* possibly act as the host for the translation memory database,
populating it with newly translated products.
* provide linguistic quality assurance feedback and implement
linguistic changes if necessary, checking for grammar, spelling,
inconsistencies etc.
If any of the above suggestions would be of help, or if you have
any other suggestions on what I can bring to the table, please let
me know. Looking forward to getting any feedback.
Best Regards
Aoife
> X-Unix-From: StefanRieken@SoftHome.net Thu Oct 12 18:56:35 2000
> Delivered-To: gnome-i18n@gnome.org
> Subject: open translations database
> From: Stefan Rieken <StefanRieken@SoftHome.net>
> To: whampton@staffnet.com, gnome-i18n@gnome.org
> Date: 12 Oct 2000 16:55:26 -0100
> Mime-Version: 1.0
> X-BeenThere: gnome-i18n@gnome.org
> X-Loop: gnome-i18n@gnome.org
> X-Mailman-Version: 2.0beta5
> List-Id: Internationalization (I18N) of GNOME <gnome-i18n.gnome.org>
>
> To the folks at openstandards.org and the gnome-i18n mailing list.
>
> Hello,
>
> This mail was sent out to give space to an idea that I developed
> only today. This idea is rough, unimplemented and untested.
> Nevertheless, I hope that it is of interest to you. This mail was
> sent to the addresses mentioned above, just because I didn't know
> any better place to start. If you believe I shouldn't have sent it
> to you or your list, I apologise. If you believe I missed someone
> out, you are free to forward this. (But I must warn you in advance
> that this idea is too young for me to know if it will survive my
> busy schedule.)
>
> Problem:
>
> The current translation of open source software suffers from a
> lack of manpower. This usually doesn't result in a lack of
> translations, but in bad translations. Half of the time translation
> engines such as Babelfish are being used. These engines often can't
> produce correct translations of small strings because of a lack of
> context (e.g.: the title of the window I am writing this message in
> says, directly translated back to English: "is composing a new
> message" instead of "Compose a new message"). They also don't care
> about the size of the translated string, which can be important
> when used in a program. Translation by individuals can often also
> cause errors. These vary from inconsistencies to overlooking
> spelling caveats common for the target language.
>
> It would be helpful to have one or more sets of standard
> translations for standard words and strings. Translators of
> software would benefit from this, but also translators of larger
> documents that contain standard words and strings (such as "radio
> button"; you'll be surprised to know how hard it is in some
> languages to come up with a good default translation for it).
>
> Context:
>
> I am writing this with the GNOME project in mind, because I am
> familiar with it. However, I want my solution to be for the general
> benefit of free and open source software.
>
> There are a lot of standard strings in applications. Many GUI
> standards define which ones you can use. Desktop projects such as
> GNOME often have a set of these standard strings, and their
> translations, included. They cannot, however, provide translations
> for less commonly used strings. Another problem arises when
> standard strings are part of bigger strings (e.g. when "show
> toolbar" is standard, and a string like "show main toolbar" is
> being used). Most open source projects don't really care about
> documenting their use of standard strings, as the implementation
> should be clear enough.
>
> In the past, I have done some minor translation work for ATO. This
> is an international organisation of translators of Amiga software
> (the Amiga Translation Organisation). They were pretty well
> organised (but being an Internet development newbie, it took me
> some time to get acquainted with the organisation). One of the best
> parts of the organisation (of the Dutch division anyway) was a
> document that described the translation process, and also contained
> a list of common Amiga terms and their translations.
>
> Because I want my solution to be global, and not e.g.
> Amiga-specific, I think it is not a good idea to provide a
> procedure for the translation process. Different projects may have
> different standards. I also don't think that a small list of common
> terms will do the trick. Again, these terms may vary slightly from
> one project to another, and if we are going to sum up only a few
> general words, the result wouldn't be really useful.
>
> Solution:
>
> I was thinking that it would be nice to have a web-accessible
> database set up to tackle this problem. The "database" (or just a
> simple file) would initially be empty, but it would be available
> for modification through a CGI script. This service should be
> neutral, so that we wouldn't get duplicate attempts to solve this
> global problem. (E.g. hosting it at gnome.org wouldn't make it very
> neutral to KDE folks ;-).
>
> The interesting part is how the database should look and behave. I
> have given this part only a little attention as yet. There are,
> however, a few schemes one could follow, and I imagine that one of
> these schemes would be more or less ideal.
>
> The Economy Scheme:
> Simply feed the database a list of words and their translations,
> per language. This would be the scheme of preference if it turns
> out that my time, help and knowledge are really low.
>
> The Business Scheme:
> Same as above, but now with even more features! ;-), including:
>
> - an argument-based history of the translation. Example:
>
> "English: 'file', Dutch: 'bestand'
> Previous translation 'bestant' is wrong because of a misspelling
> Previous translation 'document' is inaccurate"
>
> - a project-specific translation. Example:
>
> "English: 'edit', Dutch:
> 'Bewerken' (KDE standard)
> 'Bewerk' (GNOME standard)"
>
> - per-project tips and guidelines. Example:
>
> "English: 'Are you sure you want to ...',
> KDE tip: doubting the user is not friendly. Please use
> 'Please confirm ...' instead."
>
> - per-language (and per-project?) tips. Example:
>
> "English: edit, Dutch: bewerk
> Dutch language tip (GNOME): always use infinitive[*]"
>
> - automatic parsing of your .po files??
> - automatic updating of a few registered .po files??
>
> So this is my plan for a "translation bazaar". As said, the idea is
> that it is empty at the start, and then maybe someone would dump a
> few GNOME and KDE .po files into this database, and the initial
> revision process can kick off. But the real idea is that folks
> supply their own strings they want to have translated, and the
> database would slowly get filled, while translations grow to be
> more accurate over time because of revisions.
>
> But actually I've no idea if this would become a success. I know
> that I myself have only a little time and few resources, so I'd be
> happy already if I only managed to get the Economy scheme. I have
> also never worked with .po files and stuff. I did do some CGI and
> Perl stuff recently, but then again I can't say that I have a good
> cgi-bin place to put this. It would be really cool if folks could
> just file their (not too specific) .po or similar files into the
> system, and the system automatically kept these files translated
> and up to date. But as said, I don't know anything about this .po
> stuff, so that really is beyond my potential. But if someone thinks
> "yeah, this is a really neat idea, and I can do it!", I would be
> delighted to form some kind of team, of course. It may also take
> some not-me expertise to support languages with different
> alphabets.
>
> So in fact, it will kind of depend on what you guys think of this
> idea. Can it succeed? Will it be popular? Will this system become a
> standard part of e.g. the rules for GNOME translation, if it works?
> Do you feel like working on it? Do you have a good CGI space?
>
> I must say, I don't know if this is a good idea, or if it is only
> a nice theory with no practical value. So I really look forward to
> any feedback.
>
> Greets,
>
> Stefan
>
> [*] I'm not sure if this is the correct term because it's been a
> while since I had to learn it. But the problem Dutch translators
> have to face is that in English, in "I edit", the word "edit" is
> the same as in "to edit" and "you edit", while in Dutch it is not.
> So when translating to Dutch, you need to know which one to choose.
>
>
Aoife Dunne
Program Manager
European Localisation Centre
Sun Microsystems Ireland Ltd
Hamilton House
East Point Business Park
Dublin 3
Ireland
Tel.: +353-1-8199-266
Fax: +353-1-8199-261
Email: aoife.dunne@Ireland.Sun.COM