open translations database

From: Stefan Rieken <StefanRieken SoftHome net>
To: whampton staffnet com, gnome-i18n gnome org
Subject: open translations database
Date: 12 Oct 2000 16:55:26 -0100
To the folks at openstandards.org and the gnome-i18n mailing list.

Hello,

I am sending this mail to share an idea that I came up with only
today. The idea is rough, unimplemented and untested. Nevertheless, I
hope that it is of interest to you. I sent it to the addresses
mentioned above simply because I didn't know a better place to start.
If you believe I shouldn't have sent it to you or your list, I
apologise. If you believe I have left someone out, feel free to forward
this. (But I must warn you in advance that this idea is too young for me
to know whether it will survive my busy schedule.)

Problem:

The current translation of open source software suffers from a lack of
manpower. This usually doesn't result in a lack of translations, but in
bad translations. Half of the time, machine translation engines such as
Babelfish are used. These engines often can't produce correct translations
of short strings because they lack context (e.g. the title of the
window I am writing this message in, translated directly back into
English, says "is composing a new message" instead of "Compose a new
message"). They also ignore the length of the translated string,
which can be important when it is used in a program. Human
translators make errors too, ranging from inconsistencies to
overlooked spelling pitfalls of the target language.

It would be helpful to have one or more sets of standard translations
for standard words and strings. Translators of software would benefit
from this, but also translators of larger documents that contain
standard words and strings (such as "radio button"; you would be surprised
how hard it is in some languages to come up with a good default
translation for it).

Context:

I am writing this with the GNOME project in mind, because I am familiar
with it. However, I want my solution to benefit free and open source
software in general.

There are a lot of standard strings in applications. Many GUI standards
define which ones you can use. Desktop projects such as GNOME often have
a set of these standard strings, and their translations, included. They
cannot, however, provide translations for less common strings.
Another problem arises when standard strings are part of bigger strings
(e.g. when "show toolbar" is standard, and a string like "show main
toolbar" is being used). Most open source projects don't really bother
documenting their use of standard strings, since the implementation
is supposed to be clear enough.
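Spotting a standard string inside a longer one could, in principle, be partly automated. Below is a minimal Python sketch; it assumes that "embedded" simply means the standard string's words occur in the same order, possibly with other words in between. The function name and the list of standard strings are hypothetical, not taken from GNOME or any GUI standard.

```python
# Hypothetical list of standard strings, for illustration only.
STANDARD_STRINGS = ["show toolbar", "open file", "save as"]


def find_embedded_standards(message: str, standards: list[str]) -> list[str]:
    """Return the standard strings whose words appear, in order, in message.

    Matching is case-insensitive and allows extra words in between,
    so "show toolbar" is found in "show main toolbar".
    """
    words = message.lower().split()
    found = []
    for standard in standards:
        remaining = iter(words)
        # 'w in remaining' consumes the iterator up to the match,
        # so each word must appear after the previous one.
        if all(w in remaining for w in standard.lower().split()):
            found.append(standard)
    return found


print(find_embedded_standards("Show main toolbar", STANDARD_STRINGS))  # prints ['show toolbar']
```

Such a word-order match is crude: it would also fire on unrelated sentences that happen to contain "show" before "toolbar", so a translator would still have to confirm each hit.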

In the past, I have done some minor translation work for ATO. This is an
international organisation of translators of Amiga software (the Amiga
Translation Organisation). They were pretty well organised (though, being an
Internet development newbie, it took me some time to get to know the
organisation). One of the best parts of the organisation (of the Dutch
division anyway), was a document that described the translation process,
and also contained a list of common Amiga terms and their translations.

Because I want my solution to be global, and not e.g. Amiga-specific, I
think it is not a good idea to provide a procedure for the translation
process. Different projects may have different standards. I also don't
think that a small list of common terms will do the trick. Again, these
terms may vary slightly from one project to another, and if we list
only a few general words, the result won't be very useful.

Solution:

I was thinking that it would be nice to set up a web-accessible database
to tackle this problem. The "database" (or just a simple
file) would initially be empty, but anyone could modify it
through a CGI script. This service should be neutral, so
that we don't end up with duplicate attempts to solve this global problem.
(E.g. hosting it at gnome.org wouldn't make it very neutral to KDE folks
;-).
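A minimal sketch of what the modification step could look like, written in Python rather than Perl and assuming the submission form sends three hypothetical fields: lang, source and translation. It only parses form-encoded data and updates an in-memory dictionary. In a real CGI script the query string of a GET request would come from the QUERY_STRING environment variable, and the dictionary would be written back to the file afterwards.

```python
from urllib.parse import parse_qs


def handle_submission(query_string: str, db: dict) -> str:
    """Store one translation from form-encoded data such as
    "lang=nl&source=file&translation=bestand".

    The field names are hypothetical; this is not an existing API.
    """
    fields = parse_qs(query_string)
    try:
        lang = fields["lang"][0]
        source = fields["source"][0]
        translation = fields["translation"][0]
    except KeyError as missing:
        # parse_qs drops empty values by default, so blank fields end up here too
        return f"error: missing field {missing}"
    # Key on (language, English string); a later submission overwrites an earlier one.
    db[(lang, source)] = translation
    return "ok"


db = {}
print(handle_submission("lang=nl&source=file&translation=bestand", db))  # prints ok
print(db)  # prints {('nl', 'file'): 'bestand'}
```

Overwriting on every submission is the simplest possible policy; the revision history of the Business scheme would instead keep the replaced value together with a reason.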

The interesting part is how the database should look and behave. I
have given this part only a little attention so far. There are, however, a
few schemes one could follow, and I imagine that one of these schemes
would be more or less ideal.

The Economy Scheme:
Simply feed the database a list of words and their translations, per
language. This would be the scheme of choice if it turns out that my
time, help and knowledge are really limited.
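Since the "database" could just be a simple file, the Economy scheme might be sketched in Python as follows, assuming a tab-separated file with one entry per line: language code, English string, translation. The file format and the function names are assumptions for illustration only.

```python
import csv
import io
from typing import Optional


def load_economy(text: str) -> dict[str, dict[str, str]]:
    """Parse tab-separated lines of: language, English string, translation."""
    table: dict[str, dict[str, str]] = {}
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row:
            continue  # skip blank lines
        lang, english, translation = row
        # One dictionary per language, keyed by the English string.
        table.setdefault(lang, {})[english] = translation
    return table


def lookup(table: dict[str, dict[str, str]], lang: str, english: str) -> Optional[str]:
    """Return the stored translation, or None if there is none yet."""
    return table.get(lang, {}).get(english)


sample = "nl\tfile\tbestand\nnl\tedit\tbewerk\n"
table = load_economy(sample)
print(lookup(table, "nl", "file"))  # prints bestand
print(lookup(table, "de", "file"))  # prints None
```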

The Business Scheme:
Same as above, but now with even more features! ;-), including:

- an argument-based history of the translation. Example:
 
  "English: 'file', Dutch: 'bestand'
   Previous translation 'bestant' is wrong because of a misspelling
   Previous translation 'document' is inaccurate"

- a project-specific translation. Example:

   "English: 'edit', Dutch:
  'Bewerken' (KDE standard)
  'Bewerk' (GNOME standard)"

- per-project tips and guidelines. Example:

  "English: 'Are you sure you want to ...',
   KDE tip: doubting the user is not friendly. Please use 'Please
   confirm ...' instead."

- per-language (and per-project?) tips. Example:

  "English: edit, Dutch: bewerk
  Dutch language tip (GNOME): always use infinitive[*]"

- automatic parsing of your .po files??
- automatic updating of a few registered .po files??
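One possible shape for a Business-scheme entry, sketched as Python dataclasses. It assumes one record per English string and target language, holding a project-neutral translation, per-project overrides, tips, and a history of replaced translations together with the argument for each replacement. All class and field names are hypothetical, and the automatic .po features are not covered.

```python
from dataclasses import dataclass, field


@dataclass
class Revision:
    text: str    # a translation that was replaced
    reason: str  # the argument for replacing it


@dataclass
class Entry:
    source: str    # English string, e.g. "file"
    language: str  # target language code, e.g. "nl"
    current: str   # currently accepted, project-neutral translation
    per_project: dict[str, str] = field(default_factory=dict)  # e.g. {"GNOME": "Bewerk"}
    tips: list[str] = field(default_factory=list)
    history: list[Revision] = field(default_factory=list)

    def revise(self, new_text: str, reason: str) -> None:
        """Accept a new translation and record why the old one was dropped."""
        self.history.append(Revision(self.current, reason))
        self.current = new_text

    def for_project(self, project: str) -> str:
        """Use the project's own standard if it has one, else the neutral translation."""
        return self.per_project.get(project, self.current)


# Replaying the 'file' example from the list above.
entry = Entry("file", "nl", "document")
entry.revise("bestant", "inaccurate")
entry.revise("bestand", "misspelling")
for old in entry.history:
    print(f"Previous translation '{old.text}' replaced: {old.reason}")
```

Keeping the neutral translation as the fallback means a project only needs an override where its own standard actually differs.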

So this is my plan for a "translation bazaar". As said, the idea is that
it is empty at start, and then maybe someone would dump a few GNOME and
KDE .po files into this database, and the initial revision process can
kick off. But the real idea is that folks supply the strings they
want translated, and the database would slowly fill up, while
translations become more accurate over time through revisions.
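Dumping .po files into the database would start with reading their msgid/msgstr pairs. The Python sketch below handles only the simplest case: single-form entries whose escapes (\n, \", \\) are also valid in Python string literals. It drops context (msgctxt), plural forms and untranslated strings; a real importer would more likely rely on an existing .po library such as polib. The function name and the sample file are made up.

```python
import ast


def parse_po(text: str) -> dict[str, str]:
    """Collect {msgid: msgstr} from the simple entries of a .po file.

    Assumptions: no plural forms, msgctxt is ignored, comments (#...) are
    skipped, and untranslated entries (empty msgstr) are left out.
    """
    entries: dict[str, str] = {}
    current = {"msgid": None, "msgstr": None}
    key = None  # which field continuation lines belong to

    def flush() -> None:
        msgid, msgstr = current["msgid"], current["msgstr"]
        # The header entry has an empty msgid, so it is skipped as well.
        if msgid and msgstr:
            entries[msgid] = msgstr
        current["msgid"] = current["msgstr"] = None

    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("msgid "):
            flush()  # a new msgid starts a new entry
            key = "msgid"
            current[key] = ast.literal_eval(line[len("msgid "):])
        elif line.startswith("msgstr "):
            key = "msgstr"
            current[key] = ast.literal_eval(line[len("msgstr "):])
        elif line.startswith('"'):
            if key is not None:
                # Continuation line: PO splits long strings over several quoted lines.
                current[key] += ast.literal_eval(line)
        else:
            key = None  # msgctxt, msgid_plural, msgstr[n]: not handled here
    flush()
    return entries


sample = r'''# Dutch translation (made-up sample)
msgid ""
msgstr ""
"Content-Type: text/plain; charset=UTF-8\n"

msgid "File"
msgstr "Bestand"

#: hypothetical/menu.c:42
msgid "Compose a new "
"message"
msgstr "Nieuw bericht opstellen"

msgid "Untranslated"
msgstr ""
'''
print(parse_po(sample))  # prints {'File': 'Bestand', 'Compose a new message': 'Nieuw bericht opstellen'}
```

Merging the result into the shared table is where the revision process would come in: whenever an imported msgstr differs from the stored translation, someone has to argue which one is better.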

But honestly, I have no idea whether this would be a success. I know that I
myself have little time and few resources, so I'd already be happy if I
managed to get the Economy scheme working. I have also never worked with .po
files and the like. I did do some CGI and Perl work recently, but then
again I don't have a good cgi-bin place to host this. It would
be really cool if folks could just submit their (not too specific) .po or
similar files to the system, and the system automatically kept
these files translated and up to date. But as I said, I don't know
anything about .po files, so that really is beyond my abilities. But
if someone thinks "yeah, this is a really neat idea, and I can do it!",
I would of course be delighted to form some kind of team. It may also
take some expertise beyond mine to support languages with different
alphabets.

So in fact, it will kind of depend on what you guys think of this idea.
Can it succeed? Will it be popular? Will this system become a standard
part of e.g. the rules for GNOME translation, if it works? Do you feel
like working on it? Do you have a good CGI space?

I must say, I don't know if this is a good idea, or if it is only a nice
theory with no practical value. So I really look forward to any
feedback.

Greets,

Stefan

[*] I'm not sure if this is the correct term because it's been a while
since I had to learn it. But the problem Dutch translators have to face
is that in English, in "I edit", the word "edit" is the same as in "to
edit" and "you edit", while in Dutch it is not. So when translating to
Dutch, you need to know which one to choose.
Follow-Ups:
- Re: open translations database
  - From: Jeff Waugh