Proposal for declinations in gettext

From: Danilo Segan <dsegan gmx net>
To: translation iro umontreal ca
Cc: linux-utf8 nl linux org, translation-i18n lists sourceforge net, GNOME I18N List <gnome-i18n gnome org>
Subject: Proposal for declinations in gettext
Date: Fri, 13 Jun 2003 22:14:25 +0200

Hi,
first, sorry for cross-posting (some of you will receive multiple 
messages :-().

I'd like to propose a simple gettext extension which would work at least 
for Serbian, but I hope it would work for many other languages.

*Background:*
Serbian has 7 declension forms (cases) for nouns, pronouns, and 
similar words, and in recent discussions on the gnome-i18n list I found out 
that Finnish has 15. This becomes a major problem when translating 
"composed" strings such as "move %s", where "%s" might be any of "queen", 
"king", and so on.

The usual scenario is this (Serbian latin transliteration used for 
examples):
msgid "queen"
msgstr "kraljica"

msgid "king"
msgstr "kralj"

msgid "move %s"
msgstr "premesti %s"

msgid "go with %s"
msgstr "idi sa %s"

It's unfortunate (or is it?) that we'll get the form of "premesti 
kraljica" which is incorrect (it ought to be "premesti kraljicu"), or 
"idi sa kralj" instead of "idi sa kraljem".
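
To make the failure concrete, here is a minimal sketch of the composition a program effectively performs today. The dictionary and the helper names (CATALOG, translate, compose) are hypothetical stand-ins for a compiled catalog and the gettext lookup, not real gettext API:

```python
# Hypothetical stand-in for a compiled catalog: one msgstr per msgid,
# so every noun is only available in its base (nominative) form.
CATALOG = {
    "queen": "kraljica",
    "king": "kralj",
    "move %s": "premesti %s",
    "go with %s": "idi sa %s",
}


def translate(msgid):
    """Return the translation, or the msgid itself if none exists."""
    return CATALOG.get(msgid, msgid)


def compose(template_id, noun_id):
    """Translate both parts separately, then substitute, as programs do."""
    return translate(template_id) % translate(noun_id)


print(compose("move %s", "queen"))    # premesti kraljica
print(compose("go with %s", "king"))  # idi sa kralj
```

Both results use the base form of the noun, because nothing in the lookup knows which form the surrounding sentence requires.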

The solution is simple, and I guess that it will work for at least all 
Slavic languages, and probably for many more.

*Solution:*
# in the header, 7 is a sample for Serbian
"PO-Number-of-noun-forms: 7\n"

msgid "queen"
msgstr<0> "kraljica"
msgstr<3> "kraljicu"
msgstr<5> "kraljicom"

msgid "king"
msgstr<0> "kralj"
msgstr<3> "kralja"
msgstr<5> "kraljem"

msgid "move %s"
msgstr "premesti %<3>s"

msgid "go with %s"
msgstr "idi sa %<5>s"

Here <i>, where i = 0 .. (PO-Number-of-noun-forms)-1, is the index of the 
required form, which depends on the sentence construction: it is determined 
by the verb, or perhaps by words like "with", "whom", ... Any msgstr<i> 
can be omitted if it is known not to be used in composition 
(most are highly unlikely ever to be used in translations, such as the 
"vocative" form in "hey %s").
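
As a rough illustration only, the lookup described above might behave like the following sketch. Everything here is an assumption made for the example: the in-memory tables, the names noun_form and compose, the choice to hand the noun's msgid to the formatter, and the fallback to form 0 when a requested form was omitted. None of it is existing gettext behaviour.

```python
import re

# Hypothetical catalog: noun msgid -> {form index: translation}.
NOUN_FORMS = {
    "queen": {0: "kraljica", 3: "kraljicu", 5: "kraljicom"},
    "king": {0: "kralj", 3: "kralja", 5: "kraljem"},
}

# Hypothetical catalog of sentence templates using the proposed %<i>s.
TEMPLATES = {
    "move %s": "premesti %<3>s",
    "go with %s": "idi sa %<5>s",
}

# Matches plain "%s" as well as the proposed "%<i>s".
PLACEHOLDER = re.compile(r"%(?:<(\d+)>)?s")


def noun_form(msgid, index):
    """Return form `index` of a noun."""
    forms = NOUN_FORMS.get(msgid)
    if forms is None:
        return msgid  # untranslated noun: keep the original text
    # Assumption: an omitted form falls back to form 0.
    return forms.get(index, forms.get(0, msgid))


def compose(template_id, noun_id):
    """Fill each placeholder with the noun form that it asks for."""
    template = TEMPLATES.get(template_id, template_id)

    def fill(match):
        index = int(match.group(1)) if match.group(1) else 0
        return noun_form(noun_id, index)

    return PLACEHOLDER.sub(fill, template)


print(compose("move %s", "queen"))    # premesti kraljicu
print(compose("go with %s", "king"))  # idi sa kraljem
```

The sketch sidesteps one open question: a real program passes an already translated string to printf, so the library would need some way to recover which msgid that string came from.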


The good side of this approach (the syntactic elements are arbitrary, 
so please don't comment on those) is that programs that use gettext for 
l10n would need no change: everything would be done on the gettext library 
side and by translators (in that respect it's even better than plural 
forms). Of course, care should be taken to also allow combining these 
with plural forms, as in:
msgid "king"
msgid_plural "kings"
msgstr[0]<0> "kralj"
msgstr[0]<5> "kraljem"
msgstr[2]<0> "kraljevi"
msgstr[2]<5> "kraljevima"
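
A combined lookup could be sketched along these lines. The table layout and the names plural_index and lookup are hypothetical, and the fallback order (same plural slot in form 0, then the untranslated English text) is an assumption of the sketch. The plural rule is the usual three-form rule used for Serbian in Plural-Forms headers.

```python
# Hypothetical table keyed by (msgid, plural index, noun form index),
# holding only the entries from the example above.
FORMS = {
    ("king", 0, 0): "kralj",
    ("king", 0, 5): "kraljem",
    ("king", 2, 0): "kraljevi",
    ("king", 2, 5): "kraljevima",
}


def plural_index(n):
    """Usual three-form plural rule used for Serbian."""
    if n % 10 == 1 and n % 100 != 11:
        return 0
    if 2 <= n % 10 <= 4 and not 10 <= n % 100 < 20:
        return 1
    return 2


def lookup(msgid, msgid_plural, n, form):
    """Pick the plural slot from n first, then the requested noun form."""
    p = plural_index(n)
    for key in ((msgid, p, form), (msgid, p, 0)):
        if key in FORMS:
            return FORMS[key]
    # Assumption: with no usable entry, return the English text.
    return msgid if n == 1 else msgid_plural


print(lookup("king", "kings", 1, 5))  # kraljem
print(lookup("king", "kings", 5, 5))  # kraljevima
```

Selecting the plural slot first and the noun form second mirrors the msgstr[n]<i> ordering in the example; the opposite nesting would work just as well.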


Before diving into gettext code, it'd be nice to hear if this kind of 
approach would work for any language other than Serbian (I repeat, I 
find it likely to work for Slavic languages, and German, those being the 
languages I'm at least a bit familiar with).

In any case, looking forward to hearing from all of you.


Again, sorry for crossposting, but I just wanted to reach the widest 
possible audience, so as to get some *real* insight into the problem.

Cheers,
Danilo

Follow-Ups:
- Re: Proposal for declinations in gettext
  - From: Miloslav Trmac
- Re: Proposal for declinations in gettext
  - From: Edward H Trager
