Re: database format et al.



Fatih Demir <kabalak@gmx.net> writes:

> As we're trying to get a message database format based
> on xml for gtranslator (and, I hope, also for KBabel),
> it's time to ask you who'll have to fill this.

Lately, we've had this discussion on different lists (I corrected 1
typo: "format_disk_de;" -> "&format_disk_de;"):

From: Karl Eichwalder <ke@gnu.franken.de>
Subject: Re: PO file format (Re: parted pt_BR potfile)
To: team-leaders@IRO.UMontreal.CA
Date: 05 Jun 2000 05:41:41 +0200
Reply-To: team-leaders@IRO.UMontreal.CA

Ivo Timmermans <zarq@icicle.yi.org> writes:

> Using XML is a nice idea.  It parses easily, but perhaps it is too
> heavy for use in PO files.  It would be nice to see where it ends
> though.

Yes.  Whatever comes of it, I'll try to write a DTD for PO files.

> > <!DOCTYPE collection [
> 
> The DTD should be in a different (system-wide) file.

Of course -- I put it in the document for easy testing.

> > <!ELEMENT collection (head, message+)>
> > <!ELEMENT head       (project, version, translator)>
> 
> <!ELEMENT head (project, version, translator, team)>

Yes, and the ordinary elements: potdate (= POT-Creation-Date), podate
(= PO-Revision-Date).  And the MIME info?  All of this info could go
into attributes.

> > <!ELEMENT message    (comment?, msgid, msgstr)>
> 
> <!ELEMENT message (comment*, option*, msgid, msgstr)>

comment* -- you're right.  I'd like to replace option with status and
make it an attribute.

<!ELEMENT collection (head, message+)>
<!ELEMENT head       (project, version, potdate, podate, translator, team)>
<!ELEMENT project    (#PCDATA)>
<!ELEMENT version    (#PCDATA)>
<!ELEMENT potdate    (#PCDATA)>
<!ELEMENT podate     (#PCDATA)>
<!ELEMENT translator (#PCDATA)>
<!ELEMENT team       (#PCDATA)>
<!ELEMENT message    (comment*, reference?, msgid, msgstr)>
<!ATTLIST message
          status     (valid|draft|fuzzy|obsolete) "valid"
          type       (text|html|xml|c|ycp)            "text">
<!ELEMENT comment    (#PCDATA)>
<!ATTLIST comment
          date       CDATA              #REQUIRED
          type       (code|prog|trans)  "trans"
          who        CDATA              #IMPLIED>
<!ELEMENT reference  EMPTY>
<!ATTLIST reference
          file       CDATA              #REQUIRED
          line       CDATA              #REQUIRED>
<!ELEMENT msgid      (#PCDATA)>
<!ELEMENT msgstr     (#PCDATA)>
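A minimal sketch of how a tool might read a document shaped like the
DTD above, using Python's standard xml.etree.ElementTree.  The sample
values (project name, dates, file names) are made up for illustration,
and read_messages is a hypothetical helper.  ElementTree does not
validate against a DTD, so the defaults for status and type are applied
by hand here:

```python
import xml.etree.ElementTree as ET

# Made-up sample data that follows the element layout of the DTD above.
SAMPLE = """\
<collection>
 <head>
  <project>example</project>
  <version>0.1</version>
  <potdate>2000-06-01</potdate>
  <podate>2000-06-05</podate>
  <translator>A. Translator</translator>
  <team>de</team>
 </head>
 <message status="fuzzy">
  <comment date="2000-06-05" type="trans">check wording</comment>
  <reference file="disk.c" line="42"/>
  <msgid>
Format disk
</msgid>
  <msgstr>
Festplatte formatieren
</msgstr>
 </message>
 <message>
  <msgid>Quit</msgid>
  <msgstr>Beenden</msgstr>
 </message>
</collection>
"""


def read_messages(xml_text):
    """Return one dict per <message>, applying the DTD defaults by hand."""
    root = ET.fromstring(xml_text)
    records = []
    for message in root.iter("message"):
        records.append({
            # Defaults taken from the ATTLIST declarations above.
            "status": message.get("status", "valid"),
            "type": message.get("type", "text"),
            "comments": [c.text or "" for c in message.findall("comment")],
            "references": [(r.get("file"), int(r.get("line")))
                           for r in message.findall("reference")],
            "msgid": (message.findtext("msgid") or "").strip(),
            "msgstr": (message.findtext("msgstr") or "").strip(),
        })
    return records


records = read_messages(SAMPLE)
```

A validating parser would enforce the content model and supply the
attribute defaults itself; this sketch only shows the shape of the data.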

> Can Emacs hide the XML tags, so that you see something that looks like
> the old PO format?

Yes, no problem.  The overhead comes into play once we try to define
the workflow:

    sources -> POT file (XML) -> PO file (XML) -> MO file (old format?)

msgmerge has to be rewritten as an XML application.
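
To give an idea of what such an XML application might look like, here
is a rough sketch of the merge step only.  It is not how msgmerge is
implemented: the function name merge, the choice to mark untranslated
messages as "draft", and the omission of the <head> element and of any
fuzzy matching are all simplifying assumptions.

```python
import xml.etree.ElementTree as ET


def _text(message, tag):
    """Text of a child element with surrounding whitespace removed."""
    return (message.findtext(tag) or "").strip()


def merge(pot_root, po_root):
    """Build a new <collection> from the POT, reusing old translations."""
    old = {_text(m, "msgid"): m for m in po_root.iter("message")}
    merged = ET.Element("collection")
    seen = set()

    # Every message of the new template is kept, in template order.
    for template in pot_root.iter("message"):
        msgid = _text(template, "msgid")
        seen.add(msgid)
        previous = old.get(msgid)
        message = ET.SubElement(merged, "message")
        ET.SubElement(message, "msgid").text = msgid
        msgstr = ET.SubElement(message, "msgstr")
        if previous is not None:
            msgstr.text = _text(previous, "msgstr")
            message.set("status", previous.get("status", "valid"))
        else:
            msgstr.text = ""
            message.set("status", "draft")  # assumption: untranslated = draft

    # Translations that no longer occur in the template become obsolete.
    for msgid, previous in old.items():
        if msgid not in seen:
            message = ET.SubElement(merged, "message", status="obsolete")
            ET.SubElement(message, "msgid").text = msgid
            ET.SubElement(message, "msgstr").text = _text(previous, "msgstr")
    return merged
```

Calling merge() with the parsed roots of a POT and an old PO document
returns a new tree that can be written out with ET.tostring().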

Using XML we'd gain macro expansion for free:

<!DOCTYPE collection [
<!ENTITY format_disk_de "Festplatte formatieren">
]>
...
 <!-- button label -->
 <message>
  <msgid>
Format disk
</msgid>
  <msgstr>
&format_disk_de;
</msgstr>
 </message>

 <!-- help text which refers to the button label -->
 <message>
  <msgid>
Press "Format disk" to create a file system...
</msgid>
  <msgstr>
Drücken Sie »&format_disk_de;«, um ein Dateisystem anzulegen...
</msgstr>
 </message>
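
The expansion can be tried with any XML parser that reads the internal
DTD subset.  A small sketch using Python's standard
xml.etree.ElementTree (the wrapper <collection> without a <head> and the
helper name translations are assumptions made for this example):

```python
import xml.etree.ElementTree as ET

# The example from above, wrapped in a <collection> so it is well-formed.
DOCUMENT = """<!DOCTYPE collection [
<!ENTITY format_disk_de "Festplatte formatieren">
]>
<collection>
 <message>
  <msgid>
Format disk
</msgid>
  <msgstr>
&format_disk_de;
</msgstr>
 </message>
 <message>
  <msgid>
Press "Format disk" to create a file system...
</msgid>
  <msgstr>
Drücken Sie »&format_disk_de;«, um ein Dateisystem anzulegen...
</msgstr>
 </message>
</collection>
"""


def translations(xml_text):
    """Map each msgid to its msgstr; the parser has already expanded entities."""
    root = ET.fromstring(xml_text)
    return {
        message.findtext("msgid").strip(): message.findtext("msgstr").strip()
        for message in root.iter("message")
    }


expanded = translations(DOCUMENT)
```

Both msgstr values come back with the entity replaced by its text, so
the button label only has to be translated in one place.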

And one day, programmers will learn to use XML in their source files,
too.

> If we are going to redo everything, perhaps we can finally find a way
> to make the ordering of arguments irrelevant?  I'm sure this will
> require a great deal of rewriting things.  At least printf et al.
> won't work directly with the output of gettext().

IIRC, a special printf() exists that allows numbering the arguments; a
special notation also exists to make msgfmt happy (gettext manual,
"Special Comments preceding Keywords"):

    printf (gettext ("String `%s' has %d characters\n"), s, strlen (s));

    "%2$d Zeichen lang ist die Zeichenkette `%1$s'"
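
To illustrate what the numbered notation buys the translator, here is a
small Python emulation.  It is not the C printf() itself: the helper
format_positional is hypothetical and only understands %N$s and %N$d.

```python
import re

# Matches numbered conversions such as %1$s or %2$d (only s and d here).
_POSITIONAL = re.compile(r"%(\d+)\$([sd])")


def format_positional(template, *args):
    """Fill %N$s / %N$d conversions with the N-th argument (1-based)."""
    def substitute(match):
        index = int(match.group(1)) - 1
        conversion = match.group(2)
        return ("%" + conversion) % (args[index],)
    return _POSITIONAL.sub(substitute, template)


s = "abc"
# The program always passes the arguments in the same order ...
original = "String `%s' has %d characters" % (s, len(s))
# ... and the translation is free to use them in a different order.
translated = format_positional(
    "%2$d Zeichen lang ist die Zeichenkette `%1$s'", s, len(s))
```

The calling code stays untouched; only the translated format string
decides in which order the arguments appear.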

-=-=-=-=-=-=-=-=-=-=-=-=-=- cut here -=-=-=-=-=-=-=-=-=-=-=-=-=-

From: Karl Eichwalder <ke@gnu.franken.de>
Subject: Re: new PO file format
To: po-utils-forum@IRO.UMontreal.CA
Date: 08 Jun 2000 07:58:29 +0200
Reply-To: po-utils-forum@IRO.UMontreal.CA

Ivo Timmermans <zarq@icicle.yi.org> writes:

> 2)  The XML way:
> 
> <msgid>blah
> </msgid>

Of course, this is the right way.  At least in SGML, both of these
examples are treated as equal:

<msgid>blah
</msgid>

<msgid>
blah
</msgid>

For readability (without coloring) I prefer the second method.  If
whitespace is to be preserved, an attribute should say so:

<msgid xml:space="preserve">
blah
</msgid>

(Terminology from http://www.w3.org/TR/1999/REC-xslt-19991116.)
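
An XML parser hands the line breaks inside an element to the
application, so a tool that wants both spellings to compare equal has
to normalize them itself.  A minimal sketch with Python's standard
xml.etree.ElementTree; the helper name message_text and the rule
"strip unless xml:space says preserve" are assumptions:

```python
import xml.etree.ElementTree as ET

# ElementTree reports xml:space under the reserved XML namespace.
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"


def message_text(element):
    """Return the element text, stripped unless xml:space="preserve"."""
    text = element.text or ""
    if element.get(XML_SPACE) == "preserve":
        return text
    return text.strip()


first = ET.fromstring("<msgid>blah\n</msgid>")
second = ET.fromstring("<msgid>\nblah\n</msgid>")
kept = ET.fromstring('<msgid xml:space="preserve">\nblah\n</msgid>')
```

With this rule the first two forms yield the same string, while the
third keeps its leading and trailing line breaks.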

> Hm, I've never been a real fan of verbose languages, so I hope this
> will last at least until I found something better to do with my life
> :)

Using a proper entity set will reduce the verbosity in the end; for a
start it was nice to have all texts in the source files, but in the
long run '--help' and real help text have to be separated.  But we can
postpone this issue for now.

> 
> >     printf (gettext ("String `%s' has %d characters\n"), s, strlen (s));
> > 
> >     "%2$d Zeichen lang ist die Zeichenkette `%1$s'"
> 
> cool.  How portable is this?  Does some standard require this to be
> implemented?  i.e.: can we safely use this feature for all platforms
> that gettext supports?

SINAP -- sorry, I'm not a programmer ;-)

-=-=-=-=-=-=-=-=-=-=-=-=-=- cut here -=-=-=-=-=-=-=-=-=-=-=-=-=-

From: Karl Eichwalder <ke@gnu.franken.de>
Subject: Re: new PO file format
To: po-utils-forum@IRO.UMontreal.CA
Date: 09 Jun 2000 06:49:03 +0200
Reply-To: po-utils-forum@IRO.UMontreal.CA

Ivo Timmermans <zarq@icicle.yi.org> writes:

> Karl Eichwalder wrote:
> > <!ELEMENT message    (comment*, reference?, msgid, msgstr)>
> 
> Wouldn't `reference*' be cleaner?

Yes, or better `reference+', which is what I had in mind ("at least
one"); I was confused -- I should write DTDs more often ;)

-- 
work : ke@suse.de                          |                   ,__o
     : http://www.suse.de/~ke/             |                 _-\_<,
home : keichwa@gmx.net                     |                (*)/'(*)



