Re: Translation tools for documentation



Hi there Danilo,

On Mon, 2004-03-29 at 14:08, Danilo Segan wrote:
> > (which I'll be at, if anyone wants to assault me in person :-)
> 
> Nice invitation, thank you very much :)

No worries - I'm pretty easy to beat up!

> A concrete question about this.  While it sounds nice to translate
> per-sentence (promoting better reuse of translations), I wonder what
> happens with things like following DocBook snippets?

<snip snippet>

We get "You can do:", "First thing to do", "Second thing to do", "Other
thing to do", "Any of these things will achieve something" as the
'sentences'.

In general, we use the structure of the input format, where possible, to
define the blocks of translatable text: a paragraph containing inline
<emphasis> tags, for example, makes up one contiguous block of text.
The DocBook spec pretty much defines what's a block-level tag and what's
inline (it's on page 33 of my copy of the 1st O'Reilly duck book).

Then we run a sentence segmentation algorithm on that block, resulting
in segments. That's oversimplifying a little, but you get the general
idea.
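
A minimal sketch of that two-step idea (blocks first, then sentences),
in Python. It is an illustration only, not the code of the tool
discussed here: the INLINE_TAGS set, the function names and the sample
input are all made up for the example, and the regex splitter is far
cruder than a real segmenter (it trips over abbreviations, for one).

```python
import re
import xml.etree.ElementTree as ET

# Assumption: a tiny, hand-picked set of inline DocBook tags.
# The real block/inline split comes from the DocBook spec.
INLINE_TAGS = {"emphasis", "literal", "command", "filename"}


def _flush(buf, out):
    """Store the buffered text as one block, normalising whitespace."""
    text = " ".join(buf.split())
    if text:
        out.append(text)


def blocks(elem, out=None):
    """Collect translatable blocks under elem, in document order."""
    out = [] if out is None else out
    buf = elem.text or ""
    for child in elem:
        if child.tag in INLINE_TAGS:
            # Inline markup stays inside the block; tostring() also
            # emits the text that follows the closing tag (the tail).
            buf += ET.tostring(child, encoding="unicode")
        else:
            # A structural child ends the current block.
            _flush(buf, out)
            blocks(child, out)
            buf = child.tail or ""
    _flush(buf, out)
    return out


def sentences(block):
    """Naive splitter: break after . ! or ? when a capital follows."""
    return re.split(r"(?<=[.!?])\s+(?=[A-Z])", block)


def segment(xml_text):
    """Return the sentence-level segments of a DocBook fragment."""
    root = ET.fromstring(xml_text)
    return [s for b in blocks(root) for s in sentences(b)]


# Made-up input, not the snippet from the original mail.
doc = """<para>You can do:
<itemizedlist>
<listitem><para>First thing to do</para></listitem>
<listitem><para>Second thing to do</para></listitem>
</itemizedlist>
Any of these things will <emphasis>achieve</emphasis> something.</para>"""

for seg in segment(doc):
    print(seg)
```

The list items and the text around the list come out as separate
segments, while the inline <emphasis> stays inside its sentence.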

> Indeed, XLIFF is probably much better container format for
> localisation.  The problem with it is the lack of free tools (and one
> excellent editor which might become available doesn't solve that
> issue).  I blame that on the complexity of XLIFF and the overwhelming
> number of features.

I agree - that's the main thing getting in its way. Many of the
companies involved in XLIFF have their main business in developing
translation tools, so it's understandable that there hasn't been much in
the way of progress on free tools (not that I like that situation very
much :-( )

Regarding features, you don't have to use all of them - the spec is
pretty daunting when you first look at it, but when you see a "minimal"
XLIFF file, it's not too bad.
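
To give a feel for how small such a file can be, here is a hedged
Python sketch that writes one. The element and attribute names follow
XLIFF 1.x as best I understand it (check them against the spec before
relying on this); the namespace declaration is left out for brevity,
and the function name and the "example.xml" file name are invented for
the example.

```python
import xml.etree.ElementTree as ET


def minimal_xliff(segments, original="example.xml", source_lang="en"):
    """Build a bare-bones XLIFF document with one trans-unit per segment."""
    xliff = ET.Element("xliff", version="1.1")
    file_el = ET.SubElement(
        xliff,
        "file",
        {"original": original, "source-language": source_lang, "datatype": "xml"},
    )
    body = ET.SubElement(file_el, "body")
    for i, text in enumerate(segments, start=1):
        unit = ET.SubElement(body, "trans-unit", id=str(i))
        ET.SubElement(unit, "source").text = text
        # A translator (or tool) would add a <target> element here.
    return ET.tostring(xliff, encoding="unicode")


print(minimal_xliff(["You can do:", "First thing to do"]))
```

Everything else in the spec (alternative translations, notes, phases
and so on) is optional on top of this skeleton.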

> Translators working on free software translations rarely have the
> time to dedicate to learning all about PO files themselves, and PO
> files are very simple compared to XLIFF.

Yes, that's where .po has the advantage - I agree with you.

> Of course, the tool you're talking about releasing as free software
> would make knowing XLIFF unnecessary for translators, but we'd need
> to re-educate a bunch of those supporting PO files currently :)

Yeah, I know: as I say, it should be possible to write converters to
and from the .po file format, though there may be data loss in at least
one direction. The bit the spec doesn't mention, though, is how to store
the formatting portions of the file (everything that isn't
translatable), so we've come up with our own way of doing that. If
converting to .po files, we'd need to make sure that the file containing
the formatting is carried around with the .po file, so that we can
back-convert to the original file format when the translations are
complete.
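
Why the formatting has to travel with the .po file can be shown with a
small Python sketch. This is a guess at the general shape only, not the
storage scheme mentioned above: the %%N%% placeholder syntax and all
function names are hypothetical, and a real converter would need to
cope with repeated or overlapping segments.

```python
import re


def po_escape(s):
    """Escape a string for use inside a PO msgid/msgstr."""
    return s.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def to_po(segments):
    """Emit one untranslated PO entry per segment."""
    entries = []
    for i, seg in enumerate(segments, start=1):
        entries.append(f'#. segment {i}\nmsgid "{po_escape(seg)}"\nmsgstr ""\n')
    return "\n".join(entries)


def make_skeleton(document, segments):
    """Replace each segment in the document with a numbered placeholder."""
    skeleton = document
    for i, seg in enumerate(segments, start=1):
        skeleton = skeleton.replace(seg, f"%%{i}%%", 1)
    return skeleton


def rebuild(skeleton, translations):
    """Put translations (keyed by segment number) back into the skeleton."""
    return re.sub(r"%%(\d+)%%", lambda m: translations[int(m.group(1))], skeleton)


document = "<para>Hello world.</para><para>Goodbye.</para>"
segments = ["Hello world.", "Goodbye."]

skeleton = make_skeleton(document, segments)
print(to_po(segments))
print(rebuild(skeleton, {1: "Bonjour le monde.", 2: "Au revoir."}))
```

Without the skeleton, the .po file alone holds only the text, so there
is nothing to rebuild the original document from.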

> I still believe PO files are the way to go _now_, because converting
> them into XLIFF should be simple once the tools and infrastructure are
> ready.

Absolutely. Here's my main worry (and maybe it's not such a big deal,
hard to tell): if you start with translation tools that work at the
paragraph level, you're kinda stuck there. That is, you've got .po files
with one paragraph per message from one release. If you change to
sentence-level segmentation in the next release, you're not going to get
the same level of reuse from those old .po files with paragraph-level
segments without doing some pretty hairy alignment processing between
them (that is, finding which sentences in each source paragraph
correspond to which sentences in each translated paragraph, in order to
build a useful TM).
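
To make the alignment problem concrete, here is a deliberately naive
Python sketch: it pairs sentences one-to-one only when both paragraphs
split into the same number of sentences, and otherwise falls back to
the whole paragraph. The function names are made up, and this simple
count-matching rule stands in for proper alignment, which also has to
handle sentences that translators merge, split or reorder.

```python
import re


def split_sentences(text):
    """Very rough sentence splitter: break after . ! or ?"""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def align(source_para, target_para):
    """Return (source, target) pairs usable as TM entries."""
    src = split_sentences(source_para)
    tgt = split_sentences(target_para)
    if len(src) == len(tgt):
        # Assume the translator kept sentence order and count.
        return list(zip(src, tgt))
    # Counts differ: keep the paragraph as one coarse unit.
    return [(source_para.strip(), target_para.strip())]


print(align("Open the file. Save it.", "Ouvrez le fichier. Enregistrez-le."))
print(align("Open the file. Save it.", "Ouvrez le fichier et enregistrez-le."))
```

The second call shows the awkward case: the translator merged two
sentences into one, so no sentence-level pairs can be recovered
without something smarter.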

Sort of a chicken & egg thing.

> > That said, you're free to use which ever tools work for you - I'm only
> > trying to help :-)
> 
> I, as well, hope many people will make use of whatever suits them
> best.  Since it's easy to establish a map from PO-to-XLIFF (the
> latter is sort of a superset-in-features of PO files, right?), there
> should be no problems in later migrating to XLIFF, provided that is
> considered the best option at some point in the future.

Yep, dead right, with the one caveat about segmentation level as
mentioned above.

> The one big advantage of PO files currently is that translators know
> how to work with them, and need not learn anything.  And yeah, there
> are many *easily-available* free-software tools that work with them.

Absolutely - I agree. For now, I think you're doing the right thing with
.po files: you just need to try not to get locked into one way of doing
things, and to keep in mind where you'd ideally like to end up.

> It looks very nice (or rather, the features exposed in this
> screenshot look very nice, like highlighting changes between
> sentences -- I'm not that fond of Java GUI [at least that's how it
> looks to me, I may be wrong, I don't use Java apps very much], but
> that's understandable: Gtk+ is much better looking :) and I'm looking 
> forward to having a chance to try it out.

That is indeed a Java GUI, and it's definitely not pretty, but there's
another bunch of people working on that problem, thankfully :-) Other
features in the editor that you probably can't see there include:

* format checking - if you're missing a tag in the translation that was
in the source, it'll tell you.
* integrated "mini" TM - if you've already translated something, and
come across the same or a similar sentence further down the file, it'll
suggest the earlier translation as a possible translation
* uses aspell - has spel checqing too :-)
* status of translation - can mark a translation as "needing review" or
other status types
* easy navigation - can easily jump to next untranslated, unreviewed,
fuzzy match or 100% match string
* translator comments - can add a comment to any translation
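
The first two items in that list can be sketched in a few lines of
Python. This is not the editor's code (which is Java): the function
names, the 0.75 similarity threshold and the use of difflib for fuzzy
matching are all assumptions chosen for the example.

```python
import re
from collections import Counter
from difflib import SequenceMatcher

TAG_RE = re.compile(r"</?[A-Za-z][^>]*>")


def missing_tags(source, target):
    """Format check: tags that occur in the source but not in the target."""
    diff = Counter(TAG_RE.findall(source)) - Counter(TAG_RE.findall(target))
    return sorted(diff.elements())


def suggest(tm, sentence, threshold=0.75):
    """Mini TM: return (score, source, translation) for the best match,
    or None if nothing already translated is similar enough."""
    best = None
    for src, tgt in tm.items():
        score = SequenceMatcher(None, src, sentence).ratio()
        if score >= threshold and (best is None or score > best[0]):
            best = (score, src, tgt)
    return best


tm = {"Click the OK button.": "Cliquez sur le bouton OK."}

print(missing_tags("Click <emphasis>OK</emphasis> now.",
                   "Cliquez sur <emphasis>OK maintenant."))
print(suggest(tm, "Click the Cancel button."))
```

A score of 1.0 corresponds to a 100% match; anything between the
threshold and 1.0 would be offered as a fuzzy match.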

There's lots of other stuff I'd like to add to it, though, in
particular around the "mini" TM feature. Having the editor automagically
share its mini TM with other online translators working in your language
might be quite neat, and something like JXTA might allow that to work
very nicely. Maybe these are unnecessary, I dunno -  Feetch Feetch!

Some more supported file formats would be nice too... (it's terrible
that we haven't had time to write .sxw filters yet, for example :-(

	cheers,
			tim


-- 
Tim Foster - Translation Technology Engineer, Software Globalization
http://sunweb.ireland/~timf
http://www.netsoc.ucd.ie/~timf



