> A related example. Recently the ta tree was discovered to be infected
> with non UTF-8 strings. It was never documented so basically ended up
> wasting many people's enthusiasm and broke the flow of the l10n effort
> until discovered through painful analysis. 

Well, in the case of Tamil it is due to historic reasons. Proper utf-8 support
is only recent, while using tscii (well, not real tscii, but tscii in disguise,
claiming to be cp1252) allowed for good display rendering, so it has been
widely used for almost two years in a great many free programs
(in particular, with gtk1 it was simply the only choice).
Tamil wasn't the only language to start out with a non-utf-8 encoding; however,
it has been the only one using an encoding that does not map 1-to-1
to unicode codepoints, which makes the conversion complex.
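
To give an idea of why a non-1-to-1 encoding is harder to convert, here is a
rough sketch in Python. It is only a toy: the names TOY_TABLE, TOY_PREFIX_SIGNS
and toy_to_unicode are made up for this mail, the byte values are illustrative
and have not been checked against the real tscii table, and the assumption that
some vowel signs are stored before their consonant (visual order) is mine.

```python
# Toy sketch: the byte values below are illustrative assumptions,
# not checked against the real TSCII table.

# Bytes that expand to one or more Unicode code points.
TOY_TABLE = {
    0xAB: "\u0B95",              # one byte -> one code point (TAMIL LETTER KA)
    0x87: "\u0B95\u0BCD\u0BB7",  # one byte -> three code points (KA + VIRAMA + SSA)
}

# Vowel signs assumed to be stored before their consonant (visual order);
# Unicode expects them after the consonant (logical order).
TOY_PREFIX_SIGNS = {
    0xA6: "\u0BC6",              # TAMIL VOWEL SIGN E
}


def toy_to_unicode(data: bytes) -> str:
    """Convert toy-encoded bytes to a Unicode string."""
    out = []
    pending_sign = None  # prefix vowel sign waiting for its consonant
    for byte in data:
        if byte in TOY_PREFIX_SIGNS:
            # Hold the sign back until the following character is emitted.
            pending_sign = TOY_PREFIX_SIGNS[byte]
            continue
        if byte < 0x80:
            out.append(chr(byte))        # ASCII passes through unchanged
        else:
            out.append(TOY_TABLE[byte])  # may add several code points
        if pending_sign is not None:
            out.append(pending_sign)     # reorder: sign goes after the consonant
            pending_sign = None
    if pending_sign is not None:
        out.append(pending_sign)         # dangling sign at end of input
    return "".join(out)


# 1-to-1 case for contrast: each latin-1 byte is exactly one code point,
# so a single decode call is the whole conversion.
latin1_text = b"caf\xe9".decode("latin-1")
```

So a 1-to-1 encoding needs nothing more than a table lookup per byte, while
here the converter has to expand some bytes into several code points and
reorder others, which is where the extra work (and the room for mistakes) is.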

Now that utf-8 for Tamil works quite well, all those problems will hopefully
be a thing of the past.

