Re: Low memory hacks



On Thu, 2008-03-13 at 18:38 +0000, Bastien Nocera wrote:
> On Thu, 2008-03-13 at 18:26 +0000, Simos Xenitellis wrote:
> <snip>
> > Some messages do not get translated, so you see in the translations
> > files things like
> > 
> > msgid "DEFAULT"
> > msgstr "DEFAULT"
> > 
> > This is convenient, because in the statistics you can see the app
> > fully translated.
> > 
> > This also means that the MO file will still have an entry for these
> > two. Going through the MO files and removing these few occurrences
> > will free a bit of disk space, but also memory.
> > For some locales such as en_GB, this can amount to about 14MB of MO
> > files while the real space should be close to 2MB.
> 
> We do this to look good in the translation stats (and know when to
> update the translations, which is a pretty big point).

That is good, and many other teams (non-English) do the same thing. The
PO format does not have a facility to tag messages as "I reviewed this
translation, and it should appear as the original". A move to XLIFF may
help here.

But the end result is that one can optimise by stripping out identical
messages when creating the distribution packages.
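To make the idea concrete, here is a minimal sketch of such a stripping pass over a PO file, run before msgfmt at packaging time. It is only an illustration under assumptions: the names strip_identical, _field and SAMPLE_PO are made up for this example, the parser only understands simple msgid/msgstr entries separated by blank lines, and plural entries are left untouched. It is not what any distribution actually runs.

```python
import re

# Hypothetical sample catalog, in the spirit of the en_GB example above.
SAMPLE_PO = '''msgid ""
msgstr ""
"Content-Type: text/plain; charset=UTF-8\\n"

msgid "DEFAULT"
msgstr "DEFAULT"

msgid "Color"
msgstr "Colour"
'''


def _field(block, keyword):
    """Return the joined string for `keyword` in one PO entry, or None."""
    parts = []
    collecting = False
    for line in block.splitlines():
        if line.startswith(keyword + " "):
            collecting = True
            parts.append(line[len(keyword) + 1:].strip())
        elif collecting and line.startswith('"'):
            # Continuation line of a wrapped string.
            parts.append(line.strip())
        elif collecting:
            break
    if not parts:
        return None
    # Each part is a quoted chunk; drop the quotes and concatenate.
    return "".join(p[1:-1] for p in parts)


def strip_identical(po_text):
    """Drop entries whose msgstr equals their msgid; keep everything else."""
    kept = []
    for block in re.split(r"\n\s*\n", po_text.strip()):
        msgid = _field(block, "msgid")
        msgstr = _field(block, "msgstr")
        is_header = msgid == ""  # the header entry has an empty msgid
        if not is_header and msgid is not None and msgid == msgstr:
            continue
        kept.append(block)
    return "\n\n".join(kept) + "\n"
```

Calling strip_identical(SAMPLE_PO) keeps the header and the Color/Colour entry and drops the DEFAULT one. The upstream PO file stays as it is, so the translation statistics are not affected; only the copy that goes into the package shrinks.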

> I'd like to see some real numbers on the memory usage instead of
> numbers being thrown around.

In Ubuntu 7.10, the MO files for en_GB take up
$ du -h /usr/share/locale/en_GB/LC_MESSAGES /usr/share/locale-langpack/en_GB/LC_MESSAGES/
2.3M    /usr/share/locale/en_GB/LC_MESSAGES
17M     /usr/share/locale-langpack/en_GB/LC_MESSAGES/

In Ubuntu 8.04 (alpha 6), the MO files for en_GB take up
$ du -h /usr/share/locale/en_GB/LC_MESSAGES /usr/share/locale-langpack/en_GB/LC_MESSAGES/
84K     /usr/share/locale/en_GB/LC_MESSAGES
2.2M    /usr/share/locale-langpack/en_GB/LC_MESSAGES/

What I am missing here is when and how Ubuntu added this
functionality. It would benefit other distros as well. Did Debian
introduce this feature? Danilo, any links?

From the 2.3M + 17M of MO files in Ubuntu 7.10, a typical GNOME session
loads up only a subset of the MO files:

# lsof | grep '\.mo$' | awk '{print $7,$9}' | sort -n | uniq

At this moment, my 7.10 is a bit messed up (I have en_GB.UTF-8 but most
apps have en_US?!?). The figures for 8.04 with el_GR should be
comparable to what you get now with 7.10 and en_GB:

# lsof | grep '\.mo$' | awk '{print $7,$9}' | sort -n | uniq | \
    awk '{printf "%d+",$1} END {print "0"}' > /tmp/bc_sums

Running "bc" on /tmp/bc_sums gives a figure of
3.6M (3624412 bytes) for a standard session. This figure is a bit
conservative, because the en_GB team has probably translated more than el.
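For anyone who wants to reproduce the figure without the bc step, the same sum can be sketched in a few lines of Python. This is only a rough equivalent of the pipeline above and rests on the same assumption, namely that lsof prints the size in the 7th column and the file name in the 9th. The function name total_mo_bytes is made up for this example.

```python
def total_mo_bytes(lsof_output):
    """Sum the sizes of the distinct .mo files listed in lsof output.

    Assumes the default lsof column layout, where field 7 is the size
    and field 9 is the file name (the same fields the awk command uses).
    """
    sizes = {}
    for line in lsof_output.splitlines():
        fields = line.split()
        # Skip short lines, the header line, and anything that is not an MO file.
        if len(fields) < 9 or not fields[8].endswith(".mo"):
            continue
        if not fields[6].isdigit():
            continue
        # Keyed by path, so a file mapped by many processes counts once.
        sizes[fields[8]] = int(fields[6])
    return sum(sizes.values())
```

On a live system the input would come from something like subprocess.run(["lsof"], capture_output=True, text=True).stdout, run as root so that the files of every process are visible.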

With Ubuntu 8.04 (alpha6) and en_GB, the figure for the MO files is
less than 600K (585375).
Bastien, could you provide the proper figure for your system?

That is a saving of at least 3M in memory (3624412 - 585375 = 3039037).

The stripping of "unneeded" messages is good, and should happen at the
package generation level (not in GNOME, or when creating tarballs). 
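To get real numbers for a given catalog before deciding whether stripping is worth it, one can count the identical entries directly in the compiled MO file. The following is a sketch only, assuming the GNU MO layout (magic number, revision, string count, then the offsets of the original and translation tables). The function name identical_entries is made up for this example, and the byte estimate ignores the hash table, so treat it as approximate.

```python
import struct


def identical_entries(mo_bytes):
    """Return (total entries, identical entries, approximate bytes they use)."""
    magic = struct.unpack_from("<I", mo_bytes, 0)[0]
    if magic == 0x950412DE:
        order = "<"   # little-endian catalog
    elif magic == 0xDE120495:
        order = ">"   # big-endian catalog
    else:
        raise ValueError("not a GNU MO file")

    # Header fields after the magic: revision, count, and the two table offsets.
    _rev, count, orig_off, trans_off = struct.unpack_from(order + "4I", mo_bytes, 4)

    identical = 0
    approx_bytes = 0
    for i in range(count):
        # Each table entry is a (length, offset) pair of 32-bit integers.
        olen, opos = struct.unpack_from(order + "2I", mo_bytes, orig_off + 8 * i)
        tlen, tpos = struct.unpack_from(order + "2I", mo_bytes, trans_off + 8 * i)
        original = mo_bytes[opos:opos + olen]
        translation = mo_bytes[tpos:tpos + tlen]
        # The header entry has an empty original string and is never counted.
        if original and original == translation:
            identical += 1
            # Both strings, their NUL terminators, and the two table entries.
            approx_bytes += olen + tlen + 2 + 16
    return count, identical, approx_bytes
```

Running this over every file under the LC_MESSAGES directories (reading each one with open(path, "rb").read()) would show how much of the en_GB size comes from untranslated-by-design messages.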

Simos



