gimp-help-2 r2587 - in branches/xml2po-support: . tools

From: romanofski svn gnome org
To: svn-commits-list gnome org
Subject: gimp-help-2 r2587 - in branches/xml2po-support: . tools
Date: Mon, 20 Oct 2008 20:01:07 +0000 (UTC)
Author: romanofski
Date: Mon Oct 20 20:01:07 2008
New Revision: 2587
URL: http://svn.gnome.org/viewvc/gimp-help-2?rev=2587&view=rev

Log:

2008-10-20  Roman Joost  <romanofski gimp org>

	* tools/split_xml_multi_lang.py: self.languages is now using
	set(), which makes it easier handling languages

	* tools/split_xml_multi_lang.test: enhanced test


Modified:
   branches/xml2po-support/ChangeLog
   branches/xml2po-support/tools/split_xml_multi_lang.py
   branches/xml2po-support/tools/split_xml_multi_lang.test

Modified: branches/xml2po-support/tools/split_xml_multi_lang.py
==============================================================================
--- branches/xml2po-support/tools/split_xml_multi_lang.py	(original)
+++ branches/xml2po-support/tools/split_xml_multi_lang.py	Mon Oct 20 20:01:07 2008
@@ -140,7 +140,8 @@
         """Multi-language XML document"""
 
         self.filename = filename
-        self.dest = {}	# destination documents
+        self.dest = {}  # destination documents
+        self.languages = set(['en'])
 
         self.logger = logging.getLogger("splitxml")
         self.logger.info("Parsing %s" % filename)
@@ -184,9 +185,7 @@
         document recursively, starting with the document element,
         """
         self.logger.debug("process(%s)" % self.doc.documentElement.nodeName)
-        self.languages = languages
-        if 'en' not in languages:
-            self.languages.insert(0, "en")
+        self.languages = self.languages.union(set(languages))
 
         impl = xml.dom.minidom.getDOMImplementation()
         for lang in self.languages:
@@ -541,8 +540,7 @@
             elem = elem.parentNode
             langs = elem.getAttribute("lang")
         langs = langs.strip(';').split(';')
-        # use "set(langs)" since "langs" may contain identical entries:
-        return [lang for lang in set(langs) if lang in self.languages]
+        return self.languages.intersection(set(langs))
 
 
 ################################################################
@@ -600,9 +598,6 @@
         Logger.setLevel(logging.DEBUG)
 
     options.languages = re.split('[, ]+', options.languages)
-    try: options.languages.remove('en')
-    except ValueError: pass
-    options.languages.insert(0, 'en')
 
     doc = MultiLangDoc(options.filename)
     doc.process(options.languages)
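
The set handling introduced in the hunks above can be tried in isolation. The following is a minimal sketch, not code from the commit: the helper names merge_languages and languages_for_element are hypothetical stand-ins for the logic in process() and the language lookup, and it assumes Python 3, whereas the patched script uses Python 2 idioms such as set(['en']).

```python
import re


def merge_languages(current, requested):
    """Hypothetical helper: union of the tracked and the requested languages."""
    return set(current).union(requested)


def languages_for_element(lang_attr, enabled):
    """Hypothetical helper: split a ';'-separated lang attribute and keep
    only the languages that are enabled. Duplicates collapse in the set."""
    langs = lang_attr.strip(';').split(';')
    return set(enabled).intersection(langs)


# Same split pattern as the script's option parsing.
requested = re.split('[, ]+', 'de,cs en')
enabled = merge_languages({'en'}, requested)
print(sorted(enabled))  # ['cs', 'de', 'en']

# 'de' appears twice and 'fr' is not enabled.
print(sorted(languages_for_element('en;de;de;fr', enabled)))  # ['de', 'en']
```

A set has no defined iteration order, so anything that relied on 'en' being the first list entry would have to sort or special-case it explicitly.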

Modified: branches/xml2po-support/tools/split_xml_multi_lang.test
==============================================================================
--- branches/xml2po-support/tools/split_xml_multi_lang.test	(original)
+++ branches/xml2po-support/tools/split_xml_multi_lang.test	Mon Oct 20 20:01:07 2008
@@ -16,22 +16,32 @@
 
 The class provides functionality for reading and parsing multi-lang
 DocBook source files. It has the ability to split the parsed document
-into a by-language sorted single language documents. To test the class,
-we need an XML testfile first, which we grab out of the GIMP Manual
-source tree:
+into a by-language sorted single language documents.
 
->>> import os
->>> import os.path
->>> roothelpdir = os.path.dirname(os.getcwd())
->>> testxmlfile = os.path.join(roothelpdir,
-...     'src', 'toolbox', 'toolbox-color-area.xml')
-
-We also need a destination directory, which we create temporary:
+To test the split file, we need our own XML file.  We need a destination
+directory, which we create temporary:
 
 >>> import tempfile
 >>> destdir = tempfile.mkdtemp()
 
-Now we can create the multilangdoc object:
+To test the
+class, we need an XML testfile first, which we grab out of the GIMP
+Manual source tree:
+
+>>> import os.path
+>>> testxml = ("<sect1 id='gimp-help-test' lang='en;cs;de'>"
+...   "<title><phrase lang='en'>Gaussian Blur</phrase>"
+...   "<phrase lang='de'>Gaußscher Weichzeichner</phrase>"
+...   "<phrase lang='cs'>Gaussovo rozostření</phrase></title>"
+...   "</sect1>")
+>>> testxmlfile = os.path.join(destdir, 'test.xml')
+>>> open(testxmlfile, 'w').write(testxml)
+
+
+Processing
+----------
+
+We can create the multilangdoc object:
 
 >>> from split_xml_multi_lang import MultiLangDoc
 >>> mld = MultiLangDoc(testxmlfile)
@@ -40,29 +50,62 @@
 >>> mld.filename == testxmlfile
 True
 
-
-Processing
-----------
-
 Once the xmlfile is parsed, we can process it. That means, we provide in
 which languages the document should be splitted.
 
 The default language which is used by processing is English. The
-attribute is set after processing:
+languages are internally handled as a set:
 
 >>> mld.languages
-Traceback (most recent call last):
-AttributeError: 'MultiLangDoc' object has no attribute 'languages'
+set(['en'])
 
 If the languages parameter is an empty list, the processing is only done
 for English:
 
+>>> mld.languages
+set(['en'])
 >>> mld.process([])
-{u'en': <DOM Element: sect1 at -0x...>}
+{'en': <DOM Element: sect1 at -0x...>}
+
+We can define more languages we want to split the document into:
+
+>>> result = mld.process(['de', 'cs'])
+>>> ['cs', 'de', 'en'] == sorted(result.keys())
+True
+>>> len(result.keys()) == 3
+True
 
 The destination directory (actually a template) will be specified when
 printing the resulting files for every language:
 
 >>> mld.printfiles(destdir)
-FIXME
 
+We can check what has been written to our directory:
+
+>>> import os
+>>> os.listdir(destdir)
+['test.xml', 'en', 'de', 'cs']
+>>> result = os.listdir(os.path.join(destdir, 'en'))
+>>> result
+['test.xml']
+>>> open(os.path.join(destdir, 'en', result[0])).read()
+'<?xml version...<sect1 id="gimp-help-test"><title><phrase>Gaussian Blur</phrase></title></sect1>'
+>>> open(os.path.join(destdir, 'de', result[0])).read()
+'<?xml version...<sect1 id="gimp-help-test"><title><phrase>Gau\xc3\x9fscher Weichzeichner</phrase></title></sect1>'
+
+
+Empty XML files
+---------------
+
+If we process empty xml files the split throws an error:
+
+>>> testxmlfile = os.path.join(destdir, 'test_empty.xml')
+>>> open(testxmlfile, 'w').write("<sect1></sect1>")
+>>> mld = MultiLangDoc(testxmlfile)
+
+It was not possible to process the document and the split script exits
+with error number 74 (No English document element):
+
+>>> mld.process([])
+Traceback (most recent call last):
+SystemExit: 74
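
The directory check in the test above compares os.listdir() output against a fixed list. Since os.listdir() returns entries in arbitrary order, a check of that kind is usually more robust when the listing is sorted first. The sketch below assumes Python 3 and builds its own throwaway directory layout; it does not call the split script.

```python
import os
import tempfile

# Build a layout resembling the one the test expects (assumed names).
destdir = tempfile.mkdtemp()
for name in ('en', 'de', 'cs'):
    os.mkdir(os.path.join(destdir, name))
with open(os.path.join(destdir, 'test.xml'), 'w') as handle:
    handle.write("<sect1></sect1>")

# Sorting makes the comparison independent of filesystem order.
listing = sorted(os.listdir(destdir))
print(listing)  # ['cs', 'de', 'en', 'test.xml']
```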