gimp-help-2 r2587 - in branches/xml2po-support: . tools
- From: romanofski svn gnome org
- To: svn-commits-list gnome org
- Subject: gimp-help-2 r2587 - in branches/xml2po-support: . tools
- Date: Mon, 20 Oct 2008 20:01:07 +0000 (UTC)
Author: romanofski
Date: Mon Oct 20 20:01:07 2008
New Revision: 2587
URL: http://svn.gnome.org/viewvc/gimp-help-2?rev=2587&view=rev
Log:
2008-10-20 Roman Joost <romanofski gimp org>
* tools/split_xml_multi_lang.py: self.languages is now using
set(), which makes it easier handling languages
* tools/split_xml_multi_lang.test: enhanced test
Modified:
branches/xml2po-support/ChangeLog
branches/xml2po-support/tools/split_xml_multi_lang.py
branches/xml2po-support/tools/split_xml_multi_lang.test
Modified: branches/xml2po-support/tools/split_xml_multi_lang.py
==============================================================================
--- branches/xml2po-support/tools/split_xml_multi_lang.py (original)
+++ branches/xml2po-support/tools/split_xml_multi_lang.py Mon Oct 20 20:01:07 2008
@@ -140,7 +140,8 @@
"""Multi-language XML document"""
self.filename = filename
- self.dest = {} # destination documents
+ self.dest = {} # destination documents
+ self.languages = set(['en'])
self.logger = logging.getLogger("splitxml")
self.logger.info("Parsing %s" % filename)
@@ -184,9 +185,7 @@
document recursively, starting with the document element,
"""
self.logger.debug("process(%s)" % self.doc.documentElement.nodeName)
- self.languages = languages
- if 'en' not in languages:
- self.languages.insert(0, "en")
+ self.languages = self.languages.union(set(languages))
impl = xml.dom.minidom.getDOMImplementation()
for lang in self.languages:
@@ -541,8 +540,7 @@
elem = elem.parentNode
langs = elem.getAttribute("lang")
langs = langs.strip(';').split(';')
- # use "set(langs)" since "langs" may contain identical entries:
- return [lang for lang in set(langs) if lang in self.languages]
+ return self.languages.intersection(set(langs))
################################################################
@@ -600,9 +598,6 @@
Logger.setLevel(logging.DEBUG)
options.languages = re.split('[, ]+', options.languages)
- try: options.languages.remove('en')
- except ValueError: pass
- options.languages.insert(0, 'en')
doc = MultiLangDoc(options.filename)
doc.process(options.languages)
Modified: branches/xml2po-support/tools/split_xml_multi_lang.test
==============================================================================
--- branches/xml2po-support/tools/split_xml_multi_lang.test (original)
+++ branches/xml2po-support/tools/split_xml_multi_lang.test Mon Oct 20 20:01:07 2008
@@ -16,22 +16,32 @@
The class provides functionality for reading and parsing multi-lang
DocBook source files. It has the ability to split the parsed document
-into a by-language sorted single language documents. To test the class,
-we need an XML testfile first, which we grab out of the GIMP Manual
-source tree:
+into a by-language sorted single language documents.
->>> import os
->>> import os.path
->>> roothelpdir = os.path.dirname(os.getcwd())
->>> testxmlfile = os.path.join(roothelpdir,
-... 'src', 'toolbox', 'toolbox-color-area.xml')
-
-We also need a destination directory, which we create temporary:
+To test the split file, we need our own XML file. We need a destination
+directory, which we create temporary:
>>> import tempfile
>>> destdir = tempfile.mkdtemp()
-Now we can create the multilangdoc object:
+To test the
+class, we need an XML testfile first, which we grab out of the GIMP
+Manual source tree:
+
+>>> import os.path
+>>> testxml = ("<sect1 id='gimp-help-test' lang='en;cs;de'>"
+... "<title><phrase lang='en'>Gaussian Blur</phrase>"
+... "<phrase lang='de'>Gaußscher Weichzeichner</phrase>"
+... "<phrase lang='cs'>Gaussovo rozostření</phrase></title>"
+... "</sect1>")
+>>> testxmlfile = os.path.join(destdir, 'test.xml')
+>>> open(testxmlfile, 'w').write(testxml)
+
+
+Processing
+----------
+
+We can create the multilangdoc object:
>>> from split_xml_multi_lang import MultiLangDoc
>>> mld = MultiLangDoc(testxmlfile)
@@ -40,29 +50,62 @@
>>> mld.filename == testxmlfile
True
-
-Processing
-----------
-
Once the xmlfile is parsed, we can process it. That means, we provide in
which languages the document should be splitted.
The default language which is used by processing is English. The
-attribute is set after processing:
+languages are internally handled as a set:
>>> mld.languages
-Traceback (most recent call last):
-AttributeError: 'MultiLangDoc' object has no attribute 'languages'
+set(['en'])
If the languages parameter is an empty list, the processing is only done
for English:
+>>> mld.languages
+set(['en'])
>>> mld.process([])
-{u'en': <DOM Element: sect1 at -0x...>}
+{'en': <DOM Element: sect1 at -0x...>}
+
+We can define more languages we want to split the document into:
+
+>>> result = mld.process(['de', 'cs'])
+>>> ['cs', 'de', 'en'] == sorted(result.keys())
+True
+>>> len(result.keys()) == 3
+True
The destination directory (actually a template) will be specified when
printing the resulting files for every language:
>>> mld.printfiles(destdir)
-FIXME
+We can check what has been written to our directory:
+
+>>> import os
+>>> os.listdir(destdir)
+['test.xml', 'en', 'de', 'cs']
+>>> result = os.listdir(os.path.join(destdir, 'en'))
+>>> result
+['test.xml']
+>>> open(os.path.join(destdir, 'en', result[0])).read()
+'<?xml version...<sect1 id="gimp-help-test"><title><phrase>Gaussian Blur</phrase></title></sect1>'
+>>> open(os.path.join(destdir, 'de', result[0])).read()
+'<?xml version...<sect1 id="gimp-help-test"><title><phrase>Gau\xc3\x9fscher Weichzeichner</phrase></title></sect1>'
+
+
+Empty XML files
+---------------
+
+If we process empty xml files the split throws an error:
+
+>>> testxmlfile = os.path.join(destdir, 'test_empty.xml')
+>>> open(testxmlfile, 'w').write("<sect1></sect1>")
+>>> mld = MultiLangDoc(testxmlfile)
+
+It was not possible to process the document and the split script exits
+with error number 74 (No English document element):
+
+>>> mld.process([])
+Traceback (most recent call last):
+SystemExit: 74
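
Note on the set-based language handling: the following is a minimal standalone sketch of the two set operations this revision introduces (union in process(), intersection when filtering a "lang" attribute). The helper names merge_languages and select_languages are hypothetical and do not exist in split_xml_multi_lang.py; they only isolate the idea, assuming the "lang" attribute is a ';'-separated string as in the diff above.

```python
def merge_languages(current, requested):
    """Return the union of the current language set and the requested ones.

    Duplicates in `requested` collapse automatically, and 'en' stays in
    the result as long as it is part of `current`.
    """
    return set(current).union(requested)


def select_languages(known, lang_attr):
    """Return the known languages named in a ';'-separated lang attribute."""
    langs = lang_attr.strip(';').split(';')
    return set(known).intersection(langs)


# Hypothetical usage: start from the default {'en'} and add two languages.
languages = merge_languages({'en'}, ['de', 'cs', 'de'])
print(sorted(languages))                                 # ['cs', 'de', 'en']
print(sorted(select_languages(languages, 'en;de;fr;')))  # ['de', 'en']
```

One consequence worth keeping in mind: a set has no defined order, so 'en' is no longer guaranteed to be processed first, which the removed insert(0, 'en') calls used to ensure. Whether any caller relied on that ordering is not visible from this diff.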