DV said I should email the list, so here goes.
I'm trying to create and register an xpath function, re_contains, that
works the same way as the standard contains() function except that it
accepts a regular expression as its second argument.
I have two problems, one with function arguments, another with return
values.
Here's the code:
#!/usr/bin/python
import libxml2
import sys
import re

def re_contains(context, s, p):
    print "s:", s, ",", len(s), ", p:", p
    for ss in s:
        print "ss: ", ss
        print dir(ss)
    if re.search(p, s):
        return 1
    return 0

def find_matches(pattern, files):
    matches = []
    for f in files:
        doc = libxml2.parseFile(f)
        ctxt = doc.xpathNewContext()
        libxml2.registerXPathFunction(ctxt._o, "re_contains", None,
                                      re_contains)
        res = ctxt.xpathEval(pattern)
        if res:
            matches.append((f, res))
    return matches

if __name__ == '__main__':
    pattern = sys.argv[1]
    files = sys.argv[2:]
    matches = find_matches(pattern, files)
    for file, nodes in matches:
        print "---", file
        for node in nodes:
            print node.serialize()
        print "--"
The script works a bit like grep: it accepts as its first argument an
xpath expression, and after that a list of files. It prints out the
matching parts of the files.
When I invoke it with an xpath expression like
//foo/bar[re_contains(.,'as?df')], to search the contents of element
bar, the value assigned to s in re_contains prints as a list with one
item. The item is a PyCObject; taking dir() of it returns an empty list.
$ cat test.xml
<foo><bar>baz</bar></foo>
$ ./xpathgrep.py "//bar[re_contains(.,'ba')]" test.xml
s: [<PyCObject object at 0x401a74d0>] , 1 , p: ba
ss: <PyCObject object at 0x401a74d0>
[]
/usr/lib/python2.3/site-packages/libxml2.py:511: RuntimeWarning: tp_compare didn't return -1 or -2 for exception
  if type(o) == type([]) or type(o) == type(()):
Traceback (most recent call last):
[... snip an exception from re]
Using the xpath function name() instead of . works out better:
$ ./xpathgrep.py "//foo[re_contains(name(),'ba')]" test.xml
s: foo , 3 , p: ba
ss: f
[... snip iterating f, o and o ]
So should I do something magic when the user has passed in .? Or is this
a bug?
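For what it's worth, here is a minimal sketch of how the callback could cope with both kinds of argument, independent of libxml2. It follows the XPath 1.0 rule that a node-set used as a string takes the string-value of its first node. The names xpath_string, re_contains_text and node_to_text are hypothetical, and node_to_text stands in for whatever turns one of those PyCObject items into text. With the bindings that might mean wrapping the item in libxml2.xmlNode(_obj=item) and reading its content, but that is an assumption I haven't verified against 2.6.11.

```python
import re

def xpath_string(value, node_to_text):
    """Roughly mimic XPath 1.0 string() for the arguments a callback may get.

    node_to_text is a hypothetical caller-supplied function that converts
    one node-set item into its text content.
    """
    if isinstance(value, (list, tuple)):
        # Node-set: use the string-value of the first node, "" if empty.
        if not value:
            return ""
        return node_to_text(value[0])
    # Anything else is assumed to already be a string (e.g. from name()).
    return value

def re_contains_text(s, p, node_to_text):
    """Return True if regular expression p matches anywhere in s."""
    return re.search(p, xpath_string(s, node_to_text)) is not None
```

With plain strings standing in for nodes, re_contains_text(["baz"], "ba", str) matches and re_contains_text("foo", "ba", str) does not.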
Using name() exposes the second problem: what should the function
return? True and False aren't the answer, apparently, because I get
"Unable to convert Python Object to XPath". The same happens with 1 and
0. I see that contains calls a function called valuePush to store its
result, but I don't think that's available from Python. The Python
bindings apparently call libxml_xmlXPathObjectPtrConvert to convert the
return value into something that can be passed to valuePush, but I can't
see anything in it that would handle boolean values.
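One possible workaround, sketched here without libxml2, relies on how XPath 1.0 treats predicate results: a number is compared against the context position, and anything else is converted with boolean(), under which a string is true exactly when it is non-empty. If the bindings can convert a Python string return value (an assumption, not something I have confirmed for this version), the callback could signal truth with a non-empty string. The helper names xpath_boolean and truth_as_string are hypothetical.

```python
import math

def xpath_boolean(value):
    """Approximate XPath 1.0 boolean() for Python-side values."""
    if isinstance(value, bool):
        return value
    if isinstance(value, (int, float)):
        # Numbers are true unless they are zero or NaN.
        return value != 0 and not math.isnan(value)
    # Strings and node-sets (lists) are true when non-empty.
    return len(value) > 0

def truth_as_string(flag):
    """Encode a Python truth value as a string XPath would read the same way."""
    return "1" if flag else ""
```

Note that a string such as "0" or "false" would still count as true under these rules, which is why the false case is the empty string.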
This is libxml2 2.6.11.
--
[ Juri Pakaste | juri iki fi | http://www.iki.fi/juri/ ]