[xml] extremely long xslt transformation times



Hello list,

[reason for putting libxml mailing list: seems to be an xpath problem]

we using libxml2 and libxslt to do xml to xml transformations. Part of it is
the identity transformation:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform";>
        <xsl:output method="xml" indent="yes"/>
        
        <!-- identity transformation -->
        <xsl:template match="@*|node()">
                <xsl:copy>
                        <xsl:apply-templates select="@*|node()"/>
                </xsl:copy>
        </xsl:template>
</xsl:stylesheet>


We got some surprising results when comparing the transformation times for
files of identical size (5MB). One file transforms in 11 seconds and the
other in 283 seconds.

We did some ananlysis of the problem and found that the doc taking so long
to transform contains a node list of 22,000 elements that are all child
elements of a single element. 

We used Quantify to track down the bottleneck. The result is when applying
the template

        <xsl:apply-templates select="@*|node()"/>

a long time is spent in evaluating the XPath "@*|node()".

The compiled XPath is
SORT
        UNION
                COLLECT attrs all
                COLLECT nodes all

The call stack when evaluating is:
xmlXPathCompOpEval
        xmlXPathNodeSetSort
                xmlXPathCmpNodes

The sort operation performs worst case in this scenario since the two
collected node sets are already sorted. xmlXPathNodeSetSort uses Shell sort,
which is O(log N * N) number of compares. xmlXPathCmpNodes itself is O(N)
since it has to traverse the list of all child nodes to determine the order.
Altogether this is O(N * N * log N)

Now we see three different ways to improve performance in this case:

1) optimize the XPath compilation not to sort since the collected node sets
are already sorted and the union will not change that
2) Improve xmlXPathCmpNodes by adding an child index to xmlNode, thus
quickly comparing just the index and not traversing the list
3) Avoid sorting a node set by adding a flag to the xmlXPathNodeSet
structure if the set is sorted already

We'd like to ask the list what approache we should take for this case?

Regards
Thomas



[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]