get_child() APIs



Hey Emmanuele,

Here's a follow up to our IRC discussion. For everyone reading along,
we discussed about
http://wiki.clutter-project.org/wiki/Clutter/Apocalypses/Apocalypse1#Actor_hierarchy
and I wanted to poke the Webkit guys to figure out how Webkit handles
children in the DOM. I wanted to figure out if there's a way that
avoids performance pitfalls in any of the common iteration one wants
to do when iterating over children. The code is in
  http://trac.webkit.org/browser/trunk/Source/WebCore/dom/ContainerNode.h
  http://trac.webkit.org/browser/trunk/Source/WebCore/dom/ContainerNode.cpp
(and is inherited via ContainerNode => Element => StyledElement =>
HTMLElement if anyone cares.)
And a node basically has these fields:
class Node {
  Node parent;
  Node previous_sibling;
  Node next_sibling;
  Node first_child;
  Node last_child;
};

This gives you highest possible performance (usually O(1)) for:
- all the obvious getters - parent, first and last child, next and
previous sibling
- node.remove ()
- node.append (new_child)
- sibling.insert_before (node)
- iterating over all children both forward and backward
- traversing the whole tree both forward and backward

The following functions are O(N) however:
- node.count_children ()
- node.get_child_at (index)
(see http://trac.webkit.org/browser/trunk/Source/WebCore/dom/ContainerNode.cpp#L1072
)
Interestingly, those functions are rarely ever called by WebKit itself.

If you were to use this for ClutterActor or GtkContainer, you'd have
to maintain the list yourself and can't use GList, GSequence or
GArray. It also means you have to use a different API to the one you
proposed on the wiki. But it also means you don't need to keep a
private data structure and you have pretty much optimal performance
for the interesting operations - foreach here is O(N) while it's O(N *
log (N)) on a GSequence, insert is O(N) on a GList and O(log(N)) for a
GSequence while it's O(1) here.

However, there is also a pretty complicated machinery for accessing
nodes as an array (using caches and whatnot) to support HTMLs
childNodes array performantly:
  http://trac.webkit.org/browser/trunk/Source/WebCore/dom/DynamicNodeList.h
  http://trac.webkit.org/browser/trunk/Source/WebCore/dom/DynamicNodeList.cpp
That class basically just caches the count of children and the last
item and its index (and invalidates both when the list of children
changes). Note that this is just the base implementation and it can be
overridden by subclasses. I have not yet investigated how many
subclasses actually do that.

So I am not sure what is the best approach for coming up with a good
API for working with child actors/widgets. I think this API is good
enough for what you want to do usually, but if you were to decide to
use it, you should probably not support a native get_child_at() API
because then people will write:
for (i = 0;
    i < node.count_children();
    i++)
  {
    child = node.get_child_at (i);
    /* do stuff */
  }
which is O(2 * N^2) instead of
for (child = node.first_child();
    child;
    child = child.next_sibling())
  {
    /* do stuff */
  }
which is O(N).
If you want to support a get_child_at() API, you need to at least
support what DynamicNodeList does, which will restore the O(N)
behavior for the for loop above.

Benjamin


[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]