iterator resolution

From: Havoc Pennington <hp redhat com>
To: gtk-devel-list redhat com
Subject: iterator resolution
Date: 03 Mar 2000 12:43:00 -0500

Hi,

For people that care this is what I ended up doing.

First, some definitions:

 - an "indexable segment" is a segment in the btree that can 
   be referenced by a character index. i.e. if the tree were
   an array, an indexable segment occupies one or more array 
   cells. Indexable segments include characters, pixmaps, and 
   embedded widgets (basically FrooTkxt treats widgets and pixmaps
   as characters with special glyphs)

 - non-indexable segments are segments that are conceptually
   placed _between_ two cells in the array of text. Examples
   are tag toggles and marks. 

 - a mark is a place holder between two cells that can't be deleted;
   if you delete the cells on either side of a mark, the mark is 
   simply left in between the remaining valid cells on the left 
   and right. The insertion point (cursor) is an example of a mark;
   all other marks are invisible, however.

So I did the following with iterators:

 - iterators are static objects you can place on the stack; 
   no memory management

 - iterators become invalid if any indexable segments are added 
   or removed (i.e. if the number of valid indexes changes). 
   A magic stamp integer as described by Karl is incremented on 
   indexable segment modification; this is used to reliably 
   detect invalid iterators. Invalid iterators should generate
   a g_warning() 100% of the time, they will never simply segfault.

 - This means that non-indexable segments like tags and marks can be
   applied/removed at will without breaking your iterators.

 - the iterator uses a second magic stamp integer to cache a pointer
   to the segment being indexed; this avoids a lot of linear searching
   into lines and makes the iterators quite fast. I also have the 
   iterators caching some other stuff right now, such as their 
   character index into the btree; we can tune exactly what gets
   cached and what gets calculated as we get more empirical use
   data.

   Anyway, the upshot is that operator++ and operator* should both
   be O(1) operations. operator== continues to be O(1). operator-- 
   is perhaps a bit sluggish since everything is singly-linked, 
   but I don't think it's worth the memory overhead to make it 
   faster. It is O(1) as long as you are moving back within a 
   single character segment, it just requires a search from the
   front of the line when you need to go to the previous segment.

 - iterator objects are kind of large, something like 40 bytes;
   I don't think this matters, usually you only have 1 or 2 of them
   at a time.

 - Another suggestion from Karl: froo_tkxt_buffer_insert(),
   froo_tkxt_buffer_delete(), and other functions when appropriate
   will have the convenience feature that they re-initialize
   the iterators they have just invalidated.

   That is, say you do this:

    FrooTkxtIter iter;
    froo_tkxt_buffer_get_iter_at_char(buffer, &iter, 0);
    froo_tkxt_buffer_insert(buffer, &iter, "blah", -1);
    
   The insertion operation invalidates iter because the magic stamp
   was incremented when you changed some indexable segments. However,
   the insert function resets iter so it's still valid (other 
   outstanding iterators have all become invalid, though). 
   
   Basically this allows you to insert a bunch of stuff at a single
   point:
     froo_tkxt_buffer_insert(buffer, &iter, "blah", -1);
     froo_tkxt_buffer_insert(buffer, &iter, "blah blah", -1);
     froo_tkxt_buffer_insert(buffer, &iter, "blah blah blah", -1);
  
   It's very easy to efficiently re-init the iterator after the 
   insert, so there's no real cost to this feature.

   An outstanding issue is what we re-init the iterator to point to
   (start or end of inserted text, end seems like it might be more
   useful but start seems more intuitive).

 - I am going to make marks easier to use, specifically by allowing
   unnamed anonymous marks (right now the Tcl legacy requires you 
   to refer to marks by a string name). Thus there are at least 
   three ways to keep a text position that persists across buffer
   mutations:
     - setting a mark
     - keep the character index (like GtkText)
     - keep the line number and character offset into the line
 
   There are easy functions to convert iterators to/from any of these
   representations. So if you need a persistent position reference
   it should never be a big problem.

 - The iterator does contain a pointer to the buffer, and you can 
   froo_tkxt_iter_get_buffer(). Right now I still have some functions
   that are buffer methods like this:

     froo_tkxt_buffer_insert(buffer, &iter, "blah");

   These could be:
    
     froo_tkxt_iter_insert(&iter, "blah");

   I consider this irrelevant for language bindings (language binders
   could easily wrap things differently, whichever way we choose to 
   do it), but it is an outstanding C API issue.

   The iter_insert() is shorter, but perhaps confusing, since it 
   will result in a signal being emitted on the buffer, and is 
   basically a buffer operation.

Havoc
Follow-Ups:
- Re: iterator resolution
  - From: Karl Nelson
[Date Prev][Date Next] [Thread Prev][Thread Next] [Thread Index] [Date Index] [Author Index]