On Tue, 2005-11-15 at 09:50 +1100, Peter Harvey wrote: > Hi all, > > I think people are generally getting tired of EphyNode and some of it's > issues. In many ways the basic *idea* is nice, but it's just become too > complicated. A proposal to replace the bookmarks/history databases > (which are EphyNode collections) with SQLite was floated. I'm going to > take a shot at defining the functions of an SQLite-backed database for > history/bookmarks. Sounds like a lot of fun! > Second, memory usage can be a pain. I believe we currently keep in a > copy of all the information for each bookmark, and *then* make a copy > for each widget which uses that information. This is a GTK+ thing, and > unavoidable. So using SQLite doesn't make memory usage any worse. In fact, since it's disk-based, we're essentially determining an upper bound on memory usage at compile-time (not counting GTK's memory usage). For large databases, memory usage would be better than with EphyNode. (The break-even point might be at several thousand nodes, depending on our choice of cache size, though.) > Database layout (warning: I am not a database person). > ------------------------------------------------------ (I am) > Each 'node' of information is identified by a unique integer. We store > the statement "node X belongs to a set Y" as "node X is a child of node > Y". > > table 'set' > int parent # index on this field > int child > > table 'pages' > int id # primary key > string url # url > string title # title from last visit > string icon # icon from last visit > string btitle # NULL if not bookmarked otherwise it's bookmark title > string bicon # NULL if should default to icon > date visit # date of visit > > table 'sites' > int id # primary key > string name # things like 'slashdot.org' or 'abc.com' > > table 'topics' > int id # primary key > string title # title of the topic You want 'sites' and 'topics' to share a primary key: this means they could share the same table, which would provide more extensibility. table 'groupings' int id # primary key short int group_type_id # type of grouping: enum for 'sites'/'topics' int name # topic or site name (To me, this type of group_type_id is reminiscent of EphyNode's database IDs.) If we could make group_type_id more abstract, we might even be able to let extensions define arbitrary group types. Assigning IDs could be tricky, though (just as it is now with EphyNode DB IDs). This may not be necessary in a preliminary API, but it's a problem we've already had with EphyNode, so it's something to keep in mind. Somebody mentioned metadata. Maybe we should take a page out of Storage's book? There is an effective design pattern in Storage's source code on GNOME CVS, but I'd definitely leave that out of any initial implementation. (It would be added in separate tables keyed by the same ID as in 'pages'.) The ephy_db_query_* API would need to be modified (maybe using varargs, like g_object_get()) if we want to go down that route. I'm not sure I like the "pages.b*" fields (they're mostly NULL, which is icky), but I can't think of a better method right now. > Basic query mechanisms > ---------------------- > An example query for ephy_db_query_sites is: > SELECT <fields from data bitmask> FROM > (SELECT DISTINCT child FROM set WHERE parent=<parent>) set, > pages > WHERE pages.id = set.child > GROUP BY <fields from group bitmask>; 1. You're missing ORDER BY -- one of the key advantages of using SQLite is that it can help us order stuff very quickly. 2. By my understanding, that DISTINCT is superfluous. Am I missing something? 3. That's a rather wonky syntax (though it may be faster, I don't know). A more standard one would avoid subselects: SELECT <fields from data> FROM pages INNER JOIN set ON pages.id = set.child WHERE set.parent = <parent> GROUP BY <fields from group> ORDER BY <fields from group> Though your query may be faster: with my previous experience with SQLite, I've noticed it handles subselects a lot more easily than joins. Maybe my proposal is a step backwards. If I recall correctly, my optimized queries looked like this on my last project: SELECT <fields from data> FROM pages WHERE id IN (SELECT child FROM set WHERE parent = <parent>) GROUP BY <fields from group> ORDER BY <fields from group> I recommend trying all three and seeing which works best :). > Basic data mechanisms > --------------------- > > A simple caching scheme will allow multiple ephy_db_query_page calls > (made in quick session to the same node) to be performed very fast. A > simple getting, setting and signaling scheme will also make life easy. SQLite already does caching. We could save a lot of work by taking advantage of it. All we have to do is specify a cache size. (This would work extremely well with the structure above, but maybe BLOB metadata would be a problem....) > Now, the callbacks are called whenever a node or the child of a parent > node changes (respectively). The callback is passed a bitmask of the > data fields that changed. > > To know what parents to signal we will need to execute a query on the > database. This is slow, but very rare, and will be very fast if we just > cache information for the most recently touched/created node. "SELECT parent_id FROM groupings WHERE child_id = <child>" -- this will be *blazingly* fast, considering every modification of a child requires user interaction. > We may need some specialist functions, but they would become trivial to > implement. In particular a "this child was added to a parent" would be a > nice signal and data combo. Yeah, this would be a really awesome hook for extensions. > OK, that's it. Let me know what you think. I, for one, love it :). Adam
Attachment:
signature.asc
Description: This is a digitally signed message part