Re: [Tracker] HitsAdded, HitsRemoved and HitsModified for Xesam
- From: Philip Van Hoof <spam pvanhoof be>
- To: Jamie McCracken <jamiemcc blueyonder co uk>
- Cc: Tracker List <tracker-list gnome org>
- Subject: Re: [Tracker] HitsAdded, HitsRemoved and HitsModified for Xesam
- Date: Wed, 30 Apr 2008 16:25:17 +0200
On Wed, 2008-04-30 at 10:06 -0400, Jamie McCracken wrote:
I'm still a little confused by this.
Due to the indexer split, the non-indexer daemon already knows when a
file has changed (via inotify), but the code you changed is part of the
indexer.
That's correct. We can use this to know whether or not we should start
periodically updating the live searches. For example, if after a few
periods we see no new inotify event and the indexer has not started
indexing, we can shut the per-period check down.
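A minimal sketch of that shut-down rule, in Python for brevity. The class name, method names and the threshold of three idle periods are all made up for illustration; nothing here is taken from the Tracker code. In a GLib main loop the return value of tick() could be returned from the timeout callback, since returning FALSE there removes the timeout source.

```python
class LiveSearchTicker:
    """Decides whether the periodic live-search check should stay scheduled."""

    def __init__(self, max_idle_ticks=3):
        # Assumed threshold: how many quiet periods we tolerate before stopping.
        self.max_idle_ticks = max_idle_ticks
        self.idle_ticks = 0
        self.running = False

    def notify_activity(self):
        """Call on an inotify event or when the indexer starts indexing."""
        self.idle_ticks = 0
        self.running = True

    def tick(self):
        """Call once per period; returns True while the check should continue."""
        if not self.running:
            return False
        self.idle_ticks += 1
        if self.idle_ticks > self.max_idle_ticks:
            # Nothing happened for too long: stop until the next activity.
            self.running = False
        return self.running
```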
What we don't want to do is trigger a check of all the live queries each
time any new piece of material arrives.
Then we might as well just make a trigger that inserts into a virtual
table, and a sqlite-vtable implementation that acts on the ON-INSERT
that checks each live-query and emits a HitsAdded, HitsRemoved or
HitsModified (which I realise would be a significant performance hit).
Hence collecting them in a journal, and periodically handling them.
I would have thought having a GSList in the non-indexer daemon would
suffice (the list would store an Info struct with details about the
changed file, e.g. MIME type and service).
Then periodically, for all live queries, simply iterate over that list,
determine whether the live query needs refreshing, and emit signals if
the results have changed.
Determining that requires evaluating the query. So we need a mechanism
to evaluate whether the live query is affected.
The best mechanism is simply reusing the same mechanism that was used
initially. And that is the query that we converted from the Xesam XML
stuff into the SQL query used to get the GetHits/GetHitsData and fed to
us during the NewSearch.
Else we'll be throwing all HitsAdded, HitsRemoved and HitsModified to
all live queries (because there's no way to determine whether or not the
live query we're currently evaluating was affected by a specific event).
So the "iterate over that list and determine if live query needs
refreshing" is a hard problem to solve ;-), it's not simple.
Does the above make sense?
It does.
Thanks
On Wed, 2008-04-30 at 15:39 +0200, Philip Van Hoof wrote:
FYI,
The diff contains a first look at the tracker-db-sqlite.c file; I added
some comments that illustrate how a journal table "Events" will be
filled up.
Note that the table will most likely become a sqlite memory table.
The reason why I don't think a GHashTable in the C code is as good is
that we want to repeat the query in the TrackerXesamLiveSearch on
this "Events" table (for example with an INNER JOIN with Services).
If it were a GHashTable, that query would either need a lot of OR
clauses (each ServiceID in one OR) or we'd need to do a query for each
item in the table to check whether the items affect a live search.
/me is the master of pseudo code, here I go again! 
For each query in live-search-queries do
  // This one sounds like the best to me. It requires an in-SQLite,
  // in-memory table called "Events"
  SELECT ... FROM Events, Services ... 
    WHERE   Events.ServiceID = Services.ID 
    AND     the live-search-query 
    AND     (ServiceID is in the table)
  // Pro: short arguments list, easy query
  // Con: JOIN (although the cartesian product is relatively small)
or
  // This one doesn't need an "Events" table in sqlite but does need an
  // in-C, in-memory GHashTable holding all the affected ServiceIDs
  SELECT ... FROM Services ... 
    WHERE   the live-search-query 
    AND     (
                       ServiceID = hashtable[0].key
                    OR ServiceID = hashtable[1].key 
                    OR ServiceID = hashtable[2].key
                    OR ServiceID = hashtable[n].key
                    ...
            )
  // Pro: no JOIN
  // Con: long arguments list
done
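The two variants above can be tried out with a small sketch like the following. It uses Python's sqlite3 module and a toy schema: only Events.ServiceID and Services.ID come from the pseudo code, while the Mime and EventType columns, the function names and the LIVE_QUERY condition are stand-ins made up for illustration. The second variant uses IN (...) with bound parameters, which is equivalent to the chain of ORs.

```python
import sqlite3

# Stand-in for the SQL condition derived from the Xesam query (hypothetical).
LIVE_QUERY = "Services.Mime = 'text/plain'"


def make_db():
    """Build a toy in-memory database with a Services and an Events table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE Services (ID INTEGER PRIMARY KEY, Mime TEXT)")
    db.execute("CREATE TABLE Events (ServiceID INTEGER, EventType TEXT)")
    db.executemany("INSERT INTO Services VALUES (?, ?)",
                   [(1, "text/plain"), (2, "image/png"), (3, "text/plain")])
    db.executemany("INSERT INTO Events VALUES (?, ?)",
                   [(1, "Create"), (2, "Update")])
    db.commit()
    return db


def affected_via_join(db):
    """Variant 1: join the journal table with Services inside SQLite."""
    sql = ("SELECT Services.ID, Events.EventType FROM Events, Services "
           "WHERE Events.ServiceID = Services.ID AND " + LIVE_QUERY)
    return db.execute(sql).fetchall()


def affected_via_id_list(db, changed):
    """Variant 2: pass the keys of an in-process table as a parameter list."""
    ids = list(changed)  # for a dict this yields the keys (the ServiceIDs)
    if not ids:
        return []
    placeholders = ", ".join("?" for _ in ids)
    sql = ("SELECT Services.ID FROM Services WHERE " + LIVE_QUERY +
           " AND Services.ID IN (" + placeholders + ")")
    return db.execute(sql, ids).fetchall()
```

With variant 2 the argument list grows with the number of changed items, and SQLite caps how many parameters one statement may bind, so a large batch would have to be split up.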
On Tue, 2008-04-29 at 17:56 +0200, Philip Van Hoof wrote:
Pre note: 
This is about the Xesam support being done (since this week) in the
indexer-split.
About:
Xesam requires notifying live searches about changes that affect them.
We plan to implement this with an "events" table that journals all
creates, deletes and updates that the indexer causes.
Periodically we will handle and then flush the items in that events
table.
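As a rough illustration of such a journal, here is a minimal sketch using Python's sqlite3 module. The schema, the EventType values and the function names are assumptions made up for this example, not the actual Tracker tables or stored procedures. Each write to Services and its journal record go into one transaction, so they are committed or rolled back together.

```python
import sqlite3


def open_db():
    """Create a toy in-memory database with a Services table and a journal."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE Services (ID INTEGER PRIMARY KEY, Path TEXT UNIQUE)")
    db.execute("CREATE TABLE Events (ServiceID INTEGER, EventType TEXT)")
    return db


def create_service(db, path):
    """Insert a service and journal the create in the same transaction."""
    with db:  # commits on success, rolls back if either statement fails
        cur = db.execute("INSERT INTO Services (Path) VALUES (?)", (path,))
        service_id = cur.lastrowid
        db.execute("INSERT INTO Events VALUES (?, 'Create')", (service_id,))
    return service_id


def delete_service(db, service_id):
    """Delete a service and journal the delete in the same transaction."""
    with db:
        db.execute("DELETE FROM Services WHERE ID = ?", (service_id,))
        db.execute("INSERT INTO Events VALUES (?, 'Delete')", (service_id,))
```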
I made a cracktasty diagram that contains the from-a-high-distance
abstract proposal that we have in mind for this.
This is pseudo code that illustrates the periodic handler:
bool periodic_handler (...) 
{
  lock indexer
  update eventstable set beinghandled=1 where 1=1 (all items)
  unlock indexer
  foreach query in all livequeries
     added, modified, removed = query.execute-on (eventstable)
     query.emit_added (added)
     query.emit_removed (removed)
     query.emit_modified (modified)
  done
  lock indexer
  delete from eventstable where beinghandled = 1
  unlock indexer
  return (!stopping)
}
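The pseudo code above could look roughly like this as a runnable sketch, again with Python's sqlite3 module. Several things are assumptions for the sake of the example: the schema, the EventType values, the threading.Lock standing in for "lock indexer", and the emit callback. It also simplifies by mapping Create, Delete and Update events straight to HitsAdded, HitsRemoved and HitsModified, and it assumes the Services row is still present when a Delete event is handled.

```python
import sqlite3
import threading

SCHEMA = """
CREATE TABLE Services (ID INTEGER PRIMARY KEY, Mime TEXT);
CREATE TABLE Events (ServiceID INTEGER, EventType TEXT,
                     BeingHandled INTEGER DEFAULT 0);
"""


def periodic_handler(db, indexer_lock, live_queries, emit, stopping=False):
    """Run one pass; returns True while the handler should be rescheduled."""
    # First short critical section: mark what this pass will handle, so
    # events journalled during the pass are left for the next one.
    with indexer_lock, db:
        db.execute("UPDATE Events SET BeingHandled = 1")

    for name, where_clause in live_queries.items():
        hits = {"Create": [], "Delete": [], "Update": []}
        rows = db.execute(
            "SELECT Events.EventType, Services.ID FROM Events, Services "
            "WHERE Events.ServiceID = Services.ID "
            "AND Events.BeingHandled = 1 AND " + where_clause).fetchall()
        for event_type, service_id in rows:
            hits[event_type].append(service_id)
        # Emit only the signals that actually have hits to report.
        if hits["Create"]:
            emit(name, "HitsAdded", hits["Create"])
        if hits["Delete"]:
            emit(name, "HitsRemoved", hits["Delete"])
        if hits["Update"]:
            emit(name, "HitsModified", hits["Update"])

    # Second short critical section: purge only the events we handled.
    with indexer_lock, db:
        db.execute("DELETE FROM Events WHERE BeingHandled = 1")
    return not stopping
```

Because only rows with BeingHandled = 1 are deleted, anything the indexer journals between the two critical sections survives until the next pass.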
Here's a piece of IRC log between me and jamiemcc about the proposal:
pvanhoof ping jamiemcc 
pvanhoof same thing
pvanhoof I'll make a pdf
jamiemcc oh ok
pvanhoof Sending
pvanhoof ok
pvanhoof so
pvanhoof it's about the hitsadded, hitsremoved and hitsmodified signals for xesam
pvanhoof What we have in mind is using a "events" table that is a journal for all creates, deletes and updates
pvanhoof Periodically we will flush that table, each create (insert), update and each delete we add a record in that table
pvanhoof We'll make sure the table is queryable in a similar fashion as how the Xesam query will execute
pvanhoof In the periodical handler we'll for each live search check whether it got affected by the items in the events table
pvanhoof In pseudo, the handler:
jamiemcc sounds feasible
pvanhoof gboolean periodic_handler (void data) {
pvanhoof   lock indexer
pvanhoof   update eventstable set beinghandled=1 where 1=1 (all items)
pvanhoof   unlock indexer
pvanhoof   foreach query in all live queries
pvanhoof      added, modified, removed = query.execute-on (eventstable)
pvanhoof      query.emit_added (added)
pvanhoof      query.emit_removed (removed)
pvanhoof      query.emit_modified (modified)
pvanhoof   done
pvanhoof   lock indexer
pvanhoof   delete from eventstable where beinghandled = 1
pvanhoof   unlock indexer
pvanhoof }
pvanhoof I've sent you a diagram that you can look at as if it's a state-activity one, an ERD and a class diagram :) now how cool is that?? :)
pvanhoof it's just three columns, although the ERD is quite simplistic of course
jamiemcc yeah just got it
* fritschy (~fritschy 84 19 173 195) has left #tracker
pvanhoof so, the current idea is to adapt those stored procedures into transactions that will also add this record to the "events" table
* fritschy (~fritschy 84 19 173 195) has joined #tracker
pvanhoof Which might not be sufficient, and we kinda lack the in-depth know-how of all the db handling of tracker
pvanhoof So that's a first issue we want to discuss with you
pvanhoof The other is stopping the indexing, restarting it (locking it, in the pseudo code): what you think about that
jamiemcc ok I will need to think about it - I will probably reply later tonight and we can discuss tomorrow
pvanhoof I adapted my initial proposal to have two short critical sections rather than letting the entire periodic handler be one critical section
pvanhoof that way the lock is smaller
jamiemcc the indexer will be separate process so will need to be locked via dbus signals
pvanhoof by just adding a column to the events table
pvanhoof yes but I guess we want any such locking to be short
jamiemcc well yes 
pvanhoof then once the items that are to be handled are identified, we for each live-search check whether the live-search is affected
pvanhoof and we perform the necessary hitsadded, hitsremoved and hitsmodified signals if needed
pvanhoof if all is done, we simply purge the handled items from the events table
jamiemcc the query results will be stored in temp tables
pvanhoof which is the second location where we want the indexer to be locked-out
jamiemcc remember a query may be a cursor so won't include entire result set
pvanhoof No okay, but that's something the check needs to worry about 
pvanhoof so ottela is working on a query for the live-search
jamiemcc ok cool
pvanhoof and if we only want to update if the client has the affected item visible, due to cursor-usage
pvanhoof then i guess we'll somehow need to get that info into trackerd
jamiemcc any reason we don't store what's changed in memory rather than sqlite table?
pvanhoof oh, that's abstract right now
jamiemcc o
jamiemcc ok
pvanhoof "tracker's event table" can also be a hashtable for me ..
jamiemcc yeah fine
pvanhoof implementation detail
pvanhoof since it doesn't need to be persistent ...
pvanhoof difference is that either we use a memory table and still a transaction for the three stored procedures
pvanhoof or we adapt code
jamiemcc prefer hashtable as amount of data will be small
jamiemcc can even be a list
pvanhoof ok, your comments/ideas on this would of course be very useful btw
jamiemcc yeah I will think about it more tonight and get back to you
pvanhoof sounds great
pvanhoof I'll make a mail about this to the mailing list? or I await your ideas tomorrow?
pvanhoof I'll just wait for now
jamiemcc you can mail if you like
jamiemcc I will reply to it
_______________________________________________
tracker-list mailing list
tracker-list gnome org
http://mail.gnome.org/mailman/listinfo/tracker-list
-- 
Philip Van Hoof, freelance software developer
home: me at pvanhoof dot be 
gnome: pvanhoof at gnome dot org 
http://pvanhoof.be/blog
http://codeminded.be