Re: New module proposal: tracker



On 06/11/09 13:08, Alexander Larsson wrote:
On Fri, 2009-11-06 at 11:55 +0000, Martyn Russell wrote:
On 06/11/09 10:54, Alexander Larsson wrote:
On Fri, 2009-11-06 at 10:15 +0000, Martyn Russell wrote:
That's cool. Although I don't think CPU load per se is the main problem.
CPU scheduling is pretty easy to control such that a process only runs
when nothing else runs. The main problem is i/o costs (increased amount
of seeks causing degradation of application i/o) and general VM
behaviour (filling buffer caches, bumping out other apps from memory,
etc). These things are much much harder to control and measure.

Yea, I agree. This doesn't happen that often though. We only crawl once
on start up (which is quite cheap in my experience) and we are playing
with the idea of only doing it initially for first time indexes. About
this idea, the question is, how can we guarantee that we don't miss file
updates made between the computer shutting down and the next boot? There
is also the case where we restart the monitoring daemon and miss updates.

There have been discussions on the lkml about having persistent recursive
mtimes. That would solve this on filesystems that support it. Without
this all we can do is crawl on each startup, although if it's not the
first indexing time such crawling can be done on an even lower prio
(with timeouts now and then perhaps). It is also less likely to cause
i/o starvation, because we mainly read directory entries, not files.
However, it would still read all inodes for all files in your homedir
which is a bunch of HD seeks and may use a fair amount of the buffer
cache.
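
To make the "crawl on each startup" fallback concrete, here is a minimal sketch of detecting changes that happened while no monitor was running, by comparing current mtimes against a stored snapshot. This is an illustration only, not Tracker's code: the function name `find_changes` and the in-memory snapshot dictionary are assumptions made for the example.

```python
import os

def find_changes(root, known_mtimes):
    """Walk root and compare file mtimes against a previous snapshot.

    known_mtimes maps path -> st_mtime_ns from the last crawl (hypothetical
    storage; a real indexer would keep this in its database).
    Returns (changed_or_new, removed, new_snapshot).
    """
    seen = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # One lstat() per file: this is the inode read discussed above.
                seen[path] = os.lstat(path).st_mtime_ns
            except OSError:
                continue  # file vanished between readdir and stat
    changed = sorted(p for p, m in seen.items() if known_mtimes.get(p) != m)
    removed = sorted(p for p in known_mtimes if p not in seen)
    return changed, removed, seen
```

Note that this still touches every inode under `root`, which is exactly the cost that persistent recursive mtimes would avoid.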

Yea, we would love to see recursive mtimes. We would still have to do FULL crawling on Windows file systems though :/ I guess that can't be avoided. This really only impacts removable devices.

The seeking is hard to avoid, but there may be ways to readdir stuff
without having the result be persistent in the buffer cache, although
I'm not sure how posix_fadvise(POSIX_FADV_DONTNEED) can be applied when
reading a directory...

Yea. We need to research this a bit more to make sure we are doing everything we can. We did try the posix_fadvise() call and it was fine on the desktop, but on small devices like the N900 it caused a slowdown, so we dropped it.
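
For regular files (the directory case raised above remains an open question), the posix_fadvise(POSIX_FADV_DONTNEED) hint can be sketched roughly as follows. This is a minimal illustration, not what Tracker did: the helper name `read_without_caching` is made up for the example, and the call is only a hint that the kernel is free to ignore.

```python
import os

def read_without_caching(path, chunk_size=1 << 16):
    """Read a whole file, then hint that its pages need not stay cached."""
    fd = os.open(path, os.O_RDONLY)
    try:
        chunks = []
        while True:
            chunk = os.read(fd, chunk_size)
            if not chunk:
                break
            chunks.append(chunk)
        # posix_fadvise is not available on every platform, so guard it.
        if hasattr(os, "posix_fadvise"):
            # offset=0, len=0 means "from the start to the end of the file".
            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
        return b"".join(chunks)
    finally:
        os.close(fd)
```

Whether this helps depends on the device; as noted above, it was fine on the desktop but caused a slowdown on the N900, so it is worth measuring before adopting.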

I should add, we have significantly reduced the number of times we call stat() (going from 0.6 to 0.7). Previously we used stat() in trackerd, then again in tracker-indexer, then did something in tracker-extract (which usually involved opening the file). I think we did it several times in some cases too. Just looking at our code again, I think I have found a way to reduce the calls again :)

Also, there are other tricks you can use, like sorting the readdir()
results by inode before stat()ing them to minimize seeking. Also, one
could check whether struct dirent->d_type is DT_DIR to avoid stat()ing
directories when crawling (for the systems that support this).
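
Both tricks can be sketched in a few lines. The example below is an illustration under stated assumptions, not Tracker's implementation: the function name `crawl` is invented, and it relies on Python's `os.scandir()`, whose `DirEntry.is_dir(follow_symlinks=False)` uses the directory entry's type information when the filesystem provides it (falling back to a stat call otherwise) and whose `DirEntry.inode()` returns the inode number from the directory entry on POSIX systems.

```python
import os

def crawl(root):
    """Yield (path, stat_result) for every non-directory entry under root."""
    pending = [root]
    while pending:
        directory = pending.pop()
        try:
            with os.scandir(directory) as it:
                entries = list(it)
        except OSError:
            continue  # unreadable or vanished directory
        files = []
        for entry in entries:
            try:
                # Uses d_type where available, so no stat() for directories.
                is_dir = entry.is_dir(follow_symlinks=False)
            except OSError:
                continue
            if is_dir:
                pending.append(entry.path)
            else:
                files.append(entry)
        # Stat in inode order rather than readdir order to reduce seeking.
        files.sort(key=lambda e: e.inode())
        for entry in files:
            try:
                yield entry.path, os.lstat(entry.path)
            except OSError:
                continue  # file vanished between readdir and stat
```

How much the inode sort helps depends on the filesystem's on-disk layout, so it is best treated as something to measure rather than a guaranteed win.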

Thanks for the input here Alex, very useful stuff.

--
Regards,
Martyn

