Re: New module proposal: tracker



> at low I/O priority, without unpleasantly degrading system performance.
> I imagine the sheer seek cost of pulling all those dentries, inodes into
> memory, and evicting all the other useful data you had around - is a big
> part of the plague. Hopefully btrfs will improve the situation somewhat
> here, but wrt. inode / dentry management I suspect there is no really
> good solution.

On rotating media it's seek and access times. This is amplified on most
older systems by the fact ATA devices had no queueing interface so the
drive couldn't do any smart re-ordering to extract further parallelism.
SSD is more important here than btrfs. Filesystems can try to be clever
and hide the fact rotating media sucks for latency versus processing
power, but only SSD actually fixes the problem properly.

> 	Unfortunately, as soon as we have this, it is only a small
> feature-creep step to "lets index all .c/.h files to extract comments in
> the API documentation" - which (I suspect) then commits you to the
> disaster of irritating a lot of developers - so they turn it off, and
> getting bogged down indexing things no-one is ever going to want indexed
> by tracker (?).

I think there's a mistaken assumption here. The actual indexing has a fairly high
cost. The cost of extracting metadata while indexing ought to be
relatively low in comparison. That argues that allowing stuff to plug
into the indexing based on file type is useful. It's not really function
creep either given the only interface the indexer needs is

	- who is associated with this file type (which exists)
	- give me your metadata for this file content

and if there is nobody wanting to do so then who cares. If apps provide
the interface for metadata extraction (into a tag soup or something) then
if you don't have the app installed you won't index for it. Document to
tag ought to be fast.
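
To make the two-call interface above concrete, here is a rough sketch in
Python. Everything in it is made up for illustration (the names register,
extract_metadata and c_comment_count, and the use of MIME-style type
strings); it is not tracker's actual API, just one way the "who handles this
type" and "give me your metadata" calls could look.

```python
from typing import Callable, Dict, List

# Hypothetical names throughout: this is an illustration, not tracker's API.
# An extractor takes file content and returns a flat "tag soup" of metadata.
Extractor = Callable[[bytes], Dict[str, str]]

# Maps a file type to the extractors that installed apps have registered.
_extractors: Dict[str, List[Extractor]] = {}


def register(file_type: str, extractor: Extractor) -> None:
    """An app announces that it can extract metadata for a file type."""
    _extractors.setdefault(file_type, []).append(extractor)


def extract_metadata(file_type: str, content: bytes) -> Dict[str, str]:
    """Ask every registered extractor for tags; no extractor means no tags."""
    tags: Dict[str, str] = {}
    for extractor in _extractors.get(file_type, []):
        tags.update(extractor(content))
    return tags


def c_comment_count(content: bytes) -> Dict[str, str]:
    """Toy extractor: counts the '/*' comment openers in C source."""
    return {"comment_blocks": str(content.count(b"/*"))}


# Only happens if the (hypothetical) C helper app is installed.
register("text/x-csrc", c_comment_count)
```

If nothing has registered for a type, extract_metadata returns an empty
dict and the indexer pays nothing beyond the lookup.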

> 	Personally, I'd start by ignoring any directory tree with a configure*
> script in the top-level, or perhaps a .git / .svn directory - that
> should reduce the inotify pain :-)
> 
> 	So - my point is: are the devs fetching source code at the console -
> that you are concerned about above, really in the target audience for
> tracker ? and if so why ? 

How about "who sent that patch, what are the related emails and when were
they last on irc" - a classic developer query. Possibly bundled in with
"do I have a picture of them" (conferences) and "who are their close
friends" (other ways to get hold of and see connections), "where are they
right now" (irc connecting address, email headers and geodata for IP
addresses). Or in short - developers are not different. A lawyer wants to
do the same thing within a firm for a case note, a CAD designer for a
design change, a secretary for letters, etc.

Physical indexing (the file walking side), extracting meaning and query
processing are three unrelated tasks. In your developer case if I've got
various git helpers installed it would be nice that the indexer bothered
to talk to the git plugins about source code and git trees. If I don't
have them installed it doesn't need to - it's a modular problem.
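
As a rough illustration of keeping those three tasks apart, here is a small
Python sketch. The function names and the extension-keyed plugin table are
assumptions made up for the example, not how tracker is actually
structured.

```python
import os
from typing import Callable, Dict, Iterator, List

Tags = Dict[str, str]


def walk_files(root: str) -> Iterator[str]:
    """Stage 1, physical indexing: yields file paths and knows nothing else."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            yield os.path.join(dirpath, name)


def extract(path: str, plugins: Dict[str, Callable[[str], Tags]]) -> Tags:
    """Stage 2, extracting meaning: ask a plugin only if one is installed.

    Plugins are keyed by file extension here purely for simplicity.
    """
    plugin = plugins.get(os.path.splitext(path)[1])
    return plugin(path) if plugin else {}


def build_index(root: str, plugins: Dict[str, Callable[[str], Tags]]) -> Dict[str, Tags]:
    """Glue: the walker feeds the extractors, and the tags get stored."""
    return {path: extract(path, plugins) for path in walk_files(root)}


def query(index: Dict[str, Tags], key: str, value: str) -> List[str]:
    """Stage 3, query processing: works on stored tags, never touches disk."""
    return sorted(p for p, tags in index.items() if tags.get(key) == value)
```

With an empty plugin table the walk still runs, every file just ends up
with no tags, which is the "if I don't have them installed" case.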

Maybe you also need to learn what types of metadata people use the most
for presentation (e.g. by what links they follow) but that's another story
in the UI anyway.

Alan

