On 11/01/2013 05:37 AM, Martyn Russell wrote:
So there are a few things... first, I would check that the file is indexed before searching ... if it isn't then you won't find those words.
I thought I confirmed it was indexed with my previous example tracker-search that returned a result for it -- albeit by its name, not its contents.
$ tracker-search --disable-color -l 1000 pdf
Results:
...
file:///home/brian/tmp/2013-10-26-3.pdf 2013-10-26-3.pdf
Note that tracker-extract does not index the file, it just extracts the information,
Yes, understood. I just wanted to confirm that the extractor was finding something for the indexer to index.
usually tracker-miner-fs calls APIs to talk to tracker-extract. The example above is really just a way to see what we find in a file you specify on the command line.
Indeed. That's exactly what I wanted. The first step in debugging the processing chain.
Is the file above file:///home/brian/tmp/2013-10-26-3.pdf ?
Yes. With the real (and confidential) content replaced in the example with the "list\nof\nwords\nseparated\nby\ncarriage-returns\n". I guess you will just have to trust that I used one of the words from the real nie:plainTextContent in my search query. :-/
To make sure the file is indexed, you can use tracker-control -f $FILENAME and it should take care of that for you.
OK. Let me give that a whirl.

$ tracker-control -f tmp/2013-10-26-3.pdf
(Re)indexing file was successful

And a search for strings in that file was successful. So was it always there, or did the "tracker-control -f ..." cause it to be there? Let's find another example file to work on...

I had another copy of the same file in the same directory named 2013-10-26-2.pdf (-2 instead of -3), and the search for the string "RT0001" only returned the -3.pdf result. Then I ran:

$ tracker-control -f tmp/2013-10-26-2.pdf

and then magically (well, not really so much magic) the -2.pdf file was being included in the results.

So, there are definitely PDF files with text on my filesystem that are not indexed but will get indexed when "tracker-control -f" is pointed at them.

I guess there is not much to do here but wipe the whole database and start a brand new scan from scratch, right? I mean, otherwise who knows what's been properly indexed and what has not. Just to make sure I am doing it correctly, what method would you like me to use to do that complete wipe/rescan?
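
Short of a full wipe, the per-file reindex that worked above could be looped over a directory. This is only a minimal sketch, assuming "tracker-control -f" takes one path per invocation as in the runs above; the function name reindex_pdfs and the TRACKER_CONTROL override are hypothetical and exist only for illustration.

```shell
#!/bin/sh
# Hypothetical helper: force (re)indexing of every PDF directly inside a directory.
# It relies only on "tracker-control -f FILE", the invocation used earlier in this thread.
# TRACKER_CONTROL is an assumed override hook (not a Tracker feature) for dry runs.
reindex_pdfs() {
    dir=${1:-.}
    for f in "$dir"/*.pdf; do
        # Skip the unexpanded pattern when the directory holds no PDFs.
        [ -e "$f" ] || continue
        # Report failures but keep going with the remaining files.
        "${TRACKER_CONTROL:-tracker-control}" -f "$f" || echo "failed: $f" >&2
    done
}
```

Calling "reindex_pdfs tmp" would then run the same command I ran by hand, once per PDF in tmp. It does not recurse into subdirectories and does not tell me which files were missing from the index beforehand, so it is a workaround rather than an answer to the wipe/rescan question.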
We are on IRC if you need more help... let us know :)
Will do, thanks. b.