Indexing PDF

>>>>> David Wheeler writes:

 > David C Sterratt <david c sterratt ed ac uk> proclaimed:
 >> * Content indexers for PDF, OpenOffice and Word documents.  If it
 >> really is quite simple to add support for these (e.g. by
 >> specifying command line tools to call) and someone could tell me
 >> where to look in the code, I'd be happy to work on this.

 > For PDF, take a peek at "pdftotext".  On Red Hat Linux it's part of
 > the xpdf package; the URL given is "".
 > Probably sufficient for indexing, at least as a first cut.

Thanks for the tip.

Perhaps I didn't make it clear, but I was really wondering where in
the medusa codebase programs like pdftotext could be hooked in.  If I
understood the medusa code correctly from my quick look today, at
least a C file or two is needed to implement each type of converter...
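
To make the "call a command line tool" idea concrete, here is a rough C
sketch of what the converter side might look like.  It is not taken
from medusa: the function names shell_quote, read_command_output and
pdf_to_text are made up for illustration, and how the result would be
handed to medusa's indexer is exactly the open question above.  It
assumes a POSIX system (popen) and a pdftotext on the PATH, and relies
on pdftotext writing to standard output when the output file is "-".

```c
#define _POSIX_C_SOURCE 200809L  /* for popen() and pclose() */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: wrap s in single quotes for /bin/sh, escaping any
 * embedded single quotes.  Returns a malloc'd string, or NULL if the
 * allocation fails. */
char *shell_quote(const char *s)
{
    assert(s != NULL);
    size_t len = 3;                      /* two quotes plus the NUL */
    for (const char *p = s; *p; p++)
        len += (*p == '\'') ? 4 : 1;     /* ' is written as '\'' */
    char *out = malloc(len);
    if (out == NULL)
        return NULL;
    char *q = out;
    *q++ = '\'';
    for (const char *p = s; *p; p++) {
        if (*p == '\'') {
            memcpy(q, "'\\''", 4);
            q += 4;
        } else {
            *q++ = *p;
        }
    }
    *q++ = '\'';
    *q = '\0';
    return out;
}

/* Hypothetical helper: run cmd through the shell and return everything it
 * writes to stdout as a malloc'd, NUL-terminated string.  Returns NULL if
 * the command cannot be started, exits non-zero, or memory runs out. */
char *read_command_output(const char *cmd)
{
    assert(cmd != NULL);
    FILE *pipe = popen(cmd, "r");
    if (pipe == NULL)
        return NULL;
    size_t cap = 4096, len = 0;
    char *buf = malloc(cap);
    if (buf == NULL) {
        pclose(pipe);
        return NULL;
    }
    size_t n;
    while ((n = fread(buf + len, 1, cap - len - 1, pipe)) > 0) {
        len += n;
        if (len + 1 == cap) {            /* buffer full: double it */
            char *tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {
                free(buf);
                pclose(pipe);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
    }
    buf[len] = '\0';
    if (pclose(pipe) != 0) {             /* converter reported failure */
        free(buf);
        return NULL;
    }
    return buf;
}

/* Hypothetical converter entry point: extract the text of a PDF by
 * running "pdftotext <path> -".  The caller frees the result. */
char *pdf_to_text(const char *path)
{
    char *quoted = shell_quote(path);
    if (quoted == NULL)
        return NULL;
    size_t size = strlen("pdftotext ") + strlen(quoted) + strlen(" -") + 1;
    char *cmd = malloc(size);
    char *text = NULL;
    if (cmd != NULL) {
        snprintf(cmd, size, "pdftotext %s -", quoted);
        text = read_command_output(cmd);
        free(cmd);
    }
    free(quoted);
    return text;
}
```

A caller would do something like "char *text = pdf_to_text(path);",
pass the text to whatever does the indexing, and free() it afterwards.
Keeping read_command_output separate from the pdftotext-specific part
means other external converters could reuse it with a different
command line.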


 > --- David A. Wheeler
 >      dwheeler ida org
