Indexing PDF



David C Sterratt <david c sterratt ed ac uk> proclaimed:
>* Content indexers for PDF, OpenOffice and Word documents.  If it
>  really is quite simple to add support for these (e.g. by specifying
>  command line tools to call) and someone could tell me where to look
>  in the code, I'd be happy to work on this.

For PDF, take a look at "pdftotext".  On Red Hat Linux it's part of the
xpdf package; the URL given is "http://www.foolabs.com/xpdf".
It's probably sufficient for indexing, at least as a first cut.
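
As a rough sketch of the "command line tool" approach, something like the
following could wrap pdftotext. The function names and the injectable "run"
parameter are made up for illustration and are not part of any indexer's
code; the sketch also assumes pdftotext is on the PATH and that decoding its
output as UTF-8 (with replacement of bad bytes) is good enough for indexing.

```python
import subprocess


def pdftotext_command(pdf_path):
    """Build the argument list for pdftotext.

    Passing "-" as the output file asks pdftotext to write the
    extracted text to standard output instead of a .txt file.
    """
    return ["pdftotext", pdf_path, "-"]


def extract_text(pdf_path, run=subprocess.run):
    """Return the text of a PDF, or "" if the conversion fails.

    `run` is a hypothetical hook (defaulting to subprocess.run) so the
    function can be exercised without pdftotext installed.
    """
    try:
        result = run(pdftotext_command(pdf_path), capture_output=True, check=True)
    except (OSError, subprocess.CalledProcessError):
        # pdftotext is missing, or it rejected the file: index nothing.
        return ""
    # Assumption: treat the output as UTF-8 and replace undecodable bytes.
    return result.stdout.decode("utf-8", errors="replace")
```

An indexer could then feed the result of extract_text("some.pdf") to
whatever it already uses for plain-text files. Returning an empty string on
failure is one possible choice; logging the error may be preferable.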


--- David A. Wheeler
    dwheeler ida org




