Re: Two questions about medusa



>>>>> Curtis Hovey writes:

 > On Tue, 2003-05-06 at 07:00, David C Sterratt wrote:
 >> > I've toyed with the plugin idea as it might get some things done
 >> > quickly.  I'd like to bring some intelligence to what is
 >> > indexed, and the plain text indexer cannot handle that.  XML
 >> > content like OpenOffice is very rich and it would lose some of
 >> > its meaning and relevance if it were crudely converted to
 >> > plain text.  PDFs don't have any meaning. They would be fine in
 >> > your solution.  We need to weigh the capability of adding ad hoc
 >> > indexers versus their potential dependencies.
 >>
 >> How would one use the rich semantics in openoffice files as search
 >> terms?  At the moment, the searching semantics only allow for
 >> "contains any or all of".  I suppose you could extend them to
 >> things like "author matches", but that might be confusing if some
 >> of the other documents you're searching don't have the rich
 >> semantic information, since you wouldn't be able to retrieve (say)
 >> PDF files written by a particular author, but you would get OO
 >> files written by them.

 > I wasn't thinking about search terms, but ranking of relevance.
 > Medusa returns the matches, but there is no attempt to locate the
 > best match, or to rank them from best to worst.  One method to
 > accomplish this is to record not just the words in a document, but
 > their incidence too.
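
To make the incidence idea concrete, here is a minimal sketch in Python.
It is not how Medusa works today; the function names, file names and
contents are all made up for illustration, and the score is just the
summed count of each search term in a document.

```python
from collections import Counter
import re


def index_document(text):
    """Record each word together with how often it occurs (its incidence)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def rank(index, query):
    """Order matching documents from best to worst by summed term counts."""
    terms = query.lower().split()
    scores = {}
    for name, counts in index.items():
        # A Counter returns 0 for words it has never seen.
        score = sum(counts[t] for t in terms)
        if score > 0:
            scores[name] = score
    # Highest score first; ties are broken by file name.
    return sorted(scores, key=lambda n: (-scores[n], n))


# Hypothetical file names and contents, for illustration only.
index = {
    "notes.txt": index_document("medusa indexes files; medusa searches files"),
    "todo.txt": index_document("try medusa once"),
    "misc.txt": index_document("nothing relevant here"),
}
print(rank(index, "medusa files"))
```

A raw count favours long documents, so a real ranking would probably
want some normalisation by document length as well.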

If you're thinking of relevance ranking, were you also thinking of
presenting the results in a format similar to Google or htdig with
some context of the search terms?  Although it's not a feature that
I'm desperate for, I've heard at least one person say that they'd like
"Google for their filespace".  Of course, part of that is the ranking
as well, but it might be nice to have the context too.
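
By context I mean something like the rough Python sketch below, which
pulls out the first occurrence of a search term with a few characters
either side.  The function name and the width parameter are my own
invention, and it assumes the indexer can get at the plain text of the
file.

```python
import re


def context(text, term, width=30):
    """Return the first occurrence of term with up to `width`
    characters of surrounding text on each side, or None if absent."""
    match = re.search(re.escape(term), text, re.IGNORECASE)
    if match is None:
        return None
    start = max(match.start() - width, 0)
    end = min(match.end() + width, len(text))
    # Mark the places where the surrounding text was cut off.
    prefix = "..." if start > 0 else ""
    suffix = "..." if end < len(text) else ""
    return prefix + text[start:end] + suffix


# Hypothetical document text, for illustration only.
sample = "Medusa returns the matches, but there is no attempt to rank them."
print(context(sample, "rank", width=10))
```

This cuts in the middle of words; snapping to word boundaries would
look nicer in a results list.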

David



