Re: Proposed module: tracker



Danilo Šegan wrote:
Hi Jamie,

Hi,


Yesterday at 20:13, Jamie McCracken wrote:

Also, I would add that the sophisticated Snowball stemmers that tracker uses allow for more accurate searches: e.g. "Fly" and "Flies" will also match "Flying".
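To make the stemming idea concrete, here is a minimal Python sketch. It is a toy suffix stripper written only for illustration, not the Snowball algorithm and not tracker's code; the names toy_stem and matches are hypothetical. It only shows why reducing the query and the indexed words to a common stem lets "Fly", "Flies" and "Flying" match each other.

```python
def toy_stem(word):
    """Reduce a word to a crude stem (toy rules, NOT the Snowball algorithm)."""
    w = word.lower()
    if w.endswith("ies") and len(w) > 4:
        return w[:-3] + "y"      # flies -> fly
    if w.endswith("ing") and len(w) > 5:
        return w[:-3]            # flying -> fly
    if w.endswith("s") and not w.endswith("ss") and len(w) > 3:
        return w[:-1]            # trackers -> tracker
    return w


def matches(query, text):
    """Return True if any whitespace-separated word in text shares a stem with query."""
    target = toy_stem(query)
    return any(toy_stem(token) == target for token in text.split())
```

With these rules, matches("Flies", "a bird Flying south") is true because both words reduce to the same stem. A real stemmer applies far more careful, language-specific rules.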

Now we're getting to an interesting I18N point (and you should take
into account that I am in no way an NLP expert): how do these stemmers cope
with languages other than English?

See http://snowball.tartarus.org/

Tracker has stemmers and stopword lists for all the languages listed there.

The stemmer is currently chosen either by the --language parameter passed to trackerd or by your locale.
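As a rough sketch of that selection logic (in Python, not tracker's actual code): the function name pick_stemmer, the language table, the assumption that --language takes a two-letter code, and the rule that the explicit parameter wins over the locale are all assumptions made for illustration.

```python
import re

# Illustrative subset: locale language code -> Snowball stemmer name.
# The real list is whatever the Snowball site provides.
STEMMERS = {
    "da": "danish", "nl": "dutch", "en": "english", "fi": "finnish",
    "fr": "french", "de": "german", "it": "italian", "pt": "portuguese",
    "ru": "russian", "es": "spanish", "sv": "swedish",
}


def pick_stemmer(cli_language=None, locale_name=None):
    """Return a stemmer name, or None if no stemmer applies.

    Assumption: an explicit language parameter takes precedence over the locale.
    """
    if cli_language:
        code = cli_language.lower()
    elif locale_name:
        # "de_DE.UTF-8" -> "de", "sr_RS@latin" -> "sr", "C" -> "c"
        code = re.split(r"[_.@]", locale_name)[0].lower()
    else:
        return None
    return STEMMERS.get(code)
```

For example, pick_stemmer(None, "de_DE.UTF-8") gives "german", while a locale with no entry in the table gives None, meaning words would be indexed unstemmed in this sketch.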

We will also look at the suggestions in this bug:
http://bugzilla.gnome.org/show_bug.cgi?id=377891

So supporting language detection and applying the appropriate stemmer is something we will be looking into.


And can we add subtle features like Google search has had with
script transliteration mappings (for example, it will search both
Serbian Cyrillic and Latin texts if you input Serbian Latin; for
GNOME, I'd like the search to work both ways in Serbian locale)?


We would first need the code to do the transliteration (or an optional library we could use to do it).

If that code is reasonable, then I don't have a problem supporting it.
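For reference, the kind of code being discussed could look roughly like the Python sketch below. It is only an illustration under stated assumptions: the function names (to_cyrillic, to_latin, query_variants) are hypothetical, input is lowercased for matching, and the digraphs lj, nj and dž are always treated as single letters, which is wrong for the few words where they are two separate letters (e.g. "injekcija").

```python
# Serbian Latin -> Cyrillic, single letters.
LATIN_TO_CYRILLIC = {
    "a": "а", "b": "б", "v": "в", "g": "г", "d": "д", "đ": "ђ", "e": "е",
    "ž": "ж", "z": "з", "i": "и", "j": "ј", "k": "к", "l": "л", "m": "м",
    "n": "н", "o": "о", "p": "п", "r": "р", "s": "с", "t": "т", "ć": "ћ",
    "u": "у", "f": "ф", "h": "х", "c": "ц", "č": "ч", "š": "ш",
}
# Two-letter Latin sequences that map to one Cyrillic letter.
DIGRAPHS = {"lj": "љ", "nj": "њ", "dž": "џ"}

# Reverse table: every Cyrillic letter maps back to its Latin spelling.
CYRILLIC_TO_LATIN = {
    cyr: lat for lat, cyr in {**LATIN_TO_CYRILLIC, **DIGRAPHS}.items()
}


def to_cyrillic(text):
    """Transliterate Latin letters to Cyrillic; other characters pass through."""
    text = text.lower()
    out = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in DIGRAPHS:          # try digraphs before single letters
            out.append(DIGRAPHS[pair])
            i += 2
        else:
            out.append(LATIN_TO_CYRILLIC.get(text[i], text[i]))
            i += 1
    return "".join(out)


def to_latin(text):
    """Transliterate Cyrillic letters to Latin; other characters pass through."""
    return "".join(CYRILLIC_TO_LATIN.get(ch, ch) for ch in text.lower())


def query_variants(query):
    """Return the query in both scripts so a search can try each form."""
    return {to_latin(query), to_cyrillic(query)}
```

A search front end could then run each string returned by query_variants against the index, so that a query typed in either script finds text written in both.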

--
Mr Jamie McCracken
http://jamiemcc.livejournal.com/



