Re: Stemmed search configuration



> Is it possible to configure the stemmed search feature for languages
> other than English (e.g. Danish)?

Not yet. The main problem is deciding the language of the data/metadata for 
each document. Only a few data sources (some HTML files, and probably emails) 
specify the language of their data.

Beagle has the means to use a different stemmer for each document, but not 
for the different metadata fields within a single document. For most 
documents, only some data/metadata fields are in a different language and the 
others are generally in English. It will be hard to get it right every time, 
so currently we just default to English.
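
To make the mixed-language problem concrete, here is a rough sketch in Python. 
It is not Beagle code. The two stemmers are toy suffix strippers invented for 
this example (not the real Snowball algorithms), and the field names and the 
field_language table are hypothetical. Knowing that table is exactly the part 
Beagle cannot do reliably today.

```python
# Toy suffix-stripping stemmers, for illustration only. They are NOT the
# real English/Danish stemming algorithms an indexer would use.
def toy_english_stem(word):
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def toy_danish_stem(word):
    for suffix in ("erne", "ene", "er", "en", "e"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


STEMMERS = {"English": toy_english_stem, "Danish": toy_danish_stem}


def index_terms(fields, language_for_field):
    """Stem every word of every field, picking a stemmer per field."""
    terms = set()
    for name, words in fields.items():
        stem = STEMMERS[language_for_field(name)]
        terms.update(stem(word.lower()) for word in words)
    return terms


def matches(query, language, terms):
    """Stem the query with the given language's stemmer and look it up."""
    return STEMMERS[language](query.lower()) in terms


# Hypothetical document: a Danish title plus an English metadata field.
fields = {"title": ["hundene"], "filetype": ["files"]}
field_language = {"title": "Danish", "filetype": "English"}

english_only = index_terms(fields, lambda name: "English")
danish_only = index_terms(fields, lambda name: "Danish")
per_field = index_terms(fields, lambda name: field_language[name])
```

With one stemmer for the whole document, one of the two fields always loses: 
english_only leaves the Danish title unstemmed, so a query for "hunde" misses 
it, while danish_only leaves "files" unstemmed, so a query for "file" misses 
it. Only per_field matches both, and that needs the language of each field.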

If you are using 0.3.x and are willing to modify the source, then in 
beagled/LuceneCommon.cs change:
  DEFAULT_STEMMER = "English";
to
  DEFAULT_STEMMER = "Danish";
Beware that this will apply the Danish stemmer to all data and metadata 
indexed, including the fields that are in English.

- dBera

-- 
-----------------------------------------------------
Debajyoti Bera @ http://dtecht.blogspot.com
beagle / KDE / Mandriva / Inspiron-1100

