Question about beagled & character encodings



Hi all,

I have indexed a mix of English and German HTML documents, saved
as "Web Page, complete" from Firefox. Indexing itself works fine and
beagled picks up all the pages.

However, there seems to be a problem with German umlauts, or with
character encodings in general. For example:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
<meta name="Author" content="Thomas Frenzel">
<meta name="Generator" content="NetObjects Fusion 4.0.1 für Windows">
<meta name="Keywords" content="DH, Downhill, Enduro, Enduro - Zschopau, Mountainbike,
Mountainbiketouren, geführte Touren, Stülpner, "><title>Löwenkopftrails</title></head>
...
...

Even though the page is clearly marked with "charset=ISO-8859-1", the
title "Löwenkopftrails" is displayed in "Beagle-Best" only as
"Lwenkopftrails": the German "ö" is missing. Likewise, only a search for
"Lwenkopftrails" returns a result; "Löwenkopftrails" returns nothing.
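
One possible explanation, offered only as a guess and not verified
against beagled itself, is that the ISO-8859-1 bytes are being decoded
as UTF-8 and the invalid byte is silently dropped. A minimal Python
sketch (the variable names are illustrative only) reproduces the same
symptom:

```python
# Assumption: this only mimics the symptom; it is not beagled's actual code path.

# The page title, with "\u00f6" standing for the German o-umlaut.
title = "L\u00f6wenkopftrails"

# Bytes as they would be stored in a file saved as ISO-8859-1:
# the umlaut becomes the single byte 0xF6.
raw = title.encode("iso-8859-1")

# Decoding with the charset declared in the <meta> tag keeps the umlaut.
correct = raw.decode("iso-8859-1")

# Decoding as UTF-8 while ignoring invalid bytes drops it,
# because 0xF6 is not a valid UTF-8 lead byte.
lossy = raw.decode("utf-8", errors="ignore")

print(correct)
print(lossy)
```

If this is what happens, it would also explain why only the search
term without the "ö" matches: the indexed text would never contain the
umlaut in the first place.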

I'm running a Novell/SuSE 10.0 system with English as the primary
language (env: LANG=en_US.UTF-8). Is there a way to prevent, work
around, or configure this?

Thanks & kind regards,
  Stephan.


