Re: Problems with beagle-0.3.1 with .htm- files



> I am running beagle 0.3.1 (compiled manually) on a openSuSE 10.3 system. I
> have a directory that contains the online articles of a german computer
> magazine named "ct".  The files contained are mostly named ".htm" and contain
> plain html text.
>
> When I try to index the "ct"  directory with beagle it complains with the
> following error message in file current-IndexHelper:
>
> 20080104 09:49:03.9370 17841 IndexH DEBUG: No filter for
> file:///opt/zeitschriften/ct/html/05/19/220/art.htm
> (/opt/zeitschriften/ct/html/05/19/220/art.htm)
> [application/x-mozilla-bookmarks]

This is a problem with mimetype detection. Beagle uses the
freedesktop.org shared-mime-info spec and its xdgmime implementation
to determine the type of a file. In this case, shared-mime-info
identified those files as mozilla-bookmark files. Mozilla-bookmark
files have a slightly different structure, so beagle does not index them. :(

Technically this is a problem with shared-mime-info; it should have
better rules for deciding the right mimetype. However, determining the
mimetype correctly 100% of the time is impossible. You can try filing
a bug against shared-mime-info, and if they fix it, well and good. Most
likely the .htm files have some line near the beginning which
makes them look like mozilla-bookmark files.
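
If you want to check which files are affected before filing a bug, something
like the rough sketch below may help. It assumes that the bookmark rule in
shared-mime-info matches the string "<!DOCTYPE NETSCAPE-Bookmark-file-1>"
within roughly the first 64 bytes of the file; I have not verified that
against your installed version, so treat the marker and the window as
guesses and adjust them. The function name looks_like_bookmarks is just
something I made up for illustration.

```python
def looks_like_bookmarks(path,
                         marker=b"<!DOCTYPE NETSCAPE-Bookmark-file-1>",
                         window=64):
    """Return True if `marker` starts within the first `window` bytes of the file.

    Both `marker` and `window` are assumptions about the shared-mime-info
    rule, not values taken from beagle or from the log above.
    """
    with open(path, "rb") as f:
        # Read enough bytes so that a marker starting at offset `window`
        # is still fully contained in the buffer.
        head = f.read(window + len(marker))
    return marker in head


if __name__ == "__main__":
    import sys
    for name in sys.argv[1:]:
        print(name, looks_like_bookmarks(name))
```

Run it with a few of the .htm files as arguments; any file that prints True
is a likely candidate for the misdetection.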

There is one last resort in beagle, though it needs a bit of work. If a
file has an extended attribute "user.mimetype", then beagle will use its
value instead of trying to determine the mimetype. If you can manage to
set the extended attribute to "text/html" for all those files, then
beagle will index them.
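
One way to set the attribute on a whole tree is sketched below. This is only
an illustration, not something shipped with beagle: the function name
tag_html_files and its parameters are made up, it relies on Python's
os.setxattr (Linux only), and it assumes the filesystem actually allows
user extended attributes (on ext3, for example, that usually means mounting
with the user_xattr option).

```python
import os


def tag_html_files(root, mimetype="text/html", suffix=".htm", set_attr=None):
    """Set the user.mimetype attribute on every file under `root` whose
    name ends with `suffix`, and return the list of paths that were tagged.

    `set_attr` exists only so the attribute writer can be swapped out;
    by default os.setxattr is used.
    """
    if set_attr is None:
        set_attr = os.setxattr  # available on Linux only
    tagged = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if name.lower().endswith(suffix):
                path = os.path.join(dirpath, name)
                # Extended attribute values are bytes, not str.
                set_attr(path, "user.mimetype", mimetype.encode("ascii"))
                tagged.append(path)
    return tagged


if __name__ == "__main__":
    import sys
    for path in tag_html_files(sys.argv[1]):
        print("tagged", path)
```

In your case you would run it with /opt/zeitschriften/ct/html as the
argument. If it fails with an "Operation not supported" error, the
filesystem most likely has user extended attributes disabled.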

I am sorry, that's the best solution I have right now.

- dBera

-- 
-----------------------------------------------------
Debajyoti Bera @ http://dtecht.blogspot.com
beagle / KDE fan
Mandriva / Inspiron-1100 user

