Re: [Fwd: Re: GNOME and advanced search indexes viability]



On Wed, 2003-04-02 at 18:54, Manuel Amador (Rudd-O) wrote:
> >
> >
> >Why not work on Medusa?  It is a gnome2 application now.  Its security
> >is largely addressed.  Adding content indexers is simple and quick.  It is
> >integrated with gnome-vfs so nfs/smb is a non-issue.  Its command-line
> >tool can be extended in a few days to impersonate locate/find+grep to
> >update the gnome search tool.
> >
> I initially thought of extending medusa, as witnessed by people on this 
> mailing list.  What turned the decision against initially using medusa 
> or basing code on it were several issues:
> * we couldn't dual-license products based on medusa - GPL.  we do intend 
> to GPL our work, but we will dual-license it as well, à la Qt.

Good point, and if you're certain that is what you need, then it
certainly rules out medusa.  Since you're interested in writing a
service, writing a dual-licensed service that uses the GPLed medusa as
a datasource/backend is feasible.
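
As a rough illustration (not existing code), a separately licensed
front end could drive medusa entirely through its command-line tool, so
nothing links against the GPLed code.  The msearch invocation below is
the same one shown later in this mail; only msearch being on the PATH
is assumed:

/* Sketch: a separately licensed front end that treats the GPLed medusa
 * as an external backend by driving its command-line tool.  Only the
 * output crosses the process boundary, so nothing here links against
 * medusa itself. */
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    const char *words = (argc > 1) ? argv[1] : "chovey medusa";
    char cmd[1024];
    char line[4096];
    FILE *pipe;

    /* Build the msearch invocation (same query syntax as the example
     * later in this mail).  No quoting of `words` is done here. */
    snprintf(cmd, sizeof cmd,
             "msearch -i Home -u "
             "'gnome-search:[file:///]content includes_all_of %s'",
             words);

    pipe = popen(cmd, "r");
    if (pipe == NULL) {
        perror("popen");
        return EXIT_FAILURE;
    }

    /* Relay (or post-process) whatever medusa reports. */
    while (fgets(line, sizeof line, pipe) != NULL)
        fputs(line, stdout);

    return pclose(pipe) == -1 ? EXIT_FAILURE : EXIT_SUCCESS;
}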

> * documentation for medusa is ZERO

So true.  I've spent most of my time learning the code instead of
extending it.

> * it's written in C, making development slow and making it hard to get 
> people around here to work on it

A matter of perspective.

> * the implementation is per-user, instead of being per-system.  that 
> means several medusa indexers and several indexes, instead of one master 
> index.

There is nothing stopping anyone from setting up a master index; I do,
as a matter of fact, using symbolic links.  I've put thought into a
more practical implementation of user and system indexes and how to
share them.  I don't think there will be a real solution to this, even
in your own system, until a content search tool has a user base large
enough for the community to really get what it offers.
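
In case it is useful, the symbolic-link trick amounts to roughly the
sketch below.  The paths are made up -- substitute wherever your medusa
build keeps its per-user and system indexes:

/* Point a user's index directory at one shared, system-wide index.
 * Every account that does this reads the same master index.
 * Both paths are hypothetical, not medusa's actual defaults. */
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    const char *shared_index = "/var/lib/medusa/master-index";  /* assumed */
    const char *user_index   = "/home/alice/.medusa/index";     /* assumed */

    if (symlink(shared_index, user_index) != 0) {
        perror("symlink");
        return 1;
    }
    return 0;
}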

> * we don't want a hundred PCs indexing the NFS server each.  we want the 
> search service to delegate queries to NFS servers, so as to avoid 
> network load and wasted disk space

True.  Either mining the old service code from the attic or creating a
new service that you can dual-license would solve the problem.  I'd be
in favor of building a new service.  I agree with your approach--a
service is needed.

> * as there is no documentation, we don't know if Medusa can index 
> gigabytes of files, extract their data and metadata, and provide 
> less-than-10 second query response.  Our initial prospect for the 
> database, PostgreSQL, can indeed provide less-than-10 second response 
> for queries, provided the proper indexes are applied to the proper tables.

I believe it can.  My index is made from 7 gigs of files (I have a
1.6GHz Celeron).  This is the output of a query:

msearch -i Home -u 'gnome-search:[file:///]content includes_all_of
chovey medusa'
Took 0 seconds, and 3 milliseconds
Begin location for word chovey is 255360
End location for word chovey is 255902
Begin location for word medusa is 2055001
End location for word medusa is 2055363
Took 0 seconds, and 399 milliseconds

I'm sure PostgreSQL can exceed that.  Medusa uses db1; taking it to the
21st century (db4) might make it faster.

> But if you could help me work through these issues, we would be glad 
> (after all, we'd be saving work) to do this. 
> 
> Trust me, what we want to do is much bigger than just medusa.  We want 
> to bring enterprise-class full text indexing and search to Linux, *and* 
> open-source it.  We also will be looking into data mining, to provide 
> document maps and the like.  This all when the basic technology is ready.

Indexers are easy to write.

If you're really committed to your vision, then go for it.  Building an
enterprise-level application takes a lot of time and labor, possibly
more than you're estimating.  Medusa was funded by Eazel and had
full-time developers working on it for more than a year; it's not a
small undertaking.  Medusa is very powerful, poorly understood for what
it is, and has been neglected even though it is very near usable.

I agree that free software needs a good content search service, and I
think you can get to your stated goal faster by championing a new
vision for medusa.  Just building a service that calls a medusa backend
via gnome-vfs will give you something to measure and a way to decide
where to go next.  If medusa falls short, replace it.  If it works,
then enhance your service and extend medusa.
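
To make the "replace it" part concrete, a small backend interface for
the service might look like the sketch below.  The names are
illustrative only, not an existing medusa API; the point is that medusa
becomes the first implementation behind the interface rather than
something the service is welded to:

#include <stdio.h>

/* The service only ever sees this interface, so the engine behind it
 * can be swapped out later.  All names here are made up. */
typedef struct {
    const char *name;
    /* Run a content query; write up to `max` URIs into `results` and
     * return how many matches were found. */
    int (*query)(const char *words, const char **results, int max);
} SearchBackend;

/* Stub standing in for a real medusa-driven implementation. */
static int medusa_query(const char *words, const char **results, int max)
{
    (void)words;
    if (max > 0)
        results[0] = "file:///home/example/match.txt";  /* placeholder */
    return max > 0 ? 1 : 0;
}

static SearchBackend medusa_backend = { "medusa", medusa_query };

int main(void)
{
    const char *hits[16];
    int n = medusa_backend.query("chovey medusa", hits, 16);
    int i;

    for (i = 0; i < n; i++)
        printf("%s: %s\n", medusa_backend.name, hits[i]);
    return 0;
}

A PostgreSQL-backed implementation with the same signature could later
replace medusa_backend without touching the service code.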

I haven't been neglecting medusa.  I was sidetracked by a false
indexing problem for 3 months (in the end the hard drive died).  Now
that I have a new laptop, I'm focusing on connecting nautilus to
medusa.  The way nautilus connects could be the same way your service
would connect.  Medusa works right now, but few people know it.  There
won't be a big demand for content search until enough users experience
it and understand that it complements object and navigation data
storage (the desktop).

The same goes for the enterprise world--I am employed by my company
(Time Life Inc., sadly) to know all things about e-commerce and
anything it might relate to.  I use medusa to get fast answers.
Getting the other employees to search (or work up the courage to read
the help file) is difficult because they have little experience with
it, and even less success.  When the users get experience, they will
want a content search service, and they will want a good one.

-- 
__C U R T I S  C.  H O V E Y____________________
sinzui cox net
Guilty of stealing everything I am.

_______________________________________________
gnome-list mailing list
gnome-list gnome org
http://mail.gnome.org/mailman/listinfo/gnome-list


