Re: Followup: opinions on Search services

You guys probably know better than I do: how does Beagle manage to
watch so many directories at once?  I've resorted to using inotify as
well, of course, but what I do is rather kludgy: once setting a
directory watch fails, I double the inotify watch limit via sysctl,
then retry the operation.  Unfortunately, while this actually lets me
watch tons of directories (167,000 at last count, primarily due to my
mp3 collection), I am not sure whether this is actually a "bright
idea" (TM).
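
Concretely, the retry-and-raise trick looks roughly like this in C;
the sysctl path and the ENOSPC check are what I believe the kernel
exposes (and raising the limit needs root), so treat this as a sketch
rather than exactly what my code does:

#include <stdio.h>
#include <errno.h>
#include <sys/inotify.h>

static int add_watch_with_retry(int fd, const char *path, unsigned int mask)
{
    int wd = inotify_add_watch(fd, path, mask);
    if (wd >= 0 || errno != ENOSPC)
        return wd;

    /* The per-user watch limit is exhausted: read the current limit,
     * double it, write it back, then retry the watch once. */
    FILE *f = fopen("/proc/sys/fs/inotify/max_user_watches", "r+");
    if (f == NULL)
        return -1;

    int limit;
    if (fscanf(f, "%d", &limit) == 1) {
        rewind(f);
        fprintf(f, "%d", limit * 2);
    }
    fclose(f);

    return inotify_add_watch(fd, path, mask);
}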

On Thu, 2005-04-07 at 12:13 -0400, Miguel de Icaza wrote:
> Hello,
> 
> > > Adding on to this, if one designs their programs correctly, the actual
> > > call overhead is negligible.  The only reason one would optimize by
> > > using a lower-level language is if a block of code, usually in some sort
> > > of long-running loop, is taking too long to finish.  In that case, most
> > > of the time is spent in the call itself, rendering the overhead of
> > > making the call negligible.
> > 
> > The issue here is memory and the garbage collector rather than loops.
> > The Boehm GC is particularly slow at allocating large objects on the
> > managed heap, and the resulting fragmentation causes both poor
> > performance (the GC spends an inordinate amount of CPU time searching
> > for free blocks) and excessive memory consumption.
> 
> Those statements in general make sense, but they do not apply to Mono or
> Java using Boehm GC.
> 
> The reason this is not an issue with Mono/Java is that we use the
> "precise" framework of Boehm GC, where we explicitly register the types
> and layouts of objects allocated with it, so Boehm only scans the parts
> that can actually contain pointers instead of whole blocks (scanning
> everything is the default mode of execution).
> 
> This has huge performance implications.  You are correct that naive use
> of Boehm is in general an underperformer, but the situation changes
> drastically when it is employed as a precise GC.
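
For the curious, this "precise" mode is Boehm's typed-allocation
interface at the C level.  Here is a minimal sketch, with a made-up
struct, of what registering a layout and allocating through it looks
like (the runtime does the equivalent for every managed type; the
header path may vary by distribution):

#include <gc/gc_typed.h>

struct node {
    struct node *next;    /* a pointer: the collector must scan this word */
    double payload[32];   /* raw data: the collector can skip all of it */
};

struct node *alloc_node(void)
{
    static GC_descr descr;   /* layout descriptor, built once */

    if (descr == 0) {
        /* One bit per word; a set bit means "may hold a pointer". */
        GC_word bitmap[GC_BITMAP_SIZE(struct node)] = { 0 };
        GC_set_bit(bitmap, GC_WORD_OFFSET(struct node, next));
        descr = GC_make_descriptor(bitmap, GC_WORD_LEN(struct node));
    }

    /* Only the words flagged in the bitmap are scanned for pointers. */
    return GC_malloc_explicitly_typed(sizeof(struct node), descr);
}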
> 
> Boehm still presents problems; the major one is the lack of a
> compacting GC.  This leads to a situation where you can fragment the
> heap, in very much the same way that every C and C++ application
> fragments the heap today.
> 
> The situation could get bad if you allocate large blocks (multi-megabyte
> blocks) that you no longer use and depend on the GC to free.  This
> problem can be fixed by assisting the GC (clear your variables:
> a = null) or by using the Dispose pattern for large objects (this was
> in fact the major source of issues in Beagle).
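
The same assist works from C code that uses Boehm directly (the
function and sizes below are made up for illustration): clearing the
last reference is what lets the collector reclaim the block early,
which is exactly what a = null or Dispose achieves on the managed
side.

#include <gc/gc.h>

void process_large_block(void)
{
    /* GC_MALLOC_ATOMIC returns a pointer-free block, so the collector
     * never has to scan its contents. */
    char *big = GC_MALLOC_ATOMIC(16 * 1024 * 1024);

    /* ... fill and consume the buffer ... */

    big = NULL;       /* drop the last reference: the block is now garbage */
    GC_gcollect();    /* optional: force a collection instead of waiting */
}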
> 
> > Indexing large files requires dynamic allocation of large amounts of
> > memory, hence my opinion that garbage-collected languages are not
> > optimal for this situation.  I'm not a Luddite and I do like both
> > Python and C#
> 
> The above is not true; you only need a few buffers to index it.
> 
> Let me illustrate with an example:
> 
> 	"To index a 1 gigabyte file, do I need 1 gigabyte of memory?"
> 
> Clearly if your answer is `yes', then you are not the most astute
> programmer, nor the sharpest knife in the drawer.
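
Concretely, the answer is to stream the file through one small,
reusable buffer; a sketch (the indexing callback is a hypothetical
placeholder):

#include <stdio.h>

#define BUF_SIZE (64 * 1024)

void index_file(const char *path)
{
    char buf[BUF_SIZE];          /* one 64 KB buffer, whatever the file size */
    FILE *f = fopen(path, "rb");
    size_t n;

    if (f == NULL)
        return;

    while ((n = fread(buf, 1, sizeof buf, f)) > 0) {
        /* feed_indexer(buf, n);   hypothetical tokenize-and-index step */
    }
    fclose(f);
}

Memory use stays constant at one buffer whether the file is a kilobyte
or a gigabyte.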
> 
> > and I would certainly use them for GUI stuff over C any day.  However,
> > for a back-end service that is both CPU- and memory-intensive, I
> > maintain that IMHO C is the better choice in this particular case.
> 
> Luckily, your ideology does not match reality.
> 
> As Beagle and the extensive set of applications built with Lucene in
> Java and .NET prove, they are adequate languages for the task (and
> there is now a distributed open source search engine built with Java
> as well).
> 
> Miguel.
-- 
Manuel Amador <rudd-o amautacorp com>
Amauta


