Hi Joe,

If you run a Debian-based distro: apt-get build-dep tracker. Otherwise, you need to install a package that is usually called gobject-introspection.

P.S. For the Office files you'll need GSF, libgsf-1-dev or something.

Kind regards,
Philip

On Mon, 2016-01-11 at 21:33 -0500, Joe Rhodes wrote:
Carlos et al.,

I'm sorry, but I can't seem to build the master branch right now. I ran the autogen.sh script, and then configure dies on me with this:

checking for pkg-config... /usr/bin/pkg-config
checking pkg-config is at least version 0.16... yes
./configure: line 19136: syntax error near unexpected token `0.9.5'
./configure: line 19136: `GOBJECT_INTROSPECTION_CHECK(0.9.5)'

I'm not entirely sure what's going on there. (Sorry, programming is not my forte.) I'll have to wait for 1.7.2 and give that a try. I can only work on this in the evenings, when I'm not at work and the server that's housing all of this data is otherwise not terribly busy.

Cheers!
-Joe Rhodes

On Jan 11, 2016, at 5:21 AM, Philip Van Hoof <philip codeminded be> wrote:

Hi Carlos,

Looks like my git account on GNOME has been closed, so here is a patch for one of the issues in that valgrind log.

Kind regards,
Philip

On Sun, 2016-01-10 at 16:05 -0500, Joe Rhodes wrote:

Carlos:

Yes, there are a LOT of files on this volume. The 5 TB of data is made up of PDFs, Photoshop files, Word docs, and InDesign and Illustrator docs. There are very few large files like MP3s or videos. If I disable all the extractors and just build an index based on file names, I get an index of about 3 GB.

I did notice that I was possibly indexing all of the snapshots of my volumes. I'm using ZFS, and they're available under "/volume/.zfs". I've added that folder to my list of excluded directories:

org.freedesktop.Tracker.Miner.Files ignored-directories ['.zfs', 'ZZZ Snapshots', 'po', 'CVS', 'core-dumps', 'lost+found']

I'll see if that makes any difference. If it was digging into those, that would greatly increase the number of files.

I'm not entirely sure how to start tracker with the valgrind command. Tracker is currently started automatically by the Netatalk file server process.
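The configure error above is the usual symptom of aclocal not finding the GOBJECT_INTROSPECTION_CHECK macro when autogen.sh ran: the macro call is copied into configure unexpanded, and the shell then fails on it. The sketch below assumes bash and assumes the macro is defined in an .m4 file in aclocal's macro directory. As far as I know, CentOS ships that file (introspection.m4) in gobject-introspection-devel, but please treat that as an assumption. The function names are made up for this example. After installing the package, autogen.sh has to be re-run so that configure is regenerated.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for diagnosing the GOBJECT_INTROSPECTION_CHECK error.

# Succeeds if some .m4 file in the given directory (default: aclocal's own
# macro directory, from `aclocal --print-ac-dir`) defines the macro.
has_introspection_macro() {
  local dir="${1:-$(aclocal --print-ac-dir)}"
  grep -qs 'AC_DEFUN(\[GOBJECT_INTROSPECTION_CHECK' "$dir"/*.m4
}

# Succeeds if a generated configure script still contains the raw macro
# call, i.e. it was left unexpanded (the symptom shown in the error above).
configure_has_unexpanded_macro() {
  local script="${1:-./configure}"
  grep -q '^[[:space:]]*GOBJECT_INTROSPECTION_CHECK(' "$script"
}

# Usage, from the tracker source tree:
#   if configure_has_unexpanded_macro && ! has_introspection_macro; then
#     echo "install the package providing introspection.m4, then re-run ./autogen.sh"
#   fi
```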
In order to run the tracker processes, I have to execute the following:

PREFIX="/main-storage"
export XDG_DATA_HOME="$PREFIX/var/netatalk/"
export XDG_CACHE_HOME="$PREFIX/var/netatalk/"
export DBUS_SESSION_BUS_ADDRESS="unix:path=$PREFIX/var/netatalk/spotlight.ipc"
/usr/local/bin/tracker daemon -t

So after stopping the daemon, I tried the following:

valgrind --leak-check=full --log-file=valgrind-tracker-extract-log --num-callers=30 /usr/local/libexec/tracker-extract
valgrind --leak-check=full --log-file=valgrind-tracker-miner-fs-log --num-callers=30 /usr/local/libexec/tracker-miner-fs

Hopefully that gets you what you want? I've uploaded the log files to Dropbox. Hopefully you can grab those without having to jump through too many hoops.

https://www.dropbox.com/s/o3w10hnaa6ikvn3/valgrind-tracker-extract-log.gz?dl=0
https://www.dropbox.com/s/5s4vqk0owrf5gjd/valgrind-tracker-miner-fs-log.gz?dl=0

I let them run for a bit, and I could definitely see RAM usage start to climb. I didn't bother to let it grow to GBs in size; I think it was at about 300 MB when I hit Ctrl-C.

Cheers!
-Joe Rhodes

On Jan 10, 2016, at 2:25 PM, Carlos Garnacho <carlosg gnome org> wrote:

Hi Joe,

On Sun, Jan 10, 2016 at 6:40 PM, Joe Rhodes <lists joerhodes com> wrote:

I have just compiled and installed tracker-1.7.1 on a CentOS 7.1 box, using the default configuration ("./configure" with no additional options). I'm indexing around 5 TB of data. I'm noticing that both the tracker-extract and tracker-miner-fs processes are using a large amount of RAM. The tracker-extract process is currently using 11 GB of RAM (RES, not VIRT, as reported by top), while tracker-miner-fs is sitting at 4.5 GB.

Both processes start out modestly but continue to grow as they do their work. tracker-miner-fs levels off at 4.5 GB once it appears to have finished crawling the entire volume (once its CPU usage goes back down to near 0).
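To record the growth described here as numbers over time instead of watching top, a small sampler can log each process's resident set size, which is what top shows as RES. This is a sketch under a few assumptions: Linux (it reads the VmRSS field of /proc/PID/status, reported in kB) and procps' pgrep. The function names and the 60-second interval in the usage line are arbitrary illustrative choices.

```shell
#!/usr/bin/env bash
# Sketch: log the resident memory (RES in top) of Tracker's processes.

# Print the VmRSS of a PID in kB (Linux /proc only).
rss_kb() {
  awk '/^VmRSS:/ { print $2 }' "/proc/$1/status"
}

# Print one timestamped line per process whose full command line matches
# each pattern. pgrep -f is used because Linux truncates process names to
# 15 characters ("tracker-miner-fs" would become "tracker-miner-f").
sample_rss() {
  local pattern pid
  for pattern in "$@"; do
    for pid in $(pgrep -f "$pattern"); do
      printf '%s pid=%s rss_kb=%s %s\n' \
        "$(date '+%F %T')" "$pid" "$(rss_kb "$pid" 2>/dev/null)" "$pattern"
    done
  done
}

# Usage (runs until interrupted), appending to a log file:
#   while true; do
#     sample_rss libexec/tracker-extract libexec/tracker-miner-fs
#     sleep 60
#   done >> tracker-rss.log
```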
The tracker-extract process also continues to grow as it works. Once it is done, it levels off; last time it stayed at about 9 GB. If I restart tracker (with 'tracker daemon -t' followed by 'tracker daemon -s'), the same thing happens with tracker-miner-fs: it grows back to 4.5 GB as it crawls its way across the entire volume. tracker-extract, though, uses a very modest amount of RAM, because all of the files were just indexed and it doesn't need to do much. I don't have that number right now because I'm re-indexing the entire volume, but it's well below 100 MB.

Is this expected behaviour? Or is there a memory leak? Or perhaps tracker just isn't designed to operate on a volume this large?

It totally sounds like a memory leak, although it's strange that it hits both tracker-miner-fs and tracker-extract. There is obviously an impact to running Tracker on large directory trees, such as:

- Possibly exhausted inotify handles; the directories we fail to create a monitor for would just be checked/updated on the next miner startup.
- More (longer, rather) IO/CPU usage during startup, because the miner has to check mtimes for all directories and files.
- The miner also needs to keep an in-memory representation of the directory tree for accounting purposes (file monitors, etc.). Regular files are represented in this model only as long as they're being checked/processed, and disappear soon after. This might account for a memory peak at startup if there are many items left to process, because Tracker dumps files into processing queues ASAP, but I think the memory usage should be nowhere near as big.

So I think nothing accounts for such memory usage in tracker-miner-fs. The only known source of unbounded memory growth is the number of directories (and regular files, for the peak at startup) to be indexed, but you would need millions of those for tracker-miner-fs to grow to 4.5 GB.
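Carlos's estimate depends on how many directories (and, during the initial crawl, regular files) the miner has to track. In the same reply he also asks how many the partition contains. Here is a sketch for getting both counts in a single pass with find. It assumes GNU findutils (for -printf), and it assumes the ZFS snapshot directory should be skipped the same way Joe's ignored-directories setting skips ".zfs". "/volume" is the mount point mentioned earlier in the thread, and count_tree is a made-up name.

```shell
#!/usr/bin/env bash
# Sketch: count directories and regular files under a tree in one pass,
# pruning any directory named .zfs (ZFS snapshots). Printing one marker
# character per entry, instead of the path, keeps file names that contain
# newlines from skewing the counts.
count_tree() {
  find "$1" \( -type d -name .zfs \) -prune \
    -o -type d -printf 'd\n' \
    -o -type f -printf 'f\n' |
    awk '{ n[$1]++ } END { printf "directories=%d files=%d\n", n["d"], n["f"] }'
}

# Usage (the root directory itself is included in the directory count):
#   count_tree /volume
```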
And tracker-extract has a much shorter memory: it just checks the files that need extraction in small batches, and processes those one by one before querying the next batch. 9 GB shouts memory leak. We've had other memory leak situations in tracker-extract, and the culprit is most often in one of the various libraries we use in our extract modules. If many files end up triggering that module (and the leaky code path in the specific library), the effect accumulates over time.

The downside of this situation is that we Tracker developers most often can't reproduce the problem unless we have a file that triggers the leak, which we need in order to fix it or channel it to the appropriate maintainers. So it would be great if you could provide valgrind logs; just run:

valgrind --leak-check=full --log-file=valgrind-log --num-callers=30 /path/to/built/tracker-extract

Hit Ctrl-C when enough time has passed, and send back the valgrind-log file. The same applies to tracker-miner-fs.

My tracker meta.db file is about 13 GB right now, though still growing. I suspect it's close to fully indexed, though.

This is also suspicious. You again need either a hideous number of files to have meta.db grow that large, or an equally hideous amount of plain-text content that gets indexed. Out of curiosity, how many directories/files does that partition contain? Is the content primarily video/documents/etc.?

Cheers,
Carlos

_______________________________________________
tracker-list mailing list
tracker-list gnome org
https://mail.gnome.org/mailman/listinfo/tracker-list

<0001-Fix-small-memory-leak.patch>
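The valgrind runs discussed in this thread can be scripted so that each process sees the same environment Netatalk sets up, and the leak totals can then be pulled out of each log. This is a minimal sketch under several assumptions: bash, the "/main-storage" prefix and "/usr/local/libexec" paths from Joe's message, and that 'tracker daemon -t' has already stopped the running instances. run_under_valgrind, leak_summary and the VALGRIND/LIBEXEC overrides are hypothetical names for this example. The summary parsing relies on the "definitely lost:" style lines that Memcheck prints in its LEAK SUMMARY.

```shell
#!/usr/bin/env bash
# Sketch: run a Tracker binary under valgrind with Netatalk's environment
# (values copied from Joe's message), then pull the leak totals from the log.

PREFIX="/main-storage"
export XDG_DATA_HOME="$PREFIX/var/netatalk/"
export XDG_CACHE_HOME="$PREFIX/var/netatalk/"
export DBUS_SESSION_BUS_ADDRESS="unix:path=$PREFIX/var/netatalk/spotlight.ipc"

LIBEXEC="${LIBEXEC:-/usr/local/libexec}"  # where tracker-extract etc. live
VALGRIND="${VALGRIND:-valgrind}"          # set VALGRIND=echo for a dry run

# Run one Tracker binary under Memcheck; the log goes to valgrind-<name>-log,
# matching the file names used in this thread.
run_under_valgrind() {
  local name="$1"
  "$VALGRIND" --leak-check=full --log-file="valgrind-${name}-log" \
    --num-callers=30 "$LIBEXEC/$name"
}

# Print the definitely/indirectly/possibly lost totals from a valgrind log,
# stripping the ==PID== prefix valgrind puts on every line.
leak_summary() {
  grep -E '(definitely|indirectly|possibly) lost:' "$1" |
    sed -E 's/^==[0-9]+== +//'
}

# Usage, one process per terminal; stop each with Ctrl-C after a while:
#   run_under_valgrind tracker-extract
#   run_under_valgrind tracker-miner-fs
#   leak_summary valgrind-tracker-extract-log
```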