Basic terminology:
o. A summary contains n summary blocks
o. A summary block contains ~1000 summary items
o. A summary item has flags, cc, uid, from, subject, and so on (a
   sketch of these types follows below)
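To make the terminology concrete, here is a minimal sketch of how
these three types could relate. All names and fields are assumptions
for illustration, not the actual declarations from the attached code:

    #include <glib.h>

    /* Illustrative sketch only: real field names may differ. */
    typedef struct {
        guint32      uid;       /* IMAP UID of the message            */
        guint16      flags;     /* \Seen, \Answered, ... as a bitmask */
        const gchar *from;      /* these point into the mmap()ed      */
        const gchar *subject;   /* data file once the block has been  */
        const gchar *to;        /* persisted                          */
        const gchar *cc;
    } SummaryItem;

    typedef struct {
        SummaryItem **items;    /* ~1000 items per block */
        guint         n_items;
        gboolean      frozen;        /* see "Writing strategy" below */
        gboolean      has_flagchg;
        gboolean      has_expunges;
        gboolean      has_appends;
    } SummaryBlock;

    typedef struct {
        SummaryBlock **blocks;  /* n blocks per summary */
        guint          n_blocks;
    } Summary;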
New:
o. Writing the data (persisting it)
o. Freeze and Thawing
o. Keeping state about flag-changes, expunges and appends
Missing:
o. Defining the filenames of summary blocks. Right now only one
   summary block is created for all items (number 0), resulting in
   three files: data_0.mmap, index_0.idx and flags_0.idx.
   In future the idea is to have data_n.mmap, index_n.idx and
   flags_n.idx files (three per summary block created).
   With ~1000 items per block, 50,000 items will effectively result
   in 50 data_n.mmap files being mmap()ed if the entire folder is
   needed. If a search on the summary's data caused hits in 10 of
   the 50 files, then only those 10 files will be mmap()ed.
   Hence the grouping by sequence number: most searches yield
   results that are close together in time, and sequence numbers on
   IMAP servers are usually also (relatively) grouped in time,
   depending on various things. A sketch of the naming scheme
   follows below.
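A minimal sketch of how the per-block filenames could be derived from
the block number; the helper name and the cache_dir parameter are
assumptions:

    #include <glib.h>

    /* Hypothetical helper: build the three per-block filenames for
     * block number n inside cache_dir. Caller frees the results. */
    static void
    summary_block_get_filenames (const gchar *cache_dir, guint n,
                                 gchar **data, gchar **index,
                                 gchar **flags)
    {
        *data  = g_strdup_printf ("%s/data_%u.mmap", cache_dir, n);
        *index = g_strdup_printf ("%s/index_%u.idx", cache_dir, n);
        *flags = g_strdup_printf ("%s/flags_%u.idx", cache_dir, n);
    }

With ~1000 items per block, the block number for an item could then
simply be its sequence number divided by 1000.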
o. Error strategy: what if there's not enough space to write the summary
files? What if a file's gone missing?
o. A flock(): what if a second process tries to access the same
   mapped files? I think that by just flock()-ing the persisting
   functions we are relatively safe already (I just wonder what
   happens to my read-only mapping if a rename()-overwrite happens
   on a mapped file).
   Of course the advice for application developers is to either use
   a new cache-dir per process or to have a service that hands the
   data to both applications over an IPC system (but that's not the
   point of protecting the processes from influencing each other:
   what if the app developer still did it wrong? What can we do
   about that?)
   -- I know this is hard to cope with; perhaps just a g_critical()
   and an abort() if we detect this situation? (how do we detect
   it?) -- A sketch of the locking idea follows below.
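A minimal sketch of that flock() idea, reusing the types from the
first sketch; the wrapper name, the flags_fd parameter and the choice
of the flags file as lock target are all assumptions:

    #include <glib.h>
    #include <sys/file.h>

    gboolean summary_block_persist (SummaryBlock *block); /* see below */

    /* Hypothetical wrapper: hold an exclusive advisory lock on the
     * block's flags file while persisting, so a second cooperating
     * process cannot write the same files concurrently. */
    static gboolean
    summary_block_persist_locked (SummaryBlock *block, int flags_fd)
    {
        gboolean ok;

        if (flock (flags_fd, LOCK_EX) != 0)
            return FALSE;

        ok = summary_block_persist (block);

        flock (flags_fd, LOCK_UN);

        return ok;
    }

Note that flock() only protects against other processes that also
take the lock; it does nothing for the rename()-overwrite question
above.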
Writing strategy:
o. I keep a "has_flagchg", a "has_expunges" and a "has_appends".
   These are the three types of changes that are possible for a
   summary. I keep these booleans per summary block.
o. The functions summary_item_set_flags and summary_add_item, and
   the functions summary_expunge_item_by_uid and
   summary_expunge_item_by_seq, will modify the values of those
   booleans (see the sketch below).
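A minimal sketch of two of these mutators, reusing the types from the
first sketch; the two block-lookup helpers are assumptions and are
not shown:

    SummaryBlock *summary_block_for_item (Summary *s, SummaryItem *i);
    SummaryBlock *summary_block_for_uid  (Summary *s, guint32 uid);

    void
    summary_item_set_flags (Summary *summary, SummaryItem *item,
                            guint16 flags)
    {
        SummaryBlock *block = summary_block_for_item (summary, item);

        item->flags = flags;
        block->has_flagchg = TRUE;   /* only the flags file is dirty */
    }

    void
    summary_expunge_item_by_uid (Summary *summary, guint32 uid)
    {
        SummaryBlock *block = summary_block_for_uid (summary, uid);

        /* ... remove the item from block->items ... */
        block->has_expunges = TRUE;  /* index and data files are dirty */
    }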
o. The summary_freeze function will make a function called
   summary_block_persist refrain from actually writing, for every
   summary block in the summary passed as parameter to
   summary_freeze.
o. The summary_thaw function will unset the freeze on each summary
   block in the summary passed as parameter to summary_thaw. On top
   of that it will call summary_block_persist for each summary block
   in that summary (see the sketch below).
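A minimal sketch of freeze and thaw as described above:

    void
    summary_freeze (Summary *summary)
    {
        guint i;

        for (i = 0; i < summary->n_blocks; i++)
            summary->blocks[i]->frozen = TRUE;
    }

    void
    summary_thaw (Summary *summary)
    {
        guint i;

        for (i = 0; i < summary->n_blocks; i++) {
            summary->blocks[i]->frozen = FALSE;
            summary_block_persist (summary->blocks[i]);
        }
    }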
o. The summary_block_persist function checks what the best write
   strategy will be by evaluating the booleans has_flagchg,
   has_expunges and has_appends (see the sketch below).
   - If has_flagchg but neither has_expunges nor has_appends, then a
     function that just writes the flags file is used to persist the
     summary block.
   - Else, if either has_expunges or has_appends, then a function
     that writes the flags file, the index file and the data file is
     used to persist the summary block.
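A sketch of that decision; the two writer functions are assumed names
for the behaviours described above:

    gboolean summary_block_write_flags (SummaryBlock *block);
    gboolean summary_block_write_all   (SummaryBlock *block);

    gboolean
    summary_block_persist (SummaryBlock *block)
    {
        if (block->frozen)
            return TRUE;  /* summary_thaw will call us again later */

        if (block->has_flagchg &&
            !block->has_expunges && !block->has_appends) {
            /* Only flag bits changed: rewriting flags_n.idx suffices */
            if (!summary_block_write_flags (block))
                return FALSE;
        } else if (block->has_expunges || block->has_appends) {
            /* The item set changed: all three files are rewritten */
            if (!summary_block_write_all (block))
                return FALSE;
        }

        block->has_flagchg  = FALSE;
        block->has_expunges = FALSE;
        block->has_appends  = FALSE;

        return TRUE;
    }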
o. The persisting of a summary block happens by first sorting all
   strings by occurrence and making them unique. The unique strings
   are then written in that sort order to the data file, and the
   offsets of the strings are updated into the summary item pointers
   using ftell(). The index file is written using the pointers of
   the summary items, while the flags file is written using the
   flags of the summary items.
   The data file is then mmap()ed and the summary items re-prepared.
   This way the summary block is persisted in a VmRss-friendly way
   (see the sketch below).
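A simplified sketch of the deduplicating write. Where the text above
sorts by occurrence, this sketch just sorts lexically so that
duplicates become adjacent; the helper name and the offset map are
assumptions, and wiring the offsets back into the items is omitted:

    #include <glib.h>
    #include <stdio.h>
    #include <string.h>

    static gint
    cmp_str (gconstpointer a, gconstpointer b)
    {
        return strcmp (*(const gchar * const *) a,
                       *(const gchar * const *) b);
    }

    /* Write each unique string once, NUL-terminated, and remember
     * its offset in the data file via ftell(). */
    static GHashTable *
    write_unique_strings (GPtrArray *strings, FILE *data_f)
    {
        GHashTable *offsets = g_hash_table_new (g_str_hash,
                                                g_str_equal);
        guint i;

        g_ptr_array_sort (strings, cmp_str);

        for (i = 0; i < strings->len; i++) {
            gchar *s = g_ptr_array_index (strings, i);

            if (!g_hash_table_lookup_extended (offsets, s,
                                               NULL, NULL)) {
                long off = ftell (data_f);

                g_hash_table_insert (offsets, s,
                                     GSIZE_TO_POINTER ((gsize) off));
                fwrite (s, 1, strlen (s) + 1, data_f);
            }
        }

        return offsets;
    }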
o. When adding summary items to the summary (which will select the
   summary block the item will be added to, using the requested
   sequence number), the caller must attempt to avoid string
   duplicates for the CC and TO fields of the items by sorting the
   addresses in the comma-separated strings of the items. Currently
   the experimental example does this for you. This further reduces
   VmRss, as you'll have singled out more data as duplicate and made
   more data unique in memory this way (see the sketch below).
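A minimal sketch of that normalisation; the function name is an
assumption:

    #include <glib.h>
    #include <stdlib.h>
    #include <string.h>

    static gint
    cmp_addr (gconstpointer a, gconstpointer b)
    {
        return strcmp (*(const gchar * const *) a,
                       *(const gchar * const *) b);
    }

    /* Sort the addresses inside a comma-separated header value so
     * that e.g. "b@x.org, a@x.org" and "a@x.org, b@x.org" become
     * the same string and thus dedup to one copy in the data file. */
    gchar *
    normalize_address_list (const gchar *value)
    {
        gchar **addrs = g_strsplit (value, ",", -1);
        gchar *joined;
        guint i;

        for (i = 0; addrs[i] != NULL; i++)
            g_strstrip (addrs[i]);

        qsort (addrs, g_strv_length (addrs), sizeof (gchar *),
               cmp_addr);

        joined = g_strjoinv (", ", addrs);
        g_strfreev (addrs);

        return joined;
    }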
Please test :)
--
Philip Van Hoof, freelance software developer
home: me at pvanhoof dot be
gnome: pvanhoof at gnome dot org
http://pvanhoof.be/blog
http://codeminded.be
Attachment: mytest9.tar.gz (application/compressed-tar)