Sat, 2002-09-14 at 07:00, Jeffrey Stedfast wrote:
Ever stop to think how many non-identical messages you've wiped out that way? Message-Ids are not guaranteed to be unique. Theoretically, removing duplicate messages based on message-id is not much better than removing duplicate messages based on the Subject header (it's only better because it is assumed that msg-id is generated using at least a somewhat random sequence of characters... but how random is random? If you've ever played with rand() you know that it is a pretty poor random number generator, as it will often spew out the same sequence over and over again - can you guarantee that your client doesn't use rand()?)
The man who wrote what follows is Exim's (SMTP mailserver/MTA) daddy, Philip Hazel, and a doctor of applied math at Cambridge University:

<quote>
3.3 Message identification

Every message handled by Exim is given a "message id" which is sixteen characters long. It is divided into three parts, separated by hyphens, for example "16VDhn-0001bo-00". Each part is a sequence of letters and digits, normally representing a number in base 62. However, in the Darwin operating system (Mac OS X) and when Exim is compiled to run under Cygwin, base 36 is used instead, because the names of files in those systems are not case-sensitive.

The first six characters are the time the message was received, as a number in seconds - the normal Unix way of representing a time of day. If the clock goes backwards (due to resetting) in a process that is receiving more than one message, the later time is retained.

After the first hyphen, the next six characters are the id of the process that received the message.

The final two characters, after the second hyphen, are used to ensure uniqueness of the id. There are two different formats:

(a) If the "localhost_number" option is not set, uniqueness is required only within the local host. This portion of the id is "00" except when a process receives more than one message in a single second, when the number is incremented for each additional message.

(b) If the "localhost_number" option is set, uniqueness among a set of hosts is required. This portion of the id is set to the base 62 encoding of

    <sequence number> * 256 + <host number>

where <sequence number> is the count of messages received by the current process within the current second. As the maximum value of the host number is 255, this allows for a maximum value of 14 for the sequence number. If this limit is reached, a delay of one second is imposed before reading the next message, in order to allow the clock to tick and the sequence number to get reset.
</quote>

Jeff and NotZed, together with a couple of less-frequently contributing Ximian hackers, write a lot of sense, and I'd put this list at the top of my "listening people" list. But every now and again they write something that makes me raise my palms to the air and go and do something else. Like the answer to the request for a user ID on the printout, where the statement (more or less) that "Ximian is not aware of any competition" is uttered as gospel.

Maybe Jeff and NotZed are overworked.

Best,

Tony

--
Tony Earnshaw

Tha can allway tell a Yorkshireman, but tha canna tell 'im much.

e-mail: tonni billy demon nl
www: http://www.billy.demon.nl
gpg public key: http://www.billy.demon.nl/tonni.armor

Phone: (+31) (0)172 530428
Mobile: (+31) (0)6 51153356

GPG Fingerprint = 3924 6BF8 A755 DE1A 4AD6 FA2B F7D7 6051 3BE7 B981
3BE7B981
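
To make the id layout in the quoted Exim section concrete, here is a minimal Python sketch that splits an id such as "16VDhn-0001bo-00" into its three parts and decodes them. It is only an illustration of the description above, not Exim's code. The digit order of the base 62 alphabet (0-9, then A-Z, then a-z) is an assumption, since the quoted text does not state it, and the function names are hypothetical.

```python
# Assumed digit order for base 62: 0-9, A-Z, a-z (not stated in the quoted text).
BASE62_DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"


def decode_base62(text):
    """Turn a base 62 string into an integer."""
    value = 0
    for ch in text:
        # str.index raises ValueError for characters outside the alphabet.
        value = value * 62 + BASE62_DIGITS.index(ch)
    return value


def encode_base62(value, width):
    """Turn an integer into a fixed-width base 62 string."""
    chars = []
    for _ in range(width):
        value, remainder = divmod(value, 62)
        chars.append(BASE62_DIGITS[remainder])
    if value:
        raise ValueError("value does not fit in %d base 62 digits" % width)
    return "".join(reversed(chars))


def parse_message_id(message_id, localhost_number_set=False):
    """Split an id of the form tttttt-pppppp-uu and decode each part."""
    time_part, pid_part, uniq_part = message_id.split("-")
    if (len(time_part), len(pid_part), len(uniq_part)) != (6, 6, 2):
        raise ValueError("expected the layout tttttt-pppppp-uu")
    uniq = decode_base62(uniq_part)
    parsed = {
        "received": decode_base62(time_part),  # seconds, Unix style
        "pid": decode_base62(pid_part),        # receiving process id
    }
    if localhost_number_set:
        # Format (b): <sequence number> * 256 + <host number>
        parsed["sequence"], parsed["host"] = divmod(uniq, 256)
    else:
        # Format (a): a plain per-second counter, "00" for the first message
        parsed["sequence"] = uniq
    return parsed


if __name__ == "__main__":
    print(parse_message_id("16VDhn-0001bo-00"))
```

Two base 62 digits can hold at most 62 * 62 - 1 = 3843, so with a host number of 255 the largest sequence number that still fits is 14 (14 * 256 + 255 = 3839), which agrees with the limit given in the quoted text.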