Re: Natural sort order for Nautilus?



Andrew Kerr wrote:
> A good algorithm (linear performance, zlib license) and explanation is
> here:
>  http://sourcefrog.net/projects/natsort/

Alexander Larsson wrote:
>While its a good idea, it doesn't really work for nautilus as is. To
>correctly sort in the locale of the user we sort by collation keys, not
>the normal name. I'm not sure how easy it would be to implement natural
>sorting with collation.

I experimented with something similar some time ago. Sorry, I only have
rotten Perl code. Basically I ended up with a set of regular
expressions; each one that matched would break the string up into
several elements, with a precedence assigned to each element. Finally
the strings were sorted from the highest-ranking element to the lowest,
with each element sorted according to the locale's collating order.

Precedence is relative to the remaining residue of the string. That
is, the parts of the string that no rule matches form the residue,
which is given a precedence of 0. A precedence of 1 is "one position to
the left" and -1 is "one position to the right".

As an example, it would be possible to sort
  Something-Monday.txt
  Something-Sunday.txt
  Something-Saturday.txt
And get
  Something-Monday.txt
  Something-Saturday.txt
  Something-Sunday.txt

(given that you have defined a "weekday" element with its own collating
order)
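
A minimal Python sketch of the precedence idea, not the original Perl
code. The names RULES and precedence_key are hypothetical, and the
weekday rule is given precedence -1 on the assumption that it sits one
position to the right of the residue. The residue is compared with
locale.strxfrm, so it follows whatever locale the caller has set.

```python
import locale
import re

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

# Hypothetical rule table: (pattern, precedence, function giving an int rank).
# Precedence -1 means "one position to the right of the residue".
RULES = [
    (re.compile("|".join(WEEKDAYS)), -1, WEEKDAYS.index),
]

def precedence_key(name, rules=RULES):
    """Build a sort key whose slots are ordered from highest precedence down."""
    slots = []
    residue = name
    for pattern, precedence, rank in rules:
        match = pattern.search(residue)
        if match:
            slots.append((precedence, rank(match.group())))
            # Whatever the rule matched is removed from the residue.
            residue = residue[:match.start()] + residue[match.end():]
        else:
            # Unmatched rules still get a slot so all keys have the same shape.
            slots.append((precedence, -1))
    # The unmatched residue has precedence 0 and uses the locale collation.
    slots.append((0, locale.strxfrm(residue)))
    slots.sort(key=lambda slot: -slot[0])
    return tuple(value for _, value in slots)

names = ["Something-Monday.txt", "Something-Sunday.txt",
         "Something-Saturday.txt"]
print(sorted(names, key=precedence_key))
```

Giving every rule a slot, matched or not, keeps the keys comparable
position by position.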

If you defined rules that give weekdays the same precedence whether
they come before or after the residue, you could sort
  Something-Monday.txt
  Sunday-Something.txt
  Something-Saturday.txt
And get
  Something-Monday.txt
  Something-Saturday.txt
  Sunday-Something.txt

By defining an element separator it would be possible to sort without
regard to the actual separator in use
  Something-Monday.txt
  Something_Sunday.txt
  Something Saturday.txt
And get
  Something-Monday.txt
  Something Saturday.txt
  Something_Sunday.txt
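
Both of the last two examples can be sketched in a few lines of Python.
This is an illustration under assumptions, not the original code: the
separator class [-_\s]+ and the name split_key are hypothetical, and the
weekday is ranked after the residue wherever it appears in the name.

```python
import locale
import re

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

# Assumed element separator: any run of hyphens, underscores or whitespace.
SEPARATOR = re.compile(r"[-_\s]+")

def split_key(filename):
    """Sort key that ignores the separator and the position of the weekday."""
    stem, dot, ext = filename.rpartition(".")
    if not dot:  # no extension at all
        stem, ext = filename, ""
    elements = [e for e in SEPARATOR.split(stem) if e]
    ranks = [WEEKDAYS.index(e) for e in elements if e in WEEKDAYS]
    # The residue is everything that is not a weekday, rejoined uniformly.
    residue = " ".join(e for e in elements if e not in WEEKDAYS)
    rank = ranks[0] if ranks else -1
    return (locale.strxfrm(residue), rank, locale.strxfrm(ext))

mixed_position = ["Something-Monday.txt", "Sunday-Something.txt",
                  "Something-Saturday.txt"]
mixed_separator = ["Something-Monday.txt", "Something_Sunday.txt",
                   "Something Saturday.txt"]
print(sorted(mixed_position, key=split_key))
print(sorted(mixed_separator, key=split_key))
```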
 
You would probably want to define rules according to the nature of the
directory you are listing. In some situations you would like to use
something like qr[\s+-\s+] to separate elements in a music collection,
while in another directory the same rule might give an erroneous
listing.

It seems possible to inspect a directory, check for matches against
the rules, and then use only the rules that match several times. The
problem this raises is that you can no longer expect one and the same
sort order in every directory.
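
A rough Python sketch of that inspection step. The function name
select_rules, the rule names and the threshold of two matches are all
assumptions made for illustration.

```python
import re

def select_rules(filenames, rules, min_matches=2):
    """Keep only the rules that match at least min_matches filenames."""
    selected = {}
    for rule_name, pattern in rules.items():
        hits = sum(1 for f in filenames if pattern.search(f))
        if hits >= min_matches:
            selected[rule_name] = pattern
    return selected

# Hypothetical rule set; the first pattern mirrors qr[\s+-\s+].
candidate_rules = {
    "spaced_dash": re.compile(r"\s+-\s+"),
    "underscore": re.compile(r"_"),
}
listing = [
    "Artist - Album - 01 - Song.ogg",
    "Artist - Album - 02 - Other.ogg",
    "notes-2004.txt",
]
print(sorted(select_rules(listing, candidate_rules)))
```

Because the selected rule set depends on the directory contents, two
directories can end up sorted by different rules.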

One problem is that there is no defined collation order for some of the
elements you might want to use, so you end up with something close but
not quite right.

Another problem is that the collation order defined in the locale might
be wrong for a similar construct used in a filename, most typically
dates and currency. Dates are especially troublesome.

As an example, assume you mix date systems
  Unplugged 2.12.2004
  Unplugged 4.12.2004
  Unplugged 12.3.2004

To sort something like that reliably is nearly impossible.

Now, assume you could rearrange subelements such as the day-month-year
ordering. This would imply that there is something unique about them
that makes it possible to distinguish them. That is clearly not
possible in the previous example, but if we have something like
  Unplugged 2.12.2004
  Unplugged 4.12 2004
  Unplugged 2004-12-03

It should be possible to sort this without failing. I do that by
building a hash from each element's inner parts, with elements within a
named context using the same hash keys.
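
In Python the same idea might look like the sketch below, with named
groups playing the role of the shared hash keys. The two patterns and
the name date_key are assumptions, as is reading the dotted form as
day.month.year.

```python
import locale
import re

# Both hypothetical rules fill the same keys: year, month, day.
DATE_RULES = [
    re.compile(r"(?P<year>\d{4})-(?P<month>\d{1,2})-(?P<day>\d{1,2})"),
    # Assumption: the dotted form is day.month.year.
    re.compile(r"(?P<day>\d{1,2})\.(?P<month>\d{1,2})[. ](?P<year>\d{4})"),
]

def date_key(name):
    """Sort by residue first, then by the date parts found in the name."""
    for pattern in DATE_RULES:
        match = pattern.search(name)
        if match:
            parts = {k: int(v) for k, v in match.groupdict().items()}
            residue = name[:match.start()] + name[match.end():]
            return (locale.strxfrm(residue),
                    parts["year"], parts["month"], parts["day"])
    # Names without a recognisable date sort on the whole name.
    return (locale.strxfrm(name), 0, 0, 0)

names = ["Unplugged 2.12.2004", "Unplugged 4.12 2004",
         "Unplugged 2004-12-03"]
print(sorted(names, key=date_key))
```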

Now, it is possible to go one step further: named entities can be given
a numeric representation. The date 2004-12-03 can then be written as
3. dec 2004 and still be sorted correctly.
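
A small Python sketch of giving named entities a numeric
representation. The English three-letter month table, the two patterns
and the name date_tuple are assumptions; a real implementation would
need the month names of the locale in use.

```python
import re

# Assumed English abbreviations mapped to month numbers 1..12.
MONTHS = {name: number for number, name in enumerate(
    ["jan", "feb", "mar", "apr", "may", "jun",
     "jul", "aug", "sep", "oct", "nov", "dec"], start=1)}

ISO_DATE = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")
NAMED_DATE = re.compile(
    r"(?P<day>\d{1,2})\.?\s*(?P<month>[A-Za-z]{3})[A-Za-z]*\.?\s+(?P<year>\d{4})")

def date_tuple(text):
    """Return (year, month, day) for either notation, or None."""
    match = ISO_DATE.search(text)
    if match:
        return tuple(int(match.group(k)) for k in ("year", "month", "day"))
    match = NAMED_DATE.search(text)
    if match:
        # The month name is replaced by its numeric representation.
        month = MONTHS.get(match.group("month").lower())
        if month is not None:
            return (int(match.group("year")), month, int(match.group("day")))
    return None

print(date_tuple("Unplugged 3. dec 2004"))
print(date_tuple("Unplugged 2004-12-03"))
```

Both notations reduce to the same tuple, so they compare as equal dates
when used as part of a sort key.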

John


