Re: normalizing filenames and strings



On Wed, 2007-03-28 at 16:55 +0200, Xavier Bestel wrote:
> On Wed, 2007-03-28 at 16:46 +0200, Alexander Larsson wrote:
> > On Tue, 2007-03-27 at 13:15 -0400, Dr. Michael J. Chudobiak wrote:
> > > > Filenames could also be NFC normalized when created, although that's
> > > > not absolutely necessary.
> > > 
> > > It would be nice if gnome mandated a standard approach for 
> > > normalization. Does everyone like NFC? (http://unicode.org/reports/tr15 
> > > for info.)
> > > 
> > > > This could be fixed at a low level, in gtk filechooser for some cases
> > > > or in apps. Gnome-vfs should handle that too.
> > > 
> > > It would be nice if gnome-vfs could handle this in the background, so 
> > > coders don't have to worry about uri escaping and normalization at the 
> > > same time. (The existing normalization functions have to be used on 
> > > unescaped URIs. It's already tricky enough keeping track of gnome-vfs 
> > > escaping issues...)
> > 
> > It's very hard and quite expensive to handle normalization automatically
> > at the low level. You have to intercept every i/o operation, and it can
> > introduce very strange behaviour (since we can't control what's already
> > on the disk). We have to accept that unix filenames are strings of bytes
> > and that we just cannot enforce any meaning on them (although we can do
> > our best to try to make them some normalized form of utf8).
> > 
> > For uri escaping I'm doing my best to make it not an issue in the new
> > GVFS API that is to replace gnome-vfs. (By not using uris much in the
> > API.)
> > 
> > In practice I don't think there is an enormous problem. Most files are
> > either selected in the fileselector/filemanager (so we don't care about
> > normalization, just the filename bytestring that was selected) or, for
> > new files, typed into the file selector. If the fileselector can do some
> > normalization for typed-in names, we shouldn't, in normal use, create
> > any "duplicate" unnormalized filenames.
> 
> IMHO the only work needed to handle this is in all filename-selection
> widgets, which should do completion based on similar unicode names (like
> the fileselector does already for names differing only by case).

Most applications that operate on files will accept file name
arguments when invoked.  What are we supposed to do with these?
Bear in mind that the argument isn't only used by shell junkies.
It's also used when, for example, you double-click a JPG to open
it in EOG: Nautilus passes the file name to EOG.

If we don't normalize, users might have a hard time opening
files from the command line.

If we do normalize, then people will pretty much never be able
to open files that have unnormalized file names, which seems
like a much more serious problem.
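
To make the trade-off concrete, here is a minimal Python sketch. It is
only an illustration, not anything gnome-vfs, GTK or EOG actually does:
resolve_name is a hypothetical helper, and the file name is made up. It
shows that the NFC and NFD spellings of one visible name are different
strings, and that a lookup which tries the exact name first and only
then compares normalized forms can reach both kinds of file.

```python
import unicodedata


def resolve_name(entries, typed_name):
    """Pick the directory entry that typed_name most plausibly refers to.

    entries is a list of names as they exist on disk (hypothetical input,
    e.g. the result of os.listdir()). Returns the matching entry, or None
    if there is no match or the match is ambiguous.
    """
    # 1. An exact match always wins, so files with unnormalized names
    #    stay reachable under the name they really have.
    if typed_name in entries:
        return typed_name

    # 2. Otherwise compare the NFC form of both sides.
    wanted = unicodedata.normalize("NFC", typed_name)
    matches = [e for e in entries
               if unicodedata.normalize("NFC", e) == wanted]

    # Refuse to guess when several entries normalize to the same name.
    return matches[0] if len(matches) == 1 else None


# The same visible name in two Unicode forms.
nfc = unicodedata.normalize("NFC", "caf\u00e9.jpg")  # precomposed e-acute
nfd = unicodedata.normalize("NFD", "caf\u00e9.jpg")  # 'e' + combining acute

print(nfc == nfd)                        # the two spellings differ
print(resolve_name([nfd], nfc) == nfd)   # NFC argument finds the NFD file
```

Normalizing the argument up front and then opening it blindly would
skip step 1, which is exactly the case where an unnormalized file on
disk becomes unreachable.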

--
Shaun




