Re: Using "user.mime_type" xattr for MIME guessing



On Thu, Aug 9, 2018 at 2:47 PM Bastien Nocera <hadess hadess net> wrote:
On Thu, 2018-08-09 at 13:09 -0400, Colin Atkinson wrote:
> On Thu, Aug 9, 2018 at 10:47 AM Bastien Nocera <hadess hadess net> wrote:
> > On Thu, 2018-08-09 at 10:35 -0400, Colin Atkinson via gtk-devel-list wrote:
> > > Hi everyone,
> > > I'm working on a FUSE file system which makes network requests
> > > whenever a file is read. So obviously, I would like to avoid excess
> > > read requests to files.
> > >
> > > The current implementation [0] of gio's MIME guessing seems to check
> > > the file extension, and then immediately fall back on reading magic
> > > bytes when this is not possible (i.e. files with an ambiguous or no
> > > extension). In my situation, this can potentially lead to many
> > > network requests any time a user opens a directory in Nautilus or a
> > > file selection dialog.
> > >
> > > According to the FreeDesktop specs [1], implementations may query the
> > > user.mime_type xattr for a given file's MIME. But the current version
> > > of glib seems not to make use of this.
> > >
> > > Would there be any interest in a patch to add this functionality? If
> > > so, I would be happy to work on it.
> > >
> > > Please let me know if there's anything I've missed/misunderstood.
> >
> > It's probably something interesting to add to GIO, though checking
> > xattrs also has a cost, especially on local disks.
> >
> > Depending on what your FUSE is, you might want to consider writing a
> > gvfs backend instead, where the backend is responsible for providing
> > the mime-type/content-type (and all the other metadata), so you can use
> > whichever method is the most useful to you, with no added costs because
> > the metadata and the enumeration can be done in one go.
> >
> > Cheers
>
> While there is an overhead for getting the value of an xattr, it also
> potentially prevents the expense of doing glob lookups

A glob lookup is essentially free. There's an mmapped cache with those,
and it takes microseconds to do a lookup.

>  and magic guessing. It was added partially to help avoid those very
> operations, as they were deemed expensive.

magic guessing is only going to be expensive because it needs to do
I/O. So would looking up the xattr.

> I'm hesitant to commit to writing a full gvfs backend. Correct me if
> I'm wrong, but from reading through some of the backends in the gvfs
> repo, it seems like writing one would require essentially duplicating
> all of that effort.

Well, yes and no. We've not seen your code, we just know that it's for
a network filesystem. A gvfs backend will integrate better with GNOME
in general, but most of the code should be trivial if you're using a
library to hide all the intricacies of the underlying protocol.

As for the benefits of writing a gvfs backend, there's an "afc" (Apple
File Conduit, for iOS devices) FUSE backend, but we also wrote a gvfs
backend. The gvfs backend integrates with a separate backend that
watches for plugged in devices, and uses that to mount the filesystems.
There's integration with GNOME because it knows how to tell you about
unlocking your device, it can set thumbnails or icons, and mime-types
on files without doing extra I/O.

Using a file manager that speaks gvfs on top of a gvfs backend is just
going to be more efficient. A FUSE backend is nice when you're
prototyping, and want people to test out the code, and kick the tires.
It's just not a long-term solution for a lot of use cases. (Though it's
plenty fine if the filesystem is a local one and the format matches
POSIX expectations, such as local filesystems that are unsupported by
the kernel; you'd then teach udisks how to mount them, and integration
would be good enough.)

>  And then duplicating it again for KDE. All while maintaining support
> for the FUSE system so that it is usable on Windows/OS X.

Depends what your target is. I doubt that Windows will read xattrs, but
then again, I don't know anything about FUSE under Windows.

> This also isn't an isolated situation. There are tons of situations
> where excess reads should be avoided (e.g. slow disk with lots of
> files in a directory),

Again, xattr reads are not free. And there might not even be a
user.mime_type xattr in there!

>  or where a FUSE or application has explicitly set the MIME (e.g.
> curl setting user.mime_type based on the Content-Type header).
>
> I may try to whip up a minimal proof-of-concept patch sometime in the
> next week (unless there's strong opposition to it). From there, it
> should be feasible to see how checking user.mime_type affects
> performance.

Alex Larsson looked into that, and reading xattrs when most files
don't have the user.mime_type xattr will just waste I/O. It might need
to be a special case for specific FUSE filesystems to avoid every
directory read being slower.

Cheers

Ok, so it seems like user.mime_type is probably a no-go at the moment. If it adds significant overhead as you suggest, it's probably best to avoid it in the general case.

Given that, what's the feeling towards a more configuration-based solution, either to enable the user.mime_type behavior, or just to disable the byte magic behavior? I'm thinking either a per-tree or per-directory option stored either by GSettings or in a plaintext config file (along the lines of KDE's .directory files). This gives both users and developers the ability to control this behavior, and the only overhead would be an extra dconf read per application (no idea the cost of that), or a single extra stat call per directory.
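
To make the per-directory idea concrete, here is a rough sketch in Python rather than in GIO's C. It is an illustration under stated assumptions, not a proposed implementation: the marker file name `.mime-xattr` and the helpers `xattr_enabled` and `guess_mime` are hypothetical, and the extension lookup uses Python's `mimetypes` module as a stand-in for the glob cache. The only things taken from the thread are the `user.mime_type` attribute name and the goal of never reading file contents.

```python
import mimetypes
import os

MARKER = ".mime-xattr"    # hypothetical per-directory opt-in marker file
XATTR = "user.mime_type"  # attribute name discussed in this thread


def xattr_enabled(directory):
    """Cost: one extra stat call per directory."""
    return os.path.exists(os.path.join(directory, MARKER))


def guess_mime(path, use_xattr):
    """Guess a MIME type without ever reading the file's contents."""
    # os.getxattr only exists on Linux, so guard for other platforms.
    if use_xattr and hasattr(os, "getxattr"):
        try:
            value = os.getxattr(path, XATTR).decode("ascii", "replace").strip()
            if value:
                return value
        except OSError:
            pass  # attribute not set, or xattrs unsupported on this filesystem
    # Extension-based fallback; no magic byte sniffing at any point.
    mime, _encoding = mimetypes.guess_type(path)
    return mime or "application/octet-stream"
```

A caller listing a directory would evaluate `xattr_enabled(directory)` once and pass the result to `guess_mime` for each entry, so directories that have not opted in pay no xattr reads at all.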

