Re: Discouraging use of sync APIs



On Tue, 2015-02-10 at 11:20 +0100, Lennart Poettering wrote:
On Mon, 09.02.15 10:47, Philip Withnall (philip tecnocode co uk) wrote:

Hi all,

From some feedback from real-world users of GLib/GIO (see the ‘Feedback
from downstreams’ thread), and just from looking at our own code in
GNOME, it seems that people persistently use sync versions of APIs in
the main thread when they should be using async ones.*

[...]

* There are definitely legitimate uses of sync APIs, but not from the
main thread, ignoring trivial command line utilities (and even then, if
they want to handle signals properly, they should be running a main
loop).

Uh, oh. This is really oversimplifying things. Note that on Linux disk
IO is generally synchronous, and it's good that way, and you cannot
really avoid it. I mean, never forget that your executable and its
shared libraries are all mapped into memory and access to their pages
results in synchronous IO. Even if you wanted you couldn't make that
async... 

The difference there is that you cannot do _anything_ until the code you
want to execute is paged in. For I/O operations on files, you can be
redrawing the UI, accessing other files, etc.

I am pretty sure if you do async IO like gio does for every single
file access you'll just complicate your program and make it
substantially slower. For small files normal, synchronous disk access
is a ton faster than dispatching things to background threads, and
back... 

The problem is that GIO can’t know which accesses are to small, local
files, and which aren’t. It already optimises reads from pollable
streams (sockets) by keeping them in the main thread and adding them
into the main poll() call.
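
To illustrate what “adding them into the main poll() call” amounts to, here is a minimal sketch in plain POSIX C. It is not GIO’s implementation, and the helper name fd_is_readable() is made up for this example. It asks the kernel whether a descriptor can be read without blocking, which is the question a main loop asks for every source it watches:

```c
#include <errno.h>
#include <poll.h>

/* Hypothetical helper, not part of GLib/GIO.
 * Returns 1 if a read() on fd would not block, 0 if the timeout expired
 * first, and -1 on error. timeout_ms = 0 gives a non-blocking check. */
int fd_is_readable(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    int rc;

    /* Retry if a signal interrupts the wait (the timeout restarts). */
    do {
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc < 0 && errno == EINTR);

    if (rc < 0 || (pfd.revents & (POLLERR | POLLNVAL)))
        return -1;

    /* POLLHUP counts as readable: read() returns end-of-file at once. */
    return (pfd.revents & (POLLIN | POLLHUP)) ? 1 : 0;
}
```

This works for sockets and pipes, but POSIX specifies that regular files always poll as ready for reading, so poll() says nothing about whether a read from disk or NFS will stall. That is why file reads cannot be kept in the main thread in the same way.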

How about using the distinction between GIO and gstdio.h? Functions like
g_file_get_contents(), or g_open() + read() + g_close() which can safely
be used on small, local files, can continue to be called from the main
thread? That would be fine for system utilities which _know_ they will
operate on a local file system.

For typical desktop applications, though, the home directory could be an
NFS share and all the ~/.config files could be hidden behind noticeable
latency. For those applications, I think GIO should continue to be used,
and used asynchronously.

Also, glib has wrappers for making mmap() available to programs, to
make access to seldom-accessed sparse databases efficient. Do you want
to prohibit that too?

No, mmap() is clearly a tool for a different kind of problem. If you’re
accessing an mmap()ed file, you need to be sure it’s local anyway, I
think? GMappedFile doesn’t have async versions of its methods,
presumably for this reason.
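
A rough sketch of why mmap() is a different kind of problem, using plain POSIX mmap() as a stand-in for GMappedFile (the function names map_file_readonly() and touch_pages() are invented for this example). Once the file is mapped there is no call left to make asynchronous: the I/O happens implicitly, as page faults, whenever the memory is first touched.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: maps the whole of path read-only.
 * On success returns the mapping and stores its length in *len_out.
 * Returns NULL on failure; an empty file cannot be mapped, so that is
 * treated as failure too. Release the mapping with munmap(). */
const char *map_file_readonly(const char *path, size_t *len_out)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size <= 0) {
        close(fd);
        return NULL;
    }

    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                  /* the mapping outlives the descriptor */
    if (p == MAP_FAILED)
        return NULL;

    *len_out = (size_t)st.st_size;
    return p;
}

/* Reads one byte per page and returns their sum. Each first touch of a
 * page may fault and block this thread until the kernel has fetched it. */
unsigned long touch_pages(const char *data, size_t len)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        page = 4096;            /* fallback assumption if sysconf fails */

    unsigned long sum = 0;
    for (size_t off = 0; off < len; off += (size_t)page)
        sum += (unsigned char)data[off];
    return sum;
}
```

Nothing in touch_pages() looks like I/O, yet on a slow or remote file system each iteration could stall the caller, which is the reason to be sure the file is local before mapping it.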

Moreover on Linux file systems like /proc, /sys, /run, /tmp are known
to not be backed by slow physical IO, hence it's really pointless
accessing them via async IO...

I suggest gstdio.h + normal POSIX read() (as above).
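
As a minimal sketch of that pattern: the function below reads a small file synchronously with open() + read() + close(). It uses plain POSIX open() where a GLib program would call g_open() and g_close() from gstdio.h, so that the example has no GLib dependency. The name read_small_file() is made up for illustration.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: reads at most cap - 1 bytes from path into buf
 * and NUL-terminates the result. Longer files are silently truncated.
 * Returns the number of bytes read, or -1 on error with errno set. */
ssize_t read_small_file(const char *path, char *buf, size_t cap)
{
    if (cap == 0) {
        errno = EINVAL;
        return -1;
    }

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    size_t used = 0;
    while (used < cap - 1) {
        ssize_t n = read(fd, buf + used, cap - 1 - used);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            int saved = errno;
            close(fd);
            errno = saved;      /* report the read() error, not close()'s */
            return -1;
        }
        if (n == 0)
            break;              /* end of file */
        used += (size_t)n;
    }

    close(fd);
    buf[used] = '\0';
    return (ssize_t)used;
}
```

A caller that knows the file lives on something like /proc or /sys could call read_small_file("/proc/self/stat", buf, sizeof buf) directly from the main thread. The loop is there because read() may return fewer bytes than requested.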

Then, during start-up of your app, when you need to read some config
file or so before you can do your first steps, why bother with async
stuff? You need to wait for reading/parsing that file anyway
before you can proceed?

This seems to be the only use case where sync I/O calls still seem, to
me, to be reasonable. But in my opinion we could suffer the loss of that
convenience if doing so means we can easily detect other sync calls from
the main thread which _will_ cause problems.

Hence, my recommendation would be to draw the line somewhere between:
"potentially unbounded user payload" data and "configuration/control"
data. For the former it would be wise to encourage async IO but for
the latter certainly not. If you follow what I want to say...

As above, how about making that line the distinction between calling
functions from gstdio.h and using GIO? In the former case, you know
you’re operating on local files. In the latter, you could be operating
on files from the moon.

Philip
