Re: fsync in glib/gio



Behdad Esfahbod wrote:
> It's well explained in the various discussions about this. Essentially,
> the metadata for the rename is written to disk, but the data in the file
> is not (yet, due to delayed allocation) and then the system crashes. On
> fsck we discover the file is broken (no data) and set the file size to
> 0.
>
> That's clearly a broken filesystem (screw standards. If it doesn't do what users expect, it's broken). Why work around it in gio? Have the filesystem guys "fix" it for whatever that means.
>
> All we need is the few major distros handling it properly.
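
The failure described in the quoted text is the familiar "write a temporary file, then rename() it over the original" sequence losing its data. A minimal sketch of that pattern with the fsync() placed before rename() follows, assuming plain POSIX calls rather than any gio API; replace_file_durably() and write_all() are hypothetical helper names. Making the rename itself durable would additionally require fsync() on the containing directory, which is omitted here.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Write all of buf to fd, retrying on short writes and EINTR. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical helper: replace `path` with `data` via "<path>.tmp".
 * The fsync() before rename() is the step under discussion: without it,
 * the rename (metadata) may reach the disk before the file's data. */
int replace_file_durably(const char *path, const char *data, size_t len)
{
    char tmp[4096];
    int n = snprintf(tmp, sizeof tmp, "%s.tmp", path);
    if (n < 0 || (size_t)n >= sizeof tmp)
        return -1;

    int fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    /* Push the new contents to stable storage before publishing them. */
    if (write_all(fd, data, len) < 0 || fsync(fd) < 0) {
        close(fd);
        unlink(tmp);
        return -1;
    }
    if (close(fd) < 0) {
        unlink(tmp);
        return -1;
    }
    /* Atomically swap the new file into place under the old name. */
    if (rename(tmp, path) < 0) {
        unlink(tmp);
        return -1;
    }
    return 0;
}
```

Dropping the fsync() call gives the cheaper variant whose crash behaviour depends on the file system's journalling mode, which is exactly what the rest of this reply argues about.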

There are different definitions of broken. I might consider it "broken" if my glib/gio application, which writes out thousands of little files, suddenly starts taking twice as long as my Perl program. fsync() has a real, measurable cost.
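
One way to see that cost is to time many small file writes with and without a per-file fsync(). The sketch below is a hypothetical micro-benchmark using only POSIX calls; time_small_writes() and the "gio-fsync-bench-N" file names are illustrative assumptions, and no particular slowdown figure is implied.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical micro-benchmark: create `count` small files in `dir`,
 * optionally calling fsync() on each, and return the elapsed seconds.
 * Returns -1.0 on error (files created so far are left behind). */
double time_small_writes(const char *dir, int count, int do_fsync)
{
    static const char payload[] = "a few bytes of data\n";
    const size_t len = sizeof payload - 1;
    char path[4096];
    struct timespec start, end;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (int i = 0; i < count; i++) {
        snprintf(path, sizeof path, "%s/gio-fsync-bench-%d", dir, i);
        int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0)
            return -1.0;
        /* The only difference between the two runs is this fsync(). */
        if (write(fd, payload, len) != (ssize_t)len
            || (do_fsync && fsync(fd) < 0)) {
            close(fd);
            return -1.0;
        }
        close(fd);
    }
    clock_gettime(CLOCK_MONOTONIC, &end);

    /* Remove the files created above. */
    for (int i = 0; i < count; i++) {
        snprintf(path, sizeof path, "%s/gio-fsync-bench-%d", dir, i);
        unlink(path);
    }
    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}
```

Comparing time_small_writes(dir, 1000, 0) with time_small_writes(dir, 1000, 1) only means something on a disk-backed file system; on tmpfs, fsync() does essentially nothing.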

The documentation for file systems is usually quite clear, although people may not understand it. When a file system such as ext3 offers different journal modes, it's presumed that the user understands the effect of their choice. If the user must be absolutely safe, they should use 'journal' mode, but this may be slow, as all data is written to disk at least twice. If 'ordered' mode is used, the user is willing to accept weaker guarantees in exchange for better performance. The ext3 'ordered' mode works pretty well: data is written before metadata, and metadata updates are ordered. But it's not perfect! In what order is the data itself written?

Do you intend to patch glib/gio if somebody reports that glib/gio used write() on one part of a file, then on another part of the file, then pulled the plug, and can prove to you that it is possible for the second write to finish before the first write has started? Will you call it absolutely broken and demand a file system fix?

The ext3 'writeback' mode provides even fewer guarantees: metadata is ordered, data is not. The behaviour you are talking about right now seems to be 'writeback' mode (I'm not sure; I guess ext4 is doing 'writeback' mode by default, otherwise I don't understand the complaint?).

Tell your users that if they expect the right thing, they should use 'journal' mode. glib/gio cannot and should not second-guess the user's choice of file system. Taken to its extreme, this argument means you may as well run fsync() after every single I/O operation that performs a modification. This would be horrible for performance, and the user already has this capability by configuring the file system as fully journalled and using synchronous writes. They don't need glib/gio to simulate this.
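
For the "synchronous writes" half of that, POSIX already lets an application opt in per file with the O_SYNC open flag: each write() then completes only once the data has reached stable storage, roughly as if fsync() were called after every write. (A whole file system can instead be mounted with the sync option.) The sketch below assumes a plain POSIX environment; open_synchronous_log() and append_line() are hypothetical names, not gio functions.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: open `path` for appending with O_SYNC, so every
 * write() on the returned descriptor waits for the data (and the metadata
 * needed to read it back) to reach the storage device. */
int open_synchronous_log(const char *path)
{
    return open(path, O_WRONLY | O_CREAT | O_APPEND | O_SYNC, 0644);
}

/* Append one line; returns 0 on success, -1 on error or short write. */
int append_line(int fd, const char *line)
{
    size_t len = strlen(line);
    ssize_t n = write(fd, line, len);
    return (n == (ssize_t)len) ? 0 : -1;
}
```

The design point is that the choice stays with whoever opens the file (or mounts the file system), not with a library layer that has to guess.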

My opinion is that glib/gio shouldn't be doing this stuff. The problem is not with glib/gio. glib/gio should offer an fsync() wrapper (I'm not sure whether it does; I don't use it), so that applications with special requirements, such as databases, can call fsync() at strategic points where they wish to make stronger promises than the file system does. For databases specifically, fsync() on close() is not good enough: fsync() needs to be done at every point where the data must be consistent and on disk before the application issues another write(). glib/gio cannot guess where these points are.
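
A minimal sketch of such a "strategic point", assuming a hypothetical two-step commit in plain POSIX C: a record is written, fsync() acts as an ordering barrier, and only then is a marker declaring the record valid written. commit_record(), pwrite_all() and the "COMMIT" marker are illustrative assumptions, not any real database's on-disk format. Note that the barrier sits between two writes to the same open file, so fsync() on close() could not provide it.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Write exactly len bytes at offset off, retrying on short writes/EINTR. */
static int pwrite_all(int fd, const char *buf, size_t len, off_t off)
{
    while (len > 0) {
        ssize_t n = pwrite(fd, buf, len, off);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
        off += n;
    }
    return 0;
}

/* Hypothetical commit: the record must be durable before the marker that
 * declares it valid, so fsync() sits between the two writes. The final
 * fsync() makes the marker itself durable. */
int commit_record(int fd, const char *record, size_t len,
                  off_t record_off, off_t marker_off)
{
    static const char marker[] = "COMMIT";

    if (pwrite_all(fd, record, len, record_off) < 0)
        return -1;
    /* Barrier: without it, the marker could reach the disk first. */
    if (fsync(fd) < 0)
        return -1;
    if (pwrite_all(fd, marker, sizeof marker - 1, marker_off) < 0)
        return -1;
    return fsync(fd);
}
```

Only the application knows that the record and the marker must hit the disk in this order, which is the point being made above.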

Putting fsync() on close() is a hack.

Just my opinion. :-)

Cheers,
mark

--
Mark Mielke <mark mielke cc>


