Re: New GIOChannel patch
- From: Ron Steinke <rsteinke w-link net>
- To: gtk-devel-list gnome org
- Subject: Re: New GIOChannel patch
- Date: Tue, 17 Jul 2001 15:21:16 -0700
> From: Owen Taylor <otaylor redhat com>
>
> [ This mail is just a mail responding to the text of your
> mail ... haven't looked at the patch yet ]
>
> Ron Steinke <rsteinke w-link net> writes:
>
> > Owen,
> >
> > I've got a new patch which addresses most of your concerns.
> > Bugzilla corrupted the patch again, so it's up on
> >
> > http://www.elfhame.net/~rsteinke/giochannel-july-15.diff
> >
> > The things which still need to be examined are discussed below.
> > Anything marked "FIXME efficiency" is something noncritical
> > which I haven't gotten to yet.
>
> Some procedural stuff:
>
> * This bug report probably has reached the end of its useful
> life -- you probably should file separate bug reports
> for the remaining FIXME items.
I've created the new bugs 57689-57692,57694-57695. Most of these
are for win32. I'll mark 52811 as closed once I get the next major patch
in.
> * I'd like for as much discussion on issues like this to
> go on gtk-devel-list as possible ... people may or may
> not care, but at least it's publically archived.
Okay. I'm replying to your mail on the list to move the discussion there.
> * If you send me a login and crypted password,
> (http://people.redhat.com/otaylor/cryptit) I'd be happy
> to create a CVS account for you with write access. (Assuming
> you don't already have one.)
That'd be really helpful. I'll send the login and password under
separate cover.
> I'd like to be kept in the loop for major/API changes,
> but I don't see any reason why you can't commit bug
> fixes directly.
>
> > > * lseek can't return EINTR, EGAIN, neither can fcntl, fstat.
> >
> > The Debian man page for fcntl lists EAGAIN and EINTR as possible errors.
> > EINTR can only be returned for F_SETLKW, F_GETLK, and F_SETLK,
> > so we can ignore it, but no such restriction is given for EAGAIN.
>
> The Linux man pages really shouldn't be considered much of an
> authority - they are pretty random in what they include and
> dubiously accurate at times.
>
> The man page I have mentions EAGAIN only in the section on
> F_SETLK, and the Unix98 standard says authoritatively that
> it can only occur for F_SETLK.
>
> > > * What should be the handling of EPIPE: is it really an
> > > error not a status? (should EOF by recycled?). Would
> > > it make sense to use send() instead of write to
> > > avoid SIGPIPE? Should that be optional?
> >
> > I'm not sure what to do with EPIPE. In my previous mail,
> > which may have been somewhat confusing, I was actually
> > thinking of ESPIPE, which I've removed in favor
> > of using g_return_val_with_fail() in the seek functions.
>
> > > * The set of errors in GIOChannelError seems a bit random -
> > > contains obscure errors like EPERM, ENOLCK, EDEADLK, but certainly
> > > not nearly every error that could occur from open().
> >
> > Fixed as per our discussion. Looking at the win32 code,
> > I think the error in win32_msg_[read,write] which was
> > previously returned as G_IO_ERROR_INVAL may be closer
> > to EMSGSIZE, but comments from people who actually
> > understand that code are necessary before we do anything
> > about the error message. Any win32 developers care to have
> > an opinion?
>
> Hmmm, no Win32 developers here... ;-) You might want to
> send that question in a separate mail to gtk-devel-list.
>
> I do think that any GIOChannel needs to look like a
> stream ... if there is a limit on the size of a write(),
> then ->io_write() needs to simply make a short write if
> called with more bytes than allowed.
I'm not completely sure what the error means in this case, though,
since it was originally G_IO_ERROR_INVAL, which could be a number
of things. Comments from win32 people?
This question is now bug #57691.
> > > * Need to make a read-only-returns-whole-characters guarantee
> > > for reading for UTF-8, and validate incoming data.
> >
> > Did this, excluding the encodings NULL (guaranteed to be binary safe)
> > and GIOChannelEncodeRaw.
>
> So, we have for special encodings:
>
> - "UTF-8" works just like it wasn't a special case, but doesn't
> involve opening a converter from UTF-8 to UTF-8.
>
> - NULL. No conversion, byte-by-byte.
>
> - GIOChannelEncodeRaw. Like NULL but no buffering?
>
> The NULL / GIOChannelEncodeRaw distinction strikes me as something
> that shouldn't be controlled by the encoding - since there
> is no difference in encoding. Rather, there should be a
> separate g_io_channel_set_unbuffered(), or something.
Mabye we should just rename g_io_channel_set_encoding() to
something more general.
> Also, if this is the setup, I believe the default should probably
> be UTF-8, not NULL.
I'll change it, then.
> > > * I think we need to hide partial chars when writing from
> > > the user - the total possible partial char size is only
> > > known and is 5 bytes because it is UTF-8. (4 if we assume
> > > that it UTF-8 encoding valid unicode characters)
> >
> > Did this. However, it's not really possible to differentiate
> > partial characters from invalid characters in this case,
> > and failing to do so can get you wedged. Consider the case:
> >
> > <- valid UTF-8 data->xa<- valid UTF-8 data ->
> >
> > where 'x' is the first byte of a multibytes UTF-8 character,
> > and 'a' is an ascii character (<= 0x7f). The sequence is
> > clearly invalid, but if the data is sent to write_chars() in
> > two parts,
> >
> > <-valid UTF-8 data->x
> > a<-valid UTF-8 data->
> >
> > then 'x' will sit in the partial chars buffer. When the second set
> > of data is written, we will return the error G_CONVERT_ERROR_ILLEGAL_SEQUENCE.
> > On checking the second set of data, however, no illegal sequence will
> > be found.
>
> Doesn't matter. G_CONVERT_ERROR_ILLLEGAL_SEQUENCE isn't recoverable
> in any meaningful way except for displaying an error dialog to
> the user.
>
> > > * g_io_channel_get_buffer_condition should not return
> > > G_IO_IN if there isn't at least a whole character.
> > > G_IO_OUT is completely wrong - should be something
> > > like channel->write_buf->len < channel->buf_size.
> >
> > This is somewhat saner now, but I'm still not happy with the
> > result. Since we have a buffered channel, G_IO_OUT should always be
> > set. However, we don't want to do that without polling,
> > since G_IO_IN might never get set.
>
> Well, that's handled right by:
>
> /* Only return TRUE here if _all_ bits in watch->condition will be set
> */
> return ((watch->condition & buffer_condition) == watch->condition);
>
> Right? Actually, I don't think we should set the G_IO_OUT bit
> unless we have space in the buffer ... if I'm trying to write
> a 30 meg file to a socket, and am using G_IO_OUT, and I don't
> want to create a 30 meg local bugger.
So the current approach seems to be correct. The only question I
have left is if we need some sort of automatic flush watch.
> > > Question - how do you detect EOF on a buffered channel?
> >
> > You check that the buffer is empty _and_ try a read on the
> > channel. That is why g_io_channel_fill_buffer() sometimes
> > returns G_IO_STATUS_EOF; it indicates nothing was read on
> > the last read, so if the buffer is empty the appropriate
> > read function can return an EOF. If g_io_channel_fill_buffer()
> > returns G_IO_STATUS_EOF and the buffer terminates in a partial
> > character, that's an error.
>
> I think this question was not about the internals, but rather
> about how you would implement what you used to implement
> as:
>
> gboolean
> my_watch (GIOChannel *channel, GIOCondition condition, gpointer data)
> {
> char buf[1024];
> size_t count;
>
> g_io_channel_read (channel, buf, sizeof (buf), &count);
> if (count == 0)
> {
> g_io_channel_close (channel);
> return FALSE;
> }
> else
> {
> /* Process buf */
> return TRUE;
> }
> }
>
> I think the answer to that would be:
>
> gboolean
> my_watch (GIOChannel *channel, GIOCondition condition, gpointer data)
> {
> char buf[1024];
> GIOStatus status;
> GError *err = NULL;
>
> g_io_channel_read_chars (channel, buf, sizeof (buf), &count, &err);
> if (status == G_IO_CHANNEL_EOF)
> {
> g_io_channel_close (channel);
> return FALSE;
> }
> else if (status == G_IO_CHANNEL_ERROR)
> {
> g_warning ("Error reading from channel: %s\n", err->message);
> g_error_free (err;
> return FALSE;
> }
> else
> {
> /* Process buf */
> return TRUE;
> }
> }
>
> > > * g_io_unix_get_flags() doesn't look close to right for
> > > anything but files.
> >
> > Why? fcntl() should be fine for sockets
>
> I think my concern was the use of fstat() and checking the
> access bits and trying to use those to guess whether the
> channel was readable or writeable.
>
> Or perhaps it was the older issue of using fstat() on sockets,
> though I think that should be OK, certainyl if we check for errors.
>
> > > * Should lazily open read and write converters? (Error
> > > reporting a bit of a problem - you could just always open
> > > _one_ (say the read converter)
> >
> > 1. As long as we've got to open one, why not both?
>
> Efficiency. Opening a converter can be quite expensive for
> some encodings, and there could well be separate forward and
> reverse mapping tables for encoding <=> unicode.
>
> Perhaps we can use the flags to detect whether the stream is
> readable / writeable before opening converters. I think
> almost all uses of encoding conversions will be for files,
> and the channel will be unidirectional.
>
> > 2. Does conversion guarantee reverse conversion for non-GNU iconv?
>
> No, there is no guarantee (even for GNU iconv, in theory). I guess
> it's probably better to open both. On consideration, opening
> just the reader, and then opening the writer later, even if
> we only needed the writer is rather to begin with is dubious.
>
> Opening the converters lazily would be best, but isn't possible
> because of error reporting.
>
> So, I think the best thing to do would be to see if we can
> open just the converters we need from the start.
>
> > > * There is no reason (IMO) to expand the write buffer arbitrarily
> > > when writing characters - just flush out each chunk as
> > > it converted.
> >
> > Writing before conversion is complete would cause problems in dealing
> > with error conditions. Currently, if an error is detected in
> > writing, the write buffer is returned to its previous condition
> > and bytes_written is set to zero. If we start writing before we're
> > sure there aren't any conversion errors, we can't do this.
>
> I think this is fine. We see the same behavior with out of space
> errors - we write the part that fits, we get a short write,
> we try again, we get a ENOSPACE error for the remainder,
> and return that as a GError.
>
> If we wanted to preserve the "error returned only if nothing
> written" aspect of write(), we'd have to make sure we only
> ever did one write() per call of g_io_channel_write_chars()
> which is neither easy to do nor desirable.
So, should I then return the number of characters actually written
in *bytes_written even on error?
> >
> > We might try setting it on NULL and raw encoding (because they
> > should be binary safe), and clearing it on all others (because
> > they're text). Do some encodings use \r and \n bytes inside
> > other characters?
>
> Not-setting O_BINARY is a disaster for UTF-16 ... \n is representted
> as \n\0, and changing that into \r\n\0 corrupts the entire stream.
>
> Anyways, I think making this depend on the encoding would be a
> mistake. It's confusing. Not to mention the fact, that I don't think
> we can actual _change_ it for a file once we have it open.
>
> > This would also require making the default encoding "UTF-8"
> > (instead of NULL) to preserve backward compatibility with
> > the old functions, which didn't (I believe) set O_BINARY.
>
> Remember, the old functions had no way of opening a file as
> a channel, so whether this was set was up to the app.
>
> The lazy thing to do would be simply to make the new_file() function
> accept the "b" flag and leave it up to the app. Or perhaps
> we could make the default "b" and have a "t" flag, though that
> would be quit non-standard.
>
> I'd really like to get input from the Win32 people on this,
> but haven't managed to do so yet.
This question is marked as bug #57695
> > > Things I haven't really fully reviewed yet
> > > ==========================================
> > >
> > > * Interaction of all the code with non-blocking streams
> > >
> > > * Interaction of the code with IO watches - what would
> > > you do for instance, if you wanted to set up
> > > a calllback to be called on each incoming line
> > > as it arrived.
> > >
> > > * Handling recovery of some sort on encoding failure
> > > in the middle of a text file.
> >
> > Two other things:
> >
> > When trying to flush a write buffer containing partial chars,
> > the new error condition G_IO_CHANNEL_PCHAR_FLUSH is set.
> > I didn't use G_CONVERT_ERROR_PARTIAL_INPUT for this case,
> > since all the read functions use it for partial chars on a read.
> > The read functions can also return G_IO_CHANNEL_PCHAR_FLUSH
> > when calls to read and write functions are mixed.
>
> Well, the number of error conditions is really not that important
> since I suspect programmers will distinguish between them only
> in rare conditions. The important thing is clear text
> in error->message.
>
> I do consider "PCHAR_FLUSH" to be an dubious abbrevation.
> (The general rule is that abbreviations should be avoided unless
> they are obvious. G_IO_CHANNEL_PARTIAL_CHAR_FLUSH is certainly
> looong, but how often would someone need to type it?
>
> However, do we need this error?
>
> When should the user be informed that a "partial character
> flush occurred". What would you do if that appeared in a log
> file or on the screen?
>
> I think there are three possibilities:
>
> * It's a symptom of a mis-encoded internal data. This is essentially
> a programming bug. Internal streams should not be misencoded.
>
> * It's a programming error. g_io_channel_flush() was called
> at some point when it "shouldn't" be called.
>
> * It's not an error at all. The programmer simply wanted
> to encourage data to the stream but didn't care if
> it all went or not. (After all, on a non-blocking stream
> flush() won't flush everything either.)
>
> If there is a possibility that it isn't a bug, I think we
> should just keep quiet.
>
> For mixed reads/writes:
>
> * If it's a socket, we should probably just keep quiet.
> The reason for flush() before read() is to make sure
> that any complete output is sent to the other side
> before we wait for input.
If it's a socket, we don't call flush() from read()
(since channel->is_seekable == FALSE)
> * If it's a file, it's most likely a programming error
> and/or a misencoded internal stream,
>
> I don't see any disaster in not reporting this error, and
> it seems to me that making this an error could occasionally
> be quite annoying.
Okay. I'll change it to a g_warning(), and clear the
contents of the partial character buffer on flush.
This is now bug #57694.
> > What is the channel_flags field in GIOChannel used for?
> > The only place I see it being use is in g_io_channel_init(),
> > where it's set to zero.
>
> My guess is that it was there originally so things like whether the
> channel was blocking or not could be checked efficiently. But
> yes it looks like dead code now. We can always add it back
> if we need it.
I'll clobber it then.
Ron
[
Date Prev][
Date Next] [
Thread Prev][
Thread Next]
[
Thread Index]
[
Date Index]
[
Author Index]