Comments on last iteration of GIOChannel patch



Here are some comments from looking through the last iteration of your
GIOChannel patch. I think the partial-char and UTF-8 portions probably
need some more refinement, but why don't you go ahead and commit what
you have here. (with possible changes commented below.) 

I think it will be easier to do that rather than keep a big patch
separate, and that will get the API changes for things like
g_io_channel_new_file() sooner. (Make sure to make a ChangeLog
entry and use that as the text of your commit message.)

Could you also file a bug or bugs for the efficiency/buffer-handling
FIXMEs ?

Thanks,
                                        Owen


  -      g_io_channel_flush (channel, err);
  +      g_io_channel_set_flags (channel, flags & ~G_IO_FLAG_NONBLOCK, NULL);
  +      /* Ignore any errors here, they're irrelevant */

I'd move this comment up above the call to set_flags() to make it
clear what it is referring to.

     if ([bytes to flush]) 
       {
	 if (err)
	   result = g_io_channel_flush (channel, &tmperr);
	 else
	   result = g_io_channel_flush (channel, NULL);
       }
     else
       result = G_IO_STATUS_NORMAL;

     status = channel->funcs->io_close (channel, err);

     if (result != G_IO_STATUS_NORMAL)
       {
	 if (status == G_IO_STATUS_NORMAL)
	   {
	     if (err)
	       *err = tmperr;
	     return result;
	   }
	 else if (tmperr)
	   g_error_free (tmperr);
       } 

     return status;
   }

I find this code quite confusing, and it has a bug in it: if err 
is NULL, tmperr is leaked. Suggest rewriting as:

    if ([bytes to flush]) 
      {
	if (err)
	  result = g_io_channel_flush (channel, &tmperr);
	else
	  result = g_io_channel_flush (channel, NULL);
      }
    else
      result = G_IO_STATUS_NORMAL;

    status = channel->funcs->io_close (channel, err);

    if (status != G_IO_STATUS_NORMAL)
      {
	g_clear_error (tmperr);
	return status;
      }
    else if (result != G_IO_STATUS_NORMAL)
      {
	g_propagate_error (err, tmperr);
	return result;
      }
    else
      return G_IO_STATUS_NORMAL;
  }

It could be compressed to 
if (status != G_IO_STATUS_NORMAL) {} else {}, but I think the
extra case improves clarity.

  @@ -435,19 +468,26 @@
   GIOCondition
   g_io_channel_get_buffer_condition (GIOChannel *channel)
   {
  -  return
  -    ((channel->read_buf && channel->read_buf->len > 0) ? G_IO_IN : 0) |
  -    ((channel->write_buf && channel->write_buf->len <= channel->buf_size) ? G_IO_OUT : 0);
  +  GIOCondition condition = 0;
  +
  +  if ((channel->read_buf && (channel->read_buf->len > 0)) /* FIXME full chars how? */
  +    || (channel->encoded_read_buf && (channel->encoded_read_buf->len > 0)))
  +    condition &= G_IO_IN;

Remember to fix this (one of my previous mails had two suggestions
as to how to do this. Perhaps a bug should be filed if you
aren't going to do it immediately.

  @@ -572,6 +607,8 @@
   {
     g_return_if_fail (channel != NULL);
     g_return_if_fail (!line_term || line_term[0]); /* Disallow "" */
  +  g_return_if_fail (!line_term || g_utf8_validate (line_term, strlen (line_term),
  +                    NULL)); /* Require valid UTF-8 */

You can replace strlen (line_term), with -1.

Two things in g_io_channel_flush() that I think we've already
covered:
 
 * a return length of 0 means AGAIN or EOF, so instead of handling
   it, you should just:

      if (status != G_IO_STATUS_NORMAL)
        return status;

      g_assert (this_time > 0);

   [ If we are allowing people to write their own IO channels, which  
     is probably a lost cause, then we might want a more explicit
     g_warning saying what is wrong ]

 * G_IO_CHANNEL_ERROR_PCHAR_FLUSH

  @@ -822,26 +873,32 @@
     g_return_val_if_fail (channel != NULL, G_IO_STATUS_ERROR);
     g_return_val_if_fail ((error == NULL) || (*error == NULL), G_IO_STATUS_ERROR);

  -  /* Make sure the buffers are empty */
  +  /* Make sure the encoded buffers are empty */

  -  g_return_val_if_fail (channel->read_buf->len == 0, G_IO_STATUS_ERROR);
  -  g_return_val_if_fail (channel->write_buf->len == 0, G_IO_STATUS_ERROR);
  -  g_return_val_if_fail (channel->encoded_read_buf == NULL ||
  +  g_return_val_if_fail (!channel->encoded_read_buf ||
			  channel->encoded_read_buf->len == 0, G_IO_STATUS_ERROR);
  +  g_return_val_if_fail (channel->partial_write_buf[0] == '\0', G_IO_STATUS_ERROR);

Isn't what we need to check here that encoded_read_buf and write_buf
are empty? What we care about is that buffers including the _external_
encoding are empty. partial_write_buf[] is always in UTF-8.

[ I guess in the case of going to raw, we might want to require
  it to be empty, so perhaps just always requiring it to be empty
  is OK. It's probably an error in any case if it isn't empty.  ]

     did_encode = channel->do_encode;

     if (!encoding || strcmp (encoding, "UTF8") == 0 || strcmp (encoding, "UTF-8") == 0)
       {

Just noticed this ... We either need to require "UTF-8" exactly or
be even more liberal and do g_ascii_strcasecmp(). I'd incline
to requiring UTF-8 exactly. 


  +      /* Make sure the file buffers are empty */
  +
  +      g_return_val_if_fail (!channel->read_buf || channel->read_buf->len == 0,
  +                            G_IO_STATUS_ERROR);
  +      g_return_val_if_fail (!channel->read_buf || channel->write_buf->len == 0,

I think you meant !channel->write_buf, though see note above.

  @@ -947,22 +994,17 @@
     gsize read_size, cur_len, oldlen;
     GIOStatus status;

  -  if (!channel->ready_to_read)
  +  if (channel->is_seekable && ((channel->write_buf && channel->write_buf->len > 0)
  +    || (channel->partial_write_buf[0] != '\0')))

Wrapping of expressions should follow parentheses for indentation. 
(More of these elsewhere.)

The UTF-8 code needs a fair bit of refinement. Looking, for instance
at g_io_channel_read_chars()

 * I would avoid validating the output of iconv() ... either we have a broken
   system, which we can't do anything about, or we are just wasting cycles.

   The validation only needs to occur on read, since GTK+ assumes that 
   UTF-8 from the user is valid. So, I'd just put it in fill_buffer()
   along side the iconv() when you are doing real conversion.

 * I didn't expect you to use _both_ g_utf8_get_char_extended and
   g_utf8_validate.

 * Couple of style notes

      for (start = use_buf->str; g_utf8_validate (start, use_buf->str
        + *bytes_read - start, &end) == FALSE && *end == '\0'; start = end + 1);
        /* Continue validate past intermediate nulls, no body */

   - Comparison == FALSE is to be avoided. 
 
   -  It would be better to avoid such convoluted use of a for loop and write
      this with a while.

        {
          gunichar try_char;

          if (status == G_IO_STATUS_NORMAL && count >= 6)
            try_char = (gunichar) -1; /* avoid function call, see asserts below */
          else
            try_char = g_utf8_get_char_extended (use_buf->str, use_buf->len);

          switch (try_char)
            {
              case -2:
                g_assert (use_buf->len < 6);
                if (status == G_IO_STATUS_AGAIN)
                  return G_IO_STATUS_AGAIN;
                g_assert (status != G_IO_STATUS_NORMAL);
                g_set_error (error, G_CONVERT_ERROR, G_CONVERT_ERROR_PARTIAL_INPUT,
                             "Channel terminates in a partial character");
                break;
              case -1:
                g_set_error (error, G_CONVERT_ERROR, G_CONVERT_ERROR_ILLEGAL_SEQUENCE,
                             "Illegal UTF-8 sequence");
                break;
              default:
                g_assert (count < 6); /* buffer too small */
                return G_IO_STATUS_NORMAL;
            }

          return G_IO_STATUS_ERROR;
        }
 
 * Try something like:

   const char *p = use_buffer->str + old_len;
   while (p < use_buffer->str + use_buffer->len) 
     {
       gunichar ch = g_utf8_get_char_extended (use_buffer->str + i, use_buffer->len - i);
       if (ch == (gunichar)-1) 
         {
           g_set_error (error, G_CONVERT_ERROR, G_CONVERT_ERROR_ILLEGAL_SEQUENCE,
                        "Illegal UTF-8 sequence");
           break;
         }
       else if (ch == (gunichar)-2)
         {
           g_set_error (error, G_CONVERT_ERROR, G_CONVERT_ERROR_PARTIAL_INPUT,
	                "Channel terminates in a partial character");
           break;
         }
       else
         {
           ch = g_utf8_next_char (ch);
         }
     } 

 * I get the sense that you are making some efforts to try and let people recover
   when they get a invalid character in the input stream. While I think its
   certainly bad if a single misencoded byte makes the entire file unreadable,
   I don't see how to really avoid this problem.

   Once we have hit a misencoded character in the input stream, we have no way
   in general to recover to a sane state and resynchronize. If the input stream
   is in a known stateless encoding like UTF-8 we are better off ... we can
   simply advance along until we get another valid character. But in general
   no.

  -    return g_io_channel_flush (channel, error);
  -  else
  -    return G_IO_STATUS_NORMAL;
  +    {
  +      gsize wrote_bytes;
  +
  +      status = channel->funcs->io_write (channel, channel->write_buf->str,
  +                                         channel->write_buf->len, &wrote_bytes, NULL);
  +
  +      if (status == G_IO_STATUS_NORMAL)
  +        g_string_erase (channel->write_buf, 0, wrote_bytes);
  +    }
  +
  +  return G_IO_STATUS_NORMAL;
   }

shouldn't we be returning an error if the io_write encounters an error?

  Index: glib/glib/giochannel.h
  ===================================================================
  RCS file: /cvs/gnome/glib/glib/giochannel.h,v
  retrieving revision 1.7
  diff -u -r1.7 giochannel.h
  --- glib/glib/giochannel.h	2001/06/30 15:22:03	1.7
  +++ glib/glib/giochannel.h	2001/07/15 17:26:23
  @@ -39,6 +39,7 @@
   typedef struct _GIOChannel	GIOChannel;
   typedef struct _GIOFuncs        GIOFuncs;

  +#ifndef G_DISABLE_DEPRICATED
                         ^
Spelling. DEPRECATED.

   typedef enum
   {
     G_IO_ERROR_NONE,
  @@ -46,27 +47,24 @@
     G_IO_ERROR_INVAL,
     G_IO_ERROR_UNKNOWN
   } GIOError;
  +#endif /* G_DISABLE_DEPRICATED */
                           ^  
Same here.

   struct _GIOChannel
   {

Question:

 * Are we allowing people to create their own GIOChannel
   types? It's pretty darn tricky, and if we disallow 
   it, we give ourselves a lot more wiggle room for
   future changes without breaking bin compat.

If the answer to that questions is yes, then we need to 
stick /*< private >*/ and /*< public >*/ into the
structure to define what parts are publically accessible.  

If the answer to that question is no, then we should
move the GIOChannel structure definition and 
g_io_channel_init() into a giochannel-private.h header
file.

  Index: glib/glib/giounix.c
  ===================================================================
  RCS file: /cvs/gnome/glib/glib/giounix.c,v
  retrieving revision 1.15
  diff -u -r1.15 giounix.c
  --- glib/glib/giounix.c	2001/07/02 20:26:37	1.15
  +++ glib/glib/giounix.c	2001/07/15 17:26:23
  @@ -177,6 +177,9 @@
     GIOUnixChannel *unix_channel = (GIOUnixChannel *)channel;
     gssize result;

  +  if (count > SSIZE_MAX) /* At least according to the Debian manpage for read */
  +    count = SSIZE_MAX;

This qualification is also part of the Unix98 standard, so I think
you can say something like   /* Effect of passing larger values is undefined */

    retry:
     result = read (unix_channel->fd, buf, count);

SSIZE_MAX is probably not sufficiently portable. I think we need to add
definitions of G_MAXSIZE, G_MAXSSIZE to the glibconfig.h.

  @@ -397,80 +390,102 @@
     glong fcntl_flags;
     gint loop;
     GIOUnixChannel *unix_channel = (GIOUnixChannel *) channel;
  -  struct stat buffer;

  -  fcntl_flags = fcntl (unix_channel->fd, F_GETFL);
  +  do
  +    fcntl_flags = fcntl (unix_channel->fd, F_GETFL);
  +  while ((fcntl_flags == -1) && (errno == EAGAIN));
  +
     if (fcntl_flags == -1)
       {
	 g_warning (G_STRLOC "Error while getting flags for FD: %s (%d)\n",
		   g_strerror (errno), errno);
	 return 0;
       }

I think getting EAGAIN for fcntl (fd, F_GETFL) is bizarre enough
so that I'd rather find out about if it occurs for someone than
try to handle it from the start ;-)

  +  if (flags != -1)
  +    {
  +      /* Don't know if fcntl flags overlap, be careful */
  +
  +      if (flags & O_WRONLY)
  +        {
  +          channel->is_readable = FALSE;
  +          channel->is_writeable = TRUE;
  +        }
  +      else if (flags & O_RDWR)
  +        channel->is_readable = channel->is_writeable = TRUE;
  +#if O_RDONLY == 0
  +      else /* O_RDONLY defined as zero on linux (elsewhere?) */
  +        {
  +          channel->is_readable = TRUE;
  +          channel->is_writeable = FALSE;
  +        }
  +#else /* O_RDONLY == 0 */
  +      else if (flags & O_RDONLY)
  +        {
  +          channel->is_readable = TRUE;
  +          channel->is_writeable = FALSE;
  +        }
  +      else
  +        channel->is_readable = channel->is_writeable = FALSE;
  +#endif /* O_RDONLY == 0 */
 
If O_RDONLY == 0 is common, this implies to me that we really should
probably restrict is_readable/writeable based on the values passed to
g_io_channel_new_file() when we are creating channels in that
fashion.

  Index: glib/glib/gunicode.h
  ===================================================================
  RCS file: /cvs/gnome/glib/glib/gunicode.h,v
  retrieving revision 1.22
  diff -u -r1.22 gunicode.h
  --- glib/glib/gunicode.h	2001/07/02 00:49:20	1.22
  +++ glib/glib/gunicode.h	2001/07/15 17:26:27
  @@ -168,6 +168,8 @@
   #define g_utf8_next_char(p) (char *)((p) + g_utf8_skip[*(guchar *)(p)])

   gunichar g_utf8_get_char          (const gchar *p);
  +gunichar g_utf8_get_char_extended (const gchar *p,
  +                                   gsize        max_len);
   gchar*   g_utf8_offset_to_pointer (const gchar *str,
				      glong        offset);  
   glong    g_utf8_pointer_to_offset (const gchar *str,      
  Index: glib/glib/gutf8.c
  ===================================================================
  RCS file: /cvs/gnome/glib/glib/gutf8.c,v
  retrieving revision 1.21
  diff -u -r1.21 gutf8.c
  --- glib/glib/gutf8.c	2001/06/23 13:55:07	1.21
  +++ glib/glib/gutf8.c	2001/07/15 17:26:28
  @@ -549,7 +549,7 @@
   /* Like g_utf8_get_char, but take a maximum length
    * and return (gunichar)-2 on incomplete trailing character
    */
  -static inline gunichar
  +gunichar
   g_utf8_get_char_extended (const gchar *p, gsize max_len)  
   {
     guint i, len;

I have a patch locally that does this just a bit differently; I'll
post/apply that here as soon as possible. 




[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]