Re: IOChannel and dodgy encodings



On Wed, Aug 01, 2001 at 11:26:46PM -0700, Ron Steinke wrote:
> It doesn't matter where in the file the error you were talking about
> occurs. What I was talking about doing was making the error
> handling for g_io_channel_read_chars() only return an ILLEGAL_SEQUENCE
> error if the illegal character was the next one to be read, instead
> of just somewhere in the GIOChannel internal buffer. This would
> allow you to seek past it, or convert to another encoding
> and read it in.

	I actually liked the 'super technical section', though I had
seen some of the code already :-)
	I was thinking about my particular situation.  I am capturing
the output of commands on a regular system.  So I can just set the
channel encoding to the system encoding.  Bing, it works.  Except for
POSIX C, which is encoded as ANSI_X3.4-1968 according to
g_get_charset().  Which still requires handling.
	My 'compare output' program is less(1), which happily displays
highlighted invalid characters.  In en_US ISO-8859-1, less and
GIOChannel work great.  In C ANSI_X3.4-1968, both have a problem with
the latin-1 characters.  But ANSI_X3.4-1968 is just a subset of
ISO-8859-1, so I can just special case that, right?  Hmm.
	This is, however, just one special case.  Invalid characters can
happen in other, not-so-predictable environments.
	Forgive me, I'm thinking out loud, I do have a point to get to.
	With GIOChannel error handling, I think the ability to act like
less(1) would be very worthwhile.  That is, get all the text before the
invalid character, then walk the bytestream until we find the next
character, and put whatever I want in my UI for "invalid character was
here".  Then continue with valid input.
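
	A rough sketch of what I mean, in plain C rather than against
the GIOChannel API.  It assumes the ANSI_X3.4-1968 case, where any byte
with the high bit set is invalid.  mark_invalid_ascii() and its marker
argument are made up for illustration; nothing like it exists in GLib,
and a real multibyte encoding would need a smarter resync step.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not a GLib function): copy `len` bytes from `in`
 * to `out`, keeping valid 7-bit ASCII and writing `marker` wherever an
 * invalid byte was found, then carrying on with the bytes that follow.
 * `out` must have room for len + 1 bytes.  Returns the number of
 * invalid bytes that were replaced.
 */
static size_t
mark_invalid_ascii(const unsigned char *in, size_t len,
                   char *out, char marker)
{
    size_t invalid = 0;
    size_t i;

    assert(in != NULL && out != NULL);

    for (i = 0; i < len; i++) {
        if (in[i] < 0x80) {
            /* Valid in ANSI_X3.4-1968: pass it through untouched. */
            out[i] = (char) in[i];
        } else {
            /* Invalid here: leave an "invalid character was here" mark. */
            out[i] = marker;
            invalid++;
        }
    }
    out[len] = '\0';

    return invalid;
}
```

	Fed the seven bytes "caf\xe9 ok" with '?' as the marker, this
keeps the text on both sides of the latin-1 byte and only replaces the
one bad byte, which is roughly what less(1) shows me.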

Joel

    

-- 

print STDOUT q
Just another Perl hacker,
unless $spring
	-Larry Wall

			http://www.jlbec.org/
			jlbec evilplan org



