Re: g_utf8_validate() and NUL characters



Freddie Unpenstein wrote:
> From: "Nikolai Weibull", 09/10/2008 02:01:
>> On Wed, Oct 8, 2008 at 13:20, Havoc Pennington <hp pobox com> wrote:
>>> Another way to put it, I don't think nul bytes are a user-explainable
>>> concept. If anybody who isn't a programmer sees (how? what's the
>>> glyph?) a nul byte in a _text_ file, that's just bizarre.
>> How is "oh, you can't open /that/ file in a text editor because it has
>> a character in it that isn't a user-explainable concept" (I'm not
>> trying to make a straw man argument) better than simply opening the
>> file, displaying the NUL as a box with 0000 in it (like Pango does for
>> other characters it can't render) and be done with it? I don't see
>> how it's the program's responsibility to state what can and what cannot
>> be in a file the user wants to open, as long as the file is valid in
>> the chosen encoding.
> 
> Why not just adopt the old trick of encoding NULs and other non-UTF-8
> characters as safe UTF-8 equivalents...?

Because they are not valid UTF-8?  And the moment we give up dealing with
valid UTF-8 a whole other can of worms opens up.
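
To make that concrete, here is a minimal sketch in plain C, not GLib code.
It assumes the escape in question is the common one that writes NUL as the
two bytes 0xC0 0x80; both function names are made up for illustration. A
lenient decoder maps that pair to U+0000, while a strict check has to reject
it as an overlong form, which is also the behaviour one would expect from
g_utf8_validate().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not a GLib function): decode a two-byte sequence
 * the way a lenient decoder would, without applying the overlong rule. */
unsigned lenient_decode2(const unsigned char *seq)
{
    assert(seq != NULL);
    return ((seq[0] & 0x1Fu) << 6) | (seq[1] & 0x3Fu);
}

/* Hypothetical helper (not a GLib function): strict well-formedness
 * check for a two-byte UTF-8 sequence. */
bool strict_valid2(const unsigned char *seq)
{
    assert(seq != NULL);
    if ((seq[0] & 0xE0u) != 0xC0u)
        return false;               /* lead byte must be 110xxxxx */
    if ((seq[1] & 0xC0u) != 0x80u)
        return false;               /* continuation must be 10xxxxxx */
    /* Code points below U+0080 must use the one-byte form, so a
     * two-byte sequence that decodes to less than 0x80 is overlong. */
    return lenient_decode2(seq) >= 0x80u;
}
```

With these definitions, the pair 0xC0 0x80 decodes leniently to 0 but fails
the strict check, whereas 0xC2 0x80 (U+0080) passes it.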

behdad

> I've seen the practice of
> representing \0 as the byte pair 0xC0 0x80 (or however it's specified)
> recommended in a secure programming document as a measure for avoiding
> accidents (especially when you're using someone else's libraries), and
> plenty of other software and toolkits do it. C's use of NUL terminators
> is an implementation detail of C; it shouldn't be inflicted on everything
> else.
> 
> There's no need for every API function taking a text string (as opposed
> to Glib functions that may well be storing binary strings) to also have
> a version that takes a length, and for every string value throughout GTK
> to carry around a length value and all the extra work needed to work
> with length/buffer pairs over simple NUL-terminated strings. Especially
> when most of them don't handle binary anyhow.
> 
> That still doesn't answer the rendering issue, but personally I think a
> NUL shouldn't have any special meaning in a string to be displayed. Whether
> it gets rendered as a box with 0's, or a zero width solid space, or
> whatever else, is another issue entirely. But it shouldn't require extra
> effort to handle it... Simply label it a binary character, and encode it
> up in the binary-to-UTF-8 functions. It can then be displayed however
> someone else decides, and be converted back into the original NUL by a
> UTF-8-to-binary function later on.
> 
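
The round trip described in the paragraph above could look roughly like
this. It is only a sketch: escape_nul and unescape_nul are hypothetical
names, not GLib functions, and the 0xC0 0x80 escape is an assumption. The
sketch also assumes the input contains no literal 0xC0 byte, which holds for
otherwise well-formed UTF-8.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical "binary-to-UTF-8" step: copy n bytes from src to dst,
 * writing every 0x00 as the pair 0xC0 0x80. dst needs room for
 * 2 * n + 1 bytes. Returns the escaped length, excluding the
 * terminating NUL that is appended. */
size_t escape_nul(const unsigned char *src, size_t n, unsigned char *dst)
{
    size_t out = 0;

    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < n; i++) {
        if (src[i] == 0x00) {
            dst[out++] = 0xC0;
            dst[out++] = 0x80;
        } else {
            dst[out++] = src[i];
        }
    }
    dst[out] = 0x00;    /* result is usable as a NUL-terminated string */
    return out;
}

/* Hypothetical "UTF-8-to-binary" step: read a NUL-terminated escaped
 * string and restore the original bytes into dst, which needs room for
 * as many bytes as the escaped string is long. Returns the number of
 * bytes written. */
size_t unescape_nul(const unsigned char *src, unsigned char *dst)
{
    size_t out = 0;

    assert(src != NULL && dst != NULL);
    for (size_t i = 0; src[i] != 0x00; i++) {
        /* src[i + 1] is at worst the terminator, so the read is safe. */
        if (src[i] == 0xC0 && src[i + 1] == 0x80) {
            dst[out++] = 0x00;
            i++;        /* skip the second byte of the escape */
        } else {
            dst[out++] = src[i];
        }
    }
    return out;
}
```

The escaped form passes through NUL-terminated APIs without truncation, but
it is not well-formed UTF-8, so a strict validator would reject it.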
> 
> Fredderic
> _______________________________________________
> gtk-devel-list mailing list
> gtk-devel-list gnome org
> http://mail.gnome.org/mailman/listinfo/gtk-devel-list

