Re: g_utf8_validate() and NUL characters



Dan Winship wrote:

> because g_utf8_validate() works on strings, and strings end at NUL.

You are confusing strings with nul-terminated strings.  Which of the following
two statements is true:

  - "Zero is not an integer"

  - "Zero is not a non-zero integer"

>> Lemme repeat again: When dealing with UTF-8 text, a max-length makes zero
>> sense without inspecting the string first.  So the strncpy, etc behavior is
>> not relevant.
> 
> Not always true. You might do something like:
> 
>     if (!g_utf8_validate (string, strcspn (string, "/"))) {
> 
> (to validate to the end of the string, or the first "/", whichever comes
> first).

How is "strcspn (string, "/")" not "inspecting the string first"?

By the way, uses like the above work regardless of whether g_utf8_validate()
stops on nul, because you have already stopped on the nul in your strcspn.
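To illustrate the point about strcspn, here is a minimal sketch using only the
standard C library.  The helper name segment_len is hypothetical, introduced
just for this example; it is not part of glib.

```c
#include <string.h>

/* Hypothetical helper: number of bytes in `s` before the first '/',
 * or before the terminating nul if `s` contains no '/'. */
static size_t
segment_len (const char *s)
{
  /* strcspn() never scans past the terminating nul, so the returned
   * length cannot cover any byte after the first nul in `s`. */
  return strcspn (s, "/");
}
```

Because the length computed this way never reaches past the first nul, a
validator given that length sees the same bytes whether or not it would itself
stop on nul.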

Really, the only argument for the current behavior of length is the same as
the one for strncpy's.  That is, so that the buffer size can be passed as the
length:

  char buf[1024];

  fill_buf(buf, sizeof (buf));

  g_utf8_validate (buf, sizeof (buf));

But thirty years of accumulated Unix code has shown that it's simply easier to
write:

  char buf[1024];

  fill_buf(buf, sizeof (buf) - 1);
  buf[sizeof (buf) - 1] = '\0'; /* just in case */
  g_utf8_validate (buf, -1);
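The fill-then-terminate idiom above can be sketched in standard C as follows.
Both fill_buf_from and fill_and_terminate are hypothetical names; fill_buf_from
stands in for the unspecified fill_buf and is assumed to behave like strncpy,
that is, it does not nul-terminate when the source is too long.

```c
#include <string.h>

/* Hypothetical stand-in for fill_buf(): copies at most `size` bytes of
 * `src` into `buf`.  Like strncpy(), it leaves `buf` unterminated when
 * `src` has `size` or more bytes. */
static void
fill_buf_from (char *buf, size_t size, const char *src)
{
  strncpy (buf, src, size);
}

/* Fill `buf` (capacity `cap`, which must be > 0), force nul-termination,
 * and return the length of the resulting nul-terminated string. */
static size_t
fill_and_terminate (char *buf, size_t cap, const char *src)
{
  fill_buf_from (buf, cap - 1, src);  /* leave room for the terminator */
  buf[cap - 1] = '\0';                /* just in case */
  return strlen (buf);
}
```

After this, the buffer is always a valid nul-terminated string, so it can be
handed to any function that takes -1 as the length.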


> But at any rate, even if it was true, that would be an argument for
> "g_utf8_validate() shouldn't have a length argument at all", not
> "g_utf8_validate() should behave differently from other string methods
> when you pass it a length".

It's already quite common in glib for a length parameter to be documented as
"length of @string, or -1 if nul-terminated".
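One literal reading of that convention can be sketched as below.  This is only
an illustration of the interpretation argued for in this message, not a
description of what g_utf8_validate() actually does; effective_len is a
hypothetical helper, and plain long stands in for glib's gssize.

```c
#include <string.h>

/* Hypothetical helper: how many bytes a function should look at when its
 * documentation says "length of @string, or -1 if nul-terminated". */
static size_t
effective_len (const char *str, long len)
{
  if (len < 0)
    return strlen (str);  /* nul-terminated: stop at the first nul */

  return (size_t) len;    /* explicit length: nul is just another byte */
}
```

Under this reading, passing an explicit length means embedded nul bytes are
part of the data, and only -1 makes nul act as a terminator.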

> -- Dan

behdad

