Re: Rationale for change in behavior of g_strsplit when passed empty string?



Darin Adler <darin bentspoon com> writes:

> When g_strsplit in glib 1.2 was passed an empty string, it returned an
> empty vector.
> 
> When g_strsplit in glib 1.3 is passed an empty string, it returns a
> vector with a single empty string in it.
> 
> Is there a good reason for this change? We'll have to make some small
> changes to code in eel and nautilus because of the change, and I just
> wanted to make sure it was no accident.

The change probably came from:

 2000-10-26  Sebastian Wilhelmi  <wilhelmi ira uka de>

	* gstrfuncs.c (g_strsplit): When the string is ended by a
	delimiter, return an extra empty string just like for a delimiter
	at the start of the string. This makes the function behave more
	consistent and also fixes Bug #15026.

[
  The old behavior actually sort of matches that of Perl:

       split   Splits a string into a list of strings and returns
               that list.  By default, empty leading fields are
               preserved, and empty trailing ones are deleted.

               [...]

               If LIMIT is specified and positive, splits into no
               more than that many fields (though it may split
               into fewer).  If LIMIT is unspecified or zero,
               trailing null fields are stripped (which potential
               users of `pop' would do well to remember).  If
               LIMIT is negative, it is treated as if an arbitrarily
               large LIMIT had been specified.

 but the Perl behavior, with its dependence on LIMIT, would have to be
 considered somewhat demented. Python's string.split() behavior
 (with an explicit separator) is the 1.3 behavior.
]
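
To make the two conventions concrete, here is a minimal Python sketch. It
relies on Python's split with an explicit separator, which keeps every empty
field. The helper names split_keep_all and split_strip_trailing are
hypothetical, made up for this illustration; the second only imitates the
"empty trailing fields are deleted" rule from the Perl text quoted above and
is not a claim about what glib 1.2 did in every case.

```python
def split_keep_all(text, sep=":"):
    """Split on an explicit separator, keeping every empty field."""
    return text.split(sep)


def split_strip_trailing(text, sep=":"):
    """Hypothetical helper: like split_keep_all, but drop empty trailing fields."""
    fields = text.split(sep)
    while fields and fields[-1] == "":
        fields.pop()
    return fields


# Keep-everything convention: leading and trailing empties survive,
# and the empty string yields one empty field.
keep_leading = split_keep_all(":a")
keep_trailing = split_keep_all("a:")
keep_empty = split_keep_all("")

# Strip-trailing convention: the empty string collapses to an empty list,
# but a leading empty field is still preserved.
strip_leading = split_strip_trailing(":a")
strip_trailing = split_strip_trailing("a:")
strip_empty = split_strip_trailing("")
```

Under the strip-trailing rule the empty string maps to an empty list, which
is the result Darin's code was relying on.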


The _right_ behavior is a tricky question. Logic says:
Empty elements are allowed internally:

    g_strsplit ("a:b:c", ":", 0)  => "a", "b", "c"
    g_strsplit ("a::c", ":", 0)  => "a", "", "c"

So, if we treat empty elements as valid, we can't treat "" as meaning
an empty list.

    g_strsplit ("b", ":", 0)  => "b"
    g_strsplit ("", ":", 0)  => ""

But practicality says there should be _some_ way of representing an
empty list.

The problem here is that strsplit() is conceived of as an inverse to
strjoin(), but strjoin() isn't one-to-one if both empty elements and
empty lists are allowed.
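
The collision can be seen in a few lines of Python, used here only as a
stand-in for the GLib functions. The name join_fields is hypothetical. An
empty list and a list holding one empty string both join to the same
string, so no split function can recover both.

```python
def join_fields(fields, sep=":"):
    """Join a list of strings with a separator (stand-in for strjoin)."""
    return sep.join(fields)


# Two different inputs...
from_empty_list = join_fields([])
from_one_empty_element = join_fields([""])

# ...produce the same flattened string, so the mapping has no inverse.
collide = from_empty_list == from_one_empty_element

# Lists with at least one non-empty element, or with two or more
# elements, do not collide with each other in this way.
two_empties = join_fields(["", ""])
```

Whichever answer a split function gives for "", one of the two inputs
fails to round-trip.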

In the absence of a clear "right way" we may need to stick to the 1.2
behavior to avoid breaking existing code in tricky ways.

Finally, using g_strjoin() and g_strsplit() to flatten and unflatten
vectors is fundamentally broken since they don't do escaping.
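
A small Python sketch of the escaping problem, again as a stand-in for the
GLib calls. The functions flatten and unflatten and the choice of backslash
as the escape character are assumptions made for this illustration; nothing
like them is implied to exist in GLib.

```python
def flatten(fields, sep=":", esc="\\"):
    """Join fields, escaping the escape character and the separator."""
    escaped = [f.replace(esc, esc + esc).replace(sep, esc + sep) for f in fields]
    return sep.join(escaped)


def unflatten(text, sep=":", esc="\\"):
    """Inverse of flatten for non-empty lists: split on unescaped separators."""
    fields, current, i = [], [], 0
    while i < len(text):
        ch = text[i]
        if ch == esc and i + 1 < len(text):
            current.append(text[i + 1])  # take the escaped character literally
            i += 2
        elif ch == sep:
            fields.append("".join(current))  # unescaped separator ends a field
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    fields.append("".join(current))
    return fields


original = ["a:b", "c"]

# Naive join/split: the ":" inside the first element is taken as a delimiter.
naive = ":".join(original).split(":")

# Escaped round trip: the element boundaries survive.
round_trip = unflatten(flatten(original))
```

Escaping fixes the delimiter problem only. In this sketch unflatten("")
still returns a list with one empty string, so the empty list remains
unrepresentable without some further convention.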

Regards,
                                        Owen
 



