Re: A tale of waiting



On Wed, Jun 24, 2009 at 4:39 AM, Matthias Clasen
<matthias clasen gmail com> wrote:
> As Bastien already pointed out, this is just not true. Getting the
> mime type does not use sniffing in most cases. /usr/bin is a the worst
> case where extension-based mimetype detection breaks down, but it
> should do fine in your homedir, unless you are weird.
>
I'm sorry. I just assumed it always used content sniffing, because it
was prominently showing up in sysprof's output.
It also turns out I'm weird. I have a lot of log files in my home
directory with a custom extension and that triggers sniffing.

So I set out to measure it again, comparing the output of these two
commands in various directories:
time gvfs-ls -a
standard::name,standard::type,standard::display-name,standard::is-hidden,standard::is-backup,standard::size,time::modified
> /dev/null
time gvfs-ls -a
standard::name,standard::type,standard::display-name,standard::is-hidden,standard::is-backup,standard::size,time::modified,standard::content-type
> /dev/null
Those are the attributes used by my current file chooser, comparing
those two commands with a warm cache. Then I ran the same commands
again with a preceeding "echo 1 > /proc/sys/vm/drop_caches" command:
 * my home dir: 0.2s vs 0.45s
                1.6s vs 12.9s
This has the aforementioned log files that trigger caching.
 * /usr/bin: 0.15s vs 0.65s
             1.3s vs 18.2s
This is the worst case scenario.
 * my test dir: 1.8s vs 2.2s
                7.4s vs 7.9s
This dir is interesting, because there's only .txt and .png files, so
no sniffing happens.

So it seems content-type sniffing incurs a 20% penalty for
g_file_query_info() if it takes the fast path, and is devastating if
it doesn't. Both of that is not nice and it'd be nice if populating
the file chooser would not be blocked by it, but it seems in most
cases it's not noticable. So I can just add the content-type back to
the file chooser and have one problem less to think about.

> Not sure why you keep repeating this, it is not true. GSequence is
> backed by a balanced tree, so sorting by insertion gives you O(n log
> n), so at least asymptotically,
> it is fine. The implementation may of course still be slower than qsort().
>
The biggest problem of course is calling the comparison function. And
I was under the impression that this number was significantly lower
for qsort. So I wrote a little program that counted them (attached).
And it turns out that:
- g_qsort_with_data() and g_sequence_sort() use roughly the same
number of comparisons
- glibc's qsort() takes 30% less comparisons than the other two.
- g_sequence_sort() has quite some overhead, though probably ignorable
for the file chooser

So I guess I will
a) file a bug about updating glib's qsort implementation
b) stop complaining about g_sequence_sort() being slow
The slowness I was seeing was probably related to the comparison
function using gtk_tree_model_get() (see previous mail).

> > And now the obvious question: Should I just merge it to master when
> > I'm done with the regressions?
>
> No, of course not. This needs to be reviewed and merged incrementally.
> For one thing,
> we don't know what other aspects of file chooser behaviour have been
> negatively affected by your optimization towards a single goal. Is
> scrolling still ok ? Does changing the sort column work reasonably ?
> Etc.
>
Yeah, the question was of course tongue-in-cheek.
I was wondering how to best proceed to get this code merged as sane as
possible, since it is a somewhat big chunk.


Benjamin
#include <glib.h>
#include <stdlib.h>

static guint sort;

static int
libc_compare_and_count (gconstpointer a, gconstpointer b)
{
  sort++;

  return *(gpointer **) a - *(gpointer **) b;
}

static int
compare_and_count (gconstpointer a, gconstpointer b, gpointer data)
{
  guint *count = data;

  (*count)++;

  return *(gpointer **) a - *(gpointer **) b;
}

static int
seq_compare_and_count (gconstpointer a, gconstpointer b, gpointer data)
{
  guint *count = data;

  (*count)++;

  return a - b;
}

int
main (int argc, char **argv)
{
  guint i;
  gpointer *list, *glist;
  GSequence *seq;
  guint N = 40000;
  GTimer *timer;

  if (argc > 1)
    N = g_ascii_strtoull (argv[1], NULL, 0);

  list = g_new (gpointer, N);
  seq = g_sequence_new (NULL);
  for (i = 0; i < N; i++)
    {
      list[i] = GINT_TO_POINTER (g_random_int ());
      g_sequence_append (seq, list[i]);
    }
  glist = g_memdup (list, sizeof (gpointer) * N);

  timer = g_timer_new ();
  g_timer_start (timer);
  sort = 0;
  qsort (list, N, sizeof(gpointer), libc_compare_and_count);
  g_print ("qsort(): %u - %gs\n", sort, g_timer_elapsed (timer, NULL));

  g_timer_reset (timer);
  sort = 0;
  g_qsort_with_data (glist, N, sizeof(gpointer), compare_and_count, &sort);
  g_print ("g_qsort_with_data(): %u - %gs\n", sort, g_timer_elapsed (timer, NULL));
  
  g_timer_reset (timer);
  sort = 0;
  g_sequence_sort (seq, seq_compare_and_count, &sort);
  g_print ("g_sequence_sort(): %u - %gs\n", sort, g_timer_elapsed (timer, NULL));

  return 0;
}



[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]