Re: nasty bug in multithreaded g_main_loop...




Sebastian Wilhelmi <wilhelmi@ira.uka.de> writes:

> Hi,
> 
> here's a deadlock due to a nonrecursive mutex, which is tried to be
> obtained twice. The backtrace is below. I've looked in the code, but
> haven't seen any good way to solve that yet, apart from doing tricks there
> with an own implementation of a recursive mutex, we would only need to add
> g_thread_self, which isn't much, but obviously a API-changel, so actually
> not wanted (It is also possible to do it with thread private data, that
> doesn't feel clean, though). However things can't remain like they are,
> because thats simply a bug. It happens in my multithreaded ORBit-code,
> where I use the glib main loop abstraction. It will be in CVS soon, if you
> would like to see, where it is happening. It simply is, that a hook gets
> removed and therefore, while holding the mutex the corresponding FD is
> removed.
> 
> #4  0xef679ac0 in _mutex_lock ()
> #5  0xef775bdc in g_main_remove_poll (fd=0x7a388) at gmain.c:761
> #6  0xef77346c in g_io_unix_destroy (source_data=0x7a388) at giounix.c:134
> #7  0xef7746e4 in g_source_free_func (hook_list=0xef799744, hook=0x7c488)
> at gmain.c:246
> #8  0xef77176c in g_hook_free (hook_list=0xef799744, hook=0x7c488) at
> ghook.c:116
> #9  0xef771b8c in g_hook_unref (hook_list=0xef799744, hook=0x7c488) at
> ghook.c:188
> #10 0xef774e90 in g_main_dispatch (current_time=0xef20bba8) at gmain.c:404
> #11 0xef7753b4 in g_main_iterate (block=1144, dispatch=1) at gmain.c:573
> #12 0xef775504 in g_main_run (loop=0x79dd8) at gmain.c:617

OK, there is an easy fix for this one. We can just split the
main_loop lock into two pieces - one for the poll array and
one for the rest. Could you try the following patch out?

There are two more general points, though brought up
by this:

a) Should we have recursive mutexes? We decided against
   this initially because we didn't seem to need them
   and there were a number of portability problems.

   (They aren't available for all pthreads implementations
   in a standard way, and so you need to emulate them, and
   if you emulate them, then you run into the problem
   that all threading systms don't have a nice equivalent
   of pthread_self() so you might have to emulate that too...)

   I'd tend to stick to this opinion; less because I'm
   worried about changing this API (which few people
   are using yet), than because the original reasons
   for avoiding them still hold.

   (In my opinion, recursive mutexes are poor practice
   because they indiciate you don't know where your
   locks are being held, which leaves you vulnerable
   to other types of deadlocks. But they are occasionally
   useful.)

b) We destroy sources within the main loop lock. This
   is nasty because we will end up calling the destroy
   notifies for the user data within the main lock
   too.

   So, considering how that destroy-notify is used
   in many language bindings (reference counting)
   you could easily end up with the destructor for
   some Perl object (for example) being called within 
   the main loop lock. 

   And by that time, expecting the application programmer
   to understand the consequences is a pretty slim
   chance. (Note that making the main loop lock recursive 
   does not remove all possible problems here)

   But avoiding this is not easy, since you first
   have to disentagle the GHook out of the GHookList,
   unlock, then destroy the hook. This definitely
   would require modifications to the GHook API's.

   (Hmmm, I suppose the cheat is to do all destroy
   notifications from an idle function... (not really
   a serious suggestion))

                                        Owen

Index: gmain.c
===================================================================
RCS file: /cvs/gnome/glib/gmain.c,v
retrieving revision 1.19
diff -u -r1.19 gmain.c
--- gmain.c	1999/01/07 20:12:19	1.19
+++ gmain.c	1999/01/16 16:17:05
@@ -89,8 +89,6 @@
 static void     g_main_poll               (gint      timeout,
 					   gboolean  use_priority, 
 					   gint      priority);
-static void     g_main_add_poll_unlocked  (gint      priority,
-					   GPollFD  *fd);
 
 static gboolean g_timeout_prepare      (gpointer  source_data, 
 					GTimeVal *current_time,
@@ -114,10 +112,11 @@
 static GSList *pending_dispatches = NULL;
 static GHookList source_list = { 0 };
 
-/* The following lock is used for both the list of sources
- * and the list of poll records
+/* The following two locks are partially ordered; you cannot
+ * acquire poll_records_lock after acquiring main_loop_lock
  */
-G_LOCK_DECLARE_STATIC (main_loop);
+G_LOCK_DECLARE_STATIC (main_loop);    /* Locks the list of sources */
+G_LOCK_DECLARE_STATIC (poll_records); /* Locks the poll records */
 
 static GSourceFuncs timeout_funcs = {
   g_timeout_prepare,
@@ -661,9 +660,11 @@
 
       wake_up_rec.fd = wake_up_pipe[0];
       wake_up_rec.events = G_IO_IN;
-      g_main_add_poll_unlocked (0, &wake_up_rec);
+      g_main_add_poll (&wake_up_rec, 0);
     }
   
+  G_LOCK (poll_records);
+
   fd_array = g_new (GPollFD, n_poll_records);
  
   pollrec = poll_records;
@@ -678,6 +679,8 @@
       i++;
     }
 
+  G_UNLOCK (poll_records);
+
   poll_waiting = TRUE;
   
   G_UNLOCK (main_loop);
@@ -693,6 +696,8 @@
   else
     poll_waiting = FALSE;
 
+  G_LOCK (poll_records);
+
   pollrec = poll_records;
   i = 0;
   while (i < npoll)
@@ -702,25 +707,19 @@
       i++;
     }
 
+  G_UNLOCK (poll_records);
+
   g_free (fd_array);
 }
 
 void 
-g_main_add_poll (GPollFD *fd,
+g_main_add_poll (GPollFD *fd, 
 		 gint     priority)
 {
-  G_LOCK (main_loop);
-  g_main_add_poll_unlocked (priority, fd);
-  G_UNLOCK (main_loop);
-}
-
-/* HOLDS: main_loop_lock */
-static void 
-g_main_add_poll_unlocked (gint     priority,
-			  GPollFD *fd)
-{
   GPollRec *lastrec, *pollrec, *newrec;
 
+  G_LOCK (poll_records);
+
   if (!poll_chunk)
     poll_chunk = g_mem_chunk_create (GPollRec, 32, G_ALLOC_ONLY);
 
@@ -751,6 +750,8 @@
   newrec->next = pollrec;
 
   n_poll_records++;
+
+  G_UNLOCK (poll_records);
 }
 
 void 
@@ -758,7 +759,7 @@
 {
   GPollRec *pollrec, *lastrec;
 
-  G_LOCK (main_loop);
+  G_LOCK (poll_records);
   
   lastrec = NULL;
   pollrec = poll_records;
@@ -782,7 +783,7 @@
       pollrec = pollrec->next;
     }
 
-  G_UNLOCK (main_loop);
+  G_UNLOCK (poll_records);
 }
 
 void 



[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]