Re: GChildWatchSource take three



Thus spake Owen Taylor:
> +void
> +g_child_watch_source_init (void)
> ===
> 
> I don't think making this public is a good idea; the precondition
> for using this code is that you aren't using SIGCHLD yourself.
> If you are, you can remove your handler before calling
> g_child_watch_source_new(), we don't have to clutter the API
> for that.

The thing is:  usually, when you launch a child process, you do
something like:

1) set up the signal handler and unblock SIGCHLD
2) launch the child
3) SIGCHLD arrives; call waitpid.

Without this function being public, you can only set the signal handler
after launching the child.

Is this a problem?  Well, if the programmer himself had a signal handler
installed, it's his problem - let him shoot himself in the foot.

I'm more worried about what the parent of the process that uses
g_child_watch can do - this is also the source of my comments about
signal masks.

Digging around some more, I found a case where you can make waitpid fail
with ECHILD while using g_child_watch - sure, this case depends on
behaviour specified by SUSv3 (but not POSIX) regarding setting SIGCHLD
to SIG_IGN (no zombies are generated), and some unspecified behaviour
during exec (whether setting SIGCHLD to SIG_IGN survives an exec).
Nevertheless, it occurs on Linux.

Essentially:  set SIGCHLD to SIG_IGN and launch a program that uses
g_child_watch.  If the child exits before the (first) call to
g_child_watch_add, the source is dispatched with a status of -1.

Proof of concept:  modifying the non-threaded child-test.c like this
(to give the race condition some chance to happen):

   pid = get_a_child ();
+  sleep (3);
   g_child_watch_add (pid, child_watch_callback, NULL);

and compiling and running the attached program, I get this:

$ ./child-test 
Testing non-threaded GChildWatchSource
child 12434 exited, status 0

$ ./sigchld_ign child-test
Testing non-threaded GChildWatchSource
child 12466 exited, status -1

Instrumenting check_for_child_exited to print the value of errno, I get
"No child processes" (i.e. ECHILD).

Calling g_child_watch_source_init before get_a_child fixes this.

If you decide to keep this function public, the docs should mention that
it should be called before the first call to fork (i.e. before the first
child is launched) - otherwise there's no point in calling it.

> I don't believe that this note describes the problem well; as I
> understood
> the conversation, it should be:
>  
>  <note><para> Some thread implementations are buggy and one thread 
>   cannot call waitpid() on a child created in a different thread; if
>   you are using this functionality in a threaded program, you may
>   need to structure your program so that child watches are always
>   added to a GMainContext running in the same thread as where
>   the child was created. </note></para>

This is a better description of the general case.
Maybe specifically mention LinuxThreads as an example?  It is still the
major thread implementation on Linux...

> 
> Couple of comments about test case:

<wishlist> It would be nice if the test exited on its own, instead of
requiring the user to press Ctrl-C </wishlist>


FWIW, I don't think I have anything else to add...

Alexis
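
/* sigchld_ign: set SIGCHLD to SIG_IGN, then exec the given program. */
#include <stdio.h>
#include <stdlib.h>
#include <signal.h>
#include <unistd.h>

int
main (int   argc,
      char *argv[])
{
  if (argc < 2)
    {
      fprintf (stderr, "usage: %s program [args...]\n", argv[0]);
      exit (1);
    }

  if (signal (SIGCHLD, SIG_IGN) == SIG_ERR)
    {
      perror ("signal");
      exit (1);
    }

  /* Whether SIG_IGN for SIGCHLD survives the exec is unspecified;
   * on Linux it does. */
  execv (argv[1], argv + 1);

  /* Only reached if execv failed. */
  perror ("execv");

  exit (1);
}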

