Re: [BuildStream] Proposal: A small number of subprocesses handling jobs
- From: Tristan Van Berkom <tristan vanberkom codethink co uk>
- To: Jürg Billeter <j bitron ch>
- Cc: Jonathan Maw <jonathan maw codethink co uk>, buildstream-list gnome org
- Date: Mon, 4 Mar 2019 15:38:54 +0900
On Mar 4, 2019, at 2:35 PM, Jürg Billeter <j bitron ch> wrote:
Hi,
On Wed, 2019-02-27 at 20:54 +0900, Tristan Van Berkom via buildstream-list wrote:
On Fri, 2019-02-22 at 17:04 +0000, Jonathan Maw via BuildStream-list wrote:
Any information passed to an existing subprocess has to be pickled and
unpickled, so ideally this would need to be as little as possible.
I am not sure how I would go about providing this information, assuming 
I can track down and isolate all the parts of the pipeline that change during 
a build.
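To put a rough shape on the pickling cost, here is a minimal sketch. It assumes elements can be modelled as plain dicts and that the worker already holds everything except the state that changed; the names make_chain, top and delta are made up for illustration and are not BuildStream code.

```python
import pickle


def make_chain(depth):
    # Build a linear dependency chain of plain dicts to imitate loaded elements.
    # (Hypothetical model; real elements carry far more state.)
    element = {"name": "base", "dependencies": [], "state": {"cached": True}}
    for i in range(depth):
        element = {
            "name": f"element-{i}",
            "dependencies": [element],
            "state": {"cached": False, "log": f"build log of element-{i}\n" * 20},
        }
    return element


top = make_chain(50)

# Option 1: send the element together with everything it depends on.
full_size = len(pickle.dumps(top))

# Option 2: send only the name and the state that changed since the worker
# was started, assuming the worker already holds the rest.
delta = {"name": top["name"], "changed": {"cached": True}}
delta_size = len(pickle.dumps(delta))

print(f"full: {full_size} bytes, delta: {delta_size} bytes")
```

The hard part remains the one described above: knowing which state actually changes during a build, so that the delta is complete.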
A job needs access to:
 * The element being processed
 * All of the element's dependencies
Basically, all of the public API exposed to plugins, and all of the
state of all elements which the element depends on, must be readily
handy for the plugin. This contract is rather inherent due to the fact
that an Element has access to all public API and has access to all of
its dependencies, combined with the fact that the core cannot have any
knowledge whatsoever of what the plugin will do, aside from calling
some public API.
Another option was discussed during the gathering.  We could execute
the plugin code itself in the main process/thread and hand off only the
expensive parts to a worker pool.  Not allowing arbitrary plugin code
to run in the worker makes it much simpler and more efficient to pickle
the data that is actually needed by the worker.
A possible implementation approach would be to extend the command
batching concept to also cover staging dependencies and sources, and
using a single batch/context to cover integration and build commands. 
I.e., instead of having the plugin actually do the expensive
operations, the plugin will create an operation list (possibly using a
context manager), which the BuildStream core will execute outside the
control of the plugin.
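As a rough sketch of what such an operation list could look like, assuming a context manager that only records what the plugin asks for: OperationList, batch, assemble and the operation names are all hypothetical and not existing BuildStream API.

```python
from contextlib import contextmanager


class OperationList:
    # Hypothetical recorder: each method only notes the request, nothing is executed.
    def __init__(self):
        self.operations = []

    def stage_dependencies(self, path):
        self.operations.append(("stage-dependencies", path))

    def stage_sources(self, path):
        self.operations.append(("stage-sources", path))

    def run(self, command):
        self.operations.append(("run", command))


@contextmanager
def batch(submitted):
    # The plugin records operations inside the with block. On normal exit the
    # recorded list is handed over to the core (here: appended to a list).
    ops = OperationList()
    yield ops
    submitted.append(ops.operations)


def assemble(submitted):
    # Hypothetical plugin code: cheap to run in the main process, since it
    # only describes the work.
    with batch(submitted) as ops:
        ops.stage_dependencies("/")
        ops.stage_sources("/buildstream/src")
        ops.run("make")
        ops.run("make install")


queue = []
assemble(queue)
print(queue[0])
```

The recorded list consists of plain tuples of strings, so handing it to a worker would involve pickling very little data.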
This sounds like complicating the plugin APIs significantly due to technical difficulties which should be 
solvable outside of the plugin.
In general I would much rather go to great lengths in the core in order to provide a luxuriously simple and 
attractive API for plugin authors.
Beyond this, I suspect that what you describe will have performance similar to moving to a threading model, 
and I think we should try that before resorting to complicating the plugin API.
With the GIL in place, we only run Python in the main thread, while long-running I/O and system calls should 
be parallelized (which sounds similar to what you are suggesting).
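A minimal sketch of the threading idea, assuming the expensive parts are blocking calls that release the GIL: time.sleep() stands in for staging or a sandboxed build command here, and expensive_operation and the element names are made up for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def expensive_operation(name):
    # Stand-in for a long-running I/O or system call. time.sleep() releases
    # the GIL while it blocks, so other threads can make progress meanwhile.
    time.sleep(0.2)
    return f"{name}: done"


names = ["base", "libfoo", "libbar", "app"]

start = time.monotonic()
with ThreadPoolExecutor(max_workers=len(names)) as pool:
    # map() returns results in the order of the inputs.
    results = list(pool.map(expensive_operation, names))
elapsed = time.monotonic() - start

print(results, f"{elapsed:.2f}s")
```

The four blocking calls overlap, so the total time stays well below the 0.8 seconds a serial run would need. Pure Python work in the threads would not overlap this way.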
Cheers,
    -Tristan