Re: [BuildStream] Proposal: A small number of subprocesses handling jobs
- From: Tristan Van Berkom <tristan vanberkom codethink co uk>
- To: Jonathan Maw <jonathan maw codethink co uk>, buildstream-list gnome org
- Subject: Re: [BuildStream] Proposal: A small number of subprocesses handling jobs
- Date: Tue, 05 Mar 2019 16:55:03 +0900
Hi,
On Mon, 2019-03-04 at 15:50 +0000, Jonathan Maw via buildstream-list wrote:
[...]
We discussed the pool of subprocesses, and came to broadly two different
ways to go about it:
1. A multiprocessing pool that forks off at the start of the scheduler.
===
* Changes to the element in the lifetime of a job will be captured, and
passed through the job's result object when the job finishes.
* Changes from a job result will be received by the scheduler and pushed
to each worker subprocess.
* Mandate that only the element that the job is running for can be
changed in the job
- A "soft" mandate (changes will not be propagated to the other
workers) is enough for normal operation, but a separate mode where such
changes are forbidden (or any changes outside the element are thrown
away) would be useful for debugging.
* A "pristine" subprocess that is unchanged by jobs would be useful for
forking new subprocesses, especially if we decide that each worker
should have a finite lifetime.
2. An element graph service
===
i.e. one subprocess holds the pipeline and uses some form of IPC to get
and set state changes.
This is a valuable long-term goal to work towards, once we have a much
better idea of where/when we access the element graph, and have
completely encapsulated every time a plugin would access the element
graph.

At this point, that process will be acutely sensitive to any slowness in
`_update_state()`, and if plugin authors can influence that code path,
we will have to hope/impress/demand that their code has a small time
impact.
(As an aside, this is currently not the case: git-based plugins
implement `validate_cache()` and fork off a git subprocess to find the
branch and tag. Using libgit2 here would be valuable.)
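For illustration, here is a minimal sketch of option 2 under stated assumptions: a single process owns the element graph, and workers get/set state over IPC. `multiprocessing.managers.BaseManager` is used here purely as a stand-in for whatever IPC mechanism BuildStream would actually choose, and `GraphService` is a hypothetical name; the point is that every state access becomes a round trip, so any slowness in the service process stalls all workers:

```python
import multiprocessing
from multiprocessing.managers import BaseManager


class GraphService:
    """Holds the authoritative element state in one process."""

    def __init__(self):
        self._state = {}

    def get_state(self, element):
        return self._state.get(element)

    def set_state(self, element, state):
        self._state[element] = state


def worker(proxy, element):
    # Every state access here is a round trip to the service
    # process, so a slow update on the service side stalls workers.
    proxy.set_state(element, "cached")
    return proxy.get_state(element)


if __name__ == "__main__":
    BaseManager.register("GraphService", GraphService)
    with BaseManager() as manager:
        service = manager.GraphService()
        with multiprocessing.Pool(2) as pool:
            print(pool.apply(worker, (service, "app.bst")))
```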
Both of these drastic changes incur some measure of overhead.
Where is the evidence that these overheads are going to be less than
the overhead of simply forking on demand?
Without this evidence, we should definitely stay with option 3, which
is to make no changes at all: the code is simpler now than it would be
if either of the proposed routes above were followed.
While this means that the core must be careful to tip-toe around
libraries which inadvertently spawn threads, this tip-toeing is not
necessary in plugin code[1]. I strongly believe fork-on-demand is the
lesser evil in terms of overall complexity in the core.
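To show why fork-on-demand stays simple, here is a minimal sketch under stated assumptions (Unix `os.fork()`, results small enough to pickle through a pipe; `run_job_forked` is a hypothetical helper, not BuildStream code). Each job forks a fresh child from the current process state, so workers always start pristine and plugin code need never run in the long-lived main process:

```python
import os
import pickle


def run_job_forked(job_fn, *args):
    # Fork a child; the child runs the job and writes its pickled
    # result to a pipe, and the parent reads it back.  Error
    # handling and non-zero exit statuses are omitted for brevity.
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:  # child: run the job, ship the result, exit
        os.close(r)
        result = job_fn(*args)
        with os.fdopen(w, "wb") as f:
            pickle.dump(result, f)
        os._exit(0)
    # parent: read the child's result, then reap it
    os.close(w)
    with os.fdopen(r, "rb") as f:
        result = pickle.load(f)
    os.waitpid(pid, 0)
    return result


print(run_job_forked(lambda x: x * 2, 21))  # → 42
```

Because the child inherits the parent's memory at fork time, no state needs to be pushed to workers up front, and nothing a job mutates can leak back into the scheduler.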
Cheers,
-Tristan
[1]: Plugin code here refers to code controlled by plugin authors, not
private core business logic encoded into the base classes.
And the statement above is actually inaccurate, but it would not be
very difficult to ensure that plugin code is only ever invoked from a
subprocess, ensuring that plugins need not worry about thread-spawning
libraries in their own code.
Further, the majority of calls into plugin code from the main process
are `Source.get_consistency()` (to initially interrogate the Source
plugin's cache directory); we should be able to significantly optimize
load times by parallelizing these interrogations.
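A minimal sketch of that load-time optimization, assuming the per-source interrogations are independent: fan them out to a pool of subprocesses instead of running them serially in the main process. `get_consistency` below is a stand-in for the real plugin call, which in practice may itself fork a subprocess such as git:

```python
import os
from multiprocessing import Pool


def get_consistency(source_dir):
    # Stand-in for Source.get_consistency(): a real Source plugin
    # inspects its cache directory to report its consistency state.
    return (source_dir, os.path.isdir(source_dir))


def interrogate_sources(source_dirs, workers=4):
    # Fan the per-source interrogations out across subprocesses.
    with Pool(processes=min(workers, len(source_dirs))) as pool:
        return dict(pool.map(get_consistency, source_dirs))


if __name__ == "__main__":
    print(interrogate_sources(["/tmp", "/nonexistent"]))
```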
Cheers,
-Tristan