Re: [BuildStream] Proposal: Moving UI into a subprocess
- From: Tristan Van Berkom <tristan vanberkom codethink co uk>
- To: Phil Dawson <phil dawson codethink com>, Benjamin Schubert <contact benschubert me>
- Cc: buildstream-list gnome org
- Subject: Re: [BuildStream] Proposal: Moving UI into a subprocess
- Date: Sun, 19 May 2019 19:13:52 +0900
TLDR:
* Sure, I agree we can do this; Benjamin has raised some more data
which appears to justify it.
* This is going to be a lot of work, and changes the core very dramatically
no matter how you slice it, so please don't underestimate this.
* Scheduler.run() needs to be called *multiple times* in the same session;
it's already a bug that we are tracking/fetching/pulling things outside
of the regular `Scheduler.run()` harness, as pointed out before here[0].
It is important to consider this because state needs to be perfectly
consistent in *between* multiple calls to `Scheduler.run()`.
* Aside from the above, this message presents a few abstract but incomplete
ideas for directions we could take to go about this.
Hi again,
On Fri, 2019-05-17 at 17:15 +0100, Phil Dawson wrote:
Thanks both for the responses. I've replied inline.
[...]
When running BuildStream, the main process uses 100% of a processor. There
is therefore a real bottleneck here. The scheduler shouldn't need that much.
I'd be much happier if the scheduler could use 100% of its own process, having
the UI updated separately.
I also think that when adding Remote Execution into the picture, with many more
processes to orchestrate, we will still have this bottleneck, even with all
the other optimisations.
Thanks Benjamin for providing this extra detail.
Tristan, is this enough to convince you that having the UI in a separate
process to the scheduler is worth pursuing (implementation details aside)?
I already replied with some hints on how we can improve logging
performance here[0]; I'm certain we can do better without resorting to
splitting up roles and responsibilities into multiple processes.
Thanks for those, I indeed think they should be implemented. I however don't
think this is enough.
+1 here.
So:
* I did suspect that the UI updates are already too heavy; while I
don't usually notice any slowdown in my regular usage, I can tell
that it is a lot (under heavy loads I see a lot of flicker in the
rendering).
* I wasn't really sure that we would *still* need this separation
after lessening the work we do in the main process as it is.
It seems Benjamin has pointed out that even when the UI updates
only every second, the main process is somehow taking 100% of a
single CPU. On any fairly modern machine this is certainly a problem.
The fact that this is under `bash.exe` on Windows in particular,
and that running `bst ... 2>&1 | tee` makes things run
noticeably faster, leads me to believe that bash.exe is having
a very hard time processing the ANSI escape sequences (which
are all removed by piping the output and disconnecting bst from
the terminal); a small illustration of why piping removes them
follows below.
Still, even if this is the case, of course I agree we can improve
the performance of BuildStream when running in suboptimal environments.
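(As a small illustration of the piping point above, and not a claim about
BuildStream's actual frontend code: interactive rendering is typically
gated on whether the output stream is a terminal, which is why piping
through `tee` strips the ANSI escape sequences entirely.)

    import sys

    def render_status(text):
        # When connected to a terminal, redraw the status line in place
        # using ANSI escape sequences; when piped (e.g. `bst ... 2>&1 | tee`),
        # stdout is not a tty and we fall back to plain text with no escapes.
        if sys.stdout.isatty():
            sys.stdout.write("\033[2K\033[1;32m" + text + "\033[0m\r")
        else:
            sys.stdout.write(text + "\n")
        sys.stdout.flush()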
So yes, sure, I have no objection to doing this. I mostly wanted to make
clear that "logging in the main process is taking 50% of the main
process" is not an observation which is indicative of a problem; we
should not be basing large refactors on observations which by
themselves do not indicate actual bottlenecks.
[...]
<snip>
One thing to point out is that no matter how we slice things, we are
still going to have the issue of synchronizing element state across
processes, as state mutates over the course of a session, and the
logger needs to display state.
I would hope that the worker process could eventually give all the necessary
information to the frontend process; that would remove any problem arising from
this and would remove lots of state synchronisation.
As far as state synchronisation goes, as I understand it, the
following bits of state are needed by the front-end.
1) The element lookup table. This will be populated after load time and
won't change during a scheduler run, so I don't think we'd need any
synchronisation here.
2) Element names. Again, these are available after load time and won't
change during a scheduler run, so I don't think we'd need to communicate
this back to the main process.
3) Element cache keys. I'm not sure if the cache keys as displayed by
the front-end can change during a scheduler run. I feel like they
probably do, but I'm not sure.
4) The status of the scheduler queues to be displayed on the status bar.
5) Cache usage.
Am I missing anything here?
There are a few ways to fry this cat...
First of all, I'm not particularly fond of enumerating the various
things one by one which need to be synchronized with the frontend
process for the sole purpose of logging - this indicates that we think
that the frontend will never be enhanced and will only ever need to
know what we are telling it right now.
Another thing which I think is extremely important to consider, is that
all side effects on the data model caused by a single invocation of
`Scheduler.run()` need to be observable after `Scheduler.run()`
completes, at which point the calling code can observe the side effects
before making further calls to `Scheduler.run()` on the same data
model.
As I pointed out (coincidentally in the same thread) here[0], the
`Scheduler.run()` function was never intended to be something that is
only ever called once in a single invocation of BuildStream, and I
really consider it to be a bug that we are not calling
`Scheduler.run()` for *all* tasks which we offload to elements (like
fetching of junctions, fetching or tracking of sources as a side effect
of opening workspaces, or pulling of build trees as a side effect of
opening a bst shell); *everything* should be passing through
`Scheduler.run()`.
Note that I am not being specific about *which* process this needs to
happen in, but certainly, multiple calls to `Scheduler.run()`, along
with any code in between, needs to have a reliable data model which
represents exactly what happened as a side effect of processing any
jobs on that data model.
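To make this concrete, here is a rough sketch of what I mean by multiple
`Scheduler.run()` calls against the same data model; the method and
attribute names here are purely illustrative and not the actual
Stream()/Scheduler() API:

    def open_workspace(self, target):
        # First run: fetch (or track) the workspace sources as regular
        # scheduled jobs, just like any other queue driven task.
        self._scheduler.run([self._fetch_queue])

        # In between runs, the calling code must be able to observe the
        # side effects of those jobs on the data model, e.g. the element
        # must now be known to have its sources cached.
        assert target._has_cached_sources()

        # A subsequent run operates on the very same data model, relying
        # on the state resolved by the previous one.
        self._scheduler.run([self._build_queue])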
One solution could be (a rough sketch follows this list):
* Make sure that *all state* is always mirrored into the frontend
process, and every call to `Scheduler.run()` spawns a new
subprocess in order to run the scheduler.
* Ensure that when any given call to `Scheduler.run()` completes,
we process all incoming state update messages in the frontend at
the end of that subprocess and don't lose any state (ensuring
that we have a consistent data model in between calls to
`Scheduler.run()`).
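Roughly, and with entirely hypothetical names for the notification hook
and for the frontend's copy of the data model, that first option could
look something like this:

    import multiprocessing

    def run_scheduler_subprocess(queues, state_queue):
        # Runs in the subprocess: every state change is mirrored out
        # through the queue as it happens (hypothetical `notify` hook).
        scheduler = Scheduler(notify=state_queue.put)
        scheduler.run(queues)
        state_queue.put(None)  # sentinel: no more updates for this run

    def scheduler_run(frontend_model, queues):
        state_queue = multiprocessing.Queue()
        proc = multiprocessing.Process(
            target=run_scheduler_subprocess, args=(queues, state_queue))
        proc.start()

        # Drain *all* mirrored state updates before returning, so that the
        # frontend's data model is consistent in between scheduler runs.
        while True:
            update = state_queue.get()
            if update is None:
                break
            frontend_model.apply(update)

        proc.join()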
Another solution could potentially be:
* Refactor the frontend logging mechanism so that the full context
of every message is serialized in the Message() object, and the
frontend never needs to access the data model (perhaps the
Message() object in this case loses the task_id and unique_id
members completely, and we provide all fields in every message;
see the sketch after this list).
That would be a pretty hefty rewrite, but would also have the
advantage of reflecting the state at the time the message
was sent to the frontend, rather than reflecting the state which
happens to be in the frontend when the message arrives.
I.e., one could imagine a scenario where:
- The messages are printed every second, or twice a second
- A workspace build completes and state is synchronized
- The frontend decides to print messages about the workspace build
Here we might end up seeing the workspace cache key in messages
before the final message where the cache key was actually resolved.
If we had snapshotted exactly everything that needs to be printed
in a frontend message and sent it in the message itself, then we
wouldn't have this problem.
* Run the entire body of any `Stream()` function in a subprocess.
This would also include loading the project itself.
* In this approach we don't have the problem of synchronizing state
at all really, as the frontend never needs any state at all and
only performs logging.
Side effects from one call to `Scheduler.run()` remain in the
same process where subsequent calls to `Scheduler.run()` will
occur.
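As a rough sketch of what a fully self-describing message might carry
(the field names are illustrative, this is not the current Message()
object):

    from dataclasses import dataclass
    from typing import Optional

    @dataclass(frozen=True)
    class Message:
        message_type: str         # e.g. "start", "success", "failure", "log"
        action_name: str          # e.g. "fetch", "build", "pull"
        element_name: str         # snapshotted when the message was sent
        cache_key: Optional[str]  # key as resolved at send time, if resolved
        text: str                 # the human readable message body
        elapsed: Optional[float]  # seconds since the task started, if any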
While this second approach has some benefits, it also makes interactive
debugging a bit harder, i.e. when a build fails and it is time to shell
into a build sandbox in order to debug what happened, we may need to
reload the data model in the frontend just for the sake of this.
In this case, we would not want to load files from the disk again. In
my experience, my builds run for a very long time, and I am probably
doing work on the same BuildStream project, patching a separate branch,
editing files and such, while a build is ongoing, so I certainly
expect that BuildStream knows what it loaded and is able to present me
with a debug shell of the build I launched 2 hours ago, without
consulting my filesystem to do so.
In *any* scenario:
* We're going to need a generic place anyway, probably in Stream(),
where we can launch the subprocess and set up IPC for that
process, running an asyncio event loop in the frontend for
the sake of:
- Handling user events (SIGINT, SIGTERM, SIGTSTP)
- Listening for incoming logging messages, probably coming from
various sources (i.e. we can have logging messages coming
from the scheduling process, or they might also come directly
from jobs).
* This harness will need to mirror some of the Scheduler() mechanics,
i.e. the frontend will need to send explicit IPC messages to the
subprocess (a rough sketch follows this list), informing it to:
- suspend all jobs now
- resume all jobs now
- kill all jobs now
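Very roughly, and with hypothetical names for the IPC endpoints and an
illustrative mapping of signals to control messages, that frontend side
harness might look something like this:

    import asyncio
    import signal

    async def frontend_loop(control_conn, log_queue, render):
        loop = asyncio.get_running_loop()

        # Translate user events into explicit IPC control messages for
        # the scheduling subprocess (a real SIGINT handler would likely
        # prompt the user before deciding to resume or kill).
        loop.add_signal_handler(signal.SIGINT, control_conn.send, "suspend-jobs")
        loop.add_signal_handler(signal.SIGTSTP, control_conn.send, "suspend-jobs")
        loop.add_signal_handler(signal.SIGTERM, control_conn.send, "kill-jobs")

        # Listen for incoming logging messages, which may come from the
        # scheduling process or directly from jobs, and hand each one to
        # the renderer as it arrives.
        while True:
            message = await loop.run_in_executor(None, log_queue.get)
            if message is None:  # sentinel: the session is over
                break
            render(message)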
Another solution might be a cross between the two. I could
envision a possible, rather undesirable approach, where the frontend
spawns a subprocess for every single call to `Scheduler.run()`, and
then *recalculates state* in the frontend process after having
completed a call to `Scheduler.run()` and before observing any side
effects or making any subsequent scheduler runs. This is of course
undesirable because we duplicate work by re-resolving state in the
frontend which was already resolved in a subprocess.
Cheers,
-Tristan
[0]: https://mail.gnome.org/archives/buildstream-list/2019-April/msg00063.html