Re: [BuildStream] Proposal: Reduce the amount of work that's done at places where we call Element._update_state()
- From: Tristan Van Berkom <tristan vanberkom codethink co uk>
- To: Jonathan Maw <jonathan maw codethink co uk>, buildstream-list gnome org
- Subject: Re: [BuildStream] Proposal: Reduce the amount of work that's done at places where we call Element._update_state()
- Date: Tue, 02 Apr 2019 16:17:03 +0900
Hi Jonathan,
Thanks for the detailed analysis.
On Mon, 2019-04-01 at 14:59 +0100, Jonathan Maw via buildstream-list wrote:
> Hi,
>
> I have been looking at ways to reduce the amount of time we spend in
> Element._update_state(), because it accounts for a significant amount
> of time.
>
> e.g. in the last round of performance profiling, it took 16-20% of
> the time spent in `bst show`.
I think we all know that _update_state() itself is a bottleneck. We
know this mostly from intuition: we know that it is called overly
frequently, that it does too much in one place, and we observe that it
takes a large portion of the load process.

However, 16-20% is quite meaningless here; it is entirely possible
that after all optimizations are done to the best of our abilities, it
will still take 16-20% of load time and no longer be a problem.

I would very much like to observe how the algorithms we are analyzing
scale, i.e. I want to see a graph of time-per-record which shows how
the time spent per record in _update_state() grows with the number of
elements being loaded. It would also be interesting to know whether
this time-per-record increases for horizontal graphs, as I expect it
currently increases mostly only for vertical graphs (i.e. deep
dependency chains vs one stack depending on all elements).

In short, my point here is that we should not be counting 16-20% as
any kind of indication that there is a problem (we know there is a
problem, but not because of the number 16-20%).
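To make this concrete and measurable, here is a minimal harness sketch
(entirely hypothetical, not part of BuildStream) which generates
"vertical" projects of increasing depth and reports the wall-clock
time-per-element of `bst show`; swapping the generator for one stack
depending on all elements would give the "horizontal" comparison:

    #!/usr/bin/env python3
    # Hypothetical harness: time `bst show` over generated projects of
    # increasing size and print the time-per-element, to observe how
    # the load scales with the number of elements.

    import os
    import subprocess
    import tempfile
    import time

    def make_chain_project(root, n):
        # A "vertical" graph: e0 <- e1 <- ... <- e(n-1), one long chain.
        # Minimal project.conf; real projects may need further keys.
        with open(os.path.join(root, "project.conf"), "w") as f:
            f.write("name: update-state-bench\n")
        for i in range(n):
            with open(os.path.join(root, "e{}.bst".format(i)), "w") as f:
                f.write("kind: manual\n")
                if i > 0:
                    f.write("depends:\n- e{}.bst\n".format(i - 1))

    for n in (100, 200, 400, 800):
        with tempfile.TemporaryDirectory() as root:
            make_chain_project(root, n)
            start = time.perf_counter()
            subprocess.run(["bst", "show", "e{}.bst".format(n - 1)],
                           cwd=root, check=True,
                           stdout=subprocess.DEVNULL,
                           stderr=subprocess.DEVNULL)
            elapsed = time.perf_counter() - start
            print("{:4d} elements: {:6.2f} ms/element".format(
                n, 1000.0 * elapsed / n))

If the time-per-element stays flat as n grows, the work is linear; if
it climbs, we have our superlinear behavior on a graph.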
> To that end, I have spent some time tracing around how _update_state
> is called
> (https://gitlab.com/BuildStream/buildstream/issues/902#note_153368417),
> and produced the following tasks:
>
> * via `Loader._get_loader()`:
>   - [x] Only calculate the weak cache key <-- Implemented in
>         https://gitlab.com/BuildStream/buildstream/merge_requests/1251
>
> * via `Element._schedule_tracking()`:
>   - [x] Remove call to _update_state() <-- Implemented in
>         https://gitlab.com/BuildStream/buildstream/merge_requests/1251
I think both of these changes are wrong, as explained in these comments:
https://gitlab.com/BuildStream/buildstream/merge_requests/1251/diffs#note_156279890
https://gitlab.com/BuildStream/buildstream/merge_requests/1251/diffs#note_156281026
> * via `Pipeline.resolve_elements()`:
>   - [ ] Don't attempt to schedule assembly
>
> * via `Element._set_required()`:
>   - Conceivably all of _update_state could happen. Nothing to split out.
>
> * via `Element._schedule_assemble()`:
>   - [ ] Only needs to wipe workspaced elements and calculate cache
>         keys/cached state.
>
> * via `Element._tracking_done()`:
>   - Conceivably all of _update_state could happen. Nothing to split out.
>
> * via `Element._assemble_done()`:
>   - [ ] Cache key calculation (even weak keys, because of workspaced
>         elements and BST_STRICT_REBUILD)
>   - [ ] Scheduling assembly when in strict mode
>
> * via `Element._fetch_done()`:
>   - [x] Only update source state <--
>         https://gitlab.com/BuildStream/buildstream/merge_requests/1251
>
> * via `Element._pull_done()`:
>   - [ ] Strict and strong cache key calculation only (everything else
>         has already been done)
> Broadly, the purpose of these changes should be to improve
> performance and readability; while this is easy for the ones already
> implemented, it becomes harder for the remaining tasks.
I appreciate your attention to detail here, as it appears you have
examined what is actually required in each case where _update_state()
is called, and considered what could be "split out" and avoided in
each of these cases.

I don't think the above should be considered a task list at all,
though; let's treat it as a valuable case study to assist us in making
the right changes to address these cases (as you venture to do below).
> e.g. `Element._update_state()` via `Pipeline.resolve_elements()` - In
> _update_state(), the logic for scheduling assembly is interleaved
> with calculating cache keys, because when buildstream isn't being run
> in strict mode, we can decide whether to schedule assembly based on
> the artifacts' weak cache key, instead of waiting for the strict
> cache key to become available (i.e. in non-strict mode we don't need
> to know the state of an element's dependencies to know whether it's
> ready to be built).
This is the best observation here; what, in fact, is interleaved?

Also, a very good question here is: what is the real purpose of
_schedule_assemble()?

It appears that this is just caching a value which we want to avoid
repeatedly resolving in the BuildQueue's Queue.status() implementation,
and this cached state *might* go away entirely after Benjamin's
scheduling refactor:

  https://mail.gnome.org/archives/buildstream-list/2019-March/msg00034.html

Let's assume for the purpose of this discussion that we still need
this caching of "whether an element is ready to be assembled" state.
> To avoid making an unmaintainable mess, `Pipeline.resolve_elements()`
> and `Element._update_state()` should both call common functions.
>
> The best idea I can come up with to do this is:
>
> 1. Move the logic to calculate (and check if cached under that key)
>    the weak (`self.__weak_cache_key`), strict
>    (`self.__strict_cache_key`), and strong (`self.__cache_key`) cache
>    keys into separate methods, e.g. `_get_weak_cache_key`,
>    `_get_strict_cache_key`, `_get_strong_cache_key`. These methods
>    return the cache key, and calculate it if it's not already set.
>
> 2. Replace the cache key generation logic in `Element._update_state()`
>    with the new methods.
>
> 3. `Pipeline.resolve_elements()` will:
>    a. call `Element.__update_source_state()`
>    b. return early if the element's consistency is
>       Consistency.INCONSISTENT
>    c. call _get_weak_cache_key, _get_strict_cache_key and
>       _get_strong_cache_key in order, returning early if any of
>       those fail.
I want to point out that what you are suggesting in (3) here is
_exactly_ what _update_state() is already doing.
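For illustration, here is a self-contained toy model of that split
(hypothetical names, and real cache keys are of course derived from
much more than a name hash): each accessor resolves its key lazily and
returns None while the key cannot be resolved yet, which is
essentially the contract that _update_state() implements inline today.

    import hashlib

    class Element:
        # Toy model of the proposed split; not BuildStream's Element.

        def __init__(self, name, build_deps):
            self.name = name
            self.build_deps = build_deps    # list of Element
            self._weak_key = None
            self._strict_key = None

        def _calculate_cache_key(self, dep_tokens):
            payload = self.name + ":" + ",".join(dep_tokens)
            return hashlib.sha256(payload.encode()).hexdigest()

        def get_weak_cache_key(self):
            # The weak key here depends only on dependency *names*, so
            # it is resolvable as soon as the graph is loaded.
            if self._weak_key is None:
                self._weak_key = self._calculate_cache_key(
                    [d.name for d in self.build_deps])
            return self._weak_key

        def get_strict_cache_key(self):
            # The strict key needs every dependency's strict key first;
            # return None while any of them remain unresolved.
            if self._strict_key is None:
                dep_keys = [d.get_strict_cache_key()
                            for d in self.build_deps]
                if None not in dep_keys:
                    self._strict_key = self._calculate_cache_key(dep_keys)
            return self._strict_key

    base = Element("base.bst", [])
    app = Element("app.bst", [base])
    print(app.get_weak_cache_key())    # resolvable immediately
    print(app.get_strict_cache_key())  # resolves base's key on demand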
> Does the above task list, and the suggested implementation of another
> task, match people's expectations of how to reduce the amount of work
> that _update_state is doing?
Not exactly for me, but certainly close. Let's take a step back and
look at the bigger picture.

What do we know?
* We know that the logic of whether an element is ready to be
  assembled or not is interleaved with the calculation of cache keys.

  We probably know that an element *cannot* be scheduled to be
  assembled until at least all of its dependencies' cache keys are
  resolved, and that it *cannot* be scheduled to be assembled until
  its own keys have been resolved, unless it is a workspaced element,
  in which case its cache key can only be discovered after a build.

  Does this mean that we can safely move the logic which decides
  whether an element can be scheduled to be assembled to a location
  which runs *after* the cache key logic runs (see the sketch after
  this list)?
* We also know that the logic of whether an element is *required* is
  controlled by whether an element requires a build dependency which
  could not be pulled from a remote, and that, similarly to the
  "ready to be assembled" state, this is updated as a side effect of
  _update_state().

* We know that weak, strong and strict cache keys are being calculated
  at the same time, when perhaps they don't need to be.
* We know that the logic of how to treat cache keys differs depending
  on certain circumstances, including:

  - If we are not in strict mode, we want to delay calculation of the
    cache key which will be used until after a pull is complete,
    unless we already have a cached artifact whose strong cache key
    matches our strict cache key for this session.

    Note: I have a feeling that we are not properly delaying reverse
          dependency builds on discovery of the ultimately used
          artifact in this case; we might be scheduling assemble
          prematurely here.

  - If an element is workspaced, the cache key is also calculated
    differently.
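As a sketch of the first point above (toy code again, building on the
Element model sketched earlier, with hypothetical is_workspaced and
scheduled attributes), the readiness decision can run as a standalone
step *after* key resolution rather than being interleaved with it:

    def maybe_schedule_assemble(element, strict=True):
        # Toy sketch: decide assembly readiness only after the cache
        # key logic has run; rules paraphrased from the points above.
        #
        # All dependency cache keys must be resolved first.
        if any(d.get_strict_cache_key() is None
               for d in element.build_deps):
            return False
        # The element's own key must be resolved too, unless it is
        # workspaced, in which case its key is only discovered after
        # a build (hypothetical attribute).
        if not getattr(element, "is_workspaced", False):
            key = (element.get_strict_cache_key() if strict
                   else element.get_weak_cache_key())
            if key is None:
                return False
        element.scheduled = True  # hypothetical cached readiness state
        return True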
What current refactors are already in flight?
- Benjamin has recently landed a patch which causes _update_state() to
  be called on reverse dependencies instead of at Queue.status() time,
  so now we have the ability to perform reverse dependency walks in
  the build graph:

  https://gitlab.com/BuildStream/buildstream/merge_requests/1070

- The natural next step to the above will be to remove all calls to
  Element.dependencies() when calculating cache keys.

  In the current state after Benjamin's refactor, we already recurse
  into reverse dependencies only when we have a new cache key to
  contribute to the reverse dependency's cache key.

  So the next step here will be to *hand over* our freshly calculated
  key to the reverse dependencies directly, such that a reverse
  dependency will never need a loop to accumulate the keys of its
  dependencies.
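A sketch of what that handover might look like (toy code, not the
actual refactor): each element counts its unresolved dependencies and
receives keys as they resolve, so computing its own key never requires
walking dependencies():

    import hashlib

    class Node:
        # Toy model of reverse dependency key handover.

        def __init__(self, name, build_deps):
            self.name = name
            self.reverse_deps = []
            self.dep_keys = {}   # dep name -> key, filled incrementally
            self.unresolved = len(build_deps)
            self.key = None
            for dep in build_deps:
                dep.reverse_deps.append(self)

        def resolve(self):
            # Compute our key once all dependency keys have arrived,
            # then hand it over to our reverse dependencies in turn.
            if self.unresolved == 0 and self.key is None:
                tokens = [self.dep_keys[k] for k in sorted(self.dep_keys)]
                self.key = hashlib.sha256(
                    (self.name + ",".join(tokens)).encode()).hexdigest()
                for rdep in self.reverse_deps:
                    rdep.receive_dep_key(self.name, self.key)

        def receive_dep_key(self, dep_name, key):
            # A dependency hands us its freshly calculated key; when
            # the last one arrives we can resolve ourselves without
            # ever looping over our dependencies.
            self.dep_keys[dep_name] = key
            self.unresolved -= 1
            self.resolve()

    base = Node("base.bst", [])
    app = Node("app.bst", [base])
    base.resolve()   # cascades: app's key resolves with no graph walk
    print(app.key)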
When looking at the above, what I am thinking is:

* We need to identify what exactly is the difference between
  _set_required() and _schedule_assemble().

* We need to safely move _set_required() and _schedule_assemble()
  outside of the cache key calculation logic.

* We *might* need to split up the cache key calculation logic into
  separate phases, instead of calculating weak/strong/strict in the
  same go.

  This one is a bit less certain; I am not sure whether we gain
  anything by conditionally resolving these keys separately, or by
  handing the responsibility of "calculating the keys" to a single
  function which conditionally resolves them.
Now, to speak to the fact that cache keys are calculated differently
in different scenarios: I still wonder if we can delegate this to a
separate object whose responsibility is to calculate keys.

I think we know at load time, even as early as Element instantiation
time, what technique will be employed and what cache key behaviors
will be used for a given element. With this in mind, could we then do
the following?
* Create a CacheKey abstract class
* Decide which CacheKey implementation to use for each element at
  load time
* Delegate some methods to the CacheKey object

I think it would be ideal to move all of this out of element.py;
having separate subclasses to handle the different scenarios of how a
cache key needs to be calculated would allow us to avoid the spaghetti
of conditional statements which is currently _update_state().
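As a rough sketch of that shape (hypothetical class and helper names
throughout; has_artifact in particular is an assumed helper):

    from abc import ABC, abstractmethod

    class CacheKey(ABC):
        # One implementation per cache key behavior, chosen once at
        # element instantiation time instead of branching inside
        # _update_state().

        def __init__(self, element):
            self.element = element

        @abstractmethod
        def calculate(self):
            """Return the key to use this session, or None if it
            cannot be resolved yet."""

    class StrictCacheKey(CacheKey):
        def calculate(self):
            # In strict mode, the strict key is the key we will use.
            return self.element.get_strict_cache_key()

    class NonStrictCacheKey(CacheKey):
        def calculate(self):
            # Outside strict mode, commit to the strict key only if a
            # cached artifact's strong key already matches it;
            # otherwise fall back to the weak key while pulls conclude.
            strict = self.element.get_strict_cache_key()
            if strict is not None and self.element.has_artifact(strict):
                return strict                     # assumed helper
            return self.element.get_weak_cache_key()

    class WorkspacedCacheKey(CacheKey):
        def calculate(self):
            # A workspaced element's key is only discovered after a
            # build; report it once known (assumed attribute).
            return getattr(self.element, "built_key", None)

    def make_cache_key(element, strict_mode):
        # The behavior is known at load time, so pick the subclass once.
        if getattr(element, "is_workspaced", False):
            return WorkspacedCacheKey(element)
        if strict_mode:
            return StrictCacheKey(element)
        return NonStrictCacheKey(element)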
Does this make sense?
Cheers,
-Tristan