[BuildStream] Proposal: Reduce the amount of work that's done at places where we call Element._update_state()
- From: Jonathan Maw <jonathan maw codethink co uk>
- To: buildstream-list gnome org
- Subject: [BuildStream] Proposal: Reduce the amount of work that's done at places where we call Element._update_state()
- Date: Mon, 01 Apr 2019 14:59:31 +0100
Hi,
I have been looking at ways to reduce the amount of time we spend in
Element._update_state(), because it is responsible for a significant
amount of time.
e.g. in the last round of performance profiling, it took 16-20% of the
time spent in `bst show`.
To that end, I have spent some time tracing around how _update_state is
called
(https://gitlab.com/BuildStream/buildstream/issues/902#note_153368417),
and produced the following tasks:
* via `Loader._get_loader()`:
- [x] Only calculate the weak cache key <-- Implemented in
https://gitlab.com/BuildStream/buildstream/merge_requests/1251
* via `Element._schedule_tracking()`:
- [x] Remove call to _update_state() <-- Implemented in
https://gitlab.com/BuildStream/buildstream/merge_requests/1251
* via `Pipeline.resolve_elements()`:
- [ ] Don't attempt to schedule assembly
* via `Element._set_required()`:
- Conceivably all of _update_state could happen. Nothing to split out.
* via `Element._schedule_assemble()`:
- [ ] Only needs to wipe workspaced elements and calculate cache
keys/cached state.
* via `Element._tracking_done()`:
- Conceivably all of _update_state could happen. Nothing to split out.
* via `Element._assemble_done()`:
- [ ] Cache key calculation (even weak keys, because workspaced
elements and BST_STRICT_REBUILD)
- [ ] Scheduling assembly when in strict mode
* via `Element._fetch_done()`:
- [x] Only update source state <--
https://gitlab.com/BuildStream/buildstream/merge_requests/1251
* via `Element._pull_done()`:
- [ ] Strict and strong cache key calculation only (everything else
has already been done)
Broadly, the purpose of these changes should be to improve performance
and readability, and while this is easy for the ones already
implemented, this becomes harder for the other tasks.
e.g. `Element._update_state()` via `Pipeline.resolve_elements()` - In
_update_state(), the logic for scheduling assembly is interleaved with
calculating cache keys, because when buildstream isn't being run in
strict mode, we can decide whether to schedule assembly based on the
artifacts' weak cache key, instead of waiting for the strict cache key
to become available (i.e. in non-strict mode we don't need to know the
state of an element's dependencies to know whether it's ready to be
built).
To avoid making an unmaintainable mess, `Pipeline.resolve_elements()`
and `Element._update_state()` should both call common functions.
The best idea I can come up with to do this is:
1. Move the logic to calculate (and check if cached under that key) the
weak (`self.__weak_cache_key`), strict (`self.__strict_cache_key`), and
strong (`self.__cache_key`) cache keys into separate methods, e.g.
`_get_weak_cache_key`, `_get_strict_cache_key`, `_get_strong_cache_key`.
These methods return the cache key, and calculate it if it's not already
set.
2. Replace the cache key generation logic in `Element._update_state()`
with the new methods.
3. `Pipeline.resolve_elements()` will:
a. call `Element.__update_source_state()`
b. return early if the element's consistency is
Consistency.INCONSISTENT
c. call _get_weak_cache_key, _get_strict_cache_key and
_get_strong_cache_key in order, returning early if any of those fail.
Does the above task list, and the suggested implementation of another
task match people's expectations of how to reduce the amount of work
that _update_state is doing?
Thanks,
Jonathan
--
Jonathan Maw, Software Engineer, Codethink Ltd.
Codethink privacy policy: https://www.codethink.co.uk/privacy.html
[Date Prev][
Date Next] [Thread Prev][
Thread Next]
[
Thread Index]
[
Date Index]
[
Author Index]