[BuildStream] Plans for workspaces and incremental builds
- From: Darius Makovsky <darius makovsky codethink co uk>
- To: buildstream-list gnome org
- Subject: [BuildStream] Plans for workspaces and incremental builds
- Date: Wed, 02 Oct 2019 12:13:45 +0100
Recently I've been thinking about workspaces and how they currently work
versus how they should work in the future. One of the main goals is to
facilitate remote execution (RE) builds of workspaced sources in addition to
local build support. I've had some initial thoughts about this.
In order to support RE, workspaces will be staged via the sourcecache. This
will fundamentally change the nature of workspaces from their current
implementation, such that test expectations should be revisited: a scheduled
process no longer affects the directory on the local filesystem (wsdir). (This
change was committed in !1563[1].) In this context a process is something
encapsulating any rule-based change (such as a build):
`f(x) = x' = T_x`
Consequently, the post-process wsdir key is identical to the pre-process wsdir
key and the concept of key stability can be removed: WS keys do not require
resetting or post-process recalculation, and meaningful keys are obtained at
staging.
In order to support incremental builds it will be necessary to have a
mechanism to produce the difference of source trees (`h(x,y) = d`) and to
apply a difference (`h^-1(x,d) = y`). It will also be necessary to track a
previous state of the workspace.
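As a rough illustration of `h` and `h^-1`, here is a minimal sketch that
models a tree as a flat mapping from path to content digest. The names
`Tree`, `Delta`, `diff_trees` and `apply_delta` are hypothetical and are not
BuildStream APIs; a real implementation would presumably operate on the
directory digests held in the cache rather than on Python dicts.

```python
from typing import Dict, FrozenSet, NamedTuple

# Assumption: a tree is modelled as {path: content digest}.
Tree = Dict[str, str]


class Delta(NamedTuple):
    """Hypothetical representation of the difference d between two trees."""
    changed: Dict[str, str]   # paths added or modified, with their new digest
    removed: FrozenSet[str]   # paths present in the old tree but not the new


def diff_trees(x: Tree, y: Tree) -> Delta:
    """h(x, y) = d: what must change to turn tree x into tree y."""
    changed = {path: digest for path, digest in y.items() if x.get(path) != digest}
    removed = frozenset(path for path in x if path not in y)
    return Delta(changed, removed)


def apply_delta(x: Tree, d: Delta) -> Tree:
    """h^-1(x, d) = y: apply difference d to tree x."""
    result = {path: digest for path, digest in x.items() if path not in d.removed}
    result.update(d.changed)
    return result
```

With this model, `apply_delta(x, diff_trees(x, y))` reproduces `y` for any
two trees, which is the round-trip property the notation above relies on.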
Currently only successful builds are tracked in the workspace (via the
persisted workspace metadata), but I think this must change to track the last
WS key regardless of the success of the process. Assuming that the previous
digest is stored, the associated build tree is recoverable via the cache.
The scheme for incremental builds could then be expressed as:
1. Given current workspace state `y`, and stored input state `x => T_x`
2. Verify that `h^-1(x, T_x) == T_x`. If this verification fails, then the
incremental build cannot continue and we should fall back to `f(y) = T_y`
3. Compute the delta between `x` and `y`: `h(x,y) = d`
4. Apply that delta to the previous build's output: `h^-1(T_x, d) => y'`
5. Apply the process to that new input state: `f(y') = T_y'`
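The five steps above can be sketched as follows, again modelling trees as
path-to-digest mappings. Everything here is hypothetical: `incremental_build`,
`inputs_intact` and the helper names are illustrative only, and the check in
step 2 is read as "every input of `x` is still present, unchanged, in `T_x`",
which is one possible interpretation of the verification rather than a
settled definition.

```python
from typing import Callable, Dict, Optional, Set, Tuple

Tree = Dict[str, str]                     # assumption: {path: content digest}
Delta = Tuple[Dict[str, str], Set[str]]   # (changed entries, removed paths)


def diff_trees(x: Tree, y: Tree) -> Delta:
    """h(x, y) = d"""
    changed = {p: dig for p, dig in y.items() if x.get(p) != dig}
    removed = {p for p in x if p not in y}
    return changed, removed


def apply_delta(tree: Tree, d: Delta) -> Tree:
    """h^-1(tree, d)"""
    changed, removed = d
    result = {p: dig for p, dig in tree.items() if p not in removed}
    result.update(changed)
    return result


def inputs_intact(x: Tree, t_x: Tree) -> bool:
    """Step 2 (assumed reading): the build left every input of x untouched."""
    return all(t_x.get(p) == dig for p, dig in x.items())


def incremental_build(
    f: Callable[[Tree], Tree],
    y: Tree,
    x: Optional[Tree] = None,
    t_x: Optional[Tree] = None,
) -> Tree:
    # Step 1: y is the current state; x and T_x are the stored previous state.
    if x is None or t_x is None or not inputs_intact(x, t_x):
        return f(y)                  # step 2 failed: fall back to f(y) = T_y
    d = diff_trees(x, y)             # step 3: h(x, y) = d
    y_prime = apply_delta(t_x, d)    # step 4: h^-1(T_x, d) => y'
    return f(y_prime)                # step 5: f(y') = T_y'


def toy_build(tree: Tree) -> Tree:
    """Stand-in for f(): emits one .o entry per .c entry in the tree."""
    objects = {p[:-2] + ".o": "obj:" + dig for p, dig in tree.items() if p.endswith(".c")}
    return {**tree, **objects}
```

In this toy model, modifying or adding a source gives the same tree as a full
build. Removing a source would leave its stale object behind in `y'`, which is
the kind of case where the result is only functionally equivalent to `f(y)`.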
Assuming that `f()` represents a sane build system, we can believe that the
application of `f()` to `y'` will produce a build tree functionally
equivalent, if not identical, to `f(y)` (`T_y = T_y'`). The verification in
step 2 may fail if, for example, a build system chooses to remove one of its
inputs as part of the build process.
In addition to storing the source digest of the previous wsdir on each
process, it will be useful to store the dependency hash and the artifact ref
(necessary for application of the source difference). If the dependency hash
changes between processes then a complete build will be required rather than
an incremental build.
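A minimal sketch of what such a stored record and the resulting decision
might look like. The `WorkspaceState` class, its field names and
`needs_full_build` are hypothetical and do not reflect the existing workspace
metadata format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class WorkspaceState:
    """Hypothetical record persisted after every process, successful or not."""
    source_digest: str     # digest of the wsdir as staged for the last process
    dependency_hash: str   # hash over the element's dependencies at that time
    artifact_ref: str      # ref used to recover the previous build tree


def needs_full_build(previous: Optional[WorkspaceState], dependency_hash: str) -> bool:
    """Return True when an incremental build must not be attempted."""
    if previous is None:
        # Nothing was recorded, so there is no previous tree to diff against.
        return True
    # Changed dependencies invalidate the previous build tree.
    return previous.dependency_hash != dependency_hash
```

For example, `needs_full_build(None, "abc")` is true, while a record whose
`dependency_hash` matches the current one allows the incremental path.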
I would like to get the opinions of the list on this before moving further
ahead. There is a development branch removing the concept of cache key
stability and key recalculation[2], which currently seems to fail only
`tests/integration/shell.py::test_workspace_visible`. In summary:
* remove unstable cache key concept
* do not reset or recalculate workspace cache keys
* store source digest, dependency hash, and artifact ref for workspaces
* introduce mechanism to diff and apply trees
* add logic to decide to continue or abort incremental builds
[1] https://gitlab.com/BuildStream/buildstream/merge_requests/1563
[2] https://gitlab.com/BuildStream/buildstream/tree/traveltissues/benchmark-3
Best Regards,
Darius