[BuildStream] Allowing duplicate junctions [Was: Be explicit when overriding junction configuration, or else warn/error]
- From: Tristan Van Berkom <tristan vanberkom codethink co uk>
- To: Jürg Billeter <j bitron ch>, Chandan Singh <chandan chandansingh net>
- Cc: buildstream-list gnome org
- Subject: [BuildStream] Allowing duplicate junctions [Was: Be explicit when overriding junction configuration, or else warn/error]
- Date: Fri, 08 May 2020 16:50:49 +0900
Hi again all,
I'm reposting to the list to get some more eyes on the problem of
explicitly allowing multiple junctions to the same project to exist,
and how we want to make that happen.
To jump right in, see "How to allow multiple junctions to the same
project" below. First, I will preface this message with the context
of what changes are in the pipeline.
Explicit overriding of junction configurations
==============================================
As discussed in previous messages, I've set out to reimplement how
shared junctions get overridden, such that you can no longer
unknowingly override a junction configuration.
There is now a WIP merge request up for this[0], the current patch
implements the following format for junctions:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
kind: junction
sources:
- kind: git
  url: example.com/flying-ponies.git
config:
  # Override the "lurking bullfrogs" junction in the
  # "flying ponies" project with the local project's
  # "pouncing froggies" junction.
  #
  overrides:
    lurking-bullfrogs.bst: pouncing-froggies.bst
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Additionally, this merge request:
  * Removes any implicit coalescing of junctions by their junction name
  * Raises errors when multiple instances of the same project get
    loaded
How to allow multiple junctions to the same project
===================================================
As a result of the new project name based checks which disallow
multiple junctions to the same project from being loaded in the same
pipeline, we are now not entirely sure what the best approach is to
allow this.
What are the dangers ?
----------------------
Logically, we've been thinking that if you have elements from the
same project loaded twice, this may be problematic especially if it
is not explicitly known in some way, but why is this problematic ?
I think the best answer to this so far is if you are constructing
a system or runtime where you have some common components with some
diamond shaped dependencies of projects, you don't want to
accidentally combine different versions of the same elements and
stage them together in a sandbox.
In this case you don't want something like this:
        a
       / \
      b   c
       \ /
        d
To accidentally look like this instead:
        a
       / \
      b   c
     /     \
   d(1)   d(2)
Interestingly, this particular danger could be solved by file overlap
errors, because whenever staging (a) we would most probably get
overlap warnings/errors when attempting to stage d(1) and d(2) to the
same directories in a sandbox.
However, overlaps are a late stage error and we would definitely want
to catch this error at a much earlier stage.
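As a rough sketch of what such an early, load-time check could look
like: the snippet below walks a junction graph and reports any project
that is reached with more than one configuration. This is not
BuildStream's loader; the `Junction` type, its `ref` field and the
`find_duplicate_projects` function are hypothetical names used only
for illustration, and identifying a configuration by a single `ref`
string is a simplifying assumption.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Junction:
    """Hypothetical stand-in for a loaded junction (not a BuildStream class)."""
    project: str   # name of the project this junction points to
    ref: str       # assumed identifier for the version/configuration
    subjunctions: list = field(default_factory=list)


def find_duplicate_projects(toplevel_junctions):
    """Return {project: sorted refs} for projects loaded with more than one ref."""
    seen = defaultdict(set)
    stack = list(toplevel_junctions)
    while stack:
        junction = stack.pop()
        seen[junction.project].add(junction.ref)
        stack.extend(junction.subjunctions)
    return {name: sorted(refs) for name, refs in seen.items() if len(refs) > 1}


# The accidental d(1)/d(2) case from the diagram above.
b = Junction("b", "1", [Junction("d", "1")])
c = Junction("c", "1", [Junction("d", "2")])
duplicates = find_duplicate_projects([b, c])
```

Here `duplicates` names project "d" with both of its refs, so the
problem can be reported before anything is built or staged.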
Are there any other concrete examples of problematic situations which
might be caused by loading the same project twice with possibly
differing configurations ?
Valid use cases ?
-----------------
There are a few valid use cases we've come up with; reasons why we
definitely would want to support loading the same project more than
once with possibly differing configurations.
Cross architecture bootstrapping
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
When bootstrapping a runtime for a different architecture, it can
be interesting to use the same toolchain project configured
multiple times with different project options defining which host
and target architectures to build libc/gcc under.
When combining this ability with remote execution, we can
streamline the process of bootstrapping a system under any
architecture which we have runners for on the RE cluster.
Auxiliary projects which provide static build-only dependencies
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
When one project depends on another project for some static data
which will be consumed as build-only dependencies, the data
from the junctioned project is consumed statically as is, and there
is no concern of runtime dependencies being propagated forward to
reverse dependency projects which might consume the same junction.
Auxiliary build-only projects could be projects which build and
provide static databases, like voice recognition DBs or navigation
data, or it could be projects which build and provide tooling like
compilers which are simply used "as is" when build depended on.
Consider this illustration:
    toplevel
      |  \
      |   \
      |    Auxiliary
      |
    another
      |  \
      |   \
      |    Auxiliary
      |
    baseproject
      |  \
      |   \
      |    Auxiliary
In the above graph, we might have multiple projects which abstract
away their requirement of a given compiler or tool in the same
"Auxiliary" project.
Here we would like to have the freedom to have that project many
times, possibly at different versions, and ideally we would like
that to be "hidden".
For instance, the baseproject knows what version of "Auxiliary" it
needs, but the "another" and "toplevel" projects should never be
forced to know about its hidden dependency "Auxiliary".
Separation of tooling and data
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
While depending on different versions of the same elements often
carries the danger of file overlaps, and uncertainty about what is
being mixed into a system, this is by no means always the case.
Take for instance ScriptElement derivatives like x86image, which
will stage some dependencies for one purpose and another set of
dependencies for another purpose.
Here is another illustration to consider:
              bootable-image (in the toplevel project)
                   /                     \
                  /                       \
                 /                         \
  tools (like mkfs, syslinux)     payload (apps, stuff for the image)
               |                             |
               |                             |
        freedesktop-sdk               freedesktop-sdk
On the left hand of this graph, we might have the tooling needed
in order to construct bootable images for various platforms and
filesystems.
The left side of the above seldom, if ever, needs revving: you only
really need to rebuild these artifacts when you need new features
for building new filesystems or such.
On the right hand of the graph, you have the payload which is going
into the image - these are probably revving on a continuous basis,
as you probably want to build snapshot images of your system fairly
often.
In this scenario, it is perfectly fine for both freedesktop-sdk
instances to be configured differently and have different versions,
even though runtime dependencies may be propagated forward through
other intermediate projects - in the end they will be staged safely
at separate locations within the sandbox, and there will not be any
file overlap errors as a result.
Any other concrete use cases we've overlooked here ?
Solutions ?
-----------
So far we've got two approaches in mind but I think we need to
brainstorm a bit and I am hoping that people will provide some good
ideas.
Isolated junctions
~~~~~~~~~~~~~~~~~~
The idea with junction isolation is that a project can make a
statement that:
"I'm going to use this junction, and I will not propagate runtime
dependencies forward to reverse dependency projects"
This has the advantage of good encapsulation, avoiding pushing
any burden of knowledge onto reverse dependencies which simply
want to depend on your project and have things "just work" as
expected, and safely.
I think this approach would need to come with additional errors
which detect cases where runtime dependencies leak forward from
this project (possibly error messages which recommend the use of a
`compose` element to ensure encapsulation).
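To make the kind of leak detection described here more concrete, the
following is a minimal sketch, assuming elements are plain strings,
`runtime_deps` maps each element to its direct runtime dependencies,
and `project_of` maps each element to its project name. None of these
names come from BuildStream; `leaked_runtime_deps` is a hypothetical
helper.

```python
def leaked_runtime_deps(element, runtime_deps, project_of, isolated):
    """Return elements of isolated projects reachable through runtime deps."""
    leaked, visited, stack = set(), set(), [element]
    while stack:
        current = stack.pop()
        if current in visited:
            continue
        visited.add(current)
        for dep in runtime_deps.get(current, ()):
            if project_of[dep] in isolated:
                leaked.add(dep)
            stack.append(dep)
    return leaked


# Hypothetical data: "app.bst" runtime-depends on an element from the
# isolated "aux" project, while "bundle.bst" has no runtime dependencies.
runtime_deps = {"app.bst": ["aux-lib.bst"], "bundle.bst": []}
project_of = {"app.bst": "mine", "bundle.bst": "mine", "aux-lib.bst": "aux"}

leak_from_app = leaked_runtime_deps("app.bst", runtime_deps, project_of, {"aux"})
leak_from_bundle = leaked_runtime_deps("bundle.bst", runtime_deps, project_of, {"aux"})
```

A non-empty result for an element that reverse dependency projects can
depend on would be the point at which to raise the error.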
Enforced whitelisting
~~~~~~~~~~~~~~~~~~~~~
This is the simplest approach, and dictates that if a project
itself declares a junction to a project which appears more than
once, it must whitelist that project as a statement that
"yes I know what I'm doing".
In this case, other reverse dependencies of the project which
did whitelist its own junction still remain free of the burden
of knowledge, unless they also want to directly junction the same
already junctioned project.
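The whitelisting rule can be sketched roughly as follows, assuming
junction declarations are available as (declaring project, target
project) pairs and each project's whitelist is a set of target project
names. The function name and data layout are hypothetical and say
nothing about how the whitelist would actually be spelled in
project configuration.

```python
from collections import Counter


def unwhitelisted_duplicates(declarations, whitelists):
    """Return declarations of duplicated targets that lack a whitelist entry."""
    counts = Counter(target for _, target in declarations)
    return [
        (declarer, target)
        for declarer, target in declarations
        if counts[target] > 1 and target not in whitelists.get(declarer, set())
    ]


# Hypothetical pipeline: three projects each junction "auxiliary",
# but only two of them have whitelisted it.
declarations = [
    ("toplevel", "auxiliary"),
    ("toplevel", "another"),
    ("another", "auxiliary"),
    ("baseproject", "auxiliary"),
]
whitelists = {"toplevel": {"auxiliary"}, "another": {"auxiliary"}}
violations = unwhitelisted_duplicates(declarations, whitelists)
```

Only the declaring project is asked to whitelist: "toplevel" is not
penalised for junctioning "another", which appears once.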
Looking at this email so far, I'm tempted to think that we might have
both of these approaches (declaring a junction as 'isolated' can
allow hiding a local junction and be more convenient, but failing
this we can still whitelist junctions in reverse dependencies).
Any additional thoughts on this subject ?
Cheers,
-Tristan
[0]: https://gitlab.com/BuildStream/buildstream/-/merge_requests/1901