[BuildStream] Allowing duplicate junctions [Was: Be explicit when overriding junction configuration, or else warn/error]

From: Tristan Van Berkom <tristan vanberkom codethink co uk>
To: Jürg Billeter <j bitron ch>, Chandan Singh <chandan chandansingh net>
Cc: buildstream-list gnome org
Subject: [BuildStream] Allowing duplicate junctions [Was: Be explicit when overriding junction configuration, or else warn/error]
Date: Fri, 08 May 2020 16:50:49 +0900

Hi again all,

I'm reposting to the list to get some more eyes on the problem of
explicitly allowing multiple junctions to the same project to exist,
and how we want to make that happen.

To jump right in, see below; first I will preface this message with the
context of what changes are in the pipeline.


Explicit overriding of junction configurations
==============================================
As discussed in previous messages, I've set out to reimplement how
shared junctions get overridden, such that you can no longer
unknowingly override a junction configuration.

There is now a WIP merge request up for this[0]. The current patch
implements the following format for junctions:

  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  kind: junction

  sources:
  - kind: git
    url: example.com/flying-ponies.git

  config:
    # Override the "lurking bullfrogs" junction in the
    # "flying ponies" project with the local project's
    # "pouncing froggies" junction.
    #
    overrides:
      lurking-bullfrogs.bst: pouncing-froggies.bst
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Additionally, this merge request:

  * Removes any implicit coalescing of junctions by their junction name
  * Raises errors when multiple instances of the same project get
    loaded
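
As a rough illustration of the second point, a project name based check
could look something like the sketch below. This is not the code from
the merge request; `check_unique_projects`, `DuplicateProjectError` and
the (junction path, project name) pairs are names assumed purely for
illustration.

```python
from collections import defaultdict


class DuplicateProjectError(Exception):
    """Hypothetical error: one project name was reached through several junctions."""


def check_unique_projects(loaded):
    """Raise if any project name appears more than once.

    `loaded` is an iterable of (junction_path, project_name) pairs, one
    pair per junction that was loaded for the pipeline (assumed shape).
    """
    by_project = defaultdict(list)
    for junction_path, project_name in loaded:
        by_project[project_name].append(junction_path)

    # Keep only the project names reached through more than one junction
    duplicates = {name: paths for name, paths in by_project.items() if len(paths) > 1}
    if duplicates:
        details = "; ".join(
            "{} via {}".format(name, ", ".join(paths))
            for name, paths in sorted(duplicates.items())
        )
        raise DuplicateProjectError("Project loaded more than once: " + details)
```

The check only looks at project names, so it fires at load time
regardless of how the junctions themselves are named.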


How to allow multiple junctions to the same project
===================================================
As a result of the new project name based checks which disallow
multiple junctions to the same project from being loaded in the same
pipeline, we are now not entirely sure what the best approach is to
allow this.


  What are the dangers ?
  ----------------------
  Logically, we've been thinking that if you have elements from the
  same project loaded twice, this may be problematic especially if it
  is not explicitly known in some way, but why is this problematic ?

  I think the best answer to this so far is if you are constructing
  a system or runtime where you have some common components with some
  diamond shaped dependencies of projects, you don't want to
  accidentally combine different versions of the same elements and
  stage them together in a sandbox.

  In this case you don't want something like this:

               a
              / \
             b   c
              \ /
               d

  To accidentally look like this instead:

               a
              / \
             b   c
            /     \
         d(1)     d(2)

  Interestingly, this particular danger could be solved by file overlap
  errors, because whenever staging (a) we would most probably get
  overlap warnings/errors when attempting to stage d(1) and d(2) to the
  same directories in a sandbox.
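
  To make the overlap argument concrete, here is a minimal sketch of
  how staging d(1) and d(2) into the same sandbox would be noticed.
  The `find_overlaps` helper and the file lists are assumptions made
  up for this example, not BuildStream's actual overlap handling.

```python
def find_overlaps(staged):
    """Return {path: [element, ...]} for paths installed by several elements.

    `staged` maps an element name to the file paths it would place in
    the sandbox (hypothetical input shape).
    """
    owners = {}
    for element, files in staged.items():
        for path in files:
            owners.setdefault(path, []).append(element)
    # A path owned by more than one element is an overlap
    return {path: names for path, names in owners.items() if len(names) > 1}


# Two versions of the same element both install the same library path
overlaps = find_overlaps({
    "d(1)": ["/usr/lib/libd.so", "/usr/share/doc/d/NEWS-1"],
    "d(2)": ["/usr/lib/libd.so", "/usr/share/doc/d/NEWS-2"],
})
```

  Note that this only works once file lists are known, which is to say
  after the elements have been built.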

  However, overlaps are a late stage error and we would definitely want
  to catch this error at a much earlier stage.

  Are there any other concrete examples of problematic situations which
  might be caused by loading the same project twice with possibly
  differing configurations ?


  Valid use cases ?
  -----------------
  There are a few valid use cases we've come up with; reasons why we
  definitely would want to support loading the same project more than
  once with possibly differing configurations.


    Cross architecture bootstrapping
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    When bootstrapping a runtime for a different architecture, it can
    be interesting to use the same toolchain project configured
    multiple times with different project options defining which host
    and target architectures to build libc/gcc under.

    When combining this ability with remote execution, we can
    streamline the process of bootstrapping a system under any
    architecture which we have runners for on the RE cluster.


    Auxiliary projects which provide static build-only dependencies
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    When one project depends on another project for some static data
    which will be consumed as build-only dependencies, the data
    from the junctioned project is consumed statically as is, and there
    is no concern of runtime dependencies being propagated forward to
    reverse dependency projects which might consume the same junction.

    Auxiliary build-only projects could be projects which build and
    provide static databases, like voice recognition DBs or navigation
    data, or it could be projects which build and provide tooling like
    compilers which are simply used "as is" as build-only dependencies.

    Consider this illustration:

            toplevel
              |    \
              |     \
              |     Auxiliary
              |
            another
              |    \
              |     \
              |     Auxiliary
              |
            baseproject
              |    \
              |     \
              |     Auxiliary

    In the above graph, we might have multiple projects which abstract
    away their requirement of a given compiler or tool in the same
    "Auxiliary" project.

    Here we would like to have the freedom to have that project many
    times, possibly at different versions, and ideally we would like
    that to be "hidden".

    For instance, the baseproject knows what version of "Auxiliary" it
    needs, but the "another" and "toplevel" projects should never be
    forced to know about its hidden dependency "Auxiliary".


    Separation of tooling and data
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    While depending on different versions of the same elements often
    carries a danger of file overlaps, and uncertainty about what is
    being mixed into a system, this is by no means always the case.

    Take for instance ScriptElement derivatives like x86image, which
    will stage some dependencies for one purpose and another set of
    dependencies for another purpose.

    Here is another illustration to consider:


               bootable-image (in the toplevel project)
                     /                    \
                    /                      \
                   /                        \
     tools (like mkfs, syslinux)    payload (apps, stuff for the image)
                  |                          |
                  |                          |
           freedesktop-sdk             freedesktop-sdk


    On the left hand of this graph, we might have the tooling needed
    in order to construct bootable images for various platforms and
    filesystems.

    The need for revving the left side of the above is seldom to none;
    you only really need to rebuild these artifacts when you need new
    features for building new filesystems or such.

    On the right hand of the graph, you have the payload which is going
    into the image - these are probably revving on a continuous basis,
    as you probably want to build snapshot images of your system fairly
    often.

    In this scenario, it is perfectly fine for both freedesktop-sdk
    instances to be configured differently and have different versions,
    even though runtime dependencies may be propagated forward through
    other intermediate projects - in the end they will be staged safely
    at separate locations within the sandbox, and there will not be any
    file overlap errors as a result.

  Any other concrete use cases we've overlooked here ?


  Solutions ?
  -----------
  So far we've got two approaches in mind but I think we need to
  brainstorm a bit and I am hoping that people will provide some good
  ideas.


    Isolated junctions
    ~~~~~~~~~~~~~~~~~~
    The idea with junction isolation is that a project can make a
    statement that:

      "I'm going to use this junction, and I will not propagate runtime
       dependencies forward to reverse dependency projects"

    This has the advantage of good encapsulation, avoiding pushing
    any burden of knowledge onto reverse dependencies which simply
    want to depend on your project and have things "just work" as
    expected, and safely.

    I think this approach would need to come with additional errors
    which detect cases where runtime dependencies leak forward from
    this project (possibly error messages which recommend the use of a
    `compose` element to ensure encapsulation).
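
    A leak check of this kind might be sketched as follows, assuming
    we have each element's owning project and its runtime dependencies
    at hand. The `find_runtime_leaks` function and the dictionary
    layout are hypothetical; nothing like this exists in BuildStream
    today as far as this proposal is concerned.

```python
def find_runtime_leaks(elements, project, isolated_project):
    """List (element, leaked_dependency) pairs for an isolated junction.

    `elements` maps an element name to a dict with its owning "project"
    and its "runtime" dependency names (assumed shape). An element of
    `project` leaks if an element of `isolated_project` is reachable
    from it through runtime dependencies only.
    """
    leaks = []
    for name, element in elements.items():
        if element["project"] != project:
            continue
        seen = set()
        stack = list(element["runtime"])
        while stack:
            dep = stack.pop()
            if dep in seen:
                continue
            seen.add(dep)
            if elements[dep]["project"] == isolated_project:
                leaks.append((name, dep))
                continue
            stack.extend(elements[dep]["runtime"])
    return sorted(leaks)


elements = {
    # Runtime depends on the isolated project: this leaks forward
    "app.bst": {"project": "base", "runtime": ["aux/tool.bst"]},
    # Build-only consumer (e.g. a compose-like element): no runtime edge
    "bundle.bst": {"project": "base", "runtime": []},
    "aux/tool.bst": {"project": "aux", "runtime": []},
}
leaks = find_runtime_leaks(elements, "base", "aux")
```

    Here only "app.bst" would be reported, since "bundle.bst" consumes
    the isolated project without any runtime dependency on it.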


    Enforced whitelisting
    ~~~~~~~~~~~~~~~~~~~~~
    This is the simplest approach, and dictates that if a project
    itself declares a junction to a project which appears more than
    once, it must whitelist that project as a statement that
    "yes I know what I'm doing".

    In this case, other reverse dependencies of the project which
    did whitelist its own junction still remain free of the burden
    of knowledge, unless they also want to directly junction the same
    already junctioned project.
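
    As a sketch of the rule, assuming each project lists its junctions
    and a set of project names it explicitly allows to be duplicated
    (the `allow_duplicates` key and `check_whitelists` function are
    invented for this example, no such format has been decided):

```python
from collections import Counter


def check_whitelists(projects):
    """Return (project, junction, target) triples missing a whitelist entry.

    `projects` maps a project name to a dict with "junctions" (junction
    file name -> target project name) and "allow_duplicates" (a set of
    target project names), both hypothetical.
    """
    # Count how many junctions across the pipeline point at each project
    counts = Counter(
        target
        for project in projects.values()
        for target in project["junctions"].values()
    )
    errors = []
    for name, project in sorted(projects.items()):
        for junction, target in sorted(project["junctions"].items()):
            if counts[target] > 1 and target not in project["allow_duplicates"]:
                errors.append((name, junction, target))
    return errors


projects = {
    "toplevel": {
        "junctions": {"middle.bst": "middle", "sdk.bst": "sdk"},
        "allow_duplicates": set(),
    },
    "middle": {
        "junctions": {"sdk.bst": "sdk"},
        "allow_duplicates": {"sdk"},
    },
}
errors = check_whitelists(projects)
```

    In this example "middle" has whitelisted its own junction, while
    "toplevel" directly junctions the same project without doing so
    and is therefore the only one reported.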


  Looking at this email so far, I'm tempted to think that we might have
  both of these approaches (declaring a junction as 'isolated' can
  allow hiding a local junction and be more convenient, but failing
  this we can still whitelist junctions in reverse dependencies).


Any additional thoughts on this subject ?

Cheers,
    -Tristan


[0]: https://gitlab.com/BuildStream/buildstream/-/merge_requests/1901