Re: Stop the train ! Caching build trees is going to be too big

On Fri, Apr 27, 2018 at 11:10 AM Tristan Van Berkom <tristan vanberkom codethink co uk> wrote:
Hi all.

This is just a quick email to raise a problem early, make sure that we
adjust our expectations, take a pause and fix a big flaw in our plan.

So, last week we identified that it is not going to be realistic to
blindly cache build trees, because VCS data tends to cost a damn lot of
disk space (feel free to substitute "damn lot" with less family
friendly wording for dramatic effect).

For this, we opened this issue to block it:

    https://gitlab.com/BuildStream/buildstream/issues/376

But the buck doesn't stop here, unfortunately.

For example, my workspace directory for WebKit (from a *tarball*, with
no VCS data added), costs me 5.8GB of disk space after a build. This is
only the source code we mean to build, plus the resulting object files.
The object files in the `_build/` subdirectory cost 5.6GB, so the
source code is only a couple of hundred MB.

To put this in perspective: when we started building GNOME against a
Debian sysroot runtime, which cost about 3GB, it was quite annoying
because it takes a *damn long time* to download the base runtime before
we even start building.

Introducing a 5.8GB download for a prebuilt WebKit artifact is just not
gonna fly; we cannot start introducing these downloads into the build
process.

Ugh, yes, we cannot unconditionally introduce this overhead without
any added value.
 
What I propose that we do, is the following:

  * Split artifact keys in two:

    * The regular artifact remains "${project}/${element}/${key}"

    * The cached build tree is addressable as
      "${project}/${element}/${key}/build"

    * Alternatively, we could split the artifact into metadata, logs,
      output and build components; this remains to be discussed
      and analyzed.
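A minimal sketch of the two addressing schemes in the bullets above, assuming refs are plain slash-joined strings. The function name `artifact_ref` and its `component` parameter are hypothetical and not part of BuildStream's API.

```python
from typing import Optional


def artifact_ref(project: str, element: str, key: str,
                 component: Optional[str] = None) -> str:
    """Build a ref following the proposed "${project}/${element}/${key}" scheme.

    Hypothetical helper: with component=None it returns the regular
    artifact ref; with component="build" it returns the separately
    addressable build tree ref "${project}/${element}/${key}/build".
    """
    ref = f"{project}/{element}/{key}"
    if component is not None:
        ref = f"{ref}/{component}"
    return ref
```

Because the build tree gets its own ref, a server can store and serve it independently, and a client that never asks for the `/build` ref never pays for it.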

I would prefer us to take a slightly different approach: rather than
storing the components under different keys, store a "BuildStreamArtifact"
message under the key. That message can then be used to download
the different components of the artifact.
This is a similar approach to the ActionResult stored in the
BuildFarm ActionCache.
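To make the comparison with ActionResult concrete, here is a sketch of what such a message might carry, assuming each component is referenced by a content digest (hash plus size, as in the Remote Execution API's Digest). The class and field names are hypothetical, not an agreed schema.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class Digest:
    # Content address of a blob or directory tree in CAS, modelled on
    # the Remote Execution API's Digest (hash + size in bytes).
    hash: str
    size_bytes: int


@dataclass(frozen=True)
class BuildStreamArtifact:
    # Hypothetical message stored under the single artifact key; each
    # field points at a separately downloadable component.
    metadata: Digest
    logs: Digest
    files: Digest                         # the element's output
    build_tree: Optional[Digest] = None   # may be absent entirely

    def components(self) -> Dict[str, Digest]:
        # Map component names to digests, skipping a missing build tree.
        parts = {"metadata": self.metadata, "logs": self.logs,
                 "files": self.files}
        if self.build_tree is not None:
            parts["build"] = self.build_tree
        return parts

    def download_size(self, wanted: Iterable[str]) -> int:
        # Bytes a client would pull for only the requested components.
        parts = self.components()
        return sum(parts[name].size_bytes for name in wanted if name in parts)
```

A client that only stages dependencies would fetch the small message, then request `["metadata", "files"]`, never touching the build tree digest.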
 
  * Uploading of the build tree to artifact shares remains mandatory

    * We should ensure integrity of artifact share servers

    * In the usual cases, regular users do not contribute to artifact
      shares anyway, automated build servers do this part

  * Downloading the build tree of an artifact must only ever be done
    on demand

    * We could have an option to force download all the sources if
      we expect to need them later for offline work, but I don't think
      this is required in order to land the feature

    * The build trees are only useful for a subset of purposes:

      - opening workspaces in a state ready for an incremental build
      - running a `bst shell` with the source code and built objects
        of all of the element's dependencies "in tree", such that the
        debugging experience in a `bst shell` can be much more powerful

      In both of these cases, I think it even makes sense to have the
      download optional - I might rather enjoy opening a workspace
      on WebKit *right now* instead of waiting for a 5.8GB download.
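The on-demand behaviour described in the list above can be sketched as a small local cache, assuming the "/build" ref suffix from the proposed key scheme. `LocalArtifactCache`, `remote_fetch` and `prefetch_build_trees` are hypothetical names standing in for a pull from an artifact share and for the optional "download everything for offline work" switch.

```python
from typing import Callable, Dict, Optional


class LocalArtifactCache:
    """Toy local cache that fetches build trees only on demand.

    `remote_fetch` is a hypothetical callable standing in for a pull
    from an artifact share; it returns the stored bytes or None.
    """

    def __init__(self, remote_fetch: Callable[[str], Optional[bytes]],
                 prefetch_build_trees: bool = False):
        self._remote_fetch = remote_fetch
        self._prefetch = prefetch_build_trees  # optional offline-work mode
        self._store: Dict[str, bytes] = {}

    def pull_artifact(self, ref: str) -> bool:
        # A regular pull always fetches the artifact itself ...
        data = self._remote_fetch(ref)
        if data is None:
            return False
        self._store[ref] = data
        # ... but only fetches the build tree when configured to.
        if self._prefetch:
            self.build_tree(ref)
        return True

    def build_tree(self, ref: str) -> Optional[bytes]:
        # Called when opening a workspace or entering `bst shell`.
        tree_ref = f"{ref}/build"
        if tree_ref not in self._store:
            data = self._remote_fetch(tree_ref)
            if data is None:
                return None  # caller can fall back to a clean build
            self._store[tree_ref] = data
        return self._store[tree_ref]

    def has(self, ref: str) -> bool:
        return ref in self._store
```

The expensive download happens at most once and only when a workspace or shell actually asks for it; if the share has no build tree, the caller simply proceeds without one.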

Things are probably not as apocalyptic as I'm making them sound, but we
have to keep in mind that:

  * Dramatic effect is just super FUN !

It is... I do caution that, with people from many different backgrounds
on this list, there is definitely room for drama to be misinterpreted :).
I like that you're calling this out to prevent that from happening.
 
  * People are already working on this feature and related features,
    so we need them to pause, think and strategize.

So I don't want people to panic, but please be understanding that we've
hit some roadblocks, reality has struck and we have to adapt to that.

Yup.
 
Any thoughts about the proposed plan for splitting up the artifact
cache into separate addressable units, and making the downloads
optional ?

+1 on splitting it up.
 
Cheers,
    -Tristan

Cheers,

Sander
 
_______________________________________________
Buildstream-list mailing list
Buildstream-list gnome org
https://mail.gnome.org/mailman/listinfo/buildstream-list

