[BuildStream] Revisiting artifact cache configurations
- From: Tristan Van Berkom <tristan vanberkom codethink co uk>
- To: BuildStream <buildstream-list gnome org>
- Subject: [BuildStream] Revisiting artifact cache configurations
- Date: Tue, 26 May 2020 19:24:47 +0900
Hi all,
Yesterday, amidst other house cleaning activities, I started thinking
we should consider renaming the `cache-junction-elements` and
`ignore-junction-remotes` configuration options and revising the
documentation[1] surrounding them, as the docs are quite underspecified
here, leaving much to the imagination.
In an attempt to understand what these really do and why they exist, I
revisited the history around them, of which there is a lot, including
some merge requests[2][3], the original issue[4] and an interesting
mailing list post which appears to have gone unanswered[5].
Then Jürg and I discussed this on IRC[6].
The bottom line is, we've gone to great lengths to provide fine grained
flexibility at the project[7], junction[1] and user configuration[8]
levels, and what we've ended up with is very confusing, underspecified,
and in some cases it is impossible to tell whether we are implementing
these correctly (due to lack of any specification of what "correctly"
is).
I think we need to reexamine what the real use cases are, coordinate a
redesign of this configuration surface, and drive this design by real
existing use cases only, hopefully eliminating any obscurity or
confusion around how things might work.
In this mail, I will attempt to outline and rate what use cases we are
trying to address with all of this, and draft a strawman proposal for
the sake of getting some feedback here.
Use cases
=========
First I'd like to point out a fundamental fact about artifact cache
configuration: This falls strictly under the category of user
configuration data (see this link to the architecture[9]).
Artifact caches are what the user (at a workstation, or when
configuring CI) uses to build a project; they do not define the project
in any way, nor do they affect cache keys or build input/output.
* Ability to pull and push artifacts to the artifact cache server

  Of course reuse of already built artifacts is the main reason
  we need configuration to specify one.

* Ability to configure a prioritized list of artifact servers

  It is interesting to have some redundancy of artifact servers,
  where you might push artifacts to only some of them but try
  pulling from others in case we can avoid builds.

  This was originally discussed in the context of "branching
  buildstream projects" where you might have separate artifact
  servers to store branch specific artifacts, but still have
  a lot of commonality with mainline such that you can still
  avoid builds whenever rebasing your branch.

* Ability for the project to provide a recommendation about
  artifact server(s) where you are likely to find cache hits
  for artifacts related to this project.

  This is clearly useful for pulling from artifact caches, especially
  for the "user at desk" scenario where configuring the artifact
  cache for every separate developer would be wasteful.

  For the "automated CI" use case this is useful because you can
  easily pull from artifact servers from deeply nested subprojects
  without needing to know the artifact servers associated with those
  projects.

* Ability to re-cache artifacts from junctioned project artifact
  servers into your own toplevel project's artifact server.

  This is an important feature as it helps to ensure:

  - Projects keep all of their resources on the infrastructure they
    control

  - Reduction of stress on underlying infrastructure of junctioned
    projects, resulting from downstream projects frequently
    downloading artifacts in their own CI.

  On this point, I agree with Javier[10]'s expectations and think
  that this should be the default behavior.

* Ability to distrust artifact servers recommended by subprojects

  In some cases, one might want to ensure that they do not
  download artifacts from external or untrusted sources, ensuring
  that everything is built on your own infrastructure.
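As a rough illustration of the prioritized server list use case above,
here is a minimal Python sketch. It is not BuildStream's implementation:
the `Server` class, the two helper functions and the example.com URLs
are all hypothetical, and it only assumes that every listed server is a
pull candidate in listed order while only servers marked `push` receive
newly built artifacts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Server:
    # Hypothetical stand-in for one entry of a configured server list
    url: str
    push: bool = False


def pull_order(servers):
    # Every configured server is a pull candidate, tried in listed order
    return [server.url for server in servers]


def push_targets(servers):
    # Only servers explicitly marked for pushing receive new artifacts
    return [server.url for server in servers if server.push]


# Placeholder URLs: a branch-specific server we may push to, followed by
# a mainline server we only pull from.
servers = [
    Server("https://branch.example.com", push=True),
    Server("https://mainline.example.com"),
]
```

With this list, a branch build would look for cache hits on both
servers but only ever upload to the branch-specific one.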
Does this pretty much cover the actual use cases of configuring
artifact cache servers?
Opinions and problems
=====================
Before jumping into my draft proposal, I'll express some opinions drawn
from the use cases and current architecture, which should help inform
the conversation.
For the last two listed use cases, i.e. what is currently expressed as
`cache-junction-elements` and `ignore-junction-remotes`
configurations[1] on junctions, I would argue that:
* We don't need per project granularity here.

  It is okay to either trust all subproject artifact caches or not,
  and likewise it is fine to just upload all subproject artifacts
  to the same artifact cache (whether it be recommended by your
  toplevel project.conf or by user configuration) or not.

* We don't need to care about subproject opinions about these two
  points either; we only care about the toplevel project being
  built and how it recommends going about building.

* The fact that these two configurations live so far away from
  the `artifacts` section of `project.conf`[7] is disquieting, as
  they cannot be expressed or documented in the same place.

  We should be able to achieve similar configuration in
  `project.conf` (even if we wanted to retain fine grained
  selectivity about which subproject recommended artifact
  servers to trust, or migrate artifacts from into our own
  server).

* These project-defined recommendations for user configuration
  cannot reasonably be overridden in user configuration.
Another point is that it is fairly rare that "user at desk" scenarios
involve pushing to artifact servers; in most cases artifact servers are
populated by CI and only downloaded from in "user at desk" settings.
This means that it is arguably unimportant to configure recommendations
of push URLs in a project.conf.
I'm not sure, however, that removing this ability improves the codebase
or the user experience (provided that the project.conf format matches
the buildstream.conf format verbatim); this ability seems to come at a
low cost in complexity.
Draft Format Proposal
=====================
In a nutshell, I propose that we remove junction level configuration[1]
of artifact server behaviors completely, and replace this with a
project.conf recommended behavior which can be overridden by
buildstream.conf.
For the project and user configuration, I would suggest that we make a
breaking change to the format such that `artifacts` becomes a
dictionary with a `servers` list, adding the two extra configurations
on that dictionary, e.g.:
  #
  # Artifacts
  #
  artifacts:

    #
    # Whether to try to pull artifacts from artifact
    # caches recommended by subprojects.
    #
    pull-subproject-artifacts: True

    #
    # Whether to push subproject artifacts into
    # the following servers.
    #
    push-subprojects-artifacts: True

    #
    # Same old server list as before, this is a per project hint
    # of where to pull artifacts from (or perhaps push to if you
    # have credentials).
    #
    servers:
    - url: https://artifacts.com/artifacts:11001
      server-cert: server.crt
    - url: https://artifacts.com/artifacts:11002
      server-cert: server.crt
      client-cert: client.crt
      client-key: client.key
      push: true

When expressed in project configuration, I would clarify that the two
added settings are only considered as a default behavior when building
this project as a toplevel project, which may be overridden by user
configuration.
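The precedence described here can be sketched in a few lines of Python.
This is only an illustration under stated assumptions, not BuildStream
code: the function name `resolve_artifacts_setting` is hypothetical,
both configurations are modeled as plain dictionaries, and the fallback
used when neither file sets a value is left to the caller since no
built-in default is specified in this proposal.

```python
def resolve_artifacts_setting(key, toplevel_project_conf, user_conf, fallback):
    """Return the effective value of one `artifacts` setting.

    Precedence, highest first: user configuration, then the toplevel
    project's project.conf, then a caller-supplied fallback.
    """
    for conf in (user_conf, toplevel_project_conf):
        artifacts = conf.get("artifacts", {})
        if key in artifacts:
            return artifacts[key]
    return fallback


# project.conf of the toplevel project recommends pulling ...
project_conf = {"artifacts": {"pull-subproject-artifacts": True}}
# ... but the user's configuration for that project opts out.
user_conf = {"artifacts": {"pull-subproject-artifacts": False}}
```

Here the user's `False` wins over the project's recommended `True`; with
an empty user configuration the project.conf value would apply instead.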
Further, I would propose that we clarify in the user configuration
documentation that the per-project settings regarding artifact caches
are NOT taken into consideration separately when building a project
which junctions another project in your list.
For example:
  #
  # My buildstream.conf
  #
  projects:
    foo:
      artifacts:
        pull-subproject-artifacts: False
    bar:
      artifacts:
        pull-subproject-artifacts: True

If there is a junction between `foo` and `bar`, in either direction,
there is no conflict here to resolve or differing behavior on a per
project basis: Only the toplevel project is ever considered to
determine artifact caching behaviors here.
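To make the "toplevel only" rule concrete, here is a small Python
sketch using the `foo` and `bar` entries from the example. It is a
hypothetical model rather than BuildStream's actual lookup: the function
name is made up, project.conf defaults are ignored for brevity, and the
`fallback=True` default is an assumption.

```python
# The user configuration from the example, modeled as a dictionary
user_config = {
    "projects": {
        "foo": {"artifacts": {"pull-subproject-artifacts": False}},
        "bar": {"artifacts": {"pull-subproject-artifacts": True}},
    }
}


def pull_subproject_artifacts(toplevel, junctioned, user_config, fallback=True):
    # `junctioned` is accepted only to show that it is deliberately
    # unused: entries for junctioned projects never take part in the
    # decision, only the toplevel project's entry does.
    entry = user_config.get("projects", {}).get(toplevel, {})
    return entry.get("artifacts", {}).get("pull-subproject-artifacts", fallback)


# Building foo (which junctions bar) uses foo's setting only
building_foo = pull_subproject_artifacts("foo", ["bar"], user_config)
# Building bar (which junctions foo) uses bar's setting only
building_bar = pull_subproject_artifacts("bar", ["foo"], user_config)
```

Whichever project sits at the top of the build decides the behavior for
the whole pipeline, so the two entries never need to be reconciled.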
Any thoughts?
Cheers,
-Tristan
[1]: https://docs.buildstream.build/master/elements/junction.html
[2]: https://gitlab.com/BuildStream/buildstream/-/merge_requests/1403
[3]: https://gitlab.com/BuildStream/buildstream/-/merge_requests/1759
[4]: https://gitlab.com/BuildStream/buildstream/-/issues/401
[5]: https://mail.gnome.org/archives/buildstream-list/2019-June/msg00049.html
[6]: https://irclogs.baserock.org/buildstream/%23buildstream.2020-05-25.log.html#t2020-05-25T07:58:23
[7]: https://docs.buildstream.build/master/format_project.html#artifact-server
[8]: https://docs.buildstream.build/master/using_config.html#artifact-server
[9]: https://docs.buildstream.build/master/arch_data_model.html#context
[10]: https://gitlab.com/BuildStream/buildstream/-/issues/401#note_74690536