|
Hi Tristan,
Thanks for the input, I've addressed points below and had a few
thoughts about improving this proposal.
Proposed Solution 2.0
~~~~~~~~~~~~~~
Project configuration will provide a `manifest:` configuration for
selecting what data will be included in the manifest. (The
'manifest' may need to be renamed for clarity)
manifest:
elements:
junction:
- options
sources:
git:
- ref
- url
tar:
- url
- ref
This configuration would enable all git and tar sources to include
ref and url in the manifest as well as having junctions state their
options.
Regarding source names, I think it is best to identify sources by
index and kind since there is no field for identifying sources.
Potentially the ability to optionally name sources could be quite
useful to this feature.
Providing a manifest configuration will invoke a manifest to be
produced during a build (maybe checkout?). The manifest can most
likely also be
generated through other means such as providing `--build-manifest`
to commands such as show.
Element() and Source() will get buildstream private functions to
build their own manifest section using the
manifest configuration available through their associated project.
Side Note: Currently, config validation is performed within
Configure because each Plugin knows which config keys are valid.
If we could make this a constant that is defined by the plugins,
then validation can be performed for both manifest configuration
and before we call `Configure()` (We could avoid changes to
backwards compatibility by only performing these validation checks
when the constant is actually populated)
On 03/08/18 08:09, Tristan Van Berkom
wrote:
Problem Statement
~~~~~~~~~~~~
BuildStream currently collects a collection of .bst files to
configure and build a collection of artifacts. On a release, project
maintainers may wish to provide a manifest of build sources, which
currently means raking through a collection of .bst files for
sources.
Interestingly, I don't think we understand the same thing by the word
"manifest", although I can see how both interpretations can be
interesting.
Output manifest
~~~~~~~~~~~~~~~
The list of files produced in a given artifact, or in a set of
artifacts in a dependency tree (i.e. `--deps run` is interesting
for a target, as it includes everything which is not a build-only
dependency).
Input manifest
~~~~~~~~~~~~~~
A list of inputs which were used to create the project, this
is mostly covered by the `bst show` invocation I outlined in #235,
digging deeper than just the bst files is a bit problematic, more
below.
This is definitely worth clarifying I agree, perhaps we need a
clearer name for any potential feature in this area.
As you made out, this feature is mostly relating to the `Input
manifest` in terms of defining which sources were
used and their 'urls', 'refs' etc.
Proposed Solution
~~~~~~~~~~~
When `bst build` is supplied with an option "--build-manifest" it
will produce a YAML dictionary containing the date/time of the build,
the version of buildstream used, a collection of elements and their
sources (name, url, ref).
Here is where you hit a technical challenge.
o A source does not have a name, although it does have a position
in a list, inside a named target element (.bst file).
o It is currently impossible for BuildStream core to identify what
is the URL associated with a given Source.
The parsing of the URL and ref and such, are in the domain of the
plugins themselves, and while extending the API is possible, it is
not possible to stop supporting plugins which do not implement a
given API, the core must be able to fallback gracefully, and our
functionality is limited by what the plugins in use happen to
implement (or alternatively, the core can be made to abort
gracefully when encountering plugins which do not implement
functionality that is asked for by a given BuildStream invocation).
Plugins are guaranteed to implement only the original set of APIs,
what is guaranteed can be gleaned by observing what is optional in
the plugin facing documentation:
http://buildstream.gitlab.io/buildstream/buildstream.source.html
http://buildstream.gitlab.io/buildstream/buildstream.plugin.html
The fact that Sources mostly happen to use the key "ref" to load what
is returned and set by "Source.get_ref()" and "Source.set_ref()" from
the YAML, and that they normally use the key "url" to load the URL from
where something is downloaded, is mostly a matter of following
established precedent, but this cannot be relied upon by the core.
Some sources have no URL, some have more than one URL; git has
optionally configurable extra URLs which allow overriding of the URLs
from whence to obtain submodules... the waters are muddy around here.
That said, the ability for the core to aggregate and report things
about sources, such as their refs and URLs, has been requested before,
usually this has been discussed in the context of additional `bst show`
functionality, or a separate `bst show` like command specifically for
sources (since the `bst show` CLI interface is not very amenable to
this, a separate command might make more sense).
I see this problem as an API change definitely. In my
branch, I assumed the url/ref keywords within buildstream
core to get a quick working example but this is
not how I envisage the
final implementation.
I think that we can implement some form of API within
Element and Source to provide their own 'input manifest',
this would be implemented within all core
buildstream elements and sources and ignored for those that
don't
(With a warning that can be configured as
fatal). Realistically, this will leave manifests missing some
core sources
for some time until plugins update to support
this new API.
Existing functionality will be fully backwards compatible
and this new feature will be available to those projects using
updated plugins.
An API addition was my initial idea, this could still
work but would require more work on the side of plugin developers.
This feature will be opt-in and therefore will not change the default
behaviour of buildstream while still adding a useful feature for
those users who choose to use it.
I will say right away that I am opposed to baking this kind of
additional functionality into `bst build`.
o This would set a precedent for lumping whatever people want into
the `bst build` command, when it comes to introspecting any
information they might want to know about a built pipeline (or a
pipeline they are about to build) - this information should be
readily available through other bst commands.
o Whenever implementing a new feature, we should be getting the most
bang for our buck.
This is to say that, if you want this information after a build,
this does not mean that nobody will ever want this information at
any other given time.
Implementing this through another codepath, helps us provide a
good set of scriptable bst commands which can accommodate every
users needs, without implementing various separate codepaths
to support these in different corner cases inside BuildStream.
I agree, this should be implemented in a generic form so that
various bst commands can make use of this in one
way or another. I suggested `bst build` as it is where I initially
saw a primary use case for producing an
input manifest during builds.
This would be yet another stable API surface to maintain, and the
situation is worsened by trying to cover everybody's different use
cases for such a file in the same API. Maybe one person will want it in
JSON while another wants XML; maybe one person wants to add data that
others are not interested in, causing everyones metadata to grow in
orders of magnitude with more data they didnt ask for, all of this can
be avoided by just providing the means for you to generate the file you
want.
I realise that adding to the element/source API increases the amount
of work maintaining said API which is why I agree that we
should try and minimise this. Regarding file types, I don't think
that writing to a variety of different file types is a particularly
large workload,
from a dictionary we can get to writing each of these file types in
a fairly straight forward manner.
|