Re: Project options and other format enhancements (and dropping "variants")

From: Tristan Van Berkom <tristan vanberkom codethink co uk>
To: Sam Thursfield <sam thursfield codethink co uk>, buildstream-list gnome org
Subject: Re: Project options and other format enhancements (and dropping "variants")
Date: Sat, 16 Sep 2017 02:22:35 -0400

On Fri, 2017-09-15 at 15:53 +0100, Sam Thursfield wrote:

Hi Tristan,

In general I agree with your premises, and I think the proposal is 
workable. I don't have anything better to propose.

    Option Declaration
    ------------------
    A project declares which options are valid for the
    project in the project.conf.

    These options should have some metadata which can be used
    to declare the defaults, assert valid values of the options,
    and also a description string which the CLI can use to communicate
    the meaning of project options to buildstream users (not all users
    building a project wrote the project.conf).


Are you expecting to support only enum values, or freeform strings and 
integers too?


I certainly don't want to support only enums, although I was undecided
on whether the data would simply be in string form and have conditional
statements deal with typing, or to have typing encoded into the option
metadata.

One idea I am liking more and more is the (contains list_opt "string")
kind of conditional, where one could check for the presence of a word
in a whitespace-separated word list. One could use it to whitelist
elements for a feature, or to test for a single feature in a list (like
the compiler tuning example I made in my reply to Sander).
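A minimal sketch of the (contains list_opt "string") test follows. It assumes that option values are plain strings and that a list option is a whitespace-separated word list. The option name `machine-features` and the feature words are hypothetical.

```python
from typing import Dict


def contains(options: Dict[str, str], name: str, word: str) -> bool:
    """Return True if `word` is one of the whitespace-separated words
    stored in option `name` (hypothetical semantics)."""
    return word in options[name].split()


# Hypothetical option values, e.g. as set on the command line
options = {"machine-features": "neon  vfpv4 thumb2", "debug": "on"}

print(contains(options, "machine-features", "neon"))  # True
print(contains(options, "machine-features", "vfp"))   # False: whole words only
```

Splitting into words, rather than testing for a substring, prevents "vfp" from accidentally matching "vfpv4".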

<slight deviation from topic>

As you've been looking into bootstrapping compilers with BuildStream,
maybe you can shed some light on what we could do for this, because I
feel my approach doesn't solve it perfectly.

From what I understand, currently we can only special-case symbolic
machine names and make a huge list of full tuning flags depending on
that symbolic name. This is an area I think Yocto excels at, and I would
like to have a solution that allows enough flexibility for this (of
course without being shell scripts which execute and source each other).

At the basic level, maybe this could be done by allowing a project to:

  A.) Define symbolic names, maybe they are "macros" or "presets"
  B.) The symbolic preset defines values for options
  C.) Write conditionals based on the options

It's just the first approach that comes to mind, but would allow us to
define feature lists associated with symbolic machine names, and then
write conditional YAML fragments based on the resulting feature sets
instead of having to special case every machine name individually.
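Here is a rough sketch of steps A to C. It assumes that a preset is simply a named set of option values, that explicitly set options override the preset, and that conditionals test features rather than machine names. All preset names, option names and configure flags below are hypothetical.

```python
from typing import Dict

# Hypothetical presets: each symbolic machine name expands to option values
PRESETS: Dict[str, Dict[str, str]] = {
    "rpi3": {"arch": "aarch64", "machine-features": "neon crc crypto"},
    "beaglebone": {"arch": "armv7", "machine-features": "neon vfpv3 thumb2"},
}


def resolve_options(preset: str, overrides: Dict[str, str]) -> Dict[str, str]:
    """Expand a preset into option values; explicitly set options win."""
    options = dict(PRESETS[preset])
    options.update(overrides)
    return options


def conf_extra(options: Dict[str, str]) -> str:
    """Derive (hypothetical) configure flags from the feature set, not
    from the machine name, so a new machine only needs a new preset."""
    features = set(options["machine-features"].split())
    flags = []
    if "neon" in features:
        flags.append("--enable-neon")
    if "crypto" in features:
        flags.append("--enable-hw-crypto")
    return " ".join(flags)


print(conf_extra(resolve_options("rpi3", {})))
# --enable-neon --enable-hw-crypto
print(conf_extra(resolve_options("beaglebone", {"machine-features": "vfpv3"})))
# (prints an empty line: the override removed "neon")
```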

Any other ideas?

Implementing a solution to this should not block our landing of a
project options feature; however, our approach should probably be
informed by whether and how we intend to address this kind of complex case.

</slight deviation from topic>

This sounds quite similar to Meson's option system[1], which is probably 
a good sign.

1. http://mesonbuild.com/Build-options.html

    Format Enhancements
    -------------------
    I would propose we add some special tokens which can be used at any level
    of a buildstream element.bst file, or also in some specific parts of the
    project.conf (since project.conf is declaring options, we cannot conditionalize
    that part).

    Below are my ideas for the '>>', '<<', '==', '??' and '!!' operators.


On one hand this is ugly as sin because YAML describes itself as a "data 
serialization standard", where self-modification shouldn't really be a 
thing. On the other hand, it already contains two special operators ('&' 
and '*', which are effectively "copy" and "paste") and in the interests 
of being "human friendly" there is definitely justification for allowing 
more syntax sugar.


Right; aside from this, we already sort of break that rule, because the
loaded YAML dictionaries are post-processed and composited multiple
times.

As I've said elsewhere, I've migrated from:

  '??':
    condition:
    ...

to:

  (?):
    condition:
    ...

I feel like it will stand out more, and I don't like the quotes. That
said, I'm open to changing the conditionals to something more
conforming if we really expect that the result is going to be more
legible.

For the other (>), (<) and (=) tokens, I don't see any way around it.
It has already become an annoyance that you cannot extend arrays in
'split-rules' but are forced to override them. We need some format if
we want to let the user choose between append, prepend and override
(and of course, as added sugar, this removes the need for pre/post
commands everywhere).

That said it's not that evil, we are just deserializing dictionaries
which bear meaning about how they are to be composited against other
deserialized dictionaries.
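As a sketch of how the (>), (<) and (=) tokens might be honoured while compositing, assume that a dict whose only keys are these tokens is a list directive: (=) replaces the inherited list, then (<) items are prepended and (>) items appended. Nested dicts merge recursively, and plain values (including plain lists) override, as they do today. These semantics are an assumption for illustration only.

```python
from copy import deepcopy
from typing import Any, Dict, List

# Directive keys as spelled in this thread; the semantics are hypothetical
APPEND, PREPEND, OVERRIDE = "(>)", "(<)", "(=)"
DIRECTIVES = {APPEND, PREPEND, OVERRIDE}


def compose_list(base: List[Any], directive: Dict[str, List[Any]]) -> List[Any]:
    """Apply list directives to an inherited list: (=) replaces it,
    then (<) items are prepended and (>) items appended."""
    result = list(directive[OVERRIDE]) if OVERRIDE in directive else list(base)
    return list(directive.get(PREPEND, [])) + result + list(directive.get(APPEND, []))


def compose(base: Dict[str, Any], overlay: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict with `overlay` composited onto `base`."""
    result = deepcopy(base)
    for key, value in overlay.items():
        if isinstance(value, dict) and value and set(value) <= DIRECTIVES:
            result[key] = compose_list(result.get(key, []), value)
        elif isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = compose(result[key], value)  # merge nested dicts
        else:
            result[key] = deepcopy(value)  # plain values, incl. lists, override
    return result


base = {"split-rules": {"devel": ["/usr/include/**", "/usr/lib/*.a"]}}
overlay = {"split-rules": {"devel": {"(>)": ["/usr/share/aclocal/**"]}}}
print(compose(base, overlay)["split-rules"]["devel"])
# ['/usr/include/**', '/usr/lib/*.a', '/usr/share/aclocal/**']
```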

I think the symbols you've chosen are pretty good, I like that they 
don't look anything like normal text so just glancing at a .bst file 
should set off alarm bells of "this isn't just a list of dictionaries, 
there's extra processing being done" and hopefully the reader will head
for the documentation.


It's worth a quick review of existing solutions in this area. There are 
some processing/filtering tools:

* jq -- https://stedolan.github.io/jq/ -- CLI tool for running filters
on JSON-serialized data, which supports all kinds of manipulations,
path-based access, and conditionals

* XSLT -- https://www.w3.org/standards/xml/transformation -- similar, but
for XML, and about as horrific to use as you would expect

And also formats that support variants / self-modification:

* Ansible Playbooks -- 
https://docs.ansible.com/ansible/latest/playbooks.html -- mixes Jinja2 
templates with YAML, to provide variable substitution and conditionals 
using Jinja's expression syntax

* jsonnet -- http://jsonnet.org/ -- extends JSON to add an expression
syntax based on JavaScript (although specified independently)


Nod. The conditionals themselves are a preprocessing step, and it's
indeed possible to instead generate YAML from a "buildstream format";
I've given that a small amount of thought... honestly not much.

There are probably more things that I'm missing.

    The '??' expression format
    ~~~~~~~~~~~~~~~~~~~~~~~~~~
    So at first I was thinking what this would look like as a pure
    YAML format, but it looks like it will be way too verbose for
    expressing simple comparisons.

    Example:


        variables:
          '??':
            condition:
              kind: ifeq
              args:
                option: debug
                value: on
            then:
              conf-extra: --enable-debug
            else:
              conf-extra: --disable-debug


This could be abbreviated to be on one line:

     variables:
       '??':
         condition: { kind: ifeq, args: { option: debug, value: on } }
         then: { conf-extra: --enable-debug }
         else: { conf-extra: --disable-debug }

It's not super readable though.

    Later I thought maybe we do our own parsing of strings like
    `ifeq(option, value)`, but that also becomes a little unwieldy, hard
    to maintain and extend to support compound expressions.

    So what I'm leaning towards now is to create a simple expression
    format based on S-Expressions, this way the same expression above would
    just look like:


        '??':
          condition: (ifeq "debug" "on")
          then:
            conf-extra: --enable-debug
          else:
            conf-extra: --disable-debug
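Because the conditionals are a preprocessing step on the loaded YAML, resolving them could be a recursive walk that replaces each conditional block with the keys of its chosen branch. This sketch accepts both the '??' and '(?)' spellings from this thread. It takes the expression evaluator as a parameter, and a stub lookup table stands in for a real parser. The merge semantics are an assumption.

```python
from typing import Any, Callable, Dict

COND_KEYS = ("??", "(?)")  # both spellings discussed in this thread


def resolve_conditionals(node: Any, evaluate: Callable[[str], bool]) -> Any:
    """Return a copy of `node` where each conditional block is replaced
    by the keys of its selected `then` or `else` branch."""
    if isinstance(node, list):
        return [resolve_conditionals(item, evaluate) for item in node]
    if not isinstance(node, dict):
        return node
    result: Dict[str, Any] = {}
    for key, value in node.items():
        if key in COND_KEYS:
            chosen = value["then"] if evaluate(value["condition"]) else value.get("else", {})
            # A branch may contain conditionals of its own; on key clashes,
            # whichever key comes later in document order wins
            result.update(resolve_conditionals(chosen, evaluate))
        else:
            result[key] = resolve_conditionals(value, evaluate)
    return result


element = {
    "variables": {
        "(?)": {
            "condition": '(ifeq "debug" "on")',
            "then": {"conf-extra": "--enable-debug"},
            "else": {"conf-extra": "--disable-debug"},
        }
    }
}
# Stub evaluator: a lookup table standing in for a real expression parser
results = {'(ifeq "debug" "on")': True}
print(resolve_conditionals(element, results.__getitem__))
# {'variables': {'conf-extra': '--enable-debug'}}
```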


    This is especially nice once you want to do anything a bit more
    complex, the following would be a lot more verbose to express if
    it were in YAML:


        '??':
          condition: |

            (and (ifeq "logging" "off") (ifeq "debug" "on"))

          then:
            ... value ...
          else:
            ... value ...


    The S-Expressions are fairly easy to parse and there is a Python
    library for that (http://sexpdata.readthedocs.io/en/latest/).
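To give a feel for how small such a parser can be, here is a dependency-free sketch. It is not the sexpdata API. It turns an expression into nested Python lists and keeps bare words (a hypothetical `Symbol` type) distinct from quoted strings, so that `(ifeq option1 "pony")` can tell an option name from a literal.

```python
import re
from typing import List, Tuple, Union


class Symbol(str):
    """A bare word such as `ifeq` or an option name (not a quoted string)."""


Expr = Union[Symbol, str, List["Expr"]]

# One token: "(", ")", a double-quoted string with \-escapes, or a bare word
TOKEN_RE = re.compile(r'\s*(?:(\()|(\))|"((?:[^"\\]|\\.)*)"|([^\s()"]+))')


def tokenize(text: str) -> List[Tuple[str, str]]:
    tokens, pos, end = [], 0, len(text.rstrip())
    while pos < end:
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected input at offset {pos}")
        lparen, rparen, string, word = m.groups()
        if lparen:
            tokens.append(("open", ""))
        elif rparen:
            tokens.append(("close", ""))
        elif string is not None:
            tokens.append(("string", re.sub(r"\\(.)", r"\1", string)))
        else:
            tokens.append(("symbol", word))
        pos = m.end()
    return tokens


def _parse(tokens: List[Tuple[str, str]], pos: int) -> Tuple[Expr, int]:
    if pos >= len(tokens):
        raise ValueError("unexpected end of expression")
    kind, value = tokens[pos]
    if kind == "open":
        items: List[Expr] = []
        pos += 1
        while pos < len(tokens) and tokens[pos][0] != "close":
            item, pos = _parse(tokens, pos)
            items.append(item)
        if pos >= len(tokens):
            raise ValueError("missing closing parenthesis")
        return items, pos + 1
    if kind == "close":
        raise ValueError("unexpected closing parenthesis")
    return (Symbol(value) if kind == "symbol" else value), pos + 1


def parse(text: str) -> Expr:
    expr, pos = _parse(tokenize(text), 0)
    if pos != len(tokenize(text)):
        raise ValueError("trailing input after expression")
    return expr


print(parse('(and (ifeq "logging" "off") (ifeq "debug" "on"))'))
# ['and', ['ifeq', 'logging', 'off'], ['ifeq', 'debug', 'on']]
```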


I have to admit I was surprised to see Greenspun's 10th rule borne out 
here :-)


As they say... nobody expects the Spanish Inquisition!


Honestly though, there are probably some good reasons why half-baked
Lisp-like syntaxes get reused a lot. In this case, for instance, the
parser in Python is ~600 LOC (heavily docstringed). Not being
full-blown Lisp is not a bad thing when what you need is much less,
and I can easily turn around and throw away or replace a dependency
like this.

This could be workable, but the choice of S-Expressions is risky. I 
think those familiar with Lisp will be unhappy that our implementation 
doesn't match up with their preferred dialect of Lisp, and those 
unfamiliar with Lisp will think "what on earth am I looking at".


OK, so frankly I'm not attached to parenthesis-nested lists
aesthetically speaking; I am, however, quite attracted by their
simplicity, and we can probably achieve similar simplicity with
something else.

I'm not convinced that a:

   condition: |

     %{foo} == "bar"

kind of notation is simple, though. It tries to be very human-friendly
and programming-language-like, and then leaves us a bit blind if we
later want to extend the operators. What would we use for the
word-in-list 'contains' conditional? (Maybe we do as Python sets do
and use C-style bitwise test operators on them?)

This operator comparison expression approach is also more rigid and
demanding; it would have to be done perfectly the first time.

We wouldn't have a nice relaxed namespace in which we can deprecate the
'ifeq' symbol for a new 'equals' symbol on the day that we figure out
that comparisons should have been case-insensitive. Instead, we might be
left in a corner with yucky workaround alternatives, advising users
that they should use the '===' operator in new projects instead of the
existing but botched and unrecommended '==' operator.


Just to illustrate the simplicity of the nested lists (which can be
expressed with S expressions, but certainly also other formats), I
wrote up the following in really not too much time the other day:

    https://bpaste.net/show/553780fab83d

(also attached as testsexp.py, but pasted in case the list eats my
attachments).

Without much code at all it already supports constructs such as:

  (or option1 option2 ...)
  (and option1 option2 ...)
  (ifeq option1 "pony")
  (ifeq option1 option2)
  (and option1 (ifeq option2 "pony"))
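The constructs above can be evaluated with a small dispatch table. This sketch is not the code in testsexp.py. It assumes expressions are already nested Python lists, with a hypothetical `Sym` type marking bare option names and plain strings marking quoted literals. It also assumes boolean options hold Python booleans.

```python
from typing import Any, Callable, Dict, List, Union


class Sym(str):
    """A bare word such as `option1`; a plain str is a quoted literal."""


Expr = Union[Sym, str, List[Any]]


def evaluate(expr: Expr, options: Dict[str, Any]) -> Any:
    if isinstance(expr, Sym):
        return options[expr]  # bare word: look up the option's value
    if isinstance(expr, str):
        return expr  # quoted string: use it literally
    head, *args = expr
    return OPERATORS[head](args, options)  # KeyError for unknown operators


def _ifeq(args: List[Expr], options: Dict[str, Any]) -> bool:
    left, right = (evaluate(arg, options) for arg in args)  # exactly two args
    return left == right


OPERATORS: Dict[str, Callable[[List[Expr], Dict[str, Any]], Any]] = {
    # all()/any() stop at the first operand that decides the result
    "and": lambda args, opts: all(evaluate(a, opts) for a in args),
    "or": lambda args, opts: any(evaluate(a, opts) for a in args),
    "ifeq": _ifeq,
}

options = {"option1": True, "option2": "pony", "debug": "on", "logging": "off"}
print(evaluate(["and", Sym("option1"), ["ifeq", Sym("option2"), "pony"]], options))  # True
print(evaluate(["ifeq", Sym("option2"), Sym("logging")], options))  # False
```

Adding a case-insensitive 'equals' next to 'ifeq' would be one more entry in `OPERATORS`, and existing projects would be unaffected. That is the "relaxed namespace" property argued for in this message.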


Any format that gives us structured data could be implemented the
same way, and while I wouldn't mind the format being different, I
wouldn't like to end up maintaining or depending on something more
complex just because of some silly stigma attached to parenthesis
nesting.


Cheers,
    -Tristan

One option is to reuse Jinja2 expressions, as they are quite 
Python-like, and are already used in Ansible. The jinja2 library looks 
flexible enough that we could set up an execution environment and just 
evaluate Jinja expressions[2] to produce values, rather than using the 
full templating functionality.

2. http://jinja.pocoo.org/docs/2.9/templates/#expressions

This could get us something like this:

            condition: |
               %{logging} == "off" and %{debug} == "on"
            then:
              ... value ...
            else:
              ... value ...

I would prefer if we could use True and False for booleans rather than 
the strings "off" and "on". That way we could get to this:

            condition: "%{logging} and %{debug}"
            then:
              ... value ...
            else:
              ... value ...
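The infix style could be evaluated by rewriting each %{name} to a safe identifier and then evaluating only a whitelisted subset of an expression grammar. Jinja2's `Environment.compile_expression` is what the Jinja2 suggestion refers to. To stay dependency-free, this sketch swaps in Python's standard `ast` module instead, so the grammar is Python's (which coincides with Jinja's for `and`, `or`, `not`, `==` and string literals). It assumes Python 3.8 or later. The names `eval_condition` and `ALLOWED` are hypothetical.

```python
import ast
import re
from typing import Any, Dict

# %{option-name} references; hyphens allowed, e.g. %{conf-extra}
REF_RE = re.compile(r"%\{([A-Za-z0-9_-]+)\}")

# Only these node types may appear: no calls, attribute access or indexing
ALLOWED = (ast.Expression, ast.BoolOp, ast.And, ast.Or, ast.UnaryOp,
           ast.Not, ast.Compare, ast.Eq, ast.NotEq, ast.Name, ast.Load,
           ast.Constant)


def eval_condition(source: str, options: Dict[str, Any]) -> bool:
    names: Dict[str, str] = {}  # option name -> safe identifier

    def substitute(match: "re.Match") -> str:
        return names.setdefault(match.group(1), f"opt_{len(names)}")

    expr = REF_RE.sub(substitute, source.strip())
    tree = ast.parse(expr, mode="eval")
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED):
            raise ValueError(f"disallowed syntax: {type(node).__name__}")
    env = {ident: options[name] for name, ident in names.items()}
    code = compile(tree, "<condition>", "eval")
    return bool(eval(code, {"__builtins__": {}}, env))


print(eval_condition('%{logging} == "off" and %{debug} == "on"',
                     {"logging": "off", "debug": "on"}))  # True
print(eval_condition("%{logging} and %{debug}",
                     {"logging": True, "debug": False}))  # False
```

The grammar here is fixed by the host language. A word-in-list 'contains' test would therefore have to reuse an existing operator such as `in` (and `ast.In` would need whitelisting) rather than being a new named operator.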



Some rambling to finish... does anyone remember BuilDj [3]? It was a
project to replace Autotools/CMake/etc. with a declarative JSON/YAML
format. It started out promisingly, but faltered before completely figuring
out conditionals. Hopes for replacing Autotools and CMake then
gravitated towards Meson, which is pretty much on track for success at
this point. Rather than using YAML, Meson defines a Python-esque DSL for
build instructions which is deliberately not Turing complete (no loops
or functions) and can be parsed in a few thousand lines of Python.

I used to be disappointed that Meson didn't use a "declarative" approach 
but I actually find it fine to work with now. I like YAML because it's 
always possible to reliably parse it, unlike a Turing-complete 
programming language such as Shell or BitBake. But Meson's language also 
ticks that box. It is also apparently designed to be re-writable so that 
IDEs can make changes to hand-written meson.build files, although I'm 
yet to see how well that works. If it does, I'd be interested in what 
BuildStream would look like if it abandoned YAML for a similar 
Python-like DSL. Of course, this is not at all BuildStream 1.0 
territory, but something to think on :-)



[3] https://wiki.gnome.org/Attic/BuilDj

Sam

Attachment: testsexp.py
Description: Text Data

