Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

A new YAML based format for "conda-build" files #54

Merged
merged 13 commits into from
Oct 30, 2023
286 changes: 286 additions & 0 deletions cep-20.1.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,286 @@
# A new recipe format – part 1

<table>
<tr><td> Title </td><td> A new recipe format </td>
<tr><td> Status </td><td> Proposed</td></tr>
<tr><td> Author(s) </td><td> Wolf Vollprecht &lt;[email protected]&gt;</td></tr>
<tr><td> Created </td><td> May 23, 2023</td></tr>
<tr><td> Updated </td><td> May 23, 2023</td></tr>
<tr><td> Discussion </td><td> </td></tr>
<tr><td> Implementation </td><td>https://github.com/prefix-dev/rattler-build</td></tr>
</table>

## Abstract

We propose a new recipe format that is heavily inspired by conda-build. The main
change is a pure YAML format without arbitrary Jinja or comments with semantic
meaning.

## Motivation

The conda-build format has grown over the years to become quite complex.
Unfortunately it has never been formally "specified" and it has grown some
features over time that make it hard to parse as straightforward YAML.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think one really important motivating factor that isn't mentioned in the motivation section is increasing the ease by which we can parse and modify these files (making them more machine readable friendly). I know that there is currently a desire to be able to do this, but the old format holds us back. It's very hard to parse and multiple attempts have already been made to try to do this if I'm not mistaken.

I know it's mentioned below, but it might be worth emphasizing here briefly too.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1000

The CEP attempts to introduce a subset of the conda build format that allows for
fast parsing and building of recipes.

### History

A discussion was started on what a new recipe spec could or should look like.
The fragments of this discussion can be found here:
https://github.com/mamba-org/conda-specs/blob/master/proposed_specs/recipe.md
The reason for a new spec are:

- Make it easier to parse ("pure yaml"). conda-build uses a mix of comments and
jinja to achieve a great deal of flexibility, but it's hard to parse the
recipe with a computer
- iron out some inconsistencies around multiple outputs (build vs. build/script
and more)
- remove any need for recursive parsing & solving
wolfv marked this conversation as resolved.
Show resolved Hide resolved
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The recursive parsing and solving was mainly to support things like inheriting specs, like run_constrained, from explicitly specified dependencies. Is there any kind of explicit spec for whether/how packages add/impose dependencies on sections outside of the one where they are specified?

What about the pin_compatible jinja? I'm getting the sense that it will be cut in this spec. I'm not sure whether you could impose limits on it to avoid the need to be recursive. It is very useful, so I hope it can stay somehow.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In boa (and rattler-build) we have "deferred" evaluation for the pins, so that still works. We just require that the name of the pin is available, in order to build the proper topologically sorted build-graph :)

A dependency list in rattler-build contains an "enum" of type:

https://github.com/prefix-dev/rattler-build/blob/665c981a392a6ebeba5842eee1d968b8753b2dcd/src/render/dependency_list.rs#L22-L27

Indeed, pin_subpackage is not yet implemented but quite trivial to achieve.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sometimes the package name it isn't known. For example cross-compilation usage or compilers themselves

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jakirkham in these cases you can infer those variables from the build-time config though (or variant config) so no recursion needed :)


## Major differences with conda-build

- no full Jinja2 support: no conditional or `{% set ...` support, only string
interpolation. Variables can be set in the toplevel "context" which is valid
YAML
wolfv marked this conversation as resolved.
Show resolved Hide resolved
- Jinja string interpolation needs to be quoted at the beginning of a string,
e.g. `- "{{ version }}"` in order for it to be valid YAML
- Selectors use a YAML dictionary style (vs. comments in conda-build). E.g. `-
sel(osx): somepkg` instead of `- somepkg # [osx]`

## Selectors

Selectors in the new spec take the following format:

`sel(unix): selected_value`
Copy link

@skupr-anaconda skupr-anaconda Jun 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about a simplified variant:
(unix): selected_value
(win and arm64)
((osx or linux) and aarch64)
(something == "test")

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this is also valid YAML:

Screenshot 2023-06-07 at 14 49 00

However, it would effectively special case all dictionary keys that start with a round bracket. I think that would be OK, but less explicit than the sel(...) operator.

Copy link
Member

@jakirkham jakirkham Jun 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not to bikeshed, but what about something like is(...) so is(unix)?


This is a valid YAML dictionary. Selector contents are simple boolean
expressions and follow Python syntax. The following selectors are all valid:

```
win and arm64
(osx or linux) and aarch64
something == "test"
```

### The cmp function for variant selection

Furthermore, we have a special "cmp" function that can be used to run a check
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concerned that there might be more needs for "special" functions like this and that it could end up recreating Jinja2, bascially

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we have "access" to jinja2 (e.g. to lowercase a string, extract a character from a string, join things etc.). This is about extending it with +1 function specific to conda matchspec comparison and variant-conditional builds (e.g. add this dependency only for python 3.8. I think it warrants a special function. I might however remove it from the proposal for now as it seems to be a point of contention ... and then we introduce a new proposal for this functioanlity.

against a selected variant version. The `cmp` function looks like the following:

```
cmp(python, "3.6")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we have to use cmp here? We're evaluating the boolean expressions and it might be easier to understand with this:

Suggested change
cmp(python, "3.6")
if: python >= "3.6"

Where it would then look like:

 requirements:
   build:
     - name: dataclasses
       if: python >=3.6 and python <3.10
     - cryptography

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that operator overloading might be harder to implement vs. the cmp function. With the cmp function we'd know that the left hand side is the variant specifier and the right hand side is a version number. We could do python >= Version("3.6") basically, but that is even trickier.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or maybe if we know that the left hand side is a VariantSpecifier we could overload the comparison with strings ... but yeah, would need to figure out if that's easy to implement with the Jinja expression system(s).

Copy link

@remkade remkade Jun 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally I also find the if: style quite a bit more intuitive. Its similar to how ansible does its conditionals which should help new people transition easier.

You could also do something like unless: python > 3.10 as well, which makes negated conditionals a bit cleaner.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 on if as it is consistent with CI providers as well

Copy link
Member

@jakirkham jakirkham Jun 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One caveat with cmp specifically is some languages use that to mean cmp(x, 3) to mean:

  • cmp(x, 3) > 0 # x > 3
  • cmp(x, 3) == 0 # x == 3
  • cmp(x, 3) < 0 # x < 3

Not to bikeshed, but maybe the critiques in the comments above (and the caveat around cmp) can be resolved with a different name? Some options:

  • within(...)
  • contains(...)
  • includes(...)

Same syntax, but maybe this already implies the conditional case evaluating to true? We could also use a different tense (contains -> contained) if preferable

cmp(python, ">=3.6")
cmp(python, ">=3.8,<3.10")
etc
```

This can be used in a selector like so:

```
requirements:
build:
- sel(cmp(python, ">=3.6,<3.10")): dataclasses
```

This functionality generalizes and replaces the previous special variables such
as `py2k`, `py3k`, `py36`, `py37`, and works just as well for NumPy, Ruby, R, or
any other variant that might be of interest in the future.
wolfv marked this conversation as resolved.
Show resolved Hide resolved

### Preprocessing selectors

You can add selectors to any item, and the selector is evaluated in a
preprocessing stage. If a selector evaluates to `true`, the item is flattened
into the parent element. If a selector evaluates to `false`, the item is
removed.

```yaml
source:
- sel(not win):
url: http://path/to/unix/source
- sel(win):
url: http://path/to/windows/source
```

Because the selector is a valid Jinja expression, complicated logic is possible:

```yaml
source:
- sel(win):
url: http://path/to/windows/source
- sel(unix and cmp(python, "2")):
url: http://path/to/python2/unix/source
- sel(unix and cmp(python, "3")):
url: http://path/to/python3/unix/source
```

Lists are automatically "merged" upwards, so it is possible to group multiple
items under a single selector:
wolfv marked this conversation as resolved.
Show resolved Hide resolved

```yaml
test:
commands:
- sel(unix):
- test -d ${PREFIX}/include/xtensor
- test -f ${PREFIX}/include/xtensor/xarray.hpp
- test -f ${PREFIX}/lib/cmake/xtensor/xtensorConfig.cmake
- test -f ${PREFIX}/lib/cmake/xtensor/xtensorConfigVersion.cmake
- sel(win):
- if not exist %LIBRARY_PREFIX%\include\xtensor\xarray.hpp (exit 1)
- if not exist %LIBRARY_PREFIX%\lib\cmake\xtensor\xtensorConfig.cmake (exit 1)
- if not exist %LIBRARY_PREFIX%\lib\cmake\xtensor\xtensorConfigVersion.cmake (exit 1)

# On unix this is rendered to:
test:
commands:
- test -d ${PREFIX}/include/xtensor
- test -f ${PREFIX}/include/xtensor/xarray.hpp
- test -f ${PREFIX}/lib/cmake/xtensor/xtensorConfig.cmake
- test -f ${PREFIX}/lib/cmake/xtensor/xtensorConfigVersion.cmake
```

## Templating with Jinja

The spec supports simple Jinja templating in the `recipe.yaml` file.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be worth elaborating what "simple" means in this context, what wouldn't be covered by this?

Would this also work with MiniJinja for example?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Simple means:

  • Nothing that could go between {% ..., means no for loops, no set, no if / else etc.
  • Everything that can go between {{ ... }} is still supported, including the pipe operator, so that {{ version | lowercase }} and things work. They can be assigned to helper variables inside the context, e.g.
context:
  version: "0.1.2.POST1234"
  lower_version: "{{ version | lower }}"  # evaluates to 0.1.2.post1234
  • Inside the selectors, the Jinja expression language is used which only knows a few primitive types (str, int, float, ...).

This works fantastically with MiniJinja (which is what we use in rattler-build). However, MiniJinja would support much more than what we are currently supporting.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way to preserve the meta.yaml filename and cause conda-build or rattler-build to detect the format from its contents?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly since we are doing a bit of a redesign here, think it is worth asking whether we still want Jinja to play a role here. In other words, if we were not already using Jinja today, how would we have solved these use cases? How do others approach YAML templating? Could we think of pure YAML solutions?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, what does jinja accomplish that we can't do with the other new features?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well other new features would also have to be implemented and I am not sure we would necessary end up with something simpler.
I think there is a future where the entire build tool looks a bit different (and the recipe as well) and wolfi-os / melange have a nice approach: https://github.com/wolfi-dev/os/blob/main/libidn2.yaml

But I am not sure how to proceed here - we could either go down that road right now or build something that is more reasonable for people to port existing recipes to and then iterate in a next step on the recipe format (when we have the necessary infrastructure nicely implemented in rattler-build).


You can set up Jinja variables in the context YAML section:

```yaml
context:
name: "test"
version: "5.1.2"
major_version: "{{ version.split('.')[0] }}"
```

Later in your `recipe.yaml` you can use these values in string interpolation
with Jinja. For example:

```yaml
source:
url: https://github.com/mamba-org/{{ name }}/v{{ version }}.tar.gz
```

Jinja has built-in support for some common string manipulations.

In the new spec, complex Jinja is completely disallowed as we try to produce
YAML that is valid at all times. So you should not use any `{% if ... %}` or
similar Jinja constructs that produce invalid yaml. Furthermore, quotes need to
be applied when starting a value with double-curly brackets like so:

```yaml
package:
name: {{ name }} # WRONG: invalid yaml
name: "{{ name }}" # correct
```

Some Jinja functions remain valid, but this is out of the scope of this spec.
However, as an example, the `compiler` Jinja function will still be available,
with the main difference being the quoting of the brackets.

```yaml
requirements:
build:
- "{{ compiler('cxx') }}"
```

## Shortcomings

Since we deliberately limit the amount of "Jinja" that is allowed in recipes
there will be several shortcomings.

For example, using a `{% for ... %}` loop is prohibited. After searching through
the conda-forge recipes with the Github search, we found for loops mostly used
in tests.

In our view, for loops are a nice helper, but not necessary for many tasks: the
same functionality could be achieved in a testing script, for example. At the

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I agree that it's generally possible to move stuff to test scripts, this is IMO very disadvantageous:

  • far away from meta.yaml (if they use the "magic" test script names, they're not even mentioned and then often not up-to-date because people forget to look at them)
  • duplication between platforms (aside from the pain of writing loops in batch)

For example, it would be a real pain to rewrite the following

        {% for each_lib in core_libs + core_cpp_libs + vendored_libs %}
        # presence of shared libs
        - test -f $PREFIX/lib/lib{{ each_lib }}.so              # [linux]
        - test -f $PREFIX/lib/lib{{ each_lib }}.dylib           # [osx]
        - if not exist %LIBRARY_BIN%\{{ each_lib }}.dll exit 1  # [win]
        - if not exist %LIBRARY_LIB%\{{ each_lib }}.lib exit 1  # [win]

        # absence of static libs (unix)
        - test ! -f $PREFIX/lib/lib{{ each_lib }}.a             # [unix]
        {% endfor %}

        # binaries
        {% for each_lang in binaries_plugin_langs %}
        - test -f $PREFIX/bin/grpc_{{ each_lang }}_plugin                    # [unix]
        - if not exist %LIBRARY_BIN%\grpc_{{ each_lang }}_plugin.exe exit 1  # [win]
        {% endfor %}

        # pkg-config (no metadata for vendored libs)
        {% for each_lib in core_libs %}
        - pkg-config --print-errors --exact-version "{{ core_version }}" {{ each_lib }}
        {% endfor %}
        {% for each_lib in core_cpp_libs %}
        - pkg-config --print-errors --exact-version "{{ version }}" {{ each_lib }}
        {% endfor %}

to the point that those tests would just not be written (or rather: maintained) as well as now (I moved them out of the heavily outdated test scripts for exactly that reason).

Now, it's possible that this is still a worthwhile trade-off compared to what "pure YAML" gets us, but it will reduce the quality of our testing (at least, as someone who's added a lot of test coverage to feedstocks I got involved with, I believe it will strongly impact my output that front).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So there are two options:

  • We can allow Jinja (plus the values from context) inside the script text blocks. I wouldn't mind that too much since it doesn't affect the recipe execution at all.
  • You can rewrite your tests to take advantage of the new test helpers in boa. In that case it would look something like this:
      exists:
        lib:
          - gpr
          - grpc
          - grpc_unsecure
          - grpc++
          - grpc++_unsecure
          - address_sorting
          - upb
        glob:
          # absence of static libs (unix)
          - ~ **/*.a
        bin:
          - grpc_cpp_plugin
          - grpc_csharp_plugin
          - grpc_node_plugin
          - grpc_objective_c_plugin
          - grpc_php_plugin
          - grpc_python_plugin
          - grpc_ruby_plugin
        pkg_config: 
          - grpc
          - grpc++
          - grpc_unsecure
          - grpc++_unsecure
      commands:
        # final CMake test to compile examples
        - ./test_grpc.sh   # [unix]
        - ./test_grpc.bat  # [win]

Note that this is implemented in a prototype-y fashion in boa and not yet implemented in rattler, and we'd spec this out in another CEP. Of course, there is duplication of the core libs. We could think of ways of being able to refer to lists that are defined elsewhere to make that less repetitive but IMO that can come later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could also hypothetically allow lists to be defined in the context, and then use some other syntax to extend other lists. For example:

context:
  core_libraries: ["grpc", "a", "b"...]

tests:
  exists:
    lib:
      - @ core_libraries

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To clarify, the comment I left in the other thread was meant for both conversations - sorry if that wasn't clear.

I think it'll be manageable in the way that you propose. Being able to define lists would be nice, but isn't critical IMO.

same time we also plan to formalize a more powerful testing harness ([prototyped
in
boa](https://github.com/mamba-org/boa/blob/main/docs/source/recipe_spec.md#check-for-file-existence-in-the-final-package)).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks very nice, but it's critically missing a way to check for the absence of files (which is crucial for many recipes, especially multi-output ones).

It would actually be a nice improvement over what conda currently does to check for absence of files in the output under testing, rather than just looking at the path in the prefix. That's because a given path might be satisfied -- intentionally -- by another output; for example, it's currently not possible to check1 that in a lib + bin + bindings setup, the binary output (that's depending on the lib) does not accidentally package any library artefacts.

Footnotes

  1. without an extra output

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just came up with a syntax to test the absence (negation) of the statement by using the ~ prefix. This shoudl be fairly straightforward to add. And as you suggested, these functions would operate on the paths.json of the package (and not the environment).

But on the other hand this is a bit off topic since this is not the right CEP for that :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw. conda doesn't do anything -- what people do is write ugly bash scripts that need special knowledge to handle Windows vs. Unix ... which is why testing is generally pretty bad on conda-forge. Another thing that we want to fix :) (but off-topic).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about the gitignore syntax (e.g. ! for negation)? Devs are already familiar with that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think ! has a special meaning in YAML so you'd have to quote it ... :) YAML is a complex spec

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy if there's a way to test absence of files, syntax is secondary to me. That said:

YAML is a complex spec

I'm not deeply enough in this whole discussion, but I'd be thrilled to move to TOML as well. One thing that scares me about the sel / if discussion is that YAML breaks horrifically if you get the indentation levels wrong within a list.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since many folks are already coming from Python, think significant whitespace is already an accepted thing

Not to say moving to a simpler markup language might be beneficial for us, but that seems like a separate discussion altogether. Right now Wolf is trying to just move use from 2 meta languages and 1 markup language to just 1 markup language, which would be a huge improvement over the status quo. Maybe a future discussion could be moving from 1 markup language to another (once we are fully in 1 :)


This could be used instead of a for loop to check the existence of shared
libraries or header files cross-platform (instead of relying on Jinja templates
as done
[here](https://github.com/conda-forge/opencv-feedstock/blob/2fc7848655ca65419050336fe38fcfd87bec0649/recipe/meta.yaml#L131)
or
[here](https://github.com/conda-forge/boost-cpp-feedstock/blob/699cfb6ec3488da8586833b1500b69133f052b6f/recipe/meta.yaml#L53)).

Other uses of `for` loops should be relatively easy to refactor, such as
[here](https://github.com/conda-forge/libiio-feedstock/blob/1351e5846b753e4ee85624acf3a14aee4bcf321d/recipe/meta.yaml#L45-L51).

However, since the new recipe format is "pure YAML" it is very easy to create
and pre-process these files using a script, or even generating them with Python
or any other scripting language. That means, many of the features that are
currently done with Jinja could be done with a simple pre-processing step in the
future.

## Examples

### xtensor

Original recipe [found
here](https://github.com/conda-forge/xtensor-feedstock/blob/feaa4fd8015f96038168a9d67d69eaf06a36d63f/recipe/meta.yaml).

```yaml
context:
name: xtensor
version: 0.24.6
sha256: f87259b51aabafdd1183947747edfff4cff75d55375334f2e81cee6dc68ef655

package:
name: "{{ name|lower }}"
version: "{{ version }}"

source:
fn: "{{ name }}-{{ version }}.tar.gz"
url: https://github.com/xtensor-stack/xtensor/archive/{{ version }}.tar.gz
sha256: "{{ sha256 }}"

build:
number: 0
sel(win and vc<14):
skip: true

requirements:
build:
- "{{ compiler('cxx') }}"
- cmake
- sel(unix): make
host:
- xtl >=0.7,<0.8
run:
- xtl >=0.7,<0.8
run_constrained:
- xsimd >=8.0.3,<10

test:
commands:
sel(unix):
- test -d ${PREFIX}/include/xtensor
- test -f ${PREFIX}/include/xtensor/xarray.hpp
- test -f ${PREFIX}/share/cmake/xtensor/xtensorConfig.cmake
- test -f ${PREFIX}/share/cmake/xtensor/xtensorConfigVersion.cmake
sel(win):
- if not exist %LIBRARY_PREFIX%\include\xtensor\xarray.hpp (exit 1)
- if not exist %LIBRARY_PREFIX%\share\cmake\xtensor\xtensorConfig.cmake (exit 1)
- if not exist %LIBRARY_PREFIX%\share\cmake\xtensor\xtensorConfigVersion.cmake (exit 1)

about:
home: https://github.com/xtensor-stack/xtensor
license: BSD-3-Clause
license_family: BSD
license_file: LICENSE
summary: The C++ tensor algebra library
description: Multi dimensional arrays with broadcasting and lazy computing
doc_url: https://xtensor.readthedocs.io
dev_url: https://github.com/xtensor-stack/xtensor
jaimergp marked this conversation as resolved.
Show resolved Hide resolved

extra:
recipe-maintainers:
- SylvainCorlay
- JohanMabille
- wolfv
- davidbrochart
```