Would like to specify that a list of lists should be flattened #339

Open
acampove opened this issue Oct 5, 2024 · 9 comments

Comments

@acampove

acampove commented Oct 5, 2024

Hi,

I have a file like:

mammals: &mammals
  - lion
  - tiger
  - elephant

reptiles: &reptiles
  - snake
  - lizard
  - crocodile

animals:
  - *mammals
  - *reptiles

In the example above, animals is not a single list but a list of two lists, so we would have to flatten it in code. It would be good if we could somehow specify that these lists have to be flattened, e.g. with - **mammals.
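
For reference, the flattening that currently has to happen in application code might look like the sketch below. It assumes the loader hands back animals as a list of lists (written here as a literal rather than loaded from a file, so no YAML library is needed); flatten_one_level is a hypothetical helper name, not part of any YAML library.

```python
from itertools import chain

# What a standard YAML loader would produce for the file above
# (assumption: written as a literal to avoid a third-party dependency).
data = {
    "mammals": ["lion", "tiger", "elephant"],
    "reptiles": ["snake", "lizard", "crocodile"],
}
# Aliases resolve to the anchored lists, so animals is a list of two lists.
data["animals"] = [data["mammals"], data["reptiles"]]


def flatten_one_level(items):
    """Splice nested lists into the parent list, one level deep."""
    return list(chain.from_iterable(
        item if isinstance(item, list) else [item] for item in items
    ))


animals = flatten_one_level(data["animals"])
print(animals)
```

Only one level is flattened, which matches the request: scalar entries are kept as they are and deeper nesting is left untouched.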

@UnePierre

Another syntax could follow that for appending maps, like:

mammals: &mammals
  - lion
  - tiger
  - elephant

reptiles: &reptiles
  - snake
  - lizard
  - crocodile

animals:
  <<:
    - *mammals
    - *reptiles

@ingydotnet
Member

I'd like to present YAMLScript as a solution for this.
YAMLScript embeds cleanly into YAML and is 100% valid YAML itself.
It can solve your concat need and about anything else for extending YAML.
It currently has YAML loaders in 10 languages: https://yamlscript.org/doc/bindings/

For brevity I'll use the flow form where appropriate. It's the same data and semantics.

mammals: &mammals [lion, tiger, elephant]
reptiles: &reptiles [snake, lizard, crocodile]
animals: [*mammals, *reptiles]

There are many ways to accomplish this in YAMLScript (YS).
Here's one:

# File: 339.yaml
!yamlscript/v0:
mammals: &mammals [lion, tiger, elephant]
reptiles: &reptiles [snake, lizard, crocodile]
animals::
  apply concat::
  - *mammals
  - *reptiles

Then we load it from the command line with:

$ ys -Y 339.yaml
mammals:
- lion
- tiger
- elephant
reptiles:
- snake
- lizard
- crocodile
animals:
- lion
- tiger
- elephant
- snake
- lizard
- crocodile

The !yamlscript/v0: line is needed for YS to eval any code in your file. It starts out in data mode, where plain scalars are strings or numbers. The :: toggles the k/v pair between data mode and code mode. Thus the apply concat key is a code expression, and its value would have been code too, but we toggled back with another ::.

Let's make it a little better.

!yamlscript/v0:
mammals: &mammals [lion, tiger, elephant]
reptiles: &reptiles [snake, lizard, crocodile]
animals: !:concat*
- *mammals
- *reptiles

does the same thing. You can put a function call in a tag. The function is applied to the node. The trailing * splats the sequence so that the ys::std/concat function is called like concat(*mammals *reptiles) and not concat([*mammals *reptiles]). You can use 100s of standard functions that way.
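
The splat distinction has a close analogue in Python's argument unpacking. The sketch below is only an analogy, not YS code: concat here is a hypothetical Python stand-in for the YS function, and node stands in for the tagged sequence.

```python
from itertools import chain


def concat(*seqs):
    """Concatenate any number of sequences into one list."""
    return list(chain(*seqs))


mammals = ["lion", "tiger", "elephant"]
reptiles = ["snake", "lizard", "crocodile"]
node = [mammals, reptiles]  # the sequence the tag is attached to

# With the splat: each inner list is passed as its own argument.
splatted = concat(*node)

# Without the splat: the outer list is the single argument,
# so the result is still a list of two lists.
unsplatted = concat(node)

print(splatted)
print(unsplatted)
```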

Let's look at how this file compiles:

$ ys -Uc 339.yaml
{"mammals" (_& 'mammals ["lion" "tiger" "elephant"]),
 "reptiles" (_& 'reptiles ["snake" "lizard" "crocodile"]),
 "animals" (apply concat [(_** 'mammals) (_** 'reptiles)])}

It's often the case that you have YAML where you want to concat or merge anchored structures like &mammals and &reptiles but you only care about the animals result. Unfortunately mammals and reptiles need to be part of the document to use them, and thus they end up in the result too. YS lets you do this:

--- !yamlscript/v0:
- &mammals [lion, tiger, elephant]
- &reptiles [snake, lizard, crocodile]
--- !yamlscript/v0:
animals: !:concat*
- *mammals
- *reptiles

YS only loads the final doc by default, but can access anchors in other docs. So now you only get animals:

$ ys -Y 339.yaml
animals:
- lion
- tiger
- elephant
- snake
- lizard
- crocodile

You could also use load to have the data in other files:

# 339.yaml
--- !yamlscript/v0:
- &mammals ! load('mammals.yaml')
- &reptiles ! load('reptiles.yaml')
--- !yamlscript/v0:
animals: !:concat*
- *mammals
- *reptiles

# mammals.yaml
[lion, tiger, elephant]

# reptiles.yaml
[snake, lizard, crocodile]

We don't need to use anchors and aliases to name things, we can use variables:

--- !yamlscript/v0
mammals =: load('mammals.yaml')
reptiles =: load('reptiles.yaml')
--- !yamlscript/v0:
animals: !:concat*
- ! mammals
- ! reptiles

Note the first tag changed from !yamlscript/v0: to !yamlscript/v0. That starts the document in code mode.

Data mode allows you to specify variable assignments inline, so we can go back to a single document:

--- !yamlscript/v0:
mammals =: load('mammals.yaml')
reptiles =: load('reptiles.yaml')
animals: !:concat*
- ! mammals
- ! reptiles

There are other ways to call concat here:

--- !yamlscript/v0:
mammals =: load('mammals.yaml')
x =: &R load('reptiles.yaml')
animals:: concat(mammals *R)
animals:: mammals.concat(*R)
animals:: mammals + *R

Just to show how flexible YS is...

Data mode also has an auto-insert feature for sequences:

--- !yamlscript/v0
mammals =: load('mammals.yaml')
x =: &R load('reptiles.yaml')
--- !yamlscript/v0:
animals:
- aardvark
- :: mammals
- mastadon
- :: *R
- zebra

which produces:

animals:
- aardvark
- lion
- tiger
- elephant
- mastadon
- snake
- lizard
- crocodile
- zebra

The cool thing about this is you can put conditional code on these inserts. when rand(100) > 50 evaluates to true half the time:

--- !yamlscript/v0
mammals =: load('mammals.yaml')
x =: &R load('reptiles.yaml')
--- !yamlscript/v0:
animals:
- aardvark
- :when rand(100) > 50: mammals
- mastadon
- :when rand(100) > 50: *R
- zebra

This is probably not useful but proves the point. The conditional insert here is shorthand for:

- aardvark
- ::
    when rand(100) > 50:
      mammals

The code starting with when could be any code at all. If it evaluates to a sequence, the sequence is inserted. If it evaluates to nil, nothing happens.

Let's compile it:

(def mammals (load "mammals.yaml"))
(+++ (def x (_& 'R (load "reptiles.yaml"))))
{"animals"
 (concat
  ["aardvark"]
  (when (> (rand 100) 50) mammals)
  ["mastadon"]
  (when (> (rand 100) 50) (_** 'R))
  ["zebra"])}

All the :: stuff also works in mappings for inline merging, conditional or not. (iow, the YS way to do << stuff in mappings)

So back to your original. You can also do:

--- !yamlscript/v0:
mammals: &mammals
  - lion
  - tiger
  - elephant

reptiles: &reptiles
  - snake
  - lizard
  - crocodile

animals:
  - :: *mammals
  - :: *reptiles

@ingydotnet
Member

See also: https://yamlscript.org/posts/2024-11-29/

@UnePierre

This is so cool! 😎

@ingydotnet
Member

ingydotnet commented Dec 2, 2024

@UnePierre Thanks. I think so too. tbh, I actually did like your idea for extending << for sequences. Well done. Usually these types of suggestions aren't even close to possible. :) That said, I think YS doesn't need it, given the other affordances.
The << (key for mappings) isn't even supported by the current spec, but I get that people like it (thus YS supports it too).

If you are keen, I encourage you to try this stuff out and file any issues you have at https://github.com/yaml/yamlscript/issues.
A lot of the data features are new, and if something seems like it should work but doesn't, it's probably our fault!

Looking forward to what you find...

@acampove
Author

acampove commented Dec 3, 2024

Hello @ingydotnet . Thanks for the work you and others in YS have done. I need to go through it carefully as the need arises. Two questions come to my mind though.

  • We normally see YAML as a human-friendly way of storing data and configs. This data is supposed to be loaded by our actual code. YS is a programming language though? I am thinking that there is no easy way to do what I suggested at the beginning of the thread without making this somewhat of a language. Have you heard of Jinja? What would be the advantage of using YS over Jinja?

  • Being a new language, it has to come with its own syntax, and therefore code editors like neovim need to support it through plugins. Has there been any work in that direction? It would be easier to use YS if the editor highlighted everything correctly. This is one of the main reasons why I do not use Jinja. I see that here there are already highlighting issues:

--- !yamlscript/v0:
mammals =: load('mammals.yaml')
x =: &R load('reptiles.yaml')
animals:: concat(mammals *R)
animals:: mammals.concat(*R)
animals:: mammals + *R

@ingydotnet
Member

@acampove as I said at the start of my long comment, YS can be used as a loader from (currently 10) languages. Adding a language is simple, btw. I don't know what programming language you typically load YAML from but if it is Python there's a full example in https://yamlscript.org/doc/bindings/. You can just replace PyYAML usage in your programs with Python's yamlscript.py modules. Same story for any other language.

YS syntax is 100% YAML. YS is a programming language, but also all your existing YAML files are (almost certainly) valid YS "programs". When you "run" a program you are actually "compiling" it and then "evaluating" the result. Since your existing YAML files don't start with !yamlscript/v0 they can't / won't invoke any functions, but "loading" them still compiles/evaluates them and results in the data structure a YAML loader would produce.

The YS compiler is literally a YAML loader that loads to a YS AST. Consider:

$ ys -e 'each i (1 .. 3): say("$i) Hello")'
1) Hello
2) Hello
3) Hello

you can compile it like:

$ ys -c -e 'each i (1 .. 3): say("$i) Hello")'
(each [i (rng 1 3)] (say (str i ") Hello")))

If you know how a YAML loader works, you know it has the steps parse->compose->resolve->construct.
The -d flag will show that process in the YS compiler:

$ ys -c -d -e 'each i (1 .. 3): say("$i) Hello")'
*** parse     *** 0.130113 ms

({:+ "+MAP", :! "yamlscript/v0/code"}
 {:+ "=VAL", := "each i (1 .. 3)"}
 {:+ "=VAL", := "say(\"$i) Hello\")"}
 {:+ "-MAP"}
 {:+ "-DOC"})

*** compose   *** 0.00416 ms

{:! "yamlscript/v0/code",
 :% [{:= "each i (1 .. 3)"} {:= "say(\"$i) Hello\")"}]}

*** resolve   *** 0.043306 ms

{:xmap [{:expr "each i (1 .. 3)"} {:expr "say(\"$i) Hello\")"}]}

*** build     *** 0.291258 ms

{:xmap
 [[{:Sym each} {:Sym i} {:Lst [{:Sym rng} {:Int 1} {:Int 3}]}]
  {:Lst [{:Sym say} {:Lst [{:Sym str} {:Sym i} {:Str ") Hello"}]}]}]}

*** transform *** 0.043876 ms

{:xmap
 [[{:Sym each} {:Vec [{:Sym i} {:Lst [{:Sym rng} {:Int 1} {:Int 3}]}]}]
  {:Lst [{:Sym say} {:Lst [{:Sym str} {:Sym i} {:Str ") Hello"}]}]}]}

*** construct *** 0.0865 ms

{:Top
 [{:Lst
   [{:Sym each}
    {:Vec [{:Sym i} {:Lst [{:Sym rng} {:Int 1} {:Int 3}]}]}
    {:Lst
     [{:Sym say} {:Lst [{:Sym str} {:Sym i} {:Str ") Hello"}]}]}]}]}

*** print     *** 0.010697 ms

"(each [i (rng 1 3)] (say (str i \") Hello\")))"

(each [i (rng 1 3)] (say (str i ") Hello")))

So you can see it literally is a yaml loader.

A yaml file that is just data is handled the same way:

$ ys -md -le 'foo: [1, 2, 3]'
{"foo":[1,2,3]}
$ ys -md -Uce 'foo: [1, 2, 3]'
{"foo" [1 2 3]}
$ ys -md -d -Uce 'foo: [1, 2, 3]'
*** parse     *** 0.17216 ms

({:+ "+MAP", :! "yamlscript/v0/data"}
 {:+ "=VAL", := "foo"}
 {:+ "+SEQ", :flow true}
 {:+ "=VAL", := "1"}
 {:+ "=VAL", := "2"}
 {:+ "=VAL", := "3"}
 {:+ "-SEQ"}
 {:+ "-MAP"}
 {:+ "-DOC"})

*** compose   *** 0.010318 ms

{:! "yamlscript/v0/data",
 :% [{:= "foo"} {:-- [{:= "1"} {:= "2"} {:= "3"}]}]}

*** resolve   *** 0.056601 ms

{:map [{:str "foo"} {:seq ({:int "1"} {:int "2"} {:int "3"})}]}

*** build     *** 0.024117 ms

{:Map [{:Str "foo"} {:Vec [{:Int 1} {:Int 2} {:Int 3}]}]}

*** transform *** 0.013991 ms

{:Map [{:Str "foo"} {:Vec [{:Int 1} {:Int 2} {:Int 3}]}]}

*** construct *** 0.090168 ms

{:Top [{:Map [{:Str "foo"} {:Vec [{:Int 1} {:Int 2} {:Int 3}]}]}]}

*** print     *** 0.014071 ms

"{\"foo\" [1 2 3]}"

{"foo" [1 2 3]}

Of course I know about Jinja. Various groups have added things on top of YAML, including using Jinja.
These efforts often come with their own sets of problems, only add a limited amount of capability and don't help YAML overall for all use cases.

Jinja is a templating language and YAML doesn't lend itself well to templating. Jinja can do basic looping, conditionals and interpolation, but in a way that is not YAML itself.

YS can do anything a language can do. Its interpolation syntax is great. It is intended to make logic in YAML available to all YAML uses, and it concentrates hard on making the use of logic not clutter your YAML. And the files are always 100% YAML, thus can make use of YAML tooling like yamllint.

If you have a specific Jinja'd YAML file you'd like to see as YS for comparison, I'd be happy to show you.

FWIW, Kubernetes' Helm uses a templating language that is not Jinja but looks close.
I have a conversion comparison here: https://yamlscript.org/doc/helmys/
It does exactly the same thing before and after and I think the YS there speaks for itself.


Syntax highlighting is not yet done for YS, but it's an easy solve. We just need to implement an LSP server for it and that's not particularly hard. If you or anyone reading this has time and is interested, I'd be glad to assist you in building one. Otherwise I expect it to happen sometime in 2025.

@acampove
Author

@ingydotnet Hi, I tried to use this project, but:

  • I do not see a short and simple thing I can follow, I see this, which has mostly stuff I do not care about.
  • I think I tried to install this a while ago, but this seems to rely on C++ code, and I vaguely remember I was trying to build some library, but it took too long to figure out how to do it, so I gave up and moved on.

From the user's point of view, the user will want something like:

pip install yamlscript
import ys

dictionary=ys.load_file('file.yaml')

and other small and simple examples. There are already other libraries out there to deal with YAML that most users know how to use. If this tool is not easy to use, people will not use it and you will have wasted years working on a tool that no one cares about.

@ingydotnet
Member

@acampove,

In a Python 3 venv like this:

$ python -m venv venv
$ source venv/bin/activate

Try:

$ pip install yamlscript
$ curl -s https://yamlscript.org/install | LIB=1 bash
$ python -c 'from yamlscript import YAMLScript as ys; print(ys().load("foo: bar"))'
{'foo': 'bar'}
