The Syntax Bikeshedding Dojo, round 10: Symbolic strings #1039

Closed · yannham opened this issue Jan 11, 2023 · 16 comments
Labels: area: syntax, question (Further information is requested)

Comments

@yannham (Member) commented Jan 11, 2023

#994 introduced symbolic strings, as described in #948. The current syntax is temporary, using an s prefix, which is not very specific (it could mean special, string, symbolic, etc.). Let's try to find a better surface syntax.

The team has proposed supporting a different prefix depending on the use case. For example, nix%"{inputs.coreutils}/bin/bash"% or tf%"{resource-foo.id}"%. Two mostly orthogonal questions follow:

Prefix syntax

  • The examples above use an arbitrary prefix combined with the current syntax for multiline strings. This is one possibility: we would have to leave out reserved prefixes, like m for multiline strings; all the others would be interpreted as symbolic string prefixes.
  • One issue with the previous proposal is that it's not obvious at first glance whether a string has special compiler support (multiline strings) or is a symbolic string handled by a library (Nix strings). This also makes backward compatibility harder: two sensible future additions to strings are raw strings (probably written r%".."%) and language snippets (like the language option for a Markdown code block: the semantics would be the same as a normal string, but editors could highlight the snippet differently; probably written cpp%"..."% or cpp-lang%"..."%). Such additions could conflict with custom prefixes. The solution is to use a more restricted scheme for symbolic strings, such as requiring them to end with -s: nix-s%".."%, tf-s%".."%, or any scheme which tells them apart from other special strings.

Prefix semantics

The other question is what to do with this prefix.

  • The easiest approach (from the point of view of the implementers of Nickel) is to just pass this information down to the parsing function and let it decide what to do with it. That is, nix-s%"foo"% would be parsed e.g. as {prefix = `nix, chunks = ["foo"]} instead of ["foo"] as currently. This solution is cheap and lets library authors take care of checking that the prefix matches, or failing otherwise. We could provide combinators to check the prefix and otherwise raise blame with a standardized error. The library implementer could also decide to totally ignore the prefix, which is not great.
  • Another proposed solution was to desugar symbolic strings to a function application depending on the prefix: nix-s%"foo"% would be desugared to nix s%"foo"% (or whatever expression we want involving nix). This is how Scala does custom string interpolation (see the Advanced usage section). The user then has to make sure that the relevant function is in scope, and this approach might require specialized error reporting to reach a reasonable user experience. To be contrasted with the previous approach coupled with a contract, where said contract is responsible for selecting the parsing function and performing the checks, while the user writing symbolic strings doesn't have to know or care.
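The two proposed semantics can be contrasted with a small model. This is an illustrative Python sketch, not Nickel internals; the function names and the `scope` dictionary are invented for the illustration.

```python
# Approach 1: the parser just tags the chunks with the prefix and hands
# the whole record to whatever library function consumes it later.
def desugar_tagged(prefix, chunks):
    return {"prefix": prefix, "chunks": chunks}

# Approach 2: the prefix selects a function that must be in scope at the
# use site; the string desugars directly to a call to that function.
def desugar_to_application(prefix, chunks, scope):
    if prefix not in scope:
        raise NameError(f"no symbolic-string handler named '{prefix}' in scope")
    return scope[prefix](chunks)

chunks = ["foo ", {"var": "bar"}, " baz"]
tagged = desugar_tagged("nix", chunks)
assert tagged["prefix"] == "nix"

# With approach 2 the library author defines the handler, and the user
# must have it in scope under the right name:
scope = {"nix": lambda cs: ("nix-string", cs)}
assert desugar_to_application("nix", chunks, scope)[0] == "nix-string"
```

In this model, approach 1 defers all checking to the consumer of the tagged record, while approach 2 fails eagerly when no handler is in scope, which is exactly the trade-off discussed above.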
@yannham added the question (Further information is requested) and area: syntax labels on Jan 11, 2023
@matthew-healy (Contributor) commented Jan 12, 2023

Re-stating what I've said when we've discussed this in meetings: I'm very much in favor of finding some way to allow end-users to think about "a Nix string" as a distinct thing from "a Terraform string" or "a Nickel string" when writing configurations. I think the main components for doing that are:

  • Decorating the symbolic string syntax with the "type" of string (e.g. nix%".."%).
  • Raising sensible errors when users misuse symbolic strings. Ideally that would mean: (1) trying to use a regular string where a symbolic string is expected raises an error which suggests using the correct prefix, and (2) using the wrong "type" of symbolic string raises an error suggesting using a symbolic string of the correct "type".

The two error-handling cases there are likely a lot easier said than done.

I think trying to just desugar into function application, while nice & simple from an implementation perspective, probably puts too much burden on the user in terms of making sure the correct "type" of string is in scope.

The "tagged" approach, plus a stdlib contract (e.g. SymbolicString "nix"), could definitely work. I wonder if we could somehow avoid the risk of users not using this contract by changing how we interpret symbolic strings. Right now they're essentially directly desugared to an array of terms, but we could instead "seal" this array inside a primop which can only be unsealed as part of a contract. For example:

// my-config.ncl
some_nix_fn nix%"..."%,

// nix-lib.ncl
some_nix_fn | (contract.SymbolicString "nix") -> SomeT = fun s => ...,

// contract.ncl
SymbolicString | Str -> Dyn = fun s =>
  fun v l => %unseal_symbolic_str% v s l,

I'm just sketching this out quickly, so maybe there are reasons this wouldn't work that I'm not seeing, but it seems like this would allow us to raise pretty detailed errors in all of the situations mentioned above, thereby making the feature a lot simpler to use, at the relatively minimal cost of a pair of new seal/unseal primitives.
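The seal/unseal idea sketched above can be modeled concretely. This is a rough Python analogue, assuming opaque sealing; the class and function names are invented here and Nickel's actual primops would differ.

```python
# The desugared chunk array is wrapped in an opaque value that only a
# contract knowing the right prefix can open again.
class _Sealed:
    def __init__(self, prefix, chunks):
        self._prefix = prefix
        self._chunks = chunks

def seal_symbolic_str(prefix, chunks):
    return _Sealed(prefix, chunks)

def unseal_symbolic_str(value, expected_prefix):
    # Case (1): a plain value was used where a symbolic string is expected.
    if not isinstance(value, _Sealed):
        raise TypeError("expected a symbolic string, got a plain value; "
                        f"did you mean {expected_prefix}%\"...\"%?")
    # Case (2): the wrong "type" of symbolic string was used.
    if value._prefix != expected_prefix:
        raise TypeError(f"expected a '{expected_prefix}' symbolic string, "
                        f"got a '{value._prefix}' one")
    return value._chunks

s = seal_symbolic_str("nix", ["foo"])
assert unseal_symbolic_str(s, "nix") == ["foo"]
```

Both error cases listed earlier in the comment fall out naturally: unsealing fails with a targeted message whether the user passed a plain value or a symbolic string with the wrong prefix.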


One issue with the previous proposal is that it's not obvious at first glance whether a string has special compiler support (multiline strings) or is a symbolic string handled by a library (Nix strings). This also makes backward compatibility harder: two sensible future additions to strings are raw strings (probably written r%".."%) and language snippets (like the language option for a Markdown code block: the semantics would be the same as a normal string, but editors could highlight the snippet differently; probably written cpp%"..."% or cpp-lang%"..."%). Such additions could conflict with custom prefixes. The solution is to use a more restricted scheme for symbolic strings, such as requiring them to end with -s: nix-s%".."%, tf-s%".."%, or any scheme which tells them apart from other special strings.

An alternative solution is to take the approach that Rust took with keywords and reserve a bunch of string prefixes in advance that we think we could want to use in the future. We could reserve all single-character prefixes, as well as .+-lang (or syntax-.+, which I think I might prefer) - basically anything we might want to use. If we decide that we don't need them then we can simply allow them, which is backwards compatible.

@bew commented Jan 12, 2023

I don't have a full understanding of what you're talking about, but I just want to make sure: will we be able to mix Nix-like interpolations and normal interpolations?

Like we can do in Nix:

let simple_var = "something"; in
"${pkgs.some-drv}/bin/${simple_var}"

Here pkgs.some-drv would need special handling, but something doesn't. So can we still mix the two in the same value?

@matthew-healy (Contributor)

@bew Yes, you'll be able to mix the two. As of right now, when you write:

s%"%{pkgs.some_drv}/bin/%{something}"%

we essentially desugar the whole expression to:

[pkgs.some_drv, "/bin/", something]

In the current WIP iteration of nix interop, we then check that each element of that array is either a derivation or a string, and handle them accordingly.
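The check described here can be sketched as follows. This is an illustrative Python model, not the actual nickel-nix code; the Derivation class and the rendering rule are assumptions made for the sketch.

```python
# Each chunk of the desugared array must be either a plain string or a
# derivation; derivations are rendered as their store path, strings as-is.
class Derivation:
    def __init__(self, out_path):
        self.out_path = out_path

def render_nix_string(chunks):
    parts = []
    for chunk in chunks:
        if isinstance(chunk, str):
            parts.append(chunk)
        elif isinstance(chunk, Derivation):
            parts.append(chunk.out_path)
        else:
            raise TypeError("chunk must be a string or a derivation, "
                            f"got {type(chunk).__name__}")
    return "".join(parts)

some_drv = Derivation("/nix/store/abc-coreutils")
assert render_nix_string([some_drv, "/bin/", "bash"]) \
    == "/nix/store/abc-coreutils/bin/bash"
```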

@bew commented Jan 12, 2023

Thanks 👍

s%"%{pkgs.some_drv}/bin/%{something}"%

(hot reaction, might be helpful for the bikeshedding)
That's a lot of percent signs!
You really need to have good syntax highlighting to make sense of / visually parse this relatively simple string.
👉 Are you considering using less 'symbols' to make it easier to read when no syntax highlighting is used?

@aspiwack (Member)

Two comments:

  • It seems to me that the function semantics lends itself quite a bit better to contracts (in fact, the function could apply a contract itself, which could be tremendously useful).
  • If we are to use a multiple-prefix syntax for strings, I wonder if nix+s isn't better than nix-s.

@yannham (Member, Author) commented Jan 13, 2023

It seems to me that the function semantics lends itself quite a bit better to contracts (in fact, the function could apply a contract itself, which could be tremendously useful).

My pet peeve with the function semantics is that it puts the burden of making the corresponding function available in scope on the end-user. Because we don't have any namespacing or a global overloading resolution mechanism, this is very sensitive to variable names and shadowing. For example:

# somepackage.ncl
{
  name = "somepackage",
  build_command = nix%"
    %{someotherpkg}/bin/somebinary --bar --baz
  "%,
}

Now, this means that a nix variable must be in scope at line 4. If you're a random DevOps user writing a new package, this is a non-trivial task. Let's say they read the docs of nickel-nix, find what they need to import in the end, and fix it:

# somepackage.ncl
let {nix, ..} = import "nickel-nix/nixstring.ncl" in

{
  name = "somepackage",
  build_command = nix%"
    %{someotherpkg}/bin/somebinary --bar --baz
  "%,
}

Now, if for some reason we need to add a nix field to the record, we're doomed, because it will be picked over the outer nix import. Same thing if we have a local binding shadowing nix. In the case of the recursive record, we can't even do the trick of let nix_ = ..., because the prefix of the string imposes that we have a nix variable in scope and that we control its content. What's more, because the identifier is defined but is just of the wrong type, I fear the error messages won't be great. Especially if such a nix turns out to be a function as well.

I personally find the opposite approach easier in Nickel: use a contract which is responsible for applying the parsing function. Then you can design your library in such a way that the user writing the config doesn't have anything to put in scope:

# part of nickel-nix
Package = {
  name | Str,
  build_command | NixStr,
  ..
}

The glue code will import somepackage.ncl and apply the NixStr contract, which is just an idempotent version of the nix parsing function plus custom error messages. (Note also that the error reporting is probably better: if you raise an error from within a nix function application, you have a label relating neither to the call site of nix nor to build_command, while in the second case the error message does show the definition of build_command in the Package contract and the actual value of build_command.)

This is in fact how it is done today in the nickel-nix prototype. In the end, I think the function semantics can be nice, but I'm afraid it won't work very well given the very simple scoping model of Nickel.

@yannham (Member, Author) commented Jan 13, 2023

(hot reaction, might be helpful for the bikeshedding)
That's a lot of percent signs!
You really need to have good syntax highlighting to make sense of / visually parse this relatively simple string.
point_right Are you considering using less 'symbols' to make it easier to read when no syntax highlighting is used?

One possibility is to have two versions of special strings: a single-line one and a multiline one. The multiline version, besides handling indentation and so on, is also useful because the escaping mechanism is different. So I tend to think we should have it anyway.

But for short strings, we could use the cpp-style prefix"..." syntax:

nix"%{pkg}/bin/bar --foo --bar"

@aspiwack (Member)

My pet peeve with the function semantics is that it puts the burden of making the corresponding function available in scope on the end-user.

This is a good point. Yet it's not necessarily completely true: when using, say, the nix-nickel thing, I assume that the nickel file(s?) will be evaluated in a context that contains more than the standard library. Or maybe not, but then there would be a standard import at the top of the file. Either way, the extra context may very well include the nix-string-evaluation function.

Now, if some reason we need to add a nix field to the record, we're doomed, because it will be picked over the outer nix import.

A solution to this problem could be to give a very intentional name to string evaluators. For example, nix%"…" would use a function named string-evaluator-nix. This is unlikely to clash with any user-defined name which isn't about symbolic strings to begin with.

I personally find the opposite approach easier in Nickel: use a contract which is responsible for applying the parsing function.

This is clever. However

  • It feels a bit yucky because you have a contract that is not a sub-identity.
  • It seems to me that both of your objections above apply equally to this approach.

I'm not saying that it's not the way to go (it's clearly more economical in terms of language design), but I'm not sold.

@yannham (Member, Author) commented Jan 13, 2023

It feels a bit yucky because you have a contract that is not a sub-identity.

Yes, that's a general philosophical question about the extent of processing that contracts should ideally be doing. I tend to think that "normalization" is fine (e.g. you have a union contract but your output value is normalized to always have one canonical form corresponding to one of the branches; think Date = IsoStr \/ StructuredDate, where the contract automatically converts from IsoStr to StructuredDate). In terms of concrete properties, I would say that idempotent functions are OK. But this is another debate, and I'm not claiming to know the right answer.
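The normalizing, idempotent contract idea can be made concrete with a small sketch. This is an illustrative Python model of the Date example; the field names and parsing rule are assumptions for the sketch, not Nickel semantics.

```python
# A Date "contract" that accepts either an ISO string or an already
# structured date, and always returns the structured form. Because the
# structured form passes through unchanged, applying it twice is a no-op.
def date_contract(value):
    if isinstance(value, dict) and {"year", "month", "day"} <= value.keys():
        return value  # already normalized: idempotent
    if isinstance(value, str):
        year, month, day = value.split("-")
        return {"year": int(year), "month": int(month), "day": int(day)}
    raise TypeError("Date: expected an ISO string or a structured date")

once = date_contract("2023-01-11")
twice = date_contract(once)
assert once == twice == {"year": 2023, "month": 1, "day": 11}
```

Idempotence is what makes this kind of contract tame: consumers always see the canonical form, no matter how many times the contract is applied along the way.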

It seems to me that both of your objections above apply equally to this approach.

The clash part doesn't, because there isn't any constraint on the name of an identifier holding a contract. If your Foo contract clashes with something else, you can always do let Foo_ = Foo in ... Foo_. The precise name of a contract is unimportant and can be substituted for something else without changing the semantics, which is not the case with the implicit application approach. It's true that an extended name for parsing functions reduces the risk.

The contract also doesn't have to be in scope for the end user, as the library code consuming the data may apply it. If you split your package in 10 Nickel files, each with symbolic strings, the function approach mandates that each one must import the relevant parsing function. With a contract, they might be pure Nickel without any import, if desired.

Additionally, we'll have to have a contract on the field anyway, because the user may write build_command = 5 without going through symbolic strings at all. So the function approach may apply a contract (has to, I guess), but that doesn't free us from attaching a NixStr contract to build_command anyway.

I'm not sold either, but I feel the contract approach only leverages known and usual concepts of Nickel (besides the funny string syntax for symbolic strings), while the implicit function approach requires familiarity with an additional idea (making sure the corresponding parsing function is in scope).

@aspiwack (Member)

Yes, that's a general philosophical question about the extent of processing that contracts should ideally be doing. I tend to think that "normalization" is fine (e.g. you have a union contract but your output value is normalized to always have one canonical form corresponding to one of the branches; think Date = IsoStr \/ StructuredDate, where the contract automatically converts from IsoStr to StructuredDate). In terms of concrete properties, I would say that idempotent functions are OK. But this is another debate, and I'm not claiming to know the right answer.

It makes sense indeed. It's in the same spirit as validation functions producing structured data in Haskell as well.

The contract also doesn't have to be in scope for the end user, as the library code consuming the data may apply it. If you split your package in 10 Nickel files, each with symbolic strings, the function approach mandates that each one must import the relevant parsing function. With a contract, they might be pure Nickel without any import, if desired.

I must be missing something here, because I don't see how the two options differ with respect to this point.

I'm not sold either, but I feel the contract approach only leverages known and usual concepts of Nickel (beside the funny string syntax for symbolic strings), while the implicit function requires to be familiar with an additional idea (making sure the corresponding parsing function is in scope).

I see where you're coming from now. It is indeed less material, which is valuable.

@yannham (Member, Author) commented Jan 17, 2023

I must be missing something here, because I don't see how the two options differ with respect to this point.

Typically, the current nickel-nix prototype generates glue code to evaluate the main Nickel file, something like:

nickelWithImports = builtins.toFile "eval.ncl" ''
  let params = {
    inputs = import "${exportedJSON}",
    system = "${system}",
    nix = import "${./.}/nix.ncl",
  } in
  let nickel_expr | params.nix.NickelPackage = import "${nickelFile}" in
  nickel_expr.output params
'';

In particular, the NickelPackage contract is applied by the Nixel library itself. Technically, you could write your Nickel package definition without any contract, in several files, use symbolic strings in all of them, and compose them without issues (symbolic strings can be composed as they are; they are just trees). The parsing function is called eventually when eval.ncl, the entry point generated by the nickel-nix library, is evaluated, applying a NixStr contract to the fields that accept such strings, which recurses inside the symbolic string AST.

It's not entirely true in the case of nickel-nix, because we also use contracts to define the type of package we want to build, so there is usually a contract application in the user package definition. It's also probably better to do so for the LSP.

Still, the point is that some "root" contract (whatever that means) being eventually applied is sufficient to propagate the required XxxxStr contracts inside an arbitrarily complex record that can be defined piecewise in several files.

In a non-Nix-related example, you could have something like:

# root.ncl
let {SpecialStr, ..} = import "some_nickel_framework/string.ncl" in

{
  config
    | doc "main config"
    | Schema
    =
      import "piece1.ncl"
      & import "piece2.ncl",

  Schema = {
    foo | SpecialStr,
    bar | SpecialStr,
  }
}

# piece1.ncl
let piece3 = import "piece3.ncl" in
{
  foo = sometool%"
    %{piece3.sub_foo} adding my stuff!
  "%,
}

# piece2.ncl
let piece4 = import "piece4.ncl" in
{
  bar = sometool%"
    %{piece4.sub_bar} adding my stuff!
  "%,
}

# piece3.ncl
{ sub_foo = sometool%"some symbolic string"% }

# piece4.ncl
{ sub_bar = sometool%"some symbolic string"% }

Now, with the parsing function approach, you have to make sure the parsing function is in scope at each usage site, that is, in the pieceX.ncl files:

# root.ncl
let {SpecialStr, ..} = import "some_nickel_framework/string.ncl" in

{
  config
    | doc "main config"
    | Schema
    =
      import "piece1.ncl"
      & import "piece2.ncl",

  Schema = {
    foo | SpecialStr,
    bar | SpecialStr,
  }
}

# piece1.ncl
let piece3 = import "piece3.ncl" in
let {sometool_parsing_function, ..} = import "some_nickel_framework/string.ncl" in
{
  foo = sometool%"
    %{piece3.sub_foo} adding my stuff!
  "%,
}

# piece2.ncl
let piece4 = import "piece4.ncl" in
let {sometool_parsing_function, ..} = import "some_nickel_framework/string.ncl" in

{
  bar = sometool%"
    %{piece4.sub_bar} adding my stuff!
  "%,
}

# piece3.ncl
let {sometool_parsing_function, ..} = import "some_nickel_framework/string.ncl" in

{ sub_foo = sometool%"some symbolic string"% }

# piece4.ncl
let {sometool_parsing_function, ..} = import "some_nickel_framework/string.ncl" in

{ sub_bar = sometool%"some symbolic string"% }

It would work nicely in a language with a mechanism to implicitly resolve the definition of sometool_parsing_function, though.
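An implicit resolution mechanism of this kind can be sketched abstractly. This is an illustrative Python model combining the idea above with the intentionally-named evaluators proposed earlier in the thread; the name-mangling convention and the ambient environment are invented for the sketch.

```python
# The prefix is mangled into a conventional name and looked up in an
# ambient environment injected around every file, so ordinary user
# bindings like `nix` or `sometool` can't shadow the handler.
AMBIENT = {}

def register_evaluator(prefix, fn):
    AMBIENT["string_evaluator_" + prefix] = fn

def eval_symbolic_string(prefix, chunks):
    handler = AMBIENT.get("string_evaluator_" + prefix)
    if handler is None:
        raise NameError(f"no evaluator registered for '{prefix}' symbolic strings")
    return handler(chunks)

# The framework registers the handler once; use sites need no import.
register_evaluator("sometool", lambda cs: "".join(map(str, cs)))
assert eval_symbolic_string("sometool", ["a", 1, "b"]) == "a1b"
```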

@aspiwack (Member)

I understand your point now, thanks.

I imagine, though, that if you use contracts to enforce the nixiness of strings, you would not call the string literal nix%…%, would you?

It would work nicely in a language with a mechanism to resolve implicitly the definition of sometool_parsing_function, though.

We could imagine a mechanism to evaluate a nickel file in a context. We already do this somehow: the standard library is injected around every file. We could generalise this mechanism. Maybe with command-line arguments in the invocation of nickel.

That being said, it's not clear to me how we can pass this context to the LSP.

@yannham (Member, Author) commented Jan 19, 2023

I imagine, though, that if you use contracts to enforce the nixiness of strings, you would not call the string literal nix%…%, would you?

One of the proposals was to do this nonetheless, but just pass the prefix as data in the resulting value, something like:

nix%" foo %{bar} baz "%
# desugars to
{tag = `SymbolicString, prefix = `nix, fragments = ["foo ", bar, " baz"]}

The parsing function from Nixel is then responsible for checking the prefix and acting accordingly (probably blaming with a nice error message if the prefix isn't supported). I think what @matthew-healy dislikes about this solution is that the parsing function can just ignore the prefix, while the implicit function approach somehow forces you to choose one or a finite number of prefixes.

We could imagine a mechanism to evaluate a nickel file in a context. We already do this somehow: the standard library is injected around every file. We could generalise this mechanism. Maybe with command-line arguments in the invocation of nickel.

That being said, it's not clear to me how we can pass this context to the LSP.

Yes, in this case this would work pretty well. But as you mention, the mechanism must be standard and understood by the LSP (which probably requires a notion of "workspace", "module", "project" or whatnot).

@matthew-healy (Contributor)

Yes, that's my main problem with just passing a record containing the prefix to the library function, but I think it might be possible to work around that using %seal_symbolic_string%/%unseal_symbolic_string% primitives like I mentioned above, which would check that the prefix expected by the library matches the one provided by the user.

@yannham (Member, Author) commented Jan 20, 2023

I think there is a path which is forward-compatible (at least for the consumers of the library; the library authors of e.g. Nixel may have to adapt, but that's less of a problem):

  • Start with the custom prefix approach, where the prefix is just added to the desugaring as additional data. Libraries should check it, but hey, nothing prevents them from not doing so.
  • Add sealing as proposed by @matthew-healy. One issue is that you may want to support multiple prefixes (think of a Nickel-Nix-Terraform library with nix, tf, and tf-nix strings; or maybe we should just always use nix-tf?). But that's not hard to support.
  • Move to the implicit function approach, if we want to, once we have a proper implicit function resolution mechanism or a way to inject an ambient "prelude" library.

This shouldn't make much difference on the user side, while imposing increasing constraints on the libraries.

@yannham (Member, Author) commented Feb 9, 2023

The latter approach has been approved. As this issue is mostly a bikeshedding issue, and not a tracking one, I'm closing it.

@yannham closed this as completed on Feb 9, 2023