Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Symbolic strings #994

Merged
merged 3 commits into from
Dec 16, 2022
Merged

Symbolic strings #994

merged 3 commits into from
Dec 16, 2022

Conversation

matthew-healy
Copy link
Contributor

This PR introduces a simple form of "symbolic strings" (#948), whereby the syntax s%"..."% is just de-sugared to an array.

@matthew-healy matthew-healy force-pushed the feature/symbolic-strings branch from b4d5879 to eefac48 Compare December 14, 2022 15:42
@github-actions github-actions bot temporarily deployed to pull request December 14, 2022 15:45 Inactive
@matthew-healy matthew-healy marked this pull request as ready for review December 14, 2022 15:57
Copy link
Contributor

@vkleen vkleen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good! I think we should merge this PR and see if the semantics of symbolic strings need chaning after experimenting with them a bit. Similarly, the string delimiters can be bikeshed later.

Copy link
Member

@yannham yannham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly LGTM. I just wonder what we should do about the "multiline" semantics of symbolic strings. The first question is: should we have symbolic strings be multiline by default, or offer a separate version for multiline strings?

Another solution would be to have s"foo" strings as well. I think C++ has those. Then there is a clear delimitation between the multiline axis and the "special" axis.

The second question, which is independent, is what to do with interpolated blocks inside mutliline strings. Currently, we carry around inside string chunks an extra ident arguments to handle those in the natural way (see #994 (comment)). One possibility would be to pass this argument to the parsing function, but it can quickly become cumbersome to pass this information around.

Another solution is just to abandon this semantics for symbolic strings (and do not interpolate other strings as "blocks"). It would be inconsistent though, that the behavior differs between multiline strings and those symbolic string.

@@ -490,26 +495,35 @@ StrChunks: RichTerm = {
.chain(lasts.into_iter())
.collect();

let mut chunks = if start == StringKind::Multiline {
let chunks = if start.needs_strip_indent() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had this in mind but forget to ask. Should we behave like multiline string with respect to indentation, or not? It seems like you answer "no", which is very reasonable. That being said, I wonder if we then would like to combine the semantics of symbolic strings and multiline strings, e.g. in Nix often we currently use multiline strings to write builder commands, where symbolic strings would be used. This may not be trivial with respect to indenting interpolated values, though. I guess we don't have to answer this question right now.

src/parser/utils.rs Show resolved Hide resolved
src/parser/utils.rs Outdated Show resolved Hide resolved
pub fn needs_strip_indent(&self) -> bool {
match self {
StringStartDelimiter::Standard => false,
StringStartDelimiter::Multiline | StringStartDelimiter::Symbolic => true,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, so what I wrote before is wrong, multiline strings and symbolic strings actually behave the same. I'm not sure how this works with interpolated values though. There is some logic at runtime such that:

let bar = "alpha\nbeta" in
m%"
  foo
     %{bar}
  baz
"%

evaluates to

foo
  alpha
  beta
baz

I don't think we can easily keep this logic for symbolic strings, unless we delegate this to custom parsing functions (but that start to be cumbersome).

Matthew Healy added 2 commits December 16, 2022 10:23
Since both multiline and symbolic strings will end with `"%`, it will no
longer be true that the start & end delimiters of a string must match. It
will also no longer be the case that we can infer whether or not
indentation must be stripped from the starting delimiter.

This commit separates the opening and closing delimeter into two
different types, and provides a more explicit API for working with
them, which will be resilient to the addition of a new
`StringStartDelimiter` in a later commit.
This commit implements the symbolic string syntax `s%"..."%`.

These strings using the same parsing rules as multiline strings, but
are then desugared to `Term::Array`s rather than `Term::StrChunks`.
@matthew-healy
Copy link
Contributor Author

@yannham

The first question is: should we have symbolic strings be multiline by default, or offer a separate version for multiline strings?

I think starting off by only supporting multiline symbolic strings makes sense. Especially as it's possible to just write mutliline symbolic strings on a single line.

I can't currently think of a use case for a syntax like s"..." that isn't already supported by the multiline version, but maybe I'm missing something?

The second question, which is independent, is what to do with interpolated blocks inside mutliline strings.

In the current iteration I just ignore the indent parameter. My thinking on this was that symbolic strings will always go through post-processing before they ever end up as regular strings, so making assumptions about how the interpolated values should be indented feels like it might be unnecessary, as Nickel has absolutely no control over the ultimate string value (assuming one is ever created).

To look at your example from the comment above:

let bar = "alpha\nbeta" in
m%"
  foo
     %{bar}
  baz
"%

currently evaluates to

[ "
foo
  ", 
"alpha
beta",
"
baz"
]

This feels like how I'd expect this to behave (though obviously that's completely unscientific 😅) - it seems to make sense to give the exact value that was interpolated here. Though I suppose one place it could matter is if the symbolic string is intended to be evaluated into a language with meaningful whitespace... 🤔

Ultimately, I don't have particularly strong feelings & am happy with whatever you think makes sense. I do think what we have now is probably "good enough" for a first pass - what do you think about opening a separate issue for how to handle multiline interpolated values?

@matthew-healy matthew-healy force-pushed the feature/symbolic-strings branch from eefac48 to e062346 Compare December 16, 2022 10:16
@github-actions github-actions bot temporarily deployed to pull request December 16, 2022 10:19 Inactive
@vkleen
Copy link
Contributor

vkleen commented Dec 16, 2022

I think @matthew-healy's example evaluation is the most reasonable. I wouldn't expect Nickel to process a string that's interpolated into a symbolic string at all. That's also in line with how normal string interpolation in indented multiline strings behaves.

@yannham
Copy link
Member

yannham commented Dec 16, 2022

That's also in line with how normal string interpolation in indented multiline strings behaves.

Precisely, I don't think this is the case:

nickel> let bar = "alpha\nbeta" in
m%"
  foo
     %{bar}
  baz
"%
"foo
   alpha
   beta
baz"

nickel> let bar = "alpha\nbeta" in
s%"
  foo
     %{bar}
  baz
"% |> array.foldl (++) ""
"foo
   alpha
beta
baz"

We carry explicitly the whole indentation of an interpolation (here e.g. 4 spaces for the %{bar} block) and re-apply that at runtime, once we've evalauted bar, to the beginning of each line. We can't do that, at least not trivially, for symbolic string, because the parsing function is the one reconstructing the string (if ever done) at the end. The best we can do is to pass this indentation. Maybe we don't care 🤷‍♂️ but I just wanted to raise the difference in behavior.

Ultimately, I don't have particularly strong feelings & am happy with whatever you think makes sense. I do think what we have now is probably "good enough" for a first pass - what do you think about opening a separate issue for how to handle multiline interpolated values?

Sure, this is the way to go.

Copy link
Member

@yannham yannham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎉

@vkleen
Copy link
Contributor

vkleen commented Dec 16, 2022

@yannham That's interesting! I tested with the following in test.ncl:

let s = "a\nb" in
  m%"foo
  in %{s}
  bar"%

and nickel -f test.ncl results in

"foo
  in a
b
  bar"

I think the indent behaviour only kicks in if the multiline string starts with whitespace?

@matthew-healy
Copy link
Contributor Author

@vkleen I think it kicks in if the line the interpolation is on starts with whitespace, so e.g.

let s = "a\nb" in
m%"hello
    %{s}
test
"%

results in

hello
    a
    b
test

But because your example has the text in on the same line we don't count that as an indent.

@yannham
Copy link
Member

yannham commented Dec 16, 2022

I think the indent behaviour only kicks in if the multiline string starts with whitespace?

Yes 🙂 you can find more context in #197. Maybe this information should be somewhere else, in the documentation?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants