New keywords #27
Can't find the description on how Reason wants to approach backward-incompatible syntax changes, but in short: it will expect an attribute with the syntax version at the top of the file, parse files without this attribute at some fixed "current" version, and, when a version is specified, parse the source according to that version. Also, maybe OCaml can do something similar with the help of |
I had a look at how Reason handles new keywords, which comes up e.g. when converting OCaml code to Reason syntax, as in OCaml Since reasonml/reason#1539, it's done by appending underscores: the OCaml code |
@stedolan Minor point: I don't think we need multiple parsers... The keyword table of lexer.mll can be dynamically populated based on the configuration. That's how Merlin has dealt with a few variants of OCaml language with a single grammar (including Meta OCaml and a few camlp4 extensions). |
I also discussed in the past of something like the syntax proposed here with @jordwalke for Reason. It would allow both syntaxes to reserve only the needed keywords while still making everything accessible. |
I believe that this is a problem that we need to address, and I like the two parts of @stedolan's proposal. Count me in favor of accepting this RFC. |
... but there has to be some bike-shedding. Is there a proposal for a "raw identifiers" syntax that uses delimiters, and not just a prefix marker? This could be a nice way to interoperate with other dialects or languages and reuse their identifiers, even if they use slightly different lexical conventions. (Maybe |
What's the problem with |
It's unlikely to be used in the wild, but currently |
Right, I had missed that. Thanks. |
Three thoughts:
|
On that note, a quick scan of opam packages didn't show any use of that "feature"; only string quoting in strings/comments. |
Overall, the proposal sounds very reasonable to me. Regarding |
That's not enough:
|
Unless I misunderstood François's aim, actually … disregard my comment. (PS: I like the general proposal) |
The example you point out is interesting, and its meaning would change if To clarify my proposal, I suggest that |
I second François with the idea that |
What about |
My impression from this collective bikeshedding session is that the proposed |
Except for this syntactic discussion, the most substantial change proposal for the RFC is @dra27's proposal to also have sets-of-keywords defined by their OCaml language version. (With a tasteful choice of semantics, we could eg. have |
I don't have a strong opinion, but it feels a bit ad-hoc to version keywords while we don't have a good story about versioning other aspects of the language/toolchain (warnings, stdlib, CLI, ...). But again, not a strong opinion either way. |
Meh, the argument of not doing something reasonable on A because we are also bad at B and C is not so convincing. |
A tiny argument to reduce the feeling of ad hoc-ness: versioning the CLI and the stdlib involves supporting multiple things, whereas noting the version in which a keyword was added is a single "fact" (i.e. it's fixed in the code). Something similar has just been done - or is at least proposed - for warnings for the documentation. For warnings, arguably we wouldn't ever want "Warnings of 5.0" - that can silence real warnings in old code (that's just why we don't make new warnings fatal) whereas the same wouldn't be true of old code with new keywords. |
I agree. I withdraw my reservation. |
My impression based on this discussion is that @stedolan's proposal is roughly consensual. I will keep oh-so-subtly pushing things towards a clear decision, in the hope of motivating @stedolan (with possibly some help on the parsing part?) to propose an implementation. (I don't think we should necessarily wait for a "formal approval" to start implementing things.) |
I do agree on the basic idea (i.e. allow to change the set of keywords for backward compatibility, and offer raw identifiers to allow interaction between pieces of code using different sets of keywords). Also, I don't like very much One could even add an extra backquote after to use as an infix: e1 ``plus` e2. The introduction of raw identifiers also suggests that we could at last allow keywords as labels (who hasn't wished to use ~to or ~end as labels), but this is another subject. |
In the past I have mentioned the concept of editions/epochs from Rust/C++, but since nobody has picked up on the suggestion so far I would like to mention it again. Knowing you @stedolan, you are probably well aware of this prior art, and you have probably thought about it while writing this proposal (I think that you are alluding to it). But still I wonder whether some of the ongoing discussion is not reinventing the wheel. The documents that proposed epochs for Rust and C++ are good reads, they go through the motivations in depth and are written by people aware of the ramifications of BC issues. (see Rust, C++) Rust editions were designed to let the language evolve while solving both problems of backwards-compatibility and avoiding ecosystem splits. They address the fragmentation issue by ensuring that everyone evolves towards the same set of "options" (echoing @garrigue's legitimate worries). They can also be useful as a common versioning scheme for some other aspects of the toolchain later on (cf. @nojb's natural concern). Crucially, there is a distinction between compiler versions and editions. Another important aspect which deviates from the current discussion is that editions are opt-in: you never have to "update" your build system configuration to tell a newer compiler that you want an earlier edition (this would be seen as a breakage). In the case of Rust, together with the use of automated conversion tools, this opens the door to even more ambitious language changes (I mention it because such automated conversion has already been evoked for some proposed changes to OCaml, though I have not heard from it since). Overall, I recognize in this proposal what could be building blocks for such a notion of epochs/editions (and maybe you already see it this way), especially:
As a disclaimer, I am familiar with backwards-compatibility implications, but less so with the relevant parts of the compiler, so I do not have a good opinion about implementation strategies (hence my questions). |
This PR has in effect been merged, as we merged ocaml/ocaml#12323, but only partially: we implemented the "raw token" syntax but not the shady compiler options. I am going to "close" rather than "merge" this, but I am not sure what to do. |
I find myself using this feature in situations that are not related to forward-compatibility or new keywords, but when I do want to define a variant name that is also a keyword. For example I'm defining a module |
[RFC text copied below]
New keywords for OCaml
New language features require new syntax, and often the best new syntax involves one or more new keywords. However, adding keywords to a language brings backwards-compatibility concerns, as programs that use the keyword as an identifier stop working.
The aim of this RFC is to make it easier to add new keywords to OCaml. Specifically, two small changes are proposed to the lexical syntax:
a new class of "optional keywords", initially empty, which can be disabled by a command-line option or lexer directive. (If disabled, the keywords remain usable as identifiers.)
a new syntax of "raw identifiers", which allow words to be used as identifiers regardless of whether or not they are keywords.
With these changes, new keywords can be added as optional keywords, disabled by default. Old code works as normal, new code can opt in to the keyword, and identifiers colliding with the keyword can still be used as raw identifiers. Eventually, the new keyword can be enabled by default, but old code continues to work with an explicit compiler flag.
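The interaction between optional keywords and raw identifiers can be illustrated with a toy classifier. This is only a sketch of the lexical rules described above, not the compiler's lexer: the names `token`, `core_keywords`, `optional_keywords` and `classify` are hypothetical, and `effect` and `macro` merely stand in for future optional keywords (the proposed initial set is empty).

```ocaml
(* Toy model of the proposed lexical rules; not the real OCaml lexer. *)
type token =
  | Keyword of string
  | Ident of string

(* A few existing keywords, which are always reserved. *)
let core_keywords = [ "let"; "in"; "match"; "with"; "type" ]

(* Hypothetical optional keywords; the RFC's initial set is empty. *)
let optional_keywords = [ "effect"; "macro" ]

(* [classify ~enabled word] decides how [word] is lexed when the optional
   keywords listed in [enabled] are switched on. *)
let classify ~enabled word =
  let raw_prefix = "\\#" in
  let n = String.length raw_prefix in
  if String.length word > n && String.sub word 0 n = raw_prefix then
    (* Raw identifier: always an identifier, even for a keyword. *)
    Ident (String.sub word n (String.length word - n))
  else if List.mem word core_keywords then Keyword word
  else if List.mem word optional_keywords && List.mem word enabled then
    Keyword word
  else Ident word

let () =
  (* Old code: the optional keyword is disabled, so [effect] is a name. *)
  assert (classify ~enabled:[] "effect" = Ident "effect");
  (* New code opts in, and [effect] becomes a keyword. *)
  assert (classify ~enabled:[ "effect" ] "effect" = Keyword "effect");
  (* New code can still reach the old identifier through the raw form. *)
  assert (classify ~enabled:[ "effect" ] "\\#effect" = Ident "effect")
```

The three assertions correspond to the three situations in the paragraph above: old code compiled as before, new code that opts in, and new code that refers to a colliding identifier.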
Goals and proposal
The main goal is to allow OCaml to be extended with new keywords in a backwards-compatible way. Specifically:
Old code should still be usable, even if it uses identifiers that now collide with keywords.
New and old code should mix, even if the new code needs to refer to an old identifier which is now a keyword.
The proposal is in two parts:
Add compiler options -use-keyword foo, -no-keyword foo and lexer directives #use_keyword foo, #no_keyword foo to enable and disable an optional keyword foo.
If foo is not recognised as an optional keyword, then -use-keyword foo and #use_keyword foo are errors, while -no-keyword foo and #no_keyword foo are silently ignored. (Silently ignoring these allows compatibility to be maintained before and after an optional keyword is introduced.)
Add a new syntax \#foo called a raw identifier. This syntax is equivalent to a plain identifier foo, except that \#foo is always an identifier even if foo is a keyword.
This syntax can be used anywhere that foo can be used: ~\#foo is a labelled argument, `\#Foo is a polymorphic variant, '\#foo is a type variable, \#Foo is a module or constructor name, etc.
(This feature is present in C# (@foo), Swift (`foo`) and Rust (r#foo). None of these syntaxes can be reused directly without conflicts, and the proposal here is closest to Rust's.)
Part 1 of the proposal ensures that old code continues to work, although once a new keyword is enabled by default old code will need to use the -no-keyword foo flag to compile unmodified. Part 2 ensures that new code can refer to identifiers exported by old code, even if they are now keywords.
Alternative approaches
There are many ways to extend a language. Here are some possibilities, which seem less preferable to the proposal above:
Just add keywords
The traditional approach is to add keywords and accept some amount of breakage. The last time this occurred (adding nonrec in 4.02) it generated a long discussion on whether it is acceptable to add a new keyword, even one that will break no known code. For keyword proposals that are known to break code (e.g. effect, macro, implicit, unboxed), this seems unworkable.
Symbols instead of keywords
Instead of adding keywords, it is possible to introduce new syntax using entirely non-alphabetic characters. However, it's hard to read and look up unfamiliar symbolic syntax, and the space of remaining options is small as the OCaml lexical syntax is quite crowded. (For an example of this, see RFC #10, which further overloads the # symbol in types, trying to keep this separate from the various current meanings of #.)
Attributes instead of keywords
The [@attribute] syntax can be used to add arbitrary annotations to the parse tree, and has already been used for several new language features, including immediate types, unboxed types and explicit tail calls.
It has a couple of disadvantages: first, the syntax is noisier than and inconsistent with existing keyword-based syntax. For example, a single-field record declaration may be annotated as any of mutable, private or unboxed. Two of these are keywords, while one is spelled [@@unboxed], where the distinction is based mostly on the date that the feature was introduced.
Secondly, since attributes are valid anywhere, subtle bugs are possible if they end up on the wrong parsetree node. For instance, type t [@@@immediate] is silently accepted and declares a non-immediate type: the extra @ in @@@immediate makes it a standalone annotation disconnected from type t, so that it gets parsed and ignored.
Contextual keywords
Some languages (notably, C#) allow words to be used as identifiers yet be recognised as keywords in certain contexts, which provides a high degree of backwards compatibility at the expense of more complex parsing.
However, there are two reasons why this approach is less effective in OCaml: first, OCaml distinguishes fewer contexts. In particular, the C# trick of making a word be a keyword in statement but not in expression context is not useful in a language that does not distinguish statements and expressions. Second, OCaml accepts a sequence of arbitrary space-separated identifiers as a function application, so it is harder to find a construction that does not already mean something.
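The second point can be seen in a small example. Here `macro` is used purely as an illustration of a word that might one day be proposed as a keyword; today it is an ordinary identifier, so a phrase that starts with it already parses as a function application.

```ocaml
(* [macro] is an ordinary identifier in this sketch, not a keyword. *)
let macro f x = f (f x)

let double n = n * 2

(* Three space-separated words: already a valid function application. *)
let result = macro double 3

let () = assert (result = 12)
```

Any contextual-keyword rule that wanted `macro name ...` to introduce new syntax would have to change the meaning of code like this.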
Overloaded keywords
It is tempting to reuse an existing keyword, by giving it a new meaning in a context in which it cannot currently be used. While it does preserve compatibility, this is mostly a bad idea: for instance, see the various confusing meanings of static in C. In particular, this sort of keyword reuse removes the ability to easily look up the meaning of some syntax, which is one of the main reasons to use keywords in the first place.
Multiple parsers
Finally, we could ship multiple versions of the parser that accepted different editions of OCaml's syntax. This does have certain advantages, but has an unusually high maintenance cost, and seems undesirable on that basis alone.
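By comparison, the option handling in part 1 of the proposal needs only one lexer whose set of enabled keywords is adjusted before parsing. The following is a minimal sketch of that behaviour under stated assumptions: `known_optional`, `flag`, `apply` and `apply_all` are hypothetical names, `effect` and `macro` stand in for future optional keywords, and the error message is invented for illustration.

```ocaml
module StringSet = Set.Make (String)

(* Hypothetical optional keywords known to this compiler version. *)
let known_optional = StringSet.of_list [ "effect"; "macro" ]

(* Models -use-keyword foo and -no-keyword foo. *)
type flag =
  | Use_keyword of string
  | No_keyword of string

(* Apply one flag to the set of currently enabled optional keywords. *)
let apply enabled = function
  | Use_keyword k when StringSet.mem k known_optional ->
      Ok (StringSet.add k enabled)
  | Use_keyword k ->
      (* Enabling a word that is not an optional keyword is an error. *)
      Error (Printf.sprintf "unknown optional keyword: %s" k)
  | No_keyword k ->
      (* Disabling an unknown word is silently ignored, so the same flag
         works before and after the keyword is introduced. *)
      Ok (StringSet.remove k enabled)

(* Apply flags left to right, stopping at the first error. *)
let apply_all flags =
  List.fold_left
    (fun acc flag -> Result.bind acc (fun enabled -> apply enabled flag))
    (Ok StringSet.empty) flags

let () =
  (match apply_all [ Use_keyword "effect"; No_keyword "implicit" ] with
   | Ok set -> assert (StringSet.elements set = [ "effect" ])
   | Error _ -> assert false);
  assert (Result.is_error (apply_all [ Use_keyword "implicit" ]))
```

The asymmetry between the two flags is the point of the design: a build script can pass the disabling flag unconditionally, while a typo in the enabling flag is reported.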