Add support for extracting elements of a tuple by their numerical index.
Also formally add to the design the well-established basic syntactic and semantic rules for tuples, which were decided in leads issues but never captured in a proposal.
Currently, the only way to access the elements of a tuple is through pattern matching. While this handles many cases well, it is sometimes desirable to access an element of a tuple more succinctly, especially in cases where only a single element's value is needed.
In Python, tuple indexing is performed using square brackets:
tup = (1, 2, 3)
# Prints 2.
print(tup[1])
In C++, std::pair is indexed using .first and .second, and std::tuple is indexed using std::get<I>.
In Rust and Swift, a tuple is indexed using .N, where N is a decimal integer literal.
- Rust disallows digit separators and base prefixes in N, but allows certain literal suffixes for historical reasons.
- Swift disallows digit separators and base prefixes in N. swiftc allows leading 0 digits, although this appears to be an unintentional consequence of llvm::StringRef::getAsInteger allowing them.
The current Carbon documentation suggests using tuple[i] for tuple indexing, but this has not been the subject of an approved proposal.
Formally, we have not yet approved a proposal that says that Carbon has tuple types, although we have approved several proposals that explicitly include support for tuples. This proposal fills that gap: tuples exist in Carbon, and are product types with unnamed positional elements.
This proposal also updates the design to match other decisions that have been made in leads issues but not captured by a proposal, specifically:
- Leads issue #2191 (one-tuples and one-tuple syntax), despite being focused on one-tuples, established the syntax for tuples of all arities.
- Leads issue #710 established rules for assignment, comparison, and implicit conversion of tuples. These operations are performed elementwise, with relational comparisons being performed lexicographically.
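Python's built-in tuples happen to use the same elementwise, lexicographic model for equality and ordering, so they offer a quick way to check the intended semantics. The sketch below spells the rule out by hand and compares it against Python's operators. The function names are hypothetical, and the requirement that both tuples have the same number of elements is an assumption of this sketch.

```python
def lexicographic_less(a: tuple, b: tuple) -> bool:
    # Sketch assumption: both tuples have the same number of elements.
    if len(a) != len(b):
        raise ValueError("tuples must have the same number of elements")
    for x, y in zip(a, b):
        # The first pair of elements that differ decides the result.
        if x != y:
            return x < y
    # Every element is equal, so a is not less than b.
    return False


def elementwise_equal(a: tuple, b: tuple) -> bool:
    # Equal only if every pair of corresponding elements is equal.
    return len(a) == len(b) and all(x == y for x, y in zip(a, b))


# Python's built-in tuple comparisons agree with the hand-written rules.
for a, b in [((1, 2, 3), (1, 3, 0)), ((2, 0), (1, 9)), ((1, 2), (1, 2))]:
    assert lexicographic_less(a, b) == (a < b)
    assert elementwise_equal(a, b) == (a == b)
```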
Finally, and most importantly, this proposal adds support for indexing tuples using the following syntaxes:
- .N, where N is an integer literal, and
- .(expr), where expr is a template constant of integer type.
For pointers to tuples, ->N and ->(expr) are also supported.
Multi-level tuple indexing will result in constructs such as tuple_of_tuples.1.2. It's important that these are lexed as two tuple indexing operations, not as tuple_of_tuples . 1.2 as they would be under the current lexical rules, so a new rule is introduced:
- When a . or -> token is followed immediately by a digit, it is lexed as a . or -> token followed by an integer literal, never a real literal.
Note that this results in lexing being slightly contextual: the rule to lex a token after a . or -> is different from the rule to lex a token in any other context. However, there is an alternative equivalent formulation of the rule that is not context-sensitive: that .integer is treated as a single lexeme that produces two tokens, and likewise for ->integer.
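To make the rule concrete, here is a minimal Python sketch of a tokenizer that applies it. The token patterns are simplified assumptions, not Carbon's real lexical grammar: there are no exponent signs, strings, or comments. The "immediately followed" condition is checked by comparing the end of the previous token with the current position, so whitespace between the . and the digit disables the rule.

```python
import re

# Simplified token patterns: an assumption for this sketch only.
IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
INTEGER = re.compile(r"[0-9][0-9A-Za-z_]*")
NUMBER = re.compile(r"[0-9][0-9A-Za-z_]*(?:\.[0-9][0-9A-Za-z_]*)?")
PUNCT = re.compile(r"->|[.(),;:=]")


def lex(src: str) -> list[str]:
    tokens: list[str] = []
    pos = 0
    last_end = -1  # Position just past the previous token.
    while pos < len(src):
        if src[pos].isspace():
            pos += 1
            continue
        if src[pos].isdigit():
            # A digit touching a preceding `.` or `->` token starts an
            # integer literal only, so it stops before the next `.`.
            after_access = last_end == pos and tokens[-1] in (".", "->")
            pattern = INTEGER if after_access else NUMBER
        elif IDENT.match(src, pos):
            pattern = IDENT
        else:
            pattern = PUNCT
        m = pattern.match(src, pos)
        if m is None:
            raise ValueError(f"unexpected character {src[pos]!r} at {pos}")
        tokens.append(m.group())
        pos = last_end = m.end()
    return tokens


print(lex("tuple_of_tuples.1.2"))  # ['tuple_of_tuples', '.', '1', '.', '2']
print(lex("t . 0.1"))              # ['t', '.', '0.1']
```

The second call shows that the rule only looks at the character right before the digit: once whitespace intervenes, 0.1 is lexed as an ordinary real literal.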
The elements of a tuple are treated as if they had decimal integers as their names: .0, .1, and so on. It is an error to use a different spelling of that integer in a simple member access, because that spelling would not match the element name. For example, (1, 2).0x0 is invalid, as is large_tuple.1_2.
These spellings can be used as an expression operand as described below: (1, 2).(0x0) and large_tuple.(1_2) are both valid.
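The distinction between an element name and an index value can be modeled in Python, which accepts the same 0x0 and 1_2 spellings for integer literals. In this sketch, element_by_name and element_by_value are hypothetical helpers: the first treats element i as named by the decimal string str(i), and the second uses only the integer value, with the range check this proposal requires.

```python
def element_by_name(tup: tuple, spelling: str):
    # Model of simple member access, tup.N: element i is named str(i),
    # so any other spelling of the same integer names no element.
    names = [str(i) for i in range(len(tup))]
    if spelling not in names:
        raise LookupError(f"tuple has no element named {spelling!r}")
    return tup[names.index(spelling)]


def element_by_value(tup: tuple, index: int):
    # Model of tup.(expr): only the integer value of expr matters,
    # but it must lie in 0 .. len(tup) - 1.
    if not 0 <= index < len(tup):
        raise IndexError(f"tuple index {index} is out of range")
    return tup[index]


pair = (1, 2)
large_tuple = tuple(range(20))

assert element_by_name(pair, "0") == 1
assert element_by_value(pair, 0x0) == 1          # like (1, 2).(0x0)
assert element_by_value(large_tuple, 1_2) == 12  # like large_tuple.(1_2)

for bad in ("0x0", "1_2"):
    try:
        element_by_name(large_tuple, bad)  # like large_tuple.1_2
    except LookupError:
        pass
    else:
        raise AssertionError(f"{bad!r} should not name an element")
```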
The .N syntax has the same precedence as postfix member access syntax, .name, and can be combined with it in the same expression: a.0.x.1 is valid.
The .(expr) syntax is not new in this proposal, and continues to have the same precedence as .name.
In the .(expr) syntax, if the first operand is a tuple and the second operand is a constant of any integer type, the result is the corresponding tuple element, as if specified by a decimal integer literal. This rule is built into the language; the .(...) notation is not currently overloadable.
If the tuple index is not between 0 and one less than the number of elements in the tuple, inclusive, the indexing is invalid.
The current skeleton design suggests using tuple[a .. b] to slice tuples. For example, tuple[0 .. 2] could be used to extract the first two elements of a tuple. Tuple slicing support is not covered by this proposal, but could be added in the future with syntax such as tuple.(0 .. 2). However, note that there is a risk that this syntax may lead to an incorrect theory about how Carbon works: namely, that tuple.__ gives an element whereas tuple.(__) gives a tuple.
Goals:
- Language tools and ecosystem
  - The lexing rule is relatively simple to implement. Tools such as syntax highlighters can treat .i as a distinct kind of token rather than implementing any kind of context-sensitive lexing.
- Software and language evolution
  - Consistent use of tuple field indexes can support code that adds new tuple elements over time.
- Code that is easy to read, understand, and write
  - This feature allows tuple access to be written more concisely than pattern matching would allow.
  - Lexing .1.2 as four tokens rather than two avoids a surprise that would make chained member access hard to write.
  - For simple member access, requiring a decimal integer with no digit separators allows the member access to be treated as an element name, making the indexing easier to understand.
- Interoperability with and migration from existing C++ code
  - This feature provides a migration syntax for existing uses of .first, .second, and std::get<I>. The permission to use expressions rather than only literals supports migration of std::get<expression>.
Principles:
- Low context sensitivity.
  - We look only at the character immediately before a numeric literal to determine whether it is lexed as a tuple index that stops before the next . or as a general numeric literal.
We could lex .0, .1, ... as a single token rather than as separate . and 0, 1, ... tokens. This would somewhat simplify the lexing rules, because they would no longer be contextual. We choose to not do this because:
- This would be inconsistent with our handling of struct.fieldname.
- Either tuple . 0 would be invalid, unlike struct.fieldname, or it would need to use a distinct grammar production from tuple.0.
We could lex an integer literal when the previous token is a . token, regardless of whether the literal follows the . immediately. For example, we could treat
let n: i32 = ((1, 2, 3), 4) . 0.1;
as tuple indexing, rather than as a tuple followed by a . token and a real literal. This is what Swift does. We choose to not do this because:
- The 0.1 literal in this case looks like a real literal, not tuple indexing, so this would likely cause surprise for readers.
- This would make the context-sensitive lexing non-local. The chosen rule can be interpreted as lexing .[0-9]* as a single lexeme, but forming two tokens from it, whereas this alternative rule would be much more firmly a context-sensitive lexing rule.
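The difference between the chosen rule and the Swift-like alternative can be seen with a toy lexer that handles only numbers and punctuation. The swift_like flag and the simplified patterns are assumptions of this sketch; it models the two rules, not either compiler's implementation.

```python
import re

# Simplified patterns: an assumption for this sketch only.
INTEGER = re.compile(r"[0-9][0-9A-Za-z_]*")
REAL_OR_INTEGER = re.compile(r"[0-9][0-9A-Za-z_]*(?:\.[0-9][0-9A-Za-z_]*)?")
PUNCT = re.compile(r"->|[.(),]")


def lex(src: str, swift_like: bool) -> list[str]:
    tokens: list[str] = []
    pos, last_end = 0, -1
    while pos < len(src):
        if src[pos].isspace():
            pos += 1
            continue
        if src[pos].isdigit():
            after_access = bool(tokens) and tokens[-1] in (".", "->")
            # Chosen rule: the digit must touch the `.` or `->` token.
            # Swift-like rule: whitespace in between is allowed.
            if not swift_like:
                after_access = after_access and last_end == pos
            pattern = INTEGER if after_access else REAL_OR_INTEGER
        else:
            pattern = PUNCT
        m = pattern.match(src, pos)
        if m is None:
            raise ValueError(f"unexpected character {src[pos]!r}")
        tokens.append(m.group())
        pos = last_end = m.end()
    return tokens


src = "((1, 2, 3), 4) . 0.1"
print(lex(src, swift_like=False)[-2:])  # ['.', '0.1']
print(lex(src, swift_like=True)[-4:])   # ['.', '0', '.', '1']
```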
We could get a similar result in other ways:
- We could allow a real literal after a . token, and split it into a pair of member accesses when needed. This is what rustc does.
- We could lex a real literal as three tokens: an integer token, a . token, and a suffix token, and merge them back together in the parser. This is what intellij does when parsing Rust.
Note that these approaches are not entirely equivalent to each other. In Rust, for example, the difference is observable in proc macros. Also, using any kind of token merging or splitting approach would result in the token stream not matching the interpretation of the program, which is problematic for tooling. For example, many common Rust syntax highlighters do not properly highlight chained tuple indexing.
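A minimal model of the splitting approach is a post-processing pass over a token list that a conventional lexer has already produced, where .1.2 arrived as a . token followed by the real literal 1.2. The function name is hypothetical and this is only an illustration of the idea, not rustc's actual code.

```python
def split_reals_after_dot(tokens: list[str]) -> list[str]:
    # Sketch assumption: real literals look like "<digits>.<digits>".
    result: list[str] = []
    for tok in tokens:
        is_real = tok[0].isdigit() and "." in tok
        if is_real and result and result[-1] == ".":
            # A real literal right after `.` becomes two member accesses.
            whole, frac = tok.split(".", 1)
            result.extend([whole, ".", frac])
        else:
            result.append(tok)
    return result


lexed = ["tuple_of_tuples", ".", "1.2"]
print(split_reals_after_dot(lexed))
# ['tuple_of_tuples', '.', '1', '.', '2']
```

The input list is what a purely token-based tool, such as a syntax highlighter, would see, while the output reflects how the program is interpreted; that gap is the mismatch described above.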
Carbon follows Rust and Swift in restricting tuple indexes to being decimal integers:
// OK
let a: i32 = (1, 2, 3).0;
// Error, invalid index for tuple element.
let b: i32 = (1, 2, 3).0x0;
This restriction introduces an inconsistency between .0x0 and .(0x0), and we could easily drop it. However, the restriction allows us to consider .0, .1, and so on to simply be the names of the tuple elements, analogous to struct field names, and there isn't a clear utility for permitting a base prefix or a digit separator in a tuple index.
Instead of tuple.0 and tuple.(IndexConstant), we could use tuple[0] and tuple[IndexConstant]. This would result in more consistent syntax for indexing with a constant versus with an expression, but would make accessing an element of a tuple less consistent with accessing an element of a struct. We expect tuple access with a non-literal index to be a rare operation, so the consistency with that syntax seems to have lower value.
Also, the use of . notation aims to better convey the intent of the developer: we intend x[n] notation to be used primarily for homogeneous indexing, whereas . notation is used for heterogeneous access. This also reflects the difference in phase: tuple indexing requires a constant index in the same way that struct member access requires a constant name, whereas array or container indexing would typically be expected to permit a runtime index.
The .N notation could also be extended to perform member indexing into a struct or class, at least the latter of which would not be reasonable to support with [] notation. However, such support is not part of this proposal.
Use of [] notation has the advantage of reducing visual ambiguity for cases such as O.0, l.0, and Z.0, which might be visually confused with 0.0, 1.0, and 2.0, respectively. However, we're not aware of this being a problem in practice in Rust or Swift, which use this notation, and the same problem exists even without the .0 suffix: F(O, l, Z) may resemble F(0, 1, 2).
We could support tuple.-1, or perhaps tuple.(-1), as a notation for "the last element of the tuple", as used for example in Python. We choose not to support this at this time because such notation can be confusing and has awkward edge cases: an off-by-one error, or an attempt to access the element one before the start, will sometimes be accepted and silently do the wrong thing.
If a future proposal introduces tuple slicing, it should revisit this question, because this kind of indexing from the end is often desirable when forming a slice. The possibility of using a different notation for this operation should be considered, such as tuple.(.size - 1).
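Python illustrates the hazard. In this sketch, previous_element is a hypothetical helper with an off-by-one bug at position 0, which Python accepts because index -1 means "the last element". The checked variant applies the range rule adopted in this proposal and rejects the same access.

```python
def previous_element(tup: tuple, i: int):
    # Intended: the element just before position i.
    # Bug: nothing handles i == 0.
    return tup[i - 1]


def previous_element_checked(tup: tuple, i: int):
    # Same intent, but indexes outside 0 .. len(tup) - 1 are rejected,
    # as they are for Carbon tuple indexing.
    j = i - 1
    if not 0 <= j < len(tup):
        raise IndexError(f"tuple index {j} is out of range")
    return tup[j]


t = ("a", "b", "c")
print(previous_element(t, 2))  # b
print(previous_element(t, 0))  # c  (index -1 silently wraps to the end)
```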
Carbon permits optional trailing commas in tuples, with mandatory trailing commas for one-tuples. Alternatives to this choice were considered in leads issue #2191.
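Python uses the same convention for distinguishing a one-tuple from a parenthesized expression, which may help readers coming from Python. This snippet only shows the analogous Python behavior; the exact Carbon rules are those decided in leads issue #2191.

```python
one_tuple = (1,)    # the trailing comma makes this a one-tuple
not_a_tuple = (1)   # just a parenthesized expression
pair = (1, 2,)      # an optional trailing comma

print(type(one_tuple).__name__, len(one_tuple))  # tuple 1
print(type(not_a_tuple).__name__)                # int
print(pair == (1, 2))                            # True
```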