clarify string descriptions #875

Open
wants to merge 3 commits into
base: main
reword four ways of strings
bendyarm committed Feb 14, 2022
commit fa22931f421b05837f12e67189401279ac6dc866
8 changes: 4 additions & 4 deletions toml.md
@@ -259,10 +259,10 @@ String
------

There are four ways to express strings: basic, multi-line basic, literal, and
-multi-line literal. All strings must be encoded as valid UTF-8, and can contain
-any codepoint except control characters other than tab (U+0000 to U+0008, U+000A
-to U+001F, U+007F). Multi-line strings can also contain newlines (U+000A) and
-carriage returns (U+000D).
+multi-line literal. Strings can contain any valid Unicode codepoint except the
+following control characters: U+0000 to U+0008, U+000A to U+001F, and
+U+007F. Note that tab (U+0009) is allowed. Multi-line strings can also contain
+newlines (U+000A) and carriage returns (U+000D).

@abravalheri abravalheri Feb 24, 2022

I think that first saying U+000A and U+000D are not allowed [1] and then adding an exception for multi-line strings is kind of a double negative (an exception to the previous exception)...

I would recommend restricting the code point ranges/enumeration to the ones that are allowed in all types of strings.

Then I would add a second (separate) statement specifically saying that "basic" and "literal" strings (single-line) don't allow newlines/carriage returns.

For example, something like:

Strings can contain any valid Unicode codepoint except the following control characters:
U+0000 to U+0008, U+000B, U+000C, U+000E to U+001F, and U+007F.
Note that tab (U+0009) is allowed.
Newlines (U+000A) and carriage returns (U+000D) are allowed in multi-line strings
but forbidden in basic and literal strings.
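
To make the proposed rule concrete, here is a minimal Python sketch of it. It assumes the last range is meant to read U+000E to U+001F (so that it matches the ranges in the diff once U+000A and U+000D are taken out), and the helper name `disallowed_codepoints` is hypothetical, not part of any TOML library. It only checks raw control characters in a string body, not escape sequences or UTF-8 validity.

```python
# Control characters forbidden in every string type under the wording above:
# U+0000-U+0008, U+000B, U+000C, U+000E-U+001F (assumed range), and U+007F.
FORBIDDEN_EVERYWHERE = (
    set(range(0x00, 0x09)) | {0x0B, 0x0C} | set(range(0x0E, 0x20)) | {0x7F}
)
# Newline and carriage return: allowed only in multi-line strings.
LINE_ENDINGS = {0x0A, 0x0D}


def disallowed_codepoints(text: str, multiline: bool) -> list:
    """Return the code points in `text` that the rule rejects (hypothetical helper)."""
    if multiline:
        banned = FORBIDDEN_EVERYWHERE
    else:
        banned = FORBIDDEN_EVERYWHERE | LINE_ENDINGS
    return [ord(ch) for ch in text if ord(ch) in banned]


print(disallowed_codepoints("a\tb", multiline=False))    # [] (tab is allowed)
print(disallowed_codepoints("a\nb", multiline=False))    # [10]
print(disallowed_codepoints("a\r\nb", multiline=True))   # []
```

Splitting the check into one always-forbidden set plus a separate line-ending set mirrors the two-statement structure suggested here, which avoids the exception-to-an-exception reading.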

Footnotes

  1. U+000A and U+000D are elements of the previously mentioned character ranges/enumeration


**Basic strings** are surrounded by quotation marks (`"`). Backslash and
quotation mark may only occur if they are part of a valid escape sequence.