From 7236940a2f5e716b06ba10636f5de8415057b8bd Mon Sep 17 00:00:00 2001
From: Tom Davies
Date: Mon, 27 Mar 2023 17:36:34 +0100
Subject: [PATCH] feature: String interpolation

Adds four kinds of string interpolation split over two axes (UTF-8 binary or
Unicode codepoint list, and user-facing or developer-facing formatting).

The result is four general classes of syntax with interpolated values:

```
% binary format
<<"A utf-8 binary string: 4"/utf8>> = bf"A utf-8 binary string: ~2 + 2~"
```

```
% list format
"A unicode codepoint list string: 4" = lf"A unicode codepoint list string: ~2 + 2~"
```

```
% binary debug
<<"A utf-8 binary string: {4, foo, [x, y, z]}"/utf8>> = bd"A utf-8 binary string: ~{2 + 2, foo, [x, y, z]}~"
```

```
% list debug
"A unicode codepoint list string: {4, foo, [x, y, z]}" = ld"A unicode codepoint list string: ~{2 + 2, foo, [x, y, z]}~"
```

Arbitrary expressions can be nested inside string interpolation
substitutions, including variables, function calls, macros and even further
string interpolation expressions.

Design
======

Why list- and binary-strings?
-----------------------------

In the `string` module from the stdlib, a string is represented by
`unicode:chardata()`, that is, a list of codepoints, binaries with
UTF-8-encoded codepoints (UTF-8 binaries), or a mix of the two.

With this in mind, the list- and binary-oriented string interpolation
syntaxes accept either type of interpolated value, but the user of the
interpolation determines whether they want to generate a
`unicode:char_list()` or `unicode:unicode_binary()` based on which kind of
interpolation they use (`bf"..."` and `bd"..."` to create binaries, or
`lf"..."` and `ld"..."` to create lists).

List-strings are most useful for backwards compatibility and convenience.
Binary-strings are most useful for memory-compactness and IO.

Why user- and developer-oriented strings?
-----------------------------------------

There are two similar but distinct cases where developers typically want to
format strings: when logging/debugging, and when displaying data to users.

When logging or debugging, the most important features are typically that
any kind of term can be printed, and that the output round-trips losslessly
and can be read unambiguously by developers. Examples of these properties
include retaining runtime type information, e.g. keeping strings quoted when
formatting them, and printing floats with full range and resolution.

When displaying to users, the most important features are typically that the
output is always human-readable and cleanly formatted. Examples of these
properties include formatting strings verbatim, without quotation marks, and
not retaining any Erlang-isms (e.g. we don't want to be printing Erlang
tuples, because they won't make much sense to the average application
consumer), so we'd rather get a `badarg` error to push the developer to make
an explicit formatting decision.

Why no formatting options?
--------------------------

Let's consider the two use-cases introduced earlier:

- Logging/debugging: Typically you want to fire-and-forget, giving whatever
  value you care about to the formatter, and just let it print that value
  unambiguously, meaning there's no need to tweak formatting options:
  `bd"~Timestamp~: ~Query~ returned ~Result~"`
- Displaying to users: Typically you want to tightly control formatting, and
  you probably want to do so in a modular and reusable way. In that case,
  factoring out your formatting decision to a function, and interpolating the
  result of that function is probably the best way to go:
  `bf"Your account balance is now ~my_app:format_balance(Currency, Balance)~"`.

Notably, nothing in the design and implementation here precludes the future
introduction of formatting options such as `bf"float: ~.2f(MyFloat)~"` as one
might do with `io_lib:format` etc.
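
For comparison, here is a minimal sketch of how two-decimal float formatting
can be written today with `io_lib:format/2`, without any interpolation
syntax. The module name `format_demo` and function name `two_decimals/1` are
hypothetical and chosen only for illustration; they are not part of this
patch.

```erlang
-module(format_demo).
-export([two_decimals/1]).

%% Render a float with exactly two decimals as a binary, using the
%% existing `~.2f` control sequence of io_lib:format/2.
%% (Hypothetical helper, for illustration only.)
-spec two_decimals(float()) -> binary().
two_decimals(F) when is_float(F) ->
    iolist_to_binary(io_lib:format("~.2f", [F])).
```

With this helper, `format_demo:two_decimals(3.14159)` returns `<<"3.14">>`.
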
But existing stdlib functions can offer similar functionality, e.g.
`bf"float: ~float_to_binary(MyFloat, [{decimals, 2}, compact])~"`, and can be
factored out into their own reusable functions.

Implementation
==============

To parse interpolated strings, the scanner tracks some additional state
regarding whether we are currently in an interpolated string, at which point
it enables the recognition of `~` as the delimiter for interpolated
expressions, and generates new tokens which represent the various components
of an interpolated string.

Early during compilation and shell evaluation, interpolated strings are
desugared into calls to functions from the `io_lib` module, and therefore
don't impact later stages of compilation or evaluation.

The new string interpolation syntax was not previously valid syntax, so it
should be entirely backwards compatible with existing source code.
---
 lib/compiler/src/compile.erl                 |   4 +
 lib/stdlib/examples/erl_id_trans.erl         |  33 +
 lib/stdlib/src/Makefile                      |   1 +
 lib/stdlib/src/epp.erl                       |   4 +
 lib/stdlib/src/erl_desugar_interpolation.erl | 657 ++++++++++++++++++
 lib/stdlib/src/erl_eval.erl                  |   6 +
 lib/stdlib/src/erl_lint.erl                  |   4 +
 lib/stdlib/src/erl_parse.yrl                 |  30 +-
 lib/stdlib/src/erl_scan.erl                  | 135 +++-
 lib/stdlib/src/io_lib.erl                    | 304 ++++++++
 lib/stdlib/src/stdlib.app.src                |   1 +
 lib/stdlib/test/Makefile                     |   1 +
 .../test/erl_desugar_interpolation_SUITE.erl | 575 +++++++++++++++
 system/doc/reference_manual/data_types.xml   |  65 +-
 14 files changed, 1810 insertions(+), 10 deletions(-)
 create mode 100644 lib/stdlib/src/erl_desugar_interpolation.erl
 create mode 100644 lib/stdlib/test/erl_desugar_interpolation_SUITE.erl

diff --git a/lib/compiler/src/compile.erl b/lib/compiler/src/compile.erl
index 9ff09911f9d1..373da7139224 100644
--- a/lib/compiler/src/compile.erl
+++ b/lib/compiler/src/compile.erl
@@ -789,6 +789,7 @@ make_ssa_check_pass(PassFlag) ->
 standard_passes() ->
     [?pass(transform_module),
+     ?pass(desugar_interpolation),
{iff,makedep_side_effect,?pass(makedep_and_output)}, {iff,makedep,[ @@ -1249,6 +1250,9 @@ strip_columns(Code) -> erl_parse:map_anno(F, Form) end || Form <- Code]. +desugar_interpolation(Code, #compile{options=Opt}=St) -> + {ok, erl_desugar_interpolation:module(Code, Opt), St}. + get_core_transforms(Opts) -> [M || {core_transform,M} <- Opts]. core_transforms(Code, St) -> diff --git a/lib/stdlib/examples/erl_id_trans.erl b/lib/stdlib/examples/erl_id_trans.erl index 792f944d365c..eb772c8df448 100644 --- a/lib/stdlib/examples/erl_id_trans.erl +++ b/lib/stdlib/examples/erl_id_trans.erl @@ -521,6 +521,39 @@ expr({match,Anno,P0,E0}) -> E1 = expr(E0), P1 = pattern(P0), {match,Anno,P1,E1}; +expr({interpolation_no_subs,Anno,binary,_IsDebug,Str}) -> + S = {string, Anno, Str}, + {bin, Anno, [{bin_element,Anno,S,default,[utf8]}]}; +expr({interpolation_no_subs,Anno,list,_IsDebug,Str}) -> + {string, Anno, Str}; +expr({interpolation,Anno, + {interpolation_head,AnnoHead,binary,IsDebug,HeadStr}, + Elems, + {interpolation_tail,AnnoTail,TailStr}}) -> + Elems1 = + [ case Elem of + {interpolation_cont,AnnoCont,Str} -> {interpolation_cont,AnnoCont,Str}; + {interpolation_subs,AnnoSubs,Expr} -> {interpolation_subs,AnnoSubs,expr(Expr)} + end + || Elem <- Elems ], + {interpolation,Anno, + {interpolation_head,AnnoHead,binary,IsDebug,HeadStr}, + Elems1, + {interpolation_tail,AnnoTail,TailStr}}; +expr({interpolation,Anno, + {interpolation_head,AnnoHead,list,IsDebug,HeadStr}, + Elems, + {interpolation_tail,AnnoTail,TailStr}}) -> + Elems1 = + [ case Elem of + {interpolation_cont,AnnoCont,Str} -> {interpolation_cont,AnnoCont,Str}; + {interpolation_subs,AnnoSubs,Expr} -> {interpolation_subs,AnnoSubs,expr(Expr)} + end + || Elem <- Elems ], + {interpolation,Anno, + {interpolation_head,AnnoHead,list,IsDebug,HeadStr}, + Elems1, + {interpolation_tail,AnnoTail,TailStr}}; expr({bin,Anno,Fs}) -> Fs2 = pattern_grp(Fs), {bin,Anno,Fs2}; diff --git a/lib/stdlib/src/Makefile b/lib/stdlib/src/Makefile index 
e5461728564d..5586cb373512 100644 --- a/lib/stdlib/src/Makefile +++ b/lib/stdlib/src/Makefile @@ -67,6 +67,7 @@ MODULES= \ erl_error \ erl_eval \ erl_expand_records \ + erl_desugar_interpolation \ erl_features \ erl_internal \ erl_lint \ diff --git a/lib/stdlib/src/epp.erl b/lib/stdlib/src/epp.erl index 1f7f614b00ba..f77182fb78a0 100644 --- a/lib/stdlib/src/epp.erl +++ b/lib/stdlib/src/epp.erl @@ -1930,6 +1930,10 @@ token_src({char,_,C}) -> token_src({string, _, X}) -> io_lib:write_string(X); token_src({_, _, X}) -> + io_lib:format("~w", [X]); +token_src({_, _, _, X}) -> + io_lib:format("~w", [X]); +token_src({_, _, _, _, X}) -> io_lib:format("~w", [X]). stringify1([]) -> diff --git a/lib/stdlib/src/erl_desugar_interpolation.erl b/lib/stdlib/src/erl_desugar_interpolation.erl new file mode 100644 index 000000000000..ab6114893588 --- /dev/null +++ b/lib/stdlib/src/erl_desugar_interpolation.erl @@ -0,0 +1,657 @@ +%% Purpose: Expand list-/binary-string interpolation expression syntax into +%% function calls which build the resulting strings. + +-module(erl_desugar_interpolation). + +-export([module/2, expr/1]). + +-spec(module(AbsFormsWithInterpolation, CompileOptions) -> AbsFormsWithoutInterpolation when + AbsFormsWithInterpolation :: [erl_parse:abstract_form()], + AbsFormsWithoutInterpolation :: [erl_parse:abstract_form()], + CompileOptions :: [compile:option()]). + +module(Fs0, _Opts0) -> + forms(Fs0). + +forms([F0|Fs0]) -> + F1 = form(F0), + Fs1 = forms(Fs0), + [F1|Fs1]; +forms([]) -> []. 
+ +form({attribute,_Anno,module,_Mod}=Attr) -> + Attr; +form({attribute,_Anno,file,{_File,_Line}}=Attr) -> + Attr; +form({attribute,_Anno,export,_Es0}=Attr) -> + Attr; +form({attribute,_Anno,import,{_Mod,_Is0}}=Attr) -> + Attr; +form({attribute,_Anno,export_type,_Es0}=Attr) -> + Attr; +form({attribute,_Anno,optional_callbacks,_Es0}=Attr) -> + Attr; +form({attribute,_Anno,compile,_C}=Attr) -> + Attr; +form({attribute,Anno,record,{Name,Defs0}}) -> + Defs1 = record_defs(Defs0), + {attribute,Anno,record,{Name,Defs1}}; +form({attribute,_Anno,asm,{function,_N,_A,_Code}}=Attr) -> + Attr; +form({attribute,Anno,type,{N,T,Vs}}) -> + T1 = type(T), + Vs1 = variable_list(Vs), + {attribute,Anno,type,{N,T1,Vs1}}; +form({attribute,Anno,opaque,{N,T,Vs}}) -> + T1 = type(T), + Vs1 = variable_list(Vs), + {attribute,Anno,opaque,{N,T1,Vs1}}; +form({attribute,Anno,spec,{{N,A},FTs}}) -> + FTs1 = function_type_list(FTs), + {attribute,Anno,spec,{{N,A},FTs1}}; +form({attribute,Anno,spec,{{M,N,A},FTs}}) -> + FTs1 = function_type_list(FTs), + {attribute,Anno,spec,{{M,N,A},FTs1}}; +form({attribute,Anno,callback,{{N,A},FTs}}) -> + FTs1 = function_type_list(FTs), + {attribute,Anno,callback,{{N,A},FTs1}}; +form({attribute,_Anno,_WhichAttr,_Val}=Attr) -> + Attr; +form({function,Anno,Name0,Arity0,Clauses0}) -> + {Name,Arity,Clauses} = function(Name0, Arity0, Clauses0), + {function,Anno,Name,Arity,Clauses}; + +form({error,E}) -> {error,E}; +form({warning,W}) -> {warning,W}; +form({eof,Location}) -> {eof,Location}. + +variable_list([{var,Anno,Var}|Vs]) -> + [{var,Anno,Var}|variable_list(Vs)]; +variable_list([]) -> []. 
+ +record_defs([{record_field,Anno,{atom,Aa,A},Val0}|Is]) -> + Val1 = expr(Val0), + [{record_field,Anno,{atom,Aa,A},Val1}|record_defs(Is)]; +record_defs([{record_field,Anno,{atom,Aa,A}}|Is]) -> + [{record_field,Anno,{atom,Aa,A}}|record_defs(Is)]; +record_defs([{typed_record_field,{record_field,Anno,{atom,Aa,A},Val0},Type}| + Is]) -> + Val1 = expr(Val0), + Type1 = type(Type), + [{typed_record_field,{record_field,Anno,{atom,Aa,A},Val1},Type1}| + record_defs(Is)]; +record_defs([{typed_record_field,{record_field,Anno,{atom,Aa,A}},Type}|Is]) -> + Type1 = type(Type), + [{typed_record_field,{record_field,Anno,{atom,Aa,A}},Type1}| + record_defs(Is)]; +record_defs([]) -> []. + +function(Name, Arity, Clauses0) -> + Clauses1 = clauses(Clauses0), + {Name,Arity,Clauses1}. + +clauses([C0|Cs]) -> + C1 = clause(C0), + [C1|clauses(Cs)]; +clauses([]) -> []. + +clause({clause,Anno,H0,G0,B0}) -> + H1 = head(H0), + G1 = guard(G0), + B1 = exprs(B0), + {clause,Anno,H1,G1,B1}. + +head(Ps) -> patterns(Ps). + +patterns([P0|Ps]) -> + P1 = pattern(P0), + [P1|patterns(Ps)]; +patterns([]) -> []. 
+ +pattern({var,Anno,V}) -> {var,Anno,V}; +pattern({match,Anno,L0,R0}) -> + L1 = pattern(L0), + R1 = pattern(R0), + {match,Anno,L1,R1}; +pattern({integer,Anno,I}) -> {integer,Anno,I}; +pattern({char,Anno,C}) -> {char,Anno,C}; +pattern({float,Anno,F}) -> {float,Anno,F}; +pattern({atom,Anno,A}) -> {atom,Anno,A}; +pattern({string,Anno,S}) -> {string,Anno,S}; +pattern({nil,Anno}) -> {nil,Anno}; +pattern({cons,Anno,H0,T0}) -> + H1 = pattern(H0), + T1 = pattern(T0), + {cons,Anno,H1,T1}; +pattern({tuple,Anno,Ps0}) -> + Ps1 = pattern_list(Ps0), + {tuple,Anno,Ps1}; +pattern({map,Anno,Ps0}) -> + Ps1 = pattern_list(Ps0), + {map,Anno,Ps1}; +pattern({map_field_exact,Anno,K,V}) -> + Ke = expr(K), + Ve = pattern(V), + {map_field_exact,Anno,Ke,Ve}; +pattern({record,Anno,Name,Pfs0}) -> + Pfs1 = pattern_fields(Pfs0), + {record,Anno,Name,Pfs1}; +pattern({record_index,Anno,Name,Field0}) -> + Field1 = pattern(Field0), + {record_index,Anno,Name,Field1}; +pattern({record_field,Anno,Rec0,Name,Field0}) -> + Rec1 = expr(Rec0), + Field1 = expr(Field0), + {record_field,Anno,Rec1,Name,Field1}; +pattern({record_field,Anno,Rec0,Field0}) -> + Rec1 = expr(Rec0), + Field1 = expr(Field0), + {record_field,Anno,Rec1,Field1}; +pattern({bin,Anno,Fs}) -> + Fs2 = pattern_grp(Fs), + {bin,Anno,Fs2}; +pattern({op,Anno,Op,A}) -> + {op,Anno,Op,A}; +pattern({op,Anno,Op,L,R}) -> + {op,Anno,Op,L,R}. + +pattern_grp([{bin_element,Anno,E1,S1,T1} | Fs]) -> + S2 = case S1 of + default -> + default; + _ -> + expr(S1) + end, + T2 = case T1 of + default -> + default; + _ -> + bit_types(T1) + end, + [{bin_element,Anno,expr(E1),S2,T2} | pattern_grp(Fs)]; +pattern_grp([]) -> + []. + +bit_types([]) -> + []; +bit_types([Atom | Rest]) when is_atom(Atom) -> + [Atom | bit_types(Rest)]; +bit_types([{Atom, Integer} | Rest]) when is_atom(Atom), is_integer(Integer) -> + [{Atom, Integer} | bit_types(Rest)]. + +pattern_list([P0|Ps]) -> + P1 = pattern(P0), + [P1|pattern_list(Ps)]; +pattern_list([]) -> []. 
+ +pattern_fields([{record_field,Af,{atom,Aa,F},P0}|Pfs]) -> + P1 = pattern(P0), + [{record_field,Af,{atom,Aa,F},P1}|pattern_fields(Pfs)]; +pattern_fields([{record_field,Af,{var,Aa,'_'},P0}|Pfs]) -> + P1 = pattern(P0), + [{record_field,Af,{var,Aa,'_'},P1}|pattern_fields(Pfs)]; +pattern_fields([]) -> []. + +guard([G0|Gs]) when is_list(G0) -> + [guard0(G0) | guard(Gs)]; +guard(L) -> + guard0(L). + +guard0([G0|Gs]) -> + G1 = guard_test(G0), + [G1|guard0(Gs)]; +guard0([]) -> []. + +guard_test(Expr={call,Anno,{atom,Aa,F},As0}) -> + case erl_internal:type_test(F, length(As0)) of + true -> + As1 = gexpr_list(As0), + {call,Anno,{atom,Aa,F},As1}; + _ -> + gexpr(Expr) + end; +guard_test(Any) -> + gexpr(Any). + +gexpr({var,Anno,V}) -> {var,Anno,V}; +gexpr({integer,Anno,I}) -> {integer,Anno,I}; +gexpr({char,Anno,C}) -> {char,Anno,C}; +gexpr({float,Anno,F}) -> {float,Anno,F}; +gexpr({atom,Anno,A}) -> {atom,Anno,A}; +gexpr({string,Anno,S}) -> {string,Anno,S}; +gexpr({nil,Anno}) -> {nil,Anno}; +gexpr({map,Anno,Map0,Es0}) -> + [Map1|Es1] = gexpr_list([Map0|Es0]), + {map,Anno,Map1,Es1}; +gexpr({map,Anno,Es0}) -> + Es1 = gexpr_list(Es0), + {map,Anno,Es1}; +gexpr({map_field_assoc,Anno,K,V}) -> + Ke = gexpr(K), + Ve = gexpr(V), + {map_field_assoc,Anno,Ke,Ve}; +gexpr({map_field_exact,Anno,K,V}) -> + Ke = gexpr(K), + Ve = gexpr(V), + {map_field_exact,Anno,Ke,Ve}; +gexpr({cons,Anno,H0,T0}) -> + H1 = gexpr(H0), + T1 = gexpr(T0), %They see the same variables + {cons,Anno,H1,T1}; +gexpr({tuple,Anno,Es0}) -> + Es1 = gexpr_list(Es0), + {tuple,Anno,Es1}; +gexpr({record_index,Anno,Name,Field0}) -> + Field1 = gexpr(Field0), + {record_index,Anno,Name,Field1}; +gexpr({record_field,Anno,Rec0,Name,Field0}) -> + Rec1 = gexpr(Rec0), + Field1 = gexpr(Field0), + {record_field,Anno,Rec1,Name,Field1}; +gexpr({record,Anno,Name,Inits0}) -> + Inits1 = grecord_inits(Inits0), + {record,Anno,Name,Inits1}; +gexpr({call,Anno,{atom,Aa,F},As0}) -> + case erl_internal:guard_bif(F, length(As0)) of + true -> As1 = 
gexpr_list(As0), + {call,Anno,{atom,Aa,F},As1} + end; +gexpr({call,Anno,{remote,Aa,{atom,Ab,erlang},{atom,Ac,F}},As0}) -> + case erl_internal:guard_bif(F, length(As0)) or + erl_internal:arith_op(F, length(As0)) or + erl_internal:comp_op(F, length(As0)) or + erl_internal:bool_op(F, length(As0)) of + true -> As1 = gexpr_list(As0), + {call,Anno,{remote,Aa,{atom,Ab,erlang},{atom,Ac,F}},As1} + end; +gexpr({bin,Anno,Fs}) -> + Fs2 = pattern_grp(Fs), + {bin,Anno,Fs2}; +gexpr({op,Anno,Op,A0}) -> + case erl_internal:arith_op(Op, 1) or + erl_internal:bool_op(Op, 1) of + true -> A1 = gexpr(A0), + {op,Anno,Op,A1} + end; +gexpr({op,Anno,Op,L0,R0}) when Op =:= 'andalso'; Op =:= 'orelse' -> + L1 = gexpr(L0), + R1 = gexpr(R0), %They see the same variables + {op,Anno,Op,L1,R1}; +gexpr({op,Anno,Op,L0,R0}) -> + case erl_internal:arith_op(Op, 2) or + erl_internal:bool_op(Op, 2) or + erl_internal:comp_op(Op, 2) of + true -> + L1 = gexpr(L0), + R1 = gexpr(R0), %They see the same variables + {op,Anno,Op,L1,R1} + end. + +gexpr_list([E0|Es]) -> + E1 = gexpr(E0), + [E1|gexpr_list(Es)]; +gexpr_list([]) -> []. + +grecord_inits([{record_field,Af,{atom,Aa,F},Val0}|Is]) -> + Val1 = gexpr(Val0), + [{record_field,Af,{atom,Aa,F},Val1}|grecord_inits(Is)]; +grecord_inits([{record_field,Af,{var,Aa,'_'},Val0}|Is]) -> + Val1 = gexpr(Val0), + [{record_field,Af,{var,Aa,'_'},Val1}|grecord_inits(Is)]; +grecord_inits([]) -> []. + +exprs([E0|Es]) -> + E1 = expr(E0), + [E1|exprs(Es)]; +exprs([]) -> []. 
+ +expr({var,Anno,V}) -> {var,Anno,V}; +expr({integer,Anno,I}) -> {integer,Anno,I}; +expr({float,Anno,F}) -> {float,Anno,F}; +expr({atom,Anno,A}) -> {atom,Anno,A}; +expr({string,Anno,S}) -> {string,Anno,S}; +expr({char,Anno,C}) -> {char,Anno,C}; +expr({nil,Anno}) -> {nil,Anno}; +expr({cons,Anno,H0,T0}) -> + H1 = expr(H0), + T1 = expr(T0), %They see the same variables + {cons,Anno,H1,T1}; +expr({lc,Anno,E0,Qs0}) -> + Qs1 = comprehension_quals(Qs0), + E1 = expr(E0), + {lc,Anno,E1,Qs1}; +expr({bc,Anno,E0,Qs0}) -> + Qs1 = comprehension_quals(Qs0), + E1 = expr(E0), + {bc,Anno,E1,Qs1}; +expr({mc,Anno,E0,Qs0}) -> + Qs1 = comprehension_quals(Qs0), + E1 = expr(E0), + {mc,Anno,E1,Qs1}; +expr({tuple,Anno,Es0}) -> + Es1 = expr_list(Es0), + {tuple,Anno,Es1}; +expr({map,Anno,Map0,Es0}) -> + [Map1|Es1] = exprs([Map0|Es0]), + {map,Anno,Map1,Es1}; +expr({map,Anno,Es0}) -> + Es1 = exprs(Es0), + {map,Anno,Es1}; +expr({map_field_assoc,Anno,K,V}) -> + Ke = expr(K), + Ve = expr(V), + {map_field_assoc,Anno,Ke,Ve}; +expr({map_field_exact,Anno,K,V}) -> + Ke = expr(K), + Ve = expr(V), + {map_field_exact,Anno,Ke,Ve}; +expr({record_index,Anno,Name,Field0}) -> + Field1 = expr(Field0), + {record_index,Anno,Name,Field1}; +expr({record,Anno,Name,Inits0}) -> + Inits1 = record_inits(Inits0), + {record,Anno,Name,Inits1}; +expr({record_field,Anno,Rec0,Name,Field0}) -> + Rec1 = expr(Rec0), + Field1 = expr(Field0), + {record_field,Anno,Rec1,Name,Field1}; +expr({record,Anno,Rec0,Name,Upds0}) -> + Rec1 = expr(Rec0), + Upds1 = record_updates(Upds0), + {record,Anno,Rec1,Name,Upds1}; +expr({record_field,Anno,Rec0,Field0}) -> + Rec1 = expr(Rec0), + Field1 = expr(Field0), + {record_field,Anno,Rec1,Field1}; +expr({block,Anno,Es0}) -> + Es1 = exprs(Es0), + {block,Anno,Es1}; +expr({'if',Anno,Cs0}) -> + Cs1 = icr_clauses(Cs0), + {'if',Anno,Cs1}; +expr({'case',Anno,E0,Cs0}) -> + E1 = expr(E0), + Cs1 = icr_clauses(Cs0), + {'case',Anno,E1,Cs1}; +expr({'receive',Anno,Cs0}) -> + Cs1 = icr_clauses(Cs0), + 
{'receive',Anno,Cs1}; +expr({'receive',Anno,Cs0,To0,ToEs0}) -> + To1 = expr(To0), + ToEs1 = exprs(ToEs0), + Cs1 = icr_clauses(Cs0), + {'receive',Anno,Cs1,To1,ToEs1}; +expr({'try',Anno,Es0,Scs0,Ccs0,As0}) -> + Es1 = exprs(Es0), + Scs1 = icr_clauses(Scs0), + Ccs1 = icr_clauses(Ccs0), + As1 = exprs(As0), + {'try',Anno,Es1,Scs1,Ccs1,As1}; +expr({'fun',Anno,Body}) -> + case Body of + {clauses,Cs0} -> + Cs1 = fun_clauses(Cs0), + {'fun',Anno,{clauses,Cs1}}; + {function,F,A} -> + {'fun',Anno,{function,F,A}}; + {function,M0,F0,A0} -> + M = expr(M0), + F = expr(F0), + A = expr(A0), + {'fun',Anno,{function,M,F,A}} + end; +expr({named_fun,Anno,Name,Cs}) -> + {named_fun,Anno,Name,fun_clauses(Cs)}; +expr({call,Anno,F0,As0}) -> + F1 = expr(F0), + As1 = expr_list(As0), + {call,Anno,F1,As1}; +expr({'catch',Anno,E0}) -> + E1 = expr(E0), + {'catch',Anno,E1}; +expr({'maybe',MaybeAnno,Es0}) -> + Es = exprs(Es0), + {'maybe',MaybeAnno,Es}; +expr({'maybe',MaybeAnno,Es0,{'else',ElseAnno,Cs0}}) -> + Es = exprs(Es0), + Cs = clauses(Cs0), + {'maybe',MaybeAnno,Es,{'else',ElseAnno,Cs}}; +expr({maybe_match,Anno,P0,E0}) -> + E = expr(E0), + P = pattern(P0), + {maybe_match,Anno,P,E}; +expr({match,Anno,P0,E0}) -> + E1 = expr(E0), + P1 = pattern(P0), + {match,Anno,P1,E1}; + +expr({interpolation_no_subs,Anno,binary,_IsDebug,Str}) -> + S = {string, Anno, Str}, + {bin, Anno, [{bin_element,Anno,S,default,[utf8]}]}; +expr({interpolation_no_subs,Anno,list,_IsDebug,Str}) -> + {string, Anno, Str}; +expr({interpolation,Anno, + {interpolation_head,_,binary,IsDebug,HeadStr}, + Conts, + {interpolation_tail,_,TailStr}}) -> + DesugaredConts = desugar_binary_string_interpolation_conts(Conts, IsDebug), + HeadBin = {bin,Anno,[{bin_element,Anno,{string, Anno, HeadStr},default,[utf8]}]}, + TailBin = {bin,Anno,[{bin_element,Anno,{string, Anno, TailStr},default,[utf8]}]}, + BinComponents = [HeadBin] ++ (DesugaredConts ++ [TailBin]), + BinComponentListExpr = mk_list_expr(BinComponents, Anno), + {call,Anno, + 
{remote,Anno,{atom,Anno,erlang},{atom,Anno,list_to_binary}}, + [BinComponentListExpr]}; +expr({interpolation,Anno, + {interpolation_head,AnnoHead,list,IsDebug,HeadStr}, + Conts, + {interpolation_tail,AnnoTail,TailStr}}) -> + ListComponents = desugar_list_string_interpolation_conts(Conts, IsDebug), + mk_list_expr( + [{string,AnnoHead,HeadStr}] ++ + (ListComponents ++ + [{string,AnnoTail,TailStr}]), + Anno); + +expr({bin,Anno,Fs}) -> + Fs2 = pattern_grp(Fs), + {bin,Anno,Fs2}; +expr({op,Anno,Op,A0}) -> + A1 = expr(A0), + {op,Anno,Op,A1}; +expr({op,Anno,Op,L0,R0}) -> + L1 = expr(L0), + R1 = expr(R0), %They see the same variables + {op,Anno,Op,L1,R1}; +%% The following are not allowed to occur anywhere! +expr({remote,Anno,M0,F0}) -> + M1 = expr(M0), + F1 = expr(F0), + {remote,Anno,M1,F1}. + +desugar_list_string_interpolation_conts(Conts, IsDebug) -> + [desugar_list_string_interpolation_cont(Cont, IsDebug) || Cont <- Conts]. + +desugar_list_string_interpolation_cont({interpolation_cont,Anno,Str}, _IsDebug) -> + {string, Anno, Str}; +desugar_list_string_interpolation_cont({interpolation_subs,Anno,Expr}, _IsDebug=true) -> + Opts = + {cons, Anno, + {tuple, Anno, [ + {atom, Anno, encoding}, + {atom, Anno, unicode} + ]}, + {nil, Anno} + }, + {call,Anno, + {remote,Anno,{atom,Anno,io_lib},{atom,Anno,write}}, + [expr(Expr), Opts]}; +desugar_list_string_interpolation_cont({interpolation_subs,Anno,Expr}, _IsDebug=false) -> + {call,Anno, + {remote,Anno,{atom,Anno,io_lib},{atom,Anno,write_natural}}, + [expr(Expr)]}. + +desugar_binary_string_interpolation_conts(Conts, IsDebug) -> + [desugar_binary_string_interpolation_cont(Cont, IsDebug) || Cont <- Conts]. 
+ +desugar_binary_string_interpolation_cont({interpolation_cont,Anno,Str}, _IsDebug) -> + S = {string, Anno, Str}, + {bin, Anno, [{bin_element,Anno,S,default,[utf8]}]}; +desugar_binary_string_interpolation_cont({interpolation_subs,Anno,Expr}, _IsDebug=true) -> + {call,Anno, + {remote,Anno,{atom,Anno,io_lib},{atom,Anno,write_bin}}, + [expr(Expr)]}; +desugar_binary_string_interpolation_cont({interpolation_subs,Anno,Expr}, _IsDebug=false) -> + {call,Anno, + {remote,Anno,{atom,Anno,io_lib},{atom,Anno,write_bin_natural}}, + [expr(Expr)]}. + +expr_list([E0|Es]) -> + E1 = expr(E0), + [E1|expr_list(Es)]; +expr_list([]) -> []. + +record_inits([{record_field,Af,{atom,Aa,F},Val0}|Is]) -> + Val1 = expr(Val0), + [{record_field,Af,{atom,Aa,F},Val1}|record_inits(Is)]; +record_inits([{record_field,Af,{var,Aa,'_'},Val0}|Is]) -> + Val1 = expr(Val0), + [{record_field,Af,{var,Aa,'_'},Val1}|record_inits(Is)]; +record_inits([]) -> []. + +record_updates([{record_field,Af,{atom,Aa,F},Val0}|Us]) -> + Val1 = expr(Val0), + [{record_field,Af,{atom,Aa,F},Val1}|record_updates(Us)]; +record_updates([]) -> []. + +icr_clauses([C0|Cs]) -> + C1 = clause(C0), + [C1|icr_clauses(Cs)]; +icr_clauses([]) -> []. + +comprehension_quals([{generate,Anno,P0,E0}|Qs]) -> + E1 = expr(E0), + P1 = pattern(P0), + [{generate,Anno,P1,E1}|comprehension_quals(Qs)]; +comprehension_quals([{b_generate,Anno,P0,E0}|Qs]) -> + E1 = expr(E0), + P1 = pattern(P0), + [{b_generate,Anno,P1,E1}|comprehension_quals(Qs)]; +comprehension_quals([{m_generate,Anno,P0,E0}|Qs]) -> + E1 = expr(E0), + P1 = pattern(P0), + [{m_generate,Anno,P1,E1}|comprehension_quals(Qs)]; +comprehension_quals([E0|Qs]) -> + E1 = expr(E0), + [E1|comprehension_quals(Qs)]; +comprehension_quals([]) -> []. + +fun_clauses([C0|Cs]) -> + C1 = clause(C0), + [C1|fun_clauses(Cs)]; +fun_clauses([]) -> []. 
+ +function_type_list([{type,Anno,bounded_fun,[Ft,Fc]}|Fts]) -> + Ft1 = function_type(Ft), + Fc1 = function_constraint(Fc), + [{type,Anno,bounded_fun,[Ft1,Fc1]}|function_type_list(Fts)]; +function_type_list([Ft|Fts]) -> + [function_type(Ft)|function_type_list(Fts)]; +function_type_list([]) -> []. + +function_type({type,Anno,'fun',[{type,At,product,As},B]}) -> + As1 = type_list(As), + B1 = type(B), + {type,Anno,'fun',[{type,At,product,As1},B1]}. + +function_constraint([C|Cs]) -> + C1 = constraint(C), + [C1|function_constraint(Cs)]; +function_constraint([]) -> []. + +constraint({type,Anno,constraint,[{atom,Annoa,A},[V,T]]}) -> + V1 = type(V), + T1 = type(T), + {type,Anno,constraint,[{atom,Annoa,A},[V1,T1]]}. + +type({ann_type,Anno,[{var,Av,V},T]}) -> + T1 = type(T), + {ann_type,Anno,[{var,Av,V},T1]}; +type({atom,Anno,A}) -> + {atom,Anno,A}; +type({integer,Anno,I}) -> + {integer,Anno,I}; +type({char,Anno,C}) -> + {char,Anno,C}; +type({op,Anno,Op,T}) -> + T1 = type(T), + {op,Anno,Op,T1}; +type({op,Anno,Op,L,R}) -> + L1 = type(L), + R1 = type(R), + {op,Anno,Op,L1,R1}; +type({type,Anno,binary,[M,N]}) -> + M1 = type(M), + N1 = type(N), + {type,Anno,binary,[M1,N1]}; +type({type,Anno,'fun',[]}) -> + {type,Anno,'fun',[]}; +type({type,Anno,'fun',[{type,At,any},B]}) -> + B1 = type(B), + {type,Anno,'fun',[{type,At,any},B1]}; +type({type,Anno,range,[L,H]}) -> + L1 = type(L), + H1 = type(H), + {type,Anno,range,[L1,H1]}; +type({type,Anno,map,any}) -> + {type,Anno,map,any}; +type({type,Anno,map,Ps}) -> + Ps1 = map_pair_types(Ps), + {type,Anno,map,Ps1}; +type({type,Anno,record,[{atom,Aa,N}|Fs]}) -> + Fs1 = field_types(Fs), + {type,Anno,record,[{atom,Aa,N}|Fs1]}; +type({remote_type,Anno,[{atom,Am,M},{atom,An,N},As]}) -> + As1 = type_list(As), + {remote_type,Anno,[{atom,Am,M},{atom,An,N},As1]}; +type({type,Anno,tuple,any}) -> + {type,Anno,tuple,any}; +type({type,Anno,tuple,Ts}) -> + Ts1 = type_list(Ts), + {type,Anno,tuple,Ts1}; +type({type,Anno,union,Ts}) -> + Ts1 = type_list(Ts), + 
{type,Anno,union,Ts1}; +type({var,Anno,V}) -> + {var,Anno,V}; +type({user_type,Anno,N,As}) -> + As1 = type_list(As), + {user_type,Anno,N,As1}; +type({type,Anno,N,As}) -> + As1 = type_list(As), + {type,Anno,N,As1}. + +map_pair_types([{type,Anno,map_field_assoc,[K,V]}|Ps]) -> + K1 = type(K), + V1 = type(V), + [{type,Anno,map_field_assoc,[K1,V1]}|map_pair_types(Ps)]; +map_pair_types([{type,Anno,map_field_exact,[K,V]}|Ps]) -> + K1 = type(K), + V1 = type(V), + [{type,Anno,map_field_exact,[K1,V1]}|map_pair_types(Ps)]; +map_pair_types([]) -> []. + +field_types([{type,Anno,field_type,[{atom,Aa,A},T]}|Fs]) -> + T1 = type(T), + [{type,Anno,field_type,[{atom,Aa,A},T1]}|field_types(Fs)]; +field_types([]) -> []. + +type_list([T|Ts]) -> + T1 = type(T), + [T1|type_list(Ts)]; +type_list([]) -> []. + +mk_list_expr([], Anno) -> + {nil, Anno}; +mk_list_expr([Hd|Tl], Anno) -> + {cons, Anno, Hd, mk_list_expr(Tl, Anno)}. diff --git a/lib/stdlib/src/erl_eval.erl b/lib/stdlib/src/erl_eval.erl index f88cba1ba3da..c4eef33c819c 100644 --- a/lib/stdlib/src/erl_eval.erl +++ b/lib/stdlib/src/erl_eval.erl @@ -512,6 +512,12 @@ expr({'maybe',Anno,Es,{'else',_,Cs}}, Bs0, Lf, Ef, RBs, FUVs) -> apply_error({else_clause,Val}, ?STACKTRACE, Anno, Bs0, Ef, RBs) end end; +expr({'interpolation',_,_,_,_}=Interpolation, Bs, Lf, Ef, RBs, FUVs) -> + DesugaredInterpolation = erl_desugar_interpolation:expr(Interpolation), + expr(DesugaredInterpolation, Bs, Lf, Ef, RBs, FUVs); +expr({'interpolation_no_subs',_,_,_,_}=Interpolation, Bs, Lf, Ef, RBs, FUVs) -> + DesugaredInterpolation = erl_desugar_interpolation:expr(Interpolation), + expr(DesugaredInterpolation, Bs, Lf, Ef, RBs, FUVs); expr({op,Anno,Op,A0}, Bs0, Lf, Ef, RBs, FUVs) -> {value,A,Bs} = expr(A0, Bs0, Lf, Ef, none, FUVs), eval_op(Op, A, Anno, Bs, Ef, RBs); diff --git a/lib/stdlib/src/erl_lint.erl b/lib/stdlib/src/erl_lint.erl index f8d629c02c85..483d5638656b 100644 --- a/lib/stdlib/src/erl_lint.erl +++ b/lib/stdlib/src/erl_lint.erl @@ -2553,6 +2553,10 @@ 
expr({'maybe',MaybeAnno,Es,{'else',ElseAnno,Cs}}, Vt, St) -> Evt2 = vtmerge(Evt0, Evt1), Cvt2 = vtmerge(Cvt0, Cvt1), {vtmerge(Evt2, Cvt2),St2}; +expr({'interpolation',_Anno,_Head,Conts,_Tail}, Vt, St) -> + expr_list([SubsExpr || {interpolation_subs,_,SubsExpr} <- Conts], Vt, St); +expr({'interpolation_no_subs',_Anno,_Kind,_IsDebug,_Str}, _Vt, St) -> + {[],St}; %% No comparison or boolean operators yet. expr({op,_Anno,_Op,A}, Vt, St) -> expr(A, Vt, St); diff --git a/lib/stdlib/src/erl_parse.yrl b/lib/stdlib/src/erl_parse.yrl index 02adb842e928..ff27ce3edff4 100644 --- a/lib/stdlib/src/erl_parse.yrl +++ b/lib/stdlib/src/erl_parse.yrl @@ -41,7 +41,7 @@ fun_expr fun_clause fun_clauses atom_or_var integer_or_var try_expr try_catch try_clause try_clauses try_opt_stacktrace function_call argument_list exprs guard -atomic strings +atomic strings interpolation interpolation_elements prefix_op mult_op add_op list_op comp_op binary bin_elements bin_element bit_expr opt_bit_size_expr bit_size_expr opt_bit_type_list bit_type_list bit_type @@ -79,6 +79,8 @@ ssa_check_when_clauses. Terminals char integer float atom string var +interpolation_no_subs interpolation_head interpolation_cont interpolation_tail + '(' ')' ',' '->' '{' '}' '[' ']' '|' '||' '<-' ';' ':' '#' '.' 'after' 'begin' 'case' 'try' 'catch' 'end' 'fun' 'if' 'of' 'receive' 'when' 'maybe' 'else' @@ -105,6 +107,7 @@ Unary 0 'catch'. Right 100 '=' '!'. Right 150 'orelse'. Right 160 'andalso'. +Left 180 interpolation_cont. Nonassoc 200 comp_op. Right 300 list_op. Left 400 add_op. @@ -288,6 +291,7 @@ expr_max -> receive_expr : '$1'. expr_max -> fun_expr : '$1'. expr_max -> try_expr : '$1'. expr_max -> maybe_expr : '$1'. +expr_max -> interpolation : '$1'. pat_expr -> pat_expr '=' pat_expr : {match,first_anno('$1'),'$1','$3'}. pat_expr -> pat_expr comp_op pat_expr : ?mkop2('$1', '$2', '$3'). @@ -521,6 +525,12 @@ maybe_match_exprs -> expr ',' maybe_match_exprs : ['$1' | '$3']. 
maybe_match -> expr '?=' expr : {maybe_match,?anno('$2'),'$1','$3'}. +interpolation -> interpolation_no_subs : '$1'. +interpolation -> interpolation_head interpolation_elements interpolation_tail : {'interpolation', ?anno('$1'), '$1', '$2', '$3'}. + +interpolation_elements -> expr : [{interpolation_subs, ?anno('$1'), '$1'}]. +interpolation_elements -> interpolation_elements interpolation_cont interpolation_elements : ('$1' ++ ['$2'] ++ '$3'). + argument_list -> '(' ')' : {[],?anno('$1')}. argument_list -> '(' exprs ')' : {'$2',?anno('$1')}. @@ -850,7 +860,8 @@ Erlang code. | af_fun() | af_named_fun() | af_maybe() - | af_maybe_else(). + | af_maybe_else() + | af_interpolation(). -type af_record_update(T) :: {'record', anno(), @@ -1000,6 +1011,21 @@ Erlang code. -type af_maybe() :: {'maybe', anno(), af_body()}. -type af_maybe_else() :: {'maybe', anno(), af_body(), {'else', anno(), af_clause_seq()}}. +-type af_interpolation_kind() :: binary | list. + +-type af_interpolation_debug() :: boolean(). + +-type af_interpolation_head() :: {'interpolation_head', anno(), af_interpolation_kind(), af_interpolation_debug(), string()}. + +-type af_interpolation_cont() :: {'interpolation_cont', anno(), string()}. + +-type af_interpolation_subs() :: {'interpolation_subs', anno(), abstract_expr()}. + +-type af_interpolation_tail() :: {'interpolation_tail', anno(), string()}. + +-type af_interpolation() :: {'interpolation', anno(), af_interpolation_head(), [ af_interpolation_cont() | af_interpolation_subs() ], af_interpolation_tail()} + | {'interpolation_no_subs', anno(), af_interpolation_kind(), af_interpolation_debug(), string()}. + -type abstract_type() :: af_annotated_type() | af_atom() | af_bitstring_type() diff --git a/lib/stdlib/src/erl_scan.erl b/lib/stdlib/src/erl_scan.erl index b7975c6ed232..2e78ad79e125 100644 --- a/lib/stdlib/src/erl_scan.erl +++ b/lib/stdlib/src/erl_scan.erl @@ -102,17 +102,28 @@ -type error_description() :: term(). 
-type error_info() :: {erl_anno:location(), module(), error_description()}. +%% Whether the interpolation should evaluate to a list or a binary +-type interpolation_kind() :: binary | list. + +%% For a particular interpolation, whether it should evaluate to a list +%% or a binary, and whether it is a debug format or not (affects what +%% types of values can be formatted, and how they are rendered) +-type interpolation_state() :: {interpolation_kind(), Debug :: boolean()}. + %%% Local record. -record(erl_scan, - {resword_fun = fun reserved_word/1 :: resword_fun(), - text_fun = fun(_, _) -> false end :: text_fun(), - ws = false :: boolean(), - comment = false :: boolean(), - has_fun = false :: boolean(), + {resword_fun = fun reserved_word/1 :: resword_fun(), + text_fun = fun(_, _) -> false end :: text_fun(), + ws = false :: boolean(), + comment = false :: boolean(), + has_fun = false :: boolean(), %% True if requested to parse %ssa%-check comments - checks = false :: boolean(), + checks = false :: boolean(), %% True if we're scanning inside a %ssa%-check comment - in_check = false :: boolean()}). + in_check = false :: boolean(), + %% A stack of interpolation states, rather than just one, to + %% allow nesting of interpolations + interpolation_states = [] :: [interpolation_state()]}). %%---------------------------------------------------------------------------- @@ -407,6 +418,42 @@ scan1([$\%|Cs], St, Line, Col, Toks) when not St#erl_scan.comment -> scan1([$\%=C|Cs], St, Line, Col, Toks) -> scan_comment(Cs, St, Line, Col, Toks, [C]); %% More punctuation characters below. 
+%% Interpolated expression in a string +scan1([$~|Cs], #erl_scan{interpolation_states = [_|_]}=St, Line, Col, Toks) -> + State0 = {true,[],[],Line,Col}, + scan_interpolation(Cs, St, Line, incr_column(Col, 1), Toks, State0); +scan1("bf\""++Cs, St, Line, Col, Toks) -> + InterpolationSt = [{binary, false}|St#erl_scan.interpolation_states], + St1 = St#erl_scan{interpolation_states = InterpolationSt}, + State0 = {false,[],[],Line,Col}, + scan_interpolation(Cs, St1, Line, incr_column(Col, 3), Toks, State0); +scan1("lf\""++Cs, St, Line, Col, Toks) -> + InterpolationSt = [{list, false}|St#erl_scan.interpolation_states], + St1 = St#erl_scan{interpolation_states = InterpolationSt}, + State0 = {false,[],[],Line,Col}, + scan_interpolation(Cs, St1, Line, incr_column(Col, 3), Toks, State0); +scan1("bd\""++Cs, St, Line, Col, Toks) -> + InterpolationSt = [{binary, true}|St#erl_scan.interpolation_states], + St1 = St#erl_scan{interpolation_states = InterpolationSt}, + State0 = {false,[],[],Line,Col}, + scan_interpolation(Cs, St1, Line, incr_column(Col, 3), Toks, State0); +scan1("ld\""++Cs, St, Line, Col, Toks) -> + InterpolationSt = [{list, true}|St#erl_scan.interpolation_states], + St1 = St#erl_scan{interpolation_states = InterpolationSt}, + State0 = {false,[],[],Line,Col}, + scan_interpolation(Cs, St1, Line, incr_column(Col, 3), Toks, State0); +scan1("bf"=Cs, St, Line, Col, Toks) -> + {more,{Cs,St,Col,Toks,Line,[],fun scan/6}}; +scan1("lf"=Cs, St, Line, Col, Toks) -> + {more,{Cs,St,Col,Toks,Line,[],fun scan/6}}; +scan1("bd"=Cs, St, Line, Col, Toks) -> + {more,{Cs,St,Col,Toks,Line,[],fun scan/6}}; +scan1("ld"=Cs, St, Line, Col, Toks) -> + {more,{Cs,St,Col,Toks,Line,[],fun scan/6}}; +scan1("b"=Cs, St, Line, Col, Toks) -> + {more,{Cs,St,Col,Toks,Line,[],fun scan/6}}; +scan1("l"=Cs, St, Line, Col, Toks) -> + {more,{Cs,St,Col,Toks,Line,[],fun scan/6}}; scan1([C|_], _St, _Line, _Col0, _Toks) when not ?CHAR(C) -> error({not_character,C}); scan1([C|Cs], St, Line, Col, Toks) when C >= $A, 
C =< $Z -> @@ -828,6 +875,41 @@ scan_char([], St, Line, Col, Toks) -> scan_char(eof, _St, Line, Col, _Toks) -> scan_error(char, Line, Col, Line, incr_column(Col, 1), eof). +scan_interpolation(Cs, #erl_scan{}=St, Line, Col, Toks,{HasSubs,Wcs,Str,Line0,Col0}) -> + case scan_interpolation1(Cs, Line, Col, Str, Wcs) of + {more,Ncs,Nline,Ncol,Nstr,Nwcs} -> + State = {HasSubs,Nwcs,Nstr,Line0,Col0}, + {more,{Ncs,St,Ncol,Toks,Nline,State,fun scan_interpolation/6}}; + {char_error,Ncs,Error,Nline,Ncol,EndCol} -> + scan_error(Error, Nline, Ncol, Nline, EndCol, Ncs); + {error,Nline,Ncol,Nwcs,Ncs} -> + Estr = string:slice(Nwcs, 0, 16), % Expanded escape chars. + scan_error({string,$\",Estr}, Line0, Col0, Nline, Ncol, Ncs); + {terminated,Ncs,Nline,Ncol,_Nstr,Nwcs} -> + [{Kind,IsDebug}=_TerminatedInterpolation | OuterInterpolations] = + St#erl_scan.interpolation_states, + St1 = St#erl_scan{interpolation_states=OuterInterpolations}, + case HasSubs of + false -> + Anno = anno({Line0, Col0}), + scan1(Ncs, St1, Nline, Ncol, [{interpolation_no_subs,Anno,Kind,IsDebug,Nwcs}|Toks]); + true -> + Anno = anno({Line0, Col0}), + scan1(Ncs, St1, Nline, Ncol, [{interpolation_tail,Anno,Nwcs}|Toks]) + end; + {start_subs,Ncs,Nline,Ncol,_Nstr,Nwcs} -> + [{Kind,IsDebug}=_CurrentInterpolation | _OuterInterpolations] = + St#erl_scan.interpolation_states, + case HasSubs of + false -> + Anno = anno({Line0, Col0}), + scan1(Ncs, St, Nline, Ncol, [{interpolation_head,Anno,Kind,IsDebug,Nwcs}|Toks]); + true -> + Anno = anno({Line0, Col0}), + scan1(Ncs, St, Nline, Ncol, [{interpolation_cont,Anno,Nwcs}|Toks]) + end + end. + scan_string(Cs, #erl_scan{}=St, Line, Col, Toks, {Wcs,Str,Line0,Col0}) -> case scan_string0(Cs, St, Line, Col, $\", Str, Wcs) of %" {more,Ncs,Nline,Ncol,Nstr,Nwcs} -> @@ -935,6 +1017,45 @@ scan_string1([]=Cs, Line, Col, _Q, Str, Wcs) -> scan_string1(eof, Line, Col, _Q, _Str, Wcs) -> {error,Line,Col,lists:reverse(Wcs),eof}. 
+scan_interpolation1([$"|Cs], Line, Col, Str0, Wcs0) -> + Wcs = lists:reverse(Wcs0), + Str = lists:reverse(Str0), + {terminated, Cs,Line,incr_column(Col, 1),Str,Wcs}; +scan_interpolation1([$~|Cs], Line, Col, Str0, Wcs0) -> + Wcs = lists:reverse(Wcs0), + Str = lists:reverse(Str0), + {start_subs, Cs,Line,incr_column(Col, 1),Str,Wcs}; +scan_interpolation1([$\n=C|Cs], Line, Col, Str, Wcs) -> + Ncol = new_column(Col, 1), + scan_interpolation1(Cs, Line+1, Ncol, [C|Str], [C|Wcs]); +scan_interpolation1([$\\|Cs]=Cs0, Line, Col, Str, Wcs) -> + case scan_escape(Cs, Col) of + more -> + {more,Cs0,Line,Col,Str,Wcs}; + {error,Ncs,Error,Ncol} -> + {char_error,Ncs,Error,Line,Col,incr_column(Ncol, 1)}; + {eof,Ncol} -> + {error,Line,incr_column(Ncol, 1),lists:reverse(Wcs),eof}; + {nl,Val,ValStr,Ncs,Ncol} -> + Nstr = lists:reverse(ValStr, [$\\|Str]), + Nwcs = [Val|Wcs], + scan_interpolation1(Ncs, Line+1, Ncol, Nstr, Nwcs); + {Val,ValStr,Ncs,Ncol} -> + Nstr = lists:reverse(ValStr, [$\\|Str]), + Nwcs = [Val|Wcs], + scan_interpolation1(Ncs, Line, incr_column(Ncol, 1), Nstr, Nwcs) + end; +scan_interpolation1([C|Cs], Line, no_col=Col, Str, Wcs) when ?UNICODE(C) -> + scan_interpolation1(Cs, Line, Col, [C|Str], [C|Wcs]); +scan_interpolation1([C|Cs], Line, Col, Str, Wcs) when ?UNICODE(C) -> + scan_interpolation1(Cs, Line, Col+1, [C|Str], [C|Wcs]); +scan_interpolation1([C|Cs], Line, Col, _Str, _Wcs) when ?CHAR(C) -> + {char_error,Cs,{illegal,character},Line,Col,incr_column(Col, 1)}; +scan_interpolation1([]=Cs, Line, Col, Str, Wcs) -> + {more,Cs,Line,Col,Str,Wcs}; +scan_interpolation1(eof, Line, Col, _Str, Wcs) -> + {error,Line,Col,lists:reverse(Wcs),eof}. + -define(OCT(C), (is_integer(C) andalso $0 =< C andalso C =< $7)). 
-define(HEX(C), (is_integer(C) andalso (C >= $0 andalso C =< $9 orelse diff --git a/lib/stdlib/src/io_lib.erl b/lib/stdlib/src/io_lib.erl index 5f45165968f2..393c2ff0db1b 100644 --- a/lib/stdlib/src/io_lib.erl +++ b/lib/stdlib/src/io_lib.erl @@ -65,7 +65,9 @@ -export([print/1,print/4,indentation/2]). -export([write/1,write/2,write/3,nl/0,format_prompt/1,format_prompt/2]). +-export([write_bin/1,write_bin/2,write_bin/3,write_bin/4]). -export([write_binary/3]). +-export([write_natural/1, write_bin_natural/1]). -export([write_atom/1,write_string/1,write_string/2,write_latin1_string/1, write_latin1_string/2, write_char/1, write_latin1_char/1]). @@ -394,6 +396,146 @@ write1(T, D, E, O) when is_tuple(T) -> $}] end. +%% For printing terms with a natural string representation to a +%% unicode codepoint list string. +-spec write_natural(Term) -> list() when % list of unicode codepoints + Term :: term(). + +write_natural(Term) when is_binary(Term) -> + try + case unicode:characters_to_list(Term, utf8) of + Encoded when is_list(Encoded) -> Encoded; + _ -> erlang:error(badarg, [Term]) + end + catch _:_:_ -> + erlang:error(badarg, [Term]) + end; +write_natural(Term) when is_list(Term) -> + try + case unicode:characters_to_list(Term, unicode) of + Encoded when is_list(Encoded) -> Encoded; + _ -> erlang:error(badarg, [Term]) + end + catch _:_:_ -> + erlang:error(badarg, [Term]) + end; +write_natural(Term) when is_atom(Term) -> + atom_to_list(Term); +write_natural(Term) when is_integer(Term) -> + integer_to_list(Term); +write_natural(Term) -> + erlang:error(badarg, [Term]). + +to_bin(ListStr) -> + unicode:characters_to_binary(ListStr, unicode, utf8). + +to_bin(Prefix, ListStr) -> + L = to_bin(ListStr), + <>. + +%% For printing terms with a natural string representation to a +%% binary string. +-spec write_bin_natural(Term) -> binary() when % UTF-8 encoded binary + Term :: term(). 
+ +write_bin_natural(Term) when is_binary(Term) -> + try + case unicode:characters_to_binary(Term, utf8, utf8) of + Encoded when is_binary(Encoded) -> Encoded; + _ -> erlang:error(badarg, [Term]) + end + catch _:_:_ -> + erlang:error(badarg, [Term]) + end; +write_bin_natural(Term) when is_list(Term) -> + try + case unicode:characters_to_binary(Term, unicode, utf8) of + Encoded when is_binary(Encoded) -> Encoded; + _ -> erlang:error(badarg, [Term]) + end + catch _:_:_ -> + erlang:error(badarg, [Term]) + end; +write_bin_natural(Term) when is_atom(Term) -> + atom_to_binary(Term, utf8); +write_bin_natural(Term) when is_integer(Term) -> + integer_to_binary(Term); +write_bin_natural(Term) -> + erlang:error(badarg, [Term]). + +-spec write_bin(Term) -> binary() when % UTF-8 encoded binary + Term :: term(). + +write_bin(Term) -> + write_bin1(<<>>, Term, -1, undefined). + +-spec write_bin(Prefix, Term) -> binary() when % UTF-8 encoded binary + Prefix :: binary(), + Term :: term(). + +write_bin(Prefix, Term) -> + write_bin1(Prefix, Term, -1, undefined). + +-spec write_bin(Prefix, Term, Depth) -> binary() when % UTF-8 encoded binary + Prefix :: binary(), + Term :: term(), + Depth :: -1 | integer(). + +write_bin(Prefix, Term, Depth) -> + write_bin1(Prefix, Term, Depth, undefined). + +-spec write_bin(Prefix, Term, Depth, MapsOrder) -> binary() when % UTF-8 encoded binary + Prefix :: binary(), + Term :: term(), + Depth :: -1 | integer(), + MapsOrder :: term(). + +write_bin(Prefix, Term, Depth, MapsOrder) -> + write_bin1(Prefix, Term, Depth, MapsOrder). 
+ +% Appending to a binary is quite extensively optimised in the rest of the +% toolchain, so we attempt to make good use of that here +write_bin1(Acc, _Term, 0, _O) -> + <>; +write_bin1(Acc, Term, _D, _O) when is_integer(Term) -> + to_bin(Acc, integer_to_list(Term)); +write_bin1(Acc, Term, _D, _O) when is_float(Term) -> + to_bin(Acc, float_to_binary(Term, [short])); +write_bin1(Acc, Atom, _D, _O) when is_atom(Atom) -> + write_bin_atom(Acc, Atom); +write_bin1(Acc, Term, _D, _O) when is_port(Term) -> + to_bin(Acc, write_port(Term)); +write_bin1(Acc, Term, _D, _O) when is_pid(Term) -> + to_bin(Acc, pid_to_list(Term)); +write_bin1(Acc, Term, _D, _O) when is_reference(Term) -> + to_bin(Acc, write_ref(Term)); +write_bin1(Acc, <<_/bitstring>>=Term, D, _O) -> + write_bin_binary(Acc, Term, D); +write_bin1(Acc, [], _D, _O) -> + <>; +write_bin1(Acc, {}, _D, _O) -> + <>; +write_bin1(Acc, [H|T], D, O) -> + if + D =:= 1 -> <>; + true -> + Hd = write_bin1(<>, H, D-1, O), + Tl = write_bin_tail(Hd, T, D-1, O), + <> + end; +write_bin1(Acc, F, _D, _O) when is_function(F) -> + to_bin(Acc, erlang:fun_to_list(F)); +write_bin1(Acc, Term, D, O) when is_map(Term) -> + write_bin_map(Acc, Term, D, O); +write_bin1(Acc, T, D, O) when is_tuple(T) -> + if + D =:= 1 -> <>; + true -> + Acc1 = write_bin1(<>, element(1, T), D-1, O), + Acc2 = write_bin_tuple(Acc1, T, 2, D-1, O), + <> + end. + %% write_tail(List, Depth, Encoding) %% Test the terminating case first as this looks better with depth. @@ -409,6 +551,20 @@ write_tuple(_, _I, 1, _E, _O) -> [$, | "..."]; write_tuple(T, I, D, E, O) -> [$,,write1(element(I, T), D-1, E, O)|write_tuple(T, I+1, D-1, E, O)]. +write_bin_tail(Acc, [], _D, _O) -> Acc; +write_bin_tail(Acc, _, 1, _O) -> <>; +write_bin_tail(Acc, [H|T], D, O) -> + Acc1 = write_bin1(<>, H, D-1, O), + write_bin_tail(Acc1, T, D-1, O); +write_bin_tail(Acc, Other, D, O) -> + write_bin1(<>, Other, D-1, O). 
+ +write_bin_tuple(Acc, T, I, _D, _O) when I > tuple_size(T) -> Acc; +write_bin_tuple(Acc, _, _I, 1, _O) -> <>; +write_bin_tuple(Acc, T, I, D, O) -> + Acc1 = write_bin1(<>, element(I, T), D-1, O), + write_bin_tuple(Acc1, T, I+1, D-1, O). + write_port(Port) -> erlang:port_to_list(Port). @@ -438,6 +594,37 @@ write_map_body(I, D, D0, E, O) -> write_map_assoc(K, V, D, E, O) -> [write1(K, D, E, O)," => ",write1(V, D, E, O)]. +write_bin_map(Acc, _, 1, _O) -> + <>; +write_bin_map(Acc, Map, D, O) when is_integer(D) -> + I = maps:iterator(Map, O), + case maps:next(I) of + {K, V, NextI} -> + D0 = D - 1, + Acc1 = <>, + Acc2 = write_bin_map_assoc(Acc1, K, V, D0, O), + Acc3 = write_bin_map_body(Acc2, NextI, D0, D0, O), + <>; + none -> + <<"#{}"/utf8>> + end. + +write_bin_map_body(Acc, _, 1, _D0, _O) -> + <>; +write_bin_map_body(Acc, I, D, D0, O) -> + case maps:next(I) of + {K, V, NextI} -> + Acc1 = write_bin_map_assoc(<>, K, V, D0, O), + write_bin_map_body(Acc1, NextI, D - 1, D0, O); + none -> + Acc + end. + +write_bin_map_assoc(Acc, K, V, D, O) -> + Acc1 = write_bin1(Acc, K, D, O), + Acc2 = < "/utf8>>, + write_bin1(Acc2, V, D, O). + write_binary(B, D) when is_integer(D) -> {S, _} = write_binary(B, D, -1), S. @@ -460,6 +647,37 @@ write_binary_body(B, _D, _T, Acc) -> <> = B, {[integer_to_list(L),$:,integer_to_list(X)|Acc], <<>>}. +write_bin_binary(Acc, B, D) when is_integer(D) -> + {S, _} = write_bin_binary(Acc, B, D, -1), + S. + +write_bin_binary(Acc, B, D, T) when is_integer(T) -> + Acc1 = <>, + {S, Rest} = write_bin_binary_body(B, D, tsub(T, 4), Acc1), + {<>"/utf8>>, Rest}. 
+ +write_bin_binary_body(<<>> = B, _D, _T, Acc) -> + {Acc, B}; +write_bin_binary_body(B, D, T, Acc) when D =:= 1; T =:= 0 -> + {<>, B}; +write_bin_binary_body(<>, _D, _T, Acc) -> + {to_bin(Acc, integer_to_list(X)), <<>>}; +write_bin_binary_body(<>, D, T, Acc) -> + X1 = to_bin(integer_to_list(X)), + write_bin_binary_body( + Rest, + D-1, + tsub(T, string:length(X1) + 1), + <> + ); +write_bin_binary_body(B, _D, _T, Acc) -> + L = bit_size(B), + <> = B, + Acc1 = to_bin(Acc, integer_to_list(X)), + Acc2 = <>, + Acc3 = to_bin(Acc2, integer_to_list(L)), + {Acc3, <<>>}. + %% Make sure T does not change sign. tsub(T, _) when T < 0 -> T; tsub(T, E) when T >= E -> T - E; @@ -486,6 +704,19 @@ get_option(Key, TupleList, Default) -> write_atom(Atom) -> write_possibly_quoted_atom(Atom, fun write_string/2). +-spec write_bin_atom(Acc, Atom) -> binary() when + Acc :: binary(), + Atom :: atom(). + +write_bin_atom(Acc, Atom) -> + Bin = atom_to_binary(Atom, utf8), + case quote_bin_atom(Atom, Bin) of + true -> + write_bin_string(Acc, Bin, <<$'/utf8>>); + false -> + <> + end. + -spec write_atom_as_latin1(Atom) -> latin1_string() when Atom :: atom(). @@ -537,6 +768,33 @@ name_char($_) -> true; name_char($@) -> true; name_char(_) -> false. + +-spec quote_bin_atom(atom(), binary()) -> boolean(). + +quote_bin_atom(Atom, Bin0) -> + case erl_scan:reserved_word(Atom) of + true -> true; + false -> + case string:next_codepoint(Bin0) of + [C|Bin1] when is_integer(C), C >= $a, C =< $z -> + not name_chars_bin(Bin1); + [C|Bin1] when is_integer(C), C >= $ß, C =< $ÿ, C =/= $÷ -> + not name_chars_bin(Bin1); + [C|_] when is_integer(C) -> true; + [] -> true + end + end. + +name_chars_bin(Bin) -> + name_chars_bin1(string:next_codepoint(Bin)). + +name_chars_bin1([C|Bin1]) when is_integer(C) -> + case name_char(C) of + true -> name_chars_bin(Bin1); + false -> false + end; +name_chars_bin1([]) -> true. 
+ %%% There are two functions to write Unicode strings: %%% - they both escape control characters < 160; %%% - write_string() never escapes characters >= 160; @@ -613,6 +871,52 @@ string_char(_,C, _, Tail) when C < $\240-> %Other control characters. C3 = (C band 7) + $0, [$\\,C1,C2,C3|Tail]. +%% write_bin_string(binary(), [Char], char()) -> binary() +%% Generate the UTF-8 encoded binary needed to print a string, appended to the +%% given UTF-8 encoded prefix, using the given quotation (UTF-8 encoded binary) +%% character. + +-spec write_bin_string(binary(), string(), binary()) -> binary(). + +write_bin_string(<<_/binary>> = Acc, S, Q) -> + write_bin_string1(<>, S, Q). + +write_bin_string1(Acc, [], Q) -> + <>; +write_bin_string1(Acc, [C|Cs], Q) when is_integer(C) -> + Acc1 = string_char_bin(Acc, C, Q), + write_bin_string1(Acc1, Cs, Q). + +string_char_bin(Acc, QChar, QBin) when <> =:= QBin -> % Must check these first! + <>; +string_char_bin(Acc, $\\, _) -> + <>; +string_char_bin(Acc, C, _) when C >= $\s, C =< $~ -> + <>; +string_char_bin(Acc, C, _) when C >= $\240 -> + <>; +string_char_bin(Acc,$\n, _) -> + <>; %\n = LF +string_char_bin(Acc,$\r, _) -> + <>; %\r = CR +string_char_bin(Acc,$\t, _) -> + <>; %\t = TAB +string_char_bin(Acc,$\v, _) -> + <>; %\v = VT +string_char_bin(Acc,$\b, _) -> + <>; %\b = BS +string_char_bin(Acc,$\f, _) -> + <>; %\f = FF +string_char_bin(Acc,$\e, _) -> + <>; %\e = ESC +string_char_bin(Acc,$\d, _) -> + <>; %\d = DEL +string_char_bin(Acc,C, _) when C < $\240-> % Other control characters + C1 = (C bsr 6) + $0, + C2 = ((C bsr 3) band 7) + $0, + C3 = (C band 7) + $0, + <>. 
+ %%% There are two functions to write a Unicode character: %%% - they both escape control characters < 160; %%% - write_char() never escapes characters >= 160; diff --git a/lib/stdlib/src/stdlib.app.src b/lib/stdlib/src/stdlib.app.src index 69bff1511b0b..89c6c439a194 100644 --- a/lib/stdlib/src/stdlib.app.src +++ b/lib/stdlib/src/stdlib.app.src @@ -45,6 +45,7 @@ erl_anno, erl_bits, erl_compile, + erl_desugar_interpolation, erl_error, erl_eval, erl_expand_records, diff --git a/lib/stdlib/test/Makefile b/lib/stdlib/test/Makefile index 5d4ffcf86e9b..2e1b7d1c5a8a 100644 --- a/lib/stdlib/test/Makefile +++ b/lib/stdlib/test/Makefile @@ -27,6 +27,7 @@ MODULES= \ edlin_context_SUITE \ epp_SUITE \ erl_anno_SUITE \ + erl_desugar_interpolation_SUITE \ erl_eval_SUITE \ erl_expand_records_SUITE \ erl_internal_SUITE \ diff --git a/lib/stdlib/test/erl_desugar_interpolation_SUITE.erl b/lib/stdlib/test/erl_desugar_interpolation_SUITE.erl new file mode 100644 index 000000000000..c762d59302c3 --- /dev/null +++ b/lib/stdlib/test/erl_desugar_interpolation_SUITE.erl @@ -0,0 +1,575 @@ +-module(erl_desugar_interpolation_SUITE). + +-export([all/0,suite/0]). + +-export([empty/1, head/1, head_subs_no_tail/1, head_subs_tail/1, + head_subs_cont_subs_tail/1, multiline_head_subs_cont_subs_tail/1, + lots_of_subs/1, tilde_can_be_escaped/1, + unicode_list_string_subs_in_list/1, utf8_binary_string_subs_in_list/1, + unicode_list_string_subs_in_tuple/1, utf8_binary_string_subs_in_tuple/1, + variable_subs/1, macro_subs/1, special_characters/1, all_types_subs/1, + all_production_format_types_subs/1, floats_round_trip_and_are_the_same_between_lists_and_binaries/1, + block_substitutions/1, function_call_substitutions/1, + back_to_back_substitutions/1, + homogenous_interpolations_inside_interpolations/1, + heterogenous_interpolations_inside_interpolations/1]). + +-include_lib("stdlib/include/assert.hrl"). + +suite() -> + [{timetrap,{minutes,1}}]. 
+ +all() -> + [ empty, head, head_subs_no_tail, head_subs_tail, + head_subs_cont_subs_tail, multiline_head_subs_cont_subs_tail, + lots_of_subs, tilde_can_be_escaped, + unicode_list_string_subs_in_list, utf8_binary_string_subs_in_list, + unicode_list_string_subs_in_tuple, utf8_binary_string_subs_in_tuple, + variable_subs, macro_subs, special_characters, all_types_subs, + all_production_format_types_subs, floats_round_trip_and_are_the_same_between_lists_and_binaries, + block_substitutions, function_call_substitutions, + back_to_back_substitutions, + homogenous_interpolations_inside_interpolations, + heterogenous_interpolations_inside_interpolations + ]. + +-define(assertEqualStr(A,B), ?assertEqual(A,lists:flatten(B))). + +empty(Config) when is_list(Config) -> + ?assertEqual( + <<>>, + bf"" + ), + ?assertEqualStr( + "", + lf"" + ), + ?assertEqual( + <<>>, + bd"" + ), + ?assertEqualStr( + "", + ld"" + ). + +head(Config) when is_list(Config) -> + ?assertEqual( + <<"A head"/utf8>>, + bf"A head" + ), + ?assertEqualStr( + "A head", + lf"A head" + ), + ?assertEqual( + <<"A head"/utf8>>, + bd"A head" + ), + ?assertEqualStr( + "A head", + ld"A head" + ). + +head_subs_tail(Config) when is_list(Config) -> + ?assertEqual( + <<"Two plus two is 4!"/utf8>>, + bf"Two plus two is ~2 + 2~!" + ), + ?assertEqualStr( + "Two plus two is 4!", + lf"Two plus two is ~2 + 2~!" + ), + ?assertEqual( + <<"Two plus two is 4!"/utf8>>, + bd"Two plus two is ~2 + 2~!" + ), + ?assertEqualStr( + "Two plus two is 4!", + ld"Two plus two is ~2 + 2~!" + ). + +head_subs_no_tail(Config) when is_list(Config) -> + ?assertEqual( + <<"Two plus two is 4"/utf8>>, + bf"Two plus two is ~2 + 2~" + ), + ?assertEqualStr( + "Two plus two is 4", + lf"Two plus two is ~2 + 2~" + ), + ?assertEqual( + <<"Two plus two is 4"/utf8>>, + bd"Two plus two is ~2 + 2~" + ), + ?assertEqualStr( + "Two plus two is 4", + ld"Two plus two is ~2 + 2~" + ). 
+ +head_subs_cont_subs_tail(Config) when is_list(Config) -> + ?assertEqual( + <<"Two plus two is 4, and three times three is 9!"/utf8>>, + bf"Two plus two is ~2 + 2~, and three times three is ~3 * 3~!" + ), + ?assertEqualStr( + "Two plus two is 4, and three times three is 9!", + lf"Two plus two is ~2 + 2~, and three times three is ~3 * 3~!" + ), + ?assertEqual( + <<"Two plus two is 4, and three times three is 9!"/utf8>>, + bd"Two plus two is ~2 + 2~, and three times three is ~3 * 3~!" + ), + ?assertEqualStr( + "Two plus two is 4, and three times three is 9!", + ld"Two plus two is ~2 + 2~, and three times three is ~3 * 3~!" + ). + +multiline_head_subs_cont_subs_tail(Config) when is_list(Config) -> + ?assertEqual( + <<"Two plus two is 4, + and three times three is 9!"/utf8>>, + bf"Two plus two is ~2 + 2~, + and three times three is ~3 * 3~!" + ), + ?assertEqualStr( + "Two plus two is 4, + and three times three is 9!", + lf"Two plus two is ~2 + 2~, + and three times three is ~3 * 3~!" + ), + ?assertEqual( + <<"Two plus two is 4, + and three times three is 9!"/utf8>>, + bd"Two plus two is ~2 + 2~, + and three times three is ~3 * 3~!" + ), + ?assertEqualStr( + "Two plus two is 4, + and three times three is 9!", + ld"Two plus two is ~2 + 2~, + and three times three is ~3 * 3~!" + ). + +lots_of_subs(Config) when is_list(Config) -> + ?assertEqual( + <<"A 4, B 6, C 81, D -6, E 0, F 8"/utf8>>, + bf"A ~2 + 2~, B ~3 + 3~, C ~9 * 9~, D ~-6~, E ~0~, F ~2 * (2 + 2)~" + ), + ?assertEqualStr( + "A 4, B 6, C 81, D -6, E 0, F 8", + lf"A ~2 + 2~, B ~3 + 3~, C ~9 * 9~, D ~-6~, E ~0~, F ~2 * (2 + 2)~" + ), + ?assertEqual( + <<"A 4, B 6, C 81, D -6, E 0, F 8"/utf8>>, + bd"A ~2 + 2~, B ~3 + 3~, C ~9 * 9~, D ~-6~, E ~0~, F ~2 * (2 + 2)~" + ), + ?assertEqualStr( + "A 4, B 6, C 81, D -6, E 0, F 8", + ld"A ~2 + 2~, B ~3 + 3~, C ~9 * 9~, D ~-6~, E ~0~, F ~2 * (2 + 2)~" + ). 
+ +tilde_can_be_escaped(Config) when is_list(Config) -> + ?assertEqual( + <<"Two plus t\~wo is ~4!"/utf8>>, + bf"Two plus t\~wo is \~~2 + 2~!" + ), + ?assertEqualStr( + "Two plus t\~wo is ~4!", + lf"Two plus t\~wo is \~~2 + 2~!" + ), + ?assertEqual( + <<"Two plus t\~wo is ~4!"/utf8>>, + bd"Two plus t\~wo is \~~2 + 2~!" + ), + ?assertEqualStr( + "Two plus t\~wo is ~4!", + ld"Two plus t\~wo is \~~2 + 2~!" + ). + +unicode_list_string_subs_in_list(Config) when is_list(Config) -> + ?assertEqual( + <<"Emoji: 🙂👍"/utf8>>, + bf"Emoji: ~[$🙂, $👍]~" + ), + ?assertEqualStr( + "Emoji: 🙂👍", + lf"Emoji: ~[$🙂, $👍]~" + ), + ?assertEqual( + <<"Emoji: [128578,128077]"/utf8>>, + bd"Emoji: ~[$🙂, $👍]~" + ), + ?assertEqualStr( + "Emoji: [128578,128077]", + ld"Emoji: ~[$🙂, $👍]~" + ). + +utf8_binary_string_subs_in_list(Config) when is_list(Config) -> + ?assertEqual( + <<"Emoji: 🙂👍"/utf8>>, + bf"Emoji: ~<<"🙂👍"/utf8>>~" + ), + ?assertEqualStr( + "Emoji: 🙂👍", + lf"Emoji: ~<<"🙂👍"/utf8>>~" + ), + ?assertEqual( + <<"Emoji: <<240,159,153,130,240,159,145,141>>"/utf8>>, + bd"Emoji: ~<<"🙂👍"/utf8>>~" + ), + ?assertEqualStr( + "Emoji: <<240,159,153,130,240,159,145,141>>", + ld"Emoji: ~<<"🙂👍"/utf8>>~" + ). + +unicode_list_string_subs_in_tuple(Config) when is_list(Config) -> + ?assertError( + badarg, + bf"Emoji: ~{a,[$🙂,$👍],c}~" + ), + ?assertError( + badarg, + lf"Emoji: ~{a,[$🙂,$👍],c}~" + ), + ?assertEqual( + <<"Emoji: {a,[128578,128077],c}"/utf8>>, + bd"Emoji: ~{a,[$🙂,$👍],c}~" + ), + ?assertEqualStr( + "Emoji: {a,[128578,128077],c}", + ld"Emoji: ~{a,[$🙂,$👍],c}~" + ). 
+ +utf8_binary_string_subs_in_tuple(Config) when is_list(Config) -> + ?assertError( + badarg, + bf"Emoji: ~{a,<<"🙂👍"/utf8>>,c}~" + ), + ?assertError( + badarg, + lf"Emoji: ~{a,<<"🙂👍"/utf8>>,c}~" + ), + ?assertEqual( + <<"Emoji: {a,<<240,159,153,130,240,159,145,141>>,c}"/utf8>>, + bd"Emoji: ~{a,<<"🙂👍"/utf8>>,c}~" + ), + ?assertEqualStr( + "Emoji: {a,<<240,159,153,130,240,159,145,141>>,c}", + ld"Emoji: ~{a,<<"🙂👍"/utf8>>,c}~" + ). + +variable_subs(Config) when is_list(Config) -> + X = 2 + 2, + ?assertEqual( + <<"X: 4"/utf8>>, + bf"X: ~X~" + ), + ?assertEqualStr( + "X: 4", + lf"X: ~X~" + ), + ?assertEqual( + <<"X: 4"/utf8>>, + bd"X: ~X~" + ), + ?assertEqualStr( + "X: 4", + ld"X: ~X~" + ). + +macro_subs(Config) when is_list(Config) -> + ?assertEqual( + <<"This module: erl_desugar_interpolation_SUITE"/utf8>>, + bf"This module: ~?MODULE~" + ), + ?assertEqualStr( + "This module: erl_desugar_interpolation_SUITE", + lf"This module: ~?MODULE~" + ), + ?assertEqual( + <<"This module: erl_desugar_interpolation_SUITE"/utf8>>, + bd"This module: ~?MODULE~" + ), + ?assertEqualStr( + "This module: erl_desugar_interpolation_SUITE", + ld"This module: ~?MODULE~" + ). + +special_characters(Config) when is_list(Config) -> + ?assertEqual( + <<"\nstuff\tmore stuff: \neven more stuff"/utf8>>, + bf"\nstuff\tmore stuff: ~<<"\n"/utf8>>~even more stuff" + ), + ?assertEqualStr( + "\nstuff\tmore stuff: \neven more stuff", + lf"\nstuff\tmore stuff: ~<<"\n"/utf8>>~even more stuff" + ), + ?assertEqual( + <<"\nstuff\tmore stuff: <<10>>even more stuff"/utf8>>, + bd"\nstuff\tmore stuff: ~<<"\n"/utf8>>~even more stuff" + ), + ?assertEqualStr( + "\nstuff\tmore stuff: <<10>>even more stuff", + ld"\nstuff\tmore stuff: ~<<"\n"/utf8>>~even more stuff" + ). 
+ +back_to_back_substitutions(Config) when is_list(Config) -> + ?assertEqual( + <<"Here are several expression back-to-back 3my_atomfoo"/utf8>>, + bf"Here are several expression back-to-back ~1+2~~my_atom~~<<"foo"/utf8>>~" + ), + ?assertEqualStr( + "Here are several expression back-to-back 3my_atomfoo", + lf"Here are several expression back-to-back ~1+2~~my_atom~~<<"foo"/utf8>>~" + ), + ?assertEqual( + <<"Here are several expression back-to-back 3my_atom<<102,111,111>>"/utf8>>, + bd"Here are several expression back-to-back ~1+2~~my_atom~~<<"foo"/utf8>>~" + ), + ?assertEqualStr( + "Here are several expression back-to-back 3my_atom<<102,111,111>>", + ld"Here are several expression back-to-back ~1+2~~my_atom~~<<"foo"/utf8>>~" + ). + +all_production_format_types_subs(Config) when is_list(Config) -> + ?assertEqual( + <<"integer: 2 + atom: foo + list-string: a string + binary-string: another string">>, + bf"integer: ~1 + 1~ + atom: ~foo~ + list-string: ~"a string"~ + binary-string: ~<<"another string"/utf8>>~" + ), + ?assertEqualStr( + "integer: 2 + atom: foo + list-string: a string + binary-string: another string", + lf"integer: ~1 + 1~ + atom: ~foo~ + list-string: ~"a string"~ + binary-string: ~<<"another string"/utf8>>~" + ), + ?assertEqual( + <<"integer: 2 + atom: foo + list-string: [97,32,115,116,114,105,110,103] + binary-string: <<97,110,111,116,104,101,114,32,115,116,114,105,110,103>>">>, + bd"integer: ~1 + 1~ + atom: ~foo~ + list-string: ~"a string"~ + binary-string: ~<<"another string"/utf8>>~" + ), + ?assertEqualStr( + "integer: 2 + atom: foo + list-string: [97,32,115,116,114,105,110,103] + binary-string: <<97,110,111,116,104,101,114,32,115,116,114,105,110,103>>", + ld"integer: ~1 + 1~ + atom: ~foo~ + list-string: ~"a string"~ + binary-string: ~<<"another string"/utf8>>~" + ). 
+ +floats_round_trip_and_are_the_same_between_lists_and_binaries(Config) when is_list(Config) -> + ?assertError( + badarg, + bf"A float: ~1000000000000 + 0.1 + 0.2~" + ), + ?assertError( + badarg, + lf"A float: ~1000000000000 + 0.1 + 0.2~" + ), + ?assertEqual( + <<"A float: 1000000000000.2999"/utf8>>, + bd"A float: ~1000000000000 + 0.1 + 0.2~" + ), + ?assertEqualStr( + "A float: 1000000000000.2999", + ld"A float: ~1000000000000 + 0.1 + 0.2~" + ), + ?assertEqual( + 1000000000000 + 0.1 + 0.2, + binary_to_float(string:trim(bd"~1000000000000 + 0.1 + 0.2~", both, [$"])) + ), + ?assertEqual( + 1000000000000 + 0.1 + 0.2, + list_to_float(string:trim(ld"~1000000000000 + 0.1 + 0.2~", both, [$"])) + ), + ?assertEqual( + binary_to_float(string:trim(bd"~1000000000000 + 0.1 + 0.2~", both, [$"])), + list_to_float(string:trim(ld"~1000000000000 + 0.1 + 0.2~", both, [$"])) + ). + +all_types_subs(Config) when is_list(Config) -> + ?assertError( + badarg, + bf"integer: ~1 + 1~ + float: ~2.0 * 2.0~ + atom: ~foo~ + port: ~open_port({spawn, ls}, [])~ + pid: ~self()~ + ref: ~make_ref()~ + bitstring: ~<<6:4>>~ + list: ~[1,2,3]~ + tuple: ~{1,b,3.0}~ + map: ~#{key => value}~ + function: ~fun (X) -> X + 1 end~" + ), + ?assertError( + badarg, + lf"integer: ~1 + 1~ + float: ~2.0 * 2.0~ + atom: ~foo~ + port: ~open_port({spawn, ls}, [])~ + pid: ~self()~ + ref: ~make_ref()~ + bitstring: ~<<6:4>>~ + list: ~[1,2,3]~ + tuple: ~{1,b,3.0}~ + map: ~#{key => value}~ + function: ~fun (X) -> X + 1 end~" + ), + InterpolatedBd = + bd"integer: ~1 + 1~ + float: ~2.0 * 2.0~ + atom: ~foo~ + port: ~open_port({spawn, ls}, [])~ + pid: ~self()~ + ref: ~make_ref()~ + bitstring: ~<<6:4>>~ + list: ~[1,2,3]~ + tuple: ~{1,b,3.0}~ + map: ~#{key => value}~ + function: ~fun (X) -> X + 1 end~", + ?assert( + is_binary(InterpolatedBd) + ), + ?assertMatch( + {match, _}, + re:run( + InterpolatedBd, + <<"integer: 2 + float: 4\\.0 + atom: foo + port: #Port<.*> + pid: <.*> + ref: #Ref<.*> + bitstring: <<6:4>> + list: \\[1,2,3\\] + 
tuple: \\{1,b,3\.0\\} + map: #\\{key => value\\} + function: #Fun">>, + [multiline] + ), + lists:flatten(io_lib:format("Interpolation (binary, debug) result was:~n~ts~n", [InterpolatedBd])) + ), + InterpolatedLd = + ld"integer: ~1 + 1~ + float: ~2.0 * 2.0~ + atom: ~foo~ + port: ~open_port({spawn, ls}, [])~ + pid: ~self()~ + ref: ~make_ref()~ + bitstring: ~<<6:4>>~ + list: ~[1,2,3]~ + tuple: ~{1,b,3.0}~ + map: ~#{key => value}~ + function: ~fun (X) -> X + 1 end~", + ?assert( + is_list(InterpolatedLd) + ), + ?assertMatch( + {match, _}, + re:run( + InterpolatedLd, + "integer: 2 + float: 4\\.0 + atom: foo + port: #Port<.*> + pid: <.*> + ref: #Ref<.*> + bitstring: <<6:4>> + list: \\[1,2,3\\] + tuple: \\{1,b,3\.0\\} + map: #\\{key => value\\} + function: #Fun", + [multiline] + ), + lists:flatten(io_lib:format("Interpolation (list, debug) result was:~n~ts~n", [InterpolatedLd])) + ). + +function_call_substitutions(Config) when is_list(Config) -> + ?assertEqual( + <<"erlang:is_binary(\"hi\") = false"/utf8>>, + bf"erlang:is_binary(\"hi\") = ~erlang:is_binary("hi")~" + ), + ?assertEqualStr( + "erlang:is_binary(\"hi\") = false", + lf"erlang:is_binary(\"hi\") = ~erlang:is_binary("hi")~" + ), + ?assertEqual( + <<"erlang:is_binary(\"hi\") = false"/utf8>>, + bd"erlang:is_binary(\"hi\") = ~erlang:is_binary("hi")~" + ), + ?assertEqualStr( + "erlang:is_binary(\"hi\") = false", + ld"erlang:is_binary(\"hi\") = ~erlang:is_binary("hi")~" + ). + +block_substitutions(Config) when is_list(Config) -> + ?assertEqual( + <<"block value: 7"/utf8>>, + bf"block value: ~begin A = 6, (fun (X) -> X + 1 end)(A) end~" + ), + ?assertEqualStr( + "block value: 7", + lf"block value: ~begin A = 6, (fun (X) -> X + 1 end)(A) end~" + ), + ?assertEqual( + <<"block value: 7"/utf8>>, + bd"block value: ~begin A = 6, (fun (X) -> X + 1 end)(A) end~" + ), + ?assertEqualStr( + "block value: 7", + ld"block value: ~begin A = 6, (fun (X) -> X + 1 end)(A) end~" + ). 
+ +homogenous_interpolations_inside_interpolations(Config) when is_list(Config) -> + ?assertEqual( + <<"Yo dawg, I heard you like string interpolations, so I put a string in your string so you can interpolate while you interpolate"/utf8>>, + bf"Yo dawg, I heard you like string interpolations, ~bf"so I put a ~bf"string in your"~ string"~ so you can interpolate while you interpolate" + ), + ?assertEqualStr( + "Yo dawg, I heard you like string interpolations, so I put a string in your string so you can interpolate while you interpolate", + lf"Yo dawg, I heard you like string interpolations, ~lf"so I put a ~lf"string in your"~ string"~ so you can interpolate while you interpolate" + ), + ?assertEqual( + <<"Yo dawg, I heard you like string interpolations, <<115,111,32,73,32,112,117,116,32,97,32,60,60,49,49,53,44,49,49,54,44,49,49,52,44,49,48,53,44,49,49,48,44,49,48,51,44,51,50,44,49,48,53,44,49,49,48,44,51,50,44,49,50,49,44,49,49,49,44,49,49,55,44,49,49,52,62,62,32,115,116,114,105,110,103>> so you can interpolate while you interpolate"/utf8>>, + bd"Yo dawg, I heard you like string interpolations, ~bd"so I put a ~bd"string in your"~ string"~ so you can interpolate while you interpolate" + ), + ?assertEqualStr( + "Yo dawg, I heard you like string interpolations, [[115,111,32,73,32,112,117,116,32,97,32],[91,[[49,49,53],44,[49,49,54],44,[49,49,52],44,[49,48,53],44,[49,49,48],44,[49,48,51],44,[51,50],44,[49,48,53],44,[49,49,48],44,[51,50],44,[49,50,49],44,[49,49,49],44,[49,49,55],44,[49,49,52]],93],[32,115,116,114,105,110,103]] so you can interpolate while you interpolate", + ld"Yo dawg, I heard you like string interpolations, ~ld"so I put a ~ld"string in your"~ string"~ so you can interpolate while you interpolate" + ). 
+ +heterogenous_interpolations_inside_interpolations(Config) when is_list(Config) -> + ?assertEqual( + <<"Yo dawg, I heard you like string interpolations, so I put a string in your string so you can interpolate while you interpolate"/utf8>>, + bf"Yo dawg, I heard you like string interpolations, ~lf"so I put a ~bd"string in your"~ string"~ so you can interpolate while you interpolate" + ), + ?assertEqualStr( + "Yo dawg, I heard you like string interpolations, so I put a string in your string so you can interpolate while you interpolate", + lf"Yo dawg, I heard you like string interpolations, ~bf"so I put a ~ld"string in your"~ string"~ so you can interpolate while you interpolate" + ), + ?assertEqual( + <<"Yo dawg, I heard you like string interpolations, [[115,111,32,73,32,112,117,116,32,97,32],[60,60,[[49,49,53],44,[49,49,54],44,[49,49,52],44,[49,48,53],44,[49,49,48],44,[49,48,51],44,[51,50],44,[49,48,53],44,[49,49,48],44,[51,50],44,[49,50,49],44,[49,49,49],44,[49,49,55],44,[49,49,52]],62,62],[32,115,116,114,105,110,103]] so you can interpolate while you interpolate"/utf8>>, + bd"Yo dawg, I heard you like string interpolations, ~ld"so I put a ~bf"string in your"~ string"~ so you can interpolate while you interpolate" + ), + ?assertEqualStr( + "Yo dawg, I heard you like string interpolations, <<115,111,32,73,32,112,117,116,32,97,32,91,49,49,53,44,49,49,54,44,49,49,52,44,49,48,53,44,49,49,48,44,49,48,51,44,51,50,44,49,48,53,44,49,49,48,44,51,50,44,49,50,49,44,49,49,49,44,49,49,55,44,49,49,52,93,32,115,116,114,105,110,103>> so you can interpolate while you interpolate", + ld"Yo dawg, I heard you like string interpolations, ~bd"so I put a ~lf"string in your"~ string"~ so you can interpolate while you interpolate" + ). 
diff --git a/system/doc/reference_manual/data_types.xml b/system/doc/reference_manual/data_types.xml index 6cbf864a799a..efa527c026e2 100644 --- a/system/doc/reference_manual/data_types.xml +++ b/system/doc/reference_manual/data_types.xml @@ -358,7 +358,7 @@ a
String -

Strings are enclosed in double quotes ("), but is not a +

String literals are enclosed in double quotes ("), but a string is not a data type in Erlang. Instead, a string "hello" is shorthand for the list [$h,$e,$l,$l,$o], that is, [104,101,108,108,111].

@@ -370,6 +370,69 @@ a

is equivalent to

 "string42"
+ +
+ String interpolation +

Interpolated strings are string literals which allow other + expressions to be embedded into them. Interpolated strings format + the expressions they contain, and substitute their values into + the resulting string. Any expression can be placed inside an interpolated + string, including variables, function calls, macros, comprehensions, + and even other interpolated strings. +

+ +

Interpolated strings offer a more readable alternative to + io_lib:format/2 + and related formatting functions by keeping each interpolated + expression in the position where its formatted value + appears in the resulting string. +

+ +

Example:

+
+Name = "Alice".
+
+get_unread_messages() ->
+  [ {"Bob", "Let's grab lunch after the meeting?"},
+    {"Charlie", "When would you like that report submitting?"}
+  ].
+
+1> lists:flatten(io_lib:format("Hello ~ts, you have ~B unread messages.", [Name, length(get_unread_messages())])).
+"Hello Alice, you have 2 unread messages."
+2> lf"Hello ~Name~, you have ~length(get_unread_messages())~ unread messages."
+"Hello Alice, you have 2 unread messages."
+
+ +

There are four forms of interpolated strings, differentiated by their prefix:

+ + lf"..." ("list format"), for strictly formatting unicode-codepoint-list strings + bf"..." ("binary format"), for strictly formatting UTF-8-encoded binary strings + ld"..." ("list debug"), for flexibly formatting unicode-codepoint-list strings + bd"..." ("binary debug"), for flexibly formatting UTF-8-encoded binary strings + + +

The list variants are typically most useful for backwards compatibility + and convenience. The binary variants are typically most useful for + memory-compactness and interacting with external systems. +

+ +

Whilst the list and binary variants yield unicode codepoint lists + and UTF-8-encoded binaries respectively, each variant accepts both kinds + of string as interpolated values. +
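Accepting both representations mirrors how unicode:chardata() already behaves in the standard library. The following is a minimal sketch in plain Erlang, without the interpolation syntax, so it should run on a stock OTP release. The module and function names are hypothetical and are not part of this patch.

```erlang
-module(chardata_sketch).
-export([to_list/1, to_binary/1]).

%% Hypothetical helpers: normalise mixed chardata (codepoint lists and
%% UTF-8 binaries, possibly nested) into a single representation.

%% Yields a flat list of unicode codepoints.
to_list(Chardata) ->
    unicode:characters_to_list(Chardata).

%% Yields a UTF-8-encoded binary.
to_binary(Chardata) ->
    unicode:characters_to_binary(Chardata).
```

For example, chardata_sketch:to_list(["Hello ", <<"Alice">>]) returns "Hello Alice", and chardata_sketch:to_binary(["Hello ", <<"Alice">>]) returns <<"Hello Alice">>.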

+ +

The lf"..." and bf"..." variants accept only unicode + codepoint lists, UTF-8-encoded binaries or integers as interpolated values, + with the expectation that more complex formatting decisions would be + factored out into interpolated function calls which can be tested and reused + in isolation. +
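The acceptance rule for the strict variants can be approximated in plain Erlang. This is only a sketch of the rule as described above, not the implementation in this patch; the module name strict_format_sketch and the function format_value/1 are hypothetical.

```erlang
-module(strict_format_sketch).
-export([format_value/1]).

%% Hypothetical approximation of the strict rule: integers, codepoint
%% lists and UTF-8 binaries are accepted; any other term raises badarg.

%% Integers are rendered in decimal.
format_value(V) when is_integer(V) ->
    integer_to_list(V);
%% Lists and binaries must be valid chardata.
format_value(V) when is_list(V); is_binary(V) ->
    case unicode:characters_to_list(V) of
        Chars when is_list(Chars) -> Chars;
        _ErrorOrIncomplete -> error(badarg)
    end;
%% Floats, atoms, tuples, maps, pids and so on are rejected.
format_value(_Other) ->
    error(badarg).
```

With this sketch, format_value(42) returns "42" and format_value(<<"hi">>) returns "hi", while format_value(1.5) raises badarg, which pushes the caller towards an explicit formatting function.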

+ +

The ld"..." and bd"..." variants accept any Erlang term, + and are intended for logging and debugging. +
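Judging only by the expected values in this patch's test suite (for example, tuples and lists printed without spaces, and binaries printed as byte lists), the debug output resembles what the ~w control sequence of io_lib:format/2 produces. The sketch below rests on that assumption and is not the patch's implementation; the module and function names are hypothetical.

```erlang
-module(debug_format_sketch).
-export([debug_value/1]).

%% Hypothetical approximation: render any term with the standard ~w
%% control sequence and flatten the result into a plain string.
debug_value(Term) ->
    lists:flatten(io_lib:format("~w", [Term])).
```

For example, debug_format_sketch:debug_value({1, b, [2, 3]}) returns "{1,b,[2,3]}".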

+ +