Commit c1e5a31

Merge pull request #117 from mlabs-haskell/jared/116-strange-parses
Fixing strange parses
2 parents 938e56d + 72086d2 commit c1e5a31

File tree

9 files changed

+671
-157
lines changed

_typos.toml

+3-2
@@ -1,11 +1,12 @@
11
[default.extend-words]
22
substituters = "substituters"
3-
hask= "hask"
3+
hask = "hask"
4+
Nd = "Nd"
45

56
[type.pdf]
67
extend-glob = ["*.pdf"]
78
check-file = false
89

910
[type.png]
1011
extend-glob = ["*.png"]
11-
check-file = false
12+
check-file = false

docs/SUMMARY.md

+1
@@ -6,6 +6,7 @@
66
- [LambdaBuffers to Purescript](purescript.md)
77
- [Design](design.md)
88
- [API](api.md)
9+
- [LambdaBuffers Frontend (.lbf) syntax](syntax.md)
910
- [Compiler](compiler.md)
1011
- [Codegen](codegen.md)
1112
- [Command line interface](command-line-interface.md)

docs/syntax.md

+254
@@ -0,0 +1,254 @@
1+
# LambdaBuffers Frontend (.lbf) syntax
2+
3+
The input to the LambdaBuffers Frontend is a text file containing a module that specifies the types and type class instances you want to generate. This chapter gives the exact syntax of a LambdaBuffers Frontend file and informally describes the meaning of its syntactic constructs.
4+
5+
The name of a LambdaBuffers Frontend file must end with `.lbf`, so such a file may also be referred to as a .lbf file or a .lbf schema.
6+
7+
## Notation
8+
9+
In the following description of a LambdaBuffers Frontend file's syntax, we use a BNF notation similar to that of [Section 10.1 of the Haskell Report](https://www.haskell.org/onlinereport/haskell2010/). The following notational conventions are used for presenting syntax.
10+
11+
| Syntax | Description |
12+
| ------------- | --------------------------------------------------------------------------- |
13+
| `[pattern]` | optional |
14+
| `{pattern}` | zero or more repetitions |
15+
| `(pattern)` | grouping |
16+
| `pat1⎮pat2` | choice |
17+
| `pat1\pat2` | difference -- elements generated by `pat1` except those generated by `pat2` |
18+
| `'terminal'` | terminal syntax surrounded by single quotes |
19+
20+
<!-- Apparently, `mdbook`'s markdown can't escape the vertical bar in codeblocks in a table....
21+
So, we're using code point U+23AE to look like a vertical bar when it really isn't...
22+
23+
| `pat1|pat2` | choice |
24+
-->
25+
26+
Note that the terminal syntax permits C-style escape sequences, e.g., `'\n'` denotes line feed (newline) and `'\r'` denotes carriage return.
27+
28+
Productions will be of the form:
29+
30+
```text
31+
nonterm -> alt1 | ... | altn
32+
```
33+
34+
## Input file representation
35+
36+
The input file is Unicode text where the encoding is subject to the system locale. We will often use the unqualified term *character* to refer to a Unicode code point in the input file.
37+
38+
## Characters
39+
40+
The following terms are used to denote specific Unicode character categories:
41+
42+
- `upper` denotes a Unicode code point categorized as an uppercase letter or titlecase letter (i.e., with General Category value Lt or Lu).
43+
44+
- `lower` denotes a Unicode code point categorized as a lower-case letter (i.e., with General Category value Ll).
45+
46+
- `alphanum` denotes either `upper` or `lower`; or a Unicode code point categorized as a modifier letter, other letter, decimal digit number, letter number, or other number (i.e., with General Category value Lt, Lu, Ll, Lm, Lo, Nd, Nl or No).
47+
48+
- `space` denotes a Unicode code point categorized as a separator space (i.e., with General Category value Zs), or any of the control characters `'\t'`, `'\n'`, `'\r'`, `'\f'`, or `'\v'`.
49+
50+
Interested readers may find details of Unicode character categories in [Section 4.5 of The Unicode Standard 15.1.0](https://www.unicode.org/versions/Unicode15.1.0/), and the [Unicode Character Database](https://unicode.org/ucd/).
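
As a rough illustration of the four character classes above, the following Python sketch maps each class onto the General Category values reported by the standard `unicodedata` module. The function names (`is_upper`, `is_lower`, `is_alphanum`, `is_space`) are made up for this example, and this is not how the LambdaBuffers Frontend itself is implemented.

```python
import unicodedata

# General Category values taken from the definitions above.
UPPER_CATS = {"Lu", "Lt"}
LOWER_CATS = {"Ll"}
ALPHANUM_CATS = UPPER_CATS | LOWER_CATS | {"Lm", "Lo", "Nd", "Nl", "No"}
CONTROL_SPACES = {"\t", "\n", "\r", "\f", "\v"}


def is_upper(ch: str) -> bool:
    """`upper`: uppercase or titlecase letter (Lu or Lt)."""
    return unicodedata.category(ch) in UPPER_CATS


def is_lower(ch: str) -> bool:
    """`lower`: lowercase letter (Ll)."""
    return unicodedata.category(ch) in LOWER_CATS


def is_alphanum(ch: str) -> bool:
    """`alphanum`: any letter or number category listed above."""
    return unicodedata.category(ch) in ALPHANUM_CATS


def is_space(ch: str) -> bool:
    """`space`: separator space (Zs) or one of the five control characters."""
    return unicodedata.category(ch) == "Zs" or ch in CONTROL_SPACES
```

Each function expects a single code point. Note that an underscore is not `alphanum` under these definitions, because its General Category is Pc.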
51+
52+
## Lexical syntax
53+
54+
Tokens form the vocabulary of LambdaBuffers Frontend files. The classes of tokens are defined as follows.
55+
56+
```text
57+
keyword -> 'module' | 'sum' | 'prod' | 'record'
58+
| 'opaque' | 'class' | 'instance' | 'import'
59+
| 'qualified' | 'as'
60+
modulename -> uppercamelcase
61+
longmodulename -> modulealias modulename
62+
typename -> uppercamelcase
63+
fieldname -> lowercamelcase\keyword
64+
longtypename -> modulealias typename
65+
varname -> lowers\keyword
66+
punctuation -> '<=' | ',' | '(' | ')' | '{' | '}'
67+
| ':' | ':-' | '=' | '|'
68+
classname -> uppercamelcase
69+
longclassname -> modulealias uppercamelcase
70+
```
71+
72+
where
73+
74+
```text
75+
uppercamelcase -> upper { alphanum }
76+
lowercamelcase -> lower { alphanum }
77+
modulealias -> { uppercamelcase '.' }
78+
lowers -> lower { lower }
79+
```
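
The name classes above, including the difference notation used by `varname` and `fieldname`, can be sketched as Python predicates over a single candidate lexeme. All function names here are hypothetical, and the sketch assumes a long name is written without whitespace around its dots; it is an illustration of the productions, not the Frontend's lexer.

```python
import unicodedata

KEYWORDS = {"module", "sum", "prod", "record", "opaque",
            "class", "instance", "import", "qualified", "as"}
UPPER = {"Lu", "Lt"}
LOWER = {"Ll"}
ALPHANUM = UPPER | LOWER | {"Lm", "Lo", "Nd", "Nl", "No"}


def _cats(s: str) -> list:
    """General Category of every code point in s."""
    return [unicodedata.category(c) for c in s]


def is_uppercamelcase(s: str) -> bool:
    # uppercamelcase -> upper { alphanum }
    cats = _cats(s)
    return bool(cats) and cats[0] in UPPER and all(c in ALPHANUM for c in cats[1:])


def is_lowercamelcase(s: str) -> bool:
    # lowercamelcase -> lower { alphanum }
    cats = _cats(s)
    return bool(cats) and cats[0] in LOWER and all(c in ALPHANUM for c in cats[1:])


def is_lowers(s: str) -> bool:
    # lowers -> lower { lower }
    cats = _cats(s)
    return bool(cats) and all(c in LOWER for c in cats)


def is_varname(s: str) -> bool:
    # varname -> lowers\keyword
    return is_lowers(s) and s not in KEYWORDS


def is_fieldname(s: str) -> bool:
    # fieldname -> lowercamelcase\keyword
    return is_lowercamelcase(s) and s not in KEYWORDS


def is_longtypename(s: str) -> bool:
    # longtypename -> { uppercamelcase '.' } typename
    return all(is_uppercamelcase(part) for part in s.split("."))
```

For instance, `as` consists only of lowercase letters but is rejected as a `varname` because it is a keyword, and `aB` is rejected because `varname` admits lowercase letters only.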
80+
81+
Input files are broken into *tokens* using the *maximal munch* rule, i.e., at each point, the next token is the longest sequence of characters that forms a valid token. `space`s and line comments are ignored except insofar as they separate tokens that would otherwise combine into a single token.
82+
83+
### Line comments
84+
85+
A *line comment* starts with the terminal `'--'` followed by zero or more printable Unicode characters stopping at the first end of line (`'\n'` or `'\r\n'`).
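
To make the maximal munch rule and the treatment of line comments concrete, here is a minimal tokenizer sketch in Python. It is deliberately simplified and is not the actual LambdaBuffers lexer: it labels every non-keyword word as a generic `name` (dots included) instead of distinguishing the name classes, and the `tokenize` function and its token labels are assumptions made for this example.

```python
import unicodedata

KEYWORDS = {"module", "sum", "prod", "record", "opaque",
            "class", "instance", "import", "qualified", "as"}
# Longer punctuation first, so that ':-' wins over ':' (maximal munch).
PUNCTUATION = ("<=", ":-", ",", "(", ")", "{", "}", ":", "=", "|")
LETTER = {"Lu", "Lt", "Ll"}
ALPHANUM = LETTER | {"Lm", "Lo", "Nd", "Nl", "No"}


def is_space(ch: str) -> bool:
    return unicodedata.category(ch) == "Zs" or ch in {"\t", "\n", "\r", "\f", "\v"}


def tokenize(src: str) -> list:
    tokens = []
    i = 0
    while i < len(src):
        ch = src[i]
        if is_space(ch):
            i += 1  # whitespace only separates tokens
        elif src.startswith("--", i):
            # Line comment: skip up to (not including) the end of line.
            while i < len(src) and src[i] != "\n":
                i += 1
        elif unicodedata.category(ch) in LETTER:
            # Take the longest run of name characters before classifying it.
            j = i + 1
            while j < len(src) and (src[j] == "." or unicodedata.category(src[j]) in ALPHANUM):
                j += 1
            word = src[i:j]
            tokens.append(("keyword" if word in KEYWORDS else "name", word))
            i = j
        else:
            punct = next((p for p in PUNCTUATION if src.startswith(p, i)), None)
            if punct is None:
                raise ValueError(f"unexpected character {ch!r} at offset {i}")
            tokens.append(("punctuation", punct))
            i += len(punct)
    return tokens
```

Because the whole word is consumed before it is compared against the keyword list, `instance` is read as a single keyword and `importer` as a single name, never as the keyword `import` followed by leftover characters.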
86+
87+
## Syntax of LambdaBuffers Frontend files
88+
89+
A LambdaBuffers Frontend file defines a module that is a collection of data types, classes, instance clauses, and derive clauses.
90+
91+
The overall layout of a LambdaBuffers Frontend file is:
92+
93+
```text
94+
module -> 'module' longmodulename { import } { statement }
95+
```
96+
97+
The file must specify the module's `longmodulename`, whose `modulename` must match the name of the LambdaBuffers Frontend file without the `.lbf` extension.
98+
After that, the file may contain a sequence of `import`s followed by a sequence of `statement`s.
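
The file name requirement can be sketched as a small Python check. The helper `module_matches_file` is hypothetical, and it compares only the final `modulename` component with the file name, as the text states; whether the directory layout must also mirror the module alias is not specified here, so the sketch does not check it.

```python
from pathlib import PurePosixPath


def module_matches_file(longmodulename: str, path: str) -> bool:
    """Check that the last component of the module name equals the
    file name with its `.lbf` extension removed."""
    p = PurePosixPath(path)
    if p.suffix != ".lbf":
        return False
    # longmodulename -> modulealias modulename; keep only the modulename.
    modulename = longmodulename.rsplit(".", 1)[-1]
    return p.stem == modulename
```

Under this reading, a module declared as `Foo.Bar.Baz` belongs in a file named `Baz.lbf`.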
99+
100+
### Import
101+
102+
Imports bring *entities* (types and classes) of other modules into scope.
103+
104+
```text
105+
import -> 'import' [ 'qualified' ] longmodulename [ 'as' longmodulename ] [ importspec ]
106+
importspec -> '(' [ { typename ',' } typename [','] ] ')'
107+
```
108+
109+
If `importspec` is omitted, then all entities specified in the module are imported; otherwise only the specified entities are imported.
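
The `importspec` production permits an empty list and a single trailing comma. The Python sketch below encodes that production as a regular expression to show which strings it accepts. It assumes ASCII-only type names and plain whitespace between tokens for brevity, and `is_importspec` is a name invented for this example.

```python
import re

# ASCII-only approximation of `typename` (the real class is Unicode-aware).
TYPENAME = r"[A-Z][A-Za-z0-9]*"

# importspec -> '(' [ { typename ',' } typename [','] ] ')'
# written equivalently as: '(' [ typename { ',' typename } [','] ] ')'
IMPORTSPEC = re.compile(
    rf"\(\s*(?:{TYPENAME}\s*(?:,\s*{TYPENAME}\s*)*,?\s*)?\)"
)


def is_importspec(s: str) -> bool:
    return IMPORTSPEC.fullmatch(s) is not None
```

So `()`, `(A)`, `(A,)` and `(A, B,)` are all well-formed, while `(,)` and `(A,,)` are not.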
110+
111+
### Statement
112+
113+
Statements define types, classes, instance clauses, and derive clauses.
114+
115+
```text
116+
statement -> typedef
117+
| classdef
118+
| instanceclause
119+
| deriveclause
120+
```
121+
122+
#### Type definitions
123+
124+
Types may be either sum types, product types, record types, or opaque types.
125+
126+
```text
127+
typedef -> prodtypedef | sumtypedef | recordtypedef | opaquetypedef
128+
```
129+
130+
##### Product type definition
131+
132+
A product type definition defines a new product type.
133+
134+
```text
135+
prodtypedef -> 'prod' typename { varname } '=' prod
136+
prod -> { typeexp }
137+
typeexp -> varname
138+
| longtypename
139+
| '(' prod ')'
140+
```
141+
142+
Product type definitions instruct the code generator to generate a product type for the target language.
143+
144+
##### Sum type definition
145+
146+
A sum type definition defines a new sum type.
147+
148+
```text
149+
sumtypedef -> 'sum' typename { varname } '=' sum
150+
sum -> sumconstructor { '|' sumconstructor }
151+
sumconstructor -> typename prod
152+
```
153+
154+
Sum type definitions instruct the code generator to generate a sum type for the target language.
155+
156+
##### Record type definition
157+
158+
A record type definition defines a new record type.
159+
160+
```text
161+
recordtypedef -> 'record' typename { varname } '=' record
162+
record -> '{' [ field { ',' field } ] '}'
163+
field -> fieldname ':' prod
164+
```
165+
166+
Record type definitions instruct the code generator to generate a record type for the target language.
167+
168+
##### Opaque type definition
169+
170+
An opaque type definition defines a new opaque type.
171+
172+
```text
173+
opaquetypedef -> 'opaque' typename { varname }
174+
```
175+
176+
Opaque type definitions must map to existing types in the target language, and it's up to the Codegen module to determine exactly how that's done.
177+
178+
#### Class definition
179+
180+
A class definition introduces a new class.
181+
182+
```text
183+
classdef -> 'class' [ constraintexps '<=' ] classname { varname }
184+
constraintexp -> classref { varname }
185+
| '(' constraintexps ')'
186+
constraintexps -> [ constraintexp { ',' constraintexp } ]
187+
```
188+
189+
Class definitions communicate to the code generator which implementations already exist (via instance clauses) and which ones should be generated (via derive clauses).
190+
191+
#### Instance clause
192+
193+
An instance clause specifies that a type is an instance of a class.
194+
195+
```text
196+
instanceclause -> 'instance' constraint [ ':-' constraintexps ]
197+
constraint -> classref { typeexp }
198+
```
199+
200+
Instance clauses do not instruct the code generator to generate code; instead, they instruct the compiler (semantic checking) that the target language environment provides type class implementations for the given type (provided that the given `constraintexps` also have implementations).
201+
202+
#### Derive clause
203+
204+
Derive clauses instruct the code generator to generate code for a type so that it is an instance of a class.
205+
206+
```text
207+
deriveclause -> 'derive' constraint
208+
```
209+
210+
Note that the code generation of a type for a class is implemented via built-in derivation rules (which developers may extend).
211+
212+
### Syntax reference
213+
214+
The summarized productions of a LambdaBuffers Frontend file are as follows.
215+
216+
```text
217+
module -> 'module' longmodulename { import } { statement }
218+
219+
import -> 'import' [ 'qualified' ] longmodulename [ 'as' longmodulename ] [ importspec ]
220+
importspec -> '(' [ { typename ',' } typename [','] ] ')'
221+
222+
statement -> typedef
223+
| classdef
224+
| instanceclause
225+
| deriveclause
226+
227+
typedef -> prodtypedef | sumtypedef | recordtypedef | opaquetypedef
228+
229+
prodtypedef -> 'prod' typename { varname } '=' prod
230+
prod -> { typeexp }
231+
typeexp -> varname
232+
| longtypename
233+
| '(' prod ')'
234+
235+
sumtypedef -> 'sum' typename { varname } '=' sum
236+
sum -> sumconstructor { '|' sumconstructor }
237+
sumconstructor -> typename prod
238+
239+
recordtypedef -> 'record' typename { varname } '=' record
240+
record -> '{' [ field { ',' field } ] '}'
241+
field -> fieldname ':' prod
242+
243+
opaquetypedef -> 'opaque' typename { varname }
244+
245+
classdef -> 'class' [ constraintexps '<=' ] classname { varname }
246+
constraintexp -> classref { varname }
247+
| '(' constraintexps ')'
248+
constraintexps -> [ constraintexp { ',' constraintexp } ]
249+
250+
instanceclause -> 'instance' constraint [ ':-' constraintexps ]
251+
constraint -> classref { typeexp }
252+
253+
deriveclause -> 'derive' constraint
254+
```
@@ -0,0 +1,11 @@
1+
module GoodInstance
2+
3+
instance MyClass A
4+
5+
class MyClass a
6+
7+
sum A = A
8+
9+
-- if we're wondering why this test case is here, previous parser versions
10+
-- confused 'instance' with 'import' and reported an unexpected 'n' in the
11+
-- 'instance' keyword.
@@ -0,0 +1,13 @@
1+
2+
-- Some documentation here
3+
4+
module ModuleDocumentation
5+
6+
-- More documentation
7+
sum A = A
8+
9+
10+
-- Woo hoo, documentation is great
11+
-- (who reads it anyways)
12+
13+
-- dog pomeranian yorkie maltese
