Add a simple example without Langium #59

Open
wants to merge 11 commits into
base: main
3 changes: 3 additions & 0 deletions .vscode/settings.json
@@ -0,0 +1,3 @@
{
"typescript.tsdk": "node_modules/typescript/lib"
}
131 changes: 131 additions & 0 deletions examples/expression/README.md
@@ -0,0 +1,131 @@
# A handwritten parser

This package contains a handwritten parser for a simple expression language.
The language supports:

- Variable declarations with the types `number` and `string`
- Variable assignments
- Arithmetic expressions
- Print statements
- Expressions, like basic arithmetic operations, string concatenation, variable references, literals and parentheses

## How does it work?
Collaborator review comment: Very nice graphics, thanks a lot!


Parsing is a linear process that takes a string of text and produces a tree-like structure that represents the structure of the text.

```mermaid
flowchart LR
A[Lexer] --> B[Parser]
CC@{shape: brace-r, label: "Typir is applied here"} --> C
B --> C[Type System]
C --> D[Validator]

style C fill:#f9f,stroke:#333,stroke-width:4px
```

The following sections describe each step in the process.

### Lexer

**Input**: A string of text

**Output**: A list of tokens

**Task**: Splits the text into tokens and classifies each token.

```mermaid
flowchart LR
subgraph Text
A["variable = 123"]
end
A --> B[Lexer]
B --> Tokens
subgraph Tokens
T1[variable:ID]
T2[=:ASSIGN]
T3[123:NUMBER]
end
```
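
As a rough illustration of this step, the following sketch tokenizes the input from the diagram. It is a simplified stand-in and not the `tokenize` function from `src/lexer.ts`: the names `SketchToken`, `sketchRules` and `sketchTokenize`, as well as the reduced rule table, are assumptions made for this example only.

```typescript
// Hypothetical, reduced token table. Order matters: the first matching rule wins.
type SketchTokenType = 'WS' | 'ID' | 'NUMBER' | 'ASSIGN' | 'ERROR';

interface SketchToken {
    type: SketchTokenType;
    content: string;
}

const sketchRules: Array<[SketchTokenType, RegExp]> = [
    ['WS', /\s+/y],
    ['ID', /[A-Za-z_][A-Za-z_0-9]*/y],
    ['NUMBER', /[0-9]+/y],
    ['ASSIGN', /=/y],
    ['ERROR', /[\s\S]/y], // fallback: consumes any single character
];

function sketchTokenize(text: string): SketchToken[] {
    const tokens: SketchToken[] = [];
    let position = 0;
    while (position < text.length) {
        for (const [type, regexp] of sketchRules) {
            regexp.lastIndex = position; // sticky: try to match exactly at `position`
            const match = regexp.exec(text);
            if (match) {
                position += match[0].length;
                if (type !== 'WS') { // whitespace is dropped in this sketch
                    tokens.push({ type, content: match[0] });
                }
                break;
            }
        }
    }
    return tokens;
}

const lexed = sketchTokenize('variable = 123');
console.log(lexed.map(t => `${t.content}:${t.type}`).join(' '));
// prints "variable:ID =:ASSIGN 123:NUMBER"
```

Because the `ERROR` rule accepts any character, every loop iteration consumes at least one character and the sketch always terminates.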

### Parser

**Input**: A list of tokens

**Output**: An Abstract Syntax Tree (AST)

**Task**: Takes the tokens and arranges them as a tree.

```mermaid
flowchart LR
subgraph Tokens
T1[variable:ID]
T2[=:ASSIGN]
T3[123:NUMBER]
end

Tokens --> D[Parser]
subgraph AST
EE1[variable]
EE2[=]
EE3[123]
EE2 --> EE1
EE2 --> EE3
end
D --> AST
```
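
To make the step from tokens to a tree more concrete, here is a minimal recursive-descent sketch for a single assignment. It is not the parser of this package: the shapes `PToken`, `PExpr` and `PAssignment`, the function `parseAssignment`, and the tiny grammar (`ID ASSIGN NUMBER (ADD_OP NUMBER)*`) are assumptions chosen to keep the example short.

```typescript
// Hypothetical token and node shapes, used by this sketch only.
interface PToken {
    type: 'ID' | 'ASSIGN' | 'NUMBER' | 'ADD_OP';
    content: string;
}

type PExpr =
    | { kind: 'number'; value: number }
    | { kind: 'binary'; op: string; left: PExpr; right: PExpr };

interface PAssignment {
    kind: 'assignment';
    target: string;
    value: PExpr;
}

function parseAssignment(tokens: PToken[]): PAssignment {
    let index = 0;

    // Consumes the next token if it has the expected type, otherwise fails.
    function consume(type: PToken['type']): PToken {
        const token = tokens[index];
        if (token === undefined || token.type !== type) {
            throw new Error(`Expected ${type} at token ${index}`);
        }
        index++;
        return token;
    }

    function parseNumber(): PExpr {
        return { kind: 'number', value: parseInt(consume('NUMBER').content, 10) };
    }

    // expression := NUMBER (ADD_OP NUMBER)*, built left-associatively
    function parseExpression(): PExpr {
        let left = parseNumber();
        while (tokens[index]?.type === 'ADD_OP') {
            const op = consume('ADD_OP').content;
            left = { kind: 'binary', op, left, right: parseNumber() };
        }
        return left;
    }

    const target = consume('ID').content;
    consume('ASSIGN');
    const value = parseExpression();
    if (index < tokens.length) {
        throw new Error(`Unexpected token at ${index}`);
    }
    return { kind: 'assignment', target, value };
}

const tree = parseAssignment([
    { type: 'ID', content: 'variable' },
    { type: 'ASSIGN', content: '=' },
    { type: 'NUMBER', content: '123' },
]);
console.log(JSON.stringify(tree, null, 2));
```

The resulting object mirrors the diagram: the assignment is the root, with the variable name and the value `123` as its children.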

### Type system

**Input**: An AST

**Output**: A typed AST

**Task**: Assigns types to the nodes of the AST.

```mermaid
flowchart LR
subgraph AST
EE1[variable]
EE2[=]
EE3[123]
EE2 --> EE1
EE2 --> EE3
end
FF@{shape: brace-r, label: "described by Typir"} --> F
AST --> F[Type System]
F --> AST2
subgraph AST2["Typed AST"]
FF1[variable:STRING]
FF2[=]
FF3[123:NUMBER]
FF2 --> FF1
FF2 --> FF3
end

style F fill:#f9f,stroke:#333,stroke-width:4px
```
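
The idea of assigning types to nodes can be sketched by hand as follows. This deliberately does not use Typir's API. It is a plain TypeScript illustration, and `SketchType`, `TExpr`, `inferType` and the rule that `+` yields a string as soon as one operand is a string are all assumptions made for this example.

```typescript
// Hypothetical type names and node shapes, used by this sketch only.
type SketchType = 'number' | 'string';

type TExpr =
    | { kind: 'number'; value: number }
    | { kind: 'string'; value: string }
    | { kind: 'variable'; name: string }
    | { kind: 'binary'; op: '+' | '-' | '*' | '/' | '%'; left: TExpr; right: TExpr };

// Infers the type of an expression; returns undefined if no type can be assigned.
function inferType(expr: TExpr, declared: Map<string, SketchType>): SketchType | undefined {
    if (expr.kind === 'number' || expr.kind === 'string') {
        return expr.kind; // literals carry their own type
    }
    if (expr.kind === 'variable') {
        return declared.get(expr.name); // look up the declared type
    }
    const left = inferType(expr.left, declared);
    const right = inferType(expr.right, declared);
    if (left === 'number' && right === 'number') {
        return 'number';
    }
    // Assumption: `+` concatenates as soon as at least one operand is a string.
    if (expr.op === '+' && left !== undefined && right !== undefined) {
        return 'string';
    }
    return undefined;
}

const declaredTypes = new Map<string, SketchType>([['variable', 'string']]);
console.log(inferType({ kind: 'variable', name: 'variable' }, declaredTypes)); // string
console.log(inferType({ kind: 'number', value: 123 }, declaredTypes)); // number
```

With these two results, the tree from the diagram is annotated as `variable:STRING` and `123:NUMBER`.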

### Validator

**Input**: A typed AST

**Output**: A list of errors

**Task**: Checks if the AST is valid.

```mermaid
flowchart LR
subgraph AST["Typed AST"]
FF1[variable:STRING]
FF2[=]
FF3[123:NUMBER]
FF2 --> FF1
FF2 --> FF3
end
AST --> H[Validator]
H --> Errors
subgraph Errors
I1["Variable got wrong type assigned"]
end
style I1 fill:#fdd,stroke:#333,stroke-width:4px
```
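
A validation rule such as the one in the diagram could look roughly like this. The sketch assumes that both sides of an assignment already carry their types. `VType`, `TypedAssignment` and `validateAssignments` are hypothetical names and not part of this package or of Typir.

```typescript
type VType = 'number' | 'string';

// Hypothetical typed assignment node: both sides already carry their types.
interface TypedAssignment {
    variable: { name: string; type: VType };
    value: { text: string; type: VType };
}

// Collects one error message per assignment whose value type differs from the variable type.
function validateAssignments(assignments: TypedAssignment[]): string[] {
    const errors: string[] = [];
    for (const { variable, value } of assignments) {
        if (variable.type !== value.type) {
            errors.push(
                `Variable '${variable.name}' has type ${variable.type}, ` +
                `but '${value.text}' has type ${value.type}.`
            );
        }
    }
    return errors;
}

const problems = validateAssignments([
    { variable: { name: 'variable', type: 'string' }, value: { text: '123', type: 'number' } },
    { variable: { name: 'count', type: 'number' }, value: { text: '42', type: 'number' } },
]);
console.log(problems);
// prints [ "Variable 'variable' has type string, but '123' has type number." ]
```

Returning a list instead of throwing on the first problem lets the caller report all errors at once.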
30 changes: 30 additions & 0 deletions examples/expression/package.json
@@ -0,0 +1,30 @@
{
"name": "typir-example-expression",
"displayName": "expression",
"version": "0.0.2",
"private": true,
"description": "",
"author": {
"name": "TypeFox",
"url": "https://www.typefox.io"
},
"license": "MIT",
"type": "module",
"engines": {
"vscode": "^1.67.0"
},
"volta": {
"node": "18.20.4",
"npm": "10.7.0"
},
"scripts": {
"build": "tsc -b tsconfig.json",
"clean": "shx rm -rf out",
"lint": "eslint src --ext ts",
"test": "vitest",
"watch": "concurrently -n tsc -c blue \"tsc -b tsconfig.json --watch\""
},
"dependencies": {
"typir": "0.1.2"
}
}
141 changes: 141 additions & 0 deletions examples/expression/src/ast.ts
@@ -0,0 +1,141 @@
/******************************************************************************
Collaborator review comment: An idea: for the examples OX and LOX, we use the prefixes ox- and lox- for the TypeScript files. Does it make sense to have a similar prefix (maybe expr- or expressions-) for your expressions example as well?

* Copyright 2024 TypeFox GmbH
* This program and the accompanying materials are made available under the
* terms of the MIT License, which is available in the project root.
******************************************************************************/
export interface BinaryExpression {
type: 'binary';
left: Expression;
right: Expression;
op: '+'|'-'|'/'|'*'|'%';
}

export function isBinaryExpression(node: unknown): node is BinaryExpression {
return isAstNode(node) && node.type === 'binary';
}

export interface UnaryExpression {
type: 'unary';
operand: Expression;
op: '+'|'-';
}

export function isUnaryExpression(node: unknown): node is UnaryExpression {
return isAstNode(node) && node.type === 'unary';
}

export interface VariableUsage {
type: 'variable-usage';
ref: VariableDeclaration;
}


export function isVariableUsage(node: unknown): node is VariableUsage {
return isAstNode(node) && node.type === 'variable-usage';
}


export interface Numeric {
type: 'numeric';
value: number;
}

export function isNumeric(node: unknown): node is Numeric {
return isAstNode(node) && node.type === 'numeric';
}

export interface CharString {
type: 'string';
value: string;
}

export function isCharString(node: unknown): node is CharString {
return isAstNode(node) && node.type === 'string';
}

export type Expression = UnaryExpression | BinaryExpression | VariableUsage | Numeric | CharString;

export interface VariableDeclaration {
type: 'variable-declaration';
name: string;
value: Expression;
}

export function isVariableDeclaration(node: unknown): node is VariableDeclaration {
return isAstNode(node) && node.type === 'variable-declaration';
}

export interface Assignment {
type: 'assignment';
variable: VariableDeclaration;
value: Expression;
}

export function isAssignment(node: unknown): node is Assignment {
return isAstNode(node) && node.type === 'assignment';
}


export interface Printout {
type: 'printout';
value: Expression;
}

export function isPrintout(node: unknown): node is Printout {
return isAstNode(node) && node.type === 'printout';
}

export type Statement = VariableDeclaration | Printout | Assignment;

export type Model = Statement[];

export type Node = Expression | Printout | VariableDeclaration | Assignment;

export function isAstNode(node: unknown): node is Node {
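// Guard against null and primitives first: Object.getOwnPropertyNames throws for null and undefined.
return typeof node === 'object' && node !== null && Object.getOwnPropertyNames(node).includes('type') && ['variable-usage', 'unary', 'binary', 'numeric', 'string', 'printout', 'variable-declaration', 'assignment'].includes((node as Node).type);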
}

export namespace AST {
export function variable(name: string, value: Expression): VariableDeclaration {
return { type: 'variable-declaration', name, value };
}
export function assignment(variable: VariableDeclaration, value: Expression): Assignment {
return { type: 'assignment', variable, value };
}
export function printout(value: Expression): Printout {
return { type: 'printout', value };
}
export function num(value: number): Numeric {
return {
type: 'numeric',
value
};
}
export function string(value: string): CharString {
return {
type: 'string',
value
};
}
export function binary(left: Expression, op: BinaryExpression['op'], right: Expression): BinaryExpression {
return {
type: 'binary',
left,
op,
right
};
}
export function unary(op: UnaryExpression['op'], operand: Expression): UnaryExpression {
return {
type: 'unary',
op,
operand
};
}
export function useVariable(variable: VariableDeclaration): VariableUsage {
return {
ref: variable,
type: 'variable-usage'
};
}
}
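
A short illustration of how a tagged union like `Expression` can be consumed: the sketch below evaluates purely numeric expressions by narrowing on the `type` tag. To stay self-contained it redefines a reduced subset of the node shapes under the hypothetical names `NumericNode`, `UnaryNode`, `BinaryNode` and `ExprNode`; `evaluate` is likewise an assumption and not part of this PR.

```typescript
// Reduced, hypothetical copies of the node shapes, so the sketch stands alone.
interface NumericNode { type: 'numeric'; value: number; }
interface UnaryNode { type: 'unary'; op: '+' | '-'; operand: ExprNode; }
interface BinaryNode { type: 'binary'; op: '+' | '-' | '*' | '/' | '%'; left: ExprNode; right: ExprNode; }
type ExprNode = NumericNode | UnaryNode | BinaryNode;

// Evaluates a numeric expression; each check on `type` narrows the union.
function evaluate(node: ExprNode): number {
    if (node.type === 'numeric') {
        return node.value;
    }
    if (node.type === 'unary') {
        const operand = evaluate(node.operand);
        return node.op === '-' ? -operand : operand;
    }
    // Only BinaryNode is left at this point.
    const left = evaluate(node.left);
    const right = evaluate(node.right);
    switch (node.op) {
        case '+': return left + right;
        case '-': return left - right;
        case '*': return left * right;
        case '/': return left / right;
        default: return left % right;
    }
}

// 1 + 2 * 3
const sample: ExprNode = {
    type: 'binary',
    op: '+',
    left: { type: 'numeric', value: 1 },
    right: {
        type: 'binary',
        op: '*',
        left: { type: 'numeric', value: 2 },
        right: { type: 'numeric', value: 3 },
    },
};
console.log(evaluate(sample)); // prints 7
```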
68 changes: 68 additions & 0 deletions examples/expression/src/lexer.ts
@@ -0,0 +1,68 @@
/******************************************************************************
* Copyright 2024 TypeFox GmbH
* This program and the accompanying materials are made available under the
* terms of the MIT License, which is available in the project root.
******************************************************************************/

/**
* This is a table of all token type definitions required to parse the language.
* The order is important! The last token type ERROR is meant to catch all bad input.
*/
const TokenDefinitions = {
WS: /\s+/,
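// The trailing \b keeps a keyword from matching the prefix of an identifier such as "variable".
VAR: /VAR\b/,
PRINT: /PRINT\b/,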
LPAREN: /\(/,
RPAREN: /\)/,
ASSIGN: /=/,
SEMICOLON: /;/,
ID: /[A-Z_][A-Z_0-9]*/,
NUM: /[0-9]+/,
STRING: /"([^"\\]|\\["\\])*"/,
ADD_OP: /\+|-/,
MUL_OP: /\*|\/|%/,
ERROR: /./
} satisfies Record<string, RegExp>;

export type TokenType = keyof typeof TokenDefinitions;

export type Token = {
type: TokenType;
content: string;
};

/**
* A tokenizer (or lexer) takes a string, analyzes it and returns tokens representing the split text.
* Each token is a piece of the text together with its token type (or character class).
* @param text the source text to split into tokens
*/
export function* tokenize(text: string): Generator<Token, void> {
let position = 0;
const definitions = stickyfy(TokenDefinitions);
while(position < text.length) {
for (const [type, regexp] of Object.entries(definitions)) {
regexp.lastIndex = position;
const match = regexp.exec(text);
if(match) {
const content = match[0];
position += content.length;
yield {
type: type as TokenType,
content
};
break;
}
}
}
}

/**
* `stickyfy` returns copies of the token type RegExps with the flags `sticky` (y) and `case-insensitive` (i) set.
* Sticky means that a match is only attempted at the offset stored in `lastIndex`, so the tokenizer decides where the RegExp has to start matching.
*/
function stickyfy(definitions: typeof TokenDefinitions) {
return Object.fromEntries(
Object.entries(definitions)
.map(([name, regexp]) => [name, new RegExp(regexp, 'yi')])
) as typeof TokenDefinitions;
}
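
The effect of the sticky flag can be seen in isolation in the following sketch. The helper `matchAt` is a hypothetical name introduced for this illustration and is not part of `lexer.ts`.

```typescript
// Hypothetical helper: matches `pattern` exactly at `position`, or returns undefined.
function matchAt(pattern: RegExp, text: string, position: number): string | undefined {
    const sticky = new RegExp(pattern, 'y'); // copy the source, replace the flags
    sticky.lastIndex = position;
    return sticky.exec(text)?.[0];
}

const sampleText = 'a12 b';
console.log(matchAt(/[0-9]+/, sampleText, 0)); // undefined: position 0 holds 'a'
console.log(matchAt(/[0-9]+/, sampleText, 1)); // '12': the match starts exactly at position 1

// For comparison, a global RegExp scans forward from lastIndex instead of failing.
const scanning = /[0-9]+/g;
scanning.lastIndex = 0;
console.log(scanning.exec(sampleText)?.index); // 1
```

This is why a sticky RegExp suits a tokenizer: a rule either matches at the current position or it does not, and it never skips over input.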