toylisp devlog

toylisp is a to-be DSL for my game SnowRL.

Background

I wanted to add scripting support for my game (SnowRL) and GameLisp looked nice, but it didn’t have :named arg syntax. Then I suddenly wanted to make my own Lisp: toylisp (tlp).

I did Crafting Interpreters (part II: tree-walk interpreter) in years ago. I made an Emacs package in ELisp (toy-scheme), still I know little about Lisp. I’d have to struggle a lot.

Initial, abandoned approach

Compilation passes: source string → AST → bytecode. This approach was abandoned in favor of following rust-analyzer source code.

Details

=== Lexer (tokenizer)

Jun 9, 2021

I added a WIP lexer (tokenizer) which converts given string (&str) into a vector of tokens.
- The lexer handles non-streaming input/output for simplicity.
- The lexer handles UTF-8 string as bytes (as &[u8]), not as Iterator<char> because we’re only interested in ASCII characters while lexing.
- Each token slices the source string with ByteSpan { lo, hi }.
Jun 12, 2021

Just a note: nice resource from rustc dev guide: Lexing and Parsing.
Jun 17, 2021

I parsed Vec<Token> into a hierarchy of tokens. I’m not sure if it’s good idea to stick with tokens and source string though; the API is already uncomfortable.

=== Compiler and bytecode virtual machine

Jun 17, 2021

I added a simple VM which can calculate arithmetics. It’s a stack-based VM as one in the book (Crafting Interpreters).

I also added a simple compiler, which converts AST to bytecode. Now (/ (- 64.0 32.0) 2) evaluates to 16.0.

Setting up frontend

I started to learn from rust-analyzer.

CST

I want my ideal lang to be statically typed, and the parser should be IDE-oriented. toylisp is not that language, still it would be great if I can learn from rust-analyzer. EDIT: ToyLisp has static typing today.

Jun 21, 2021

Accoording to the doc, rust-analyzer converts source file string into a lossless (concrete) syntax tree with rowan ^[1].

CST is a tree representation of text (syntax), where each element ^[2] is tagged with syntax kind s. One good thing about CST is that each element is homogeneous and untyped, so we can easily create CST even for invalid source text in a grammer.

Today I parsed simple arithmetics into CST with rowan. I can’t run the code yet (and for a while).
Jun 25, 2021

Today I added a basic AST. In the context of rowan, an AST node is just a wrapper around a CST node with accessor methods; AST node is always backed by a CST node! For example, a Literal CST node may be wrapped as an Expr AST node.

Language server and syntax highlighting (tree-sitter)

I’ll develop both the language and a language server as the same time.

Jul 1, 2021

I’m making a simple language server using tower-lsp. It got connected to VSCode:

Awesome!

Next thing to do was syntax highlighting. But as of today, LS can’t provide with syntax highlighting ^[3].

OMG!

And every text editor uses different regular expression for highlights. But don’t too worry, tree-sitter would be the write-once-run-everywhere syntax highlighter! I made a simple tree-sitter project and it runs on Emacs:

E p i c o !
Jul 4, 2021

The language server is working. Summon EoF error!

There are lots more things to, such as syncing the buffers, but now the front-end development is ready, so every work could directly improve the user experience. Let’s become a toylisp user!
Jul 11, 2021

I hear semantic tokens of language server can provide syntax highlighting without highlighting by regex. But I could not let semantic_highlight_full called somehow. I’ll leave it for later day.

HIR

Introducing `salsa`

Jul 17, 2021

rust-analyzer is built around salsa. salsa is an incremental computation framework. I expect it’s a wizard’s magic, just like rowan is!

By the way, rowan and salsa are written in 4,000 lines of Rust. I also counted rust-analyzer source lines:

Details

crate            lines
----------------------
base_db          1014
cfg              781
flycheck         347
hir              5695
hir_def          21240
hir_expand       3579
hir_ty           26269
ide              26290
ide_assists      33940
ide_completion   14292
ide_db           14116
ide_diagnostics  4427
ide_ssr          3798
mbe              5458
parser           4886
paths            227
proc_macro_api   766
proc_macro_srv   3830
proc_macro_test  3
profile          716
project_model    2111
rust-analyzer    13137
stdx             542
syntax           11753
test_utils       1333
text_edit        186
toolchain        60
tt               497
vfs              881
vfs-notify       219
----------------------
sum              202393

`hir_def` not implemented

Aug 1, 2021

Me of this day tried to make notes about ra hir-def, but it looked too complex to wrap my head around it.
Aug 28, 2021

Mun Programming Language has document where Salsa integration is explained.

`hir_def` implemented

Aug 21, 2022

Hey it took me a year to come back to toylisp. I learned incremental development from the compilerbook and I made up tlp::hir_def. Now I can run a simple toylisp program:

(proc main ()
    (let a 10)
    (+ a 2))
;; => 2

The module hierarchy shows how I understand hir_def today. It’s basically about items and bodies:

[source]
hir_def
├── body
│   ├── expr.rs
│   ├── expr_scope.rs
│   └── pat.rs
├── db
├── ids
├── item_list
│   └── item.rs
└── lower
    ├── collect_item_scope.rs
    ├── lower_body.rs
    └── lower_item.rs

Hey it looks rather simpler than ra hir_def.

EDIT: The body / item separation was not accurate; they share some common language constrcuts. For example, procedures use patterns for parameters.

(Not) Looking into `hir_ty`

I want to add support for basic primitive types: i32, f32 and bool. I need to know the type of variable to apply suitable arithmetic operators. So I’m tucking into the`hir-ty` module.

Aug 21, 2022

hir_ty database returns InferenceResult, which assignes type information to every expression and pattern:

#[derive(Debug)]
pub struct InferenceResult {
    // Its's basically:
    pub type_of_expr: ArenaMap<Idx<Expr>, Ty>,
    pub type_of_pat: ArenaMap<Idx<Pat>, Ty>,
}

`salsa-2022`

Aug 29, 2022

salsa-2022 is coming! Tracking issue. The only advanced code base based on it out there is dada. I looked into dada and got the basic idea of salsa-2022.

In the context of rust-analyzer, salsa-2022 much simplifies the hir_def crate. We don’t have to keep holding ItemTree. Each Item is indexed by their name as #[id]. We don’t have to mess around with data location representations.

Accessing ID-relevant data is trivial. I’m re-implementing hir_def alternative (just called ir), and here’s a simple test case:

use tlp::Db;

#[test]
fn items() {
    let db = &mut Db::default();

    let src = r"(proc f ())
                (proc g ())
                (proc h ())";

    let f = db.new_input_file("main.tlp", src.to_string());
    let items = f.items(db);

    assert_eq!(
        &items
            .iter()
            .map(|i| i.name(db).as_str(db))
            .collect::<Vec<_>>(),
        &["f", "g", "h"],
    );
}

Wow it’s too easy! I’m liking salsa-2022.

Sep 23, 2022

I implemented a bunch of basic language constructs, along with a simple type inference system. For example, the following codes work today:
```
;; We have `let`, `while`, comparison operators and `set`:
(let a 0)
(while (< a 3)
    (set a (+ a 1)))
;; => 3
```
```
;; Procedure call works:
(proc fib (x)
    (cond ((= x 0) 0)
          ((= x 1) 1)
          (true (+ (fib (- x 1)) (fib (- x 2))))))

(proc main ()
    (fib 10))
```
The type inference system is roguhly based on the MinCaml crash course. It often works as type validation system without much inferencing.

The idea to to compare expression types with expected types. For example, the predicate of while expression must have bool type. If not, we emit an error and continue the analysis. More interesting case would be cond. If cond is an expression, each cond case must return the same type. And because cond cases must be comprehensive, we require true case.

Function parameter types are restricted to i32 for now. I will add syntax for annotating parameter types.
Oct 2, 2022

Parameter type annotation is ready:
```
(proc fib (x:i32)
    (cond ((= x 0) 0)
          ((= x 1) 1)
          (true (+ (fib (- x 1)) (fib (- x 2))))))

(proc main ()
    (fib 10))
```
Type mismatches are reported and printed beautifully:
main.tlp
```
(proc main ()
    (+ 4.5 3))
```
Figure 1. Like Rust

Figure 2. Like Rust and ariadne

Figure 3. Diagnostic in multiple lines / multiple diagnostics

Colors / bolds are by colored.

1. rowan was doing aggressive optimization: deduplication of subtree and use of thin pointers. I couldn’t do better than that, so I decided to just use rowan instead of re-writing it.

2. element = sub tree (node) or leaf (token)

3. It was wrong; see Jul 11, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!