Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BIR #2702

Merged
merged 16 commits into from
Oct 26, 2023
Merged

BIR #2702

Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
9 changes: 9 additions & 0 deletions gcc/rust/Make-lang.in
Original file line number Diff line number Diff line change
Expand Up @@ -148,6 +148,9 @@ GRS_OBJS = \
rust/rust-hir-type-check-stmt.o \
rust/rust-hir-type-check-enumitem.o \
rust/rust-hir-type-check-implitem.o \
rust/rust-borrow-checker.o \
rust/rust-bir-builder-expr-stmt.o \
rust/rust-bir-dump.o \
rust/rust-hir-dot-operator.o \
rust/rust-hir-path-probe.o \
rust/rust-type-util.o \
Expand Down Expand Up @@ -371,6 +374,7 @@ RUST_INCLUDES = -I $(srcdir)/rust \
-I $(srcdir)/rust/checks/lints \
-I $(srcdir)/rust/checks/errors \
-I $(srcdir)/rust/checks/errors/privacy \
-I $(srcdir)/rust/checks/errors/borrowck \
-I $(srcdir)/rust/util \
-I $(srcdir)/rust/metadata \
-I $(srcdir)/../libgrust
Expand Down Expand Up @@ -448,6 +452,11 @@ rust/%.o: rust/checks/errors/privacy/%.cc
$(COMPILE) $(RUST_CXXFLAGS) $(RUST_INCLUDES) $<
$(POSTCOMPILE)

# build borrow checking pass files in rust folder
rust/%.o: rust/checks/errors/borrowck/%.cc
$(COMPILE) $(RUST_CXXFLAGS) $(RUST_INCLUDES) $<
$(POSTCOMPILE)

# build rust/metadata files in rust folder
rust/%.o: rust/metadata/%.cc
$(COMPILE) $(RUST_CXXFLAGS) $(RUST_INCLUDES) $<
Expand Down
188 changes: 188 additions & 0 deletions gcc/rust/checks/errors/borrowck/bir-design-notes.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,188 @@
# Borrow-checker IR (BIR) design notes
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we have all these doc files in a doc/ subdir?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am fine with either variant. If docs/ is more consistent, we can go with that.


## Design goals

Rust GCC project aims to use the [Polonius project](https://github.com/rust-lang/polonius) as its borrow-checker.
Polonius operates on a set of [facts](https://github.com/rust-lang/polonius/blob/master/polonius-engine/src/facts.rs) about the program to determine
the actual lifetimes of borrows.
As Polonius's primary analysis is location sensitive, the facts are tied to the program's control flow graph (CFG).
Unlike rustc, which has
its
own three address language specific representation called [MIR](https://rustc-dev-guide.rust-lang.org/mir/index.html), gccrs uses gcc's AST based
representation GENERIC.
GIMPLE (the three address IR) is generated from GENERIC inside GCC and can carry no language-specific information.
Therefore, we
need to generate our own representation of the program's CFG.
Since gccrs has no intention of having a MIR-like IR, the BIR is not to be used for
code generation.
Therefore, BIR carries only minimal information that is necessary for the borrow-checker and for BIR debugging.

Since BIR is in fact a dead branch of the compilation pipeline, the only way to verify its generations is through manual inspection.
To have some frame of reference for testing, BIR build and dump are carefully designed to resemble the textual dump of rustc's MIR as much as
possible.
This includes the style of the output, numbering of locals and order of basic blocks (when possible).

## BIR Dump Example

An example program calculating the i-th fibonacci number:

```rust

fn fib(i: usize) -> i32 {
if i == 0 || i == 1 {
1
} else {
fib(i - 1) + fib(i - 2)
}
}
```

Here is an example of BIR dump (note: this needs to be updated regularly):

```
fn fib(_1: usize) -> i32 {
let _0: i32;
let _2: i32;
let _3: bool;
let _4: bool;
let _5: bool;
let _6: usize;
let _7: i32;
let _8: usize;
let _9: i32;
let _10: i32;

bb0: {
_4 = Operator(_1, const usize);
switchInt(_4) -> [bb1, bb2];
}

bb1: {
_3 = const bool;
goto -> bb3;
}

bb2: {
_5 = Operator(_1, const usize);
_3 = _5;
goto -> bb3;
}

bb3: {
switchInt(_3) -> [bb4, bb7];
}

bb4: {
_2 = const i32;
goto -> bb8;
}

bb5: {
_6 = Operator(_1, const usize);
_7 = Call(fib)(_6, ) -> [bb6];
}

bb6: {
_8 = Operator(_1, const usize);
_9 = Call(fib)(_8, ) -> [bb7];
}

bb7: {
_10 = Operator(_7, _9);
_2 = _10;
goto -> bb8;
}

bb8: {
_0 = _2;
return;
}
}


```

The dump consists of:

- A function header with arguments: `fn fib(_1: usize) -> i32 { ... }`.
- Declaration of locals: `let _0: i32;`, where `_0` is the return value (even if it is of the unit type). Arguments are not listed here, they are
listed in the function header.
- A list of basic blocks: `bb0: { ... }`. The basic block name is the `bb` prefix followed by a number.
- Each basic block consists of a list of BIR nodes (instructions). Instruction can be either assigned to a local (place) or be a statement.
Instructions take locals (places) as arguments.
- Each basic block is terminated with a control flow instruction followed by a list of destinations:
- `goto -> bb3;` - a goto instruction with a single destination.
- `switchInt(_3) -> [bb4, bb7];` - a switch instruction with multiple destinations.
- `return;` - a return instruction with no destinations.
- `Call(fib)(_6, ) -> [bb6];` - a call instruction with a single destination. This section is prepared for panic handling.

## BIR Structure

BIR structure is defined in `gcc/rust/checks/errors/borrowck/rust-bir.h`. It is heavily inspired by rustc's MIR. The main difference is that BIR
drastically reduces the amount of information carried to only borrow-checking relevant information.

As borrow-checking is performed on each function independently, BIR represents a single function (`struct Function`). A `Function` consists of a list
of basic blocks, list of arguments (for dump only) and place database, which keeps track of locals.

### Basic Blocks

A basic block is identified by its index in the function's basic block list. It contains a list of BIR nodes (instructions) and a list of successor
basic block indices in CFG.

### BIR Nodes (Instructions)

BIR nodes are of three categories:

- An assignment of an expression to a local (place).
- A control flow operation (switch, return).
- A special node (not executable) node, which carries additional information for borrow-checking (`StorageDead`, `StorageLive`).

#### Expressions

Expressions represent the executable parts of the rust code. Many different Rust contracts are represented by a single expression, as only data (and
lifetime) flow needs to be tracked.

- `InitializerExpr` represents any kind of struct initialization. It can be either explicit (struct expression) or implicit (range expression,
e.g. `0..=5`).
- `Operator<ARITY>` represents any kind of operation, except the following, where special information is needed either for borrow-checking or for
better debugging.
- `BorrowExpr` represents a borrow operation.
- `AssignmentExpr` holds a place for a node of assignment (i.e., no operation is done on the place, it is just assigned).
- `CallExpr` represents a function call.
- For functions, the callable is represented by a constant place (see below). (E.i. all calls use the same constant place.)
- For closures and function pointers, the callable is represented by a (non-constant) place.

### Places

Places are defined in `gcc/rust/checks/errors/borrowck/rust-bir-place.h`.

Places represent locals (variables), their field, and constants. They are identified by their index (`PlaceId`) in the function's place database. For
better dump correspondence to MIR, constants use a different index range.

Non-constant places are created according to Polonius path [documentation](https://rust-lang.github.io/polonius/rules/atoms.html). The following
grammar describes
possible path elements:

```
Path = Variable
| Path "." Field // field access
| Path "[" "]" // index
| "*" Path
```

It is important to highlight that different fields are assigned to different places; however, all indices are assigned to the same place.
Also, to match the output of rustc.
In dump, paths contain at most one dereference and are split otherwise.
Same paths always result in the same place.

Variables are identified by `AST` `NodeId`. Fields indexes are taken from `TyTy` types.

Each place holds indices to its next relatives (in the path tree), `TyTy` type, lifetime and information whether the type can be copies or it needs to
be moved. Not that unlike rustc, we copy any time we can (for simplicity), while rustc prefers to move if possible (only a single copy is held).

## BIR Builders

There are multiple builders (visitor classes) for BIR based on what context is needed in them.
provides the entry point that handles function parameters and return values, and it creates the BIR main unit `Function`.
`rust-bir-internal.h` provides abstract builder classes with common helper methods for all builder and for expression builders.
Specific builders are then defined for expressions+statements, lazy boolean expressions, patterns, and struct initialization.
40 changes: 40 additions & 0 deletions gcc/rust/checks/errors/borrowck/dev-notes.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,40 @@
# Borrow-checker Development Notes

## Testing BIR building

There is no way to test BIR building directly, since it is a dead branch of the compilation pipeline.
The only way to verify its generations is through manual inspection.
The best way to inspect the BIR is to compare it with rustc's MIR.

The following command will compile a rust file into a library and dump its MIR:

```shell
rustc --crate-type=lib -A dead_code -A unused -Z dump-mir="" <file>
```

The MIR dump directory `mir_dump` contains a dump before and after each MIR pass.
We are interested in the one used for borrow-checking, which is called `<crate>.<function>.002-000.analysis.after.mir`.

BIR dump is emitted to a `bir_dump` directory. With the following naming scheme: `<crate>.<function>.bir.dump`.

At this point, MIR dump contains helper constructions that BIR does not contain yet (like storage live/dead annotations). To remove them from the MIR dump, run the following command:

```shell
awk -i inplace '!/^\s*(\/\/|StorageLive|StorageDead|FakeRead)/' mir_dump/*
```

To get the BIR dump into a similar format, run the following command:

```shell
./crab1 <file> -frust-incomplete-and-experimental-compiler-do-not-use -frust-borrowcheck -frust-dump-bir -frust-compile-until=compilation
```


## TODO

- scope handling, cleanup
- switch coercions to adjustments from typechecking
- operator overloading
- match selection
- let without an initializer
- lifetime parameters
Loading
Loading