Skip to content
This repository has been archived by the owner on Jul 8, 2021. It is now read-only.

Waffle architecture overview

playX edited this page Mar 9, 2020 · 2 revisions

Waffle is a virtual machine designed for dynamically typed programming languages.

Compilation pipeline

Waffle bytecode represented as simple register based dynamically typed IR.

When the byte code compiles, it is an SSA. When bytecode is emitted then it's time to invoke module optimizations. These optimizations currently implemented:

  • CFG Simplifier
  • Tail call elimination
  • Function inlining (W.I.P)
  • Constant folding (W.I.P)
  • CSE (W.I.P)

But it's important to mention another thing: Before every optimizations register allocation is executed on SSA bytecode representation. Current register allocation algorithm is graph coloring which is slow and complex, but does it's job really nicely. In future Waffle may include linear scan algorithm for fast compilation times.

Execution pipeline

Waffle supports a multi-tiered architecture - one which utilizes an interpreter for very fast startup, fast processes scheduler, parallel JIT workers to generate optimized code, and GC which you have chosen.

When Waffle notices that a function or loop is being invoked multiple times in the interpreter, it queues up the function in background JIT compiler pipeline. Once the JIT'ed code is ready, Waffle replaces the function or loop with native code.

Waffle backgound JIT compiler when first executed generates unoptimized C code and after a ~50 function or loop invocations starts to generate optimized C code.

Processes

Instead of using native threads or fibers Waffle uses lightweight "green" processes (just like in Erlang). Each process is scheduled into process queue and then executed in primary or blocking worker. This allows easy concurrency without any opportunity to get data races in program.

JIT Compiler (W.I.P)

Waffle has a two-tier JIT compiler. On the same concurrent background thread,Waffle has a Fusion JIT compiler, which generates highly optimized C code, and a Baseline JIT compiler, whis is essentially a less optimizing version of the Fusion JIT. In the execution pipeline, Waffle first switches over from executing a function in the interpreter to executing baseline JIT'ed code, then to fully optimized JIT'ed code once generated by Fusion JIT. In most cases, Baseline JIT compilation costs less time than a full JIT compilation. The other inherent advantage of Baseline JIT is that in case a bailout happens, the function execution can utilize the faster switchover from interpreter to Baseline JIT, till the time the fully optimized re-JIT'ed code is available. The Baseline JIT'ed code execution pipeline also continues to collect type information and other useful data for optimizations in Fusion JIT.

Waffle JIT compiles bytecode into C and uses c2mir/rcc at baseline tier for very fast compilation, and GCC/Clang on Fusion JIT tier.

Garbage Collector(s)

Waffle has a multiple GCs implementations for your choice. Every GC implemented in Waffle does not provide parallel phases because collection happens per process.

Incremental (generational) mark-sweep

Waffle has a generational and incremental mark-and-sweep garbage collector. This GC is fast and simple,but does not provide heap defragmentation and allocation times may be really big after some time.

Ieiunium GC

Ieiunium is generational garbage collector that consists of 3 spaces: nursery, intermediate, and old spaces. Nursery space uses scavenging,intermediate space is semispace and uses copying algorithm and old space is just mark&sweep. It takes 5 generations for object to get into old space and most objects should die in nursery or intermediate set,so old space should not be really fragmented and thus we do not need compacting/copying there.

Copying GC

This GC implements simple Cheney's algorithm. I do not see any point in explaining it's algorithm there, it's better to read about it there

On The Fly GC

On The Fly garbage collector is a concurrent garbage collector. This GC is incremental and when collection happens GC would perform an initial marking pass that will create snapshot of the roots and flip colors so new objects allocated when collection happens will not be sweeped. While initial marking is done GC schedules marking task into background marking tasks executor. Marking is done per process and does not support parallel marking. While marking is finished process which is collected suspended again and sweep phase is scheduled into "Sweeper", a background sweep phase executor. Sweeping is done on 2,4 or 8 threads and does not depends on process, sweep is work on one memory page (16kb or 32kb) scanning for dead objects and then finalizing them and adding free memory to freelist.

After sweep pass process suspended and all new freelist constructed in Sweeper is "merged" into process local memory allocator and process execution is resumed.

Clone this wiki locally