% Implementing Haskell: the Runtime System by Edward Z Yang
% Thomas Dimson (Scribe)
% May 27, 2014

# Implementing Haskell: The Runtime System

A guest lecture by Edward Z Yang, scribed by Thomas Dimson.

Previously, we discussed the details of the Haskell compiler but not another
large component of Haskell: the runtime system. For example, what makes Haskell
green threads cheaper than thread implementations in other languages? Although
we may never hack on the GHC runtime ourselves, its internals are "good for the
soul" and help us understand other systems like the JVM.


# What is a run-time system?

* The runtime is a blob of code that sits between C client code and
  compiled Haskell code
* The runtime handles things like garbage collection, thread scheduling,
dynamic linking, software transactional memory, profiling, etc.
* The Haskell run-time also includes a bytecode interpreter if you use
GHCi

# Garbage collection

* Garbage collection allows us to pretend programs have infinite memory
and allocate indefinitely
* Instead of explicitly freeing memory, we reclaim dead data
* One approach: reference counting.
* We store a count along with each object, representing the number of pointers
still pointing to it. When the count goes to zero, we free the memory.
* Unfortunately this can't handle cyclic references, and deallocating a large
  structure may cost more than expected.
* Upgraded approach: mark and sweep
* Mark phase: stop the world, go over the heap and mark objects which are alive (i.e. reachable
from the root set of memory)
* Sweep phase: free the memory that wasn't marked. Unfortunately, this causes
  fragmentation and requires sweeping the entire heap; to allocate again, one
  must traverse a free list (a toy sketch of the mark phase follows this list)
* Fragmentation can be partially solved by having a compaction phase
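
As a toy illustration of the mark phase (a purely functional sketch, not how a
real collector is implemented), we can model the heap as a map from object ids
to the ids they point to and compute reachability from the root set:

```haskell
import qualified Data.Map as Map
import qualified Data.Set as Set

-- A toy heap: each object is named by an Int and holds pointers to other objects.
type Heap = Map.Map Int [Int]

-- Mark phase: compute the set of objects reachable from the root set.
mark :: Heap -> Set.Set Int -> Set.Set Int
mark heap roots = go Set.empty (Set.toList roots)
  where
    go marked []              = marked
    go marked (x:xs)
      | x `Set.member` marked = go marked xs
      | otherwise             = go (Set.insert x marked)
                                   (Map.findWithDefault [] x heap ++ xs)

-- Sweep phase: keep only the marked objects; everything else is garbage.
sweep :: Heap -> Set.Set Int -> Heap
sweep heap marked = Map.filterWithKey (\k _ -> k `Set.member` marked) heap
```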

## Generational copying collector
* A modern approach utilized by GHC and the JVM
* Generational hypothesis: "most objects die young".
* In other words, memory that was allocated recently is very likely to become
  garbage soon.
* Especially true for functional languages because most (or all) data is
immutable
* Cheney's algorithm
* Maintain a "from space" and a "to space". The "from space" contains a mix
  of dead and live objects, while the "to space" will only contain live
  objects.
* Begin with the "evacuation" phase, where we move an object from the "from
  space" to the "to space" and leave a forwarding pointer at the old location
  pointing to the new location
* We don't naively recurse because this may use a large stack. Instead, we
  use a queue-based approach like breadth-first search.
* The next phase is the "scavenge", where we start rewriting pointers. We
  maintain a scavenge pointer which advances over objects in the "to space";
  everything behind the scavenge pointer has had its pointers rewritten
* Once everything has been copied, the "to space" becomes the new "from
  space" (a toy sketch of the whole algorithm follows this list)
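
To make this concrete, here is a toy, purely functional sketch of Cheney's
algorithm (not the real implementation, which lives in the runtime system's C
code): objects are just lists of pointers, from-space addresses are map keys,
and to-space addresses are list indices, with the end of the list playing the
role of the allocation pointer.

```haskell
import qualified Data.Map as Map
import           Data.Map (Map)

-- Toy model: an object is a list of pointers (object ids), and the from-space
-- maps ids to objects.  Addresses in the to-space are just list indices.
type Obj       = [Int]
type FromSpace = Map Int Obj

-- Collection state: forwarding pointers (old address -> new address) and the
-- to-space, which grows at the end like a bump allocator.
data GC = GC { forwards :: Map Int Int, toSpace :: [Obj] }

-- Evacuate one object: copy it to the end of the to-space unless it has
-- already been copied, and return its new address.
evacuate :: FromSpace -> Int -> GC -> (Int, GC)
evacuate from old gc =
  case Map.lookup old (forwards gc) of
    Just new -> (new, gc)                        -- already copied: follow the forwarding pointer
    Nothing  ->
      let new = length (toSpace gc)              -- next free slot in the to-space
          obj = Map.findWithDefault [] old from  -- fields still point into the from-space
      in (new, GC (Map.insert old new (forwards gc)) (toSpace gc ++ [obj]))

-- Scavenge: walk the to-space left to right, evacuating everything the current
-- object points to and rewriting its fields.  Objects behind the scavenge
-- pointer i only contain to-space addresses.
scavenge :: FromSpace -> Int -> GC -> GC
scavenge from i gc
  | i >= length (toSpace gc) = gc                -- caught up with the allocation pointer: done
  | otherwise =
      let step p (ps, g) = let (p', g') = evacuate from p g in (p' : ps, g')
          (fields', gc') = foldr step ([], gc) (toSpace gc !! i)
          rewritten      = take i (toSpace gc') ++ [fields'] ++ drop (i + 1) (toSpace gc')
      in scavenge from (i + 1) (gc' { toSpace = rewritten })

-- A whole collection: evacuate the roots, then scavenge until the queue is empty.
collect :: FromSpace -> [Int] -> GC
collect from roots =
  scavenge from 0 (foldr (\r g -> snd (evacuate from r g)) (GC Map.empty []) roots)
```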

* Generational portion
* Instead of just having one "from" and one "to" space, we have separate
  spaces for different generations.
* Fresh objects are allocated in the nursery. If an object from the nursery
survives garbage collection, it is promoted to an older generation

* Copying collectors come with advantages
* Collection time is proportional to the amount of live data rather than the
  size of the heap, so the more garbage you have, the faster it runs.
* Free memory is always contiguous

* We can perform a GC whenever the free heap pointer advances past the limit
  of the heap (a super easy check, sketched below)
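
A minimal sketch of that check, assuming a bump allocator described by an
allocation pointer and a limit (GHC's real registers are called Hp and HpLim;
the type and field names below are made up for illustration):

```haskell
-- Toy bump allocator: the next free word and the end of the current block.
data Alloc = Alloc { hp :: Int, hpLim :: Int }

-- Try to bump-allocate n words: either succeed, or report that we must return
-- to the runtime system and garbage collect first.
allocate :: Int -> Alloc -> Either String Alloc
allocate n a
  | hp a + n > hpLim a = Left "heap check failed: time to garbage collect"
  | otherwise          = Right (a { hp = hp a + n })
```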

## Write barriers and purity
* If generational garbage collectors are so good, why doesn't every language use
them?
* You need to know all the pointers in an object, which can be difficult to
maintain.
* We assume that nothing from an older generation points to the nursery. This
makes sense in a purely functional language, but doesn't hold in the
presence of mutation (even in Haskell there are IORefs)
* Solution: we can maintain a "mutable set" (a remembered set) which also gets
  traced during a minor GC. This complicates generational GC and forces an
  extra pointer write whenever mutable memory is written (a sketch follows
  this list).
* Fortunately, in pure languages mutation is rare. IORefs are slow to begin
  with, and lazy mutation (thunks) can be special-cased
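
A rough sketch of the idea (the real write barrier lives in the runtime system
and tracks heap objects directly; the types and names below are invented for
illustration): every write to an old-generation reference also records that
reference in the mutable set, so a minor GC can trace it even though the old
generation itself is not scanned.

```haskell
import Control.Monad (when)
import Data.IORef

-- Toy model: a mutable reference that remembers which generation it lives in
-- (0 = nursery).
data GenRef a = GenRef { generation :: Int, payload :: IORef a }

-- The "mutable set": old-generation references that have been written to, and
-- which therefore must also be traced during a minor GC.
type MutableSet a = IORef [GenRef a]

-- A write barrier: do the ordinary write, plus an extra pointer write to
-- record the reference in the mutable set if it lives in an old generation.
writeWithBarrier :: MutableSet a -> GenRef a -> a -> IO ()
writeWithBarrier mutSet ref x = do
  writeIORef (payload ref) x
  when (generation ref > 0) $
    modifyIORef' mutSet (ref :)
```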

* Special thunk behavior
* When a thunk gets evaluated, it is immutable afterwards. Thus, when the
  thunk gets evaluated we can immediately promote the objects it points to
  into an older generation
* This doesn't work for IORefs: they are expected to be mutated a lot, so we
  would end up promoting lots of things into the older generation

## Parallel garbage collection
* Key idea: split the heap into blocks and parallelize the scavenge process
* Contrast with _concurrent_ GC, which doesn't stop the world; parallel garbage
  collection still stops the world
* Problem: two GC threads might scavenge objects that point to the same
  from-space object and try to evacuate it at once, forcing us to take a lock
* Fortunately, if the pointed-to object is immutable we can just copy it twice
  and allow the threads to race without locks


# Scheduler
* The scheduler is the heart of the run-time system. It is responsible for
  calling into Haskell code, which will eventually yield back to the scheduler
* In Haskell, the runtime can force a thread to yield by setting its heap
  limit to zero: the next heap check fails and the thread returns to the
  scheduler as if for garbage collection.
* Each thread holds a pointer to a stack object containing all of its stack
  frames, so it can resume where it left off when it is scheduled again
* Threads are fast to allocate because they only consist of a small initial
stack object (lives on the heap, collected as part of the young generation)
* Scheduler inner loop, single-threaded edition:
* The thread queue maintains all the threads that need to run
* The scheduler picks up a thread, lets it run for a while, then interrupts
  it and puts it back into the queue
* Scheduler inner loop, multi-threaded edition:
* Each scheduler runs in its own OS thread
* If a foreign call is marked safe, the thread gives up the scheduler lock
  while the call runs; an unsafe call keeps the lock, so it should not block
  (see the sketch below)
* Garbage collection always takes locks
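
To make the safe/unsafe distinction concrete, here is a small sketch using the
POSIX sleep function (the Haskell-side names are arbitrary): the safe import
lets other green threads keep running while the C call blocks, whereas the
unsafe import holds the lock for the whole call.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
import Foreign.C.Types (CUInt)

-- A safe call: the calling thread releases its capability (scheduler lock), so
-- other green threads keep running even though this C call blocks.
foreign import ccall safe "unistd.h sleep"
  c_sleep_safe :: CUInt -> IO CUInt

-- An unsafe call: cheaper to make, but it holds the capability for its whole
-- duration, so a blocking call here stalls every thread on that capability.
foreign import ccall unsafe "unistd.h sleep"
  c_sleep_unsafe :: CUInt -> IO CUInt
```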

## MVars
* MVars are implemented as a struct with a value and a queue of the threads
  blocked on it
* These threads are in a "blocked queue" instead of the run queue, so blocked
threads are never scheduled
* If an MVar is dead (unreachable except from its waiters) and there are still
  waiters, then we have a provable deadlock. The GHC runtime detects this
  during GC and throws BlockedIndefinitelyOnMVar at the blocked threads (an
  example follows)
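
A small demonstration of both points (the thread count is arbitrary): forkIO
threads are cheap enough to spawn by the thousand, and a takeMVar that can
never succeed is detected by the runtime, which throws the blocked thread a
BlockedIndefinitelyOnMVar exception.

```haskell
import Control.Concurrent
import Control.Exception (BlockedIndefinitelyOnMVar (..), catch)
import Control.Monad (forM_, replicateM_)

main :: IO ()
main = do
  -- Green threads are cheap: each forkIO allocates a small stack object on the
  -- heap, so spawning thousands of them is no big deal.
  results <- newEmptyMVar
  forM_ [1 .. 10000 :: Int] $ \i -> forkIO (putMVar results i)
  replicateM_ 10000 (takeMVar results)

  -- A provable deadlock: nothing else references this MVar, so no one can ever
  -- fill it.  The runtime notices during GC and throws the blocked (main)
  -- thread a BlockedIndefinitelyOnMVar exception.
  dead <- newEmptyMVar
  takeMVar (dead :: MVar ()) `catch` \BlockedIndefinitelyOnMVar ->
    putStrLn "runtime detected the deadlock"
```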

## Scheduler take-aways
* Everything lives on the heap
* Purity means that most code is threadsafe by default
* Both Rust and Go tried segmented stacks (stacks allocated in small chunks
  and grown on demand) but gave them up because they were too slow. Design
  decisions have to be made up front to enable segmented stacks.