The code in this repository is split in two parts:
- A library part, whose
Cargo.toml
is found at the root of the repository and whose source code is insrc
. - Binaries, found in the
bin
directory.
The coding rules of these two parts differ. The code in the library part provides a set of unopinionated tools (more information below), while the binaries take an opinionated approach as to how a node should behave.
Most of the rules below only apply to the library part, which also contains the vast majority of the code. The code in the binaries mostly consists in gluing together pieces of code found in the library part.
In general, the binaries should be seen more as examples of how to use the library part, rather than actual projects on their own.
The source code of the smoldot library offers a set of tools. Each Rust module provides one specific feature, documented in the module-level documentation, and does not, in particular, attempt to play a specific role in a binary.
For example, there isn't any module that is the code that verifies block, instead there is a module that verifies blocks. It is not excluded for other modules, that also verify blocks, to be added in the future, for example with a different API or with different performance characteristics. It would then be up to the user to choose which block verification module to use.
In the smoldot library, functions are not a way to hide behind an opaque API which operations are performed. Instead, all functions must precisely document everything they do. Calling a function is a way to improve readability, and not a way to delegate the way the program should behave.
When applicable, code in the library part must not have any side effect and must only ever return an output that directly depends on its inputs.
The code in the smoldot library must also, as much as possible, not rely on the presence of an operating system. For example, it must not attempt to open files, spawn threads, create sockets, or get the current time. Smoldot tries as much as possible to be a program in the mathematical sense of the term, where the output only depends on the input.
Note: While the code in the smoldot library isn't marked as
#![no_std]
yet, it is a medium-term objective.
There exists one exception at the moment in the form of randomness generation in cryptographic signatures. In order to respect the above rule, one would need to ask the user to inject a random value through a public API. However, if this is not done correctly by the user, it might result in their private key(s) getting leaked. It is unclear whether this exception will stay in the long term.
In practice:
- No global variables or thread-local variables (except for niche optimizations).
- Never sleep the current thread (directly or indirectly). Every function that involves waiting for something must be asynchronous.
- Do not: spawn threads, read files, create sockets, get the current time, generate a random number (apart for the aforementioned exception).
- Dependency injection is almost always a bad thing. Any complex trait definition ("complex" here meaning "more complex than the ones found in the standard library such as
Clone
orEq
") is forbidden.
Note: These rules apply only to the library part of the code. The binaries is where the glue between the operating system and the library actually happens, in other words the binaries is where threads are spawned, sockets created, and so on.
It is important, when debugging a program, to understand what is going on in this program. For many projects this is done by adding logging, gathering metrics, and so on.
In the smoldot library, any form of logging or metrics gathering is forbidden. These aspects must be scoped entirely to the binaries. Libraries such as log
, slog
or tracing
are banned from the library part of the code.
In exchange, the code in the smoldot library must be as predictable as possible. It is encouraged to split functions that perform multiple steps at once into multiple smaller functions that each perform one step. This way, the upper layers can be aware of when each step starts and ends.
The number one objective of this code base is to conform to the Substrate/Polkadot specs and not have any bug or security issue. Beyond that, the most important metric of quality of this source code is how easy it is to understand.
The reference point of this metric is the documentation generated by cargo doc
. It can more or less be seen as cargo-doc-driven development. The documentation of the main
is continuously published here.
In practice:
- Code must be properly documented, and the context for why the code exists should be given.
- Examples should be written as much as possible.
- When possible, use types found in the standard library rather than types defined locally or defined in other libraries. For example, always use
[u8; 32]
rather thanH256
. In particular, try as much as possible to not expose types of third-party libraries in the public API. - Do not try too hard to apply the "Don't Repeat Yourself" principle. Having to jump to a different file to understand what is going on is a big hit to the objective of readability.
- Trait definitions are only ever allowed for implementation details. Custom traits must not be exposed in any public API.
- The code base should not be split into multiple crates unless there is a good reason to (improving the compilation time would be considered a good reason only if objective measurements show big differences).
- Macros, custom derives, and procedural macros are allowed only if it is straight-forward for a human being to understand which code the macro generates on usage.
Substrate is organized around components that are considered as core components of blockchains in general: a database, a client (which manages blocks on top of the database), the networking, a transaction pool, and so on. These components are tightly coupled together and would be very difficult to extract individually.
The blockchain-related logic is plugged on top of these core components and can, however, be changed. One can, for example, remove everything related to the Babe consensus algorithm and replace with by another consensus algorithm.
smoldot, on the other hand, is implemented following the current state of the Substrate/Polkadot specification, and assumes that this specification will never change. This assumption allows, in turn, for better readability and more flexibility when it comes to the purely engineering aspects of the codebase.
For example, the code that decodes block headers is written in a way that would be relatively painful (but straight-forward) to modify, if the format of a block header ever changed. However, smoldot simply assumes that the format of block headers will rarely, if ever, change.
In particular, there is an assumption that the list of consensus algorithms is known in advance and will rarely change. Smoldot prefers the explicitness of code specific to every single consensus code, rather than possibly giving a fake impression to the user that they can write any custom consensus algorithm without having to tweak the core components to function accordingly.
Code must not panic as a result of unexpected input from the user or from the network.
However, code should panic if its internal consistency is compromised. In other words, if the only possible reason for the panic is a bug in the logic of the code.
The reasons are as follow:
- If it is known that a program no longer runs according to expectations, it is more sane to restart it.
- End users tend to not care about error messages as long as things continue working. An inconvenience such as crashing increases the chances of an issue being reported rather than ignored.
- Attempting to be tolerant towards semi-corrupted states complicates the code, potentially leading to additional bugs even when the state isn't corrupted.
While there is no rule in this source code about unwrap
, the programmer is expected to think about every single unwrap()
that they write and is concious that a None
or an Err
cannot happen unless a bug is present.