diff --git a/content/en/docs/contributing/coding-conventions.md b/content/en/docs/contributing/coding-conventions.md new file mode 100644 index 00000000..58983f7b --- /dev/null +++ b/content/en/docs/contributing/coding-conventions.md @@ -0,0 +1,823 @@ +# Unikraft: Coding Conventions (RFC) + +These are conventions for writing code and naming and placing things in Unikraft. +Maintainers would be the ones who judge if there are certain corner cases that a rule doesn’t need to be obeyed. +Add some examples with extreme cases. + +## Basics + +The layout uses the following basic rules. +You may install [editorconfig](https://editorconfig.org/) as there is a `.editorconfig` file in the top-level dir to automatically enforce these. + +* Unikraft uses **tabs + space** for indentation, with tabs being 8 characters wide. +Tabs first, further indentation less than 8 characters are done with spaces (Linux style). +* Source files are **80 columns** wide. + * Add some examples where a limit could be passed, due to readability reasons. + * Add a WARNING (80 characters) and an ERROR (85 characters) level. +* No trailing whitespaces. +* Insert a final newline. + +Always run [support/scripts/checkpatch.pl](https://github.com/unikraft/unikraft/blob/stable/support/scripts/checkpatch.pl) to make sure your source file complies with the latest Unikraft coding standards. + +### License Header + +Each file should have an SPDX identifier and license comment. +There is an empty newline after the license. +Unikraft uses the [BSD-3-Clause](https://opensource.org/license/bsd-3-clause/). + +```c +/* SPDX-License-Identifier: BSD-3-Clause */ +/* Copyright (c) 2023, Unikraft GmbH and The Unikraft Authors. + * Licensed under the BSD-3-Clause License (the "License"). + * You may not use this file except in compliance with the License. + */ + +#include +``` + +### Using Foreign Code + +When using foreign code, always make sure that the code uses a **compatible (permissive) license**. +We have a **strong preference** for BSD-3-Clause. +Other compatible licenses are, for example: Public domain, [BSD-2-Clause](https://opensource.org/license/bsd-2-clause/), [MIT](https://opensource.org/license/mit/), [Apache 2.0](https://opensource.org/license/apache-2-0/). +Incompatible licenses are, for example: [GPL](https://opensource.org/license/gpl-3-0/), [LGPL](https://opensource.org/license/lgpl-3-0/). + +> **Warning** +This in particular means that basing your work on Linux kernel sources is not permitted ⚠️ + +### Minor foreign contributions + +If you based only parts of your implementation on some other author’s work (e.g., importing a function, using an algorithm etc.) you can add a separate block after the license header or before the particular piece of code: + +```c +/* You may not use this file except in compliance with the License. + */ + +/* Parts of the implementation are based on + * PROJECT (VERSION) by AUTHOR + * Git repository, commit ID + * LICENSE IF NEEDED + */ + +#include +``` + +The version information is the software version number. +If a Git repository is available, provide a link to the repository and the commit hash used. +If the foreign license is the same as the source file’s license you do not need to provide the additional license copy. + +### Major foreign contributions + +If you import a foreign source file, keep the existing author and license information. +Note that when you make changes to the file, some licenses (e.g., `Apache 2.0`) require that you explicitly state that you changed the file. + +### Adding Additional Authors + +If your changes are small (e.g., a bug fix or a small extension) and it is not required by the license, consider not adding your name to the list of authors (and the corresponding copyright) if present. +This keeps the information about who the original authors are more prominently visible. +Your changes will be tracked by the Git history anyways and show up with git blame. + +## Coding Style + +For improved readability and maintainability your code should adhere to the rules laid out in this chapter. +The conventions are similar to the Linux kernel coding style to make developing code for Unikraft a familiar experience for people who have already worked with the Linux kernel code. +Thus, some of the examples have been taken from the Linux kernel coding style guide. + +### Braces + +Opening braces are placed at the end of the line that opens the new scope. Closing braces are put on a separate line: + +```c +if (condition) { + /* … */ +} +``` + +```c +struct my_struct { + /* … */ +}; +``` + +However, there are some exceptions. +Opening braces for functions are placed on a new line: + +```c +void func(void) +{ + /* … */ +} +``` + +Separate lines for the closing brace also not used in the following cases to reduce the number of empty lines: + +```c +if (condition) { + /* … */ +} else if (condition) { + /* … */ +} else { + /* … */ +} +``` + +```c +do { + /* … */ +} while (condition); +``` + +**Do not use braces** for single line statements: + +```c +if (condition) + single_statement(); +else + single_statement(); +``` + +```c +while (condition) + single_statement(); +``` + +But use braces for the `else block` if the `if block` requires them: + +```c +if (condition) { + single_statement(); + single_statement(); +} else { + single_statement(); +} +``` + +### Spaces + +Generally, put a space after keywords like `if`, `switch`, `case`, `for`, `do`, `while`, `return`. + +```c +if (condition) ✅ +``` + +```c +if(condition) ❌ +``` + +Also put a space after a comma: + +```c +void func(int arg0, int arg2) ✅ +``` + +```c +void func(int arg0,int arg2) ❌ +``` + +And around most operators (except unary operators): + +```c +a = b + c; +``` + +```c +if (a == b) +``` + +```c +a = (condition) ? 1 : 2; +``` + +For unary operators and structure member operators **do not use** a space: + +```c +p = &var; +``` + +```c +if (!condition) +``` + +```c +a = ~c & b; +``` + +```c +a->field = 0xdeadb0b0; +``` + +Do not put spaces around keywords like *sizeof*, *typeof*, *alignof*, *__attribute__* and around (inside) parenthesized expressions: + +```c +s = sizeof(p); ✅ +``` + +```c +s = sizeof ( p ); ❌ +``` + +```c +if ( condition ) ❌ +``` + +Always put the asterisk (\*) for pointer definitions adjacent to the data or function name. +This is to avoid the misconception that char\* a, b defines two pointers, whereas in fact only a is a pointer. + +```c +char *myfunc(void *p) +{ + char *a, *b, c; + /* ... */ + + return (char *) p; +} +``` + +### Comments + +* Always use multi-line comments even for single line comments: + +```c +/* Single line comment */ +if (condition) ✅ +``` + +```c +int var; /* comment */ ✅ +``` + +```c +int var; // comment ❌ +``` + +* Use a separate line to close the comment for actual multi-line comments. +There is a leading space to align the lines: + +```c +/* This is a multi line comment because it is so incredibly + * useful that the contents does not fit into one line + */ +``` + +* Trailing dots are not necessary, unless there is a sentence that you would end with a do. +Just keep it consistent. + +* Use spaces instead of tabs to indent within comments. + +* Start comments with capital letters in case of sentences. + +### Function Documentation + +Functions that are exported by a library or define an otherwise publicly visible interface need to be properly documented in the header file. +Use [doxygen-style](https://www.doxygen.nl/) comments for this. +Note the double-asterisk (\*\*) at the beginning: + +```c +/** + * Tries to allocate the requested number of bytes from the default memory + * allocator. The actual amount of allocated memory can be smaller. + * + * @param size + * Number of bytes to allocate + * @param[out] p + * Pointer to the newly allocated memory on success, NULL otherwise + * + * @return + * The number of bytes actually allocated + */ +size_t try_alloc(size_t size, void **p); +``` + +* Use dots when ending a sentence or when a new sentence starts afterwards. + +* Start the description on a new line for doxygen variables (@param, @return). + +* TODO: Add info on referencing Doxygen variables. + +* The return can be omitted for trivial getter functions: + +```c +/** + * Returns the size of the given buffer + */ +size_t get_size(void *p); +``` + +For return values, use the convention: +* \- (0): Successfully initialized +* \- (<0): Negative value with error code + +Basically, list return values as list items. + +### Struct Member Documentation + +* Use double asterisk for multi-line comments (/**) + +* When having flags, point to defines of macros in the same or another file. + +* Check how Doxygen renders macro comments and then decide on the best approach to format. + +### Definitions + +#### Variable Definitions + +Variables are defined at the beginning of the block. +The block is separated from the code with a blank line. + +```c +int my_func(void) +{ + int *ptr; + int i, num; + + + ptr = malloc(10 * sizeof(int)); + if (unlikely(!ptr)) + return -ENOMEM; + + + for (i = 0; i < 10; i++) + num = /* ... */; +} +✅ +``` + +```c +int my_func(void) +{ + void *ptr = malloc(10 * sizeof(int)); + if (unlikely(!ptr)) + return -ENOMEM; + + for (int i = 0; i < 10; i++) + int num = /* ... */; +} +❌ +``` + +#### Naming + +Identifiers in Unikraft use `snake case`: + +```c +void my_func(void); /* Snake Case */ ✅ +``` + +```c +void myFunc(void); /* Camel Case */ ❌ +``` + +```c +void MyFunc(void); /* Pascal Case */ ❌ +``` + +An exception to this are constants and preprocessor macros, which should use all uppercase and words separated by underscores: + +```c +const int MAGIC_VALUE = 0xdeadb0b0; +``` + +```c +#define IS_DEAD(bobo) ((bobo) == MAGIC_VALUE) +``` + +Macro names may also be regular function names if they are just wrappers for actual functions (see `uk_vprintk()` below) or if the functionality might be either implemented as macro or function (e.g., depending on architecture). + +### Leading Underscores + +A leading single underscore is **discouraged** but may be used selectively to mark identifiers which should not be used directly, like in the header ``: + +```c +#if CONFIG_LIBUKDEBUG_PRINTK +/* please use the uk_printk(), uk_vprintk() macros because they compile in the + * function calls only if the configured debug level requires it + */ +void _uk_vprintk(int lvl, const char *libname, const char *srcname, + unsigned int srcline, const char *fmt, va_list ap); + + +#define uk_vprintk(lvl, fmt, ap) \ + do { \ + if ((lvl) <= KLVL_MAX) \ + _uk_vprintk((lvl), __STR_LIBNAME__, __STR_BASENAME__, \ + __LINE__, (fmt), ap); \ + } while (0) +``` + +Prefer using `UK_PAGE_SIZE` instead of `_PAGE_SIZE` if required to avoid a symbol clash. + +However, leading underscores should **not** be used to name private functions within the same source file. +The `static` keyword already provides the same information. + +```c +static void _i_am_private(int arg) +{ + /* ... */ +} + +void i_am_public(int arg) +{ + _i_am_private(arg); +} +❌ +``` + +Leading double underscores may only be used for: + +* low-level native Unikraft types like `__u64` and `__spinlock` +* header inclusion guards +* preprocessor macros to add another level of indirection for macro resolution + +> Note: +Rule of thumb: include `libc` headers first and then Unikraft headers. + +TODO: Do we replace `__nonnull` with `uk_nonnull` and `__NULL` with `UK_NULL`? + +### Prefixes + +Unikraft uses name prefixes to express that an identifier is part of a certain API, component or namespace. + +```console +__ +``` + +Unikraft reserves the following API prefixes to describe **functions**, **function-like macros** and **data types** of the Unikraft API. This also applies (CAPITALIZED) for macros: + +* `uk_` - denotes a public Unikraft API at the library level. +The implementation is usually found in one of the libraries in `/lib` or defined in the headers in /include/uk. +For example: `uk_alloc()` + +* `ukplat_` - denotes a public Unikraft API, whose implementation is usually provided in /plat and depends on the platform (e.g., KVM) for which the Unikernel is compiled. +For example: `ukplat_page_map()` + +* `ukarch_` - denotes a public Unikraft API, whose implementation is usually provided in /arch and depends on the architecture (e.g., x86-64) for which the Unikernel is compiled. +For example: `ukarch_tlb_flush()` + +Use a verb at the end of the function name (`_init`, `_clean`, `_add,` `_clone`). + +The component prefix makes it easy to identify related symbols. For example, functions related to thread management use the thread prefix and functions related to page mapping use the page prefix: + +```c +void uk_thread_block(struct uk_thread *thread); +``` + +```c +int ukplat_page_map(...); +``` + +The component prefix should also be used for private functions, but without the public API prefix (cc = coding conventions): + +```c +static int cc_myfunc(int arg) +{ + /* ... */ +} + +int uk_cc_myfunc(int arg) +{ + return cc_myfunc(arg); +} +``` + +If the library defines functions that are private, but architecture specific, they should follow the pattern used for the public Unikraft API (here ccarch_): + +```c +static int cc_myfunc(int arg) +{ + if (arg == 42) + ccarch_do_something(); +} +``` + +* **Public is defined in exportsyms.uk** + +* **Private is not as part of exportsyms.uk or part of localsyms.uk.** + +#### Typedefs + +*Rule of thumb*: **Do not** use typedefs. + +**Do not** use the _t suffix when defining types. +It seems that `_t` types are POSIX-reserved. +Consider using `_func` suffix if you require defining function pointer data types (those situations should be rare - such as passing a function pointer to another function or **using it in multiple places**). + +```c +typedef int (*cmp_func)(int, int); +void sort(int *array1, cmp_func *f) +``` + +### Usage of Unikraft Types + +Unikraft defines aliases for most of the basic data types (e.g., `__u8`, `__sz`). +Use them where possible, when the component can live without `libc dependency` (`nolibc`, `musl`, `newlib`); use them instead of libc data types such as uint32_t. +For sure, architecture, platform code, drivers. + +#### sizeof + +Prefer to take the size of a variable instead of the variable’s type so that the size remains valid if the variable’s type changes: + +```c +struct my_struct p; +s = sizeof(p); + +struct my_struct *p; +s = sizeof(*p); +✅ +``` + +```c +struct my_struct p; +s = sizeof(struct my_struct); +❌ +``` + +### Return Values + +For functions that should return error values, use negative error codes from `errno.h` if possible. +These are well known and provide adequate values in most cases. +Avoid defining custom error codes. +Use 0 for success. +This is to be used both for Unikraft APIs and for lower-level components. + +**Rationale**: errno values have good coverage in the types of errors. +It is easier to use a “standard” set of values (errno) to signal errors, even in lower-level components. + +```c +#include + +int do_something(void arg) +{ + if (arg > 42) + return -EINVAL; /* failure */ + + return 0; /* success */ +} +``` + +Include `errno.h` in the source code files, not in the API header files. +Platform API shouldn’t provide `errno.h`. + +Boolean-type functions should use `__bool` as return type and indicate: + +* true (success) with a non-zero value +* false (failure) with 0. + +> Do not assume that true is 1; true may be a non-zero value. +When using the return value of a boolean function, use `if (result)` or `if (!result)` to cover the case for true being a non-zero value. + +```c +__bool is_bobo(int bobo) +{ + if (bobo == 0xb0b0) + return 1; /* true, it is b0b0 */ + + + return 0; /* false */ +} +``` + +#### Switch Statements + +The cases in switch statements are **not indented**. + +```c +switch (var) { +case 0: + /* … */ + break; +case 1: + /* … */ + break; +default: + /* … */ + break; +} +``` + +### Goto Statements + +Goto labels should not **be indented**. +The name of the label should be: `err_…` or `out_…`, depending on the context. + +Generally, goto should be avoided for simple “retry”-style constructs. + Use a while-loop instead. + +Gotos can be a good solution to implement a common error/finalization path. +This reduces the probability that some error paths may leak resources or do not release all locks (in the correct order). +This can easily happen as error paths are usually less well tested. If, however, possible, return directly. + +```c +int cc_myfunc(void) +{ + void *a, *b; + int rc = 0; + + a = malloc(...); + if (unlikely(!a)) { + rc = -ENOMEM; + goto err_out; + } + + b = malloc(...); + if (unlikely(!b)) { + rc = -ENOMEM; + goto err_free_a; + } + + /* ... */ + free(b); + +err_free_a: + free(a); +err_out: + return rc; +} +✅ +``` + +```c +int cc_myfunc(void) +{ + void *a, *b; + + a = malloc(...); + if (unlikely(!a)) + return -ENOMEM; + + + b = malloc(...); + if (unlikely(!b)) { + free(a); + return -ENOMEM; + } + + /* ... */ + free(b); + free(a); + return 0; +} +❌ +``` + +### Assembly + +#### Code + +Indentation for ASM code: + +* Multi line (tab at the beginning, tab between instr. and operands, `\n` at the end) +* Could also remove tab but that means `\n\t` at the end, which is not nice, if at some point code is changed and label has to be added. + +#### Inline Assembly + +```c +__asm__ __volatile__ ( + “1: instr op, op\n” + “ instr2 op, op\n” + :::); +``` + +* Single-line +* With / without label +* `__asm__ __volatile__` vs. `asm volatile` +* Aim to add a comment for each inline assembly line, detailing what it does + * Ignore for repetitive operations (such as mov-s) + * Ignore long lines for inline assembly, when including comments + * Aim to have short comments + We could also have multi-line comments + +### Breaking Long Lines + +TODO: Two questions: + +* When do we break? +When reaching 80 characters? +* Where do we break? +* How do we align after the break? + +1. Proposal MR: Break before function name (after function return type) +1. Proposal MR: Align function arguments after break to open parenthesis +1. Decision: agreed + +TODO: How do we align when we break on conditions, for loops? + +For functions, align arguments on different line: + +```c +UK_SYSCALL_R_DEFINE(int, getpeername, int, sock, +» » struct sockaddr *restrict, addr, +» » socklen_t *restrict, addr_len) +``` + +For conditions, always aligned to the parenthesis. +To be voted via chat. +One breaking rule to rule them all. +TODO: Wait for Michalis’s insight. + +```c +if (a == 0 && \ + (b == 10 || \ + c == 12) || \ + d == 15) + +if (a == 0 \ + && (b == 10 \ + || c == 12) \ + || d == 15) +``` + +Wait for Michalis’s insight. + +Function name and first argument are atomic. + +Prefer the Linux style with the return type on the same line as the function name. +In case of long functions + first argument, you can break after the return type. + +### Preprocessor Macros + +This is referenced in TODO for prefixes (i.e. the use of `uk_`, `ukplat_`, `ukarch_` prefixes). +Recommendation for using macros as (inline) functions? + +* Follow the same conventions as functions +* This brings flexibility in switching between one or the other +* Some macros as functions would still be uppercase (such as `UK_MIN`, `UK_MAX`) +* Place each argument (that is a variable) between parentheses +* If using a variable multiple times, it’s better to use a temporary variable declaration in the macro that stores the variable + +### UK_ASSERT + +1. Proposal MR: Have UK_ASSERT even in the absence of ukdebug +Decision: Agree (part of the plat re-arch discussion) +Where possible use the compile time assert (UK_CTASSERT): CT for compile-time +1. Proposal MP: Guideline when using UK_ASSERT and when using UK_CRASH + UK_CRASH is for situations where you can not formalize an assertion + Something like UK_ASSERT(false) +1. MS: Some should be present in release builds +UK_BUG_ON should never be removed from the code (with condition) +The same for UK_CRASH (without condition) +Change behavior of UK_BUG_ON to always be part of the code, even in release builds + +### Parameter Validation + +Do rigorous parameter validation. +However, parameter validation should be done using assertions where the parameter can be expected to be not directly determined by external input. +This means in particular that `uk_`, `ukplat_`, and `ukarch_` APIs should use `UK_ASSERT()`: + +```c +int uk_cc_myfunc(int bobo) +{ + UK_ASSERT(bobo == 0xdead); + + /* do something */ +} +✅ +``` + +```c +int uk_cc_myfunc(int bobo) +{ + if (unlikely(bobo == 0xdead)) + return -EINVAL; + + /* do something */ +} +❌ +``` + +This is in contrast to, for example, POSIX, where explicit parameter validation is expected. +In a Unikernel, though, we see the OS components as an (internal) extension to the application. +Users of the APIs are thus expected to make sure only sane parameters are used just like with internal application functions. + +Generally, use the `unlikely()` macro when checking for error cases (only). +It is defined in ``. + +When working with the low-level Unikraft API, it is essential to validate parameters using UK_ASSERT when specific parameter ranges need to be enforced. In cases where there is no explicit specification, performing an error check is recommended to ensure parameter validity. + +When working with higher-level APIs, parameter validation should be handled by returning appropriate error codes according to the relevant standard (such as POSIX). + +### Comment Keywords + +1. `NOTE`: Description of how the code works (when it isn't self evident) +1. `XXX:` Warning about possible pitfalls, can be used as `NOTE:XXX:` +1. `HACK`: Not very well written or malformed code to circumvent a problem/bug +Should be used as HACK:FIXME: +1. `FIXME`: This works, sort of, but it could be done better. (usually code written in a hurry that needs rewriting) +1. `BUG`: There is a problem here +1. `TODO`: No problem, but additional code needs to be written, usually when you are skipping something +1. `UNDONE`: A reversal or "roll back" of previous code +1. `OPTIMIZE`: Code can / should be optimized + +### Magic Numbers + +These are some magic numbers used in the Unikraft code: + +1. `UKPLAT_BOOTINFO_MAGIC` 0xB007B0B0 (Boot Bobo) +1. `FDT_MAGIC 0xd00dfeed` (FDT specification) +1. `UK_TP_HEADER_MAGIC` 0x64685254 (TRhd) +1. `UK_TP_DEF_MAGIC` 0x65645054 (TPde) +1. `CL_UKTLS_SANITY_MAGIC` 0xb0b0f00d (Bobo food) +1. `0xDEADB0B0`, `0xB0B0CAFE ` (in ukvmem) +1. `MULTIBOOT_HEADER_MAGIC` 0x1BADB002 +1. `MULTIBOOT_BOOTLOADER_MAGIC` 0x2BADB002 +``` +