diff --git a/content/en/docs/contributing/coding-conventions.md b/content/en/docs/contributing/coding-conventions.md new file mode 100644 index 00000000..afede49f --- /dev/null +++ b/content/en/docs/contributing/coding-conventions.md @@ -0,0 +1,823 @@ +# Unikraft: Coding Conventions (RFC) + +Coding conventions are important to maintain consistency and readability in software development, making code easier to understand, debug, review and maintain by multiple developers over time. + +These are conventions for writing code and naming and placing things in Unikraft. +Maintainers would be the ones who judge if there are certain corner cases that a rule doesn’t need to be obeyed. +Add some examples with extreme cases. + +## Basics + +The layout uses the following basic rules. +You may install [editorconfig](https://editorconfig.org/) as there is a `.editorconfig` file in the top-level dir to automatically enforce these. + +* Unikraft uses **tabs + spaces** for indentation, with tabs being 8 characters wide. +Tabs first, further indentation less than 8 characters are done with spaces (Linux style). +* Source files are **80 columns** wide. + * There are some scenarios where this limit could be passed, due to [readability reasons](https://github.com/torvalds/linux/commit/bdc48fa11e46f867ea4d75fa59ee87a7f48be144). + * Add a WARNING (80 characters) and an ERROR (85 characters) level. +* No trailing whitespaces. +* Insert a final newline. + +Always run [support/scripts/checkpatch.uk](https://github.com/unikraft/unikraft/blob/staging/support/scripts/checkpatch.uk) to make sure your source file complies with the latest Unikraft coding standards. + +### License Header + +Each file should have an SPDX identifier and license comment. +There is an empty newline after the license. +Unikraft uses the [BSD-3-Clause](https://github.com/unikraft/unikraft/blob/staging/COPYING.md). + +```c +/* SPDX-License-Identifier: BSD-3-Clause */ +/* Copyright (c) 2023, Unikraft GmbH and The Unikraft Authors. + * Licensed under the BSD-3-Clause License (the "License"). + * You may not use this file except in compliance with the License. + */ + +#include +``` + +### Using Foreign Code + +When using foreign code, always make sure that the code uses a **compatible (permissive) license**. +We have a **strong preference** for BSD-3-Clause. +Other compatible licenses are, for example: Public domain, [BSD-2-Clause](https://opensource.org/license/bsd-2-clause/), [MIT](https://opensource.org/license/mit/), [Apache 2.0](https://opensource.org/license/apache-2-0/). +Incompatible licenses are, for example: [GPL](https://opensource.org/license/gpl-3-0/), [LGPL](https://opensource.org/license/lgpl-3-0/). + + +{{% alert theme="info" %}} +**Warning** +This in particular means that basing your work on Linux kernel sources is not permitted ⚠️ +{{% /alert %}} + +### Minor foreign contributions + +If you based only parts of your implementation on some other author’s work (e.g., importing a function, using an algorithm etc.) you can add a separate block after the license header or before the particular piece of code: + +```c +/* You may not use this file except in compliance with the License. + */ + +/* Parts of the implementation are based on + * PROJECT (VERSION) by AUTHOR + * Git repository, commit ID + * LICENSE IF NEEDED + */ + +#include +``` + +The version information is the software version number. +If a Git repository is available, provide a link to the repository and the commit hash used. +If the foreign license is the same as the source file’s license you do not need to provide the additional license copy. + +### Major foreign contributions + +If you import a foreign source file, keep the existing author and license information. +Note that when you make changes to the file, some licenses (e.g., `Apache 2.0`) require that you explicitly state that you changed the file. + +### Adding Additional Authors + +If your changes are small (e.g., a bug fix or a small extension) and it is not required by the license, consider not adding your name to the list of authors (and the corresponding copyright) if present. +This keeps the information about who the original authors are more prominently visible. +Your changes will be tracked by the Git history anyways and show up with git blame. + +## Coding Style + +For improved readability and maintainability your code should adhere to the rules laid out in this chapter. +The conventions are similar to the Linux kernel coding style to make developing code for Unikraft a familiar experience for people who have already worked with the Linux kernel code. +Thus, some of the examples have been taken from the Linux kernel coding style guide. + +### Braces + +Opening braces are placed at the end of the line that opens the new scope. Closing braces are put on a separate line: + +```c +if (condition) { + /* … */ +} +``` + +```c +struct my_struct { + /* … */ +}; +``` + +However, there are some exceptions. +Opening braces for functions are placed on a new line: + +```c +void func(void) +{ + /* … */ +} +``` + +Separate lines for the closing brace also not used in the following cases to reduce the number of empty lines: + +```c +if (condition) { + /* … */ +} else if (condition) { + /* … */ +} else { + /* … */ +} +``` + +```c +do { + /* … */ +} while (condition); +``` + +**Do not use braces** for single line statements: + +```c +if (condition) + single_statement(); +else + single_statement(); +``` + +```c +while (condition) + single_statement(); +``` + +But use braces for the `else block` if the `if block` requires them: + +```c +if (condition) { + single_statement(); + single_statement(); +} else { + single_statement(); +} +``` + +### Spaces + +Generally, put a space after keywords like `if`, `switch`, `case`, `for`, `do`, `while`, `return`. + +```c +if (condition) ✅ +``` + +```c +if(condition) ❌ +``` + +Also put a space after a comma: + +```c +void func(int arg0, int arg2) ✅ +``` + +```c +void func(int arg0,int arg2) ❌ +``` + +And around most operators (except unary operators): + +```c +a = b + c; +``` + +```c +if (a == b) +``` + +```c +a = (condition) ? 1 : 2; +``` + +For unary operators and structure member operators **do not use** a space: + +```c +p = &var; +``` + +```c +if (!condition) +``` + +```c +a = ~c & b; +``` + +```c +a->field = 0xdeadb0b0; +``` + +Do not put spaces around keywords like `sizeof`, `typeof`, `alignof`,`__attribute__` and around (inside) parenthesized expressions: + +```c +s = sizeof(p); ✅ +``` + +```c +s = sizeof ( p ); ❌ +``` + +```c +if ( condition ) ❌ +``` + +Always put the asterisk (\*) for pointer definitions adjacent to the data or function name. +This is to avoid the misconception that char\* a, b defines two pointers, whereas in fact only a is a pointer. + +```c +char *myfunc(void *p) +{ + char *a, *b, c; + /* ... */ + + return (char *) p; +} +``` + +### Comments + +* Always use multi-line comments even for single line comments: + +```c +/* Single line comment */ +if (condition) ✅ +``` + +```c +int var; /* comment */ ✅ +``` + +```c +int var; // comment ❌ +``` + +* Use a separate line to close the comment for actual multi-line comments. +There is a leading space to align the lines: + +```c +/* This is a multi line comment because it is so incredibly + * useful that the contents does not fit into one line + */ +``` + +* Trailing dots are not necessary, unless there is a sentence that you would end with a do. +Just keep it consistent. + +* Use spaces instead of tabs to indent within comments. + +* Start comments with capital letters in case of sentences. + +### Function Documentation + +Functions that are exported by a library or define an otherwise publicly visible interface need to be properly documented in the header file. +Use [doxygen-style](https://www.doxygen.nl/) comments for this. +Note the double-asterisk (\*\*) at the beginning: + +```c +/** + * Tries to allocate the requested number of bytes from the default memory + * allocator. The actual amount of allocated memory can be smaller. + * + * @param size + * Number of bytes to allocate + * @param[out] p + * Pointer to the newly allocated memory on success, NULL otherwise + * + * @return + * The number of bytes actually allocated + */ +size_t try_alloc(size_t size, void **p); +``` + +* Use dots when ending a sentence or when a new sentence starts afterwards. + +* Start the description on a new line for doxygen variables (`@param`, `@return`). + +* TODO: Add info on referencing Doxygen variables. + +* The return can be omitted for trivial getter functions: + +```c +/** + * Returns the size of the given buffer + */ +size_t get_size(void *p); +``` + +For return values, use the convention: +* \- (0): Successfully initialized +* \- (<0): Negative value with error code + +Basically, list return values as list items. + +### Struct Member Documentation + +* Use double asterisk for multi-line comments (/**) + +* When having flags, point to defines of macros in the same or another file. + +* Check how Doxygen renders macro comments and then decide on the best approach to format. + +### Definitions + +#### Variable Definitions + +Variables are defined at the beginning of the block. +The block is separated from the code with a blank line. + +```c +int my_func(void) +{ + int *ptr; + int i, num; + + ptr = malloc(10 * sizeof(int)); + if (unlikely(!ptr)) + return -ENOMEM; + + for (i = 0; i < 10; i++) + num = /* ... */; +} +✅ +``` + +```c +int my_func(void) +{ + void *ptr = malloc(10 * sizeof(int)); + + if (unlikely(!ptr)) + return -ENOMEM; + + for (int i = 0; i < 10; i++) + int num = /* ... */; +} +❌ +``` + +#### Naming + +Identifiers in Unikraft use `snake case`: + +```c +void my_func(void); /* Snake Case */ ✅ +``` + +```c +void myFunc(void); /* Camel Case */ ❌ +``` + +```c +void MyFunc(void); /* Pascal Case */ ❌ +``` + +An exception to this are constants and preprocessor macros, which should use all uppercase and words separated by underscores: + +```c +const int MAGIC_VALUE = 0xdeadb0b0; +``` + +```c +#define IS_DEAD(bobo) ((bobo) == MAGIC_VALUE) +``` + +Macro names may also be regular function names if they are just wrappers for actual functions (see `uk_vprintk()` below) or if the functionality might be either implemented as macro or function (e.g., depending on architecture). + +### Leading Underscores + +A leading single underscore is **discouraged** but may be used selectively to mark identifiers which should not be used directly, like in the header ``: + +```c +#if CONFIG_LIBUKDEBUG_PRINTK +/* please use the uk_printk(), uk_vprintk() macros because they compile in the + * function calls only if the configured debug level requires it + */ +void _uk_vprintk(int lvl, const char *libname, const char *srcname, + unsigned int srcline, const char *fmt, va_list ap); + + +#define uk_vprintk(lvl, fmt, ap) \ + do { \ + if ((lvl) <= KLVL_MAX) \ + _uk_vprintk((lvl), __STR_LIBNAME__, __STR_BASENAME__, \ + __LINE__, (fmt), ap); \ + } while (0) +``` + +Prefer using `UK_PAGE_SIZE` instead of `_PAGE_SIZE` if required to avoid a symbol clash. + +However, leading underscores should **not** be used to name private functions within the same source file. +The `static` keyword already provides the same information. + +```c +static void _i_am_private(int arg) +{ + /* ... */ +} + +void i_am_public(int arg) +{ + _i_am_private(arg); +} +❌ +``` + +Leading double underscores may only be used for: + +* low-level native Unikraft types like `__u64` and `__spinlock` +* header inclusion guards +* preprocessor macros to add another level of indirection for macro resolution + +{{% alert theme="info" %}} +Rule of thumb: include `libc` headers first and then Unikraft headers. +{{% /alert %}} + +TODO: Do we replace `__nonnull` with `uk_nonnull` and `__NULL` with `UK_NULL`? + +### Prefixes + +Unikraft uses name prefixes to express that an identifier is part of a certain API, component or namespace. + +```text +__ +``` + +Unikraft reserves the following API prefixes to describe **functions**, **function-like macros** and **data types** of the Unikraft API. This also applies (CAPITALIZED) for macros: + +* `uk_` - denotes a public Unikraft API at the library level. +The implementation is usually found in one of the libraries in `/lib` or defined in the headers in `/include/uk`. +For example: `uk_alloc()` + +* `ukplat_` - denotes a public Unikraft API, whose implementation is usually provided in `/plat` and depends on the platform (e.g., KVM) for which the Unikernel is compiled. +For example: `ukplat_page_map()` + +* `ukarch_` - denotes a public Unikraft API, whose implementation is usually provided in `/arch` and depends on the architecture (e.g., x86-64) for which the Unikernel is compiled. +For example: `ukarch_tlb_flush()` + +Use a verb at the end of the function name (`_init`, `_clean`, `_add,` `_clone`). + +The component prefix makes it easy to identify related symbols. +For example, functions related to thread management use the thread prefix and functions related to page mapping use the page prefix: + +```c +void uk_thread_block(struct uk_thread *thread); +``` + +```c +int ukplat_page_map(...); +``` + +The component prefix should also be used for private functions, but without the public API prefix (cc = coding conventions): + +```c +static int cc_myfunc(int arg) +{ + /* ... */ +} + +int uk_cc_myfunc(int arg) +{ + return cc_myfunc(arg); +} +``` + +If the library defines functions that are private, but architecture specific, they should follow the pattern used for the public Unikraft API (here ccarch_): + +```c +static int cc_myfunc(int arg) +{ + if (arg == 42) + ccarch_do_something(); +} +``` + +{{% alert theme="info" %}} +* Public is defined in exportsyms.uk + +* Private is not as part of exportsyms.uk or part of localsyms.uk. + +* If there is no `exportsyms.uk` defined, then all functions are exported. +{{% /alert %}} + +#### Typedefs + +*Rule of thumb*: **Do not** use typedefs. + +**Do not** use the _t suffix when defining types. +It seems that `_t` types are POSIX-reserved. +Consider using `_func` suffix if you require defining function pointer data types (those situations should be rare - such as passing a function pointer to another function or **using it in multiple places**). + +```c +typedef int (*cmp_func)(int, int); +void sort(int *array1, cmp_func *f) +``` + +### Usage of Unikraft Types + +Unikraft defines aliases for most of the basic data types (e.g., `__u8`, `__sz`). +Use them where possible, when the component can live without `libc dependency` (`nolibc`, `musl`, `newlib`); use them instead of libc data types such as uint32_t. +For sure, architecture, platform code, drivers. + +#### sizeof + +Prefer to take the size of a variable instead of the variable’s type so that the size remains valid if the variable’s type changes: + +```c +struct my_struct p; +s = sizeof(p); + +struct my_struct *p; +s = sizeof(*p); +✅ +``` + +```c +struct my_struct p; +s = sizeof(struct my_struct); +❌ +``` + +### Return Values + +For functions that should return error values, use negative error codes from `errno.h` if possible. +These are well known and provide adequate values in most cases. +Avoid defining custom error codes. +Use 0 for success. +This is to be used both for Unikraft APIs and for lower-level components. + +{{% alert theme="info" %}} +**Rationale**: errno values have good coverage in the types of errors. +It is easier to use a “standard” set of values (errno) to signal errors, even in lower-level components. +{{% /alert %}} + +```c +#include + +int do_something(void arg) +{ + if (arg > 42) + return -EINVAL; /* failure */ + + return 0; /* success */ +} +``` + +Include `errno.h` in the source code files, not in the API header files. +Platform API shouldn’t provide `errno.h`. + +Boolean-type functions should use `__bool` as return type and indicate: + +* true (success) with a non-zero value +* false (failure) with 0. + +{{% alert theme="info" %}} +Do not assume that true is 1; true may be a non-zero value. +When using the return value of a boolean function, use `if (result)` or `if (!result)` to cover the case for true being a non-zero value. +{{% /alert %}} + +```c +__bool is_bobo(int bobo) +{ + if (bobo == 0xb0b0) + return 1; /* true, it is b0b0 */ + + + return 0; /* false */ +} +``` + +#### Switch Statements + +The cases in switch statements are **not indented**. + +```c +switch (var) { +case 0: + /* … */ + break; +case 1: + /* … */ + break; +default: + /* … */ + break; +} +``` + +### Goto Statements + +Goto labels should **NOT** be indented. +The name of the label should be: `err_…` or `out_…`, depending on the context. + +Generally, goto should be avoided for simple “retry”-style constructs. + Use a while-loop instead. + +Gotos can be a good solution to implement a common error/finalization path. +This reduces the probability that some error paths may leak resources or do not release all locks (in the correct order). +This can easily happen as error paths are usually less well tested. +If, however, possible, return directly. + +```c +int cc_myfunc(void) +{ + void *a, *b; + int rc = 0; + + a = malloc(...); + if (unlikely(!a)) { + rc = -ENOMEM; + goto err_out; + } + + b = malloc(...); + if (unlikely(!b)) { + rc = -ENOMEM; + goto err_free_a; + } + + /* ... */ + free(b); + +err_free_a: + free(a); +err_out: + return rc; +} +✅ +``` + +```c +int cc_myfunc(void) +{ + void *a, *b; + + a = malloc(...); + if (unlikely(!a)) + return -ENOMEM; + + + b = malloc(...); + if (unlikely(!b)) { + free(a); + return -ENOMEM; + } + + /* ... */ + free(b); + free(a); + return 0; +} +❌ +``` + +### Assembly + +#### Code + +Indentation for ASM code: + +* Multi line (tab at the beginning, tab between instr. and operands, `\n` at the end) +* Could also remove tab but that means `\n\t` at the end, which is not nice, if at some point code is changed and label has to be added. + +#### Inline Assembly + +```c +__asm__ __volatile__ ( + “1: instr op, op\n” + “ instr2 op, op\n” + :::); +``` + +* Single-line +* With / without label +* `__asm__ __volatile__` vs. `asm volatile` +* Aim to add a comment for each inline assembly line, detailing what it does + * Ignore for repetitive operations (such as mov-s) + * Ignore long lines for inline assembly, when including comments + * Aim to have short comments + We could also have multi-line comments + +### Breaking Long Lines + + +1. Break before function name (after function return type) +1. Align function arguments after break to open parenthesis + + +For functions, align arguments on different line: + +```c +UK_SYSCALL_R_DEFINE(int, getpeername, int, sock, +» » struct sockaddr *restrict, addr, +» » socklen_t *restrict, addr_len) +``` + +For conditions, always align to the parenthesis. + +```c +if (a == 0 && \ + (b == 10 || \ + c == 12) || \ + d == 15) +``` + +Function name and first argument are atomic. + +Prefer the Linux style with the return type on the same line as the function name. +In case of long functions + first argument, you can break after the return type. + +### Preprocessor Macros + +This is referenced in TODO for prefixes (i.e. the use of `uk_`, `ukplat_`, `ukarch_` prefixes). +Recommendation for using macros as (inline) functions? + +* Follow the same conventions as functions +* This brings flexibility in switching between one or the other +* Some macros as functions would still be uppercase (such as `UK_MIN`, `UK_MAX`) +* Place each argument (that is a variable) between parentheses +* If using a variable multiple times, it’s better to use a temporary variable declaration in the macro that stores the variable + +### UK_ASSERT + +1. Proposal MR: Have UK_ASSERT even in the absence of ukdebug +Decision: Agree (part of the plat re-arch discussion) +Where possible use the compile time assert (UK_CTASSERT): CT for compile-time +1. Proposal MP: Guideline when using UK_ASSERT and when using UK_CRASH + UK_CRASH is for situations where you can not formalize an assertion + Something like UK_ASSERT(false) +1. MS: Some should be present in release builds +UK_BUG_ON should never be removed from the code (with condition) +The same for UK_CRASH (without condition) +Change behavior of UK_BUG_ON to always be part of the code, even in release builds + +### Parameter Validation + +Do rigorous parameter validation. +However, parameter validation should be done using assertions where the parameter can be expected to be not directly determined by external input. +This means in particular that `uk_`, `ukplat_`, and `ukarch_` APIs should use `UK_ASSERT()`: + +```c +int uk_cc_myfunc(int bobo) +{ + UK_ASSERT(bobo == 0xdead); + + /* do something */ +} +✅ +``` + +```c +int uk_cc_myfunc(int bobo) +{ + if (unlikely(bobo == 0xdead)) + return -EINVAL; + + /* do something */ +} +❌ +``` + +This is in contrast to, for example, POSIX, where explicit parameter validation is expected. +In a Unikernel, though, we see the OS components as an (internal) extension to the application. +Users of the APIs are thus expected to make sure only sane parameters are used just like with internal application functions. + +Generally, use the `unlikely()` macro when checking for error cases (only). +It is defined in ``. + +When working with the low-level Unikraft API, it is essential to validate parameters using UK_ASSERT when specific parameter ranges need to be enforced. +In cases where there is no explicit specification, performing an error check is recommended to ensure parameter validity. + +When working with higher-level APIs, parameter validation should be handled by returning appropriate error codes according to the relevant standard (such as POSIX). + +### Comment Keywords + +1. `NOTE`: Description of how the code works (when it isn't self evident) +1. `XXX:` Warning about possible pitfalls, can be used as `NOTE:XXX:` +1. `HACK`: Not very well written or malformed code to circumvent a problem/bug +Should be used as HACK:FIXME: +1. `FIXME`: This works, sort of, but it could be done better. (usually code written in a hurry that needs rewriting) +1. `BUG`: There is a problem here +1. `TODO`: No problem, but additional code needs to be written, usually when you are skipping something +1. `UNDONE`: A reversal or "roll back" of previous code +1. `OPTIMIZE`: Code can / should be optimized + +### Magic Numbers + +These are some magic numbers used in the Unikraft code: + +1. `UKPLAT_BOOTINFO_MAGIC` 0xB007B0B0 (Boot Bobo) +1. `FDT_MAGIC 0xd00dfeed` (FDT specification) +1. `UK_TP_HEADER_MAGIC` 0x64685254 (TRhd) +1. `UK_TP_DEF_MAGIC` 0x65645054 (TPde) +1. `CL_UKTLS_SANITY_MAGIC` 0xb0b0f00d (Bobo food) +1. `0xDEADB0B0`, `0xB0B0CAFE` (in ukvmem) +1. `MULTIBOOT_HEADER_MAGIC` 0x1BADB002 +1. `MULTIBOOT_BOOTLOADER_MAGIC` 0x2BADB002 + +{{% alert theme="info" %}} +When used in source code, the symbol of the magic number should be used, never the value. +{{% /alert %}}