Skip to content

Latest commit

 

History

History
751 lines (592 loc) · 25.6 KB

CONTRIBUTING.md

File metadata and controls

751 lines (592 loc) · 25.6 KB

Contributing to rv32emu

👍🎉 First off, thanks for taking the time to contribute! 🎉👍

The following is a set of guidelines for contributing to rv32emu hosted on GitHub. These are mostly guidelines, not rules. Use your best judgment, and feel free to propose changes to this document in a pull request.

Issues

This project uses GitHub Issues to track ongoing development, discuss project plans, and keep track of bugs. Be sure to search for existing issues before you create another one.

Initially, it is advisable to create an issue on GitHub for bug reports, feature requests, or substantial pull requests, as this offers a platform for discussion with both the community and project maintainers.

Engaging in a conversation through a GitHub issue before making a contribution is crucial to ensure the acceptance of your work. We aim to prevent situations where significant effort is expended on a pull request that might not align with the project's design principles. For example, it might turn out that the feature you propose is more suited as an independent module that complements this project, in which case we would recommend that direction.

For minor corrections, such as typo fixes, small refactoring, or updates to documentation/comments, filing an issue is not typically necessary. What constitutes a "minor" fix involves discretion; however, examples include:

  • Correcting spelling mistakes
  • Minor code refactoring
  • Updating or editing documentation and comments

Nevertheless, there may be instances where, upon reviewing your pull requests, we might request an issue to be filed to facilitate discussion on broader design considerations.

Visit our Issues page on GitHub to search and submit.

Coding Convention

Contributions from developers across corporations, academia, and individuals are welcome. However, participation requires adherence to fundamental ground rules:

  • Code must strictly adhere to the established C coding style (refer to the guidelines below). While there is some flexibility in basic style, it is crucial to stick to the current coding standards. Complex algorithmic constructs without proper comments will not be accepted.
  • External pull requests should include thorough documentation in the pull request comments for consideration.
  • When composing documentation, code comments, and other materials in English, please adhere to the American English (en_US) dialect. This variant should be considered the standard for all documentation efforts. For instance, opt for "initialize" over "initialise" and "color" rather than "colour".

Software requirement: clang-format version 18 or later.

This repository consistently contains an up-to-date .clang-format file with rules that match the explained ones. For maintaining a uniform coding style, execute the command clang-format -i *.{c,h}.

Coding Style for Modern C

This coding style is a variant of the K&R style. Adhere to established practices while being open to innovation. Maintain consistency, adopt the latest C standards, and embrace modern compilers along with their advanced static analysis capabilities and sanitizers.

Indentation

In this coding style guide, the use of 4 spaces for indentation instead of tabs is strongly enforced to ensure consistency. Always apply a single space before and after comparison and assignment operators to maintain readable code. Additionally, it is crucial to include a single space after every comma. e.g.,

for (int i = 0; i < 10; i++) {
    printf("%d\n", i);
    /* some operations */
}

The tab character (ASCII 0x9) should never appear within any source code file. When indentation is needed in the source code, align using spaces instead. The width of the tab character varies by text editor and programmer preference, making consistent visual layout a continual challenge during code reviews and maintenance.

Line length

All lines should typically remain within 80 characters, with longer lines wrapped as needed. This practice is supported by several valid rationales:

  • It encourages developers to write concise code.
  • Smaller portions of information are easier for humans to process.
  • It assists users of vi/vim (and potentially other editors) who use vertical splits.
  • It is especially helpful for those who may want to print code on paper.

Comments

Multi-line comments should have the opening and closing characters on separate lines, with the content lines prefixed by a space and an asterisk (*) for alignment, e.g.,

/*
 * This is a multi-line comment.
 */

/* One line comment. */

Use multi-line comments for more elaborate descriptions or before significant logical blocks of code.

Single-line comments should be written in C89 style:

    return (uintptr_t) val;  /* return a bitfield */

Leave two spaces between the statement and the inline comment. Avoid commenting out code directly. Instead, use #if 0 ... #endif when it is intentional.

All assumptions should be clearly explained in comments. Use the following markers to highlight issues and make them searchable:

  • WARNING: Alerts a maintainer to the risk of changing this code. e.g., a delay loop counter's terminal value was determined empirically and may need adjustment when the code is ported or the optimization level is tweaked.
  • NOTE: Provides descriptive comments about the "why" of a chunk of code, as opposed to the "how" usually included in comments. e.g., a chunk of driver code may deviate from the datasheet due to a known erratum in the chip, or an assumption made by the original programmer is explained.
  • TODO: Indicates an area of the code that is still under construction and explains what remains to be done. When appropriate, include an all-caps programmer name or set of initials before the word TODO.

Keep the documentation as close to the code as possible.

Spacing and brackets

Ensure that the keywords if, while, for, switch, and return are always followed by a single space when there is additional code on the same line. Follow these spacing guidelines:

  • Place one space after the keyword in a conditional or loop.
  • Do not use spaces around the parentheses in conditionals or loops.
  • Insert one space before the opening curly bracket.

For example:

do {
    /* some operations */
} while (condition);

Functions (their declarations or calls), sizeof operator or similar macros shall not have a space after their name/keyword or around the brackets, e.g.,

unsigned total_len = offsetof(obj_t, items[n]);
unsigned obj_len = sizeof(obj_t);

Use brackets to avoid ambiguity and with operators such as sizeof, but otherwise avoid redundant or excessive brackets.

Assignment operators (=, +=, -=, *=, /=, %=, &=, |=, ^=, ~=, and !=) should always have a space before and after them. For example:

count += 1;

Binary operators (+, -, *, /, %, <, <=, >, >=, ==, !=, <<, >>, &, |, ^, &&, and ||) should also be surrounded by spaces. For example:

current_conf = prev_conf | (1 << START_BIT);

Unary operators (++, --, !, and ~) should be written without spaces between the operator and the operand. For example:

bonus++;
if (!play)
    return STATE_QUITE;

The ternary operator (? and :) should have spaces on both sides. For example:

uint32_t max(uint32_t a, uint32_t b)
{
    return (a > b) ? a : b;
}

Structure pointer (->) and member (.) operators should not have surrounding spaces. Similarly, array subscript operators ([ and ]) and function call parentheses should be written without spaces around them.

Parentheses

Avoid relying on C’s operator precedence rules, as they may not be immediately clear to those maintaining the code. To ensure clarity, always use parentheses to enforce the correct execution order within a sequence of operations, or break long statements into multiple lines if necessary.

When using logical AND (&&) and logical OR (||) operators, each operand should be enclosed in parentheses, unless it is a single identifier or constant. For example:

if ((count > 100) && (detected == false)) {
    character = C_ASSASSIN;
}

Variable names and declarations

Ensure that functions, variables, and comments are consistently named using English words. Global variables should have descriptive names, while local variables can have shorter names. It is important to strike a balance between being descriptive and concise. Each variable's name should clearly reflect its purpose.

Use snake_case for naming conventions, and avoid using "camelCase." Additionally, do not use Hungarian notation or any other unnecessary prefixes or suffixes.

When declaring pointers, follow these spacing conventions:

const char *name;  /* Const pointer; '*' with the name and a space before it */
conf_t * const cfg;  /* Pointer to const data; spaces around 'const' */
const uint8_t * const charmap;  /* Const pointer and const data */
const void * restrict key;  /* Const pointer that does not alias */

Local variables of the same type should be declared on the same line. For example:

void func(void)
{
    char a, b;  /* OK */

    char a;
    char b;     /* Incorrect: a variable with char type already exists. */
}

Always include a trailing comma in the last element of a structure initialization, including its nested elements, to help clang-format correctly format the structure. However, this comma can be omitted in very simple and short structures.

typedef struct {
    int width, height;
} screen_t;

screen_t s = {
    .width = 640,
    .height = 480,   /* comma here */
}

Type definitions

Declarations shall be on the same line, e.g.,

typedef void (*dir_iter_t)(void *, const char *, struct dirent *);

Typedef structures rather than pointers. Note that structures can be kept opaque if they are not dereferenced outside the translation unit where they are defined. Pointers can be typedefed only if there is a very compelling reason.

New types may be suffixed with _t. Structure name, when used within the translation unit, may be omitted, e.g.:

typedef struct {
    unsigned if_index;
    unsigned addr_len;
    addr_t next_hop;
} route_info_t;

Initialization

Do not initialize static and global variables to 0; the compiler will do this. When a variable is declared inside a function, it is not automatically initialized.

static uint8_t a;  /* Global variable 'a' is set to 0 by the compiler */

void foo()
{
    /* 'b' is uninitialized and set to whatever happens to be in memory */
    uint8_t b;
    ...
}

Embrace C99 structure initialization where reasonable, e.g.,

static const crypto_ops_t openssl_ops = {
    .create = openssl_crypto_create,
    .destroy = openssl_crypto_destroy,
    .encrypt = openssl_crypto_encrypt,
    .decrypt = openssl_crypto_decrypt,
    .hmac = openssl_crypto_hmac,
};

Embrace C99 array initialization, especially for the state machines, e.g.,

static const uint8_t tcp_fsm[TCP_NSTATES][2][TCPFC_COUNT] = {
    [TCPS_CLOSED] = {
        [FLOW_FORW] = {
            /* Handshake (1): initial SYN. */
            [TCPFC_SYN] = TCPS_SYN_SENT,
        },
    },
    ...
}

Any pointer variable that does not have an initial address should be explicitly initialized to NULL. This practice helps prevent undefined behavior caused by dereferencing uninitialized pointers.

In accordance with modern C standards (such as C99 and later), it is preferable to define local variables as needed, rather than declaring them all at the beginning of a function. Declaring variables close to their first use enhances code readability and helps maintain a clear, logical flow within the function.

Additionally, static analysis tools can be employed to scan the entire source code before each build, providing warnings about variables that are used before being properly initialized. This helps catch potential bugs early and ensures code quality and safety.

Control structures

Try to make the control flow easy to follow. Avoid long convoluted logic expressions; try to split them where possible (into inline functions, separate if-statements, etc).

The control structure keyword and the expression in the brackets should be separated by a single space. The opening curly bracket shall be in the same line, also separated by a single space. Example:

    for (;;) {
        obj = get_first();
        while ((obj = get_next(obj))) {
            ...
        }
        if (done)
            break;
    }

Do not add inner spaces around the brackets. There should be one space after the semicolon when for has expressions:

    for (unsigned i = 0; i < __arraycount(items); i++) {
        ...
    }

Avoid unnecessary nesting levels

It is generally preferred to place the shorter clause (measured in lines of code) first in if and else if statements. Long clauses can distract the reader from the core decision-making logic, making the code harder to follow. By placing the shorter clause first, the decision path becomes clearer and easier to understand, which can help reduce bugs.

Avoid nesting if-else statements deeper than two levels. Instead, consider using function calls or switch statements to simplify the logic and enhance readability. Deeply nested if-else statements often indicate a complex and fragile state machine implementation, which can be refactored into a safer and more maintainable structure.

For example, avoid this:

int inspect(obj_t *obj)
{
    if (cond) {
        ...
        /* long code block */
        ...
        return 0;
    }
    return -1;
}

Instead, consider this approach:

int inspect(obj_t *obj)
{
    if (!cond)
        return -1;
    ...
    return 0;
}

However, be careful not to make the logic more convoluted in an attempt to simplify nesting.

if statements

Curly brackets and spacing follow the K&R style:

    if (a == b) {
        ..
    } else if (a < b) {
        ...
    } else {
        ...
    }

Simple and succinct one-line if-statements may omit curly brackets:

    if (!valid)
        return -1;

However, do prefer curly brackets with multi-line or more complex statements. If one branch uses curly brackets, then all other branches shall use the curly brackets too.

Wrap long conditions to the if-statement indentation adding extra 4 spaces:

    if (some_long_expression &&
        another_expression) {
        ...
    }

Avoid redundant else

Avoid:

    if (flag & F_FEATURE_X) {
        ...
        return 0;
    } else {
        return -1;
    }

Consider:

    if (flag & F_FEATURE_X) {
        ...
        return 0;
    }
    return -1;

switch statements

Switch statements should have the case blocks at the same indentation level, e.g.:

    switch (expr) {
    case A:
        ...
        break;
    case B:
        /* fallthrough */
    case C:
        ...
        break;
    }

If the case block does not break, then it is strongly recommended to add a comment containing "fallthrough" to indicate it. Modern compilers can also be configured to require such comment (see gcc -Wimplicit-fallthrough).

Function definitions

The opening and closing curly brackets shall also be in the separate lines (K&R style).

ssize_t hex_write(FILE *stream, const void *buf, size_t len)
{
    ...
}

Do not use old style K&R style C definitions.

For function parameters, place one space after each comma, except at the end of a line.

Function-like Macros

When using function-like macros (parameterized macros), adhere to the following guidelines:

  • Enclose the entire macro body in parentheses.
  • Surround each parameter usage with parentheses.
  • Limit the use of each parameter to no more than once within the macro to avoid unintended side effects.
  • Never include control flow statements (e.g., return) within a macro.
  • If the macro involves multiple statements, encapsulate them within a do-while (0) construct.

For example:

#define SET_POINT(p, x, y)  do {    \
    (p)->px = (x);                  \
    (p)->py = (y);                  \
} while (0)

While the extensive use of parentheses, as shown above, helps minimize some risks, it cannot prevent issues like unintended double increments from calls such as MAX(i++, j++).

Other risks associated with macros include comparing signed and unsigned data or testing floating-point values. Additionally, macros are not visible at runtime, making them impossible to step into with a debugger. Therefore, use them with caution.

Use const and static effectively

The static keyword should be used for any variables that do not need to be accessible outside the module where they are declared. This is particularly important for global variables defined in C files. Declaring variables and functions as static at the module level protects them from external access, reducing coupling between modules and improving encapsulation.

For functions that do not need to be accessible outside the module, use the static keyword. This is especially important for private functions, where static should always be applied.

For example:

static bool verify_range(uint16_t x, uint16_t y);

The const keyword is essential for several key purposes:

  • Declaring variables that should not change after initialization.
  • Defining fields within a struct that must remain immutable, such as those in memory-mapped I/O peripheral registers.
  • Serving as a strongly typed alternative to #define for numerical constants.

For example, instead of using:

#define MAX_SKILL_LEVEL (100U)

Use:

const uint8_t max_skill_level = 100;

Maximizing the use of const provides the advantage of compiler-enforced protection against unintended modifications to data that should be read-only, thereby enhancing code reliability and safety.

Additionally, when one of your function arguments is a pointer to data that will not be modified within the function, you should use the const keyword. This is particularly useful when comparing a character array with predefined strings without altering the array’s contents.

For example:

static bool is_valid_cmd(const char *cmd);

Object abstraction

Objects are often "simulated" by the C programmers with a struct and its "public API". To enforce the information hiding principle, it is a good idea to define the structure in the source file (translation unit) and provide only the declaration in the header. For example, obj.c:

#include "obj.h"

struct obj {
    int value;
}

obj_t *obj_create(void)
{
    return calloc(1, sizeof(obj_t));
}

void obj_destroy(obj_t *obj)
{
    free(obj);
}

With an example obj.h:

#ifndef _OBJ_H_
#define _OBJ_H_

typedef struct obj;

obj_t *obj_create(void);
void obj_destroy(obj_t *);

#endif

Such structuring will prevent direct access of the obj_t members outside the obj.c source file. The implementation (of such "class" or "module") may be large and abstracted within separate source files. In such case, consider separating structures and "methods" into separate headers (think of different visibility), for example obj_impl.h (private) and obj.h (public).

Consider crypto_impl.h:

#ifndef _CRYPTO_IMPL_H_
#define _CRYPTO_IMPL_H_

#if !defined(__CRYPTO_PRIVATE)
#error "only to be used by the crypto modules"
#endif

#include "crypto.h"

typedef struct crypto {
    crypto_cipher_t cipher;
    void *key;
    size_t key_len;
    ...
}
...

#endif

And crypto.h (public API):

#ifndef _CRYPTO_H_
#define _CRYPTO_H_

typedef struct crypto crypto_t;

crypto_t *crypto_create(crypto_cipher_t);
void crypto_destroy(crypto_t *);
...

#endif

Use reasonable types

Use unsigned for general iterators; use size_t for general sizes; use ssize_t to return a size which may include an error. Of course, consider possible overflows.

Avoid using fixed-width types like uint8_t, uint16_t, or other smaller integer types for general iterators or similar cases unless there is a specific need for size-constrained operations, such as in fixed-width data processing or resource-limited environments.

C has rather peculiar type promotion rules and unnecessary use of sub-word types might contribute to a bug once in a while.

Boolean variables should be declared using the bool type. Non-Boolean values should be converted to Boolean by using relational operators (e.g., < or !=) rather than by casting.

For example:

#include <stdbool.h>
...
bool inside = (value < expected_range);

Embrace portability

Byte-order

Do not assume x86 or little-endian architecture. Use endian conversion functions for operating the on-disk and on-the-wire structures or other cases where it is appropriate.

Types

Do not assume a particular 32-bit or 64-bit architecture; for example, do not assume the size of long or unsigned long. Instead, use int64_t or uint64_t for 8-byte integers.

Fixed-width types, such as uint32_t, are particularly useful when memory size is critical, as in embedded systems, communication protocols requiring specific data sizes, or when interacting with hardware registers that require precise bit-width operations. In these scenarios, fixed-width types ensure consistent behavior across different platforms and compilers.

Do not assume char is signed; for example, on Arm architectures, it is unsigned by default.

Avoid defining bit-fields within signed integer types. Additionally, do not use bitwise operators (such as &, |, ~, ^, <<, and >>) on signed integer data. Refrain from combining signed and unsigned integers in comparisons or expressions, as this can lead to unpredictable results.

When using #define to declare decimal constants, append a U to ensure they are treated as unsigned. For example:

#define SOME_CONSTANT (6U)

uint16_t unsigned_a = 6;
int16_t  signed_b = -9;
if (unsigned_a + signed_b < 4) {
    /* This block might appear logically correct, as -9 + 6 is -3 */
    ...
}
/* but compilers with 16-bit int may legally interpret it as (0xFFFF – 9) + 6. */

It is important to note that certain aspects of manipulating binary data within signed integer containers are implementation-defined behaviors according to ISO C standards. Additionally, mixing signed and unsigned integers can lead to data-dependent results, as demonstrated in the example above.

Use C99 macros for constant prefixes or formatting of the fixed-width types.

Use:

#define SOME_CONSTANT (UINT64_C(1) << 48)
printf("val %" PRIu64 "\n", SOME_CONSTANT);

Do not use:

#define SOME_CONSTANT (1ULL << 48)
printf("val %lld\n", SOME_CONSTANT);

Avoid unaligned access

Avoid assuming that unaligned access is safe. It is not secure on architectures like Arm, POWER, and others. Additionally, even on x86, unaligned access can be slower.

Structures and Unions

Care should be taken to prevent the compiler from inserting padding bytes within struct or union types, as this can affect memory layout and portability. To control padding and alignment, consider using structure packing techniques specific to your compiler.

Additionally, take precautions to ensure that the compiler does not alter the intended order of bits within bit-fields. This is particularly important when working with hardware registers or communication protocols where bit order is crucial.

According to the C standard, the layout of structures, including padding and bit-field ordering, is implementation-defined, meaning it can vary between different compilers and platforms. Therefore, it is essential to verify that the structure's layout meets your expectations, especially when writing portable code.

For example:

typedef struct {
    uint16_t count;         /* offset 0 */
    uint16_t max_count;     /* offset 2 */
    uint16_t unused0;       /* offset 4 */
    uint16_t enable    : 2; /* offset 6 bits 15-14 */
    uint16_t interrupt : 1; /* offset 6 bit 13     */
    uint16_t unused1   : 7; /* offset 6 bits 12-6  */
    uint16_t complete  : 1; /* offset 6 bit 5      */
    uint16_t unused2   : 4; /* offset 6 bits 4-1   */
    uint16_t periodic  : 1; /* offset 6 bit 0      */
} mytimer_t;

/* Preprocessor check of timer register layout byte count. */
#if (sizeof(mytimer_t) != 8)
#error mytimer_t struct size incorrect (expected 8 bytes)
#endif

To enhance portability, use standard-defined types (e.g., uint16_t, uint32_t) and avoid relying on compiler-specific behavior. Where precise control over memory layout is required, such as in embedded systems or when interfacing with hardware, always verify the structure size and layout using static assertions or preprocessor checks.

Avoid extreme portability

Unless programming for micro-controllers or exotic CPU architectures, focus on the common denominator of the modern CPU architectures, avoiding the very maximum portability which can make the code unnecessarily cumbersome.

Some examples:

  • It is fair to assume sizeof(int) == 4 since it is the case on all modern mainstream architectures. PDP-11 era is long gone.
  • Using 1U instead of UINT32_C(1) or (uint32_t) 1 is also fine.
  • It is fair to assume that NULL is matching (uintptr_t) 0 and it is fair to memset() structures with zero. Non-zero NULL is for retro computing.

References