
Best practices for C API design

Petr Viktorin edited this page Jul 13, 2023 · 2 revisions

This is not any kind of official document. We've not heard all the relevant voices yet.

Rather, this document lets the people actively working on it reach rough consensus, while allowing others to see the current state and chime in. At some point we will submit it as a PEP and start working on the next one.


This focuses on guidelines that we can start applying now, staying reasonably consistent with existing API. Redesigns (e.g. adding a context argument to all calls) are out of scope.

Sources:

Exceptions to the rules

If there's a good reason to break the rules below (typically better performance), prefer providing API that does adhere to these rules, and additionally a variant that doesn't. Usually, the rule-breaking variant will be less portable (less compatible with other implementations) and less stable (less compatible with future Python versions); add it to the appropriate stability tier.

Naming

All API usable by third parties should be prefixed with Py. (This includes macros; avoid the PY prefix, see PEP 7.) All such API must be available by including <Python.h>.

All private API (e.g. implementation details that need to be visible to the compiler/linker, but should not be used directly by users) should be prefixed with _Py.

Things to avoid

Avoid API that requires the C preprocessor, or that relies on undefined or compiler-defined behaviour or memory layout:

  • Macros, except for:
    • Simple constants (XXX why not use type-safe const values?)
    • Feature flags (e.g. HAVE_FORK)
    • Macros used only to define the API (e.g. PyAPI_FUNC, Py_ALWAYS_INLINE)
    • Shortcuts for functionality that can be accomplished trivially (but perhaps tediously) without macros (e.g. Py_VISIT, Py_BEGIN_ALLOW_THREADS, Py_RETURN_RICHCOMPARE). The macro-less equivalent should be clear from documentation.
  • Bit fields (use integral types and masks/shifts instead)
  • Enums (see Categorical types below)
  • Static (inline) functions - see below
  • Variable length argument lists (... in parameters)

Any of these are fine as C-specific shortcuts, that is, “second-class” alternatives to other API.

Functions

All functions should be exported as symbols (using PyAPI_FUNC).

Sometimes you will additionally want to provide an inline function for speed. Use the following pattern to both export a symbol and provide a static inline function for C programs:

Header:

static inline returntype
_Py_Foo_impl(args)
{
    ...
}

PyAPI_FUNC(returntype) Py_Foo(args);

#define Py_Foo _Py_Foo_impl

Source (.c) file:

// at the end (after all calls to Py_Foo):
#undef Py_Foo

returntype
Py_Foo(args)
{
    return _Py_Foo_impl(args);
}
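A concrete, compilable version of this pattern might look like the sketch below. `Py_Twice`, `_Py_Twice_impl` and `EXAMPLE_API_FUNC` are hypothetical names: `EXAMPLE_API_FUNC` stands in for `PyAPI_FUNC` so the sketch builds without `<Python.h>`, and the header and source parts are shown in one file for brevity.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for PyAPI_FUNC, so this builds without <Python.h>. */
#define EXAMPLE_API_FUNC(RTYPE) extern RTYPE

/* ---- Header part ---- */
static inline int64_t
_Py_Twice_impl(int64_t x)
{
    return 2 * x;
}

EXAMPLE_API_FUNC(int64_t) Py_Twice(int64_t x);

/* C code that includes the header calls the inline version. */
#define Py_Twice _Py_Twice_impl

/* ---- Source part ---- */
static int64_t
use_inline(void)
{
    return Py_Twice(21);  /* expands to _Py_Twice_impl(21) */
}

/* At the end, after all calls: drop the macro and define the exported
   symbol, which is what non-C callers (e.g. FFI bindings) link against. */
#undef Py_Twice

int64_t
Py_Twice(int64_t x)
{
    assert(x <= INT64_MAX / 2 && x >= INT64_MIN / 2);  /* no overflow */
    return _Py_Twice_impl(x);
}
```

Both paths share one implementation, so the inline and exported versions cannot drift apart.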

Types

Integral types

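Prefer types with a fixed size from <stdint.h>, such as int32_t. Be aware that int may be as small as 16 bits. Avoid long, as its size differs between platforms even on the same hardware (for example, 32 bits on 64-bit Windows but 64 bits on 64-bit Linux).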

Using int is sometimes acceptable, for example for enumerations or values with a small range. If the return value can represent an arbitrary quantity (such as a size or count), use a <stdint.h> type.

Flags

Use bitwise masks and shifts for bit fields, such as collections of flags. Avoid C bit fields (see above).

Prefer types with a fixed size from <stdint.h>, such as int32_t.
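A sketch of flags stored as masks in an int32_t, including a multi-bit field accessed with a shift and mask. All `Example_*`/`example_*` names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical single-bit flags, defined as masks instead of C bit fields. */
#define Example_FLAG_READABLE   ((int32_t)1 << 0)
#define Example_FLAG_WRITABLE   ((int32_t)1 << 1)
#define Example_FLAG_EXECUTABLE ((int32_t)1 << 2)

/* Hypothetical 4-bit field stored in bits 4..7. */
#define Example_LEVEL_SHIFT 4
#define Example_LEVEL_MASK  ((int32_t)0xF << Example_LEVEL_SHIFT)

static int32_t
example_set_level(int32_t flags, int32_t level)
{
    assert(level >= 0 && level <= 0xF);
    /* Clear the old field, then OR in the new value. */
    return (flags & ~Example_LEVEL_MASK) | (level << Example_LEVEL_SHIFT);
}

static int32_t
example_get_level(int32_t flags)
{
    return (flags & Example_LEVEL_MASK) >> Example_LEVEL_SHIFT;
}
```

Unlike C bit fields, the bit positions here are fixed by the API itself rather than by the compiler.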

Enumerations

For categorical types, use int. Avoid C enum (see above).

Valid values should be non-negative (this allows using them as return values). It's fine to skip values and exploit bit patterns, but stay within 15 bits, so that every valid value fits in a non-negative int even where int is only 16 bits wide.

Always document how to treat undefined values. (It's OK to say these must not happen, if you're sure we won't need to define new values in the future.)
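A sketch of a categorical type as plain int constants, with a documented rule for undefined values. The `Example_KIND_*` constants and `example_kind_name` are hypothetical.

```c
#include <assert.h>

/* Hypothetical categorical values: plain int constants, not a C enum.
   All are non-negative and fit in 15 bits. */
#define Example_KIND_NONE   0
#define Example_KIND_INT    1
#define Example_KIND_FLOAT  2
/* 3 is skipped on purpose; it may be assigned in a later version. */
#define Example_KIND_STR    4

static_assert(Example_KIND_STR < (1 << 15), "kinds must fit in 15 bits");

/* Documented handling of undefined values: report them as "other", so that
   this code keeps working if new kinds are added later. */
static const char *
example_kind_name(int kind)
{
    switch (kind) {
    case Example_KIND_NONE:  return "none";
    case Example_KIND_INT:   return "int";
    case Example_KIND_FLOAT: return "float";
    case Example_KIND_STR:   return "str";
    default:                 return "other";
    }
}
```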

Structs

There are 3 kinds of structs:

Opaque structs

Always handled via pointers. Includes PyObject * and subclasses.

Generally, no “live” part of these structs may be exposed; data may only be copied out.
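A minimal sketch of an opaque struct with a copy-out accessor. `Example_Counter` and its functions are hypothetical; they are `static` only so the sketch fits in one file (real API would be exported).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Public header: only a declaration, so users can hold pointers but
   cannot see or depend on the layout. */
typedef struct Example_Counter Example_Counter;

/* Implementation file: the layout stays private and may change. */
struct Example_Counter {
    int64_t value;
    int32_t internal_flags;   /* never exposed to users */
};

/* Returns NULL on allocation failure (a real API would set an exception). */
static Example_Counter *
Example_Counter_New(int64_t initial)
{
    Example_Counter *c = malloc(sizeof(Example_Counter));
    if (c != NULL) {
        c->value = initial;
        c->internal_flags = 0;
    }
    return c;
}

static void
Example_Counter_Free(Example_Counter *c)
{
    free(c);
}

/* Accessor: copies the value out rather than returning a pointer into the
   struct, so no "live" part of the struct escapes. */
static int64_t
Example_Counter_GetValue(const Example_Counter *c)
{
    assert(c != NULL);
    return c->value;
}
```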

Initialization structs

“Blueprints” for how an object should be initialized. The runtime should make a copy of the information, so the struct and all its fields can be freed after it's used. Any exceptions -- typically const char* strings -- must be explicitly documented.

We should not provide API that returns, exposes or fills these structs. When needed, provide accessors for individual fields. (This will make it easier to add future API that uses a different struct.)
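A sketch of a "blueprint" struct whose contents the runtime copies, so the caller may free or reuse it right after the call. `Example_Spec`, `Example_Thing` and their functions are hypothetical; `Example_Thing`'s layout is visible only to keep the sketch in one file.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical blueprint filled in by the caller. */
typedef struct {
    const char *name;
    int size;
} Example_Spec;

/* Hypothetical runtime object created from the blueprint. */
typedef struct {
    char *name;   /* owned copy, never the caller's pointer */
    int size;
} Example_Thing;

/* Copies everything it needs out of `spec`. Returns NULL on failure
   (a real API would also set an exception). */
static Example_Thing *
Example_Thing_FromSpec(const Example_Spec *spec)
{
    assert(spec != NULL && spec->name != NULL);
    Example_Thing *thing = malloc(sizeof(Example_Thing));
    if (thing == NULL) {
        return NULL;
    }
    size_t len = strlen(spec->name) + 1;
    thing->name = malloc(len);
    if (thing->name == NULL) {
        free(thing);
        return NULL;
    }
    memcpy(thing->name, spec->name, len);
    thing->size = spec->size;
    return thing;
}

static void
Example_Thing_Free(Example_Thing *thing)
{
    if (thing != NULL) {
        free(thing->name);
        free(thing);
    }
}
```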

Interop structs

For example, Py_buffer. XXX These are used in special cases, it's hard to write general guidelines.

Arrays

XXX

Pointers

Always be explicit about:

  • the lifetime of pointed-to data
  • nullability

Function pointers

Avoid returning/exposing function pointers. Use “caller” API instead. (For example, PyNumber_Add rather than get_nb_add.) If we need to adapt the function signature in the future, caller API can adapt to it.
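A sketch of the "caller" approach: the slot stays internal and users call a wrapper. `ExType`, `ExObject`, `ExInt_Type` and `Example_Add` are hypothetical toy stand-ins, not CPython's type objects.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type with an internal slot; the slot's signature is private. */
typedef struct {
    int (*add)(int64_t a, int64_t b, int64_t *result);
} ExType;

/* Hypothetical object: a type pointer plus a value. */
typedef struct {
    const ExType *type;
    int64_t value;
} ExObject;

static int
exint_add(int64_t a, int64_t b, int64_t *result)
{
    if ((b > 0 && a > INT64_MAX - b) || (b < 0 && a < INT64_MIN - b)) {
        *result = -1;
        return -1;   /* overflow; a real API would also set an exception */
    }
    *result = a + b;
    return 0;
}

static const ExType ExInt_Type = { exint_add };

/* Public caller API: users call this rather than fetching type->add.
   If the slot's signature changes, only this wrapper has to adapt. */
static int
Example_Add(const ExObject *a, const ExObject *b, int64_t *result)
{
    assert(a != NULL && b != NULL && result != NULL);
    assert(a->type->add != NULL);
    return a->type->add(a->value, b->value, result);
}
```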

Return values

All functions return either a pointer or a signed integral type (typically int).

Signaling failure

Functions must be able to signal failure.

Whether an API function has failed must be clear from the return value, and only from the return value.

Whenever a function fails, it must set the current exception.

Missing values aren't failures

Expected “negative” results such as “item not found” or “end of iteration” are not considered failures. (Allocating exception objects for them would be needlessly expensive, for one.)

See Output parameters below for examples.

Returning integer types

Integral return types must be signed.

Return -1 to signal an error. (XXX Is there a use case for other negative numbers?)

An exception must be set iff the return value is negative. Callers should check for error with e.g. if (result < 0).

Non-negative values can be used for results. Common schemes are:

  • 0 for success (no result returned)
  • 0 for false, 1 for true
  • 0 for missing values (e.g. “entry not found”), 1 when a valid value is retrieved (see Output parameters below)
  • A non-negative value (such as a size) returned directly

If a function needs to return negative values, it needs to use an output parameter (see below).
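A sketch of these integer conventions. `example_error` is a hypothetical global standing in for CPython's current-exception state, and `Example_Contains` is a hypothetical API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the "current exception". */
static const char *example_error = NULL;

/* Hypothetical API:
     -1  failure, with example_error set
      0  `needle` not found (a missing value, not a failure)
      1  `needle` found                                      */
static int
Example_Contains(const int32_t *items, int64_t n, int32_t needle)
{
    if (n < 0) {
        example_error = "negative length";
        return -1;
    }
    assert(items != NULL || n == 0);
    for (int64_t i = 0; i < n; i++) {
        if (items[i] == needle) {
            return 1;
        }
    }
    return 0;
}
```

A caller checks `if (Example_Contains(...) < 0)` and, on error, returns its own error value without touching `example_error`, since the callee already set it.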

Returning PyObject *

Public API should return PyObject *, rather than specific types like PyDictObject *, to avoid excessive casting.

Return NULL with an exception set to signal failure. An exception should be set iff the return value is NULL.

When a non-NULL object is returned, it must be a new reference: the caller is responsible for decref-ing it. When breaking this rule, indicate that in the name (e.g. Borrow), and explicitly document how long the reference will be valid (e.g. what it's being borrowed from).
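A sketch of returning a new reference. `ExObject` is a hypothetical toy refcounted object standing in for PyObject, and `example_error` stands in for the current exception.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical toy object with a reference count. */
typedef struct {
    int64_t refcnt;
    int64_t value;
} ExObject;

static const char *example_error = NULL;   /* stand-in for the exception */

static void
ExObject_DecRef(ExObject *op)
{
    assert(op != NULL && op->refcnt > 0);
    if (--op->refcnt == 0) {
        free(op);
    }
}

/* Returns a new reference, or NULL with an error set. */
static ExObject *
ExInt_FromInt64(int64_t value)
{
    ExObject *op = malloc(sizeof(ExObject));
    if (op == NULL) {
        example_error = "out of memory";
        return NULL;
    }
    op->refcnt = 1;   /* this reference belongs to the caller */
    op->value = value;
    return op;
}
```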

Function parameters

PyObject * parameters

Public API should take PyObject *, rather than specific types like PyDictObject *, to avoid excessive casting. Verify the type with a debug assertion, e.g. assert(PyDict_Check(obj)).

All PyObject * parameters should be non-NULL. Verify this with a debug assertion, e.g. assert(obj). If you need NULL to cover a special case, prefer adding a special function for it.

PyObject * parameters should be references borrowed from the caller:

  • they are guaranteed to be valid for the duration of the call
  • the caller retains ownership

If a function needs to “consume” a reference instead, signal that with a suffix. (XXX _Steal, _Consume, _Take, _Move, _DecRef?) This means the caller may no longer use the reference after the call.
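A sketch contrasting a borrowed parameter with a consuming one. `ExObject`, `ExBox` and both setters are hypothetical toys, and the `_Steal` suffix is just one of the candidate names listed above, not a settled choice.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical toy refcounted object. */
typedef struct {
    int64_t refcnt;
} ExObject;

static void
ExObject_IncRef(ExObject *op)
{
    op->refcnt++;
}

static void
ExObject_DecRef(ExObject *op)
{
    assert(op->refcnt > 0);
    if (--op->refcnt == 0) {
        free(op);
    }
}

/* Hypothetical one-slot container owning a reference to its item. */
typedef struct {
    ExObject *item;
} ExBox;

/* `item` is borrowed: the box takes its own new reference. */
static int
ExBox_Set(ExBox *box, ExObject *item)
{
    assert(box != NULL && item != NULL);
    ExObject_IncRef(item);
    ExObject *old = box->item;
    box->item = item;
    if (old != NULL) {
        ExObject_DecRef(old);   /* after storing, in case old == item */
    }
    return 0;
}

/* Consuming variant: takes over the caller's reference, so the caller
   must not use `item` after the call. */
static int
ExBox_SetSteal(ExBox *box, ExObject *item)
{
    int res = ExBox_Set(box, item);
    ExObject_DecRef(item);   /* release the reference handed to us */
    return res;
}
```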

Output parameters

If a function would return NULL or a negative int without setting an exception, it must use an output parameter, that is, a pointer to memory that the function sets. For example:

  • int PyLong_ToInt(PyObject *obj, int *result) (returning -1 on failure, 0 on overflow, 1 on success)
  • int PyDict_LookupItem(PyObject *obj, PyObject *key, PyObject **result) (returning -1 on failure, 0 on missing value, 1 when the value is found)

The output should be initialized in all cases, including errors. (XXX Should it?) If you don't have information to pass, set output parameters to NULL or -1 in error cases.
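A sketch of a lookup using an output parameter, so that any int64_t (including negative ones) can be returned while the return value stays -1/0/1. `ExTable`, `ExTable_Lookup` and `example_error` are hypothetical; the output is set in every case, following the (still open) rule above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical read-only table of string keys and int64_t values. */
typedef struct {
    const char *const *keys;
    const int64_t *values;
    int64_t size;
} ExTable;

static const char *example_error = NULL;   /* stand-in for the exception */

/* Returns -1 with an error set on failure, 0 if `key` is missing, 1 if
   found. `*result` is set in every case (-1 when there is no value). */
static int
ExTable_Lookup(const ExTable *table, const char *key, int64_t *result)
{
    assert(table != NULL && key != NULL && result != NULL);
    *result = -1;
    if (table->size < 0) {
        example_error = "corrupted table";
        return -1;
    }
    for (int64_t i = 0; i < table->size; i++) {
        if (strcmp(table->keys[i], key) == 0) {
            *result = table->values[i];
            return 1;
        }
    }
    return 0;
}
```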