docs: improve python_api preface
mara004 committed Mar 28, 2024
1 parent a8d8c0a commit ec5b497
Showing 1 changed file with 24 additions and 23 deletions: docs/source/python_api.rst
@@ -11,67 +11,68 @@ Preface
Thread incompatibility
----------------------

PDFium is not thread-compatible. If you need to parallelize tasks, use processes instead of threads.
PDFium is not thread-safe. It is not allowed to call pdfium functions simultaneously across different threads, not even with different documents. [#illegal_threading]_
However, you may still use pdfium in a threaded context if it is ensured that only a single pdfium call can be made at a time (e.g. via mutex).
It is fine to do pdfium work in one thread and other work in other threads.

The same applies to pypdfium2's helpers, or any wrapper calling pdfium, whether directly or indirectly, unless protected by mutex.

To parallelize expensive pdfium tasks such as rendering, consider processes instead of threads.
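
The mutex approach mentioned above can be illustrated with a minimal sketch. It uses only the standard library: ``fake_pdfium_call`` is a stand-in for any real pdfium call, and the names ``PDFIUM_LOCK`` and ``locked_call`` are assumptions made for this example, not part of pypdfium2.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

# One process-wide lock guarding every pdfium call (hypothetical name).
PDFIUM_LOCK = threading.Lock()

_active = 0     # number of stand-in calls currently running
max_active = 0  # highest overlap observed so far


def fake_pdfium_call(index):
    """Stand-in for a real pdfium call; records how many calls overlap."""
    global _active, max_active
    _active += 1
    max_active = max(max_active, _active)
    time.sleep(0.001)  # pretend to do some work
    _active -= 1
    return index * 2


def locked_call(func, *args, **kwargs):
    """Run func while holding the lock, so that calls never overlap."""
    with PDFIUM_LOCK:
        return func(*args, **kwargs)


# Four worker threads exist, but only one is inside the call at any time.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(lambda i: locked_call(fake_pdfium_call, i), range(8)))
```

Because every call is serialized, this makes threaded use safe but does not make the pdfium work itself any faster. That is why processes are suggested for expensive tasks.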

.. [#illegal_threading] Doing so would crash or corrupt the process.

API layers
----------

pypdfium2 provides multiple API layers:

* The raw PDFium API, to be used with :mod:`ctypes` (namespace ``pypdfium2.raw``).
* The support model API, which is a nice set of Python helper classes around the raw API (namespace ``pypdfium2``).
* Additionally, there is the internal API, which contains various utilities that are not fit for the main support model (namespace ``pypdfium2.internal``).
* The raw PDFium API, to be used with :mod:`ctypes` (``pypdfium2.raw`` or ``pypdfium2_raw`` [#raw_model]_).
* The support model API, which is a set of Python helper classes around the raw API (``pypdfium2``).
* Additionally, there is the internal API, which contains various utilities that aid with using the raw API and are accessed by helpers, but do not fit in the support model namespace itself (``pypdfium2.internal``).

All wrapper objects provide a ``raw`` attribute to access the underlying ctypes object.
Wrapper objects provide a ``raw`` attribute to access the underlying ctypes object.
In addition, helpers automatically resolve to raw if used as a C function parameter. [#ctypes_param_hook]_
This allows you to conveniently use helpers where available, while the raw API can still be accessed as necessary.
This allows you to conveniently use helpers where available, while the raw API can still be accessed as needed.

The raw API is quite stable and provides a high level of backwards compatibility (seeing as PDFium is
well-tested and relied on by popular projects), but it can be difficult to use, and special care needs
to be taken about memory management.
The raw API is quite stable and provides a high level of backwards compatibility (seeing as PDFium is well-tested and relied on by popular projects), but it can be difficult to use, and special care needs to be taken with memory management.

The support model API is still in development. Backwards incompatible changes may be applied occasionally,
though they are usually limited to major releases. On the other hand, it is considerably easier to use,
and the consequences of usage mistakes are generally less serious.
The support model API is still in the beta stage. It only covers a subset of pdfium features. Backwards incompatible changes may be applied occasionally, although we try to contain them within major releases. On the other hand, it is supposed to be safer and easier to use ("pythonic"), abstracting the finicky interaction with C functions.
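
How a wrapper can resolve to its raw object when passed to a C function is sketched below with plain :mod:`ctypes`. This is an illustration of the ``_as_parameter_`` hook only. The ``Helper`` class and ``raw_double`` function are hypothetical stand-ins, not pypdfium2 code.

```python
import ctypes

# A C-callable function with declared argument types, built from a Python
# callback so that no external library is needed (stand-in for a raw function).
PROTOTYPE = ctypes.CFUNCTYPE(ctypes.c_int, ctypes.c_int)
raw_double = PROTOTYPE(lambda value: value * 2)


class Helper:
    """Hypothetical wrapper exposing its ctypes object as ``raw``."""

    def __init__(self, value):
        self.raw = ctypes.c_int(value)

    @property
    def _as_parameter_(self):
        # ctypes consults this attribute when the object is passed to a C function.
        return self.raw


helper = Helper(21)
via_helper = raw_double(helper)   # the helper resolves to helper.raw
via_raw = raw_double(helper.raw)  # explicit access to the raw object
```

Both calls hand the same ctypes object to the C function, so the wrapper and its ``raw`` attribute are interchangeable at the call site.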

.. [#raw_model] The latter does not automatically initialize pdfium on import.
.. [#ctypes_param_hook] Implemented via ctypes hook `_as_parameter_ <https://docs.python.org/3/library/ctypes.html#calling-functions-with-your-own-custom-data-types>`_

Memory management
-----------------

.. Information on PDFium's behaviour: https://groups.google.com/g/pdfium/c/7qLFskabmnk/m/xQEnXiG5BgAJ
.. Limitations of weakref: https://stackoverflow.com/q/52648418/15547292/#comment131514594_58243606
.. Info on PDFium's close functions: https://groups.google.com/g/pdfium/c/7qLFskabmnk/m/xQEnXiG5BgAJ
.. weakref limitations: https://stackoverflow.com/q/52648418/15547292/#comment131514594_58243606
.. note::
    This section covers the support model, which does a lot of protective handling around raw pdfium close functions.
    It is not applicable to the raw API alone!
    This section covers the support model. It is not applicable to the raw API alone!

PDFium objects commonly need to be closed by the caller to release allocated memory. [#ac_obj_ownership]_
Where necessary, pypdfium2's helper classes implement automatic closing on garbage collection using :class:`weakref.finalize`. Additionally, they provide ``close()`` methods that can be used to release memory explicitly.

It may be advantageous to close objects explicitly instead of relying on Python garbage collection behaviour, to release allocated memory and acquired file handles immediately. [#ac_obj_opaqueness]_

Closed objects must not be accessed anymore.
Closing an object sets the underlying ``raw`` attribute to None, which should safely prevent illegal function calls on closed raw handles, though.
That said, closing an object sets the underlying ``raw`` attribute to None, which should prevent illegal use of closed raw handles.
Attempts to re-close an already closed object are silently ignored.

Closing a parent object will automatically close any open children (e.g. pages derived from a pdf).
This is a fairly recent change. With older versions, you should be cautious to close all children explicitly before closing a parent object.

It is important to note that raw objects must never be isolated from their wrappers. Continuing to use a raw object after it was closed (explicitly or on garbage collection of the wrapper) is bound to result in a use after free scenario.
Due to limitations in :mod:`weakref`, finalizers can only be attached to wrapper objects, although they would logically belong to the raw objects.
Raw objects must not be detached from their wrappers. Accessing a raw object after it was closed, whether explicitly or on garbage collection of the wrapper, is illegal (use after free).
Due to limitations in :mod:`weakref`, finalizers can only be attached to wrapper objects, although they logically belong to the raw object.
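
The closing behaviour described in this section can be sketched with the standard library alone. The ``Wrapper`` class and ``fake_close`` function below are hypothetical and heavily simplified. They are not pypdfium2's actual implementation, only an illustration of :class:`weakref.finalize`, explicit ``close()``, ignored re-closing, and parents closing their children.

```python
import weakref

closed_handles = []  # handles released so far, in order


def fake_close(handle):
    """Stand-in for a raw close function."""
    closed_handles.append(handle)


class Wrapper:
    """Hypothetical helper owning one raw handle."""

    def __init__(self, handle, parent=None):
        self.raw = handle
        self.parent = parent  # strong reference keeps the parent alive
        self._children = []   # weak references, so children can still be collected
        # The callback must not reference self, or the wrapper would never be collected.
        self._finalizer = weakref.finalize(self, fake_close, handle)
        if parent is not None:
            parent._children.append(weakref.ref(self))

    def close(self):
        if self.raw is None:
            return  # already closed: silently ignored
        for ref in self._children:
            child = ref()
            if child is not None:
                child.close()  # children are closed before the parent
        self._finalizer()  # runs fake_close at most once
        self.raw = None


pdf = Wrapper("document")
page = Wrapper("page", parent=pdf)
pdf.close()  # also closes the page
pdf.close()  # second call has no effect
```

Note that the finalizer is attached to the wrapper, not to the handle. Once the wrapper is closed or collected, any copy of the handle kept elsewhere would be dangling.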

.. [#ac_obj_ownership] Only objects owned by the caller of PDFium need to be closed. For instance, page objects that belong to a page are automatically freed by PDFium, while the caller is responsible for loose page objects.
.. [#ac_obj_ownership] Only objects owned by the caller of PDFium need to be closed. For instance, pageobjects that belong to a page are automatically freed by PDFium, while the caller is responsible for loose pageobjects.
.. [#ac_obj_opaqueness] Python does not know how many resources an opaque C object might bind.

Version
*******
.. note::
    Version info can be fooled. Prefer to see it as orientation rather than inherently reliable data.
    Version info can be fooled. See it as orientation rather than inherently reliable data.

.. automodule:: pypdfium2.version
