General improvement of README and switch back to direct ast initialization within DeferredInstrumenter._create_import_name_replacement
Sachaa-Thanasius committed Sep 2, 2024
1 parent 9a6bbe4 commit 3117530
Showing 4 changed files with 103 additions and 65 deletions.
76 changes: 41 additions & 35 deletions README.rst
@@ -32,7 +32,7 @@ Usage
Setup
-----

``deferred`` hooks into the Python import system with a path hook. That path hook needs to be registered before code using ``defer_imports_until_use`` is executed. To do that, include the following somewhere such that it will be executed before your code:
``deferred`` hooks into the Python import system with a path hook. That path hook needs to be registered before code using ``defer_imports_until_use`` is parsed. To do that, include the following somewhere such that it will be executed before your code:

.. code:: python
@@ -54,33 +54,66 @@ Assuming the path hook has been registered, you can write something like this:
import inspect
from typing import TypeVar
# inspect and TypeVar won't be imported until referenced. For imports that are only used for annotations,
# this import cost can be avoided entirely by making sure all annotations are strings.
# inspect and TypeVar won't be imported until referenced.
Use Cases
---------

- If imports are necessary to get symbols that are only used within annotations, but those would cause import chains. The current workaround for this is to perform the problematic imports within ``if typing.TYPE_CHECKING: ...`` blocks and then stringify the fake-imported symbols to prevent NameErrors at runtime from the symbols not existing; however, resulting annotations are difficult to introspect with standard library introspection tools, since they assume the symbols exist. With ``deferred``, however, those imports can be deferred, annotations can be stringified (or deferred under PEP 649 semantics), and the deferred imports would only occur when the annotations are introspected/evaluated for the sake of making the contained symbols exist at runtime, thus making the imports not circular and almost zero cost.
- If imports are expensive but only necessary for certain code paths that won't always be hit, e.g. in subcommands in CLI tools.
- If imports are necessary to get symbols that are only used within annotations, but such imports would cause import chains.

The current workaround for this is to perform the problematic imports within ``if typing.TYPE_CHECKING: ...`` blocks and then stringify the fake-imported, nonexistent symbols to prevent NameErrors at runtime; however, resulting annotations are difficult to introspect with standard library introspection tools, since they assume the symbols exist.

With ``deferred``, however, those imports can be deferred, annotations can be stringified (or late-evaluated under PEP 649 semantics), and the deferred imports would only occur *if* the annotations are inspected or evaluated, for the sake of making the contained symbols exist at runtime, thus making the imports non-circular and close to free in most circumstances.

- If expensive imports are only necessary for certain code paths that won't always be taken, e.g. in subcommands in CLI tools.
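The ``if typing.TYPE_CHECKING:`` workaround and its introspection problem can be sketched in a few lines (a minimal illustration; ``Decimal`` is an arbitrary stand-in for a problematic import, not something taken from the project):

```python
import typing

if typing.TYPE_CHECKING:
    # Only static type checkers execute this import; at runtime it is skipped.
    from decimal import Decimal

def total(values: "list[Decimal]") -> "Decimal":
    # Stringified annotations prevent NameErrors at definition time...
    return sum(values)

# ...but standard introspection fails, because evaluating the annotations
# requires Decimal to actually exist at runtime:
try:
    typing.get_type_hints(total)
    failure = None
except NameError as exc:
    failure = exc
```

``failure`` ends up holding the ``NameError``, which is exactly the introspection breakage described above.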


Features
========

- Python implementation–agnostic, in theory.
- Supports multiple Python runtimes/implementations, in theory.

- The library mainly depends on ``locals()`` at module scope to maintain its current API: specifically, that its return value will be a read-through, *write-through*, dict-like view of the module locals, and that a reference to that view can be passed around.

- Supports all syntactically valid Python import statements.
- Doesn't break type-checkers like pyright and mypy.
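The module-scope ``locals()`` contract mentioned in the list above can be demonstrated with plain Python (a sketch; the variable names are illustrative):

```python
# At module scope, locals() returns the actual module namespace (the same
# mapping as globals()), so it acts as a read-through, write-through view.
sentinel = "before"
ns = locals()

assert ns["sentinel"] == "before"  # read-through: existing names are visible

ns["injected"] = 42  # write-through: at module scope, this defines a global
```

This write-through behavior is what lets the instrumented code swap proxy entries in and out of the namespace without frame hacks.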


Caveats
========
=======

- Doesn't support lazy importing in class or function scope.
- Doesn't support wildcard imports.
- (WIP) Has an initial setup cost that could be smaller.
- Has a relatively hefty one-time setup cost.
- Doesn't have an API for giving users a choice to automatically defer all imports on a module, library, or application scale.


Why?
====

Lazy imports, in theory, alleviate several pain points that Python has currently. I'm not alone in thinking that; `PEP 690 <https://peps.python.org/pep-0690/>`_ was put forth to integrate lazy imports into CPython for that reason and explains the benefits much better than I can. While that proposal was rejected, there are other options in the form of third-party libraries that implement lazy importing, albeit with some constraints. Most do not have an API that is as general and ergonomic as what PEP 690 laid out, but they didn't aim to fill those shoes in the first place. Some examples:

- `demandimport <https://github.com/bwesterb/py-demandimport>`_
- `apipkg <https://github.com/pytest-dev/apipkg>`_
- `modutil <https://github.com/brettcannon/modutil>`_
- `metamodule <https://github.com/njsmith/metamodule/>`_
- `SPEC 1 <https://scientific-python.org/specs/spec-0001/>`_ and its implementation, `lazy-loader <https://github.com/scientific-python/lazy-loader>`_
- And countless more

Then along came `slothy <https://github.com/bswck/slothy>`_, a library that seems to do it better, having been constructed with feedback from multiple CPython core developers as well as one of the minds behind PEP 690. It was the main inspiration for this project. However, the library (currently) also ties itself to specific Python implementations by depending on the existence of frames that represent the call stack. That's perfectly fine; PEP 690's implementation was for CPython specifically, and to my knowledge, the most popular Python runtimes provide call stack access in some form. Still, I thought that there might be a way to do something similar while remaining implementation-independent, avoiding as many internal APIs as possible. After feedback and discussion, that thought crystallized into this library.


How?
====

The core of this package is quite simple: when import statements are executed, the resulting values are special proxies representing the delayed import, which are then saved in the local namespace with special keys instead of normal string keys. When a user requests the normal string key corresponding to the import, the relevant import is executed, and both the special key and the proxy replace themselves with the correct string key and import result. Everything stems from this.

The ``defer_imports_until_used`` context manager is what causes the proxies to be returned by the import statements: it temporarily replaces ``builtins.__import__`` with a version that will give back proxies that store the arguments needed to execute the *actual* import at a later time.
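The ``__import__`` swap can be sketched with a toy context manager (an illustration of the idea only, not the library's actual classes):

```python
import builtins
import contextlib

class ToyImportProxy:
    """Stands in for a module, remembering the arguments for the real import."""
    def __init__(self, name, fromlist, level):
        self.args = (name, fromlist, level)

@contextlib.contextmanager
def toy_defer_imports():
    original_import = builtins.__import__

    def deferring_import(name, globals=None, locals=None, fromlist=(), level=0):
        # Instead of importing, capture what would be needed to import later.
        return ToyImportProxy(name, fromlist, level)

    builtins.__import__ = deferring_import
    try:
        yield
    finally:
        builtins.__import__ = original_import  # always restore the real hook

with toy_defer_imports():
    import json  # binds the name "json" to a ToyImportProxy, not the module
```

Unlike this toy, ``deferred`` pairs the proxies with the special-key machinery so the real import still happens transparently on first use.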

Those proxies don't use those stored ``__import__`` arguments themselves, though; the aforementioned special keys are what use the proxy's stored arguments to trigger the late import. These keys are aware of the namespace, the *dictionary*, they live in, are aware of the proxy they are the key for, and have overridden their ``__eq__`` and ``__hash__`` methods so that they know when they've been queried. In a sense, they're almost like descriptors, but instead of "owning the dot", they're "owning the brackets". Once they've been matched (i.e. someone uses the name of the import), they can use the proxy's stored ``__import__`` arguments to execute the late import and *replace themselves* in the local namespace. That way, as soon as the name of the deferred import is referenced, all a user sees in the local namespace is a normal string key and the result of the resolved import.
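The "owning the brackets" trick can be illustrated with a stripped-down key class (a sketch: the real ``DeferredImportKey`` resolves the import and replaces itself, while this toy only records that it was queried):

```python
class ToyDeferredKey:
    """A dict key that hashes and compares equal to a plain string,
    so it can notice the moment someone looks that string up."""
    def __init__(self, name):
        self.name = name
        self.was_queried = False

    def __hash__(self):
        return hash(self.name)  # land in the same hash bucket as the string

    def __eq__(self, other):
        if other == self.name:
            # In deferred, this is where the late import would run and the
            # key/proxy pair would swap themselves out of the namespace.
            self.was_queried = True
        return self.name == other

namespace = {}
key = ToyDeferredKey("inspect")
namespace[key] = "<proxy>"

# An ordinary string lookup finds the special key via __hash__/__eq__:
value = namespace["inspect"]
```

Because the hash matches and ``__eq__`` returns ``True``, the plain-string lookup lands on the special key, which is the hook the library uses to run the import exactly once, on first reference.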

The final step is actually assigning these special proxies to the special keys. After all, Python name binding semantics only allow regular strings to be used as variable names/namespace keys; how can this be bypassed? Well, this is where a little bit of instrumentation comes into play. When a user calls ``deferred.install_deferred_import_hook()`` to set up the ``deferred`` machinery (see "Setup" above), what they are actually doing is installing an import hook that will modify the code of any given Python file that uses the ``defer_imports_until_use`` context manager. It adds a few lines of code such that the return values of imports within the context manager are reassigned to special keys in the local namespace, accessed and modified via ``locals()``. With this method, we can avoid using frame hacks to modify the locals and even avoid changing the contract of ``builtins.__import__``, which specifically says it does not modify the global or local namespaces that are passed into it.


Benchmarks
Expand Down Expand Up @@ -125,33 +158,6 @@ There are two ways of measuring activation and/or import time currently:
- This has great variance, so only value the resulting time relative to another import's time in the same process if possible.
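One in-process way to take such a relative measurement might look like the following (a sketch; ``json`` and ``decimal`` are arbitrary stdlib modules, not ones the benchmarks necessarily use):

```python
import importlib
import sys
import time

def time_fresh_import(name: str) -> float:
    """Time one import, evicting any cached module first so the module
    body actually executes. The absolute number is noisy; compare it only
    against another measurement taken the same way in the same process."""
    sys.modules.pop(name, None)
    start = time.perf_counter()
    importlib.import_module(name)
    return time.perf_counter() - start

t_json = time_fresh_import("json")
t_decimal = time_fresh_import("decimal")
```

Only the ratio between two such timings is meaningful, per the caveat above.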


Why?
====

Lazy imports, in theory, alleviate several pain points that Python has currently. I'm not alone in thinking that; `PEP 690 <https://peps.python.org/pep-0690/>`_ was put forth to integrate lazy imports into CPython for that reason and explains the benefits much better than I can. While that was rejected, there are other options in the form of third-party libraries that implement lazy importing, albeit with some constraints. Most do not have an API that is as general and ergonomic as what PEP 690 laid out, but they didn't aim to fill those shoes in the first place. Some examples:

- `demandimport <https://github.com/bwesterb/py-demandimport>`_
- `apipkg <https://github.com/pytest-dev/apipkg>`_
- `modutil <https://github.com/brettcannon/modutil>`_
- `metamodule <https://github.com/njsmith/metamodule/>`_
- `SPEC 1 <https://scientific-python.org/specs/spec-0001/>`_ and its implementation, `lazy-loader <https://github.com/scientific-python/lazy-loader>`_
- And countless more.

Then along came `slothy <https://github.com/bswck/slothy>`_, a library that seems to do it better, having been constructed with feedback from multiple CPython core developers as well as one of the minds behind PEP 690. It was the main inspiration for this project. However, the library (currently) also ties itself to specific Python implementations by depending on the existence of frames that represent the call stack. That's perfectly fine; PEP 690's implementation was for CPython specifically, and to my knowledge, the most popular Python runtimes provide call stack access in some form. Still, I thought that there might be a way to do something similar while remaining implementation-independent, avoiding as many internal APIs as possible. After feedback and discussion, that thought crystallized into this library.


How?
====

The core of this package is quite simple: when import statements are executed, the resulting values are special proxies representing the delayed import, which are then saved in the local namespace with special keys instead of normal string keys. When a user requests the normal string key corresponding to the import, the relevant import is executed, and both the special key and the proxy replace themselves with the correct string key and import result. Everything stems from this.

The ``defer_imports_until_used`` context manager is what causes the proxies to be returned by the import statements: it temporarily replaces ``builtins.__import__`` with a version that will give back proxies that store the arguments needed to execute the *actual* import at a later time.

Those proxies don't use those stored ``__import__`` arguments themselves, though; the aforementioned special keys are what use the proxy's stored arguments to trigger the late import. These keys are aware of the namespace, the *dictionary*, they live in, are aware of the proxy they are the key for, and have overridden their ``__eq__`` and ``__hash__`` methods so that they know when they've been queried. In a sense, they're almost like descriptors, but instead of "owning the dot", they're "owning the brackets". Once they've been matched (i.e. someone uses the name of the import), they can use the proxy's stored ``__import__`` arguments to execute the late import and *replace themselves* in the local namespace. That way, as soon as the name of the deferred import is referenced, all a user sees in the local namespace is a normal string key and the result of the resolved import.

The final step is actually assigning these special proxies to the special keys. After all, Python name binding semantics only allow regular strings to be used as variable names/namespace keys; how can this be bypassed? Well, this is where a little bit of instrumentation comes into play. When a user calls ``deferred.install_deferred_import_hook()`` to set up the ``deferred`` machinery (see "Setup" above), what they are actually doing is installing an import hook that will modify the code of any given Python file that uses the ``defer_imports_until_use`` context manager. It adds a few lines of code such that the return values of imports within the context manager are reassigned to special keys in the local namespace, accessed and modified via ``locals()``. With this method, we can avoid using frame hacks to modify the locals and even avoid changing the contract of ``builtins.__import__``, which specifically says it does not modify the global or local namespaces that are passed into it.


Acknowledgements
================

5 changes: 3 additions & 2 deletions pyproject.toml
@@ -7,15 +7,16 @@ name = "deferred"
description = "Lazy imports with regular syntax in pure Python."
requires-python = ">=3.9"
license = "MIT"
readme = { file = "README.rst", content-type = "text/x-rst" }
authors = [
{ name = "Sachaa-Thanasius", email = "[email protected]" },
]
classifiers = [
"Development Status :: 2 - Pre-Alpha",
"Development Status :: 3 - Alpha",
"Natural Language :: English",
"Operating System :: OS Independent",
"Programming Language :: Python",
"Programming Language :: Python :: 3",
"Programming Language :: Python :: 3 :: Only",
"Programming Language :: Python :: 3.9",
"Programming Language :: Python :: 3.10",
"Programming Language :: Python :: 3.11",
77 changes: 50 additions & 27 deletions src/deferred/_core.py
@@ -31,7 +31,7 @@


class DeferredInstrumenter(ast.NodeTransformer):
"""AST transformer that "instruments" imports within "with defer_imports_until_use: ..." blocks so that their
"""AST transformer that instruments imports within "with defer_imports_until_use: ..." blocks so that their
results are assigned to custom keys in the global namespace.
"""

@@ -89,30 +89,54 @@ def _get_node_context(self, node: ast.stmt): # noqa: ANN202 # Version-dependent

@staticmethod
def _create_import_name_replacement(name: str) -> ast.If:
"""Create an AST for changing the name of a variable in locals if the variable is a deferred proxy."""
"""Create an AST for changing the name of a variable in locals if the variable is a deferred proxy.
The resulting node, if unparsed, is almost equivalent to the following::

    if type(name) is @DeferredImportProxy:
        @temp_proxy = @local_ns.pop("name")
        @local_ns[@DeferredImportKey("name", @temp_proxy)] = @temp_proxy
"""

if "." in name:
name = name.partition(".")[0]

# NOTE: Creating the AST directly is also an option, but this feels more maintainable.
if_tree = ast.parse(
f"if type({name}) is DeferredImportProxy:\n"
f" temp_proxy = local_ns.pop('{name}')\n"
f" local_ns[DeferredImportKey('{name}', temp_proxy)] = temp_proxy"
return ast.If(
test=ast.Compare(
left=ast.Call(
func=ast.Name("type", ctx=ast.Load()),
args=[ast.Name(name, ctx=ast.Load())],
keywords=[],
),
ops=[ast.Is()],
comparators=[ast.Name("@DeferredImportProxy", ctx=ast.Load())],
),
body=[
ast.Assign(
targets=[ast.Name("@temp_proxy", ctx=ast.Store())],
value=ast.Call(
func=ast.Attribute(value=ast.Name("@local_ns", ctx=ast.Load()), attr="pop", ctx=ast.Load()),
args=[ast.Constant(name)],
keywords=[],
),
),
ast.Assign(
targets=[
ast.Subscript(
value=ast.Name("@local_ns", ctx=ast.Load()),
slice=ast.Call(
func=ast.Name("@DeferredImportKey", ctx=ast.Load()),
args=[ast.Constant(name), ast.Name("@temp_proxy", ctx=ast.Load())],
keywords=[],
),
ctx=ast.Store(),
)
],
value=ast.Name("@temp_proxy", ctx=ast.Load()),
),
],
orelse=[],
)
if_node = if_tree.body[0]
assert isinstance(if_node, ast.If)

# Adjust some of the names to be inaccessible by normal users.
for node in ast.walk(if_node):
if isinstance(node, ast.Name) and node.id in {
"temp_proxy",
"local_ns",
"DeferredImportProxy",
"DeferredImportKey",
}:
node.id = f"@{node.id}"
return if_node
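The two construction strategies this commit switches between can be compared directly: parsing an f-string yields the same tree (modulo the later ``@`` renaming, which ``ast.parse`` cannot express, one motivation for building nodes by hand) as direct initialization. A sketch using ordinary identifiers:

```python
import ast

name = "inspect"

# Strategy 1 (old): parse source text and take the resulting If node.
parsed = ast.parse(
    f"if type({name}) is DeferredImportProxy:\n"
    f"    temp_proxy = local_ns.pop('{name}')\n"
    f"    local_ns[DeferredImportKey('{name}', temp_proxy)] = temp_proxy"
).body[0]

# Strategy 2 (new): build the same If node directly.
direct = ast.If(
    test=ast.Compare(
        left=ast.Call(func=ast.Name("type", ctx=ast.Load()), args=[ast.Name(name, ctx=ast.Load())], keywords=[]),
        ops=[ast.Is()],
        comparators=[ast.Name("DeferredImportProxy", ctx=ast.Load())],
    ),
    body=[
        ast.Assign(
            targets=[ast.Name("temp_proxy", ctx=ast.Store())],
            value=ast.Call(
                func=ast.Attribute(value=ast.Name("local_ns", ctx=ast.Load()), attr="pop", ctx=ast.Load()),
                args=[ast.Constant(name)],
                keywords=[],
            ),
        ),
        ast.Assign(
            targets=[
                ast.Subscript(
                    value=ast.Name("local_ns", ctx=ast.Load()),
                    slice=ast.Call(
                        func=ast.Name("DeferredImportKey", ctx=ast.Load()),
                        args=[ast.Constant(name), ast.Name("temp_proxy", ctx=ast.Load())],
                        keywords=[],
                    ),
                    ctx=ast.Store(),
                )
            ],
            value=ast.Name("temp_proxy", ctx=ast.Load()),
        ),
    ],
    orelse=[],
)

# Both strategies unparse to identical source text.
same = ast.unparse(parsed) == ast.unparse(ast.fix_missing_locations(direct))
```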

@staticmethod
def _initialize_local_ns() -> ast.Assign:
@@ -236,16 +260,15 @@ def visit_Module(self, node: ast.Module) -> ast.AST:

position += 1

# Import key and proxy classes.
key_class = DeferredImportKey.__name__
proxy_class = DeferredImportProxy.__name__
defer_class_names = ("DeferredImportKey", "DeferredImportProxy")

defer_aliases = [ast.alias(name=name, asname=f"@{name}") for name in (key_class, proxy_class)]
# Import key and proxy classes.
defer_aliases = [ast.alias(name=name, asname=f"@{name}") for name in defer_class_names]
key_and_proxy_import = ast.ImportFrom(module="deferred._core", names=defer_aliases, level=0)
node.body.insert(position, key_and_proxy_import)

# Clean up the namespace.
key_and_proxy_names: list[ast.expr] = [ast.Name(f"@{name}", ctx=ast.Del()) for name in (key_class, proxy_class)]
key_and_proxy_names: list[ast.expr] = [ast.Name(f"@{name}", ctx=ast.Del()) for name in defer_class_names]
node.body.append(ast.Delete(targets=key_and_proxy_names))

return self.generic_visit(node)
@@ -683,7 +706,7 @@ def uninstall_defer_import_hook() -> None:
except ValueError:
pass
else:
# NOTE: Whatever invalidation mechanism install_defer_import_hook() uses must be used here as well.
# NOTE: Whatever invalidation mechanism install_defer_import_hook() uses should be used here as well.
PathFinder.invalidate_caches()
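The uninstall pattern in this hunk can be sketched as a standalone function (``uninstall_path_hook`` and ``dummy_hook`` are made-up names for illustration, not the library's API):

```python
import sys
from importlib.machinery import PathFinder

def uninstall_path_hook(hook) -> None:
    """Remove a path hook if present, then invalidate cached finders so
    stale entries created by the hook don't linger."""
    try:
        sys.path_hooks.remove(hook)
    except ValueError:
        pass  # the hook was never installed; nothing to clean up
    else:
        PathFinder.invalidate_caches()

def dummy_hook(path):
    raise ImportError  # a path hook signals "not mine" by raising ImportError

sys.path_hooks.insert(0, dummy_hook)
uninstall_path_hook(dummy_hook)
```

Removing a hook that was never installed is deliberately a no-op, which is why the ``except ValueError: pass`` branch exists.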


@@ -705,7 +728,7 @@ def __exit__(self, *exc_info: object) -> None:


defer_imports_until_use: _tp.Final[DeferredContext] = DeferredContext()
"""A context manager within which imports occur lazily.
"""A context manager within which imports occur lazily. Not reentrant.
This will not work correctly if install_defer_import_hook() was not called first elsewhere.
10 changes: 9 additions & 1 deletion tests/test_deferred.py
@@ -921,7 +921,7 @@ def test_leaking_patch(tmp_path: Path):

@pytest.mark.skipif(sys.version_info < (3, 12), reason="type statements are only valid in 3.12+")
def test_type_statement_312(tmp_path: Path):
"""Test that the loading still occurs when a type statement resolves in python 3.12+.
"""Test that a proxy within a type statement doesn't resolve until accessed via .__value__.
The package has the following structure:
.
@@ -961,4 +961,12 @@ def test_type_statement_312(tmp_path: Path):

with temp_cache_module(package_name, module):
spec.loader.exec_module(module)
expected_proxy_repr = "<key for 'Expensive' import>: <proxy for 'from type_stmt_pkg.exp import Expensive'>"

assert expected_proxy_repr in repr(vars(module))

assert str(module.ManyExpensive) == "ManyExpensive"
assert expected_proxy_repr in repr(vars(module))

assert str(module.ManyExpensive.__value__) == "tuple[type_stmt_pkg.exp.Expensive, ...]"
assert expected_proxy_repr not in repr(vars(module))
