Skip to content

regression introduced in 2.13.0 (from 2.12.14) concerning use_cache in astroid's cache #2041

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
ltcmelo opened this issue Feb 28, 2023 · 4 comments · Fixed by #2206
Closed

Comments

@ltcmelo
Copy link
Contributor

ltcmelo commented Feb 28, 2023

This PR (#1747) introduced a regression in 2.13.0 (from 2.12.14). This regression causes undesired side effects — at least to me :-).

I rely on AstroidManager.astroid_cache to provide "special" implementations for certain modules. But the change in the PR mentioned above affects the behavior of AstroidManager.astroid_cache in a way that I can no longer ensure that those implementations are consistently used.

(And it appears that this side effect isn't intended by the PR, since the importing module isn't the same name as the imported module.)

Reproducible steps:

Under path /user/projects/

File /user/projects/main.py

import os
import os.path

import astroid
from astroid.manager import AstroidManager

mngr = AstroidManager()

app_dir_path = os.path.dirname(__file__)


def module_name_from_path(file_path: str):
    subpath = os.path.relpath(file_path, app_dir_path)
    mod_name = os.path.splitext(subpath)[0].replace(os.sep, '.')
    return mod_name


def parse_and_cache(mod_file_path: str, mod_name: str):
    with open(mod_file_path, 'r') as f:
        code = f.read()
    mod_node = astroid.parse(code, mod_name, mod_file_path, True)
    if mod_name in mngr.astroid_cache:
        del mngr.astroid_cache[mod_name]
    mngr.cache_module(mod_node)
    return mod_node


if __name__ == '__main__':
    prog_mod_path = os.path.abspath(os.path.join(app_dir_path, 'prog.py'))
    prog_mod_name = module_name_from_path(prog_mod_path)
    prog_mod_node = parse_and_cache(prog_mod_path, prog_mod_name)

    lib_dir_path = '/external/repository'
    os_dir_path = os.path.abspath(os.path.join(lib_dir_path, 'os', '__init__.py'))
    parse_and_cache(os_dir_path, 'os')
    ospath_mod_path = os.path.abspath(os.path.join(lib_dir_path, 'os', 'path.py'))
    parse_and_cache(ospath_mod_path, 'os.path')

    val = next(prog_mod_node.body[1].body[0].value.func.infer())
    print(val.parent.frame().file)

File /user/projects/proj.py:

import os.path

def f():
    os.path.relpath('abc')

Under path /external/repository/

This path must not be a subpath of /user/projects/.
Directory structure below:

$ tree /external/repository
/os
├── __init__.py
└── path.py

File /external/repository/os/init.py:

from . import path as path

File /external/repository/os/path.py:

def relpath(s):
    return " "

Output of run

With:

  • astroid 2.12.14 (this is correct/expected "special" os.path)
    /external/repository/os/path.py

  • astroid 2.13.0 (the system os.path)
    /usr/local/Cellar/python/…/lib/python3.10/posixpath.py

Details

I can observe the bug in do_import_module when modname is None and also self.modname is None, which in turn leads to use_cache = False. For instance, if you amend if mymodule.relative_to_absolute_name(modname, level) == mymodule.name: with if modname and mymodule.relative_to_absolute_name(modname, level) == mymodule.name:, then the bug is gone; but I'm not an astroid developer and I imagine that isn't the right place to fix the issue… it probably has a deeper root cause.

@DanielNoord
Copy link
Collaborator

Sorry for taking so long to get back to this. I think it is very likely that we might refactor AstroidManager in our efforts to fix our multiprocessing issues. That is why I'm inclined to leave this issue for now and see if it still exists after the refactor and what our stance would then be.

@jacobtylerwalls
Copy link
Member

Ended up in the same place after opening #2124. I do think we should reconsider this as part of 3.0. We may look into multiprocessing first, or we may look at this part first, I'm not sure.

@Pierre-Sassoulas
Copy link
Member

We may look into multiprocessing first, or we may look at this part first, I'm not sure.

Caching the sensitive part seems a lot easier to implement than multiprocessing ? I imagine it as "making sure that we can cache a function because the value returned for a particular input is always the same then adding a decorator on a function" vs "making sure that the design we have to invent can merge the result of two parallel parsing then actually implement it" ?

@cdce8p cdce8p modified the milestones: 3.0.0a1, 3.0.0a2, 3.0.0a3 Apr 25, 2023
@Pierre-Sassoulas Pierre-Sassoulas modified the milestones: 3.0.0a3, 3.0.0a4 May 14, 2023
@Pierre-Sassoulas Pierre-Sassoulas modified the milestones: 3.0.0a4, 3.0.0a5 Jun 6, 2023
@jacobtylerwalls
Copy link
Member

For instance, if you amend if mymodule.relative_to_absolute_name(modname, level) == mymodule.name: with if modname and mymodule.relative_to_absolute_name(modname, level) == mymodule.name:, then the bug is gone; but I'm not an astroid developer and I imagine that isn't the right place to fix the issue… it probably has a deeper root cause.

I just independently arrived at the same patch. I think it's correct. #1747 didn't intend to create endless re-instantiation of modules as described in #2124. I'll open a PR and credit you @ltcmelo. Thanks!

jacobtylerwalls added a commit to jacobtylerwalls/astroid that referenced this issue Jun 7, 2023
If modname is None (default), it's okay to use the cache;
there's no point in recreating the module ad nauseum.

Closes pylint-dev#2041

Co-authored-by: Leandro T. C. Melo <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants