Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed up import scanning by 400-500% #142

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

duzumaki
Copy link

@duzumaki duzumaki commented Feb 12, 2024

Tested on a large mono repo (35,000 modules) using a mac m1 (8 cores). The modules are divided amongst the cores on the system it runs.
so each core is doing an even share of ast walking

profiled using Cprofile, pstats(to read the profile dump) and snakeviz(for the visualisation)

Current(~37s):
Screenshot 2024-02-12 at 00 50 30

after parallelisation(~8s):
Screenshot 2024-02-12 at 00 52 35

@duzumaki duzumaki marked this pull request as draft February 12, 2024 01:18
@duzumaki duzumaki force-pushed the speed_up_build_graph branch 2 times, most recently from a86ed83 to 9ac45a1 Compare February 12, 2024 01:24
@duzumaki duzumaki marked this pull request as ready for review February 12, 2024 01:31
def test_syntax_error_terminates_executor_pool():
with pytest.raises(BrokenProcessPool):
Copy link
Author

@duzumaki duzumaki Feb 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this is one disadvantage of concurrency in python. using the .map() method in the executor results in the inability to capture errors raised from individual tasks

i've also tried using executor.submit(), with my own chunking logic in order to try capture the exception as well but any exception that isn't on the top level function get_imports_by_module, won't propagate to the context object:

with ProcessPoolExecutor() as executor:

you only get this generic
BrokenProcessPool('A process in the process pool was terminated abruptly while the future was running or pending.') error

one way around this might be to change the raises in the tasks to returns, then check the result accordingly

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be a shame to lose that exception - returning any syntax errors (they could even be exception objects) sounds like it is at least worth a try.

)
imports_by_module[module] = direct_imports
with ProcessPoolExecutor() as executor:
chunk_size = ceil(len(found_package.module_files) / executor._max_workers) or 1
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be helpful to have a comment here explaining our reasoning for the chunk size.

@seddonym
Copy link
Owner

Very exciting!

Have left a couple of initial comments, also the tests are failing which would be good to sort out.

@duzumaki duzumaki force-pushed the speed_up_build_graph branch 2 times, most recently from cddb258 to 914d301 Compare February 12, 2024 19:24
import_scanner: AbstractImportScanner,
exclude_type_checking_imports: bool,
cache: caching.Cache,
):
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing type annotation on return value.

@wesleykendall
Copy link

wesleykendall commented Jul 1, 2024

I checked out this branch and installed it locally. It dramatically slowed down graph building time on a large test repo (0.6 seconds -> 35 seconds).

On a Mac M2. I verified that my local installation of the master branch on this repo yielded fast results, so I don't think I'm doing anything wrong on the installation.

Seems like something is wrong. When I set max_workers to 1 on the process executor, build time was around 10 seconds. max_workers of 2 slowed down to 14 seconds. Let me know if there is another way I can profile

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants