-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Speed up import scanning by 400-500% #142
base: master
Are you sure you want to change the base?
Conversation
a86ed83
to
9ac45a1
Compare
def test_syntax_error_terminates_executor_pool(): | ||
with pytest.raises(BrokenProcessPool): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this is one disadvantage of concurrency in python. using the .map()
method in the executor results in the inability to capture errors raised from individual tasks
i've also tried using executor.submit()
, with my own chunking logic in order to try capture the exception as well but any exception that isn't on the top level function get_imports_by_module
, won't propagate to the context object:
with ProcessPoolExecutor() as executor:
you only get this generic
BrokenProcessPool('A process in the process pool was terminated abruptly while the future was running or pending.')
error
one way around this might be to change the raises
in the tasks to returns
, then check the result accordingly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be a shame to lose that exception - returning any syntax errors (they could even be exception objects) sounds like it is at least worth a try.
src/grimp/application/usecases.py
Outdated
) | ||
imports_by_module[module] = direct_imports | ||
with ProcessPoolExecutor() as executor: | ||
chunk_size = ceil(len(found_package.module_files) / executor._max_workers) or 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be helpful to have a comment here explaining our reasoning for the chunk size.
Very exciting! Have left a couple of initial comments, also the tests are failing which would be good to sort out. |
cddb258
to
914d301
Compare
914d301
to
cecca20
Compare
import_scanner: AbstractImportScanner, | ||
exclude_type_checking_imports: bool, | ||
cache: caching.Cache, | ||
): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing type annotation on return value.
I checked out this branch and installed it locally. It dramatically slowed down graph building time on a large test repo (0.6 seconds -> 35 seconds). On a Mac M2. I verified that my local installation of the Seems like something is wrong. When I set |
Tested on a large mono repo (35,000 modules) using a mac m1 (8 cores). The modules are divided amongst the cores on the system it runs.
so each core is doing an even share of ast walking
profiled using
Cprofile
,pstats
(to read the profile dump) andsnakeviz
(for the visualisation)Current(~37s):
after parallelisation(~8s):