Speed up import scanning by 400-500% #142

duzumaki · 2024-02-12T01:15:58Z

Tested on a large monorepo (35,000 modules) on an M1 Mac (8 cores). The modules are divided evenly among the cores of the machine it runs on, so each core does an equal share of the AST walking.

Profiled using cProfile, pstats (to read the profile dump) and snakeviz (for the visualisation).
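
For anyone wanting to reproduce this kind of measurement, here is a minimal sketch of the cProfile + pstats workflow. `build_graph_under_test` is a hypothetical stand-in for the call being profiled (it is not grimp's API), and the dump file name is arbitrary.

```python
import cProfile
import os
import pstats
import tempfile


def build_graph_under_test() -> int:
    # Hypothetical stand-in workload; replace with the real call being measured.
    return sum(i * i for i in range(100_000))


def profile_to_file(func, dump_path: str) -> pstats.Stats:
    """Run func under cProfile, write the dump to disk, and load it with pstats."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    profiler.dump_stats(dump_path)
    return pstats.Stats(dump_path)


dump_path = os.path.join(tempfile.gettempdir(), "scan.prof")
stats = profile_to_file(build_graph_under_test, dump_path)
# Show the five entries with the highest cumulative time.
stats.sort_stats("cumulative").print_stats(5)
```

The same dump file can then be opened with `snakeviz`. One caveat worth keeping in mind: cProfile only records the process it runs in, so work done inside worker processes will not appear in the parent's dump.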

Current: ~37s

After parallelisation: ~8s

duzumaki · 2024-02-12T01:33:00Z

tests/functional/test_error_handling.py

+def test_syntax_error_terminates_executor_pool():
+    with pytest.raises(BrokenProcessPool):


I guess this is one disadvantage of concurrency in Python. Using the executor's .map() method means we can't capture the errors raised from individual tasks.

I've also tried executor.submit() with my own chunking logic to try to capture the exception, but any exception that isn't raised in the top-level function, get_imports_by_module, won't propagate to the context object:

with ProcessPoolExecutor() as executor:

You only get this generic error:
BrokenProcessPool('A process in the process pool was terminated abruptly while the future was running or pending.')

One way around this might be to change the raises in the tasks to returns, then check the results accordingly.

It would be a shame to lose that exception - returning any syntax errors (they could even be exception objects) sounds like it is at least worth a try.
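
A rough sketch of the "return instead of raise" idea discussed here. All names (`ScanResult`, `scan_module`, `scan_all`) are hypothetical and not grimp's API, and `ast.parse` stands in for the real import scanner. Each task catches the syntax error itself and hands back plain, picklable data, so the parent process can decide what to raise.

```python
import ast
from concurrent.futures import ProcessPoolExecutor
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ScanResult:
    module: str
    imports: tuple[str, ...] = ()
    error: Optional[str] = None  # set when the module failed to parse


def scan_module(item: tuple[str, str]) -> ScanResult:
    """Parse one (module name, source) pair; return errors instead of raising."""
    module, source = item
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        return ScanResult(module, error=f"line {exc.lineno}: {exc.msg}")
    names = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            names.extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            # Bare relative imports (`from . import x`) have no module name
            # and are skipped in this simplified sketch.
            names.append(node.module)
    return ScanResult(module, imports=tuple(names))


def scan_all(sources: dict[str, str]) -> list[ScanResult]:
    """Scan every module in a process pool and collect the results in order."""
    with ProcessPoolExecutor() as executor:
        return list(executor.map(scan_module, sources.items()))


if __name__ == "__main__":
    for result in scan_all({"good": "import os\n", "bad": "def broken(:\n"}):
        print(result)
```

The caller can then inspect `result.error` and raise a single, descriptive exception in the parent process, rather than relying on the exception crossing the process boundary.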

seddonym · 2024-02-12T15:49:30Z

src/grimp/application/usecases.py

-                )
-            imports_by_module[module] = direct_imports
+        with ProcessPoolExecutor() as executor:
+            chunk_size = ceil(len(found_package.module_files) / executor._max_workers) or 1


Would be helpful to have a comment here explaining our reasoning for the chunk size.
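
A sketch of the reasoning such a comment could capture, under my own assumptions about the intent: one chunk per worker keeps the number of inter-process round trips low, and the `or 1` guards against an empty package. `chunk_size_for` is a hypothetical helper; it takes the worker count explicitly rather than reading the private `_max_workers` attribute.

```python
import os
from math import ceil


def chunk_size_for(item_count: int, worker_count: int) -> int:
    """Size chunks so that each worker receives roughly one chunk of the work."""
    if worker_count < 1:
        raise ValueError("worker_count must be at least 1")
    # ceil() rounds up, so at most worker_count chunks are produced.
    # `or 1` covers item_count == 0, since Executor.map rejects chunksize < 1.
    return ceil(item_count / worker_count) or 1


worker_count = os.cpu_count() or 1  # cpu_count() can return None
print(chunk_size_for(35_000, 8))  # 4375
```

The result would be passed as `chunksize=` to `executor.map`. The trade-off of one chunk per worker is that a worker that finishes early cannot pick up leftover work, so smaller chunks may balance better when modules vary a lot in size.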

seddonym · 2024-02-12T15:50:23Z

Very exciting!

I've left a couple of initial comments. The tests are also failing, which would be good to sort out.

seddonym · 2024-02-13T11:11:31Z

src/grimp/application/usecases.py

+    import_scanner: AbstractImportScanner,
+    exclude_type_checking_imports: bool,
+    cache: caching.Cache,
+):


Missing type annotation on return value.

wesleykendall · 2024-07-01T10:54:27Z

I checked out this branch and installed it locally. It dramatically slowed down graph building time on a large test repo (0.6 seconds -> 35 seconds).

This was on a Mac M2. I verified that a local installation of the master branch of this repo gave fast results, so I don't think I did anything wrong with the installation.

Something seems to be wrong. When I set max_workers to 1 on the process executor, build time was around 10 seconds; with max_workers set to 2 it slowed down to 14 seconds. Let me know if there is another way I can profile.

duzumaki marked this pull request as draft February 12, 2024 01:18

duzumaki force-pushed the speed_up_build_graph branch 2 times, most recently from a86ed83 to 9ac45a1 Compare February 12, 2024 01:24

duzumaki marked this pull request as ready for review February 12, 2024 01:31

seddonym reviewed Feb 12, 2024

duzumaki force-pushed the speed_up_build_graph branch 2 times, most recently from cddb258 to 914d301 Compare February 12, 2024 19:24

duzumaki added 2 commits February 12, 2024 19:40

Speed up import scanning

f01f901

Update test

cecca20

duzumaki force-pushed the speed_up_build_graph branch from 914d301 to cecca20 Compare February 12, 2024 19:41

seddonym reviewed Feb 13, 2024
