Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf: lazily load gazelle manifest files #2746

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

mattem
Copy link
Contributor

@mattem mattem commented Apr 6, 2025

In large repositories where Python may not be the only language, the gazelle manifest loading is done unnecessarily, and is done during the configuration walk.

This means that even for non-python gazelle invocations (eg bazel run gazelle -- web/), Python manifest files are being parsed and loaded into memory.
This issue compounds if the repository uses multiple dependency closures, ie multiple gazelle_python.yaml files.
In our repo, we currently have ~250 Python manifests, so loading them when Gazelle is only running over other languages is time consuming.

@mattem mattem requested review from dougthor42 and aignas as code owners April 6, 2025 11:40
@mattem
Copy link
Contributor Author

mattem commented Apr 6, 2025

The CI failure here seems unrelated, it's failed to clone the repo on the agent.

@mattem mattem force-pushed the perf/lazy_parse_gazelle_manifests branch from 3a57101 to 225f9cc Compare April 6, 2025 11:49
@mattem mattem requested a review from rickeylev as a code owner April 6, 2025 11:49
Copy link
Collaborator

@dougthor42 dougthor42 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see any issues with this. I'd like to test it on our monorepo before approving, but I predict it'll be just fine.

For your case, what sort of speedups are you seeing? And how did you profile it (so that I can try to reproduce it)?

we currently have ~250 Python manifests,

Do you mean ~250 gazelle_python.yaml files? Or 250 packages listed in a single gazelle_python.yaml?

@mattem
Copy link
Contributor Author

mattem commented Apr 6, 2025

I'd like to test it on our monorepo before approving, but I predict it'll be just fine.

Sure, this has been running internally for us for ~2 months now, but we are still on the (now very diverged) original copy of the extension. Let me know how it goes.

For your case, what sort of speedups are you seeing? And how did you profile it (so that I can try to reproduce it)?

It's slightly tricky to say, as it depends on the directory that Gazelle is running over, but we saw a reduction of about 30 seconds when running over only web targets, where no Python manifests are required. If you always run Gazelle over Python directories, then this isn't going to change anything.

Gazelle can be profiled by via the -cpuprofile and -memprofile flags. The CPU profile is written to the given path, and can viewed via pprof (go tool pprof ...).
The other part I can upstream is the removal of filepath.Walkdir here, https://github.com/bazel-contrib/rules_python/blob/main/gazelle/python/generate.go#L154. There's no need to re-walk the file system, Gazelle has already collected up the files on the configuration pass. This is very noticeable on the profiles, given the walk does both file system walking and stats on files. Removing this took about 70 seconds off running Gazelle over our whole repo.

Do you mean ~250 gazelle_python.yaml files? Or 250 packages listed in a single gazelle_python.yaml?

250 individual gazelle_python.yaml files. 250 packages in a single file would be on the smaller side of the majority of manifest files.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants