Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GoTestResolver: Limit scope of indexing to the package level instead of entire workspace/module #2504

Open
mnoah1 opened this issue Oct 21, 2022 · 26 comments
Assignees
Labels
FeatureRequest go-test issues related to go test support (test output, test explorer, ...) WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided.

Comments

@mnoah1
Copy link

mnoah1 commented Oct 21, 2022

Is your feature request related to a problem? Please describe.
When working in a large monorepo, the test explorer indexes indefinitely as it attempts to discover every test in the monorepo. Even after the tests in the currently open package are displayed, it continues to spin and slowly populate other packages which may not be relevant to the user's current work.

Describe the solution you'd like
Adjust the resolve method in GoTestResolver. Add a new configuration setting that optionally skips the step of finding all packages in the workspace, so tests will only get added if part of the user's active package.

Describe alternatives you've considered
Potentially more complex logic that lets a user configure the projects that are relevant to them to be included in indexing.

Additional context
Below is the block of code in goTest/resolve that could go behind a conditional. I can contribute this change if you're good with it.

if (kind === 'module' || kind === 'workspace') {
    await walkPackages(this.workspace.fs, item.uri, async (uri) => {
        await this.getPackage(uri);
    });
}
@gopherbot gopherbot added this to the Untriaged milestone Oct 21, 2022
@hyangah
Copy link
Contributor

hyangah commented Oct 27, 2022

Thanks for the FR @mnoah1

There is a separate feature request #1739 and I think that matches what you described as an alternative - except that the FR and the [cl/351172](https://go-review.git.corp.google.com/c/vscode-go/+/351172 proposed by @firelizzard18 discusses only on exclude option. If the feature #1739 is implemented, do you still think this is useful?

Often times, users work on multiple packages. I am worried that kicking the resolver every time as users navigate/open/close files in different packages can be more complex than it looks.

@hyangah hyangah added go-test issues related to go test support (test output, test explorer, ...) FeatureRequest labels Oct 27, 2022
@hyangah hyangah removed their assignment Oct 27, 2022
@hyangah hyangah added the WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. label Oct 27, 2022
@mnoah1
Copy link
Author

mnoah1 commented Oct 31, 2022

Hi @hyangah - I think in the case of a large repo, the exclude option would not be ideal since for most users that work on only a few projects, they'll end up needing to set most of the repo to be excluded which will become tedious to maintain. If we could just avoid walking the entire workspace when looking for more packages to add, it should be relatively close to the current behavior with new packages updating in the list based on users opening files, and staying there if the user leaves to go to another package. So effectively it would build up a list of the user's current working packages.

@mnoah1
Copy link
Author

mnoah1 commented Nov 18, 2022

Hi @hyangah - any thoughts on this? I have a solution on my fork that seems to work as needed, should I try submitting it as a PR?

@mnoah1
Copy link
Author

mnoah1 commented Jan 12, 2023

@hyangah - Just wanted to bump this.

@greenstatic
Copy link

I would like this feature as well. In monorepos you sometimes/often have different projects in different languages. In my case I have C++, Node.js and Python projects in my monorepo. Currently the go test explorer is completely useless for me and I have it disabled using "go.testExplorer.enable": false in my settings.json file because if it is enabled, it takes 5+ minutes to load the test explorer UI.

gopls directoryFilters are also not respected by GoTestResolver thus having a separate Go setting field where you could specify not to walk the directories of non-Go projects (e.g. C++, Node.js, Python, etc.) makes sense since they will generally never contain Go code.

I can help work on a setting to enable this, but would like to hear from you guys if this is something that would be accepted in a PR.

@mnoah1
Copy link
Author

mnoah1 commented Mar 21, 2023

I have a working version of this in a fork, I can add what I have so far into a PR as well. It adds a setting, indexEntireWorkspace - if set to false, the test explorer won't walk all packages and will only look for tests in the same package as a currently open _test.go file.

@greenstatic
Copy link

greenstatic commented Mar 21, 2023

@mnoah1 does that mean that you need to open up a _test.go file for the GoTestResolver to detect it? If so, that isn't convenient for running all tests in multiple packages. Or am I missing something?

I think a better approach would be to simply have a list of excluded directories that thewalk function would exclude from the directories it searches.

If you would have a project layout like so:

/my_cpp_project/
/my_nodejs_project/
/my_python_project/
/my_go_project1/
/my_go_project2/
/my_misc_project/python_tests/
/my_misc_project/go_utility/

The settings.json configuration would look something like :

"go.testExplorer.walk.exclude": [
        "my_cpp_project",
        "my_nodejs_project",
        "my_python_project",
    ]

And walk would be allowed to search just the directories my_go_project1 and my_go_project2 and my_misc_project.

@mnoah1
Copy link
Author

mnoah1 commented Mar 21, 2023

I see, I think that's a slightly different use case, more like this issue? #1739

The intent with this one is for very large repos, where the current walk logic works fine, but indexes too many packages that aren't relevant at that point in time for the user. So this could limit it to the user's current open packages.

@mnoah1
Copy link
Author

mnoah1 commented Mar 21, 2023

Added a PR for the limiting indexing to current package. I think it could coexist with an additional setting for directories to exclude entirely from the walk as @greenstatic is suggesting.

@suzmue @hyangah do you have any feedback on next steps?

@gopherbot
Copy link
Collaborator

Change https://go.dev/cl/478175 mentions this issue: Add setting to limit indexing of the entire workspace

@firelizzard18
Copy link
Contributor

I can help work on a setting to enable this, but would like to hear from you guys if this is something that would be accepted in a PR.

@greenstatic My impression is that the Go team do not have the bandwidth to work on this themselves and would likely accept a PR. As the original developer of the test explorer adapter, I am happy for others to contribute.

because if it is enabled, it takes 5+ minutes to load the test explorer UI.

That is odd. The resolver implementation does not recursively scan folders so it should not bog down the UI unless the host (VSCode) is doing something I don't expect. GoTestResolver.resolve's logic boils down to:

  • Am I resolving the root?
    • Find Go modules and VSCode workspaces
  • Am I resolving a module/workspace?
    • Find packages
  • Am I resolving a package?
    • Find *_test.go files
  • Am I resolving a file?
    • Find symbols

Maybe it's the find packages step that's taking so long for you. Scanning for files and symbols should only happen lazily when you expand those items in the explorer. It's possible enabling package nesting will improve performance, since that might make the package discovery lazy instead of it recursing down all the folder structures.

Another possibility would be adding a setting to disable support for non-module workspaces. That should automatically exclude any folder that does not contain a go.mod file.

gopls directoryFilters are also not respected by GoTestResolver

The test explorer driver is implemented 100% in typescript (except for symbol resolution); gopls settings aren't respected because it doesn't use gopls (except for symbols).

I have a working version of this in a fork, I can add what I have so far into a PR as well. It adds a setting, indexEntireWorkspace - if set to false, the test explorer won't walk all packages and will only look for tests in the same package as a currently open _test.go file.

@mnoah1 IMO a more useful setting would be go.testExplorer.ignoreNonModules so it would skip any workspace folder that did not contain a go.mod file. That way you would still see the package lists but it would not scan non-go projects.

@firelizzard18
Copy link
Contributor

@mnoah1 As I said above, the test explorer should not be trying to index the entire workspace. Before implementing your proposal, I'd rather see if my proposal (ignore non-modules) would work for you and/or determine why performance is so bad in your case. Because it's definitely supposed to be lazy.

FWIW the setting that nests packages is "go.testExplorer.packageDisplayMode": "nested"

@greenstatic I responded in detail on #1739 to your go.testExplorer.walk.exclude suggestion.

@mnoah1
Copy link
Author

mnoah1 commented Mar 22, 2023

@firelizzard18 Is it expected to be lazy in loading the list of packages, or just the tests within those packages? It looks like the current behavior is as follows, and while it's lazy to load the test cases, it does attempt to find all packages right away:

  1. This line gets called with the root of the workspace, and begins iterating through all paths to discover packages.
  2. This line runs for every file in the repo, and packages get discovered. They'll begin to appear slowly in the list.
    • Adding the go.mod limitation might provide some small benefit here, as there are cases where it goes down a part of the repo that contains no Go files.
  3. User begins to see packages gradually appear on the list, but they're not relevant to the files that they're currently working on. To the user, it appears to index indefinitely and the spinner at the root is constantly running.
    • Our repo has thousands of packages, so the poor performance is mainly due to scale.
    • Even if it were fast, the list of packages is still very long (or very deeply nested if that's turned on) and full of irrelevant packages.
  4. If a user does open a test file, or expand a package, those tests get loaded right away (e.g. the steps above are non-blocking).

So the proposed setting on #2709 could turn off steps 1-3 and keep step 4.

  • Advantages:
    • Intuitive because the contents of the test explorer match the user's current work or things they've recently worked on. Good feedback so far from users working with this in our internal fork.
    • Doesn't require the user to maintain/update a configuration with specific paths/packages (in larger repos users need to be able to jump around between packages without worrying about whether it's included in their config)
    • Straightforward implementation
  • Disadvantages:
    • User has to open something in a given package for it to load
    • Can't freely browse every package in the repo

@firelizzard18
Copy link
Contributor

@mnoah1 It's been a while so I forgot the details. Originally it would lazily list folders and only scan for files once you expanded those folders. However that lead to it including folders that were not packages, hence the check for a *_test.go file. I think the following would improve your case:

  • If packageDisplayMode is nested, use a simple readdir operation instead of a walk. This may require refactoring.
  • Add a setting to disable the check for *_test.go files and instead blindly assume every folder is a package. This should make test item population properly lazy.

Looking at the code I agree that the go.mod setting wouldn't make a huge difference, since it still scans. Part of the reason I'm not enthusiastic about #2709 is that it modifies the file-did-open behavior as well as effectively disabling the resolver. If all you were doing was disabling the resolver, I'd suggest renaming the setting and then give it my 👍. However the fact that the file-did-open handler would also be triggering the resolver erodes separation of concerns and would make the code harder to follow and maintain. Perhaps that could be addressed via adding the resolve logic in an additional event handler registered by the resolver itself, but I'd rather think more carefully about the overall problem. Disabling the resolver feels like a band-aid and not a true solution.

@mnoah1
Copy link
Author

mnoah1 commented Mar 27, 2023

I think we could address the separation of concerns issue by moving the conditional logic into the resolver, creating a GoTestResolver.processCreatedFile method which conditionally calls getFile if the file is to be included. I updated #2709 with a rough idea of what this could look like.

I also looked into both of the suggestions below, but I think they have some drawbacks specific to very large monorepos:

  • If packageDisplayMode is nested, use a simple readdir operation instead of a walk. This may require refactoring.
    • Once the nesting becomes too deep (e.g. 5+ levels), we've gotten user feedback that nested mode becomes a bit harder to follow. So I don't think we should force users to use this mode just to get the performance needed.
  • Add a setting to disable the check for *_test.go files and instead blindly assume every folder is a package. This should make test item population properly lazy.
    • Even if this is quicker, user ends up with a long list of thousands of packages across the company that aren't relevant to their current work.

So I think, to make this experience good for users in large organizations that use this extension, there would ideally be some kind of inference between the user's current work and the tests that appear in the explorer.

@mnoah1
Copy link
Author

mnoah1 commented Apr 3, 2023

@firelizzard18 @suzmue @hyangah - Just wanted to see if there are any thoughts on the above?

@firelizzard18
Copy link
Contributor

The solution you implement #2709 is not one I find appealing. It essentially boils down to "index a package if and when I open a file in that package." I believe that it works but it feels inelegant and more of a stop-gap than a true solution. Ultimately I'm not in a position to dis/approve your MR so my opinion only matters in so much as it influences the maintainers.

I'm not very satisfied with the test discovery in general. I've mentioned before that I'd like to move test discovery into gopls, though I can't find those comments now. I think a lot of issues with the test explorer could be resolved by doing that. gopls is already indexing the workspace for Go files so I expect it would be relatively simple and quick to produce a list of packages with test files.

@firelizzard18
Copy link
Contributor

I opened golang/go#59445 to discuss moving test discovery into gopls.

@firelizzard18
Copy link
Contributor

@mnoah1 I am working on rebuilding the test explorer. For context, please familiarize yourself with gopls' Modules and Packages commands. TL;DR:

  • Given a path, Modules lists all the already loaded modules contained within that path, optionally recursively. This will not cause modules to be loaded.
  • Given a list of paths, for each path Packages lists the package that file belongs to (if it is a file) or the package that corresponds to the directory (if it is a directory), optionally recursively.

A couple notes: the implementations are not actually recursive. It's meaningful to talk about these arguments as recursive, but the implementation is already scanning over the entire set of modules/packages loaded by gopls so there's no recursion happening. For this reason, enabling recursion for Modules should not have a meaningful performance impact. Enabling recursion for Packages will have an impact but only in so much as it scans packages for tests and scanning more packages takes more time.

I'm rebuilding the test explorer around these two functions. The basic implementation is to scan the workspace for modules (recursively) and list those, then when the user expands a module it will scan that module for packages (recursively). At that point, gopls has already done the work, so it may be reasonable to populate the entire test item tree for that module's packages, though I currently only populate items when they're expanded.

Given this new approach, how do you want the test explorer to work in your scenario? Here's what I have so far.

@mnoah1
Copy link
Author

mnoah1 commented Jul 19, 2024

@firelizzard18 thanks for reaching out, I'll take a look.

@mnoah1
Copy link
Author

mnoah1 commented Aug 16, 2024

Sorry for the delay, I've taken a look at this and have some ideas.

Since I see you already have a spot to set the discovery mode, a couple of options that I think could make sense:

  • "Recursive packages of current files" option - Use user's current open files as a starting point to request packages, recursively. This would at least be a reasonable guess on which part of the repo a user wants to see, and if a package is missing then it's clear that they can just open the file to have it appear.
  • "Packages of current files", which would find packages based on a user's current work (e.g. approach earlier in the thread). Though it's narrow, it's always applicable to the user's work and avoids having the test explorer make the wrong guess about what to load. Lots of users mainly care about getting the run arrows anyway and will just go to test explorer panel for results. (could be a good starting point as the simplest option, then an intermediate option could be introduced)
  • "Dependencies of modified packages" - This is probably infeasible for now, but just something to think about, it would be nice to be able to modify a package and have the test explorer show all packages that depend on the modified one - those are really the tests that actually need to run/pass following a change.

it may be reasonable to populate the entire test item tree for that module's packages, though I currently only populate items when they're expanded.

I think this depends heavily on the repo/project structure, in a monorepo setup where there is only one go mod file, this could still be tens of thousands of packages, so may need to be up to the user.

Enabling recursion for Packages will have an impact but only in so much as it scans packages for tests and scanning more packages takes more time.

This brings up another point, in how gopls will handle the command to get packages (especially its integration with GOPACKAGESDRIVER). Will it only return packages that are already known to gopls, or is it similar to the existing flow used for code intelligence where it would trigger a GOPACKAGESDRIVER request?

As an example, if we have this scenario:

package/abc -> sample_abc_test.go -> depends on packages foo, bar
package/def -> sample_def_test.go -> depends on packages aaa, bbb

Currently if the user opens sample_abc_test.go, gopls will request package info for sample_abc_test.go and become aware of packages abc, foo, bar. But it won't know about packages def, aaa, bbb unless a file in package def (or something else with those packages in their dependency graph) is opened. So there may need to be some kind of reload mechanism here if gopls has loaded more packages since the initial population of test explorer, maybe triggered on browsing a _test.go file that does not yet have a test item.

@firelizzard18
Copy link
Contributor

The statements below reflect how I have been told to implement these commands. They may not be precisely true but they reflect how gopls's internal APIs behave and are expected to be used and thus are true for all intents and purposes relevant to this issue in so far as I understand gopls.

Will it only return packages that are already known to gopls?

The current (not merged) implementation of the gopls.packages and gopls.modules commands will not load anything. They simply filter the already existing/loaded views as per the command arguments and scan for relevant packages/modules. There is no plan to change this behavior. If a user opens a file that doesn't have a view, that will trigger loading of the view and the subsequent gopls.packages command will see that view.

Currently if the user opens sample_abc_test.go, gopls will request package info for sample_abc_test.go and become aware of packages abc, foo, bar.

gopls is aware of the metadata of imported packages, but I don't think this is relevant. If they do not belong to the same module as abc, I'm almost certain the module they do belong to will not be loaded, thus gopls.packages and gopls.modules will be unaware those packages exist.

But it won't know about packages def, aaa, bbb unless a file in package def (or something else with those packages in their dependency graph) is opened.

Your statement is not necessarily correct with respect to this issue (to the two gopls commands I mentioned). From the perspective of those commands, any packages that are in the same module as def (including but not limited to abc, aaa, and bbb) will be loaded and any packages that are in a different module will not be loaded.

@mnoah1
Copy link
Author

mnoah1 commented Aug 16, 2024

From the perspective of those commands, any packages that are in the same module as def and already returned in a past driverResponse (including abc, foo, bar but not def, aaa, bbb until something in def is opened) will be loaded and any packages that are in a different module will not be loaded.

I added a qualification in bold, which is how I would understand the situation to work when using packagesdriver (and the behavior that we see with other gopls features like type checking and autocomplete). Is that not correct? The packages driver responds with the relevant packages when requested based on go file name, but until a package has been part of at least one response, gopls won't have it stored or be able to return it, regardless of which module it's a part of.

@firelizzard18
Copy link
Contributor

The current iteration of those commands boils down to:

func (s *Server) Modules(...) (...) {
	for _, view := range s.session.Views() {
		snapshot := view.Snapshot()
		defer snapshot.Release()

		for _, mod := range view.Modules() {
			if want(mod) {
				add(mod)
			}
		}
	}
}

func (s *Server) Packages(...) (...) {
	for _, view := range s.session.Views() {
		snapshot := view.Snapshot()
		defer snapshot.Release()

		for _, pkg := range view.WorkspacePackages() {
			if want(pkg) {
				add(pkg, snapshot.Tests(pkg))
			}
		}
	}
}

So these commands scan whatever views and packages have already been loaded. Each view is tied to a go.mod or go.work (or equivalent), though there can be multiple views for a go.mod/go.work for different build tag sets. I don't have a clue how it works with GOPACKAGEDRIVER/a non-go-driver. But either way, the command iterates over loaded views and the packages gopls thinks belong to that view. Slight note, I wrote WorkspacePackages for clarity, it's actually WorkspaceMetadata. There is an AllMetadata method that includes dependencies, but AFAIK those are not considered part of the view. snapshot.Tests is what does the actual analysis though the results are cached. I believe snapshot.Tests will fail if called for a package that is not a workspace package.

I'll poke around and see what I can see next time I have gopls setup for debugging. But, and this is purely conjecture, I think it's something like packages loaded from source vs packages loaded from the build cache. As in, all you need for most language features (e.g. autocompletion) is type metadata included in the compiled package. But when you want to open a file and jump to a definition, you need to know source information about the declaration you're jumping to.

@firelizzard18
Copy link
Contributor

Separate the concept of a "syntax package" (something containing AST and types.Info) from an "export package" (a types.Package with type information for exported symbols). Syntax packages are used to fully understand the syntax a user is working on. Export packages are used for type checking.

From golang/go#57987, which is about refactoring gopls (so it may not reflect the final state of that effort). Based on that snippet, I think packages within a view are loaded as "syntax packages" and external packages they depend on are loaded as "export packages". At least in the context of the default packages driver, I believe all packages within the module/view are loaded as syntax packages, whereas any dependencies are loaded as export packages. And test discovery does not scan export packages.

@firelizzard18
Copy link
Contributor

This should be addressed by #3523

@firelizzard18 firelizzard18 self-assigned this Oct 3, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
FeatureRequest go-test issues related to go test support (test output, test explorer, ...) WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided.
Projects
Status: Done
Development

Successfully merging a pull request may close this issue.

5 participants