Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(Analyzer): Add a flag to skip excluded projects / scopes in dependency graph #6405

Conversation

oheger-bosch
Copy link
Member

This flag indicates that dependencies in excluded scopes should not be added to the dependency graph. For large projects, this could reduce the size of the dependency graph and the analysis time significantly.

@sschuberth
Copy link
Member

I'd like to just point out #5968. It would be great if this work could fully address that issue.

@oheger-bosch
Copy link
Member Author

@fviernau, @sschuberth: This is currently a bit experimental. What I am after is indeed not a full solution for #5968, but only a first step:

We have currently problems with an Android project that frequently crashes our ORT pipeline due to resource exhaustion. In the cases the build runs through, the Analyzer step takes more than 2 hours. The time seems to be consumed mainly when constructing the dependency graph.

I did some tests in which I filtered out test scopes when adding the dependencies to the graph builder - this made the build stable and reduced the time of the Analyzer step to 8 minutes! So, I think, this intermediate step to skip excluded scopes when building the dependency graph is already beneficial - and the implementation should not be too complex.

WDYT?

@sschuberth
Copy link
Member

I did some tests in which I filtered out test scopes when adding the dependencies to the graph builder - this made the build stable and reduced the time of the Analyzer step to 8 minutes!

Feels a bit like cheating 😉 More seriously, I'd really like to understand what exactly is taking so long to construct the dependency graph. Also, I wouldn't expect test scopes to have much more dependencies than "regular" scopes, so why does excluding them make such a big difference?

@oheger-bosch oheger-bosch force-pushed the oheger-bosch/analyzer/dependency_graph_exclusions branch from 1c9ff9a to c023700 Compare January 30, 2023 08:25
@oheger-bosch
Copy link
Member Author

Feels a bit like cheating wink More seriously, I'd really like to understand what exactly is taking so long to construct the dependency graph. Also, I wouldn't expect test scopes to have much more dependencies than "regular" scopes, so why does excluding them make such a big difference?

Fair points. What we have noticed is that, starting from a certain size, adding further dependencies to the graph becomes rather slow. This is probably due to the fact that the subtrees that need to be compared become larger and larger. For Android projects with their numerous synthetic scopes this seems to be a real issue; so reducing the overall size of the graph by excluding unneeded scopes can actually have a significant effect.

Regarding "cheating": I think, handling scope excludes when constructing the dependency graph needs to be done anyway when implementing full support for exclusions as requested by #5968. So, these changes are not a side-track, but already a partial solution to this goal. I have pushed some more commits, so hopefully it becomes a bit clearer now what I try to achieve.

@oheger-bosch oheger-bosch force-pushed the oheger-bosch/analyzer/dependency_graph_exclusions branch 5 times, most recently from a5279a2 to 1d4db04 Compare February 1, 2023 13:57
@oheger-bosch oheger-bosch force-pushed the oheger-bosch/analyzer/dependency_graph_exclusions branch 3 times, most recently from c4eeb22 to bb2c739 Compare February 2, 2023 16:00
@oheger-bosch
Copy link
Member Author

I have extended the PR to cover the functionality requested by #5968. Still testing whether there is something missing.

@oheger-bosch oheger-bosch force-pushed the oheger-bosch/analyzer/dependency_graph_exclusions branch 2 times, most recently from af9c053 to c675c27 Compare February 6, 2023 08:09
@oheger-bosch oheger-bosch marked this pull request as ready for review February 6, 2023 09:05
@oheger-bosch oheger-bosch requested a review from a team as a code owner February 6, 2023 09:05
@oheger-bosch oheger-bosch force-pushed the oheger-bosch/analyzer/dependency_graph_exclusions branch from c675c27 to 8ad39d4 Compare February 6, 2023 10:05
model/src/main/kotlin/config/AnalyzerConfiguration.kt Outdated Show resolved Hide resolved
analyzer/src/main/kotlin/PackageManager.kt Show resolved Hide resolved
analyzer/src/main/kotlin/PackageManager.kt Outdated Show resolved Hide resolved
model/src/main/kotlin/utils/DependencyGraphConverter.kt Outdated Show resolved Hide resolved
analyzer/src/main/kotlin/Analyzer.kt Outdated Show resolved Hide resolved
analyzer/src/main/kotlin/AnalyzerResultBuilder.kt Outdated Show resolved Hide resolved
analyzer/src/main/kotlin/PackageManager.kt Show resolved Hide resolved
analyzer/src/main/kotlin/PackageManager.kt Outdated Show resolved Hide resolved
@oheger-bosch oheger-bosch force-pushed the oheger-bosch/analyzer/dependency_graph_exclusions branch from 8ad39d4 to 5c1520e Compare February 6, 2023 12:31
*/
fun findManagedFiles(
directory: File,
packageManagers: Collection<PackageManagerFactory> = ALL.values
packageManagers: Collection<PackageManagerFactory> = ALL.values,
excludes: Excludes = EMPTY_EXCLUDES
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PackageManager has an instance of RepositoryConfiguration containing all excludes.
So, in case --skip-excluded is passed, the excludes are passed redundantly. Would it make sense to pass a Boolean / flag instead?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

findManagedFiles() is a function of the companion object. So the RepositoryConfiguration is not available here.

analyzer/src/main/kotlin/PackageManager.kt Outdated Show resolved Hide resolved
analyzer/src/main/kotlin/PackageManager.kt Show resolved Hide resolved
@oheger-bosch oheger-bosch force-pushed the oheger-bosch/analyzer/dependency_graph_exclusions branch 2 times, most recently from cb0f0c7 to a49e6d8 Compare February 6, 2023 14:10
@oheger-bosch oheger-bosch requested a review from fviernau February 6, 2023 14:15
@oheger-bosch
Copy link
Member Author

It is not clear to me why the commit-lint check fails. Could anybody help me understand what is wrong?

@sschuberth
Copy link
Member

It is not clear to me why the commit-lint check fails. Could anybody help me understand what is wrong?

For some reason it seems to believe graph format natively (see #3825). is a foother line. No idea why. Maybe try omitting the parentheses and using a comma instead, like , see #3825.

@oheger-bosch
Copy link
Member Author

It is not clear to me why the commit-lint check fails. Could anybody help me understand what is wrong?

For some reason it seems to believe graph format natively (see #3825). is a foother line. No idea why. Maybe try omitting the parentheses and using a comma instead, like , see #3825.

The comma was not sufficient, obviously it has problems with the '#' character. I now used the reference style.

@oheger-bosch oheger-bosch force-pushed the oheger-bosch/analyzer/dependency_graph_exclusions branch from f12d66f to 89b28b7 Compare February 6, 2023 15:12
@fviernau fviernau dismissed their stale review February 6, 2023 16:01

addressed.

@fviernau fviernau changed the title feat(Analyzer): Add a flag to skip excluded scopes in dependency graph feat(Analyzer): Add a flag to skip excluded projects / scopes in dependency graph Feb 6, 2023
@sschuberth sschuberth requested a review from a team February 6, 2023 17:02
This flag determines whether the excludes defined in the repository
configuration should lead to paths being skipped completely during
analysis. This means, if set to true, only definition files in
non-excluded paths are processed, and dependencies in non-excluded
scopes are added to the dependency graph.

Signed-off-by: Oliver Heger <[email protected]>
This is analogous to other model classes. The constant can be used
where an `Excludes` object is required, but no elements need to be
excluded.

Signed-off-by: Oliver Heger <[email protected]>
This property contains the excludes to be applied during analysis,
which depend on both the analyzer configuration and the configuration
of the current repository. Having this property available centrally
simplifies the implementation of concrete PackageManagers.

Signed-off-by: Oliver Heger <[email protected]>
Before adding a dependency to the dependency graph builder, make sure
that the scope is not excluded if the 'skipExcluded' flag is set.

Signed-off-by: Oliver Heger <[email protected]>
Before adding a dependency to the dependency graph builder, make sure
that the scope is not excluded if the 'skipExcluded' flag is set.

Signed-off-by: Oliver Heger <[email protected]>
Before adding a dependency to the dependency graph builder, make sure
that the scope is not excluded if the 'skipExcluded' flag is set.

Signed-off-by: Oliver Heger <[email protected]>
Before adding a dependency to the dependency graph builder, make sure
that the scope is not excluded if the 'skipExcluded' flag is set.

Signed-off-by: Oliver Heger <[email protected]>
The convert() function now accepts an Excludes object. The result of
the conversion contains only dependencies not defined by an excluded
scope.

This functionality is going to be used to implement scope exclusions
centrally for package managers that do not support the dependency
graph format natively, see [1].

[1] oss-review-toolkit#3825

Signed-off-by: Oliver Heger <[email protected]>
Pass the configured `Excludes` to the dependency graph converter when
constructing the analyzer result. That way, scope excludes are
effective for all package manager implementations that do not support
the dependency graph format natively.

Signed-off-by: Oliver Heger <[email protected]>
This allows skipping projects in specific paths already before
starting the analysis. That way problematic / large projects that are
irrelevant compliance-wise can be excluded leading to reduced resource
usage and analysis time.

Signed-off-by: Oliver Heger <[email protected]>
If the analyzer is configured to skip excludes, it passes the Excludes
defined in the repository configuration to
PackageManager.findManagedFiles(). That way the analyzer result will
contain only projects that are not matched by a path exclude.

Resolves oss-review-toolkit#5968.

Signed-off-by: Oliver Heger <[email protected]>
@oheger-bosch oheger-bosch force-pushed the oheger-bosch/analyzer/dependency_graph_exclusions branch from 89b28b7 to 414be04 Compare February 7, 2023 06:33
Copy link
Member

@sschuberth sschuberth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but I'd prefer to have at least one other approval from @oss-review-toolkit/kotlin-devs.

@sschuberth sschuberth requested a review from a team February 7, 2023 07:32
@@ -338,55 +338,55 @@ class Yarn2(
mapping += it.value
}
}
graphBuilder.addPackages(allPackages.values)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You've removed this, but you aren't changing anything below aside from applying the filter.
Was this never required, or am I missing something?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. This was never required, since the calls to addDependency() automatically add the packages to the graph builder. This would only be needed if packages were constructed in an alternative way, which is not the case here.

/**
* A flag to control whether excluded scopes and paths should be skipped during the analysis.
*/
val skipExcluded: Boolean = false
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please document this new option. I guess the best place to do this would be this file:
https://github.com/oss-review-toolkit/ort/blob/main/docs/config-file-ort-yml.md#excludes

In there, you'd also need to adjust this sentence, as this changes with this PR:

Note that the excluded parts are analyzed and scanned

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, this is on my agenda, but I want to get this merged first to be sure that there are no more changes in naming or otherwise. I will then open a follow-up PR.

@oheger-bosch oheger-bosch merged commit 6451027 into oss-review-toolkit:main Feb 7, 2023
@oheger-bosch oheger-bosch deleted the oheger-bosch/analyzer/dependency_graph_exclusions branch February 7, 2023 14:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants