Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement target distance metrics #230

Open
wants to merge 18 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 6 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
45 changes: 45 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -61,6 +61,44 @@ Open `bazel-diff-example.sh` to see how this is implemented. This is purely an e

* We run `bazel-diff` on the starting and final JSON hash filepaths to get our impacted set of targets. This impacted set of targets is written to a file.

## Build Graph Distance Metrics

`bazel-diff` can optionally compute build graph distance metrics between two revisions. This is
useful for understanding the impact of a change on the build graph. Directly impacted targets are
targets that have had their rule attributes or source file dependencies changed. Indirectly impacted
targets are that are impacted only due to a change in one of their target dependencies.

For each target, the following metrics are computed:

* `target_distance`: The number of dependency hops that it takes to get from an impacted target to a directly impacted target.
* `package_distance`: The number of dependency hops that cross a package boundary to get from an impacted target to a directly impacted target.

Build graph distance metrics can be used by downstream tools to power features such as:

* Only running sanitizers on impacted tests that are in the same package as a directly impacted target.
* Only running large-sized tests that are within a few package hops of a directly impacted target.
* Only running computationally expensive jobs when an impacted target is within a certain distance of a directly impacted target.

To enable this feature, you must generate a dependency mapping on your final revision when computing hashes, then pass it into the `get-impacted-targets` command.

```bash
git checkout BASE_REV
bazel-diff generate-hashes [...]

git checkout FINAL_REV
bazel-diff generate-hashes --depsFile deps.json [...]

bazel-diff get-impacted-targets --depsFile deps.json [...]
```

This will produce an impacted targets list with target label, target distance, and package distance delimited by `~` characters:

```text
//foo:bar~0~0
//foo:baz~1~0
//bar:qux~1~1
alex-torok marked this conversation as resolved.
Show resolved Hide resolved
```

## CLI Interface

`bazel-diff` Command
Expand Down Expand Up @@ -352,6 +390,13 @@ Now you can simply run `bazel-diff` from your project:
bazel run @bazel_diff//:bazel-diff -- bazel-diff -h
```

## Learn More

Take a look at the following bazelcon talks to learn more about `bazel-diff`:

* [BazelCon 2023: Improving CI efficiency with Bazel querying and bazel-diff](https://www.youtube.com/watch?v=QYAbmE_1fSo)
* BazelCon 2024: Not Going the Distance: Filtering Tests by Build Graph Distance: Coming Soon
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️


## Running the tests

To run the tests simply run
Expand Down
11 changes: 11 additions & 0 deletions cli/BUILD
Original file line number Diff line number Diff line change
Expand Up @@ -63,6 +63,17 @@ kt_jvm_test(
runtime_deps = [":cli-test-lib"],
)

kt_jvm_test(
name = "TargetHashTest",
jvm_flags = select({
":macos": ["-Djava.security.manager=allow"],
"//conditions:default": [],
}),
alex-torok marked this conversation as resolved.
Show resolved Hide resolved
test_class = "com.bazel_diff.hash.TargetHashTest",
runtime_deps = [":cli-test-lib"],
)


kt_jvm_test(
name = "SourceFileHasherTest",
data = [
Expand Down
11 changes: 10 additions & 1 deletion cli/src/main/kotlin/com/bazel_diff/cli/GenerateHashesCommand.kt
Original file line number Diff line number Diff line change
Expand Up @@ -136,6 +136,14 @@ class GenerateHashesCommand : Callable<Int> {
)
var ignoredRuleHashingAttributes: Set<String> = emptySet()

@CommandLine.Option(
names = ["-d", "--depsFile"],
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should have a longer flag name here that is more descriptive?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated it to --depEdgesFile. Is that sufficient / do you have any other suggestions?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds good

description = ["Path to the file where dependency edges are written to. If not specified, the dependency edges will not be written to a file."],
scope = CommandLine.ScopeType.INHERIT,
defaultValue = CommandLine.Parameters.NULL_VALUE
)
var depsMappingJSONPath: File? = null

@CommandLine.Option(
names = ["-m", "--modified-filepaths"],
description = ["Experimental: A text file containing a newline separated list of filepaths (relative to the workspace) these filepaths should represent the modified files between the specified revisions and will be used to scope what files are hashed during hash generation."]
Expand All @@ -159,14 +167,15 @@ class GenerateHashesCommand : Callable<Int> {
cqueryCommandOptions,
useCquery,
keepGoing,
depsMappingJSONPath != null,
fineGrainedHashExternalRepos,
),
loggingModule(parent.verbose),
serialisationModule(),
)
}

return when (GenerateHashesInteractor().execute(seedFilepaths, outputPath, ignoredRuleHashingAttributes, targetType, includeTargetType, modifiedFilepaths)) {
return when (GenerateHashesInteractor().execute(seedFilepaths, outputPath, depsMappingJSONPath, ignoredRuleHashingAttributes, targetType, includeTargetType, modifiedFilepaths)) {
true -> CommandLine.ExitCode.OK
false -> CommandLine.ExitCode.SOFTWARE
}.also { stopKoin() }
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -38,6 +38,14 @@ class GetImpactedTargetsCommand : Callable<Int> {
)
lateinit var finalHashesJSONPath: File

@CommandLine.Option(
names = ["-d", "--depsFile"],
description = ["Path to the file where dependency edges are. If specified, build graph distance metrics will be computed from the given hash data."],
scope = CommandLine.ScopeType.INHERIT,
defaultValue = CommandLine.Parameters.NULL_VALUE
)
var depsMappingJSONPath: File? = null

@CommandLine.Option(
names = ["-tt", "--targetType"],
split = ",",
Expand Down Expand Up @@ -66,19 +74,25 @@ class GetImpactedTargetsCommand : Callable<Int> {

validate()
val deserialiser = DeserialiseHashesInteractor()
val from = deserialiser.execute(startingHashesJSONPath, targetType)
val to = deserialiser.execute(finalHashesJSONPath, targetType)
val from = deserialiser.executeTargetHash(startingHashesJSONPath, targetType)
val to = deserialiser.executeTargetHash(finalHashesJSONPath, targetType)

val impactedTargets = CalculateImpactedTargetsInteractor().execute(from, to)
val impactedTargetsStream = if (depsMappingJSONPath != null) {
val depsMapping = deserialiser.deserializeDeps(depsMappingJSONPath!!)
CalculateImpactedTargetsInteractor().executeWithDistances(from, to, depsMapping).map { (label, metrics) ->
"${label}~${metrics.targetDistance}~${metrics.packageDistance}"
}.stream()
alex-torok marked this conversation as resolved.
Show resolved Hide resolved
} else {
CalculateImpactedTargetsInteractor().execute(from, to).stream()
}

return try {
BufferedWriter(when (val path=outputPath) {
BufferedWriter(when (val path = outputPath) {
null -> FileWriter(FileDescriptor.out)
else -> FileWriter(path)
}).use { writer ->
impactedTargets.forEach {
writer.write(it)
//Should not depend on OS
impactedTargetsStream.forEach { line ->
writer.write(line)
writer.write("\n")
}
}
Expand Down
5 changes: 3 additions & 2 deletions cli/src/main/kotlin/com/bazel_diff/di/Modules.kt
Original file line number Diff line number Diff line change
Expand Up @@ -28,6 +28,7 @@ fun hasherModule(
cqueryOptions: List<String>,
useCquery: Boolean,
keepGoing: Boolean,
trackDeps: Boolean,
fineGrainedHashExternalRepos: Set<String>,
): Module = module {
val cmd: MutableList<String> = ArrayList<String>().apply {
Expand Down Expand Up @@ -61,8 +62,8 @@ fun hasherModule(
single { BazelClient(useCquery, fineGrainedHashExternalRepos) }
single { BuildGraphHasher(get()) }
single { TargetHasher() }
single { RuleHasher(useCquery, fineGrainedHashExternalRepos) }
single { SourceFileHasher(fineGrainedHashExternalRepos) }
single { RuleHasher(useCquery, trackDeps, fineGrainedHashExternalRepos) }
single<SourceFileHasher> { SourceFileHasherImpl(fineGrainedHashExternalRepos) }
single { ExternalRepoResolver(workingDirectory, bazelPath, outputPath) }
single(named("working-directory")) { workingDirectory }
single(named("output-base")) { outputPath }
Expand Down
9 changes: 7 additions & 2 deletions cli/src/main/kotlin/com/bazel_diff/hash/BuildGraphHasher.kt
Original file line number Diff line number Diff line change
Expand Up @@ -99,7 +99,7 @@ class BuildGraphHasher(private val bazelClient: BazelClient) : KoinComponent {
ignoredAttrs: Set<String>,
modifiedFilepaths: Set<Path>
): Map<String, TargetHash> {
val ruleHashes: ConcurrentMap<String, ByteArray> = ConcurrentHashMap()
val ruleHashes: ConcurrentMap<String, TargetDigest> = ConcurrentHashMap()
val targetToRule: MutableMap<String, BazelRule> = HashMap()
traverseGraph(allTargets, targetToRule)

Expand All @@ -114,7 +114,12 @@ class BuildGraphHasher(private val bazelClient: BazelClient) : KoinComponent {
ignoredAttrs,
modifiedFilepaths
)
Pair(target.name, TargetHash(target.javaClass.name.substringAfterLast('$'), targetDigest.toHexString()))
Pair(target.name, TargetHash(
target.javaClass.name.substringAfterLast('$'),
targetDigest.overallDigest.toHexString(),
targetDigest.directDigest.toHexString(),
targetDigest.deps,
))
}
.filter { targetEntry: Pair<String, TargetHash>? -> targetEntry != null }
.collect(
Expand Down
20 changes: 10 additions & 10 deletions cli/src/main/kotlin/com/bazel_diff/hash/RuleHasher.kt
Original file line number Diff line number Diff line change
Expand Up @@ -9,7 +9,7 @@ import org.koin.core.component.inject
import java.util.concurrent.ConcurrentMap
import java.nio.file.Path

class RuleHasher(private val useCquery: Boolean, private val fineGrainedHashExternalRepos: Set<String>) : KoinComponent {
class RuleHasher(private val useCquery: Boolean, private val trackDepLabels: Boolean, private val fineGrainedHashExternalRepos: Set<String>) : KoinComponent {
private val logger: Logger by inject()
private val sourceFileHasher: SourceFileHasher by inject()

Expand All @@ -28,31 +28,31 @@ class RuleHasher(private val useCquery: Boolean, private val fineGrainedHashExte
fun digest(
rule: BazelRule,
allRulesMap: Map<String, BazelRule>,
ruleHashes: ConcurrentMap<String, ByteArray>,
ruleHashes: ConcurrentMap<String, TargetDigest>,
sourceDigests: ConcurrentMap<String, ByteArray>,
seedHash: ByteArray?,
depPath: LinkedHashSet<String>?,
ignoredAttrs: Set<String>,
modifiedFilepaths: Set<Path>
): ByteArray {
): TargetDigest {
val depPathClone = if (depPath != null) LinkedHashSet(depPath) else LinkedHashSet()
if (depPathClone.contains(rule.name)) {
throw raiseCircularDependency(depPathClone, rule.name)
}
depPathClone.add(rule.name)
ruleHashes[rule.name]?.let { return it }

val finalHashValue = sha256 {
safePutBytes(rule.digest(ignoredAttrs))
safePutBytes(seedHash)
val finalHashValue = targetSha256(trackDepLabels) {
putDirectBytes(rule.digest(ignoredAttrs))
putDirectBytes(seedHash)

for (ruleInput in rule.ruleInputList(useCquery, fineGrainedHashExternalRepos)) {
safePutBytes(ruleInput.toByteArray())
putDirectBytes(ruleInput.toByteArray())

val inputRule = allRulesMap[ruleInput]
when {
inputRule == null && sourceDigests.containsKey(ruleInput) -> {
safePutBytes(sourceDigests[ruleInput])
putDirectBytes(sourceDigests[ruleInput])
}

inputRule?.name != null && inputRule.name != rule.name -> {
Expand All @@ -66,7 +66,7 @@ class RuleHasher(private val useCquery: Boolean, private val fineGrainedHashExte
ignoredAttrs,
modifiedFilepaths
)
safePutBytes(ruleInputHash)
putTransitiveBytes(ruleInput, ruleInputHash.overallDigest)
}

else -> {
Expand All @@ -75,7 +75,7 @@ class RuleHasher(private val useCquery: Boolean, private val fineGrainedHashExte
heuristicDigest != null -> {
logger.i { "Source file $ruleInput picked up as an input for rule ${rule.name}" }
sourceDigests[ruleInput] = heuristicDigest
safePutBytes(heuristicDigest)
putDirectBytes(heuristicDigest)
}

else -> logger.w { "Unable to calculate digest for input $ruleInput for rule ${rule.name}" }
Expand Down
13 changes: 9 additions & 4 deletions cli/src/main/kotlin/com/bazel_diff/hash/SourceFileHasher.kt
Original file line number Diff line number Diff line change
Expand Up @@ -9,7 +9,12 @@ import org.koin.core.qualifier.named
import java.nio.file.Path
import java.nio.file.Paths

class SourceFileHasher : KoinComponent {
interface SourceFileHasher {
fun digest(sourceFileTarget: BazelSourceFileTarget, modifiedFilepaths: Set<Path> = emptySet()): ByteArray
fun softDigest(sourceFileTarget: BazelSourceFileTarget, modifiedFilepaths: Set<Path> = emptySet()): ByteArray?
}

class SourceFileHasherImpl : KoinComponent, SourceFileHasher {
private val workingDirectory: Path
private val logger: Logger
private val relativeFilenameToContentHash: Map<String, String>?
Expand Down Expand Up @@ -38,9 +43,9 @@ class SourceFileHasher : KoinComponent {
this.externalRepoResolver = externalRepoResolver
}

fun digest(
override fun digest(
sourceFileTarget: BazelSourceFileTarget,
modifiedFilepaths: Set<Path> = emptySet()
modifiedFilepaths: Set<Path>
): ByteArray {
return sha256 {
val name = sourceFileTarget.name
Expand Down Expand Up @@ -89,7 +94,7 @@ class SourceFileHasher : KoinComponent {
}
}

fun softDigest(sourceFileTarget: BazelSourceFileTarget, modifiedFilepaths: Set<Path> = emptySet()): ByteArray? {
override fun softDigest(sourceFileTarget: BazelSourceFileTarget, modifiedFilepaths: Set<Path>): ByteArray? {
val name = sourceFileTarget.name
if (!name.startsWith("//")) return null

Expand Down
56 changes: 56 additions & 0 deletions cli/src/main/kotlin/com/bazel_diff/hash/TargetDigest.kt
Original file line number Diff line number Diff line change
@@ -0,0 +1,56 @@
package com.bazel_diff.hash
// import com.google.common.hash.Hasher
import com.google.common.hash.Hashing

data class TargetDigest(
val overallDigest: ByteArray,
val directDigest: ByteArray,
val deps: List<String>? = null,
) {
fun clone(newDeps: List<String>? = null): TargetDigest {
var newDeps = newDeps
if (newDeps == null) {
newDeps = deps
}
return TargetDigest(overallDigest.clone(), directDigest.clone(), newDeps)
}
}

fun targetSha256(trackDepLabels: Boolean, block: TargetDigestBuilder.() -> Unit): TargetDigest {
val hasher = TargetDigestBuilder(trackDepLabels)
hasher.apply(block)
return hasher.finish()
}

class TargetDigestBuilder(trackDepLabels: Boolean) {

private val overallHasher = Hashing.sha256().newHasher()
private val directHasher = Hashing.sha256().newHasher()
private val deps: MutableList<String>? = if (trackDepLabels) mutableListOf() else null

fun putDirectBytes(block: ByteArray?) {
block?.let { directHasher.putBytes(it) }
}

fun putBytes(block: ByteArray?) {
block?.let { overallHasher.putBytes(it) }
}

fun putTransitiveBytes(dep: String, block: ByteArray?) {
block?.let { overallHasher.putBytes(it) }
if (deps != null) {
deps.add(dep)
}
}

fun finish(): TargetDigest {
val directHash = directHasher.hash().asBytes().clone()
overallHasher.putBytes(directHash)

return TargetDigest(
overallHasher.hash().asBytes().clone(),
directHash,
deps,
)
}
}
Loading