
FR: Gazelle plugin supports multiple maven install rules #197

Open
kriscfoster opened this issue Aug 27, 2023 · 10 comments

@kriscfoster

We split our maven dependencies into multiple maven install rules to make adding and upgrading deps easier. It looks like the gazelle plugin here only supports a single maven install rule. How do people feel about making it possible to support multiple maven install rules?

@illicitonion
Collaborator

This seems pretty reasonable to support - I imagine we'd do this on a sub-tree level, so you could specify that src1/... uses one maven_install, and src2/... uses another. I don't think we'd want to support it more granularly than that (e.g. per target, or per maven package).
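A hypothetical sketch of what that sub-tree configuration could look like, using the java_maven_install_file and java_maven_repository_name directives discussed later in this thread (the paths and repository names here are made up for illustration):

```starlark
# src1/BUILD.bazel
# gazelle:java_maven_install_file src1/maven_install.json
# gazelle:java_maven_repository_name maven_src1

# src2/BUILD.bazel
# gazelle:java_maven_install_file src2/maven_install.json
# gazelle:java_maven_repository_name maven_src2
```

Everything under src1/... would then resolve Maven imports against @maven_src1, and everything under src2/... against @maven_src2.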

Currently, the maven resolver is a global property of a Configuration:

if jc.lang.mavenResolver == nil {
	resolver, err := maven.NewResolver(
		cfg.MavenInstallFile(),
		jc.lang.logger,
	)
	if err != nil {
		jc.lang.logger.Fatal().Err(err).Msg("error creating Maven resolver")
	}
	jc.lang.mavenResolver = resolver
}

To change this, I think we'd want to make it instead be a property of a Config:

type Config struct {
	parent                 *Config
	extensionEnabled       bool
	isModuleRoot           bool
	generateProto          bool
	mavenInstallFile       string
	moduleGranularity      string
	repoRoot               string
	testMode               string
	customTestFileSuffixes *[]string
	excludedArtifacts      map[string]struct{}
	annotationToAttribute  map[string]map[string]bzl.Expr
	mavenRepositoryName    string
}

We'd want each Config to inherit its parent's resolver when NewChild is called:

// NewChild creates a new child Config. It inherits desired values from the
// current Config and sets itself as the parent to the child.
func (c *Config) NewChild() *Config {

but if a java_maven_install_file and/or java_maven_repository_name directive is encountered (we may want to strictly require that both are set), we'd replace the inherited resolver with a new one here:

cfg.SetMavenInstallFile(d.Value)

and here:

case javaconfig.JavaMavenRepositoryName:
	cfg.SetMavenRepositoryName(d.Value)
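The inheritance behaviour described above can be sketched as a self-contained toy. Note this is a model, not the real plugin code: the resolver type here is a stand-in for the actual maven.Resolver, and the Config is trimmed to the fields relevant to this issue.

```go
package main

import "fmt"

// resolver is a hypothetical stand-in for the real Maven resolver, which is
// constructed from a maven_install.json lock file.
type resolver struct {
	installFile string
}

// Config is a trimmed model of javaconfig.Config, with the resolver moved
// onto the Config so each sub-tree can carry its own.
type Config struct {
	parent           *Config
	mavenInstallFile string
	mavenResolver    *resolver
}

// NewChild inherits the parent's resolver, so sub-trees without their own
// java_maven_install_file directive keep using the parent's maven_install.
func (c *Config) NewChild() *Config {
	return &Config{
		parent:           c,
		mavenInstallFile: c.mavenInstallFile,
		mavenResolver:    c.mavenResolver,
	}
}

// SetMavenInstallFile replaces the inherited resolver with a fresh one, as
// would happen when Gazelle encounters the directive in a BUILD file.
func (c *Config) SetMavenInstallFile(path string) {
	c.mavenInstallFile = path
	c.mavenResolver = &resolver{installFile: path}
}

func main() {
	root := &Config{}
	root.SetMavenInstallFile("maven_install.json")

	src1 := root.NewChild() // no directive: inherits the root resolver
	src2 := root.NewChild()
	src2.SetMavenInstallFile("src2/maven_install.json") // directive: fresh resolver

	fmt.Println(src1.mavenResolver == root.mavenResolver) // true
	fmt.Println(src2.mavenResolver == root.mavenResolver) // false
}
```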

I'd be happy to review a PR adding this - there's an example integration test in https://github.com/bazel-contrib/rules_jvm/tree/main/java/gazelle/testdata/maven that could be cribbed from to add a test for this functionality.

@kriscfoster
Author

kriscfoster commented Aug 28, 2023

Thank you for the comment & pointers @illicitonion,

It is getting me thinking a little bit more about what this would look like.

Our usage is actually a little more complex/dynamic than different packages using different maven install rules. We bring in ~1000 external jars, so we divide them logically into their own maven install rules (~5 to ~70 jars in each) to minimise merge conflicts in the pinned JSON files. We have some tooling on top of this to maintain a single version of transitive dependencies, but in the end it is just multiple maven_install rules. For example, we might have an @org_eclipse namespace for all org.eclipse jars and an @org_apache namespace for all org.apache jars. That means a single java_library target could bring in jars from ~10 different maven install rules.

@illicitonion
Collaborator

Aha - yeah, unfortunately that kind of support is probably not great to add - it'd be great to work with the rules_jvm_external maintainers to work out how to better support your use-cases - ideally rules_jvm_external would be ergonomic enough that you didn't feel the need to manually do that sharding... /cc @shs96c

@kriscfoster
Author

rules_jvm_external actually document having multiple maven_install.json files (https://github.com/bazelbuild/rules_jvm_external#multiple-maven_installjson-files) so it is probably something a lot of larger repos do. I'll take a look at the code you linked above anyway & think about it a little bit more.

@illicitonion
Collaborator

rules_jvm_external actually document having multiple maven_install.json files (https://github.com/bazelbuild/rules_jvm_external#multiple-maven_installjson-files) so it is probably something a lot of larger repos do.

I think there's a strong assumption that they're used independently, though :)

@shs96c
Collaborator

shs96c commented Aug 30, 2023

The typical usage pattern is that most repos declare one main maven_install including all their dependencies, and break out separate maven_install declarations for particularly troublesome libraries (some of the Apache Spark stuff springs to mind) which are used in tightly controlled areas of the repo.

The approach of stitching together multiple different maven_install instances as @kriscfoster describes is very unusual (first I've heard of it), though it may be worth noting that the new lock file format in recent rules_jvm_external releases was specifically designed to reduce conflicts, so it may be worth investigating consolidating them back into a single default maven_install again.

@kriscfoster
Author

The typical usage pattern is that most repos declare one main maven_install including all their dependencies, and break out separate maven_install declarations for particularly troublesome libraries (some of the Apache Spark stuff springs to mind) which are used in tightly controlled areas of the repo.

The approach of stitching together multiple different maven_install instances as @kriscfoster describes is very unusual (first I've heard of it), though it may be worth noting that the new lock file format in recent rules_jvm_external releases was specifically designed to reduce conflicts, so it may be worth investigating consolidating them back into a single default maven_install again.

Thank you @shs96c, we didn't know about the new lock-file format. It may help us a little bit in this case!

@tharakadesilva
Contributor

@illicitonion Let me add our use case here as well.

We have two separate rules_jvm_external trees for Spring Boot 2 (SB2) and Spring Boot 3 (SB3). We didn't have the capacity to do a mass migration, so we allowed service owners to upgrade at their own pace. To make things more complicated, we also have some services using DropWizard 3 with their own tree, because their transitives are different from SB2 and SB3; but for the sake of presenting our use case (and for my sanity), let's just consider the SB2 and SB3 trees.

Some of the core libraries that we have written are shared between SB2 and SB3. The major difference between the two is that the former uses the javax namespace while the latter uses the jakarta namespace; the other major difference is the transitive dependencies. So, instead of forking the source code, we created a Bazel rule that rewrites javax imports to jakarta. A sample BUILD file would look like this:

java_library(
    name = "my-core-lib",
    deps = [
        "@maven_sb2//:...",
        ...
    ],
)

java_library_jakarta(
    name = "my-core-lib-jakarta",
    deps = [
        "@maven_sb3//:...",
        ...
    ],
)

Our current situation should go away as everything moves to SB3, but something similar could happen with another migration in the future. Having said that, forking would have made this simpler, but forking a significant number of libraries is not trivial and is not easy to maintain during the transition period; this strategy helped us get going faster.

Something we don't do, unlike in @kriscfoster's case, is mix trees. We don't have the tooling to manage transitives across trees, and I don't think that's something we'll explore either (although we do have the same conflict issues they mentioned; I'll talk a bit about that later).

I imagine we'd do this on a sub-tree level, so you could specify that src1/... uses one maven_install, and src2/... uses another. I don't think we'd want to support it more granularly than that (e.g. per target, or per maven package).

We would like to have at least per target granularity or even be able to auto-detect which tree to use per target based on the dependencies it uses. Compiling the same source set with different deps + transitives seems like a potentially common use-case especially during transition periods.

Would love to hear your thoughts on this! I am open to hearing some process recommendations too as we are still at the early stages of managing a monorepo.


though it may be worth noting that the new lock file format in recent rules_jvm_external releases was specifically designed to reduce conflicts, so it may be worth investigating consolidating them back into a single default maven_install again.

@shs96c We are using the new lock format (and also trying out the new BOM resolver to see if things get easier), but as it is, it doesn't help a whole lot, because the __INPUT_ARTIFACTS_HASH and __RESOLVED_ARTIFACTS_HASH entries always cause conflicts. We have Renovate updating versions for us, but our dep trees are not comprehensive (at least not yet), so it is common for developers to need to add new libraries or do manual upgrades when Renovate fails.

We have tried several things, like moving Renovate MRs to run during off-peak hours, but we still end up with at least a couple of conflicts a day, and it is a major point of frustration for our developers.

Since it is a generated file, we end up picking one version in git and re-running the repinning process, but that takes a significant amount of time because we have a large number of dependencies. And then it's back to the rat race to see who can get their changes in first, and back to conflicts for the next person.

We read up on Alex Eagle's "Easier merges on lockfiles" as well, but that doesn't really solve the problem we are having. It is an easier way to resolve the merge conflicts, but we still have to run the repinning process again.

Maybe we are doing something wrong here or not using things as intended; if there are recommendations from your end that might help us out, we would love to hear your thoughts!

@illicitonion
Collaborator

I imagine we'd do this on a sub-tree level, so you could specify that src1/... uses one maven_install, and src2/... uses another. I don't think we'd want to support it more granularly than that (e.g. per target, or per maven package).

We would like to have at least per target granularity or even be able to auto-detect which tree to use per target based on the dependencies it uses. Compiling the same source set with different deps + transitives seems like a potentially common use-case especially during transition periods.

I don't think Gazelle really has a way of expressing "I want to constrain a particular target in a particular way". Directives are package-level concepts, so if we wanted to introduce this, it would be a novel configuration concept for Gazelle.

We could express a package-level "generate two versions of all targets in this package" option, or we could have a directive which contained a list of files or target names within a package that we wanted to generate multiple versions of, or perhaps a mapping of "depended on package name to universe", i.e. something like:

# gazelle:java_maven_repository_for_package javax.blah maven_sb2
# gazelle:java_maven_repository_for_package jakarta.blah maven_sb3
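Resolution for such a mapping could plausibly work by longest-prefix match on the Java package of each import. A minimal sketch, assuming that semantics (the prefixes, repository names, and the defaultRepo fallback here are made up; nothing like repoForPackage exists in the plugin today):

```go
package main

import (
	"fmt"
	"strings"
)

// repoForPrefix models the mapping that the proposed
// "gazelle:java_maven_repository_for_package" directives would build up.
var repoForPrefix = map[string]string{
	"javax":   "maven_sb2",
	"jakarta": "maven_sb3",
}

// defaultRepo is used when no configured prefix matches.
const defaultRepo = "maven"

// repoForPackage picks the repository whose configured prefix is the
// longest match for the Java package of an imported class.
func repoForPackage(pkg string) string {
	best, repo := -1, defaultRepo
	for prefix, r := range repoForPrefix {
		if (pkg == prefix || strings.HasPrefix(pkg, prefix+".")) && len(prefix) > best {
			best, repo = len(prefix), r
		}
	}
	return repo
}

func main() {
	fmt.Println(repoForPackage("javax.servlet.http")) // maven_sb2
	fmt.Println(repoForPackage("jakarta.servlet"))    // maven_sb3
	fmt.Println(repoForPackage("org.example.util"))   // maven
}
```

Longest-prefix rather than first-match keeps the behaviour deterministic when one configured prefix is nested inside another.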

I guess: Can you write a strawman example of the BUILD files that you wish Gazelle would generate for you, and we can work out how to model that? Some things it'd be good to include in the example:

  1. Two maven_install calls
  2. Some targets which are configured for both universes
  3. Some targets that depend on them which may need to selectively depend on one or the other
  4. Anything else you think is interesting/novel/complicated

@MorganRoff-UnlikelyAI

MorganRoff-UnlikelyAI commented Aug 22, 2024

I'll have a go at providing an example BUILD file because we are encountering the same issue; in our case it's between Spark and non-Spark code that want to use the same shared libraries.

The non-Spark code often uses newer versions of some libraries than are supported in the Spark runtime we're using. So we have two Maven repos: @maven, and @maven_spark, which each include most of the same dependencies, except the changes needed to match the Spark runtime.

We would love to be able to optionally (within a certain package or directory tree) generate a BUILD file that just has two almost identical java_library targets, one for each Maven repository. Each would also need to use the correct Spark or non-Spark internal dependencies (shown with "other-shared-code"). This would ideally look something like this (though I suspect the Gazelle directives might need more information):

# gazelle:java_maven_repository maven
java_library(
  name = "shared",
  srcs = glob(["*.java"]),
  deps = [
    "//other-shared-code",
    "@maven//:dep_1",
    "@maven//:dep_2",
  ],
)

# gazelle:java_maven_repository maven_spark
java_library(
  name = "shared-spark",
  srcs = glob(["*.java"]),
  deps = [
    "//other-shared-code:other-shared-code-spark",
    "@maven_spark//:dep_1",
    "@maven_spark//:dep_2",
  ],
)

There may be situations where the different dependency versions actually require code changes too, which would need us to do some more manual work to split out the code. However, most of the time the code works with either dependency version, so this would save a lot of hassle!

As an aside, I wish we could just "exclude" the conflicting deps at a later stage, when we import "//shared" into the Spark-specific code. This is what we used to do with Maven, and meant we didn't need to change anything about the shared package at all. However, this doesn't seem possible with Bazel. Please let me know if others have found a way though!

Compiling against the correct deps, like in the BUILD file above, brings the advantage of compile time checking against the different dependency versions, but it is currently tough to make this work with Gazelle at all.
