Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use a transition to resolve the correct versions of crane and yq #590

Merged
merged 2 commits into from
May 30, 2024

Conversation

EdSchouten
Copy link
Contributor

@EdSchouten EdSchouten commented May 24, 2024

Bazel doesn't provide a very elegant way to specify that a toolchain should be resolved in such a way that it can be executed on a target platform:

bazelbuild/bazel#19645

To work around this, we can pass in the toolchains through ordinary labels and apply an outgoing edge transition to set --extra_execution_platforms equal to --platforms. This causes the toolchains to be resolved the way we want.

Though I think using transitions for this is pretty slick, it's a shame that this has to be done by patching up command line options.

@thesayyn
Copy link
Collaborator

I'm torn by this, though i had the same problem myself a lot of times before. I have been told that the exec_groups solve this problem, any way we could use that here?

@EdSchouten
Copy link
Contributor Author

I have been told that the exec_groups solve this problem, any way we could use that here?

I don’t think that’s the case. Those can only be used to control where/how stuff gets executed. But in our case we’re interested in building an action that includes binaries for the right target.

@anguslees
Copy link
Contributor

I have been told that the exec_groups solve this problem, any way we could use that here?

Unfortunately, exec_groups don't support transitions afaics. exec_group(exec_compatible_with=HOST_CONSTRAINTS) is my current local-hack work-around for bazel-run commands like oci_push (which we can assume are going to run on host). Ed's suggestion is more correct in general - and more correct for host constraints beyond simple os/cpu - and I will be moving my own patches to that.

oci/private/push.bzl Outdated Show resolved Hide resolved
oci/private/push.bzl Outdated Show resolved Hide resolved
@EdSchouten EdSchouten force-pushed the eschouten/20240523-transition branch from 1a587ec to 56519c4 Compare May 27, 2024 07:11
@EdSchouten
Copy link
Contributor Author

PTAL!

@EdSchouten EdSchouten force-pushed the eschouten/20240523-transition branch from 56519c4 to d8eed58 Compare May 28, 2024 19:52
@EdSchouten
Copy link
Contributor Author

Whoops! Forgot to add _allowlist_function_transition for Bazel 6 support. PTAL!

@EdSchouten
Copy link
Contributor Author

Hmmm... In the bzlmod case we're unable to access @yq_toolchains. What would be the most logical way to address this? Request that @aspect_bazel_lib provides an alias() for the resolved toolchain?

@thesayyn
Copy link
Collaborator

Something along the lines of

rules_oci/MODULE.bazel

Lines 27 to 30 in 6620d46

bazel_lib = use_extension("@aspect_bazel_lib//lib:extensions.bzl", "toolchains")
bazel_lib.jq()
bazel_lib.tar()
use_repo(bazel_lib, "bsd_tar_toolchains", "jq_toolchains")

Bazel doesn't provide a very elegant way to specify that a toolchain
should be resolved in such a way that it can be executed on a *target*
platform:

bazelbuild/bazel#19645

To work around this, we can pass in the toolchains through ordinary
labels and apply an outgoing edge transition to set
--extra_execution_platforms equal to --platforms. This causes the
toolchains to be resolved the way we want.

Though I think using transitions for this is pretty slick, it's a shame
that this has to be done by patching up command line options.
@EdSchouten EdSchouten force-pushed the eschouten/20240523-transition branch from d8eed58 to 11cf945 Compare May 28, 2024 20:28
@thesayyn thesayyn enabled auto-merge (squash) May 29, 2024 19:29
Copy link
Collaborator

@alexeagle alexeagle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fancy!

@thesayyn thesayyn merged commit 3d43cb1 into bazel-contrib:main May 30, 2024
20 checks passed
@BoleynSu
Copy link
Contributor

My understanding is that this should be fixed in upstream. See bazel-contrib/bazel-lib#747 and bazelbuild/bazel#19645 (comment)

@alexeagle
Copy link
Collaborator

This doesn't cherry-pick onto 2.x so I think we'll need to take that upstream fix for our 2.0.0 release anyway. thanks for reminding us @BoleynSu

@BoleynSu
Copy link
Contributor

BoleynSu commented Jul 1, 2024

bazel-contrib/bazel-lib#875 is for fixing upstream issue in bazel-lib but it is only for jq and lacks docs/tests.

For rules, if jq needs to be run during the actions, we should use jq_toolchain_type. If jq will be run when running the output of the rule, jq_runtime_toolchain_type should be used.

@EdSchouten
Copy link
Contributor Author

Again, I don’t see a reason why we should declare all toolchains twice, purely to distinguish in what context they end up getting used.

@BoleynSu
Copy link
Contributor

BoleynSu commented Jul 1, 2024

Okay. Let me point out at least one drawback I can think of for your solution. If the tool, e.g., crane is not prebuilt but built by compiling source code. With the transition approache, crane may not build. For example, target=win, exec=linux. With the transition, we will have to build crane with exec=win which won't work if only exec=linux is supported.

@EdSchouten
Copy link
Contributor Author

EdSchouten commented Jul 1, 2024

Are you sure that's correct...? Given your example, we're only forcing that a toolchain is resolved with both target & exec set to Windows. That doesn't necessarily imply that in order to build dependencies of that toolchain we also apply such a transition. The transition should influence the way the toolchain is resolved. Not the way the toolchain is built.

@BoleynSu
Copy link
Contributor

BoleynSu commented Jul 1, 2024

We are forced to build crane with cfg=win and exec=win. It does not matter what crane's dependency is. A toolchain rule is built in the same way as a normal rule.

For example,

cc_toolchain(name="llvm")
toolchain(name="cc_linux", toolchain="llvm", exec_platform="linux")
cc_binary(name="crane")
crane_toolchain(name="tool", bin="crane") # where crane is of cfg=exec
toolchain(name="crane_linux", toolchain="tool", exec_platform="linux")
toolchain(name="crane_win", toolchain="tool", exec_platform="win")

If you use transition extra_execution_platforms=win to use crane_toolchain_type, you will need to build ":tool" for target="value of --platforms" with extra_execution_platforms=win. I.e, you need to build ":crane" with target="selected execution_platforms for building :tool, which can only be win" (because ":crane" is of cfg=exec in ":tool") and exec="selected execution_platforms for building :crane, which can only be win tool". However, there is no cc_toolchain for that is exec compatible with win.

Anyway, if you doubt it, you can try it locally pretty easily.

@EdSchouten
Copy link
Contributor Author

EdSchouten commented Jul 1, 2024

If you use transition extra_execution_platforms=win to use crane_toolchain_type, you will need to build ":tool" for target="value of --platforms" with extra_execution_platforms=win.

Just to check: is this something you are speculating, or is that something that is actually true right now? Because to me it's not a given that this is the case. Changing extra_execution_platforms and platforms as part of a transition should influence the toolchain selection process. It should not necessarily influence the way toolchains are built. The latter would be highly undesirable, as it likely also means that if you transition "-c" to "dbg", Bazel would also try to rebuild any toolchains used to build those targets with "-c dbg". Those two things are completely distinct.

Furthermore, if that does turn out the be the case, we can patch up the transition to prepend the desired platforms instead of replacing them. Or file an issue against Bazel that they fix up the way transitions work.

@BoleynSu
Copy link
Contributor

BoleynSu commented Jul 1, 2024

If we prepend,

target="selected execution_platforms for building :tool, which can only be win" (because ":crane" is of cfg=exec in ":tool")
may not be the one we want.

For example,

cc_toolchain(name="llvm")
toolchain(name="cc_linux", toolchain="llvm", exec_platform="linux")
cc_binary(name="crane")
crane_toolchain(name="tool", bin="crane") # where crane is of cfg=exec
toolchain(name="crane_linux", toolchain="tool", exec_platform="linux")
# remove this!!!!! toolchain(name="crane_win", toolchain="tool", exec_platform="win") 

I have not really tried the theory to be honest but it must be true given how transition works. A repro it pretty easy to make.

@EdSchouten
Copy link
Contributor Author

A repro it pretty easy to make.

Feel free to post one.

@BoleynSu
Copy link
Contributor

BoleynSu commented Jul 1, 2024

bazel build :image --platforms=//:win --extra_toolchains=//:toolchain  --toolchain_resolution_debug='.*'
# Helper rule for ensuring that the crane and yq toolchains are actually
# resolved for the architecture we are targeting.
def _transition_to_target_impl(settings, attr):
    return {
        # String conversion is needed to prevent a crash with Bazel 6.x.
        "//command_line_option:extra_execution_platforms": [
            str(platform)
            for platform in settings["//command_line_option:platforms"]
        ],
    }

_transition_to_target = transition(
    implementation = _transition_to_target_impl,
    inputs = ["//command_line_option:platforms"],
    outputs = ["//command_line_option:extra_execution_platforms"],
)

BinaryInfo = provider(
    doc = "Provide info for binary",
    fields = {
        "bin": "Target for the binary",
    },
)

def _toolchain_impl(ctx):
    binary_info = BinaryInfo(
        bin = ctx.attr.bin,
    )

    toolchain_info = platform_common.ToolchainInfo(
        binary_info = binary_info,
    )

    return [toolchain_info]

binary_toolchain = rule(
    implementation = _toolchain_impl,
    attrs = {
        "bin": attr.label(
            mandatory = True,
            allow_single_file = True,
            executable = True,
            cfg = "exec",
        ),
    },
)

def _resolved_binary_rule_impl(ctx, toolchain_type, template_variable):
    bin = ctx.toolchains[toolchain_type].binary_info.bin[DefaultInfo]

    out = ctx.actions.declare_file(ctx.attr.name + ".exe")
    ctx.actions.symlink(
        target_file = bin.files_to_run.executable,
        output = out,
        is_executable = True,
    )

    return [
        DefaultInfo(
            executable = out,
            files = bin.files,
            runfiles = bin.default_runfiles,
        ),
        platform_common.TemplateVariableInfo({
            template_variable: out.path,
        } if template_variable != None else {}),
    ]

def resolved_binary_rule(*, toolchain_type, template_variable = None):
    return rule(
        implementation = lambda ctx: _resolved_binary_rule_impl(ctx, toolchain_type, template_variable),
        executable = True,
        toolchains = [toolchain_type],
    )

resolved_binary = resolved_binary_rule(toolchain_type = "//:toolchain_type")

def _image_impl(ctx):
    return DefaultInfo(
        files = ctx.attr.bin[0][DefaultInfo].files,
    )

image = rule(
    _image_impl,
    attrs = {
        "_allowlist_function_transition": attr.label(
            default = "@bazel_tools//tools/allowlists/function_transition_allowlist",
        ),
        "bin": attr.label(
            cfg = _transition_to_target,
            default = "//:bin",
        ),
    },
)
load("@io_bazel_rules_go//go:def.bzl", "go_binary")
load("//:defs.bzl", "binary_toolchain", "image", "resolved_binary")

genrule(
    name = "main_go",
    outs = ["main.go"],
    cmd = 'touch "$@"',
)

go_binary(
    name = "go_bin",
    srcs = ["main.go"],
)

toolchain_type(
    name = "toolchain_type",
)

binary_toolchain(
    name = "bin_toolcahin",
    bin = "go_bin",
)

toolchain(
    name = "toolchain",
    toolchain = "bin_toolcahin",
    toolchain_type = ":toolchain_type",
)

resolved_binary(name = "bin")

image(
    name = "image",
    bin = ":bin",
)

platform(
    name = "win",
    constraint_values = [
        "@platforms//os:windows",
        "@platforms//cpu:x86_64",
    ],
    visibility = ["//visibility:public"],
)

Please try the above example.

@BoleynSu
Copy link
Contributor

BoleynSu commented Jul 1, 2024

The issue with prepending will also happen with register_execution_platforms in WORKSPACE/MODULE.bazel. Again I did not try it but it should be how transition works. So basically, the current solution is broken I would say.

@EdSchouten
Copy link
Contributor Author

In my case it failed because it first failed as follows:

ERROR: /Users/ed/foo/BUILD.bazel:4:8: Executing genrule //:main_go [for tool] failed: (Exit 1): bash.exe failed: error executing Genrule command (from target //:main_go) c:/tools/msys64/usr/bin/bash.exe -c 'source external/bazel_tools/tools/genrule/genrule-setup.sh; touch "bazel-out/darwin_arm64-opt-exec-ST-6b9de8e24f96/bin/main.go"'

Use --sandbox_debug to see verbose messages from the sandbox and retain the sandbox build root for debugging
src/main/tools/process-wrapper-legacy.cc:80: "execvp(c:/tools/msys64/usr/bin/bash.exe, ...)": No such file or directory
Target //:image failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 6.134s, Critical Path: 0.11s
INFO: 4 processes: 4 internal.
ERROR: Build did NOT complete successfully

That's because it tried to run the genrule() on Windows. Adding an exec_compatible_with = ["@platforms//os:macos"] to that sorted that out. Then it failed like this:

ERROR: /private/var/tmp/_bazel_ed/0ba728bada58c95ae54c22d2bde4e2e8/external/rules_go~~go_sdk~rules_go__download_0_windows_amd64/BUILD.bazel:62:15: GoToolchainBinaryBuild external/rules_go~~go_sdk~rules_go__download_0_windows_amd64/builder.exe [for tool] failed: (Exit 2): builder.exe.bat failed: error executing GoToolchainBinaryBuild command (from target @@rules_go~~go_sdk~rules_go__download_0_windows_amd64//:builder) bazel-out/darwin_arm64-opt-exec-ST-6b9de8e24f96/bin/external/rules_go~~go_sdk~rules_go__download_0_windows_amd64/builder.exe.bat

Use --sandbox_debug to see verbose messages from the sandbox and retain the sandbox build root for debugging
bazel-out/darwin_arm64-opt-exec-ST-6b9de8e24f96/bin/external/rules_go~~go_sdk~rules_go__download_0_windows_amd64/builder.exe.bat: line 1: @echo: command not found
bazel-out/darwin_arm64-opt-exec-ST-6b9de8e24f96/bin/external/rules_go~~go_sdk~rules_go__download_0_windows_amd64/builder.exe.bat: line 8: unexpected EOF while looking for matching `"'
bazel-out/darwin_arm64-opt-exec-ST-6b9de8e24f96/bin/external/rules_go~~go_sdk~rules_go__download_0_windows_amd64/builder.exe.bat: line 10: syntax error: unexpected end of file
Target //:image failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 0.591s, Critical Path: 0.38s
INFO: 2 processes: 2 internal.
ERROR: Build did NOT complete successfully

That I could solve by constraining the registration of Go toolchains, by not registering ones where host_goos == "windows". So in that sense you can get it to work, even with a transition like that in place.

That said, I agree that this is suboptimal. But it's still not as awful as 'runtime toolchains'. We should simply work with the Bazel developers to get the right set of features added to Bazel to do resolution of toolchains the way we want.

@BoleynSu
Copy link
Contributor

BoleynSu commented Jul 1, 2024

You can replace the genrule with a real file to see other failures. I was using genrule simply because I do not want to add a go file to the example. Or you can add --nobuild to see that go's win toolchain is used for building :go_bin.

I do not think we need any change in the Bazel core. For our specific use-case and the one posted in bazelbuild/bazel#19645, basically, all we need is X_toolchain_type which provides a binary with cfg="target" instead of "exec" and a resolved_binary where user uses it as deps with cfg="exec" or tools in genrule. I.e it does not really need to be two toolchian types for one binary.

It is similar to that we have a alias(name="jq", actual=select(linux: linux_jq; win: win_jq)). Only difference is that the select is not hard-coded but resolved by toolchain resolution.

@anguslees
Copy link
Contributor

How do we do this with one toolchain type? How will resolution know if we need it for the exec or target platform (when those are not the same)?

I agree the exec platform transition propagates through the toolchain transition, and this breaks later actions. Sad face.

I think we're back to "no good solution" here. Duplicate toolchains works but isn't "good" (duplicates work for everyone and confuses target vs exec in definition). Exec platform transition breaks deeper builds, and would require further work I'm bazel to prevent that (not clear exactly what, since we've lost information in the transition).

@BoleynSu
Copy link
Contributor

BoleynSu commented Aug 7, 2024

@anguslees See my comments here bazel-contrib/bazel-lib#875 (comment)

@manuelnaranjo
Copy link

@EdSchouten @thesayyn @alexeagle folks be mindful of #706 this is breaking cross cpu compilation, not sure if anyone else is looking into a fix, looks like when crosscompiling from linux/x64 to linux/arm64 it also happens, so it's trivial to add a unit test to validate that things are broken, and then fix from there, I may have some capacity this week, but I can't promise much, but if someone else has capacity feel free to tackle it

@thesayyn
Copy link
Collaborator

Okay, i think this workaround has created problems as much as it solved, i am not sure what to do here.

@anguslees
Copy link
Contributor

anguslees commented Jan 22, 2025

Okay, i think this workaround has created problems as much as it solved, i am not sure what to do here.

The only current solution without problems (that I'm aware of) are:

  1. Don't use toolchains here, and instead use a regular cfg=target label that (for example) points at a select() block to choose amongst pre-built binaries. or
  2. Introduce new crane_target and yq_target toolchains, that are based around target_compatible_with rather than exec_compatible_with constraints.

Edit: Oh, or 3: avoid the issue by replacing every shell script in rules_oci with go_binaries that just embed the equivalent crane/yq functionality as libraries, and add a dependency on a go toolchain 😛. I've been doing this in my $company repo, and my life is better for it.

@BoleynSu
Copy link
Contributor

With the toolchain approach, one can choose to use prebuilt binary or not. For some users, they will prefer to build as many as dependencies from source.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants