Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cargo_build_script: Some CARGO_* environment variables are not set #2677

Open
ParkMyCar opened this issue Jun 3, 2024 · 17 comments · May be fixed by #3260
Open

cargo_build_script: Some CARGO_* environment variables are not set #2677

ParkMyCar opened this issue Jun 3, 2024 · 17 comments · May be fixed by #3260
Labels

Comments

@ParkMyCar
Copy link
Contributor

There are a few CARGO_* environment variables that are currently set by rustc but don't appear to get set by the build script wrapper. The ones I know of specifically are:

  • CARGO_PKG_AUTHORS
  • CARGO_PKG_DESCRIPTION
  • CARGO_PKG_HOMEPAGE
  • CARGO_PKG_LICENSE
  • CARGO_PKG_REPOSITORY
  • RUSTDOC
ParkMyCar added a commit to MaterializeInc/materialize that referenced this issue Jun 3, 2024
The `launchdarkly-server-sdk` crate depends on
[`built`](https://github.com/lukaslueg/built) to get some info about the
current crate, which in turn requires a few environment variables to be
set. Ideally the `cargo_build_script` rules would already be setting
these, I made an issue for that:
bazelbuild/rules_rust#2677

Note: we also need to upgrade the version of `built`, which is done in
this PR MaterializeInc/rust-server-sdk#6

### Motivation

Build `launchdarkly-server-sdk` with Bazel

### Checklist

- [ ] This PR has adequate test coverage / QA involvement has been duly
considered. ([trigger-ci for additional test/nightly
runs](https://trigger-ci.dev.materialize.com/))
- [ ] This PR has an associated up-to-date [design
doc](https://github.com/MaterializeInc/materialize/blob/main/doc/developer/design/README.md),
is a design doc
([template](https://github.com/MaterializeInc/materialize/blob/main/doc/developer/design/00000000_template.md)),
or is sufficiently small to not require a design.
  <!-- Reference the design in the description. -->
- [ ] If this PR evolves [an existing `$T ⇔ Proto$T`
mapping](https://github.com/MaterializeInc/materialize/blob/main/doc/developer/command-and-response-binary-encoding.md)
(possibly in a backwards-incompatible way), then it is tagged with a
`T-proto` label.
- [ ] If this PR will require changes to cloud orchestration or tests,
there is a companion cloud PR to account for those changes that is
tagged with the release-blocker label
([example](MaterializeInc/cloud#5021)).
<!-- Ask in #team-cloud on Slack if you need help preparing the cloud
PR. -->
- [ ] This PR includes the following [user-facing behavior
changes](https://github.com/MaterializeInc/materialize/blob/main/doc/developer/guide-changes.md#what-changes-require-a-release-note):
- <!-- Add release notes here or explicitly state that there are no
user-facing behavior changes. -->
@bradzacher
Copy link

is there a good, generic workaround for this?
this is particularly problematic for binaries that use clap because it relies upon the env vars to fill in information for the resulting binary's help page.

you can even get unexpected results from this. In our local build we found that one of the doc comments on a struct in the command struct heriarchy was chosen due to the missing CARGO_PKG_DESCRIPTION - leading to an internal API doc comment being the binary's help description.

@illicitonion
Copy link
Collaborator

If you have a Cargo.toml file which has this data in, it would be ~trivial to write a genrule (or custom rule) which reads the Cargo.toml file, generate a file with the env vars to set, and then pass that as rustc_env_files to your rust target - something like:

extract_cargo_env_vars(
    name = "cargo_env",
    src = "Cargo.toml",
)

rust_binary(
    name = "my_bin",
    rustc_env_files = [":cargo_env"],
)

I'd be happy to review an implementation of extract_cargo_env_vars in https://github.com/bazelbuild/rules_rust/tree/main/cargo if someone wanted to write and share one. If this were done, it could also be nice to auto-generate this target in crate_universe-generated BUILD files.

@illicitonion
Copy link
Collaborator

(If you don't have a Cargo.toml file, then just setting these variables manually as rustc_env seems fine - I don't think there are obvious default values we should be supplying here)

@bradzacher
Copy link

I was actually hacking at this. I'm still a bazel noob so I didn't know about genrule till a little after I posted.

From what I can tell cargo provides no command to generate an env file out the box. There is cargo read-meta and cargo metadata. However I believe both of these require complete visibility of the workspace. I.e. I can't just do cargo read-meta $(location :Cargo.toml) - this fails because the workspace Cargo.toml isn't reachable for the action. And adding that isn't enough either as you need all Cargo.tomls from the workspace.

We have that defined in the crate_index but from what I can tell the list of Cargo.tomls isn't exposed from the generated outputs.

My next thought was that I could just parse the Cargo.toml by hand to extract the relevant bits -- and it's doable with grep.. But I hate it.

I'm still new to all this so maybe I'm misunderstanding but it might be more difficult than I thought.
Could whip up a quick rust binary to do it - that would be easy enough!

@illicitonion
Copy link
Collaborator

A quick rust binary would be great! Ideally one which just parses the toml rather than depending on cargo as a crate - if you can put together the rust binary, we can wire it together to run as a rule!

@illicitonion
Copy link
Collaborator

(We'd want to minimise the dependencies of that binary as much as possible, e.g. just taking std::env::args rather than using something like clap, but I would for sure want to use an existing toml parser not roll our own!)

@marvin-hansen
Copy link
Contributor

marvin-hansen commented Jul 30, 2024

I probably take this on since one of my optional dependencies currently fails to compile because CARGO_PKG_REPOSITORY isn't set. The way I understand the issue is that because rules_rust doesn't call cargo to invoke the rust compiler, the environment variables mentioned in the OP aren't set and if a crate, like autometrics, depends on any of these unset variables, it doesn't compile.

Personally, I have a very strong aversion against unnecessary env variables, but I also acknowledge that this needs to be done to make crates depending on those env variables compile with rules_rust. Also, nobody knows how many of these crates relying on environment variables exists meaning there is a high chance that the issue will re-occur over time.

I can definitely write a Cargo.toml parser that only depends on std lib to read the Cargo.toml of the crate, parse those values, and sets the corresponding ENV variables. This should be straightforward and cross compile to any Rust target that supports the std lib. No problem.

However, what I don't know is, how do I call to call such a parser prior to the compilation of a crate? In any case, it's a crate_universe direct dependency so this needs to be handled somewhere in crate_universe.

@illicitonion would you please give some guidance how to integrate a cargo parser in such a way that Cargo.toml gets parsed and its variables exposed before compiling that crate?

@illicitonion
Copy link
Collaborator

@marvin-hansen I have about 90% of this sitting in a branch on my machine - I'll try to get it tidied up and upstream in the next week or so :)

@marvin-hansen
Copy link
Contributor

marvin-hansen commented Jul 30, 2024 via email

@bradzacher
Copy link

Sorry -- I wanted to get to this but project pressures have limited my work time and a newborn has limited my out-of-work time.

We hacked around this for now with this filth:

rust_binary(
  ...
  rustc_env_files = [":generate_rustc_env_file"],
)

genrule(
    name = "generate_rustc_env_file",
    srcs = [":Cargo.toml"],
    outs = ["rustc_env_file"],
    cmd = "echo CARGO_PKG_DESCRIPTION=\"$$(cat $(location :Cargo.toml) | grep -oP 'description\\s*=\\s*\".+\"' | grep -oP '\".+\"' | tr -d '\"')\" > $@",
)

Thank you for picking up my slack!

@illicitonion
Copy link
Collaborator

Congratulations on the newborn @bradzacher!

#2772 adds a new rule, cargo_toml_env_vars, to replace your genrule :)

I'll wire it up in crate_universe when I find my next chunk of time (or will gladly review a PR doing so!)

@marvin-hansen
Copy link
Contributor

marvin-hansen commented Aug 1, 2024

Congratulations @bradzacher

@illicitonion Thank you for the variable_extractor.

I am still mulling over your draft PR. The testdata examples are quite helpful. The way I understand how to use the variable_extractor is that I make a genrule with the variable_extractor to generate a rustc_env_file and then add it to the rust_binary / rust_library declaration as rustc_env_files.

That means, to wire up crate_universe I have to detect if the crate dependency is either empry, stand-alone or a workspace, and then emit the correct genrule to add.

Technically, I can use either the rust_env or the rust_env_files to do so. Because the rust_env_files might be subject to stamping that would require additional wrapping, it might be semsible to use the rust_env by default.

Is that the way this is suppose to work?

Either way. I try to draft something over the weekend and see how that goes.

@marvin-hansen
Copy link
Contributor

marvin-hansen commented Aug 1, 2024

I did some more digging to understand the crate_universe code.

It seems the rustc_env and rustc_env_files are defined as attributes in crate.bzl. The crate type is then constructed in _crate_impl in the crate_universe extension. Apparently, there are two sets of attributes:

  1. module_annotations:
  2. repo_specific_annotations

These two sets of annotations get merged into one at the end of each crate configuration.

It seems to me the the repo_specific_annotations aren't really used at this point because it's initialized as an empty dict, and later, during the merge, there is a case that if repo_specific_annotations is Null/None, just return the other annotations. I think the repo_specific_annotations is the place to store the crate specific ENV variables from the variable_extractor.

There are two cases to keep in mind:

  1. from_cargo
  2. from_specs (direct dependencies)

Conventionally, crate env_variables are invariant, therefore it's sensible to add a pre-processor that iterates through all crates, extracts those env_vars, stuffs them into a dict keyed by crate name, and returns it for look-up mainly to to speed up processing for large projects.

It turned out, there is already a loop in the extension that basically iterates through all repositories and then adds an empty dict for the repo_specific_annotations:

  for repo in repositories:
      _get_or_insert(
          _get_or_insert(repo_specific_annotations, repo, {}),
          crate,
          [],
      ).append(annotation)

IMHO, instead of adding an empty dic, it's probably sensible to generate a gen_file using the extractor and then set the rustc_env_files for the crate. Something like:

    for repo in repositories:
        _get_or_insert(
            repo_annotations = _get_repo_annotations(repo)
            _get_or_insert(repo_specific_annotations, repo, repo_annotations),
            crate,
            [],
        ).append(annotation)

For implementing the _get_repo_annotations function, I am lost here as I honestly don't understand similar functions in the code and don't really get my head around how to generate and run genrule.

I've looked at the official Bazel docs, and I get the basic idea of context, attribute and rules, but man, this ain't not easy in practice.

@illicitonion
Copy link
Collaborator

The way I understand how to use the variable_extractor is that I make a genrule with the variable_extractor to generate a rustc_env_file and then add it to the rust_binary / rust_library declaration as rustc_env_files.

You don't actually need a genrule for this - @rules_rust//cargo:defs.bzl's cargo_toml_env_vars is a rule you can use and add to the rustc_env_files attr.

That means, to wire up crate_universe I have to detect if the crate dependency is either empry, stand-alone or a workspace, and then emit the correct genrule to add.

Technically, I can use either the rust_env or the rust_env_files to do so. Because the rust_env_files might be subject to stamping that would require additional wrapping, it might be semsible to use the rust_env by default.

I was assuming rust_env_files but I suppose you could run the logic from the variable extractor as part of BUILD file generation.

On the annotations stuff: I wasn't imagining using annotations in the MODULE.bazel, I was imagining changing how we generate BUILD.bazel files for each library. The core rendering logic here is this:

for rule in &krate.targets {
if let Some(override_target) = krate.override_targets.get(rule.override_target_key()) {
starlark.push(Starlark::Alias(Alias {
rule: AliasRule::default().rule(),
name: rule.crate_name().to_owned(),
actual: override_target.clone(),
tags: BTreeSet::from(["manual".to_owned()]),
}));
} else {
match rule {
Rule::BuildScript(target) => {
load("@rules_rust//cargo:defs.bzl", "cargo_build_script");
let cargo_build_script =
self.make_cargo_build_script(platforms, krate, target)?;
starlark.push(Starlark::CargoBuildScript(cargo_build_script));
starlark.push(Starlark::Alias(Alias {
rule: AliasRule::default().rule(),
name: target.crate_name.clone(),
actual: Label::from_str("_bs").unwrap(),
tags: BTreeSet::from(["manual".to_owned()]),
}));
}
Rule::ProcMacro(target) => {
load("@rules_rust//rust:defs.bzl", "rust_proc_macro");
let rust_proc_macro =
self.make_rust_proc_macro(platforms, krate, target)?;
starlark.push(Starlark::RustProcMacro(rust_proc_macro));
}
Rule::Library(target) => {
load("@rules_rust//rust:defs.bzl", "rust_library");
let rust_library = self.make_rust_library(platforms, krate, target)?;
starlark.push(Starlark::RustLibrary(rust_library));
}
Rule::Binary(target) => {
load("@rules_rust//rust:defs.bzl", "rust_binary");
let rust_binary = self.make_rust_binary(platforms, krate, target)?;
starlark.push(Starlark::RustBinary(rust_binary));
}
}
}
}
- I imagine we want to make a new Starlark enum value for cargo_toml_env_vars, unconditionally serialise one per crate, and then add a rustc_env_vars attribute to the existing rules which points at the per-crate singleton cargo_toml_env_vars.

@marvin-hansen
Copy link
Contributor

marvin-hansen commented Aug 1, 2024 via email

@marvin-hansen
Copy link
Contributor

Alright, I started picking this up again.

  • I imagine we want to make a new Starlark enum value for cargo_toml_env_vars,

I interpret this as something like this in utils/starlak.rs:

#[derive(Serialize)]
#[serde(untagged)]
pub(crate) enum Starlark {
    Load(Load),
    Package(Package),
    PackageInfo(PackageInfo),
    License(License),
    ExportsFiles(ExportsFiles),
    Filegroup(Filegroup),
    Alias(Alias),
    CargoBuildScript(CargoBuildScript),
    CargoTomlEnvVars(CargoTomlEnvVars),
    #[serde(serialize_with = "serialize::rust_proc_macro")]
    RustProcMacro(RustProcMacro),
    #[serde(serialize_with = "serialize::rust_library")]
    RustLibrary(RustLibrary),
    #[serde(serialize_with = "serialize::rust_binary")]
    RustBinary(RustBinary),
    #[serde(skip_serializing)]
    Verbatim(String),
}

And

#[derive(Serialize)]
#[serde(rename = "cargo_toml_env_vars")]
pub(crate) struct CargoTomlEnvVars {
    pub(crate) cargo_pkg_version: String,
    pub(crate) cargo_pkg_version_major: String,
    pub(crate) cargo_pkg_version_minor: String,
    pub(crate) cargo_pkg_version_patch: String,
    pub(crate) cargo_pkg_version_pre: String,
    #[serde(skip_serializing_if = "Option::is_none")]
    pub(crate) cargo_pkg_authors: Option<String>,
    #[serde(skip_serializing_if = "Option::is_none")]
    pub(crate) cargo_pkg_name: Option<String>,
    #[serde(skip_serializing_if = "Option::is_none")]
    pub(crate) cargo_pkg_description: Option<String>,
    #[serde(skip_serializing_if = "Option::is_none")]
    pub(crate) cargo_pkg_homepage: Option<String>,
    #[serde(skip_serializing_if = "Option::is_none")]
    pub(crate) cargo_pkg_repository: Option<String>,
    #[serde(skip_serializing_if = "Option::is_none")]
    pub(crate) cargo_pkg_license: Option<String>,
    #[serde(skip_serializing_if = "Option::is_none")]
    pub(crate) cargo_pkg_license_file: Option<String>,
    #[serde(skip_serializing_if = "Option::is_none")]
    pub(crate) cargo_pkg_rust_version: Option<String>,
    #[serde(skip_serializing_if = "Option::is_none")]
    pub(crate) cargo_pkg_readme: Option<String>,
}

AFAIK, version number is mandatory in Cargo.toml, hence the string type as opposed to Option

I was imagining changing how we generate BUILD.bazel files for each library. The core rendering logic here is this:

The way I read the core rendering is that, it checks for a local override, and if there is none, it matches the different kind od targets i.e. macro, library, binary.

        for rule in &krate.targets {
            if let Some(override_target) = krate.override_targets.get(rule.override_target_key()) {
                starlark.push(Starlark::Alias(Alias {
                 ...
                }));
            } else {
                match rule {
                    Rule::Library(target) => {
                        load("@rules_rust//rust:defs.bzl", "rust_library");
                        let rust_library = self.make_rust_library(platforms, krate, target)?;
                        starlark.push(Starlark::RustLibrary(rust_library));
                    }
                ....
                }

The make_rust_library method then builds and returns a RustLibrary, defined in the Starlak.rs file

The RustLibrary currently does't have field representing Env vars.

unconditionally serialize one per crate,

How specifically do I serialize the CargoTomlEnvVars?

Conventionally, in Bazel I would do exactly what you have suggested early on:

extract_cargo_env_vars(
    name = "cargo_env",
    src = "Cargo.toml",
)

rust_binary(
    name = "my_bin",
    rustc_env_files = [":cargo_env"],
)

But since this rendering is supposed to generate the BUILD.bazel file, I somehow need
to generate the target to put in the the rustc_env_files.

How do I do that?

Sorry for my limited understanding, I am still trying to get my head around how this rule set works.

@marvin-hansen
Copy link
Contributor

I think the most practical workaround to the issue of missing cargo environment variables is to set them them as crate annotations. At least that solved the problem for me.

In your MODULE.bazel file, just add the missing variables as rustc_env annotation, for example:

crate = use_extension("@rules_rust//crate_universe:extension.bzl", "crate")

crate.spec(
    features = ["prometheus-exporter"],
    package = "autometrics",
    version = "2.0.0",
)
crate.annotation(
    crate = "autometrics",
    rustc_env = {
        "CARGO_PKG_REPOSITORY": "https://github.com/autometrics-dev/autometrics-rs",
    },
)

crate.from_specs()
use_repo(crate, "crates")

Would be great if this PR would make it into main one day but in the meantime,
crate annotations solve the problem just fine.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants