[Tracking] @xla//xla/pjrt/c:pjrt_c_api_gpu_plugin.so failed to build after updating XLA pin

## 🐛 Bug




After updating the XLA pin from 32ebd694c4d0442e241d76324ff1a721831366b4 to 590cd6fcd1ed24ab9cf494789a0fc524b94a4a6a in PR https://github.com/pytorch/xla/pull/8079/files, our CI fails:
https://github.com/pytorch/xla/actions/runs/11060810258/job/30732124138?pr=8079

The target that fails to build is `@xla//xla/pjrt/c:pjrt_c_api_gpu_plugin.so`, which is not one of our own targets.

The exact error is:
```
ERROR: /github/home/.cache/bazel/_bazel_root/197a057057a49e5811107144e2d78508/external/xla/xla/stream_executor/cuda/BUILD:450:19: no such target '@local_config_cuda//cuda:implicit_cuda_headers_dependency': target 'implicit_cuda_headers_dependency' not declared in package 'cuda' defined by /github/home/.cache/bazel/_bazel_root/197a057057a49e5811107144e2d78508/external/local_config_cuda/cuda/BUILD (Tip: use query "@local_config_cuda//cuda:*" to see all the targets in that package) and referenced by '@xla//xla/stream_executor/cuda:delay_kernel_cuda_cuda'
```
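To see which targets the generated `@local_config_cuda//cuda` package actually declares, the Tip in the error suggests a `bazel query`. The following is a minimal sketch that pulls the missing label and the referencing rule out of such a message and builds that query command. The helper names `parse_missing_target` and `query_command` are hypothetical, and the error string is an abbreviated excerpt of the log above.

```python
import re

# Abbreviated excerpt of the CI error above (cache paths omitted).
ERROR = (
    "no such target '@local_config_cuda//cuda:implicit_cuda_headers_dependency': "
    "target 'implicit_cuda_headers_dependency' not declared in package 'cuda' "
    "and referenced by '@xla//xla/stream_executor/cuda:delay_kernel_cuda_cuda'"
)


def parse_missing_target(message):
    """Return (missing_label, referencing_label), or None if the message does not match."""
    missing = re.search(r"no such target '([^']+)'", message)
    referrer = re.search(r"referenced by '([^']+)'", message)
    if not missing:
        return None
    return missing.group(1), (referrer.group(1) if referrer else None)


def query_command(label):
    """Build the `bazel query` command that lists every target in the label's package."""
    package = label.split(":", 1)[0]
    return 'bazel query "{}:*"'.format(package)


missing, referrer = parse_missing_target(ERROR)
print("missing target: ", missing)
print("referenced by:  ", referrer)
print("inspect with:   ", query_command(missing))
```

Running the printed command in the failing workspace should show whether `implicit_cuda_headers_dependency` is absent from the package entirely or only named differently.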

This `@local_config_cuda` repository is defined using the `cuda_configure` Starlark function from upstream (https://github.com/google/tsl), like this:

```
load(
   "@tsl//third_party/gpus/cuda/hermetic:cuda_configure.bzl",
   "cuda_configure",
)

cuda_configure(name = "local_config_cuda")
```
This snippet was copied from the deprecated section of this doc: https://github.com/openxla/xla/blob/main/docs/hermetic_cuda.md#deprecated-non-hermetic-cudacudnn-usage

## Current theory:

The `cuda_configure` function is supposed to set up `local_config_cuda` so that it contains the build targets that tsl needs, but this deprecated non-hermetic version does not do that.

## Current tried actions:

We tried to follow the hermetic CUDA setup described in this doc: https://github.com/openxla/xla/blob/main/docs/hermetic_cuda.md#deprecated-non-hermetic-cudacudnn-usage

However, it requires the clang compiler instead of gcc.

I am attempting to use clang, but the `.bazelrc` line that forces gcc claims that clang has issues: https://github.com/pytorch/xla/blob/940bee453fb27a023b360469487af2a8831966d6/.bazelrc#L27

With clang, the build produces this error:

```
      ERROR: /github/home/.cache/bazel/_bazel_root/197a057057a49e5811107144e2d78508/external/llvm-project/llvm/BUILD.bazel:251:11: Compiling llvm/lib/Support/Valgrind.cpp [for tool] failed: undeclared inclusion(s) in rule '@llvm-project//llvm:Support':
      this rule is missing dependency declarations for the following files included by 'llvm/lib/Support/Valgrind.cpp':
        '/usr/lib/clang/11.0.1/include/stddef.h'
        '/usr/lib/clang/11.0.1/include/__stddef_max_align_t.h'
```

This is odd, because `stddef.h` is a system header and Bazel should not require an extra BUILD dependency declaration for it.
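For context, Bazel's C++ rules only accept undeclared headers that live under directories the configured toolchain lists as built-in (`cxx_builtin_include_directories`). One possible explanation, which is an assumption and not confirmed here, is that the clang resource directory is missing from that list. The sketch below extracts the flagged header paths from the log and reports the directories that would need to be covered. The helper names are hypothetical.

```python
import posixpath
import re

# Excerpt of the clang build error above.
LOG = """\
this rule is missing dependency declarations for the following files included by 'llvm/lib/Support/Valgrind.cpp':
  '/usr/lib/clang/11.0.1/include/stddef.h'
  '/usr/lib/clang/11.0.1/include/__stddef_max_align_t.h'
"""


def undeclared_headers(log):
    """Collect the absolute header paths that appear alone on a line, in quotes."""
    return re.findall(r"^\s*'(/[^']+)'\s*$", log, flags=re.MULTILINE)


def include_dirs(headers):
    """Return the distinct directories that hold those headers, sorted."""
    return sorted({posixpath.dirname(h) for h in headers})


headers = undeclared_headers(LOG)
for directory in include_dirs(headers):
    # Each directory printed here is a candidate to compare against the
    # toolchain's cxx_builtin_include_directories.
    print(directory)
```

If the printed directory is not among the toolchain's built-in include directories, that mismatch would be worth checking before looking further at caching.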

This Stack Overflow [post](https://stackoverflow.com/questions/43921911/how-to-resolve-bazel-undeclared-inclusions-error) suggests cleaning the Bazel cache. We did that by adding `bazel clean --expunge` right before the build, and it still doesn't work.

The latest CI with the above change is: https://github.com/pytorch/xla/actions/runs/11115985671/job/30885415097?pr=8079
