Commit 537e51b

[SYCL][CUDA] Add IPSCCP pass to O0 by default (#5900)
The IPSCCP pass can set branch conditions to ConstInt and turn conditional branches into unconditional branches. This is necessary at O0 in the nvptx backend when the `nvvm_reflect` function is used: after the nvvm-reflect pass runs, dead branches can remain that contain unused instructions aimed at a different architecture generation (SM version) from the one being compiled for.

A solution targeting only branches that use the `nvvm_reflect` function was initially explored by patching the existing nvvm-reflect pass. That solution would have required considering several cases and was abandoned in favour of the simpler, comprehensive approach of adding the IPSCCP pass to O0. Discussions showed that other backends face a corresponding issue, so a simple temporary DPC++ solution was preferred; a permanent general solution will be worked on later in the year.

The new backend flag `use-ipsccp-nvptx-O0` removes the IPSCCP pass from O0 when set to false, at the user's discretion.
1 parent 6a70d9a commit 537e51b
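
The effect described in the commit message can be sketched with a toy control-flow graph. This is only an illustration, not LLVM's IR or the IPSCCP implementation: the `Block` struct, the helper names, and the SM 80 threshold are all hypothetical. It assumes the reflect step has already turned the branch condition into a constant, and shows that the branch to the higher-SM path only becomes unreachable once the constant-condition branch is folded.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <set>
#include <string>
#include <vector>

// Hypothetical toy IR: every block ends in one branch. Not LLVM's data structures.
struct Block {
    std::string name;
    std::optional<bool> const_cond;  // set when the branch condition is a known constant
    int true_succ;                   // successor index, or -1 when the block returns
    int false_succ;                  // -1 when the branch is unconditional
};

// Builds a CFG whose entry block branches on "SM version >= 80" (assumed threshold).
// The condition is stored as a constant, as if the reflect call were already resolved.
std::vector<Block> make_reflect_example(int sm_version) {
    const bool is_sm80_or_newer = sm_version >= 80;
    return {
        {"entry", is_sm80_or_newer, 1, 2},
        {"sm80_path", std::nullopt, 3, -1},  // would use instructions needing SM 80
        {"fallback", std::nullopt, 3, -1},
        {"exit", std::nullopt, -1, -1},
    };
}

// Step 1: rewrite conditional branches on a constant into unconditional branches.
void fold_constant_branches(std::vector<Block>& blocks) {
    for (Block& b : blocks) {
        if (b.false_succ >= 0 && b.const_cond.has_value()) {
            b.true_succ = *b.const_cond ? b.true_succ : b.false_succ;
            b.false_succ = -1;
        }
    }
}

// Step 2: collect the names of blocks still reachable from the entry block (index 0).
std::set<std::string> reachable_blocks(const std::vector<Block>& blocks) {
    std::set<std::string> seen;
    std::vector<int> work{0};
    while (!work.empty()) {
        const int i = work.back();
        work.pop_back();
        if (i < 0) continue;  // -1 marks "no successor"
        assert(static_cast<std::size_t>(i) < blocks.size());
        if (!seen.insert(blocks[i].name).second) continue;  // already visited
        work.push_back(blocks[i].true_succ);
        work.push_back(blocks[i].false_succ);
    }
    return seen;
}
```

With `make_reflect_example(50)`, all four blocks are reachable before folding, so the block that needs the newer SM version would still reach the backend. After `fold_constant_branches`, only `entry`, `fallback`, and `exit` remain reachable.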

3 files changed, +24 -1 lines changed


llvm/lib/Target/NVPTX/NVPTXTargetMachine.cpp

+11
@@ -33,6 +33,7 @@
 #include "llvm/Support/CommandLine.h"
 #include "llvm/Target/TargetMachine.h"
 #include "llvm/Target/TargetOptions.h"
+#include "llvm/Transforms/IPO.h"
 #include "llvm/Transforms/IPO/PassManagerBuilder.h"
 #include "llvm/Transforms/Scalar.h"
 #include "llvm/Transforms/Scalar/GVN.h"
@@ -63,6 +64,12 @@ static cl::opt<bool> UseShortPointersOpt(
         "Use 32-bit pointers for accessing const/local/shared address spaces."),
     cl::init(false), cl::Hidden);

+static cl::opt<bool>
+    UseIPSCCPO0("use-ipsccp-nvptx-O0",
+                cl::desc("Use IPSCCP pass at O0 as a temp solution for "
+                         "nvvm-reflect dead-code errors."),
+                cl::init(true), cl::Hidden);
+
 namespace llvm {

 void initializeLocalAccessorToSharedMemoryPass(PassRegistry &);
@@ -327,6 +334,10 @@ void NVPTXPassConfig::addIRPasses() {
   const NVPTXSubtarget &ST = *getTM<NVPTXTargetMachine>().getSubtargetImpl();
   addPass(createNVVMReflectPass(ST.getSmVersion()));

+  if (getOptLevel() == CodeGenOpt::None && UseIPSCCPO0) {
+    addPass(createIPSCCPPass());
+  }
+
   // FIXME: should the target triple check be done by the pass itself?
   // See createNVPTXLowerArgsPass as an example
   if (getTM<NVPTXTargetMachine>().getTargetTriple().getOS() == Triple::CUDA) {
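
The gating condition in the last hunk can be restated as a small standalone sketch. It does not use LLVM's pass manager: the `OptLevel` enum, the function name, and the pass names as strings are hypothetical stand-ins chosen for illustration. The boolean parameter plays the role of the `use-ipsccp-nvptx-O0` option, which defaults to true.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the code generation optimization level.
enum class OptLevel { None, Less, Default, Aggressive };

// Returns the passes this sketch would schedule at the point shown in the hunk:
// nvvm-reflect always, followed by ipsccp only at O0 and only when the flag is on.
std::vector<std::string> ir_passes_after_reflect(OptLevel level,
                                                 bool use_ipsccp_o0 = true) {
    std::vector<std::string> passes{"nvvm-reflect"};
    if (level == OptLevel::None && use_ipsccp_o0) {
        passes.push_back("ipsccp");
    }
    return passes;
}
```

In this sketch, the extra pass appears only for `OptLevel::None` with the flag left at its default. The condition adds nothing at other levels, so whatever those pipelines already run is unaffected by this change.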

llvm/test/CodeGen/NVPTX/param-load-store.ll

+1-1
@@ -1,5 +1,5 @@
 ; Verifies correctness of load/store of parameters and return values.
-; RUN: llc < %s -march=nvptx64 -mcpu=sm_35 -O0 -verify-machineinstrs | FileCheck -allow-deprecated-dag-overlap %s
+; RUN: llc < %s -march=nvptx64 -mcpu=sm_35 -O0 -verify-machineinstrs -use-ipsccp-nvptx-O0=false | FileCheck -allow-deprecated-dag-overlap %s

 %s_i1 = type { i1 }
 %s_i8 = type { i8 }

sycl/doc/GetStartedGuide.md

+12
@@ -831,6 +831,18 @@ which contains all the symbols required.
   significantly slower but matches the default precision used by `nvcc`, and
   this `clang++` flag is equivalent to the `nvcc` `-prec-sqrt` flag, except that
   it defaults to `false`.
+* No Opt (O0) uses the IPSCCP compiler pass by default, although the IPSCCP pass
+  can be switched off at O0 using the `-mllvm -use-ipsccp-nvptx-O0=false` flag at
+  the user's discretion.
+  The reason that the IPSCCP pass is used by default even at O0 is that there is
+  currently an unresolved issue with the nvvm-reflect compiler pass: This pass is
+  used to pick the correct branches depending on the SM version which can be
+  optionally specified by the `--cuda-gpu-arch` flag.
+  If the arch flag is not specified by the user, the default value, SM 50, is used.
+  Without the execution of the IPSCCP pass at -O0 when using a low SM version,
+  dead instructions which require a higher SM version can remain. Since
+  corresponding issues occur in other backends future work will aim for a
+  universal solution to these issues.

 ### HIP back-end limitations
