-
Notifications
You must be signed in to change notification settings - Fork 12k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[IR] Introduce U<address space>
to DataLayout
to represent undesirable address space if a target has it
#108786
Conversation
This stack of pull requests is managed by Graphite. Learn more about stacking. |
@llvm/pr-subscribers-backend-spir-v @llvm/pr-subscribers-clang Author: Shilei Tian (shiltian) ChangesPatch is 23.55 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/108786.diff 15 Files Affected:
diff --git a/clang/lib/Basic/Targets/AMDGPU.cpp b/clang/lib/Basic/Targets/AMDGPU.cpp
index 3b748d0249d57b..0ee56848a6cb98 100644
--- a/clang/lib/Basic/Targets/AMDGPU.cpp
+++ b/clang/lib/Basic/Targets/AMDGPU.cpp
@@ -36,7 +36,7 @@ static const char *const DataLayoutStringAMDGCN =
"-p7:160:256:256:32-p8:128:128-p9:192:256:256:32-i64:64-v16:16-v24:32-v32:"
"32-v48:64-v96:128"
"-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-S32-A5-G1"
- "-ni:7:8:9";
+ "-ni:7:8:9-T0";
const LangASMap AMDGPUTargetInfo::AMDGPUDefIsGenMap = {
llvm::AMDGPUAS::FLAT_ADDRESS, // Default
diff --git a/clang/lib/Basic/Targets/NVPTX.cpp b/clang/lib/Basic/Targets/NVPTX.cpp
index 43b653dc52ce0d..59344c3c71aee2 100644
--- a/clang/lib/Basic/Targets/NVPTX.cpp
+++ b/clang/lib/Basic/Targets/NVPTX.cpp
@@ -66,12 +66,12 @@ NVPTXTargetInfo::NVPTXTargetInfo(const llvm::Triple &Triple,
HasFloat16 = true;
if (TargetPointerWidth == 32)
- resetDataLayout("e-p:32:32-i64:64-i128:128-v16:16-v32:32-n16:32:64");
+ resetDataLayout("e-p:32:32-i64:64-i128:128-v16:16-v32:32-n16:32:64-T0");
else if (Opts.NVPTXUseShortPointers)
- resetDataLayout(
- "e-p3:32:32-p4:32:32-p5:32:32-i64:64-i128:128-v16:16-v32:32-n16:32:64");
+ resetDataLayout("e-p3:32:32-p4:32:32-p5:32:32-i64:64-i128:128-v16:16-v32:"
+ "32-n16:32:64-T0");
else
- resetDataLayout("e-i64:64-i128:128-v16:16-v32:32-n16:32:64");
+ resetDataLayout("e-i64:64-i128:128-v16:16-v32:32-n16:32:64-T0");
// If possible, get a TargetInfo for our host triple, so we can match its
// types.
diff --git a/clang/test/CodeGen/target-data.c b/clang/test/CodeGen/target-data.c
index 41cbd5a0219d5e..a3d1a8cb9ee234 100644
--- a/clang/test/CodeGen/target-data.c
+++ b/clang/test/CodeGen/target-data.c
@@ -160,11 +160,11 @@
// RUN: %clang_cc1 -triple nvptx-unknown -o - -emit-llvm %s | \
// RUN: FileCheck %s -check-prefix=NVPTX
-// NVPTX: target datalayout = "e-p:32:32-i64:64-i128:128-v16:16-v32:32-n16:32:64"
+// NVPTX: target datalayout = "e-p:32:32-i64:64-i128:128-v16:16-v32:32-n16:32:64-T0"
// RUN: %clang_cc1 -triple nvptx64-unknown -o - -emit-llvm %s | \
// RUN: FileCheck %s -check-prefix=NVPTX64
-// NVPTX64: target datalayout = "e-i64:64-i128:128-v16:16-v32:32-n16:32:64"
+// NVPTX64: target datalayout = "e-i64:64-i128:128-v16:16-v32:32-n16:32:64-T0"
// RUN: %clang_cc1 -triple r600-unknown -o - -emit-llvm %s | \
// RUN: FileCheck %s -check-prefix=R600
@@ -176,12 +176,12 @@
// RUN: %clang_cc1 -triple amdgcn-unknown -target-cpu hawaii -o - -emit-llvm %s \
// RUN: | FileCheck %s -check-prefix=R600SI
-// R600SI: target datalayout = "e-p:64:64-p1:64:64-p2:32:32-p3:32:32-p4:64:64-p5:32:32-p6:32:32-p7:160:256:256:32-p8:128:128-p9:192:256:256:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-S32-A5-G1-ni:7:8:9"
+// R600SI: target datalayout = "e-p:64:64-p1:64:64-p2:32:32-p3:32:32-p4:64:64-p5:32:32-p6:32:32-p7:160:256:256:32-p8:128:128-p9:192:256:256:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-S32-A5-G1-ni:7:8:9-T0"
// Test default -target-cpu
// RUN: %clang_cc1 -triple amdgcn-unknown -o - -emit-llvm %s \
// RUN: | FileCheck %s -check-prefix=R600SIDefault
-// R600SIDefault: target datalayout = "e-p:64:64-p1:64:64-p2:32:32-p3:32:32-p4:64:64-p5:32:32-p6:32:32-p7:160:256:256:32-p8:128:128-p9:192:256:256:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-S32-A5-G1-ni:7:8:9"
+// R600SIDefault: target datalayout = "e-p:64:64-p1:64:64-p2:32:32-p3:32:32-p4:64:64-p5:32:32-p6:32:32-p7:160:256:256:32-p8:128:128-p9:192:256:256:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-S32-A5-G1-ni:7:8:9-T0"
// RUN: %clang_cc1 -triple arm64-unknown -o - -emit-llvm %s | \
// RUN: FileCheck %s -check-prefix=AARCH64
diff --git a/clang/test/CodeGenOpenCL/amdgpu-env-amdgcn.cl b/clang/test/CodeGenOpenCL/amdgpu-env-amdgcn.cl
index bb52f87615214b..b4b246cc082e00 100644
--- a/clang/test/CodeGenOpenCL/amdgpu-env-amdgcn.cl
+++ b/clang/test/CodeGenOpenCL/amdgpu-env-amdgcn.cl
@@ -1,5 +1,5 @@
// RUN: %clang_cc1 %s -O0 -triple amdgcn -emit-llvm -o - | FileCheck %s
// RUN: %clang_cc1 %s -O0 -triple amdgcn---opencl -emit-llvm -o - | FileCheck %s
-// CHECK: target datalayout = "e-p:64:64-p1:64:64-p2:32:32-p3:32:32-p4:64:64-p5:32:32-p6:32:32-p7:160:256:256:32-p8:128:128-p9:192:256:256:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-S32-A5-G1-ni:7:8:9"
+// CHECK: target datalayout = "e-p:64:64-p1:64:64-p2:32:32-p3:32:32-p4:64:64-p5:32:32-p6:32:32-p7:160:256:256:32-p8:128:128-p9:192:256:256:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-S32-A5-G1-ni:7:8:9-T0"
void foo(void) {}
diff --git a/lld/test/ELF/lto/amdgcn-oses.ll b/lld/test/ELF/lto/amdgcn-oses.ll
index 7a74d0317f2b9e..8903b45565b41f 100644
--- a/lld/test/ELF/lto/amdgcn-oses.ll
+++ b/lld/test/ELF/lto/amdgcn-oses.ll
@@ -25,7 +25,7 @@
;--- amdhsa.ll
target triple = "amdgcn-amd-amdhsa"
-target datalayout = "e-p:64:64-p1:64:64-p2:32:32-p3:32:32-p4:64:64-p5:32:32-p6:32:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-S32-A5"
+target datalayout = "e-p:64:64-p1:64:64-p2:32:32-p3:32:32-p4:64:64-p5:32:32-p6:32:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-S32-A5-T0"
!llvm.module.flags = !{!0}
!0 = !{i32 1, !"amdhsa_code_object_version", i32 500}
@@ -36,7 +36,7 @@ define void @_start() {
;--- amdpal.ll
target triple = "amdgcn-amd-amdpal"
-target datalayout = "e-p:64:64-p1:64:64-p2:32:32-p3:32:32-p4:64:64-p5:32:32-p6:32:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-S32-A5"
+target datalayout = "e-p:64:64-p1:64:64-p2:32:32-p3:32:32-p4:64:64-p5:32:32-p6:32:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-S32-A5-T0"
define amdgpu_cs void @_start() {
ret void
@@ -44,7 +44,7 @@ define amdgpu_cs void @_start() {
;--- mesa3d.ll
target triple = "amdgcn-amd-mesa3d"
-target datalayout = "e-p:64:64-p1:64:64-p2:32:32-p3:32:32-p4:64:64-p5:32:32-p6:32:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-S32-A5"
+target datalayout = "e-p:64:64-p1:64:64-p2:32:32-p3:32:32-p4:64:64-p5:32:32-p6:32:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-S32-A5-T0"
define void @_start() {
ret void
diff --git a/lld/test/ELF/lto/amdgcn.ll b/lld/test/ELF/lto/amdgcn.ll
index 4281e209fd9789..bcfbc272afdf9a 100644
--- a/lld/test/ELF/lto/amdgcn.ll
+++ b/lld/test/ELF/lto/amdgcn.ll
@@ -5,7 +5,7 @@
; Make sure the amdgcn triple is handled
target triple = "amdgcn-amd-amdhsa"
-target datalayout = "e-p:64:64-p1:64:64-p2:32:32-p3:32:32-p4:64:64-p5:32:32-p6:32:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-S32-A5"
+target datalayout = "e-p:64:64-p1:64:64-p2:32:32-p3:32:32-p4:64:64-p5:32:32-p6:32:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-S32-A5-T0"
define void @_start() {
ret void
diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index 144b4497ca63ce..dc9b6b9c9ccbf8 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -354,7 +354,7 @@ added in the future:
not be used lightly but only for specific situations such as an
alternative to the *register pinning* performance technique often
used when implementing functional programming languages. At the
- moment only X86, AArch64, and RISCV support this convention. The
+ moment only X86, AArch64, and RISCV support this convention. The
following limitations exist:
- On *X86-32* only up to 4 bit type parameters are supported. No
@@ -685,10 +685,10 @@ implementation defined, the optimizer can't do the latter. The former is
challenging as many commonly expected properties, such as
``ptrtoint(v)-ptrtoint(v) == 0``, don't hold for non-integral types.
Similar restrictions apply to intrinsics that might examine the pointer bits,
-such as :ref:`llvm.ptrmask<int_ptrmask>`.
+such as :ref:`llvm.ptrmask<int_ptrmask>`.
The alignment information provided by the frontend for a non-integral pointer
-(typically using attributes or metadata) must be valid for every possible
+(typically using attributes or metadata) must be valid for every possible
representation of the pointer.
.. _globalvars:
@@ -1677,10 +1677,10 @@ Currently, only the following parameter attributes are defined:
- The range is allowed to wrap.
- The empty range is represented using ``0,0``.
- Otherwise, ``a`` and ``b`` are not allowed to be equal.
-
- This attribute may only be applied to parameters or return values with integer
+
+ This attribute may only be applied to parameters or return values with integer
or vector of integer types.
-
+
For vector-typed parameters, the range is applied element-wise.
.. _gc:
@@ -3050,6 +3050,19 @@ as follows:
address space 0, this property only affects the default value to be used
when creating globals without additional contextual information (e.g. in
LLVM passes).
+``T<address space>``
+ Specifies the address space for a target's 'flat' address space. Note this
+ is not necessarily the same as addrspace 0, which LLVM sometimes refers to
+ as the generic address space. The flat address space is a generic address
+ space that can be used access multiple segments of memory with different
+ address spaces. Access of a memory location through a pointer with this
+ address space is expected to be legal but slower compared to the same memory
+ location accessed through a pointer with a different address space. This is
+ for targets with different pointer representations which can be converted
+ with the addrspacecast instruction. If a pointer is converted to this
+ address space, optimizations should attempt to replace the access with the
+ source address space. The absence of this specification indicates the target
+ does not have such a flat address space to optimize away.
.. _alloca_addrspace:
@@ -14346,7 +14359,7 @@ Arguments:
""""""""""
The first 4 arguments are similar to ``llvm.instrprof.increment``. The indexing
is specific to callsites, meaning callsites are indexed from 0, independent from
-the indexes used by the other intrinsics (such as
+the indexes used by the other intrinsics (such as
``llvm.instrprof.increment[.step]``).
The last argument is the called value of the callsite this intrinsic precedes.
@@ -14360,7 +14373,7 @@ a buffer LLVM can use to perform counter increments (i.e. the lowering of
``llvm.instrprof.increment[.step]``. The address range following the counter
buffer, ``<num-counters>`` x ``sizeof(ptr)`` - sized, is expected to contain
pointers to contexts of functions called from this function ("subcontexts").
-LLVM does not dereference into that memory region, just calculates GEPs.
+LLVM does not dereference into that memory region, just calculates GEPs.
The lowering of ``llvm.instrprof.callsite`` consists of:
@@ -14929,8 +14942,8 @@ integer bit width or any vector of integer elements.
Overview:
"""""""""
-Return ``-1`` if ``%a`` is signed less than ``%b``, ``0`` if they are equal, and
-``1`` if ``%a`` is signed greater than ``%b``. Vector intrinsics operate on a per-element basis.
+Return ``-1`` if ``%a`` is signed less than ``%b``, ``0`` if they are equal, and
+``1`` if ``%a`` is signed greater than ``%b``. Vector intrinsics operate on a per-element basis.
Arguments:
""""""""""
@@ -14958,8 +14971,8 @@ integer bit width or any vector of integer elements.
Overview:
"""""""""
-Return ``-1`` if ``%a`` is unsigned less than ``%b``, ``0`` if they are equal, and
-``1`` if ``%a`` is unsigned greater than ``%b``. Vector intrinsics operate on a per-element basis.
+Return ``-1`` if ``%a`` is unsigned less than ``%b``, ``0`` if they are equal, and
+``1`` if ``%a`` is unsigned greater than ``%b``. Vector intrinsics operate on a per-element basis.
Arguments:
""""""""""
@@ -21556,9 +21569,9 @@ Semantics:
""""""""""
The '``llvm.vp.minimum``' intrinsic performs floating-point minimum (:ref:`minimum <i_minimum>`)
-of the first and second vector arguments on each enabled lane, the result being
+of the first and second vector arguments on each enabled lane, the result being
NaN if either argument is a NaN. -0.0 is considered to be less than +0.0 for this
-intrinsic. The result on disabled lanes is a :ref:`poison value <poisonvalues>`.
+intrinsic. The result on disabled lanes is a :ref:`poison value <poisonvalues>`.
The operation is performed in the default floating-point environment.
Examples:
@@ -29191,7 +29204,7 @@ Semantics:
""""""""""
The intrinsic ``@llvm.allow.ubsan.check()`` returns either ``true`` or
-``false``, depending on compiler options.
+``false``, depending on compiler options.
For each evaluation of a call to this intrinsic, the program must be valid and
correct both if it returns ``true`` and if it returns ``false``.
@@ -29250,13 +29263,13 @@ Semantics:
""""""""""
The intrinsic ``@llvm.allow.runtime.check()`` returns either ``true`` or
-``false``, depending on compiler options.
+``false``, depending on compiler options.
For each evaluation of a call to this intrinsic, the program must be valid and
correct both if it returns ``true`` and if it returns ``false``.
When used in a branch condition, it allows us to choose between
-two alternative correct solutions for the same problem.
+two alternative correct solutions for the same problem.
If the intrinsic is evaluated as ``true``, program should execute a guarded
check. If the intrinsic is evaluated as ``false``, the program should avoid any
diff --git a/llvm/docs/ReleaseNotes.rst b/llvm/docs/ReleaseNotes.rst
index 52456896f2fc6c..c4edbc6daff774 100644
--- a/llvm/docs/ReleaseNotes.rst
+++ b/llvm/docs/ReleaseNotes.rst
@@ -56,6 +56,9 @@ Changes to the LLVM IR
* Added ``usub_cond`` and ``usub_sat`` operations to ``atomicrmw``.
+* Added ``T<address space>`` to data layout to represent flat address space if a
+ target has it.
+
Changes to LLVM infrastructure
------------------------------
diff --git a/llvm/include/llvm/IR/DataLayout.h b/llvm/include/llvm/IR/DataLayout.h
index 8f7ab2f9df389e..5ee388a5d059dd 100644
--- a/llvm/include/llvm/IR/DataLayout.h
+++ b/llvm/include/llvm/IR/DataLayout.h
@@ -94,6 +94,7 @@ class DataLayout {
unsigned AllocaAddrSpace = 0;
unsigned ProgramAddrSpace = 0;
unsigned DefaultGlobalsAddrSpace = 0;
+ unsigned FlatAddressSpace = ~0U;
MaybeAlign StackNaturalAlign;
MaybeAlign FunctionPtrAlign;
@@ -245,6 +246,7 @@ class DataLayout {
unsigned getDefaultGlobalsAddressSpace() const {
return DefaultGlobalsAddrSpace;
}
+ unsigned getFlatAddressSpace() const { return FlatAddressSpace; }
bool hasMicrosoftFastStdCallMangling() const {
return ManglingMode == MM_WinCOFFX86;
diff --git a/llvm/lib/IR/AutoUpgrade.cpp b/llvm/lib/IR/AutoUpgrade.cpp
index 69dae5e32dbbe8..a08700d4585985 100644
--- a/llvm/lib/IR/AutoUpgrade.cpp
+++ b/llvm/lib/IR/AutoUpgrade.cpp
@@ -5451,6 +5451,10 @@ std::string llvm::UpgradeDataLayoutString(StringRef DL, StringRef TT) {
if (!DL.contains("-p9") && !DL.starts_with("p9"))
Res.append("-p9:192:256:256:32");
+ // Add flat address space.
+ if (!DL.contains("-T0") && !DL.starts_with("T0"))
+ Res.append("-T0");
+
return Res;
}
@@ -5501,6 +5505,12 @@ std::string llvm::UpgradeDataLayoutString(StringRef DL, StringRef TT) {
Res = (Ref.take_front(I) + "-f80:128-" + Ref.drop_front(I + 8)).str();
}
+ if (T.isNVPTX()) {
+ // Add flat address space.
+ if (!DL.contains("-T0") && !DL.starts_with("T0"))
+ Res.append("-T0");
+ }
+
return Res;
}
diff --git a/llvm/lib/IR/DataLayout.cpp b/llvm/lib/IR/DataLayout.cpp
index d295d1f5785eb9..51396241515647 100644
--- a/llvm/lib/IR/DataLayout.cpp
+++ b/llvm/lib/IR/DataLayout.cpp
@@ -228,6 +228,7 @@ DataLayout &DataLayout::operator=(const DataLayout &Other) {
AllocaAddrSpace = Other.AllocaAddrSpace;
ProgramAddrSpace = Other.ProgramAddrSpace;
DefaultGlobalsAddrSpace = Other.DefaultGlobalsAddrSpace;
+ FlatAddressSpace = Other.FlatAddressSpace;
StackNaturalAlign = Other.StackNaturalAlign;
FunctionPtrAlign = Other.FunctionPtrAlign;
TheFunctionPtrAlignType = Other.TheFunctionPtrAlignType;
@@ -250,6 +251,7 @@ bool DataLayout::operator==(const DataLayout &Other) const {
AllocaAddrSpace == Other.AllocaAddrSpace &&
ProgramAddrSpace == Other.ProgramAddrSpace &&
DefaultGlobalsAddrSpace == Other.DefaultGlobalsAddrSpace &&
+ FlatAddressSpace == Other.FlatAddressSpace &&
StackNaturalAlign == Other.StackNaturalAlign &&
FunctionPtrAlign == Other.FunctionPtrAlign &&
TheFunctionPtrAlignType == Other.TheFunctionPtrAlignType &&
@@ -568,6 +570,13 @@ Error DataLayout::parseSpecification(StringRef Spec) {
return Err;
break;
}
+ case 'T': { // Flat address space.
+ if (Rest.empty())
+ return createSpecFormatError("T<address space>");
+ if (Error Err = parseAddrSpace(Rest, FlatAddressSpace))
+ return Err;
+ break;
+ }
case 'm':
if (!Rest.consume_front(":") || Rest.empty())
return createSpecFormatError("m:<mangling>");
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
index f860b139945122..2e26cb969206f5 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
@@ -579,7 +579,7 @@ static StringRef computeDataLayout(const Triple &TT) {
"-p7:160:256:256:32-p8:128:128-p9:192:256:256:32-i64:64-v16:16-v24:32-"
"v32:32-v48:64-v96:"
"128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-S32-A5-"
- "G1-ni:7:8:9";
+ "G1-ni:7:8:9-T0";
}
LLVM_READNONE
diff --git a/llvm/lib/Target/NVPTX/NVPTXTargetMachine.cpp b/llvm/lib/Target/NVPTX/NVPTXTargetMachine.cpp
index 57b7fa783c14a7..eb7f29655e6441 100644
--- a/llvm/lib/Target/NVPTX/NVPTXTargetMachine.cpp
+++ b/llvm/lib/Target/NVPTX/NVPTXTargetMachine.cpp
@@ -117,7 +117,7 @@ static std::string computeDataLayout(bool is64Bit, bool UseShortPointers) {
else if (UseShortPointers)
Ret += "-p3:32:32-p4:32:32-p5:32:32";
- Ret += "-i64:64-i128:128-v16:16-v32:32-n16:32:64";
+ Ret += "-i64:64-i128:128-v16:16-v32:32-n16:32:64-T0";
return Ret;
}
diff --git a/llvm/unittests/Bitcode/DataLayoutUpgradeTest.cpp b/llvm/unittests/Bitcode/DataLayoutUpgradeTest.cpp
index ca50187e5e5ee0..85029297a72df2 100644
--- a/llvm/unittests/Bitcode/DataLayoutUpgradeTest.cpp
+++ b/llvm/unittests/Bitcode/DataLayoutUpgradeTest.cpp
@@ -36,12 +36,12 @@ TEST(DataLayoutUpgradeTest, ValidDataLayoutUpgrade) {
// Check that AMDGPU targets add -G1 if it's not present.
EXPECT_EQ(UpgradeDataLayoutString("e-p:32:32", "r600"), "e-p:32:32-G1");
// and that ANDGCN adds p7 and p8 as well.
- EXPECT_EQ(
- UpgradeDataLayoutString("e-p:64:64", "amdgcn"),
- "e-p:64:64-G1-ni:7:8:9-p7:160:256:256:32-p8:128:128-p9:192:256:256:32");
- EXPECT_EQ(
- UpgradeDataLayoutString("e-p:64:64-G1", "amdgcn"),
- "e-p:64:64-G1-ni:7:8:9-p7:160:256:256:32-p8:128:128-p9:192:256:256:32");
+ EXPECT_EQ(UpgradeDataLayoutString("e-p:64:64", "amdgcn"),
+ "e-p:64:64-G1-ni:7:8:9-p7:160:256:256:32-p8:128:128-p9:192:256:256:"
+ "32-T0");
+ EXPECT_EQ(UpgradeDataLayoutString("e-p:64:64-G1", "amdgcn"),
+ "e-p:64:64-G1-ni:7:8:9-p7:160:256:256:32-p8:128:128-p9:192:256:256:"
+ "32-T0");
// but that r600 does not.
EXPECT_EQ(UpgradeDataLayoutString("e-p:32:32-G1", "r600"), "e-p:32:32-G1");
@@ -56,7 +56,7 @@ TEST(DataLayoutUpgradeTest, ValidDataLayoutUpgrade) {
"e-p:64:64-p1:64:64-p2:32:32-p3:32:32-p4:64:64-p5:32:32-p6:32:32-i64:64-"
"v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:"
...
[truncated]
|
@@ -579,7 +579,7 @@ static StringRef computeDataLayout(const Triple &TT) { | |||
"-p7:160:256:256:32-p8:128:128-p9:192:256:256:32-i64:64-v16:16-v24:32-" | |||
"v32:32-v48:64-v96:" | |||
"128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-S32-A5-" | |||
"G1-ni:7:8:9"; | |||
"G1-ni:7:8:9-T0"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does r600 have flat address space?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, but yes. We probably should just define 0 to be the flat address space and take the same numbers as amdgcn. Flat will just be unsupported in codegen (but theoretically someone could go implement software tagged pointers)
56b15c3
to
0518fe9
Compare
Is there some github issue or discourse thread that provides context for this change? My understanding is that the default LLVM IR address space (*) is flat. Is that currently the case or is my understanding incorrect? Is this PR proposing that (*) The address space C++/Rust/... pointers map to, i.e., |
I updated the PR description. Hopefully it can make the motivation of this patch more clear. I think the central question is, why it can't be just address space 0. I don't have a clear answer to that. @arsenm first introduced this concept to LLVM at least 8 years ago. I agree that the only two targets that have this concept are AMDGPU and NVPTX, and they both set it to address space 0. My understanding is, LLVM doesn't force any other characteristics to address 0 in addition to just making it as the default one if there is no other specified for the allocation of certain variables, so we can't assert or assume how a target needs to implement it. |
There are targets that use a different integer to denote flat (e.g. see SPIR & SPIR-V). Whilst I know that there are objections to that, the fact remains that they had historical reason (wanted to make legacy OCL convention that the default is private work, and given that IR defaults to 0 this was an easy, if possibly costly, way out; AMDGPU also borks this for legacy OCL reasons, which has been a source of pain). I think the core matter, which we also talked about in #95728, is that LLVM defaults to 0 and posits some guarantees around that as per LangRef, but it doesn't say much of anything about guaranteed convertibility to / from 0. Which turns out to be a problem for targets / pseudo-targets which do not also use 0 to denote flat/generic, as a number of constructs that merely rely on getting the default end up needing a cast. @efriedma-quic made the point that in general there should / would always be a correct AS and the solution is to emit these in the right AS from the get-go, but the problem is somewhat pervasive. |
The description of the flat address space in the
Why not? E.g. if we have a processor with Harvard architecture, there may conceivably be multiple 'flat', non-overlapping address spaces, each covering a set of specific ASes. I think conceptually we should describe AS hierarchy explicitly, and avoid the assumptions on their number or layout. ... and then change the API to |
+1 to @efriedma-quic and @jdoerfert's comments. DataLayout should remain as generic as possible. Trying to encode a concept of "the flat address space" in it seems way too specific to one optimization for one or two targets. |
It's a good idea to make it more generic. Potentially we can add an API like |
This isn't purely a nice to have optimisation aid though, is it? There are cases where you need a safe (portable?) default that the target can perhaps optimise, but, absent that, would at least work, and at the moment there's no handy way to query that generically (or from Clang). We do handwave 0 as being that safe default, and hope for the best, but as mentioned that relies on targets using 0 to correspond to flat/generic/whatever we call it, which they are not bound to do. To me, adding this is not entirely different from encoding the Alloca AS, which we already do. |
We've avoided putting optimization properties in the DataLayout in the past. The interactions with LTO mean we want the DataLayout for a given subtarget to be stable. In this particular case, a target's "flat" address-space can't really ever change, though, so it's probably fine.
The use-cases currently under discussion, and LangRef itself, don't require the flat address-space to have any particular semantics; the only property it has is "operations using it are slower". What semantics do you need, and where do you need them? |
Your argument is my point 3), right? #108786 (comment) Why is that a DL property and not just a target property? Or even just "computable" by asking query 1) for different ASs? |
It is indeed a target property. DL is one choice here (if it is not the most preferable one). Like I mentioned before, TTI and
IIUC, the query 1) you were referring to was |
ping |
I don't know what I was thinking of, the above or I am personally not opposed to the DL solution, but I guess it's a big change that feels too much for me to approve. I'm also worried the "flat" concept is a special case of something more generic and we should instead define that. E.g.,
|
In reply to your question above re whether this is a DL or a Target property, I don't have a strong opinion there (it appears @shiltian and @arsenm might). I do believe that this is a necessary bit of query-able information, especially from a Clang, for correctness reasons (more on that below).
To make this generically useful, it probably should, but I don't think the current words imply it.
Ah, this is part of the challenge - we do indeed assume that 0 is flat, but Targets aren't bound by LangRef to use 0 to denote flat (and some, like SPIR / SPIR-V) do not, but rather problematically use 0 to denote Private, which is extremely constrained. This leads to problematic IR, due to the assumption you mentioned. An example I ran into the other day is Perhaps another solution(?) here would be to tweak LangRef to be binding in that it spells out that iff a particular Target has a Flat AS, that will be reflected in LLVM as AS0? |
Like @AlexVlx mentioned, as well as the wording I put into
I don't think this is technically true, but it is unfortunately practically true. There is no formal definition of what a flat AS is (except the one in TTI which is not widely used at all), which is the central question of this PR. That means all the "assumption everywhere" is basically assumption of any means for its own purpose. The current AS 0 is nothing special but just taken as the default AS for various object allocation if no explicit AS is used. That's all.
I'm fine with adding the enforcement in LLVM that AS0 needs to be the flat AS, if a target has it, but the definition of a flat AS still needs to be set. If we do that, how will SPIR/SPIR-V work? I think to have a clear definition of what a flat AS is critical, no matter if we want to add a new specification, as what is done is this PR, or add the wording into LLVM LangRef, enforcing it to AS 0. I'd like to give it a try here. ⬇️ A flat address space is an address space with the following characteristics:
All other characteristics of the flat address space will align with those of the original existing address space. This is the most generic wording I can come up with so far. Happy to hear more feedbacks. |
The SPIRV IR would be better off changing its numbers around like we did in AMDGPU ages ago. The only concern would be bitcode compatibility, but given it's still an "experimental target" that shouldn't be an issue.
This is only a broken in-clang hack, the backend IR always uses the correct address space |
This should change nothing about aliasing. The IR assumption is any address space may alias any other |
Right, this is more of an undesirable address space. Optimizations don't need to know anything about its behavior beyond that.
I don't really like putting this in the DataLayout. My original idea was to move it to TargetMachine, but we want to avoid the dependence on CodeGen. The DataLayout is just the other place we have that defines module level target information. The simple solution is just have a switch over the target architecture in Attributor.
I don't think this buys frontends much. Clang still needs to understand the full language address space -> target address space mapping. This would just allow populating one entry generically
As I mentioned above, SPIRV can just work its way out of this problem for its IR. SPIR's only reason for existence is bitcode compatibility, so doing anything with there will be quite a lot of work which will never realistically happen.
I would like to avoid adding additional special properties to AS0, or defining the flat concept. |
How can we add a new specification w/o defining it?
That means in the future when a new (GPU/accelerator) target is added, someone has to know that is one place to add a new case. FWIW, it is not just Attributor. The situation applies to all passes that potentially don't have access to TTI/TM. I don't really like the solution, but that is what I'm doing to move other PRs forward w/o being blocked by this one. |
By not defining it in terms of flat addressing. Just make it the undesirable address space |
But that will still require to define, what is undesirable address space right? |
The LangRef doesn't need to know why it's undesirable. It's like the n field |
|
Yes. It's an optimization hint |
…ddress space if a target supports it
cad9ac7
to
c2be824
Compare
T<address space>
to DataLayout
to represent flat address space if a target supports itU<address space>
to DataLayout
to represent undesirable address space if a target has it
This is all fine, let's ignore SPIR-V for a second (SPIR being relevant is debatable, and I'm not sure what its reason for being was, but I'm pretty sure nothing is being done with it), and assume that that gets fixed. What is there to prevent some other target coming in and doing the same thing (using 0 to represent some constrained / problematic AS), without placing any sort of additional constraints on 0 (the absence thereof being an issue as it stands), and without giving the FE a mechanism to at least attempt to get flat/flat-like AS? More specifically, how do you expect Clang to figure this out when e.g. compiling C++ for some AS rich target that decided to default to something odd (so querying the correspondent for LangAS::Default is not useful) and uses 0 to denote a special purpose AS (so the many uses of getUnqual/get with the AS argument defaulted end up doing something wrong)? There isn't an awful lot of guidance being provided, you don't want to update LangRef, and the "undesirable" wording is rather opaque to put it mildly. |
I do not expect clang to be using the datalayout to decide anything. That is not really the purpose of the datalayout. It does not provide semantic information to the frontend. The frontend needs to directly understand the source language semantics and the address spaces of the target. |
So your solution here is to simply "infect" every language with full address space awareness? Seems problematic. Furthermore, this stance on DataLayout is somewhat curious, considering we do use it precisely for that, in Clang, around things like globals and |
My impression is that the treatment of address-spaces varies so much across different languages and different targets that it doesn't make sense to encode the entire address-space layout the datalayout. Narrowly encoding just the specific properties optimizations need means we don't need to rework optimizations and the datalayout for each new generation of GPU/accelerator. And developers working on IR optimizations don't need to try to decipher the implications of different address-spaces. |
It's similar to, but a simpler version of the ABI emission problem. The frontend has to know properties of the target to emit the IR for it. It's infeasible to encode all of the possible address space information in a simple datalayout string.
Where is the datalayout providing semantics in clang? The clang usage of the datalayout should be limited to computing type sizes and other low level ABI details (not semantics) |
It seems like we can’t reach consensus on how to move forward with this. I already worked my issue around so I guess we no longer need this PR. |
This patch adds undesirable address space to
DataLayout
if a target supports it. It can be set viaU<address space>
.Motivation
First and foremost: The flat address space is an existing concept in LLVM.
llvm-project/llvm/include/llvm/Analysis/TargetTransformInfo.h
Line 454 in 27a62ec
Having the query function
getFlatAddressSpace
as a member function ofTargetTransformInfo
means if we want to query it, we need to have aFunction &
because bothTargetTransformInfoWrapperPass
andTargetIRAnalysis
are function pass. However, this is not always feasible. For example, in the module passAttributor
, those target independentAttributorAttributes
can potentially just analyze aConstant
(which means it only has access toConstant *
). In this case, we don't have aFunction &
to get TTI. A detailed discussion and use case can be found here.If the above issue is the symptom, then the root cause is, it should not be in TTI at the first place, because it is really a target or module level property. We can't put it in
TargetMachine
either because that would introduce circular dependence (see here for more information).Data layout is the perfect place to encode this information because the definition of various address spaces are defined there already. In this way, we can query the information just through a reference to
DataLayout
, which is widely available almost everywhere.Q&A
Isn't address space 0 flat?
Based on existing code, not necessarily.
Can there be more than one flat address space?
No.
If you specify a flat address space, does that mean that all other address spaces are not flat, and thus cannot alias with other address spaces?
Yes, all other address spaces are not flat. A flat address space pointer can still point to the same place as a non-flat address space pointer, so it doesn't guarantee no alias.
Is it optional for a target to provide a flat address space?
Yes. Similar to existing address space concept, a target doesn't need to have a flat address space.
Update:
The PR has been updated to call it undesirable address space w/o giving it a concrete definition. It is up to targets how to interpret undesirable address space. The LangRef only defines the minimum requirement to be qualified as undesirable address space.