[IR] Introduce captures attribute (llvm#116990)
This introduces the `captures` attribute as described in:
https://discourse.llvm.org/t/rfc-improvements-to-capture-tracking/81420

This initial patch only introduces the IR/bitcode support for the
attribute and its in-memory representation as `CaptureInfo`. This will
be followed by a patch to upgrade and remove the `nocapture` attribute,
and then by actual inference/analysis support.

Based on the RFC feedback, I've used a syntax similar to the `memory`
attribute, though the only "location" that can be specified is `ret`.

I've added some pretty extensive documentation to LangRef on the
semantics. One non-obvious bit here is that using ptrtoint will not
result in a "return-only" capture, even if the ptrtoint result is only
used in the return value. Without this requirement we wouldn't be able
to continue ordinary capture analysis on the return value.
nikic authored Jan 13, 2025
1 parent d6f7f2a commit 22e9024
Showing 19 changed files with 580 additions and 10 deletions.
136 changes: 127 additions & 9 deletions llvm/docs/LangRef.rst
@@ -1397,6 +1397,42 @@ Currently, only the following parameter attributes are defined:
function, returning a pointer to allocated storage disjoint from the
storage for any other object accessible to the caller.

``captures(...)``
This attribute restricts the ways in which the callee may capture the
pointer. This is not a valid attribute for return values. This attribute
applies only to the particular copy of the pointer passed in this argument.

The argument of ``captures`` is a list of captured pointer components,
which may be ``none``, or a combination of:

- ``address``: The integral address of the pointer.
- ``address_is_null`` (subset of ``address``): Whether the address is null.
- ``provenance``: The ability to access the pointer for both read and write
after the function returns.
- ``read_provenance`` (subset of ``provenance``): The ability to access the
pointer only for reads after the function returns.

Additionally, it is possible to specify that some components are only
captured in certain locations. Currently only the return value (``ret``)
and other (default) locations are supported.

The :ref:`pointer capture section <pointercapture>` discusses these semantics
in more detail.

Some examples of how to use the attribute:

- ``captures(none)``: Pointer not captured.
- ``captures(address, provenance)``: Equivalent to omitting the attribute.
- ``captures(address)``: Address may be captured, but not provenance.
- ``captures(address_is_null)``: Only captures whether the address is null.
- ``captures(address, read_provenance)``: Both address and provenance
captured, but only for read-only access.
- ``captures(ret: address, provenance)``: Pointer captured through return
value only.
- ``captures(address_is_null, ret: address, provenance)``: The whole pointer
is captured through the return value, and additionally whether the pointer
is null is captured in some other way.
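
The "subset of" relationships in the component list can be modelled with a small bitmask. The following is a standalone sketch, not LLVM code: the bit values follow the ``CaptureComponents`` enum added to ``ModRef.h`` by this commit, while the ``Components`` enum name and the helpers ``isSubsetOf`` and ``checkComponentRelations`` are hypothetical names introduced only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Standalone model of the component bits; this is NOT the LLVM header.
enum Components : uint8_t {
  None = 0,
  AddressIsNull = 1 << 0,
  Address = (1 << 1) | AddressIsNull,      // address implies address_is_null
  ReadProvenance = 1 << 2,
  Provenance = (1 << 3) | ReadProvenance,  // provenance implies read_provenance
  All = Address | Provenance,
};

// Hypothetical helper: true if every component in Sub is also in Super.
constexpr bool isSubsetOf(uint8_t Sub, uint8_t Super) {
  return (Sub & ~Super) == 0;
}

// Hypothetical helper: checks the relationships stated in the list above.
inline void checkComponentRelations() {
  assert(isSubsetOf(AddressIsNull, Address));
  assert(isSubsetOf(ReadProvenance, Provenance));
  assert(!isSubsetOf(Address, AddressIsNull));
  // captures(address, provenance) covers every component.
  assert((Address | Provenance) == All);
  // captures(address) does not include any provenance component.
  assert((Address & Provenance) == None);
}
```

Encoding the larger component as a superset of the smaller one means that a check for "captures address" also succeeds when only ``address_is_null`` is set, which appears to be the conservative reading the attribute intends.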

.. _nocapture:

``nocapture``
@@ -3339,10 +3375,92 @@ Pointer Capture
---------------

Given a function call and a pointer that is passed as an argument or stored in
the memory before the call, a pointer is *captured* by the call if it makes a
copy of any part of the pointer that outlives the call.
To be precise, a pointer is captured if one or more of the following conditions
hold:
memory before the call, the call may capture two components of the pointer:

* The address of the pointer, which is its integral value. This also includes
parts of the address or any information about the address, including the
fact that it does not equal one specific value. We further distinguish
whether only the fact that the address is/isn't null is captured.
* The provenance of the pointer, which is the ability to perform memory
accesses through the pointer, in the sense of the :ref:`pointer aliasing
rules <pointeraliasing>`. We further distinguish whether only read accesses
are allowed, or both reads and writes.

For example, the following function captures the address of ``%a``, because
it is compared to a pointer, leaking information about the identity of the
pointer:

.. code-block:: llvm

@glb = global i8 0

define i1 @f(ptr %a) {
%c = icmp eq ptr %a, @glb
ret i1 %c
}

The function does not capture the provenance of the pointer, because the
``icmp`` instruction only operates on the pointer address. The following
function captures both the address and provenance of the pointer, as both
may be read from ``@glb`` after the function returns:

.. code-block:: llvm

@glb = global ptr null

define void @f(ptr %a) {
store ptr %a, ptr @glb
ret void
}

The following function captures *neither* the address nor the provenance of
the pointer:

.. code-block:: llvm

define i32 @f(ptr %a) {
%v = load i32, ptr %a
ret i32 %v
}

While address capture includes uses of the address within the body of the
function, provenance capture refers exclusively to the ability to perform
accesses *after* the function returns. Memory accesses within the function
itself are not considered pointer captures.

We can further say that the capture only occurs through a specific location.
In the following example, the pointer (both address and provenance) is captured
through the return value only:

.. code-block:: llvm

define ptr @f(ptr %a) {
%gep = getelementptr i8, ptr %a, i64 4
ret ptr %gep
}

However, we always consider direct inspection of the pointer address
(e.g. using ``ptrtoint``) to be location-independent. The following example
is *not* considered a return-only capture, even though the ``ptrtoint``
ultimately only contributes to the return value:

.. code-block:: llvm

@lookup = constant [4 x i8] [i8 0, i8 1, i8 2, i8 3]

define ptr @f(ptr %a) {
%a.addr = ptrtoint ptr %a to i64
%mask = and i64 %a.addr, 3
%gep = getelementptr i8, ptr @lookup, i64 %mask
ret ptr %gep
}

This definition is chosen to allow capture analysis to continue with the return
value in the usual fashion.
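
For readers more used to source code than IR, the five IR examples above correspond roughly to the following C++ functions. This is only an analogy under assumptions: the function and variable names are hypothetical, the exact IR a compiler would emit may differ, and this patch does not yet infer the attribute. The comments state the capture classification the LangRef text assigns to the matching IR.

```cpp
#include <cassert>
#include <cstdint>

static char Glb = 0;                         // stands in for the i8 global @glb
static char *GlbPtr = nullptr;               // stands in for the ptr global @glb
static const char Lookup[4] = {0, 1, 2, 3};  // stands in for @lookup

// Address captured: the result reveals whether A equals &Glb.
bool comparesAddress(const char *A) { return A == &Glb; }

// Address and provenance captured: the caller can later access through GlbPtr.
void storesPointer(char *A) { GlbPtr = A; }

// Nothing captured: memory is only read while the call is running.
int loadsOnly(const int *A) { return *A; }

// Address and provenance captured through the return value only.
char *returnsOffset(char *A) { return A + 4; }

// Not a return-only capture: the address is inspected as an integer,
// even though the integer only feeds the returned pointer.
const char *inspectsAddress(const char *A) {
  auto Addr = reinterpret_cast<std::uintptr_t>(A);
  return &Lookup[Addr & 3];
}

// Hypothetical driver exercising each function once.
inline void demo() {
  char Buf[8] = {};
  int V = 7;
  assert(!comparesAddress(Buf));
  storesPointer(Buf);
  assert(GlbPtr == Buf);
  assert(loadsOnly(&V) == 7);
  assert(returnsOffset(Buf) == Buf + 4);
  const char *P = inspectsAddress(Buf);
  assert(P >= Lookup && P < Lookup + 4);
  GlbPtr = nullptr;  // do not keep a pointer to the local buffer
}
```

The last function returns a pointer into ``Lookup``, not into the argument, so an analysis that simply followed the return value would lose track of the fact that the argument's address influenced the result.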

The following describes possible ways to capture a pointer in more detail,
where unqualified uses of the word "capture" refer to capturing both address
and provenance.

1. The call stores any bit of the pointer carrying information into a place,
and the stored bits can be read from the place by the caller after this call
@@ -3381,30 +3499,30 @@ hold:
@lock = global i1 true

define void @f(ptr %a) {
store ptr %a, ptr* @glb
store ptr %a, ptr @glb
store atomic i1 false, ptr @lock release ; %a is captured because another thread can safely read @glb
store ptr null, ptr @glb
ret void
}

3. The call's behavior depends on any bit of the pointer carrying information.
3. The call's behavior depends on any bit of the pointer carrying information
(address capture only).

.. code-block:: llvm

@glb = global i8 0

define void @f(ptr %a) {
%c = icmp eq ptr %a, @glb
br i1 %c, label %BB_EXIT, label %BB_CONTINUE ; escapes %a
br i1 %c, label %BB_EXIT, label %BB_CONTINUE ; captures address of %a only
BB_EXIT:
call void @exit()
unreachable
BB_CONTINUE:
ret void
}

4. The pointer is used in a volatile access as its address.

4. The pointer is used as the pointer operand of a volatile access.

.. _volatile:

1 change: 1 addition & 0 deletions llvm/include/llvm/AsmParser/LLParser.h
@@ -379,6 +379,7 @@ namespace llvm {
bool inAttrGrp, LocTy &BuiltinLoc);
bool parseRangeAttr(AttrBuilder &B);
bool parseInitializesAttr(AttrBuilder &B);
bool parseCapturesAttr(AttrBuilder &B);
bool parseRequiredTypeAttr(AttrBuilder &B, lltok::Kind AttrToken,
Attribute::AttrKind AttrKind);

6 changes: 6 additions & 0 deletions llvm/include/llvm/AsmParser/LLToken.h
@@ -207,6 +207,12 @@ enum Kind {
kw_inaccessiblememonly,
kw_inaccessiblemem_or_argmemonly,

// Captures attribute:
kw_address,
kw_address_is_null,
kw_provenance,
kw_read_provenance,

// nofpclass attribute:
kw_all,
kw_nan,
1 change: 1 addition & 0 deletions llvm/include/llvm/Bitcode/LLVMBitCodes.h
@@ -788,6 +788,7 @@ enum AttributeKindCodes {
ATTR_KIND_NO_EXT = 99,
ATTR_KIND_NO_DIVERGENCE_SOURCE = 100,
ATTR_KIND_SANITIZE_TYPE = 101,
ATTR_KIND_CAPTURES = 102,
};

enum ComdatSelectionKindCodes {
7 changes: 7 additions & 0 deletions llvm/include/llvm/IR/Attributes.h
@@ -284,6 +284,9 @@ class Attribute {
/// Returns memory effects.
MemoryEffects getMemoryEffects() const;

/// Returns information from captures attribute.
CaptureInfo getCaptureInfo() const;

/// Return the FPClassTest for nofpclass
FPClassTest getNoFPClass() const;

@@ -436,6 +439,7 @@ class AttributeSet {
UWTableKind getUWTableKind() const;
AllocFnKind getAllocKind() const;
MemoryEffects getMemoryEffects() const;
CaptureInfo getCaptureInfo() const;
FPClassTest getNoFPClass() const;
std::string getAsString(bool InAttrGrp = false) const;

@@ -1260,6 +1264,9 @@ class AttrBuilder {
/// Add memory effect attribute.
AttrBuilder &addMemoryAttr(MemoryEffects ME);

/// Add captures attribute.
AttrBuilder &addCapturesAttr(CaptureInfo CI);

// Add nofpclass attribute
AttrBuilder &addNoFPClassAttr(FPClassTest NoFPClassMask);

3 changes: 3 additions & 0 deletions llvm/include/llvm/IR/Attributes.td
@@ -183,6 +183,9 @@ def NoCallback : EnumAttr<"nocallback", IntersectAnd, [FnAttr]>;
/// Function creates no aliases of pointer.
def NoCapture : EnumAttr<"nocapture", IntersectAnd, [ParamAttr]>;

/// Specify how the pointer may be captured.
def Captures : IntAttr<"captures", IntersectCustom, [ParamAttr]>;

/// Function is not a source of divergence.
def NoDivergenceSource : EnumAttr<"nodivergencesource", IntersectAnd, [FnAttr]>;

101 changes: 101 additions & 0 deletions llvm/include/llvm/Support/ModRef.h
@@ -273,6 +273,107 @@ raw_ostream &operator<<(raw_ostream &OS, MemoryEffects RMRB);
// Legacy alias.
using FunctionModRefBehavior = MemoryEffects;

/// Components of the pointer that may be captured.
enum class CaptureComponents : uint8_t {
None = 0,
AddressIsNull = (1 << 0),
Address = (1 << 1) | AddressIsNull,
ReadProvenance = (1 << 2),
Provenance = (1 << 3) | ReadProvenance,
All = Address | Provenance,
LLVM_MARK_AS_BITMASK_ENUM(Provenance),
};

inline bool capturesNothing(CaptureComponents CC) {
return CC == CaptureComponents::None;
}

inline bool capturesAnything(CaptureComponents CC) {
return CC != CaptureComponents::None;
}

inline bool capturesAddressIsNullOnly(CaptureComponents CC) {
return (CC & CaptureComponents::Address) == CaptureComponents::AddressIsNull;
}

inline bool capturesAddress(CaptureComponents CC) {
return (CC & CaptureComponents::Address) != CaptureComponents::None;
}

inline bool capturesReadProvenanceOnly(CaptureComponents CC) {
return (CC & CaptureComponents::Provenance) ==
CaptureComponents::ReadProvenance;
}

inline bool capturesFullProvenance(CaptureComponents CC) {
return (CC & CaptureComponents::Provenance) == CaptureComponents::Provenance;
}

raw_ostream &operator<<(raw_ostream &OS, CaptureComponents CC);

/// Represents which components of the pointer may be captured in which
/// location. This represents the captures(...) attribute in IR.
///
/// For more information on the precise semantics see LangRef.
class CaptureInfo {
CaptureComponents OtherComponents;
CaptureComponents RetComponents;

public:
CaptureInfo(CaptureComponents OtherComponents,
CaptureComponents RetComponents)
: OtherComponents(OtherComponents), RetComponents(RetComponents) {}

CaptureInfo(CaptureComponents Components)
: OtherComponents(Components), RetComponents(Components) {}

/// Create CaptureInfo that may capture all components of the pointer.
static CaptureInfo all() { return CaptureInfo(CaptureComponents::All); }

/// Get components potentially captured by the return value.
CaptureComponents getRetComponents() const { return RetComponents; }

/// Get components potentially captured through locations other than the
/// return value.
CaptureComponents getOtherComponents() const { return OtherComponents; }

/// Get the potentially captured components of the pointer (regardless of
/// location).
operator CaptureComponents() const { return OtherComponents | RetComponents; }

bool operator==(CaptureInfo Other) const {
return OtherComponents == Other.OtherComponents &&
RetComponents == Other.RetComponents;
}

bool operator!=(CaptureInfo Other) const { return !(*this == Other); }

/// Compute union of CaptureInfos.
CaptureInfo operator|(CaptureInfo Other) const {
return CaptureInfo(OtherComponents | Other.OtherComponents,
RetComponents | Other.RetComponents);
}

/// Compute intersection of CaptureInfos.
CaptureInfo operator&(CaptureInfo Other) const {
return CaptureInfo(OtherComponents & Other.OtherComponents,
RetComponents & Other.RetComponents);
}

static CaptureInfo createFromIntValue(uint32_t Data) {
return CaptureInfo(CaptureComponents(Data >> 4),
CaptureComponents(Data & 0xf));
}

/// Convert CaptureInfo into an encoded integer value (used by captures
/// attribute).
uint32_t toIntValue() const {
return (uint32_t(OtherComponents) << 4) | uint32_t(RetComponents);
}
};

raw_ostream &operator<<(raw_ostream &OS, CaptureInfo Info);

} // namespace llvm

#endif
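
The integer packing used by ``toIntValue()`` and ``createFromIntValue()`` above can be checked in isolation. The sketch below restates the bit layout without any LLVM headers; ``Components``, ``Info``, ``encode``, ``decode``, ``anywhere`` and ``roundTripExamples`` are hypothetical stand-ins, and the mapping from the textual attribute forms in the comments to (other, ret) pairs is my reading of the LangRef examples, not something taken from the parser.

```cpp
#include <cassert>
#include <cstdint>

// Standalone restatement of the bit layout; this is NOT the LLVM header.
enum class Components : uint8_t {
  None = 0,
  AddressIsNull = (1 << 0),
  Address = (1 << 1) | AddressIsNull,
  ReadProvenance = (1 << 2),
  Provenance = (1 << 3) | ReadProvenance,
  All = Address | Provenance,
};

constexpr Components operator|(Components A, Components B) {
  return Components(uint8_t(A) | uint8_t(B));
}

// Hypothetical stand-in for CaptureInfo: components per location.
struct Info {
  Components Other;
  Components Ret;
};

// Same packing as toIntValue(): "other" in bits 4-7, "ret" in bits 0-3.
constexpr uint32_t encode(Info I) {
  return (uint32_t(I.Other) << 4) | uint32_t(I.Ret);
}

// Same unpacking as createFromIntValue().
constexpr Info decode(uint32_t Data) {
  return Info{Components(Data >> 4), Components(Data & 0xf)};
}

// Components captured in any location, like the conversion operator above.
constexpr Components anywhere(Info I) { return I.Other | I.Ret; }

// Hypothetical driver with a few encodings worked out by hand.
inline void roundTripExamples() {
  // Assumed form: captures(ret: address, provenance)
  assert(encode(Info{Components::None, Components::All}) == 0x0Fu);
  // Assumed form: captures(address_is_null, ret: address, provenance)
  assert(encode(Info{Components::AddressIsNull, Components::All}) == 0x1Fu);
  // Single-argument constructor: same components in both locations.
  assert(encode(Info{Components::Address, Components::Address}) == 0x33u);
  Info R = decode(0x1F);
  assert(R.Other == Components::AddressIsNull);
  assert(R.Ret == Components::All);
  assert(anywhere(R) == Components::All);
}
```

Four bits per location is exactly enough for the four component bits defined here, so adding a new component or location would presumably require widening this encoding.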
4 changes: 4 additions & 0 deletions llvm/lib/AsmParser/LLLexer.cpp
@@ -704,6 +704,10 @@ lltok::Kind LLLexer::LexIdentifier() {
KEYWORD(argmemonly);
KEYWORD(inaccessiblememonly);
KEYWORD(inaccessiblemem_or_argmemonly);
KEYWORD(address_is_null);
KEYWORD(address);
KEYWORD(provenance);
KEYWORD(read_provenance);

// nofpclass attribute
KEYWORD(all);