Update README.md
rcvalle committed Feb 28, 2024
1 parent e79d20f commit 00a2146
Showing 1 changed file with 46 additions and 40 deletions.
86 changes: 46 additions & 40 deletions README.md
@@ -1,15 +1,22 @@
cfi\_types crate
================
cfi\_types
==========

![Build Status](https://github.com/rcvalle/rust-cfi-types/workflows/build/badge.svg)
![Build Status](https://github.com/rcvalle/rust-crate-cfi-types/workflows/build/badge.svg)

CFI types for cross-language LLVM CFI support.


Installation
------------

To install the cfi\_types crate:
To install the `cfi_types` crate:

1. On a command prompt or terminal with your package root's directory as the
current working directory, run the following command:

cargo add cfi-types

Or:

1. Add the `cfi_types` crate to your package root's `Cargo.toml` file:

@@ -25,7 +32,7 @@ To install the cfi\_types crate:
Usage
-----

To use the cfi\_types crate:
To use the `cfi_types` crate:

1. Import the CFI types from the `cfi_types` crate. E.g.:

@@ -49,13 +56,13 @@ Background

LLVM uses [type metadata](https://llvm.org/docs/TypeMetadata.html) to allow IR
modules to aggregate pointers by their types. This type metadata is used by
LLVM Control Flow Integrity to test whether a given pointer is associated with
a type identifier (i.e., test type membership).
LLVM CFI to test whether a given pointer is associated with a type identifier
(i.e., test type membership).

Clang uses the [Itanium C++
ABI](https://itanium-cxx-abi.github.io/cxx-abi/abi.html)'s [virtual tables and
RTTI](https://itanium-cxx-abi.github.io/cxx-abi/abi.html#mangling-special-vtables)
typeinfo structure name as type metadata identifiers for function pointers.
`typeinfo` structure name as type metadata identifiers for function pointers.

For cross-language LLVM CFI support, a compatible encoding must be used. The
compatible encoding chosen for cross-language LLVM CFI support is the Itanium
@@ -68,13 +75,13 @@ document](https://rcvalle.com/docs/rust-cfi-design-doc.pdf)).

Rust defines `char` as a Unicode scalar value, while C defines `char` as an
integer type. Rust also defines explicitly-sized integer types (i.e., `i8`,
`i16`, `i32`, ...), while C defines abstract integer types (i.e., `char`,
`short`, `long`, ...), whose actual sizes are implementation defined and may
vary across different data models. This causes ambiguity if Rust integer types
are used in `extern "C"` function types that represent C functions because the
`i16`, `i32`, …), while C defines abstract integer types (i.e., `char`,
`short`, `long`, …), whose actual sizes are implementation defined and may vary
across different data models. This causes ambiguity if Rust integer types are
used in `extern "C"` function types that represent C functions because the
Itanium C++ ABI specifies encodings for C integer types (e.g., `char`, `short`,
`long`, ...), not their defined representations (e.g., 8-bit signed integer,
16-bit signed integer, 32-bit signed integer, ...).
`long`, …), not their defined representations (e.g., 8-bit signed integer,
16-bit signed integer, 32-bit signed integer, …).

For example, the Rust compiler currently is unable to identify if an

@@ -94,10 +101,10 @@ across the FFI boundary when CFI is enabled.

For convenience, Rust provides some C-like type aliases for use when
interoperating with foreign code written in C, and these C type aliases may be
used for disambiguation. However, when types are encoded, all type aliases are
already resolved to their respective `ty::Ty` type representations[15] (i.e.,
their respective Rust aliased types) making it currently impossible to identify
the use of C type aliases from their resolved types.
used for disambiguation. However, at the time types are encoded, all type
aliases are already resolved to their respective `ty::Ty` type representations
(i.e., their respective Rust aliased types), making it currently impossible to
identify the use of C type aliases from their resolved types.

For example, the Rust compiler currently is also unable to identify that an

@@ -109,12 +116,11 @@ extern "C" {
Fig. 2. Example extern "C" function using C type alias.

used the `c_long` type alias and is not able to disambiguate between it and an
`extern "C" fn func(arg: c_longlong)` in an LP64 or equivalent data model when
types are encoded.
`extern "C" fn func(arg: c_longlong)` in an LP64 or equivalent data model.
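
The alias resolution described above can be observed from ordinary Rust code. The following is a minimal sketch, assuming only the standard library's `std::ffi::c_long` and `std::ffi::c_longlong` aliases; the helper `is_same_type` is a hypothetical name introduced for illustration and is not part of the `cfi_types` crate:

```rust
use std::any::TypeId;
use std::ffi::{c_long, c_longlong};

/// Returns true if `A` and `B` are the same type after alias resolution.
/// (Hypothetical helper, for illustration only.)
fn is_same_type<A: 'static, B: 'static>() -> bool {
    TypeId::of::<A>() == TypeId::of::<B>()
}

fn main() {
    // `c_long` is a plain type alias, so it is indistinguishable from the
    // Rust integer type it resolves to on the current target.
    println!("c_long is i32: {}", is_same_type::<c_long, i32>());
    println!("c_long is i64: {}", is_same_type::<c_long, i64>());

    // In an LP64 data model both aliases resolve to i64, so nothing is left
    // to tell them apart once aliases are resolved.
    println!(
        "c_long is c_longlong: {}",
        is_same_type::<c_long, c_longlong>()
    );
}
```

On an LP64 target the last line is expected to report that the two aliases are the same type, which is the ambiguity the paragraph above describes.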

Consequently, the Rust compiler is unable to identify and correctly encode C
types in `extern "C"` function types indirectly called across the FFI boundary
when CFI is enabled.
when CFI is enabled:

```c
#include <stdio.h>
@@ -136,7 +142,7 @@ indirect_call_from_c(void (*fn)(long), long arg)
// group derived from the same type id of the fn declaration, which has the
// type id "_ZTSFvlE".
//
// Notice that since the test is at the call site and generated by Clang,
// Notice that since the test is at the call site and is generated by Clang,
// the type id used in the test is encoded by Clang.
fn(arg);
}
@@ -151,7 +157,7 @@ extern "C" {
// This declaration would have the type id "_ZTSFvlE", but at the time types
// are encoded, all type aliases are already resolved to their respective
// Rust aliased types, so this is encoded either as "_ZTSFvu3i32E" or
// "_ZTSFvu3i64E" depending on what type c_long type alias is resolved to,
// "_ZTSFvu3i64E", depending on what type c_long type alias is resolved to,
// which currently uses the u<length><type-name> vendor extended type
// encoding for the Rust integer types--this is the problem demonstrated in
// this example.
@@ -189,8 +195,9 @@ fn indirect_call(f: unsafe extern "C" fn(c_long), arg: c_long) {
// "_ZTSFvu3i32E" or "_ZTSFvu3i64E", similarly to the hello_from_c
// declaration above.
//
// Notice that since the test is at the call site and generated by the Rust
// compiler, the type id used in the test is encoded by the Rust compiler.
// Notice that since the test is at the call site and is generated by the
// Rust compiler, the type id used in the test is encoded by the Rust
// compiler.
unsafe { f(arg) }
}
@@ -218,17 +225,17 @@ fn main() {
// This demonstrates an indirect call to a function passed as a callback
// across the FFI boundary with the Rust compiler and Clang using different
// encodings for the passed-callback declaration and the test at the
// indirect call site at indirect_call_from_c (i.e., "_ZTSFvu3i32E" or
// "_ZTSFvu3i64E" vs "_ZTSFvlE").
// encodings for the hello_from_rust_again and the test at the indirect call
// site at indirect_call_from_c (i.e., "_ZTSFvu3i32E" or "_ZTSFvu3i64E" vs
// "_ZTSFvlE").
//
// When Rust functions are passed as callbacks across the FFI boundary to be
// called back from C code, the tests are also at the call site but
// generated by Clang instead, so the type ids used in the tests are encoded
// by Clang, which will not match the type ids of declarations encoded by
// the Rust compiler (e.g., hello_from_rust_again). (The same happens the
// other way around for C functions passed as callbacks across the FFI
// boundary to be called back from Rust code.)
// by Clang, which do not match the type ids of declarations encoded by the
// Rust compiler (e.g., hello_from_rust_again). (The same happens the other
// way around for C functions passed as callbacks across the FFI boundary to
// be called back from Rust code.)
unsafe {
indirect_call_from_c(hello_from_rust_again, 5);
}
@@ -240,7 +247,7 @@ encoding.
Whenever there is an indirect call across the FFI boundary or an indirect call
to a function passed as a callback across the FFI boundary, the Rust compiler
and Clang use different encodings for C integer types for function definitions
and declarations and at the indirect call sites when CFI is enabled (see Figs.
and declarations, and at indirect call sites when CFI is enabled (see Figs.
3–4).
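
The mismatch between the two encodings can be reproduced with plain string building. This is a simplified sketch, not how either compiler is implemented: the helper names `function_type_id` and `vendor_extended` are hypothetical, and only the simple function types shown in the figures are handled. It assumes the Itanium C++ ABI form `F <return> <params> E` for function types, `l` for `long`, `v` for `void`, and `u <length> <name>` for vendor extended types:

```rust
/// Builds a type id for a function type from already-encoded return and
/// parameter types: "_ZTS" + "F" + <return> + <params> + "E".
/// (Hypothetical helper; handles only the simple cases shown here.)
fn function_type_id(ret: &str, params: &[&str]) -> String {
    let mut id = String::from("_ZTSF");
    id.push_str(ret);
    for param in params {
        id.push_str(param);
    }
    id.push('E');
    id
}

/// Encodes a name as a vendor extended type: "u" + <length> + <name>.
fn vendor_extended(name: &str) -> String {
    format!("u{}{}", name.len(), name)
}

fn main() {
    // Encoding of `void (*)(long)` using the C integer type encoding.
    let c_encoding = function_type_id("v", &["l"]);

    // Encoding of the same function type after `c_long` was resolved to
    // `i64` and encoded as a vendor extended type.
    let i64_encoding = vendor_extended("i64");
    let rust_encoding = function_type_id("v", &[i64_encoding.as_str()]);

    println!("C integer type encoding:    {c_encoding}");
    println!("Rust integer type encoding: {rust_encoding}");
    println!("encodings match: {}", c_encoding == rust_encoding);
}
```

Because the two strings differ, a membership test generated with one encoding does not match a declaration encoded with the other.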


@@ -284,10 +291,9 @@ unsafe extern "C" fn hello_from_rust_again(_: c_long) {
println!("Hello from Rust again!");
}

// This definition would also have the type id "_ZTSFvPFvlElE" because it uses
// the CFI types for cross-language LLVM CFI support, similarly to the
// hello_from_c declaration above--this can be ignored for the purposes of this
// example.
// This definition also has the type id "_ZTSFvPFvlElE" because it uses the CFI
// types for cross-language LLVM CFI support, similarly to the hello_from_c
// declaration above--this can be ignored for the purposes of this example.
fn indirect_call(f: unsafe extern "C" fn(c_long), arg: c_long) {
// This indirect call site tests whether the destination pointer is a member
// of the group derived from the same type id of the f declaration, which
@@ -312,8 +318,8 @@ fn main() {

// This demonstrates an indirect call to a function passed as a callback
// across the FFI boundary with the Rust compiler and Clang using the same
// encoding for the passed-callback declaration and the test at the indirect
// call site at indirect_call_from_c (i.e., "_ZTSFvlE").
// encoding for the hello_from_rust_again and the test at the indirect call
// site at indirect_call_from_c (i.e., "_ZTSFvlE").
unsafe {
indirect_call_from_c(hello_from_rust_again, c_long(5));
}
@@ -323,7 +329,7 @@ Fig. 5. Example Rust program using Rust integer types and the Rust compiler
encoding with the cfi\_types crate types.

This new set of C types allows the Rust compiler to identify and correctly
encode C types in extern "C" function types indirectly called across the FFI
encode C types in `extern "C"` function types indirectly called across the FFI
boundary when CFI is enabled (see Fig. 5).


