Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bpftool sync 2024-03-26 #138

Merged
merged 17 commits into from
Mar 26, 2024

Conversation

qmonnet
Copy link
Member

@qmonnet qmonnet commented Mar 26, 2024

Pull latest libbpf from mirror and sync bpftool repo with kernel, up to the commits used for libbpf sync. This is an automatic update performed by calling the sync script from this repo:

$ ./scripts/sync-kernel.sh . <path/to/>linux

Additionally, update the patch with the differences in the Makefile, and add the definition of u16 to types.h for commit bpf: Disasm support for addr_space_cast instruction.

qmonnet and others added 17 commits March 26, 2024 13:02
We'll have a value cast as a u16 in src/kernel/bpf/disasm.c in a future
commit. Add the type definition to the relevant header.

Signed-off-by: Quentin Monnet <[email protected]>
Pull latest libbpf from mirror.
Libbpf version: 1.4.0
Libbpf commit:  20ea95b4505c477af3b6ff6ce9d19cee868ddc5d

Signed-off-by: Quentin Monnet <[email protected]>
It's not restricted to working with "internal" maps, it cares about any
map that can be mmap'ed. Reflect that in more succinct and generic name.

Signed-off-by: Andrii Nakryiko <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Acked-by: Quentin Monnet <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Martin KaFai Lau <[email protected]>
Introduce bpf_arena, which is a sparse shared memory region between the bpf
program and user space.

Use cases:
1. User space mmap-s bpf_arena and uses it as a traditional mmap-ed
   anonymous region, like memcached or any key/value storage. The bpf
   program implements an in-kernel accelerator. XDP prog can search for
   a key in bpf_arena and return a value without going to user space.
2. The bpf program builds arbitrary data structures in bpf_arena (hash
   tables, rb-trees, sparse arrays), while user space consumes it.
3. bpf_arena is a "heap" of memory from the bpf program's point of view.
   The user space may mmap it, but bpf program will not convert pointers
   to user base at run-time to improve bpf program speed.

Initially, the kernel vm_area and user vma are not populated. User space
can fault in pages within the range. While servicing a page fault,
bpf_arena logic will insert a new page into the kernel and user vmas. The
bpf program can allocate pages from that region via
bpf_arena_alloc_pages(). This kernel function will insert pages into the
kernel vm_area. The subsequent fault-in from user space will populate that
page into the user vma. The BPF_F_SEGV_ON_FAULT flag at arena creation time
can be used to prevent fault-in from user space. In such a case, if a page
is not allocated by the bpf program and not present in the kernel vm_area,
the user process will segfault. This is useful for use cases 2 and 3 above.

bpf_arena_alloc_pages() is similar to user space mmap(). It allocates pages
either at a specific address within the arena or allocates a range with the
maple tree. bpf_arena_free_pages() is analogous to munmap(), which frees
pages and removes the range from the kernel vm_area and from user process
vmas.

bpf_arena can be used as a bpf program "heap" of up to 4GB. The speed of
bpf program is more important than ease of sharing with user space. This is
use case 3. In such a case, the BPF_F_NO_USER_CONV flag is recommended.
It will tell the verifier to treat the rX = bpf_arena_cast_user(rY)
instruction as a 32-bit move wX = wY, which will improve bpf prog
performance. Otherwise, bpf_arena_cast_user is translated by JIT to
conditionally add the upper 32 bits of user vm_start (if the pointer is not
NULL) to arena pointers before they are stored into memory. This way, user
space sees them as valid 64-bit pointers.

Diff llvm/llvm-project#84410 enables LLVM BPF
backend generate the bpf_addr_space_cast() instruction to cast pointers
between address_space(1) which is reserved for bpf_arena pointers and
default address space zero. All arena pointers in a bpf program written in
C language are tagged as __attribute__((address_space(1))). Hence, clang
provides helpful diagnostics when pointers cross address space. Libbpf and
the kernel support only address_space == 1. All other address space
identifiers are reserved.

rX = bpf_addr_space_cast(rY, /* dst_as */ 1, /* src_as */ 0) tells the
verifier that rX->type = PTR_TO_ARENA. Any further operations on
PTR_TO_ARENA register have to be in the 32-bit domain. The verifier will
mark load/store through PTR_TO_ARENA with PROBE_MEM32. JIT will generate
them as kern_vm_start + 32bit_addr memory accesses. The behavior is similar
to copy_from_kernel_nofault() except that no address checks are necessary.
The address is guaranteed to be in the 4GB range. If the page is not
present, the destination register is zeroed on read, and the operation is
ignored on write.

rX = bpf_addr_space_cast(rY, 0, 1) tells the verifier that rX->type =
unknown scalar. If arena->map_flags has BPF_F_NO_USER_CONV set, then the
verifier converts such cast instructions to mov32. Otherwise, JIT will emit
native code equivalent to:
rX = (u32)rY;
if (rY)
  rX |= clear_lo32_bits(arena->user_vm_start); /* replace hi32 bits in rX */

After such conversion, the pointer becomes a valid user pointer within
bpf_arena range. The user process can access data structures created in
bpf_arena without any additional computations. For example, a linked list
built by a bpf program can be walked natively by user space.

Signed-off-by: Alexei Starovoitov <[email protected]>
Signed-off-by: Andrii Nakryiko <[email protected]>
Reviewed-by: Barret Rhoden <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
LLVM generates rX = addr_space_cast(rY, dst_addr_space, src_addr_space)
instruction when pointers in non-zero address space are used by the bpf
program. Recognize this insn in uapi and in bpf disassembler.

Signed-off-by: Alexei Starovoitov <[email protected]>
Signed-off-by: Andrii Nakryiko <[email protected]>
Acked-by: Kumar Kartikeya Dwivedi <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
Teach bpftool to recognize arena map type.

Signed-off-by: Alexei Starovoitov <[email protected]>
Signed-off-by: Andrii Nakryiko <[email protected]>
Acked-by: Quentin Monnet <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
LLVM automatically places __arena variables into ".arena.1" ELF section.
In order to use such global variables bpf program must include definition
of arena map in ".maps" section, like:
struct {
       __uint(type, BPF_MAP_TYPE_ARENA);
       __uint(map_flags, BPF_F_MMAPABLE);
       __uint(max_entries, 1000);         /* number of pages */
       __ulong(map_extra, 2ull << 44);    /* start of mmap() region */
} arena SEC(".maps");

libbpf recognizes both uses of arena and creates single `struct bpf_map *`
instance in libbpf APIs.
".arena.1" ELF section data is used as initial data image, which is exposed
through skeleton and bpf_map__initial_value() to the user, if they need to tune
it before the load phase. During load phase, this initial image is copied over
into mmap()'ed region corresponding to arena, and discarded.

Few small checks here and there had to be added to make sure this
approach works with bpf_map__initial_value(), mostly due to hard-coded
assumption that map->mmaped is set up with mmap() syscall and should be
munmap()'ed. For arena, .arena.1 can be (much) smaller than maximum
arena size, so this smaller data size has to be tracked separately.
Given it is enforced that there is only one arena for entire bpf_object
instance, we just keep it in a separate field. This can be generalized
if necessary later.

All global variables from ".arena.1" section are accessible from user space
via skel->arena->name_of_var.

For bss/data/rodata the skeleton/libbpf perform the following sequence:
1. addr = mmap(MAP_ANONYMOUS)
2. user space optionally modifies global vars
3. map_fd = bpf_create_map()
4. bpf_update_map_elem(map_fd, addr) // to store values into the kernel
5. mmap(addr, MAP_FIXED, map_fd)
after step 5 user spaces see the values it wrote at step 2 at the same addresses

arena doesn't support update_map_elem. Hence skeleton/libbpf do:
1. addr = malloc(sizeof SEC ".arena.1")
2. user space optionally modifies global vars
3. map_fd = bpf_create_map(MAP_TYPE_ARENA)
4. real_addr = mmap(map->map_extra, MAP_SHARED | MAP_FIXED, map_fd)
5. memcpy(real_addr, addr) // this will fault-in and allocate pages

At the end look and feel of global data vs __arena global data is the same from
bpf prog pov.

Another complication is:
struct {
  __uint(type, BPF_MAP_TYPE_ARENA);
} arena SEC(".maps");

int __arena foo;
int bar;

  ptr1 = &foo;   // relocation against ".arena.1" section
  ptr2 = &arena; // relocation against ".maps" section
  ptr3 = &bar;   // relocation against ".bss" section

Fo the kernel ptr1 and ptr2 has point to the same arena's map_fd
while ptr3 points to a different global array's map_fd.
For the verifier:
ptr1->type == unknown_scalar
ptr2->type == const_ptr_to_map
ptr3->type == ptr_to_map_value

After verification, from JIT pov all 3 ptr-s are normal ld_imm64 insns.

Signed-off-by: Andrii Nakryiko <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Signed-off-by: Andrii Nakryiko <[email protected]>
Acked-by: Quentin Monnet <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
According to a report, skeletons fail to assign shadow pointers when being
compiled with C++ programs. Unlike C doing implicit casting for void
pointers, C++ requires an explicit casting.

To support C++, we do explicit casting for each shadow pointer.

Also add struct_ops_module.skel.h to test_cpp to validate C++
compilation as part of BPF selftests.

Signed-off-by: Kui-Feng Lee <[email protected]>
Signed-off-by: Andrii Nakryiko <[email protected]>
Acked-by: Yonghong Song <[email protected]>
Acked-by: Quentin Monnet <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
Current 'bpftool link' command does not show pids, e.g.,
  $ tools/build/bpftool/bpftool link
  ...
  4: tracing  prog 23
        prog_type lsm  attach_type lsm_mac
        target_obj_id 1  target_btf_id 31320

Hack the following change to enable normal libbpf debug output,
  --- a/tools/bpf/bpftool/pids.c
  +++ b/tools/bpf/bpftool/pids.c
  @@ -121,9 +121,9 @@ int build_obj_refs_table(struct hashmap **map, enum bpf_obj_type type)
          /* we don't want output polluted with libbpf errors if bpf_iter is not
           * supported
           */
  -       default_print = libbpf_set_print(libbpf_print_none);
  +       /* default_print = libbpf_set_print(libbpf_print_none); */
          err = pid_iter_bpf__load(skel);
  -       libbpf_set_print(default_print);
  +       /* libbpf_set_print(default_print); */

Rerun the above bpftool command:
  $ tools/build/bpftool/bpftool link
  libbpf: prog 'iter': BPF program load failed: Permission denied
  libbpf: prog 'iter': -- BEGIN PROG LOAD LOG --
  0: R1=ctx() R10=fp0
  ; struct task_struct *task = ctx->task; @ pid_iter.bpf.c:69
  0: (79) r6 = *(u64 *)(r1 +8)          ; R1=ctx() R6_w=ptr_or_null_task_struct(id=1)
  ; struct file *file = ctx->file; @ pid_iter.bpf.c:68
  ...
  ; struct bpf_link *link = (struct bpf_link *) file->private_data; @ pid_iter.bpf.c:103
  80: (79) r3 = *(u64 *)(r8 +432)       ; R3_w=scalar() R8=ptr_file()
  ; if (link->type == bpf_core_enum_value(enum bpf_link_type___local, @ pid_iter.bpf.c:105
  81: (61) r1 = *(u32 *)(r3 +12)
  R3 invalid mem access 'scalar'
  processed 39 insns (limit 1000000) max_states_per_insn 0 total_states 3 peak_states 3 mark_read 2
  -- END PROG LOAD LOG --
  libbpf: prog 'iter': failed to load: -13
  ...

The 'file->private_data' returns a 'void' type and this caused subsequent 'link->type'
(insn libbpf#81) failed in verification.

To fix the issue, restore the previous BPF_CORE_READ so old kernels can also work.
With this patch, the 'bpftool link' runs successfully with 'pids'.
  $ tools/build/bpftool/bpftool link
  ...
  4: tracing  prog 23
        prog_type lsm  attach_type lsm_mac
        target_obj_id 1  target_btf_id 31320
        pids systemd(1)

Fixes: 44ba7b30e84f ("bpftool: Use a local copy of BPF_LINK_TYPE_PERF_EVENT in pid_iter.bpf.c")
Signed-off-by: Yonghong Song <[email protected]>
Signed-off-by: Andrii Nakryiko <[email protected]>
Tested-by: Quentin Monnet <[email protected]>
Reviewed-by: Quentin Monnet <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
Wire up BPF cookie for raw tracepoint programs (both BTF and non-BTF
aware variants). This brings them up to part w.r.t. BPF cookie usage
with classic tracepoint and fentry/fexit programs.

Acked-by: Stanislav Fomichev <[email protected]>
Acked-by: Eduard Zingerman <[email protected]>
Signed-off-by: Andrii Nakryiko <[email protected]>
Message-ID: <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
When trying to load the pid_iter BPF program used to iterate over the
PIDs of the processes holding file descriptors to BPF links, we would
unconditionally silence libbpf in order to keep the output clean if the
kernel does not support iterators and loading fails.

Although this is the desirable behaviour in most cases, this may hide
bugs in the pid_iter program that prevent it from loading, and it makes
it hard to debug such load failures, even in "debug" mode. Instead, it
makes more sense to print libbpf's logs when we pass the -d|--debug flag
to bpftool, so that users get the logs to investigate failures without
having to edit bpftool's source code.

Signed-off-by: Quentin Monnet <[email protected]>
Message-ID: <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Commit d510296d331a ("bpftool: Use syscall/loader program in "prog load"
and "gen skeleton" command.") added new files to the list of objects to
compile in order to build the bootstrap version of bpftool. As far as I
can tell, these objects are unnecessary and were added by mistake; maybe
a draft version intended to add support for loading loader programs from
the bootstrap version. Anyway, we can remove these object files from the
list to make the bootstrap bpftool binary a tad smaller and faster to
build.

Fixes: d510296d331a ("bpftool: Use syscall/loader program in "prog load" and "gen skeleton" command.")
Signed-off-by: Quentin Monnet <[email protected]>
Message-ID: <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Bpftool's Makefile uses $(HOST_CFLAGS) to build the bootstrap version of
bpftool, in order to pick the flags for the host (where we run the
bootstrap version) and not for the target system (where we plan to run
the full bpftool binary). But we pass too much information through this
variable.

In particular, we set HOST_CFLAGS by copying most of the $(CFLAGS); but
we do this after the feature detection for bpftool, which means that
$(CFLAGS), hence $(HOST_CFLAGS), contain all macro definitions for using
the different optional features. For example, -DHAVE_LLVM_SUPPORT may be
passed to the $(HOST_CFLAGS), even though the LLVM disassembler is not
used in the bootstrap version, and the related library may even be
missing for the host architecture.

A similar thing happens with the $(LDFLAGS), that we use unchanged for
linking the bootstrap version even though they may contains flags to
link against additional libraries.

To address the $(HOST_CFLAGS) issue, we move the definition of
$(HOST_CFLAGS) earlier in the Makefile, before the $(CFLAGS) update
resulting from the feature probing - none of which being relevant to the
bootstrap version. To clean up the $(LDFLAGS) for the bootstrap version,
we introduce a dedicated $(HOST_LDFLAGS) variable that we base on
$(LDFLAGS), before the feature probing as well.

On my setup, the following macro and libraries are removed from the
compiler invocation to build bpftool after this patch:

  -DUSE_LIBCAP
  -DHAVE_LLVM_SUPPORT
  -I/usr/lib/llvm-17/include
  -D_GNU_SOURCE
  -D__STDC_CONSTANT_MACROS
  -D__STDC_FORMAT_MACROS
  -D__STDC_LIMIT_MACROS
  -lLLVM-17
  -L/usr/lib/llvm-17/lib

Another advantage of cleaning up these flags is that displaying
available features with "bpftool version" becomes more accurate for the
bootstrap bpftool, and no longer reflects the features detected (and
available only) for the final binary.

Cc: Jean-Philippe Brucker <[email protected]>
Signed-off-by: Quentin Monnet <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Message-ID: <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
There is a difference between kernel uapi bpf.h and tools
uapi bpf.h. There is no functionality difference, but let
us sync properly to make it easy for later bpf.h update.

Signed-off-by: Yonghong Song <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
The selftests use
to tell LLVM about special pointers. For LLVM there is nothing "arena"
about them. They are simply pointers in a different address space.
Hence LLVM diff llvm/llvm-project#85161 renamed:
. macro __BPF_FEATURE_ARENA_CAST -> __BPF_FEATURE_ADDR_SPACE_CAST
. global variables in __attribute__((address_space(N))) are now
  placed in section named ".addr_space.N" instead of ".arena.N".

Adjust libbpf, bpftool, and selftests to match LLVM.

Signed-off-by: Alexei Starovoitov <[email protected]>
Signed-off-by: Andrii Nakryiko <[email protected]>
Acked-by: Stanislav Fomichev <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
Syncing latest bpftool commits from kernel repository.
Baseline bpf-next commit:   e63985ecd22681c7f5975f2e8637187a326b6791
Checkpoint bpf-next commit: 14bb1e8c8d4ad5d9d2febb7d19c70a3cf536e1e5
Baseline bpf commit:        2487007aa3b9fafbd2cb14068f49791ce1d7ede5
Checkpoint bpf commit:      443574b033876c85a35de4c65c14f7fe092222b2

Alexei Starovoitov (4):
  bpf: Introduce bpf_arena.
  bpf: Disasm support for addr_space_cast instruction.
  bpftool: Recognize arena map type
  libbpf, selftests/bpf: Adjust libbpf, bpftool, selftests to match LLVM

Andrii Nakryiko (3):
  bpftool: rename is_internal_mmapable_map into is_mmapable_map
  libbpf: Recognize __arena global variables.
  bpf: support BPF cookie in raw tracepoint (raw_tp, tp_btf) programs

Kui-Feng Lee (1):
  bpftool: Cast pointers for shadow types explicitly.

Quentin Monnet (3):
  bpftool: Enable libbpf logs when loading pid_iter in debug mode
  bpftool: Remove unnecessary source files from bootstrap version
  bpftool: Clean up HOST_CFLAGS, HOST_LDFLAGS for bootstrap bpftool

Yonghong Song (2):
  bpftool: Fix missing pids during link show
  bpf: Sync uapi bpf.h to tools directory

 docs/bpftool-map.rst        |  2 +-
 include/uapi/linux/bpf.h    | 20 ++++++++++++++++++--
 src/Makefile                | 14 ++++++--------
 src/gen.c                   | 34 ++++++++++++++++++++++++----------
 src/kernel/bpf/disasm.c     | 10 ++++++++++
 src/map.c                   |  2 +-
 src/pids.c                  | 19 ++++++++++++-------
 src/skeleton/pid_iter.bpf.c |  4 ++--
 8 files changed, 74 insertions(+), 31 deletions(-)

Signed-off-by: Quentin Monnet <[email protected]>
A recent patch has touched some portions of bpftool's Makefile that
differ between kernel's and mirror's sources. Let's update the diff with
the expected differences accordingly, to smoothen future sync ups.

Signed-off-by: Quentin Monnet <[email protected]>
@qmonnet qmonnet merged commit 20ce693 into libbpf:main Mar 26, 2024
7 checks passed
@qmonnet qmonnet deleted the bpftool-sync-2024-03-26T09-47-27.457Z branch March 26, 2024 13:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants