Bpf/optimized usdt ci #8664

olsajiri · 2025-03-12T09:42:49Z

No description provided.

@Head

Add htmldoc annotation for the newly introduced "head_tail" member describing it to be a union of the pipe_inode_info's @Head and @tail members. Reported-by: Stephen Rothwell <[email protected]> Closes: https://lore.kernel.org/lkml/[email protected]/ Fixes: 3d25216 ("fs/pipe: Read pipe->{head,tail} atomically outside pipe->mutex") Signed-off-by: K Prateek Nayak <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>

Add the __counted_by() compiler attribute to the flexible array member buf to improve access bounds-checking via CONFIG_UBSAN_BOUNDS and CONFIG_FORTIFY_SOURCE. No functional changes intended. Signed-off-by: Thorsten Blum <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Link: https://lore.kernel.org/r/[email protected]

num_gb_pipes was set to a wrong value using r420_pipe_config This have lead to HyperZ glitches on fast Z clearing. Closes: https://bugs.freedesktop.org/show_bug.cgi?id=110897 Reviewed-by: Marek Olšák <[email protected]> Signed-off-by: Richard Thier <[email protected]> Signed-off-by: Alex Deucher <[email protected]> (cherry picked from commit 044e59a) Cc: [email protected]

always allow ih interrupt from fw on smu v14 based on the interface requirement Signed-off-by: Kenneth Feng <[email protected]> Reviewed-by: Yang Wang <[email protected]> Signed-off-by: Alex Deucher <[email protected]> (cherry picked from commit a3199eb) Cc: [email protected] # 6.12.x

While looking for incorrect users of the pipe head/tail fields (see commit c27c66a: "fs/pipe: Fix pipe_occupancy() with 16-bit indexes"), I found a bug in pipe_discard_from() that looked entirely broken. However, the fix is trivial: this buggy function isn't actually called by anything, so let's just remove it ASAP. Signed-off-by: Linus Torvalds <[email protected]>

…linux/kernel/git/hid/hid Pull HID fixes from Jiri Kosina: - power management fix in intel-thc-hid (Even Xu) - nintendo gencon mapping fix (Ryan McClelland) - fix for UAF on device diconnect path in hid-steam (Vicki Pfau) - two fixes for UAF on device disconnect path in intel-ish-hid (Zhang Lixu) - fix for potential NULL dereference in hid-appleir (Daniil Dulov) - few other small cosmetic fixes (e.g. typos) * tag 'hid-for-linus-2025030501' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid: HID: Intel-thc-hid: Intel-quickspi: Correct device state after S4 HID: intel-thc-hid: Fix spelling mistake "intput" -> "input" HID: hid-steam: Fix use-after-free when detaching device HID: debug: Fix spelling mistake "Messanger" -> "Messenger" HID: appleir: Fix potential NULL dereference at raw event handle HID: apple: disable Fn key handling on the Omoton KB066 HID: i2c-hid: improve i2c_hid_get_report error message HID: intel-ish-hid: Fix use-after-free issue in ishtp_hid_remove() HID: intel-ish-hid: Fix use-after-free issue in hid_ishtp_cl_remove() HID: google: fix unused variable warning under !CONFIG_ACPI HID: nintendo: fix gencon button events map HID: corsair-void: Update power supply values with a unified work handler

The kernel_recvmsg() function returns an int which could be either negative error codes or the number of bytes received. The problem is that the condition: if (ret < sizeof(*icresp)) { is type promoted to type unsigned long and negative values are treated as high positive values which is success, when they should be treated as failure. Handle invalid positive returns separately from negative error codes to avoid this problem. Fixes: 578539e ("nvme-tcp: fix connect failure on receiving partial ICResp PDU") Signed-off-by: Dan Carpenter <[email protected]> Reviewed-by: Caleb Sander Mateos <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Reviewed-by: Chaitanya Kulkarni <[email protected]> Signed-off-by: Keith Busch <[email protected]>

…S35L41 HDA Add support for ASUS G814PH/PM/PP and G814FH/FM/FP. Laptops use 2 CS35L41 Amps with HDA, using Internal boost, with I2C. Signed-off-by: Stefan Binding <[email protected]> Signed-off-by: Takashi Iwai <[email protected]> Link: https://patch.msgid.link/[email protected]

… CS35L41 HDA Add support for ASUS GA603KP, GA603KM and GA603KH. Laptops use 2 CS35L41 Amps with HDA, using Internal boost, with I2C Signed-off-by: Stefan Binding <[email protected]> Signed-off-by: Takashi Iwai <[email protected]> Link: https://patch.msgid.link/[email protected]

…CS35L41 HDA Add support for ASUS G614PH/PM/PP and G614FH/FM/FP. Laptops use 2 CS35L41 Amps with HDA, using Internal boost, with I2C Signed-off-by: Stefan Binding <[email protected]> Signed-off-by: Takashi Iwai <[email protected]> Link: https://patch.msgid.link/[email protected]

… HDA Add support for ASUS B3405CVA, B5405CVA, B5605CVA, B3605CVA. Laptops use 2 CS35L41 Amps with HDA, using Internal boost, with SPI Signed-off-by: Stefan Binding <[email protected]> Signed-off-by: Takashi Iwai <[email protected]> Link: https://patch.msgid.link/[email protected]

… CS35L41 HDA Add support for ASUS B3405CCA / P3405CCA, B3605CCA / P3605CCA, B3405CCA, B3605CCA. Laptops use 2 CS35L41 Amps with HDA, using Internal boost, with SPI Signed-off-by: Stefan Binding <[email protected]> Signed-off-by: Takashi Iwai <[email protected]> Link: https://patch.msgid.link/[email protected]

… CS35L41 HDA Add support for ASUS B5605CCA and B5405CCA. Laptops use 2 CS35L41 Amps with HDA, using Internal boost, with SPI Signed-off-by: Stefan Binding <[email protected]> Signed-off-by: Takashi Iwai <[email protected]> Link: https://patch.msgid.link/[email protected]

…g CS35L41 HDA Laptop uses 2 CS35L41 Amps with HDA, using External boost with I2C Signed-off-by: Stefan Binding <[email protected]> Signed-off-by: Takashi Iwai <[email protected]> Link: https://patch.msgid.link/[email protected]

If a userptr vma subject to prefetching was already invalidated or invalidated during the prefetch operation, the operation would repeatedly return -EAGAIN which would typically cause an infinite loop. Validate the userptr to ensure this doesn't happen. v2: - Don't fallthrough from UNMAP to PREFETCH (Matthew Brost) Fixes: 5bd24e7 ("drm/xe/vm: Subclass userptr vmas") Fixes: 617eebb ("drm/xe: Fix array of binds") Cc: Matthew Brost <[email protected]> Cc: <[email protected]> # v6.9+ Suggested-by: Matthew Brost <[email protected]> Signed-off-by: Thomas Hellström <[email protected]> Reviewed-by: Matthew Brost <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] (cherry picked from commit 03c346d) Signed-off-by: Rodrigo Vivi <[email protected]>

Fix a (harmless) misplaced #endif leading to declarations appearing multiple times. Fixes: 0eb2a18 ("drm/xe: Implement VM snapshot support for BO's and userptr") Cc: Maarten Lankhorst <[email protected]> Cc: José Roberto de Souza <[email protected]> Cc: <[email protected]> # v6.12+ Signed-off-by: Thomas Hellström <[email protected]> Reviewed-by: Lucas De Marchi <[email protected]> Reviewed-by: Tejas Upadhyay <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] (cherry picked from commit fcc20a4) Signed-off-by: Rodrigo Vivi <[email protected]>

Fix fault mode invalidation racing with unbind leading to the PTE zapping potentially traversing an invalid page-table tree. Do this by holding the notifier lock across PTE zapping. This might transfer any contention waiting on the notifier seqlock read side to the notifier lock read side, but that shouldn't be a major problem. At the same time get rid of the open-coded invalidation in the bind code by relying on the notifier even when the vma bind is not yet committed. Finally let userptr invalidation call a dedicated xe_vm function performing a full invalidation. Fixes: e8babb2 ("drm/xe: Convert multiple bind ops into single job") Cc: Thomas Hellström <[email protected]> Cc: Matthew Brost <[email protected]> Cc: Matthew Auld <[email protected]> Cc: <[email protected]> # v6.12+ Signed-off-by: Thomas Hellström <[email protected]> Reviewed-by: Matthew Brost <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] (cherry picked from commit 100a5b8) Signed-off-by: Rodrigo Vivi <[email protected]>

Concurrent VM bind staging and zapping of PTEs from a userptr notifier do not work because the view of PTEs is not stable. VM binds cannot acquire the notifier lock during staging, as memory allocations are required. To resolve this race condition, use a staging tree for VM binds that is committed only under the userptr notifier lock during the final step of the bind. This ensures a consistent view of the PTEs in the userptr notifier. A follow up may only use staging for VM in fault mode as this is the only mode in which the above race exists. v3: - Drop zap PTE change (Thomas) - s/xe_pt_entry/xe_pt_entry_staging (Thomas) Suggested-by: Thomas Hellström <[email protected]> Cc: <[email protected]> Fixes: e8babb2 ("drm/xe: Convert multiple bind ops into single job") Fixes: a708f65 ("drm/xe: Update PT layer with better error handling") Signed-off-by: Matthew Brost <[email protected]> Reviewed-by: Thomas Hellström <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Signed-off-by: Thomas Hellström <[email protected]> (cherry picked from commit 6f39b0c) Signed-off-by: Rodrigo Vivi <[email protected]>

Add proper #ifndef around the xe_hmm.h header, proper spacing and since the documentation mostly follows kerneldoc format, make it kerneldoc. Also prepare for upcoming -stable fixes. Fixes: 81e058a ("drm/xe: Introduce helper to populate userptr") Cc: Oak Zeng <[email protected]> Cc: <[email protected]> # v6.10+ Signed-off-by: Thomas Hellström <[email protected]> Reviewed-by: Matthew Auld <[email protected]> Acked-by: Matthew Brost <Matthew Brost <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] (cherry picked from commit bbe2b06) Signed-off-by: Rodrigo Vivi <[email protected]>

The pnfs that we obtain from hmm_range_fault() point to pages that we don't have a reference on, and the guarantee that they are still in the cpu page-tables is that the notifier lock must be held and the notifier seqno is still valid. So while building the sg table and marking the pages accesses / dirty we need to hold this lock with a validated seqno. However, the lock is reclaim tainted which makes sg_alloc_table_from_pages_segment() unusable, since it internally allocates memory. Instead build the sg-table manually. For the non-iommu case this might lead to fewer coalesces, but if that's a problem it can be fixed up later in the resource cursor code. For the iommu case, the whole sg-table may still be coalesced to a single contigous device va region. This avoids marking pages that we don't own dirty and accessed, and it also avoid dereferencing struct pages that we don't own. v2: - Use assert to check whether hmm pfns are valid (Matthew Auld) - Take into account that large pages may cross range boundaries (Matthew Auld) v3: - Don't unnecessarily check for a non-freed sg-table. (Matthew Auld) - Add a missing up_read() in an error path. (Matthew Auld) Fixes: 81e058a ("drm/xe: Introduce helper to populate userptr") Cc: Oak Zeng <[email protected]> Cc: <[email protected]> # v6.10+ Signed-off-by: Thomas Hellström <[email protected]> Reviewed-by: Matthew Auld <[email protected]> Acked-by: Matthew Brost <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] (cherry picked from commit ea3e66d) Signed-off-by: Rodrigo Vivi <[email protected]>

If userptr pages are freed after a call to the xe mmu notifier, the device will not be blocked out from theoretically accessing these pages unless they are also unmapped from the iommu, and this violates some aspects of the iommu-imposed security. Ensure that userptrs are unmapped in the mmu notifier to mitigate this. A naive attempt would try to free the sg table, but the sg table itself may be accessed by a concurrent bind operation, so settle for only unmapping. v3: - Update lockdep asserts. - Fix a typo (Matthew Auld) Fixes: 81e058a ("drm/xe: Introduce helper to populate userptr") Cc: Oak Zeng <[email protected]> Cc: Matthew Auld <[email protected]> Cc: <[email protected]> # v6.10+ Signed-off-by: Thomas Hellström <[email protected]> Reviewed-by: Matthew Auld <[email protected]> Acked-by: Matthew Brost <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] (cherry picked from commit ba767b9) Signed-off-by: Rodrigo Vivi <[email protected]>

The IOPOLL path posts CQEs when the io_kiocb is marked as completed, so it cannot rely on the usual retry that non-IOPOLL requests do for read/write requests. If -EAGAIN is received and the request should be retried, go through the normal completion path and let the normal flush logic catch it and reissue it, like what is done for !IOPOLL reads or writes. Fixes: d803d12 ("io_uring/rw: handle -EAGAIN retry at IO completion time") Reported-by: John Garry <[email protected]> Link: https://lore.kernel.org/io-uring/[email protected]/ Signed-off-by: Jens Axboe <[email protected]>

There is not enough room in the 12-bit ASID address space to hand out broadcast ASIDs to every process. Only hand out broadcast ASIDs to processes when they are observed to be simultaneously running on 4 or more CPUs. This also allows single threaded process to continue using the cheaper, local TLB invalidation instructions like INVLPGB. Due to the structure of flush_tlb_mm_range(), the INVLPGB flushing is done in a generically named broadcast_tlb_flush() function which can later also be used for Intel RAR. Combined with the removal of unnecessary lru_add_drain calls() (see https://lore.kernel.org/r/20241219153253.3da9e8aa@fangorn) this results in a nice performance boost for the will-it-scale tlb_flush2_threads test on an AMD Milan system with 36 cores: - vanilla kernel: 527k loops/second - lru_add_drain removal: 731k loops/second - only INVLPGB: 527k loops/second - lru_add_drain + INVLPGB: 1157k loops/second Profiling with only the INVLPGB changes showed while TLB invalidation went down from 40% of the total CPU time to only around 4% of CPU time, the contention simply moved to the LRU lock. Fixing both at the same time about doubles the number of iterations per second from this case. Comparing will-it-scale tlb_flush2_threads with several different numbers of threads on a 72 CPU AMD Milan shows similar results. The number represents the total number of loops per second across all the threads: threads tip INVLPGB 1 315k 304k 2 423k 424k 4 644k 1032k 8 652k 1267k 16 737k 1368k 32 759k 1199k 64 636k 1094k 72 609k 993k 1 and 2 thread performance is similar with and without INVLPGB, because INVLPGB is only used on processes using 4 or more CPUs simultaneously. The number is the median across 5 runs. Some numbers closer to real world performance can be found at Phoronix, thanks to Michael: https://www.phoronix.com/news/AMD-INVLPGB-Linux-Benefits [ bp: - Massage - :%s/\<static_cpu_has\>/cpu_feature_enabled/cgi - :%s/\<clear_asid_transition\>/mm_clear_asid_transition/cgi - Fold in a 0day bot fix: https://lore.kernel.org/oe-kbuild-all/[email protected] ] Signed-off-by: Rik van Riel <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Reviewed-by: Nadav Amit <[email protected]> Link: https://lore.kernel.org/r/[email protected]

With AMD TCE (translation cache extensions) only the intermediate mappings that cover the address range zapped by INVLPG / INVLPGB get invalidated, rather than all intermediate mappings getting zapped at every TLB invalidation. This can help reduce the TLB miss rate, by keeping more intermediate mappings in the cache. From the AMD manual: Translation Cache Extension (TCE) Bit. Bit 15, read/write. Setting this bit to 1 changes how the INVLPG, INVLPGB, and INVPCID instructions operate on TLB entries. When this bit is 0, these instructions remove the target PTE from the TLB as well as all upper-level table entries that are cached in the TLB, whether or not they are associated with the target PTE. When this bit is set, these instructions will remove the target PTE and only those upper-level entries that lead to the target PTE in the page table hierarchy, leaving unrelated upper-level entries intact. [ bp: use cpu_has()... I know, it is a mess. ] Signed-off-by: Rik van Riel <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Link: https://lore.kernel.org/r/[email protected]

When executing the INVLPGB instruction on a bare-metal host or hypervisor, if the ASID valid bit is not set, the instruction will flush the TLB entries that match the specified criteria for any ASID, not just the those of the host. If virtual machines are running on the system, this may result in inadvertent flushes of guest TLB entries. When executing the INVLPGB instruction in a guest and the INVLPGB instruction is not intercepted by the hypervisor, the hardware will replace the requested ASID with the guest ASID and set the ASID valid bit before doing the broadcast invalidation. Thus a guest is only able to flush its own TLB entries. So to limit the host TLB flushing reach, always set the ASID valid bit using an ASID value of 0 (which represents the host/hypervisor). This will will result in the desired effect in both host and guest. Signed-off-by: Tom Lendacky <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Link: https://lore.kernel.org/r/20250304120449.GHZ8bsYYyEBOKQIxBm@fat_crate.local

Signed-off-by: Ingo Molnar <[email protected]>

Conflicts: arch/x86/kernel/cpu/amd.c Signed-off-by: Ingo Molnar <[email protected]>

On MMIO devices (e.g. MT7988 or EN7581) unicast traffic received on lanX port is flooded on all other user ports if the DSA switch is configured without VLAN support since PORT_MATRIX in PCR regs contains all user ports. Similar to MDIO devices (e.g. MT7530 and MT7531) fix the issue defining default VLAN-ID 0 for MT7530 MMIO devices. Fixes: 110c18b ("net: dsa: mt7530: introduce driver for MT7988 built-in switch") Signed-off-by: Lorenzo Bianconi <[email protected]> Reviewed-by: Chester A. Unal <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>

This reverts commit a5c6bc5. The general approach described in commit e076eac ("selftests: break the dependency upon local header files") was taken one step too far here: it should not have been extended to include the syscall numbers. This is because doing so would require per-arch support in tools/include/uapi, and no such support exists. This revert fixes two separate reports of test failures, from Dave Hansen[1], and Li Wang[2]. An excerpt of Dave's report: Before this commit (a5c6bc5) things are fine. But after, I get: running PKEY tests for unsupported CPU/OS An excerpt of Li's report: I just found that mlock2_() return a wrong value in mlock2-test [1] https://lore.kernel.org/[email protected] [2] https://lore.kernel.org/CAEemH2eW=UMu9+turT2jRie7+6ewUazXmA6kL+VBo3cGDGU6RA@mail.gmail.com Link: https://lkml.kernel.org/r/[email protected] Fixes: a5c6bc5 ("selftests/mm: remove local __NR_* definitions") Signed-off-by: John Hubbard <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Li Wang <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Jeff Xu <[email protected]> Cc: Andrei Vagin <[email protected]> Cc: Axel Rasmussen <[email protected]> Cc: Christian Brauner <[email protected]> Cc: Kees Cook <[email protected]> Cc: Kent Overstreet <[email protected]> Cc: Liam R. Howlett <[email protected]> Cc: Muhammad Usama Anjum <[email protected]> Cc: Peter Xu <[email protected]> Cc: Rich Felker <[email protected]> Cc: Shuah Khan <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>

… further reduced damos_quota_goal.py selftest see if DAMOS quota goals tuning feature increases or reduces the effective size quota for given score as expected. The tuning feature sets the minimum quota size as one byte, so if the effective size quota is already one, we cannot expect it further be reduced. However the test is not aware of the edge case, and fails since it shown no expected change of the effective quota. Handle the case by updating the failure logic for no change to see if it was the case, and simply skips to next test input. Link: https://lkml.kernel.org/r/[email protected] Fixes: f1c07c0 ("selftests/damon: add a test for DAMOS quota goal") Signed-off-by: SeongJae Park <[email protected]> Reported-by: kernel test robot <[email protected]> Closes: https://lore.kernel.org/oe-lkp/[email protected] Cc: Shuah Khan <[email protected]> Cc: <[email protected]> [6.10.x] Signed-off-by: Andrew Morton <[email protected]>

Signed-off-by: Jiri Olsa <[email protected]>

Adding new uprobe syscall that calls uprobe handlers for given 'breakpoint' address. The idea is that the 'breakpoint' address calls the user space trampoline which executes the uprobe syscall. The syscall handler reads the return address of the initial call to retrieve the original 'breakpoint' address. With this address we find the related uprobe object and call its consumers. Adding the arch_uprobe_trampoline_mapping function that provides uprobe trampoline mapping. This mapping is backed with one global page initialized at __init time and shared by the all the mapping instances. We do not allow to execute uprobe syscall if the caller is not from uprobe trampoline mapping. Signed-off-by: Jiri Olsa <[email protected]>

Adding support to add special mapping for for user space trampoline with following functions: uprobe_trampoline_get - find or add related uprobe_trampoline uprobe_trampoline_put - remove ref or destroy uprobe_trampoline The user space trampoline is exported as architecture specific user space special mapping, which is provided by arch_uprobe_trampoline_mapping function. The uprobe trampoline needs to be callable/reachable from the probe address, so while searching for available address we use uprobe_is_callable function to decide if the uprobe trampoline is callable from the probe address. All uprobe_trampoline objects are stored in uprobes_state object and are cleaned up when the process mm_struct goes down. Adding new arch hooks for that, because this change is x86_64 specific. Locking is provided by callers in following changes. Signed-off-by: Jiri Olsa <[email protected]>

Adding support to emulate nop5 as the original uprobe instruction. Signed-off-by: Jiri Olsa <[email protected]>

Putting together all the previously added pieces to support optimized uprobes on top of 5-byte nop instruction. The current uprobe execution goes through following: - installs breakpoint instruction over original instruction - exception handler hit and calls related uprobe consumers - and either simulates original instruction or does out of line single step execution of it - returns to user space The optimized uprobe path - checks the original instruction is 5-byte nop (plus other checks) - adds (or uses existing) user space trampoline and overwrites original instruction (5-byte nop) with call to user space trampoline - the user space trampoline executes uprobe syscall that calls related uprobe consumers - trampoline returns back to next instruction This approach won't speed up all uprobes as it's limited to using nop5 as original instruction, but we could use nop5 as USDT probe instruction (which uses single byte nop ATM) and speed up the USDT probes. This patch overloads related arch functions in uprobe_write_opcode and set_orig_insn so they can install call instruction if needed. The arch_uprobe_optimize triggers the uprobe optimization and is called after first uprobe hit. I originally had it called on uprobe installation but then it clashed with elf loader, because the user space trampoline was added in a place where loader might need to put elf segments, so I decided to do it after first uprobe hit when loading is done. We do not unmap and release uprobe trampoline when it's no longer needed, because there's no easy way to make sure none of the threads is still inside the trampoline. But we do not waste memory, because there's just single page for all the uprobe trampoline mappings. We do waste frmae on page mapping for every 4GB by keeping the uprobe trampoline page mapped, but that seems ok. Attaching the speed up from benchs/run_bench_uprobes.sh script: current: usermode-count : 818.836 ± 2.842M/s syscall-count : 8.917 ± 0.003M/s uprobe-nop : 3.056 ± 0.013M/s uprobe-push : 2.903 ± 0.002M/s uprobe-ret : 1.533 ± 0.001M/s --> uprobe-nop5 : 1.492 ± 0.000M/s uretprobe-nop : 1.783 ± 0.000M/s uretprobe-push : 1.672 ± 0.001M/s uretprobe-ret : 1.067 ± 0.002M/s --> uretprobe-nop5 : 1.052 ± 0.000M/s after the change: usermode-count : 818.386 ± 1.886M/s syscall-count : 8.923 ± 0.003M/s uprobe-nop : 3.086 ± 0.005M/s uprobe-push : 2.751 ± 0.001M/s uprobe-ret : 1.481 ± 0.000M/s --> uprobe-nop5 : 4.016 ± 0.002M/s uretprobe-nop : 1.712 ± 0.008M/s uretprobe-push : 1.616 ± 0.001M/s uretprobe-ret : 1.052 ± 0.000M/s --> uretprobe-nop5 : 2.015 ± 0.000M/s Signed-off-by: Jiri Olsa <[email protected]>

Adding __test_uprobe_syscall with non x86_64 stub to execute all the tests, so we don't need to keep adding non x86_64 stub functions for new tests. Signed-off-by: Jiri Olsa <[email protected]>

…multi Renaming uprobe_syscall_executed prog to test_uretprobe_multi to fit properly in the following changes that add more programs. Signed-off-by: Jiri Olsa <[email protected]>

Using 5-byte nop for x86 usdt probes so we can switch to optimized uprobe them. Signed-off-by: Jiri Olsa <[email protected]>

Adding tests for optimized uprobe/usdt probes. Checking that we get expected trampoline and attached bpf programs get executed properly. Signed-off-by: Jiri Olsa <[email protected]>

Adding test that makes sure parallel execution of the uprobe and attach/detach of optimized uprobe on it works properly. Signed-off-by: Jiri Olsa <[email protected]>

Make sure that calling uprobe syscall from outside uprobe trampoline results in sigill signal. Signed-off-by: Jiri Olsa <[email protected]>

Adding optimized usdt variant for basic usdt test to check that usdt arguments are properly passed in optimized code path. Signed-off-by: Jiri Olsa <[email protected]>

Signed-off-by: Jiri Olsa <[email protected]>

Add 5-byte nop uprobe trigger bench (x86_64 specific) to measure uprobes/uretprobes on top of nop5 instruction. Signed-off-by: Jiri Olsa <[email protected]>

…robe Signed-off-by: Jiri Olsa <[email protected]>

Pass uprobe systemcall through seccomp without depending on configuration. Note: uprobe is currently only x86_64 and isn't expected to ever be supported in i386. Signed-off-by: Jiri Olsa <[email protected]>

UPROBE.not_attached.uprobe_default_allow UPROBE.not_attached.uprobe_default_block UPROBE.not_attached.uprobe_block_syscall UPROBE.not_attached.uprobe_default_block_with_syscall UPROBE.uprobe_attached.uprobe_default_allow UPROBE.uprobe_attached.uprobe_default_block UPROBE.uprobe_attached.uprobe_block_syscall UPROBE.uprobe_attached.uprobe_default_block_with_syscall UPROBE.uretprobe_attached.uprobe_default_allow UPROBE.uretprobe_attached.uprobe_default_block UPROBE.uretprobe_attached.uprobe_block_syscall UPROBE.uretprobe_attached.uprobe_default_block_with_syscall Signed-off-by: Jiri Olsa <[email protected]>

kudureranganath and others added 30 commits March 5, 2025 07:17

Merge branch 'x86/cpu' into x86/merge, to ease integration testing

9dd0aac

Signed-off-by: Ingo Molnar <[email protected]>

Merge branch 'x86/mm' into x86/merge, to resolve conflict

f06709e

Conflicts: arch/x86/kernel/cpu/amd.c Signed-off-by: Ingo Molnar <[email protected]>

olsajiri force-pushed the bpf/optimized_usdt_ci branch 3 times, most recently from bd2fa12 to fcaa810 Compare March 12, 2025 11:42

olsajiri added 6 commits March 12, 2025 15:36

uprobes: Do breakpoint removal under mmap_write_lock

a6837d9

Signed-off-by: Jiri Olsa <[email protected]>

uprobes/x86: Add support to emulate nop5 instruction

d299d67

Adding support to emulate nop5 as the original uprobe instruction. Signed-off-by: Jiri Olsa <[email protected]>

selftests/bpf: Reorg the uprobe_syscall test function

f68bd23

Adding __test_uprobe_syscall with non x86_64 stub to execute all the tests, so we don't need to keep adding non x86_64 stub functions for new tests. Signed-off-by: Jiri Olsa <[email protected]>

olsajiri force-pushed the bpf/optimized_usdt_ci branch from fcaa810 to b836e48 Compare March 12, 2025 14:36

kernel-patches-daemon-bpf bot force-pushed the bpf-next_base branch 2 times, most recently from 44c3a1d to 720c696 Compare March 12, 2025 23:33

olsajiri added 11 commits March 13, 2025 09:56

selftests/bpf: Rename uprobe_syscall_executed prog to test_uretprobe_…

cff4de3

…multi Renaming uprobe_syscall_executed prog to test_uretprobe_multi to fit properly in the following changes that add more programs. Signed-off-by: Jiri Olsa <[email protected]>

selftests/bpf: Use 5-byte nop for x86 usdt probes

ff11a6d

Using 5-byte nop for x86 usdt probes so we can switch to optimized uprobe them. Signed-off-by: Jiri Olsa <[email protected]>

selftests/bpf: Add uprobe/usdt syscall tests

f6df2bb

Adding tests for optimized uprobe/usdt probes. Checking that we get expected trampoline and attached bpf programs get executed properly. Signed-off-by: Jiri Olsa <[email protected]>

selftests/bpf: Add hit/attach/detach race optimized uprobe test

36d3b64

Adding test that makes sure parallel execution of the uprobe and attach/detach of optimized uprobe on it works properly. Signed-off-by: Jiri Olsa <[email protected]>

selftests/bpf: Add uprobe syscall sigill signal test

223a6ac

Make sure that calling uprobe syscall from outside uprobe trampoline results in sigill signal. Signed-off-by: Jiri Olsa <[email protected]>

selftests/bpf: Add optimized usdt variant for basic usdt test

1fe214a

Adding optimized usdt variant for basic usdt test to check that usdt arguments are properly passed in optimized code path. Signed-off-by: Jiri Olsa <[email protected]>

selftests/bpf: Add uprobe_regs_equal test

65d8fc5

Signed-off-by: Jiri Olsa <[email protected]>

selftests/bpf: Add 5-byte nop uprobe trigger bench

d360c61

Add 5-byte nop uprobe trigger bench (x86_64 specific) to measure uprobes/uretprobes on top of nop5 instruction. Signed-off-by: Jiri Olsa <[email protected]>

selftests/bpf: Change test_uretprobe_regs_change for uprobe and uretp…

cb5df36

…robe Signed-off-by: Jiri Olsa <[email protected]>

seccomp: passthrough uprobe systemcall without filtering

36e694b

Pass uprobe systemcall through seccomp without depending on configuration. Note: uprobe is currently only x86_64 and isn't expected to ever be supported in i386. Signed-off-by: Jiri Olsa <[email protected]>

olsajiri force-pushed the bpf/optimized_usdt_ci branch from b836e48 to 0c5d0ca Compare March 13, 2025 09:45

kernel-patches-daemon-bpf bot force-pushed the bpf-next_base branch 3 times, most recently from 7879197 to dc8d1d6 Compare March 13, 2025 23:58

fix

3573018

olsajiri closed this Mar 14, 2025

olsajiri deleted the bpf/optimized_usdt_ci branch March 14, 2025 12:14

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Bpf/optimized usdt ci #8664

Bpf/optimized usdt ci #8664

Uh oh!

olsajiri commented Mar 12, 2025

Uh oh!

Uh oh!

Bpf/optimized usdt ci #8664

Bpf/optimized usdt ci #8664

Uh oh!

Conversation

olsajiri commented Mar 12, 2025

Uh oh!

Uh oh!