-
Notifications
You must be signed in to change notification settings - Fork 132
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bpf/optimized usdt ci #8664
Draft
olsajiri
wants to merge
1,351
commits into
kernel-patches:bpf-next_base
Choose a base branch
from
olsajiri:bpf/optimized_usdt_ci
base: bpf-next_base
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Draft
Bpf/optimized usdt ci #8664
olsajiri
wants to merge
1,351
commits into
kernel-patches:bpf-next_base
from
olsajiri:bpf/optimized_usdt_ci
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
The pipe_occupancy() logic implicitly relied on the natural unsigned modulo arithmetic in C, but that doesn't work for the new 'pipe_index_t' case, since any arithmetic will be done in 'int' (and here we had also made it 'unsigned int' due to the function call boundary). So make the modulo arithmetic explicit by casting the result to the proper type. Cc: Oleg Nesterov <[email protected]> Cc: Mateusz Guzik <[email protected]> Cc: Manfred Spraul <[email protected]> Cc: Christian Brauner <[email protected]> Cc: Swapnil Sapkal <[email protected]> Cc: Alexey Gladkov <[email protected]> Cc: K Prateek Nayak <[email protected]> Link: https://lore.kernel.org/all/CAHk-=wjyHsGLx=rxg6PKYBNkPYAejgo7=CbyL3=HGLZLsAaJFQ@mail.gmail.com/ Fixes: 3d25216 ("fs/pipe: Read pipe->{head,tail} atomically outside pipe->mutex") Signed-off-by: Linus Torvalds <[email protected]>
Add htmldoc annotation for the newly introduced "head_tail" member describing it to be a union of the pipe_inode_info's @Head and @tail members. Reported-by: Stephen Rothwell <[email protected]> Closes: https://lore.kernel.org/lkml/[email protected]/ Fixes: 3d25216 ("fs/pipe: Read pipe->{head,tail} atomically outside pipe->mutex") Signed-off-by: K Prateek Nayak <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
Add the __counted_by() compiler attribute to the flexible array member buf to improve access bounds-checking via CONFIG_UBSAN_BOUNDS and CONFIG_FORTIFY_SOURCE. No functional changes intended. Signed-off-by: Thorsten Blum <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Link: https://lore.kernel.org/r/[email protected]
num_gb_pipes was set to a wrong value using r420_pipe_config This have lead to HyperZ glitches on fast Z clearing. Closes: https://bugs.freedesktop.org/show_bug.cgi?id=110897 Reviewed-by: Marek Olšák <[email protected]> Signed-off-by: Richard Thier <[email protected]> Signed-off-by: Alex Deucher <[email protected]> (cherry picked from commit 044e59a85c4d84e3c8d004c486e5c479640563a6) Cc: [email protected]
always allow ih interrupt from fw on smu v14 based on the interface requirement Signed-off-by: Kenneth Feng <[email protected]> Reviewed-by: Yang Wang <[email protected]> Signed-off-by: Alex Deucher <[email protected]> (cherry picked from commit a3199eba46c54324193607d9114a1e321292d7a1) Cc: [email protected] # 6.12.x
While looking for incorrect users of the pipe head/tail fields (see commit c27c66a: "fs/pipe: Fix pipe_occupancy() with 16-bit indexes"), I found a bug in pipe_discard_from() that looked entirely broken. However, the fix is trivial: this buggy function isn't actually called by anything, so let's just remove it ASAP. Signed-off-by: Linus Torvalds <[email protected]>
…linux/kernel/git/hid/hid Pull HID fixes from Jiri Kosina: - power management fix in intel-thc-hid (Even Xu) - nintendo gencon mapping fix (Ryan McClelland) - fix for UAF on device diconnect path in hid-steam (Vicki Pfau) - two fixes for UAF on device disconnect path in intel-ish-hid (Zhang Lixu) - fix for potential NULL dereference in hid-appleir (Daniil Dulov) - few other small cosmetic fixes (e.g. typos) * tag 'hid-for-linus-2025030501' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid: HID: Intel-thc-hid: Intel-quickspi: Correct device state after S4 HID: intel-thc-hid: Fix spelling mistake "intput" -> "input" HID: hid-steam: Fix use-after-free when detaching device HID: debug: Fix spelling mistake "Messanger" -> "Messenger" HID: appleir: Fix potential NULL dereference at raw event handle HID: apple: disable Fn key handling on the Omoton KB066 HID: i2c-hid: improve i2c_hid_get_report error message HID: intel-ish-hid: Fix use-after-free issue in ishtp_hid_remove() HID: intel-ish-hid: Fix use-after-free issue in hid_ishtp_cl_remove() HID: google: fix unused variable warning under !CONFIG_ACPI HID: nintendo: fix gencon button events map HID: corsair-void: Update power supply values with a unified work handler
The kernel_recvmsg() function returns an int which could be either negative error codes or the number of bytes received. The problem is that the condition: if (ret < sizeof(*icresp)) { is type promoted to type unsigned long and negative values are treated as high positive values which is success, when they should be treated as failure. Handle invalid positive returns separately from negative error codes to avoid this problem. Fixes: 578539e ("nvme-tcp: fix connect failure on receiving partial ICResp PDU") Signed-off-by: Dan Carpenter <[email protected]> Reviewed-by: Caleb Sander Mateos <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Reviewed-by: Chaitanya Kulkarni <[email protected]> Signed-off-by: Keith Busch <[email protected]>
…S35L41 HDA Add support for ASUS G814PH/PM/PP and G814FH/FM/FP. Laptops use 2 CS35L41 Amps with HDA, using Internal boost, with I2C. Signed-off-by: Stefan Binding <[email protected]> Signed-off-by: Takashi Iwai <[email protected]> Link: https://patch.msgid.link/[email protected]
… CS35L41 HDA Add support for ASUS GA603KP, GA603KM and GA603KH. Laptops use 2 CS35L41 Amps with HDA, using Internal boost, with I2C Signed-off-by: Stefan Binding <[email protected]> Signed-off-by: Takashi Iwai <[email protected]> Link: https://patch.msgid.link/[email protected]
…CS35L41 HDA Add support for ASUS G614PH/PM/PP and G614FH/FM/FP. Laptops use 2 CS35L41 Amps with HDA, using Internal boost, with I2C Signed-off-by: Stefan Binding <[email protected]> Signed-off-by: Takashi Iwai <[email protected]> Link: https://patch.msgid.link/[email protected]
… HDA Add support for ASUS B3405CVA, B5405CVA, B5605CVA, B3605CVA. Laptops use 2 CS35L41 Amps with HDA, using Internal boost, with SPI Signed-off-by: Stefan Binding <[email protected]> Signed-off-by: Takashi Iwai <[email protected]> Link: https://patch.msgid.link/[email protected]
… CS35L41 HDA Add support for ASUS B3405CCA / P3405CCA, B3605CCA / P3605CCA, B3405CCA, B3605CCA. Laptops use 2 CS35L41 Amps with HDA, using Internal boost, with SPI Signed-off-by: Stefan Binding <[email protected]> Signed-off-by: Takashi Iwai <[email protected]> Link: https://patch.msgid.link/[email protected]
… CS35L41 HDA Add support for ASUS B5605CCA and B5405CCA. Laptops use 2 CS35L41 Amps with HDA, using Internal boost, with SPI Signed-off-by: Stefan Binding <[email protected]> Signed-off-by: Takashi Iwai <[email protected]> Link: https://patch.msgid.link/[email protected]
…g CS35L41 HDA Laptop uses 2 CS35L41 Amps with HDA, using External boost with I2C Signed-off-by: Stefan Binding <[email protected]> Signed-off-by: Takashi Iwai <[email protected]> Link: https://patch.msgid.link/[email protected]
If a userptr vma subject to prefetching was already invalidated or invalidated during the prefetch operation, the operation would repeatedly return -EAGAIN which would typically cause an infinite loop. Validate the userptr to ensure this doesn't happen. v2: - Don't fallthrough from UNMAP to PREFETCH (Matthew Brost) Fixes: 5bd24e7 ("drm/xe/vm: Subclass userptr vmas") Fixes: 617eebb ("drm/xe: Fix array of binds") Cc: Matthew Brost <[email protected]> Cc: <[email protected]> # v6.9+ Suggested-by: Matthew Brost <[email protected]> Signed-off-by: Thomas Hellström <[email protected]> Reviewed-by: Matthew Brost <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] (cherry picked from commit 03c346d4d0d85d210d549d43c8cfb3dfb7f20e0a) Signed-off-by: Rodrigo Vivi <[email protected]>
Fix a (harmless) misplaced #endif leading to declarations appearing multiple times. Fixes: 0eb2a18 ("drm/xe: Implement VM snapshot support for BO's and userptr") Cc: Maarten Lankhorst <[email protected]> Cc: José Roberto de Souza <[email protected]> Cc: <[email protected]> # v6.12+ Signed-off-by: Thomas Hellström <[email protected]> Reviewed-by: Lucas De Marchi <[email protected]> Reviewed-by: Tejas Upadhyay <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] (cherry picked from commit fcc20a4c752214b3e25632021c57d7d1d71ee1dd) Signed-off-by: Rodrigo Vivi <[email protected]>
Fix fault mode invalidation racing with unbind leading to the PTE zapping potentially traversing an invalid page-table tree. Do this by holding the notifier lock across PTE zapping. This might transfer any contention waiting on the notifier seqlock read side to the notifier lock read side, but that shouldn't be a major problem. At the same time get rid of the open-coded invalidation in the bind code by relying on the notifier even when the vma bind is not yet committed. Finally let userptr invalidation call a dedicated xe_vm function performing a full invalidation. Fixes: e8babb2 ("drm/xe: Convert multiple bind ops into single job") Cc: Thomas Hellström <[email protected]> Cc: Matthew Brost <[email protected]> Cc: Matthew Auld <[email protected]> Cc: <[email protected]> # v6.12+ Signed-off-by: Thomas Hellström <[email protected]> Reviewed-by: Matthew Brost <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] (cherry picked from commit 100a5b8dadfca50d91d9a4c9fc01431b42a25cab) Signed-off-by: Rodrigo Vivi <[email protected]>
Concurrent VM bind staging and zapping of PTEs from a userptr notifier do not work because the view of PTEs is not stable. VM binds cannot acquire the notifier lock during staging, as memory allocations are required. To resolve this race condition, use a staging tree for VM binds that is committed only under the userptr notifier lock during the final step of the bind. This ensures a consistent view of the PTEs in the userptr notifier. A follow up may only use staging for VM in fault mode as this is the only mode in which the above race exists. v3: - Drop zap PTE change (Thomas) - s/xe_pt_entry/xe_pt_entry_staging (Thomas) Suggested-by: Thomas Hellström <[email protected]> Cc: <[email protected]> Fixes: e8babb2 ("drm/xe: Convert multiple bind ops into single job") Fixes: a708f65 ("drm/xe: Update PT layer with better error handling") Signed-off-by: Matthew Brost <[email protected]> Reviewed-by: Thomas Hellström <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Signed-off-by: Thomas Hellström <[email protected]> (cherry picked from commit 6f39b0c5ef0385eae586760d10b9767168037aa5) Signed-off-by: Rodrigo Vivi <[email protected]>
Add proper #ifndef around the xe_hmm.h header, proper spacing and since the documentation mostly follows kerneldoc format, make it kerneldoc. Also prepare for upcoming -stable fixes. Fixes: 81e058a ("drm/xe: Introduce helper to populate userptr") Cc: Oak Zeng <[email protected]> Cc: <[email protected]> # v6.10+ Signed-off-by: Thomas Hellström <[email protected]> Reviewed-by: Matthew Auld <[email protected]> Acked-by: Matthew Brost <Matthew Brost <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] (cherry picked from commit bbe2b06b55bc061c8fcec034ed26e88287f39143) Signed-off-by: Rodrigo Vivi <[email protected]>
The pnfs that we obtain from hmm_range_fault() point to pages that we don't have a reference on, and the guarantee that they are still in the cpu page-tables is that the notifier lock must be held and the notifier seqno is still valid. So while building the sg table and marking the pages accesses / dirty we need to hold this lock with a validated seqno. However, the lock is reclaim tainted which makes sg_alloc_table_from_pages_segment() unusable, since it internally allocates memory. Instead build the sg-table manually. For the non-iommu case this might lead to fewer coalesces, but if that's a problem it can be fixed up later in the resource cursor code. For the iommu case, the whole sg-table may still be coalesced to a single contigous device va region. This avoids marking pages that we don't own dirty and accessed, and it also avoid dereferencing struct pages that we don't own. v2: - Use assert to check whether hmm pfns are valid (Matthew Auld) - Take into account that large pages may cross range boundaries (Matthew Auld) v3: - Don't unnecessarily check for a non-freed sg-table. (Matthew Auld) - Add a missing up_read() in an error path. (Matthew Auld) Fixes: 81e058a ("drm/xe: Introduce helper to populate userptr") Cc: Oak Zeng <[email protected]> Cc: <[email protected]> # v6.10+ Signed-off-by: Thomas Hellström <[email protected]> Reviewed-by: Matthew Auld <[email protected]> Acked-by: Matthew Brost <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] (cherry picked from commit ea3e66d280ce2576664a862693d1da8fd324c317) Signed-off-by: Rodrigo Vivi <[email protected]>
If userptr pages are freed after a call to the xe mmu notifier, the device will not be blocked out from theoretically accessing these pages unless they are also unmapped from the iommu, and this violates some aspects of the iommu-imposed security. Ensure that userptrs are unmapped in the mmu notifier to mitigate this. A naive attempt would try to free the sg table, but the sg table itself may be accessed by a concurrent bind operation, so settle for only unmapping. v3: - Update lockdep asserts. - Fix a typo (Matthew Auld) Fixes: 81e058a ("drm/xe: Introduce helper to populate userptr") Cc: Oak Zeng <[email protected]> Cc: Matthew Auld <[email protected]> Cc: <[email protected]> # v6.10+ Signed-off-by: Thomas Hellström <[email protected]> Reviewed-by: Matthew Auld <[email protected]> Acked-by: Matthew Brost <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] (cherry picked from commit ba767b9d01a2c552d76cf6f46b125d50ec4147a6) Signed-off-by: Rodrigo Vivi <[email protected]>
The IOPOLL path posts CQEs when the io_kiocb is marked as completed, so it cannot rely on the usual retry that non-IOPOLL requests do for read/write requests. If -EAGAIN is received and the request should be retried, go through the normal completion path and let the normal flush logic catch it and reissue it, like what is done for !IOPOLL reads or writes. Fixes: d803d12 ("io_uring/rw: handle -EAGAIN retry at IO completion time") Reported-by: John Garry <[email protected]> Link: https://lore.kernel.org/io-uring/[email protected]/ Signed-off-by: Jens Axboe <[email protected]>
There is not enough room in the 12-bit ASID address space to hand out broadcast ASIDs to every process. Only hand out broadcast ASIDs to processes when they are observed to be simultaneously running on 4 or more CPUs. This also allows single threaded process to continue using the cheaper, local TLB invalidation instructions like INVLPGB. Due to the structure of flush_tlb_mm_range(), the INVLPGB flushing is done in a generically named broadcast_tlb_flush() function which can later also be used for Intel RAR. Combined with the removal of unnecessary lru_add_drain calls() (see https://lore.kernel.org/r/20241219153253.3da9e8aa@fangorn) this results in a nice performance boost for the will-it-scale tlb_flush2_threads test on an AMD Milan system with 36 cores: - vanilla kernel: 527k loops/second - lru_add_drain removal: 731k loops/second - only INVLPGB: 527k loops/second - lru_add_drain + INVLPGB: 1157k loops/second Profiling with only the INVLPGB changes showed while TLB invalidation went down from 40% of the total CPU time to only around 4% of CPU time, the contention simply moved to the LRU lock. Fixing both at the same time about doubles the number of iterations per second from this case. Comparing will-it-scale tlb_flush2_threads with several different numbers of threads on a 72 CPU AMD Milan shows similar results. The number represents the total number of loops per second across all the threads: threads tip INVLPGB 1 315k 304k 2 423k 424k 4 644k 1032k 8 652k 1267k 16 737k 1368k 32 759k 1199k 64 636k 1094k 72 609k 993k 1 and 2 thread performance is similar with and without INVLPGB, because INVLPGB is only used on processes using 4 or more CPUs simultaneously. The number is the median across 5 runs. Some numbers closer to real world performance can be found at Phoronix, thanks to Michael: https://www.phoronix.com/news/AMD-INVLPGB-Linux-Benefits [ bp: - Massage - :%s/\<static_cpu_has\>/cpu_feature_enabled/cgi - :%s/\<clear_asid_transition\>/mm_clear_asid_transition/cgi - Fold in a 0day bot fix: https://lore.kernel.org/oe-kbuild-all/[email protected] ] Signed-off-by: Rik van Riel <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Reviewed-by: Nadav Amit <[email protected]> Link: https://lore.kernel.org/r/[email protected]
With AMD TCE (translation cache extensions) only the intermediate mappings that cover the address range zapped by INVLPG / INVLPGB get invalidated, rather than all intermediate mappings getting zapped at every TLB invalidation. This can help reduce the TLB miss rate, by keeping more intermediate mappings in the cache. From the AMD manual: Translation Cache Extension (TCE) Bit. Bit 15, read/write. Setting this bit to 1 changes how the INVLPG, INVLPGB, and INVPCID instructions operate on TLB entries. When this bit is 0, these instructions remove the target PTE from the TLB as well as all upper-level table entries that are cached in the TLB, whether or not they are associated with the target PTE. When this bit is set, these instructions will remove the target PTE and only those upper-level entries that lead to the target PTE in the page table hierarchy, leaving unrelated upper-level entries intact. [ bp: use cpu_has()... I know, it is a mess. ] Signed-off-by: Rik van Riel <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Link: https://lore.kernel.org/r/[email protected]
When executing the INVLPGB instruction on a bare-metal host or hypervisor, if the ASID valid bit is not set, the instruction will flush the TLB entries that match the specified criteria for any ASID, not just the those of the host. If virtual machines are running on the system, this may result in inadvertent flushes of guest TLB entries. When executing the INVLPGB instruction in a guest and the INVLPGB instruction is not intercepted by the hypervisor, the hardware will replace the requested ASID with the guest ASID and set the ASID valid bit before doing the broadcast invalidation. Thus a guest is only able to flush its own TLB entries. So to limit the host TLB flushing reach, always set the ASID valid bit using an ASID value of 0 (which represents the host/hypervisor). This will will result in the desired effect in both host and guest. Signed-off-by: Tom Lendacky <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Link: https://lore.kernel.org/r/20250304120449.GHZ8bsYYyEBOKQIxBm@fat_crate.local
Signed-off-by: Ingo Molnar <[email protected]>
Conflicts: arch/x86/kernel/cpu/amd.c Signed-off-by: Ingo Molnar <[email protected]>
On MMIO devices (e.g. MT7988 or EN7581) unicast traffic received on lanX port is flooded on all other user ports if the DSA switch is configured without VLAN support since PORT_MATRIX in PCR regs contains all user ports. Similar to MDIO devices (e.g. MT7530 and MT7531) fix the issue defining default VLAN-ID 0 for MT7530 MMIO devices. Fixes: 110c18b ("net: dsa: mt7530: introduce driver for MT7988 built-in switch") Signed-off-by: Lorenzo Bianconi <[email protected]> Reviewed-by: Chester A. Unal <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
This reverts commit a5c6bc5. The general approach described in commit e076eac ("selftests: break the dependency upon local header files") was taken one step too far here: it should not have been extended to include the syscall numbers. This is because doing so would require per-arch support in tools/include/uapi, and no such support exists. This revert fixes two separate reports of test failures, from Dave Hansen[1], and Li Wang[2]. An excerpt of Dave's report: Before this commit (a5c6bc5) things are fine. But after, I get: running PKEY tests for unsupported CPU/OS An excerpt of Li's report: I just found that mlock2_() return a wrong value in mlock2-test [1] https://lore.kernel.org/[email protected] [2] https://lore.kernel.org/CAEemH2eW=UMu9+turT2jRie7+6ewUazXmA6kL+VBo3cGDGU6RA@mail.gmail.com Link: https://lkml.kernel.org/r/[email protected] Fixes: a5c6bc5 ("selftests/mm: remove local __NR_* definitions") Signed-off-by: John Hubbard <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Li Wang <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Jeff Xu <[email protected]> Cc: Andrei Vagin <[email protected]> Cc: Axel Rasmussen <[email protected]> Cc: Christian Brauner <[email protected]> Cc: Kees Cook <[email protected]> Cc: Kent Overstreet <[email protected]> Cc: Liam R. Howlett <[email protected]> Cc: Muhammad Usama Anjum <[email protected]> Cc: Peter Xu <[email protected]> Cc: Rich Felker <[email protected]> Cc: Shuah Khan <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
We are about to add uprobe trampoline, so cleaning up the namespace. Acked-by: Andrii Nakryiko <[email protected]> Signed-off-by: Jiri Olsa <[email protected]>
Making copy_from_page global and adding uprobe prefix. Adding the uprobe prefix to copy_to_page as well for symmetry. Acked-by: Andrii Nakryiko <[email protected]> Signed-off-by: Jiri Olsa <[email protected]>
The uprobe_write_opcode function currently updates also refctr offset if there's one defined for uprobe. This is not handy for following changes which needs to make several updates (writes) to install or remove uprobe, but update refctr offset just once. Adding set_swbp_refctr/set_orig_refctr which makes sure refctr offset is updated. Signed-off-by: Jiri Olsa <[email protected]>
Adding uprobe_write function that does what uprobe_write_opcode did so far, but allows to pass verify callback function that checks the memory location before writing the opcode. It will be used in following changes to simplify the checking logic. The uprobe_write_opcode now calls uprobe_write with verify_opcode as the verify callback. Signed-off-by: Jiri Olsa <[email protected]>
Adding nbytes argument to uprobe_write_opcode as preparation for writing whole instructions in following changes. Acked-by: Andrii Nakryiko <[email protected]> Signed-off-by: Jiri Olsa <[email protected]>
The uprobe_write has special path to restore the original page when we write original instruction back. This happens when uprobe_write detects that we want to write anything else but breakpoint instruction. In following changes we want to use uprobe_write function for multiple updates, so adding new function argument to denote that this is the original instruction update. This way uprobe_write can make appropriate checks and restore the original page when possible. Signed-off-by: Jiri Olsa <[email protected]>
bd2fa12
to
fcaa810
Compare
Signed-off-by: Jiri Olsa <[email protected]>
Adding new uprobe syscall that calls uprobe handlers for given 'breakpoint' address. The idea is that the 'breakpoint' address calls the user space trampoline which executes the uprobe syscall. The syscall handler reads the return address of the initial call to retrieve the original 'breakpoint' address. With this address we find the related uprobe object and call its consumers. Adding the arch_uprobe_trampoline_mapping function that provides uprobe trampoline mapping. This mapping is backed with one global page initialized at __init time and shared by the all the mapping instances. We do not allow to execute uprobe syscall if the caller is not from uprobe trampoline mapping. Signed-off-by: Jiri Olsa <[email protected]>
Adding support to add special mapping for for user space trampoline with following functions: uprobe_trampoline_get - find or add related uprobe_trampoline uprobe_trampoline_put - remove ref or destroy uprobe_trampoline The user space trampoline is exported as architecture specific user space special mapping, which is provided by arch_uprobe_trampoline_mapping function. The uprobe trampoline needs to be callable/reachable from the probe address, so while searching for available address we use uprobe_is_callable function to decide if the uprobe trampoline is callable from the probe address. All uprobe_trampoline objects are stored in uprobes_state object and are cleaned up when the process mm_struct goes down. Adding new arch hooks for that, because this change is x86_64 specific. Locking is provided by callers in following changes. Signed-off-by: Jiri Olsa <[email protected]>
Adding support to emulate nop5 as the original uprobe instruction. Signed-off-by: Jiri Olsa <[email protected]>
Putting together all the previously added pieces to support optimized uprobes on top of 5-byte nop instruction. The current uprobe execution goes through following: - installs breakpoint instruction over original instruction - exception handler hit and calls related uprobe consumers - and either simulates original instruction or does out of line single step execution of it - returns to user space The optimized uprobe path - checks the original instruction is 5-byte nop (plus other checks) - adds (or uses existing) user space trampoline and overwrites original instruction (5-byte nop) with call to user space trampoline - the user space trampoline executes uprobe syscall that calls related uprobe consumers - trampoline returns back to next instruction This approach won't speed up all uprobes as it's limited to using nop5 as original instruction, but we could use nop5 as USDT probe instruction (which uses single byte nop ATM) and speed up the USDT probes. This patch overloads related arch functions in uprobe_write_opcode and set_orig_insn so they can install call instruction if needed. The arch_uprobe_optimize triggers the uprobe optimization and is called after first uprobe hit. I originally had it called on uprobe installation but then it clashed with elf loader, because the user space trampoline was added in a place where loader might need to put elf segments, so I decided to do it after first uprobe hit when loading is done. We do not unmap and release uprobe trampoline when it's no longer needed, because there's no easy way to make sure none of the threads is still inside the trampoline. But we do not waste memory, because there's just single page for all the uprobe trampoline mappings. We do waste frmae on page mapping for every 4GB by keeping the uprobe trampoline page mapped, but that seems ok. Attaching the speed up from benchs/run_bench_uprobes.sh script: current: usermode-count : 818.836 ± 2.842M/s syscall-count : 8.917 ± 0.003M/s uprobe-nop : 3.056 ± 0.013M/s uprobe-push : 2.903 ± 0.002M/s uprobe-ret : 1.533 ± 0.001M/s --> uprobe-nop5 : 1.492 ± 0.000M/s uretprobe-nop : 1.783 ± 0.000M/s uretprobe-push : 1.672 ± 0.001M/s uretprobe-ret : 1.067 ± 0.002M/s --> uretprobe-nop5 : 1.052 ± 0.000M/s after the change: usermode-count : 818.386 ± 1.886M/s syscall-count : 8.923 ± 0.003M/s uprobe-nop : 3.086 ± 0.005M/s uprobe-push : 2.751 ± 0.001M/s uprobe-ret : 1.481 ± 0.000M/s --> uprobe-nop5 : 4.016 ± 0.002M/s uretprobe-nop : 1.712 ± 0.008M/s uretprobe-push : 1.616 ± 0.001M/s uretprobe-ret : 1.052 ± 0.000M/s --> uretprobe-nop5 : 2.015 ± 0.000M/s Signed-off-by: Jiri Olsa <[email protected]>
Adding __test_uprobe_syscall with non x86_64 stub to execute all the tests, so we don't need to keep adding non x86_64 stub functions for new tests. Signed-off-by: Jiri Olsa <[email protected]>
Using 5-byte nop for x86 usdt probes so we can switch to optimized uprobe them. Signed-off-by: Jiri Olsa <[email protected]>
Adding tests for optimized uprobe/usdt probes. Checking that we get expected trampoline and attached bpf programs get executed properly. Signed-off-by: Jiri Olsa <[email protected]>
Adding test that makes sure parallel execution of the uprobe and attach/detach of optimized uprobe on it works properly. Signed-off-by: Jiri Olsa <[email protected]>
Make sure that calling uprobe syscall from outside uprobe trampoline results in sigill signal. Signed-off-by: Jiri Olsa <[email protected]>
Adding optimized usdt variant for basic usdt test to check that usdt arguments are properly passed in optimized code path. Signed-off-by: Jiri Olsa <[email protected]>
Signed-off-by: Jiri Olsa <[email protected]>
Add 5-byte nop uprobe trigger bench (x86_64 specific) to measure uprobes/uretprobes on top of nop5 instruction. Signed-off-by: Jiri Olsa <[email protected]>
…robe Signed-off-by: Jiri Olsa <[email protected]>
Pass uprobe systemcall through seccomp without depending on configuration. Note: uprobe is currently only x86_64 and isn't expected to ever be supported in i386. Signed-off-by: Jiri Olsa <[email protected]>
UPROBE.not_attached.uprobe_default_allow UPROBE.not_attached.uprobe_default_block UPROBE.not_attached.uprobe_block_syscall UPROBE.not_attached.uprobe_default_block_with_syscall UPROBE.uprobe_attached.uprobe_default_allow UPROBE.uprobe_attached.uprobe_default_block UPROBE.uprobe_attached.uprobe_block_syscall UPROBE.uprobe_attached.uprobe_default_block_with_syscall UPROBE.uretprobe_attached.uprobe_default_allow UPROBE.uretprobe_attached.uprobe_default_block UPROBE.uretprobe_attached.uprobe_block_syscall UPROBE.uretprobe_attached.uprobe_default_block_with_syscall Signed-off-by: Jiri Olsa <[email protected]>
fcaa810
to
b836e48
Compare
44c3a1d
to
720c696
Compare
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
No description provided.