kernelCTF: added CVE-2024-1086 lts mitigation #96
Merged
39 commits
6bbf4c3
kernelCTF: added CVE-2024-1086 lts mitigation
Notselwyn 4679759
fix: musl-tools added
Notselwyn 2247e38
fix: trying apt update to fix include issue?
Notselwyn abf94e0
fix: tred fixing includes replacing musl-gcc with gcc. stability conc…
Notselwyn 6bd783b
fix: reversed previous commit. invalid AVX512 instructions
Notselwyn f8dc724
fix: tried including -mno-avx512f
Notselwyn 3a2e162
fix: tried replacing musl-gcc with gcc
Notselwyn ace0b44
fix: reverse previous -mno-avx512f commit (it does not fix static gli…
Notselwyn c4f8d3c
fix: attempted fix by inversing include dirs, and added debug statements
Notselwyn d2b943d
fix: added debug statements
Notselwyn 7ed3c6e
fix: added more debuig
Notselwyn 6001429
fix: added header files
Notselwyn 6a47e54
fix: added UAPI header files for lts
Notselwyn 9d205d3
fix: removed debug statements
Notselwyn 65aaf65
CVE-2024-1086: added more info to exploit (still incomplete)
Notselwyn a4da963
fix: completed exploit.md
Notselwyn e9bf593
docs: added abbreviations for diagram
Notselwyn af171cb
docs: added references in code snippet
Notselwyn b16d13d
docs: explained ip struct values in detail
Notselwyn f9231eb
docs: included link to blogpost
Notselwyn 7c81a0c
docs: fixed PUD pagetable layer nr
Notselwyn 3a7fdcd
docs: improved documentation for dirty pagetable technique
Notselwyn e79a21a
docs: changed paths to external repo to relative path in repo
Notselwyn 5b669bd
Update novel-techniques.md
Notselwyn cfc6857
test: kernelctf gcc static compile
Notselwyn 8052699
Merge branch 'master' of https://github.com/Notselwyn/security-research
Notselwyn fa84cc2
test: added libmnl-dev dependency for header
Notselwyn 7d25ba8
fix: added libnftnl headers to dependencies
Notselwyn c947564
test: switched to using apt installed headers
Notselwyn af609ba
fix: include header path
Notselwyn 13106b9
fix: changed include path order
Notselwyn c430f4a
fix: include with incorrect header paths
Notselwyn 0a7cabc
fix: linux header include path
Notselwyn 04004a6
chore: got rid of header bomb lol
Notselwyn 9babeec
fix: asm headers
Notselwyn b58725c
fix: asm-generic headers (please let this be the last)
Notselwyn 4ce9f15
fix: asm headers
Notselwyn 78128d5
fix: got rid of header nuke
Notselwyn 3d18475
chore: got rid of header nuke for real this time
Notselwyn
701 changes: 701 additions & 0 deletions
pocs/linux/kernelctf/CVE-2024-1086_lts_mitigation/docs/exploit.md
Large diffs are not rendered by default.
3 changes: 3 additions & 0 deletions
pocs/linux/kernelctf/CVE-2024-1086_lts_mitigation/docs/img/pagesetup.svg
199 changes: 199 additions & 0 deletions
pocs/linux/kernelctf/CVE-2024-1086_lts_mitigation/docs/novel-techniques.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,199 @@ | ||
# novel-techniques | ||
|
||
TODO: increase description granularity for techniques proven to be novel. | ||
|
||
## Bypassing KernelCTF mitigation instance corruption checks for skb's | ||
|
||
One of the mitigations in the KernelCTF Mitigation instance is checking the freelist next pointer when allocating an object through a freelist pointer. | ||
|
||
In the exploit, the following happens when doing the double-free: | ||
1. alloc skb1 | ||
2. free skb1 (set new freelist pointer) | ||
3. modify skb->len (overlapping with freelist next pointer) | ||
4. free skb1 (set new freelist pointer) | ||
|
||
This means that at step 3 the freelist next pointer gets corrupted. `CONFIG_SLAB_FREELIST_HARDENED` is excluded here for demonstration purposes. When background applications on the system try to transmit packets, they will inevitably try to allocate the skb object with the corrupted freelist next pointer, causing a system crash. | ||
|
||
To bypass this, we leverage the fact that these corruption checks only happen on allocation, not on free. We can therefore mask the corrupted object by spraying "healthy" objects which get allocated instead. The sequence then looks like this: | ||
|
||
1. alloc N skb objects | ||
2. alloc skb1 | ||
3. free skb1 (set new freelist pointer) | ||
4. modify skb->len (overlapping with freelist next pointer) | ||
5. free N skb objects | ||
6. free skb1 (set new freelist pointer) | ||
|
||
Whilst this is probably not the vulnerability which freelist next pointer corruption detection is intended to mitigate, it would definitely mitigate exploitation of this specific scenario. | ||
|
||
The fix for this technique would be checking the freelist next pointer of the previous object in the freelist when freeing an object. | ||
|
||
|
||
## Dirty Pagedirectory (pagetable confusion) | ||
|
||
Perhaps the most interesting technique in this exploit is Dirty Pagedirectory: plainly put, pagetable confusion between pagetables like PUD+PMD and PMD+PTE. | ||
|
||
By double-allocating a PUD page and a PMD page, or a PMD page and a PTE page, the exploit can set pagetable entries from userland pages. This is a *very* powerful primitive, allowing the exploit to do rapid memory reads/writes across all physical memory of the system. | ||
|
||
Note how PT entries not only include the physical address (PFN), but also the page flags. Hence, we can write to read-only pages like the one containing `modprobe_path`. As if that isn't enough, we can make the target area span 1GiB (PMD+PTE) and/or 512GiB (PUD+PMD) of addresses at the same time. Of course, this can be limited to save memory usage and overhead. | ||
|
||
|
||
## Freeing skb's instantly on arbitrary CPUs without UDP/TCP stacks | ||
|
||
In order to bypass certain double-free detections, we need to free skb's at specific times on specific CPUs. Additionally, we cannot make use of the UDP and TCP stacks in the kernel, since they access fields in the skb which are corrupted by the double-free. | ||
|
||
Fortunately, we can do this with the IPv4 fragment queues (IFQs). By sending an IPv4 fragment to localhost, we make it wait in the IFQ for `ipfrag_time` seconds, after which all fragments in the queue are freed. Alternatively, it gets freed when the IFQ is completed (i.e. the target length is reached with the fragments in the IFQ). | ||
|
||
If needed, we can prolong the lifetime of the IFQ by writing to `/proc/sys/net/ipv4/ipfrag_time`. | ||
|
||
Unfortunately, the target length of the IFQ depends on `skb->len`, which is corrupted by the double-free. Hence, we need to trigger an error in the IFQ code instead, causing it to free all fragments in the queue on the CPU handling the triggering skb. | ||
|
||
It looks like this in action with the double-free: | ||
1. alloc skb1 (double-freed IPv4 fragment) @ CPU `X` | ||
2. free skb1 (1) @ CPU `X` | ||
3. make skb1 go into IFQ (utilizing its content) | ||
4. do stuff here, like spraying skb's, spraying PTEs, etc | ||
5. alloc skb2 (erroneous IPv4 fragment) @ CPU `Y` | ||
6. free skb2 @ CPU `Y` | ||
7. free skb1 @ CPU `Y` | ||
|
||
## Fileless privesc using fd hijacking | ||
|
||
We can escape the namespace by doing file descriptor hijacking: hooking up the file descriptors of another process (or `/dev/console`) to the `/bin/sh` instance as root triggered by the `modprobe_path` technique. | ||
|
||
For example: | ||
- hijack `/dev/console` (works only on local TTYs): `/bin/sh 0</dev/console 1>/dev/console 2>&1` | ||
- hijack exploit fd's (works on reverse shells as well): `/bin/sh 0</proc/<exploit_pid>/fd/0 1>/proc/<exploit_pid>/fd/1 2>&1` | ||
|
||
This way we can do fileless privesc and escape the namespace without even writing a single file, allowing for privesc on read-only systems. | ||
|
||
## Fileless privesc using modprobe_path + procfs | ||
|
||
We can combine overwriting `modprobe_path` with procfs to allow for fileless privesc script execution as root from the root namespace. With this primitive, we can utilize fd hijacking to perform fileless namespace escapes. | ||
|
||
We can overwrite `modprobe_path` to `/proc/<exploit_pid>/fd/<privesc_script_fd>` and it will execute the privesc script completely from memory, allowing privesc on read-only systems. | ||
|
||
## TLB flushing with PCID enabled | ||
|
||
One of the things required for Dirty Pagedirectory is a working TLB flushing primitive. Assuming the target VMA is shared, we can fork() and munmap() that VMA in the child. This allows for 100% working TLB flushing regardless of PCID, without altering the original pagetables. I presume the process needs to be pinned to a CPU, to avoid flushing the TLB of the wrong CPU core. | ||
|
||
The code for this looks like: | ||
|
||
```c | ||
#define SPINLOCK(cmp) while (cmp) { usleep(10 * 1000); } | ||
|
||
// presumably needs to be CPU pinned | ||
static void flush_tlb(void *addr, size_t len) | ||
{ | ||
short *status; | ||
|
||
status = mmap(NULL, sizeof(short), PROT_READ | PROT_WRITE, MAP_SHARED | MAP_ANONYMOUS, -1, 0); | ||
|
||
*status = FLUSH_STAT_INPROGRESS; | ||
if (fork() == 0) | ||
{ | ||
munmap(addr, len); | ||
*status = FLUSH_STAT_DONE; | ||
sleep(9999); | ||
} | ||
|
||
SPINLOCK(*status == FLUSH_STAT_INPROGRESS); | ||
|
||
munmap(status, sizeof(short)); | ||
} | ||
``` | ||
|
||
Note that the child sleeps instead of exiting, to avoid certain kernel bugs when doing Dirty Pagedirectory. | ||
|
||
## Easing physical KASLR bruteforce | ||
|
||
It is possible to ease physical KASLR bruteforcing. The Linux kernel base is aligned to `CONFIG_PHYSICAL_START` (and/or `CONFIG_PHYSICAL_ALIGN`) bytes. This essentially means the Linux kernel must be aligned to 16MiB or 2MiB, reducing the number of possible base addresses from e.g. 8Gi byte-granular addresses (assuming 8GiB physical memory) to 512 addresses with 16MiB alignment (a bruteforceable amount). | ||
|
||
## Validating the correct modprobe_path | ||
|
||
We can validate if we found the correct `modprobe_path` object in physical memory (when using Dirty Pagedirectory), by checking if the output of `/proc/sys/kernel/modprobe` has changed to the new value, since it is a "real-time" reference to the `modprobe_path` object used in the kernel. | ||
|
||
For example, this can be done with: | ||
|
||
```c | ||
static int get_modprobe_path(char *buf, size_t buflen) | ||
{ | ||
int size; | ||
|
||
size = read_file("/proc/sys/kernel/modprobe", buf, buflen); | ||
|
||
if (size == buflen) | ||
printf("[*] ==== read max amount of modprobe_path bytes, perhaps increment KMOD_PATH_LEN? ====\n"); | ||
|
||
// remove trailing \x0a (skip if the read failed or returned nothing) | ||
if (size > 0) | ||
buf[size-1] = '\x00'; | ||
|
||
return size; | ||
} | ||
|
||
static int strcmp_modprobe_path(char *new_str) | ||
{ | ||
char buf[KMOD_PATH_LEN] = { '\x00' }; | ||
|
||
get_modprobe_path(buf, KMOD_PATH_LEN); | ||
|
||
return strncmp(new_str, buf, KMOD_PATH_LEN); | ||
} | ||
|
||
void *memmem_modprobe_path(void *haystack_virt, size_t haystack_len, char *modprobe_path_str, size_t modprobe_path_len) | ||
{ | ||
void *pmd_modprobe_addr; | ||
|
||
// search 0x200000 bytes (a full PTE at a time) for the modprobe_path signature | ||
pmd_modprobe_addr = memmem(haystack_virt, haystack_len, modprobe_path_str, modprobe_path_len); | ||
if (pmd_modprobe_addr == NULL) | ||
return NULL; | ||
|
||
// check if this is the actual modprobe by overwriting it, and checking /proc/sys/kernel/modprobe | ||
strcpy(pmd_modprobe_addr, "/sanitycheck"); | ||
if (strcmp_modprobe_path("/sanitycheck") != 0) | ||
{ | ||
printf("[-] ^false positive. skipping to next one\n"); | ||
return NULL; | ||
} | ||
|
||
return pmd_modprobe_addr; | ||
} | ||
``` | ||
|
||
## Page refcount juggling | ||
|
||
When freeing a page, the Linux kernel checks if the page's refcount drops to 0. If it does not, it will refuse to free the page. To bypass this behaviour we simply juggle the refcounts, by utilizing the following order of operations for the double-free: | ||
|
||
1. alloc obj1 | refcount 0 -> 1 | ||
2. free obj1 | refcount 1 -> 0 | ||
3. alloc obj2 | refcount 0 -> 1 | ||
4. free obj1 | refcount 1 -> 0 | ||
5. alloc obj3 | refcount 0 -> 1 | ||
|
||
obj2 and obj3 will now be overlapping (having the same page), because the refcounts were always 0 when freeing. | ||
|
||
```c | ||
void __free_pages(struct page *page, unsigned int order) | ||
{ | ||
/* get PageHead before we drop reference */ | ||
int head = PageHead(page); | ||
|
||
if (put_page_testzero(page)) | ||
free_the_page(page, order); | ||
else if (!head) | ||
while (order-- > 0) | ||
free_the_page(page + (1 << order), order); | ||
} | ||
``` | ||
|
||
## Double-free order 4 to order 0 (old: race condition) | ||
|
||
When double-freeing pages, we can convert the page order to 0 utilizing a race condition with a `WARN()` message on really slow systems (like QEMU VMs with synchronous terminals). In the new exploit, this has been replaced with PCP draining as this works on all systems. | ||
|
||
This allows us to double-allocate `order==0` pages whilst having a double-free primitive on `order==4` pages. | ||
|
||
## Double-free order X to order Y (new: PCP refill) | ||
|
||
When double-freeing pages, we can convert the page order to an arbitrary order by double-freeing pages with `order>=4` such that it will end up in the buddy allocator freelist. Then, we can allocate it to the PCP list of an arbitrary `order<=3` page freelist, by draining said PCP-freelist and refilling it with the pages from the buddy-freelist. | ||
|
||
This is the new variant of the race condition-based method. |
47 changes: 47 additions & 0 deletions
pocs/linux/kernelctf/CVE-2024-1086_lts_mitigation/docs/vulnerability.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,47 @@ | ||
# vulnerability | ||
|
||
Document containing information about the vulnerability, the requirements, and the affected Linux kernel versions. | ||
|
||
## technical details | ||
|
||
### outlines | ||
|
||
The root cause is an input sanitization bug in `nft_verdict_init()` (`net/netfilter/nf_tables_api.c:9814`), which allowed rule verdicts to return positive drop errors. This is classified as CVE-2024-1086. | ||
|
||
The impact of this is a stable double-free primitive on both `struct sk_buff` objects, as well as `sk_buff->head` objects (kmalloc objects, ranging from size 256 to 65536 (assuming ipv4) a.k.a. order 4 buddy pages). | ||
|
||
The fix for the vulnerability was simply disallowing all drop errors in `nft_verdict_init()`, so that userland applications can no longer provide any drop errors. It did not make sense to the kernel developers that userland applications could do this in the first place, so they disabled it entirely. | ||
|
||
### triggering the bug | ||
|
||
An exploit can create a rule containing an expression which sets the verdict to `0xFFFF0000`. | ||
|
||
When this rule gets evaluated for an skb passing the nf_tables firewall, `nf_hook_slow()` attempts to free an skb object because `NF_DROP` is returned from the verdict mask of the rule verdict (`0xFFFF0000 (verdict) & 0x000000ff (NF_VERDICT_MASK) == 0 (NF_DROP)`). Then, `nf_hook_slow()` returns `NF_ACCEPT` (`NF_DROP_GETERR(0xFFFF0000) == NF_ACCEPT`) as if every hook/rule in the chain returned `NF_ACCEPT`. | ||
|
||
This causes the caller of `nf_hook_slow()` to misinterpret the situation (it believes the packet has not been freed, and should be handled), so it continues parsing the packet and eventually double-frees both the skb object and its `skb->head` object. | ||
|
||
## requirements | ||
|
||
Capabilities: | ||
- `CAP_NET_ADMIN` | ||
|
||
Kernel configuration: | ||
- `CONFIG_NF_TABLES=y` | ||
- `CONFIG_NETFILTER=y` | ||
|
||
User namespaces needed: | ||
- Yes, in order to set up rules for nf_tables to trigger the bug (`CAP_NET_ADMIN` in the current namespace should also be enough) | ||
|
||
## version info | ||
|
||
Commit which introduced the vuln: | ||
- https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e0abdadcc6e113ed2e22c85b35007 | ||
|
||
Commit which fixed the vuln (revert of previous commit): | ||
- https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f342de4e2f33e0e39165d8639387aa6c19dff660 | ||
|
||
Affected kernel versions: | ||
- everything between `v3.5` and `v6.8-rc1` | ||
- excluding `v6.1.76` and higher on `v6.1.x` | ||
- excluding `v6.6.15` and higher on `v6.6.x` | ||
- excluding `v6.7.3` and higher on `v6.7.x` |
34 changes: 34 additions & 0 deletions
pocs/linux/kernelctf/CVE-2024-1086_lts_mitigation/exploit/lts-6.1.72/Makefile
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,34 @@ | ||
SRC_FILES := src/exploit.c src/env.c src/net.c src/nftnl.c src/file.c | ||
OUT_NAME = ./exploit | ||
|
||
# use musl-gcc since statically linking glibc with gcc generated invalid opcodes for qemu | ||
# and dynamically linking raised glibc ABI versioning errors | ||
CC = musl-gcc | ||
|
||
# use custom headers with fixed versions in a musl-gcc compatible manner | ||
# - ./include/libmnl: libmnl v1.0.5 | ||
# - ./include/libnftnl: libnftnl v1.2.6 | ||
# - ./include/linux-lts-6.1.72: linux v6.1.72 | ||
CFLAGS = -I./include -I./include/linux-lts-6.1.72 -Wall -Wno-deprecated-declarations | ||
|
||
# use custom object archives compiled with musl-gcc for compatibility. normal ones | ||
# are used with gcc and have _chk funcs which musl doesn't support | ||
# the versions are the same as the headers above | ||
LIBMNL_PATH = ./lib/libmnl.a | ||
LIBNFTNL_PATH = ./lib/libnftnl.a | ||
|
||
exploit: _compile_static _strip_bin | ||
prerequisites: _install_musl | ||
run: _run_outfile | ||
clean: _clean_outfile | ||
|
||
_install_musl: | ||
sudo apt-get install musl-tools | ||
_compile_static: | ||
$(CC) $(CFLAGS) $(SRC_FILES) -o $(OUT_NAME) -static $(LIBNFTNL_PATH) $(LIBMNL_PATH) | ||
_strip_bin: | ||
strip $(OUT_NAME) | ||
_run_outfile: | ||
$(OUT_NAME) | ||
_clean_outfile: | ||
rm $(OUT_NAME) |
Binary file added
BIN
+165 KB
pocs/linux/kernelctf/CVE-2024-1086_lts_mitigation/exploit/lts-6.1.72/exploit
Binary file not shown.
I see there is still a TODO here, do you want to make any changes before we merge this?
Thanks for picking it up. If you believe the explanations are detailed enough, I'm fine with the way it currently is :-)
Could you please still look into if you can exclude the header files?
There are several other nftables submissions in the repo and in the PRs. Is there anything different in your submission which does not make this possible?