Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

-Wframe-larger-than= in arch/x86/kernel/kvm.c #202

Closed
nickdesaulniers opened this issue Oct 10, 2018 · 8 comments
Closed

-Wframe-larger-than= in arch/x86/kernel/kvm.c #202

nickdesaulniers opened this issue Oct 10, 2018 · 8 comments
Labels
unreproducible Not or no longer reproducible

Comments

@nickdesaulniers
Copy link
Member

mm/memcontrol.c:5541:12: warning: stack frame size of 1032 bytes in function 'memory_stat_show' [-Wframe-larger-than=]
static int memory_stat_show(struct seq_file *m, void *v)
           ^
mm/memcontrol.c:3388:12: warning: stack frame size of 1064 bytes in function 'memcg_stat_show' [-Wframe-larger-than=]
static int memcg_stat_show(struct seq_file *m, void *v)
           ^
2 warnings generated.

Not a dupe of #39 , KASAN not enabled.

@nickdesaulniers nickdesaulniers added [BUG] linux A bug that should be fixed in the mainline kernel. [ARCH] x86_64 This bug impacts ARCH=x86_64 low priority This bug is not critical and not a priority -Wframe-larger-than= labels Oct 10, 2018
@arndb
Copy link

arndb commented Oct 11, 2018

I have to look again, but I think I'm not seeing any -Wframe-larger-than= warnings with clang-8 once asan-stack is disabled. This means I probably have a bugfix for this one and many others somewhere in my tree.

@arndb
Copy link

arndb commented Oct 11, 2018

From 240a0792ff1d0bd4afc71869f9d1c5006559f4fc Mon Sep 17 00:00:00 2001
From: Arnd Bergmann <[email protected]>
Date: Tue, 14 Feb 2017 13:52:42 +0100
Subject: [PATCH] rewrite READ_ONCE/WRITE_ONCE

When CONFIG_KASAN is enabled, the READ_ONCE/WRITE_ONCE macros cause
rather large kernel stacks, e.g.:

mm/vmscan.c: In function 'shrink_page_list':
mm/vmscan.c:1333:1: error: the frame size of 3456 bytes is larger than 3072 bytes [-Werror=frame-larger-than=]
block/cfq-iosched.c: In function 'cfqg_stats_add_aux':
block/cfq-iosched.c:750:1: error: the frame size of 4048 bytes is larger than 3072 bytes [-Werror=frame-larger-than=]
fs/btrfs/disk-io.c: In function 'open_ctree':
fs/btrfs/disk-io.c:3314:1: error: the frame size of 3136 bytes is larger than 3072 bytes [-Werror=frame-larger-than=]
fs/btrfs/relocation.c: In function 'build_backref_tree':
fs/btrfs/relocation.c:1193:1: error: the frame size of 4336 bytes is larger than 3072 bytes [-Werror=frame-larger-than=]
fs/fscache/stats.c: In function 'fscache_stats_show':
fs/fscache/stats.c:287:1: error: the frame size of 6512 bytes is larger than 3072 bytes [-Werror=frame-larger-than=]
fs/jbd2/commit.c: In function 'jbd2_journal_commit_transaction':
fs/jbd2/commit.c:1139:1: error: the frame size of 3760 bytes is larger than 3072 bytes [-Werror=frame-larger-than=]

This attempts a rewrite of the two macros, using a simpler implementation
for the most common case of having a naturally aligned 1, 2, 4, or (on
64-bit architectures) 8  byte object that can be accessed with a single
instruction.  For these, we go back to a volatile pointer dereference
that we had with the ACCESS_ONCE macro.

READ_ONCE/WRITE_ONCE also try to handle unaligned objects and objects
of other sizes by forcing either a word-size access (which may trap
on some architectures) or doing a non-atomic memcpy. I could not figure
out what these are actually used for, but they appear to be done
intentionally, so I'm leaving that code untouched.

I had to fix up a couple of files that either use WRITE_ONCE() as an
implicit typecast, or ignore the result of READ_ONCE(). In all cases,
the modified code seems no worse to me than the original.

Cc: Christian Borntraeger <[email protected]>
Cc: Paul McKenney <[email protected]>
Signed-off-by: Arnd Bergmann <[email protected]>

diff --git a/arch/x86/include/asm/switch_to.h b/arch/x86/include/asm/switch_to.h
index 36bd243843d6..d96bdb7e5f7a 100644
--- a/arch/x86/include/asm/switch_to.h
+++ b/arch/x86/include/asm/switch_to.h
@@ -32,7 +32,7 @@ static inline void prepare_switch_to(struct task_struct *next)
 	 *
 	 * To minimize cache pollution, just follow the stack pointer.
 	 */
-	READ_ONCE(*(unsigned char *)next->thread.sp);
+	(void)READ_ONCE(*(unsigned char *)next->thread.sp);
 #endif
 }
 
diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index 7a52e2aba1d4..fc958b0f5688 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -210,6 +210,10 @@ static __always_inline void __write_once_size(volatile void *p, void *res, int s
 	}
 }
 
+#define __ALIGNED_WORD(x)						\
+	((sizeof(x) == 1 || sizeof(x) == 2 || sizeof(x) == 4 ||		\
+	  sizeof(x) == sizeof(long)) && (sizeof(x) == __alignof__(x)))	\
+
 /*
  * Prevent the compiler from merging or refetching reads or writes. The
  * compiler is also forbidden from reordering successive instances of
@@ -231,6 +235,12 @@ static __always_inline void __write_once_size(volatile void *p, void *res, int s
  * mutilate accesses that either do not require ordering or that interact
  * with an explicit memory barrier or atomic instruction that provides the
  * required ordering.
+ *
+ * Unaligned data is particularly tricky here: if the type that gets
+ * passed in is not naturally aligned, we cast to a type of higher
+ * alignment, which is not well-defined in C. This is fine as long
+ * as the actual data is aligned, but otherwise might require a trap
+ * to satisfy the load.
  */
 #include <asm/barrier.h>
 #include <linux/kasan-checks.h>
@@ -245,7 +255,32 @@ static __always_inline void __write_once_size(volatile void *p, void *res, int s
 	smp_read_barrier_depends(); /* Enforce dependency ordering from x */ \
 	__u.__val;							\
 })
-#define READ_ONCE(x) __READ_ONCE(x, 1)
+
+#define __WRITE_ONCE(x, val)						\
+({									\
+	union { typeof(x) __val; char __c[1]; } __u =			\
+		{ .__val = (__force typeof(x)) (val) };			\
+	__write_once_size(&(x), __u.__c, sizeof(x));			\
+	__u.__val;							\
+})
+
+
+/*
+ * the common case is simple: x is naturally aligned, not an array,
+ * and accessible with a single load, avoiding the need for local
+ * variables. With KASAN, this is important as any call to
+ *__write_once_size(),__read_once_size_nocheck() or __read_once_size()
+ * uses significant amounts of stack space for checking that we don't
+ * overflow the union.
+ */
+#define __READ_ONCE_SIMPLE(x)						\
+	(typeof(x))(*(volatile typeof(&(x)))&(x))
+
+#define __WRITE_ONCE_SIMPLE(x, val)					\
+	({*(volatile typeof(&(x)))&(x) = (val); })
+
+#define READ_ONCE(x) __builtin_choose_expr(__ALIGNED_WORD(x), 		\
+	__READ_ONCE_SIMPLE(x), __READ_ONCE(x, 1))
 
 /*
  * Use READ_ONCE_NOCHECK() instead of READ_ONCE() if you need
@@ -260,13 +295,8 @@ unsigned long read_word_at_a_time(const void *addr)
 	return *(unsigned long *)addr;
 }
 
-#define WRITE_ONCE(x, val) \
-({							\
-	union { typeof(x) __val; char __c[1]; } __u =	\
-		{ .__val = (__force typeof(x)) (val) }; \
-	__write_once_size(&(x), __u.__c, sizeof(x));	\
-	__u.__val;					\
-})
+#define WRITE_ONCE(x, val) do { __builtin_choose_expr(__ALIGNED_WORD(x), \
+	__WRITE_ONCE_SIMPLE(x, val), __WRITE_ONCE(x, val)); } while (0)
 
 #endif /* __KERNEL__ */

@arndb
Copy link

arndb commented Oct 11, 2018

I tried to reply with a patch above, that didn't go well. Please see https://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground.git/commit/?h=y2038-4.20-next&id=9485431c7f8e537e6b6b7350b7b0bb3724397183 for a patch I did a while ago that might address this.

@nickdesaulniers
Copy link
Member Author

heh, no worries. I think you can attach a file to comments on github. alternatively, just point me to a tree/branch/sha to add as a remote, fetch from, and cherry pick.

@nickdesaulniers
Copy link
Member Author

Just checking, the comment message mentions When CONFIG_KASAN is enabled but I don't have KASAN enabled. What's your CONFIG_FRAME_WARN set to?

@arndb
Copy link

arndb commented Oct 12, 2018 via email

@nickdesaulniers
Copy link
Member Author

strange bug I filed. I reported issues in mm/memcontrol.c, which needs CONFIG_MEMCG=y, which isn't set by x86's defconfig. I also titled the bug "arch/x86/kernel/kvm.c" which doesn't match the warning I reported.

Enabling CONFIG_MEMCG=y on top of defconfig does not produce a warning for mm/memcontrol.c related to stack frame size.

Ditto for KVM_GUEST=y (kvmconfig) and arch/x86/kernel/kvm.c.

Closing as nonsensical.

@nickdesaulniers
Copy link
Member Author

probably duped the warning from #201 by accident.

@nickdesaulniers nickdesaulniers added unreproducible Not or no longer reproducible and removed -Wframe-larger-than= [ARCH] x86_64 This bug impacts ARCH=x86_64 [BUG] linux A bug that should be fixed in the mainline kernel. low priority This bug is not critical and not a priority labels May 20, 2019
nathanchance pushed a commit that referenced this issue Jul 18, 2019
When loading a module with rodata=n, it causes an executing
NX-protected page BUG.

[   32.379191] kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
[   32.382917] BUG: unable to handle page fault for address: ffffffffc0005000
[   32.385947] #PF: supervisor instruction fetch in kernel mode
[   32.387662] #PF: error_code(0x0011) - permissions violation
[   32.389352] PGD 240c067 P4D 240c067 PUD 240e067 PMD 421a52067 PTE 8000000421a53063
[   32.391396] Oops: 0011 [#1] SMP PTI
[   32.392478] CPU: 7 PID: 2697 Comm: insmod Tainted: G           O      5.2.0-rc5+ #202
[   32.394588] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[   32.398157] RIP: 0010:ko_test_init+0x0/0x1000 [ko_test]
[   32.399662] Code: Bad RIP value.
[   32.400621] RSP: 0018:ffffc900029f3ca8 EFLAGS: 00010246
[   32.402171] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[   32.404332] RDX: 00000000000004c7 RSI: 0000000000000cc0 RDI: ffffffffc0005000
[   32.406347] RBP: ffffffffc0005000 R08: ffff88842fbebc40 R09: ffffffff810ede4a
[   32.408392] R10: ffffea00108e3480 R11: 0000000000000000 R12: ffff88842bee21a0
[   32.410472] R13: 0000000000000001 R14: 0000000000000001 R15: ffffc900029f3e78
[   32.412609] FS:  00007fb4f0c0a700(0000) GS:ffff88842fbc0000(0000) knlGS:0000000000000000
[   32.414722] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   32.416290] CR2: ffffffffc0004fd6 CR3: 0000000421a90004 CR4: 0000000000020ee0
[   32.418471] Call Trace:
[   32.419136]  do_one_initcall+0x41/0x1df
[   32.420199]  ? _cond_resched+0x10/0x40
[   32.421433]  ? kmem_cache_alloc_trace+0x36/0x160
[   32.422827]  do_init_module+0x56/0x1f7
[   32.423946]  load_module+0x1e67/0x2580
[   32.424947]  ? __alloc_pages_nodemask+0x150/0x2c0
[   32.426413]  ? map_vm_area+0x2d/0x40
[   32.427530]  ? __vmalloc_node_range+0x1ef/0x260
[   32.428850]  ? __do_sys_init_module+0x135/0x170
[   32.430060]  ? _cond_resched+0x10/0x40
[   32.431249]  __do_sys_init_module+0x135/0x170
[   32.432547]  do_syscall_64+0x43/0x120
[   32.433853]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

Because if rodata=n, set_memory_x() can't be called, fix this by
calling set_memory_x in complete_formation();

Fixes: f2c65fb ("x86/modules: Avoid breaking W^X while loading modules")
Suggested-by: Jian Cheng <[email protected]>
Reviewed-by: Nadav Amit <[email protected]>
Signed-off-by: Yang Yingliang <[email protected]>
Signed-off-by: Jessica Yu <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
unreproducible Not or no longer reproducible
Projects
None yet
Development

No branches or pull requests

2 participants