Skip to content

Commit

Permalink
rcu: Make rcu_barrier() understand about missing rcuo kthreads
Browse files Browse the repository at this point in the history
Commit 35ce7f29a44a (rcu: Create rcuo kthreads only for onlined CPUs)
avoids creating rcuo kthreads for CPUs that never come online.  This
fixes a bug in many instances of firmware: Instead of lying about their
age, these systems instead lie about the number of CPUs that they have.
Before commit 35ce7f29a44a, this could result in huge numbers of useless
rcuo kthreads being created.

It appears that experience indicates that I should have told the
people suffering from this problem to fix their broken firmware, but
I instead produced what turned out to be a partial fix.   The missing
piece supplied by this commit makes sure that rcu_barrier() knows not to
post callbacks for no-CBs CPUs that have not yet come online, because
otherwise rcu_barrier() will hang on systems having firmware that lies
about the number of CPUs.

It is tempting to simply have rcu_barrier() refuse to post a callback on
any no-CBs CPU that does not have an rcuo kthread.  This unfortunately
does not work because rcu_barrier() is required to wait for all pending
callbacks.  It is therefore required to wait even for those callbacks
that cannot possibly be invoked.  Even if doing so hangs the system.

Given that posting a callback to a no-CBs CPU that does not yet have an
rcuo kthread can hang rcu_barrier(), It is tempting to report an error
in this case.  Unfortunately, this will result in false positives at
boot time, when it is perfectly legal to post callbacks to the boot CPU
before the scheduler has started, in other words, before it is legal
to invoke rcu_barrier().

So this commit instead has rcu_barrier() avoid posting callbacks to
CPUs having neither rcuo kthread nor pending callbacks, and has it
complain bitterly if it finds CPUs having no rcuo kthread but some
pending callbacks.  And when rcu_barrier() does find CPUs having no rcuo
kthread but pending callbacks, as noted earlier, it has no choice but
to hang indefinitely.

Reported-by: Yanko Kaneti <[email protected]>
Reported-by: Jay Vosburgh <[email protected]>
Reported-by: Meelis Roos <[email protected]>
Reported-by: Eric B Munson <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
Tested-by: Eric B Munson <[email protected]>
Tested-by: Jay Vosburgh <[email protected]>
Tested-by: Yanko Kaneti <[email protected]>
Tested-by: Kevin Fenzi <[email protected]>
Tested-by: Meelis Roos <[email protected]>
Signed-off-by: Ad!thya R <[email protected]>
  • Loading branch information
paulmck authored and adithya2306 committed Jun 29, 2018
1 parent 4236b71 commit 1c4e383
Show file tree
Hide file tree
Showing 4 changed files with 53 additions and 14 deletions.
18 changes: 9 additions & 9 deletions include/trace/events/rcu.h
Original file line number Diff line number Diff line change
Expand Up @@ -645,18 +645,18 @@ TRACE_EVENT(rcu_torture_read,
/*
* Tracepoint for _rcu_barrier() execution. The string "s" describes
* the _rcu_barrier phase:
* "Begin": rcu_barrier_callback() started.
* "Check": rcu_barrier_callback() checking for piggybacking.
* "EarlyExit": rcu_barrier_callback() piggybacked, thus early exit.
* "Inc1": rcu_barrier_callback() piggyback check counter incremented.
* "Offline": rcu_barrier_callback() found offline CPU
* "OnlineNoCB": rcu_barrier_callback() found online no-CBs CPU.
* "OnlineQ": rcu_barrier_callback() found online CPU with callbacks.
* "OnlineNQ": rcu_barrier_callback() found online CPU, no callbacks.
* "Begin": _rcu_barrier() started.
* "Check": _rcu_barrier() checking for piggybacking.
* "EarlyExit": _rcu_barrier() piggybacked, thus early exit.
* "Inc1": _rcu_barrier() piggyback check counter incremented.
* "OfflineNoCB": _rcu_barrier() found callback on never-online CPU
* "OnlineNoCB": _rcu_barrier() found online no-CBs CPU.
* "OnlineQ": _rcu_barrier() found online CPU with callbacks.
* "OnlineNQ": _rcu_barrier() found online CPU, no callbacks.
* "IRQ": An rcu_barrier_callback() callback posted on remote CPU.
* "CB": An rcu_barrier_callback() invoked a callback, not the last.
* "LastCB": An rcu_barrier_callback() invoked the last callback.
* "Inc2": rcu_barrier_callback() piggyback check counter incremented.
* "Inc2": _rcu_barrier() piggyback check counter incremented.
* The "cpu" argument is the CPU or -1 if meaningless, the "cnt" argument
* is the count of remaining callbacks, and "done" is the piggybacking count.
*/
Expand Down
15 changes: 10 additions & 5 deletions kernel/rcu/tree.c
Original file line number Diff line number Diff line change
Expand Up @@ -2995,11 +2995,16 @@ static void _rcu_barrier(struct rcu_state *rsp)
continue;
rdp = per_cpu_ptr(rsp->rda, cpu);
if (rcu_is_nocb_cpu(cpu)) {
_rcu_barrier_trace(rsp, "OnlineNoCB", cpu,
rsp->n_barrier_done);
atomic_inc(&rsp->barrier_cpu_count);
__call_rcu(&rdp->barrier_head, rcu_barrier_callback,
rsp, cpu, 0);
if (!rcu_nocb_cpu_needs_barrier(rsp, cpu)) {
_rcu_barrier_trace(rsp, "OfflineNoCB", cpu,
rsp->n_barrier_done);
} else {
_rcu_barrier_trace(rsp, "OnlineNoCB", cpu,
rsp->n_barrier_done);
atomic_inc(&rsp->barrier_cpu_count);
__call_rcu(&rdp->barrier_head,
rcu_barrier_callback, rsp, cpu, 0);
}
} else if (ACCESS_ONCE(rdp->qlen)) {
_rcu_barrier_trace(rsp, "OnlineQ", cpu,
rsp->n_barrier_done);
Expand Down
1 change: 1 addition & 0 deletions kernel/rcu/tree.h
Original file line number Diff line number Diff line change
Expand Up @@ -564,6 +564,7 @@ static void print_cpu_stall_info(struct rcu_state *rsp, int cpu);
static void print_cpu_stall_info_end(void);
static void zero_cpu_stall_ticks(struct rcu_data *rdp);
static void increment_cpu_stall_ticks(void);
static bool rcu_nocb_cpu_needs_barrier(struct rcu_state *rsp, int cpu);
static void rcu_nocb_gp_set(struct rcu_node *rnp, int nrq);
static void rcu_nocb_gp_cleanup(struct rcu_state *rsp, struct rcu_node *rnp);
static void rcu_init_one_nocb(struct rcu_node *rnp);
Expand Down
33 changes: 33 additions & 0 deletions kernel/rcu/tree_plugin.h
Original file line number Diff line number Diff line change
Expand Up @@ -2020,6 +2020,33 @@ static void wake_nocb_leader(struct rcu_data *rdp, bool force)
}
}

/*
* Does the specified CPU need an RCU callback for the specified flavor
* of rcu_barrier()?
*/
static bool rcu_nocb_cpu_needs_barrier(struct rcu_state *rsp, int cpu)
{
struct rcu_data *rdp = per_cpu_ptr(rsp->rda, cpu);
struct rcu_head *rhp;

/* No-CBs CPUs might have callbacks on any of three lists. */
rhp = ACCESS_ONCE(rdp->nocb_head);
if (!rhp)
rhp = ACCESS_ONCE(rdp->nocb_gp_head);
if (!rhp)
rhp = ACCESS_ONCE(rdp->nocb_follower_head);

/* Having no rcuo kthread but CBs after scheduler starts is bad! */
if (!ACCESS_ONCE(rdp->nocb_kthread) && rhp) {
/* RCU callback enqueued before CPU first came online??? */
pr_err("RCU: Never-onlined no-CBs CPU %d has CB %p\n",
cpu, rhp->func);
WARN_ON_ONCE(1);
}

return !!rhp;
}

/*
* Enqueue the specified string of rcu_head structures onto the specified
* CPU's no-CBs lists. The CPU is specified by rdp, the head of the
Expand Down Expand Up @@ -2597,6 +2624,12 @@ static bool init_nocb_callback_list(struct rcu_data *rdp)

#else /* #ifdef CONFIG_RCU_NOCB_CPU */

static bool rcu_nocb_cpu_needs_barrier(struct rcu_state *rsp, int cpu)
{
WARN_ON_ONCE(1); /* Should be dead code. */
return false;
}

static void rcu_nocb_gp_cleanup(struct rcu_state *rsp, struct rcu_node *rnp)
{
}
Expand Down

0 comments on commit 1c4e383

Please sign in to comment.