Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kern: fix client-controlled kernel panic in kipc #1961

Merged
merged 1 commit into from
Jan 25, 2025
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
48 changes: 35 additions & 13 deletions sys/kern/src/arch/arm_m.rs
Original file line number Diff line number Diff line change
Expand Up @@ -83,7 +83,7 @@ use crate::time::Timestamp;
use crate::umem::USlice;
#[cfg(any(armv7m, armv8m))]
use abi::FaultSource;
use abi::{FaultInfo, InterruptNum};
use abi::{FaultInfo, InterruptNum, UsageError};
#[cfg(armv8m)]
use armv8_m_mpu::{disable_mpu, enable_mpu};
use unwrap_lite::UnwrapLite;
Expand Down Expand Up @@ -1233,7 +1233,10 @@ pub unsafe extern "C" fn DefaultHandler() {
.unwrap_or_else(|| panic!("unhandled IRQ {irq_num}"));

let switch = with_task_table(|tasks| {
disable_irq(irq_num, false);
// This can only fail if the IRQ number is out of range, which
// in this case would mean the hardware is conspiring against
// us. So ignore it to ensure we don't generate a bogus check.
disable_irq(irq_num, false).ok();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know you prefer this over let _ because the latter is a bit of a footgun in other contexts, but there is a part of me that kind of wonders if

fn_returning_result().ok();

optimizes worse than

let _ = fn_returning_result();

because it's a function call, though hopefully LLVM can eliminate it...I might have to go check.

EDIT: After spending some time in Compiler Explorer, I have been proven definitively wrong about this, in fact, the opt-level=2 compiler is so smart about this that it correctly realizes that these are actually the same function and doesn't even generate two separate functions for them:

#[no_mangle]
#[inline(never)]
pub extern fn ignore_err_ok() {
    returns_result().ok();
    // pretend the function then does other stuff
    core::hint::black_box(1);
}

#[no_mangle]
#[inline(never)]
pub extern fn ignore_err_underscore() {
    let _ = returns_result();
    // pretend the function then does other stuff
    core::hint::black_box(1);
}

So, I have seen the error of my ways and should not have doubted the use of .ok() to ignore Results!.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's always worth checking in cases like this! The best codegen changes over time, too, so the ground truth back when I developed this habit might be different from today. So, thanks for chasing it down.


// Now, post the notification and return the
// scheduling hint.
Expand All @@ -1250,40 +1253,54 @@ pub unsafe extern "C" fn DefaultHandler() {
crate::profiling::event_isr_exit();
}

pub fn disable_irq(n: u32, also_clear_pending: bool) {
pub fn disable_irq(n: u32, also_clear_pending: bool) -> Result<(), UsageError> {
// Disable the interrupt by poking the Interrupt Clear Enable Register.
let nvic = unsafe { &*cortex_m::peripheral::NVIC::PTR };
let reg_num = (n / 32) as usize;
let bit_mask = 1 << (n % 32);
unsafe {
nvic.icer[reg_num].write(bit_mask);
nvic.icer
.get(reg_num)
.ok_or(UsageError::NoIrq)?
.write(bit_mask);
}
if also_clear_pending {
unsafe {
nvic.icpr[reg_num].write(bit_mask);
nvic.icpr
.get(reg_num)
.ok_or(UsageError::NoIrq)?
.write(bit_mask);
}
}
Comment on lines 1261 to 1274
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we wanted to be really fastidious about code size, we could potentially do a single check that the IRQ number is in range, and then make both of the actual accessess unchecked; if rustc isn't smart enough to know that if it's in range for ICER, it's also in range for ICPR? but, I dunno if it's worth the more manual checking to save 40 bytes of kernel --- feels sketchier for sure.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting. I'd be curious how much that'd actually save -- probably both the error return paths for these expressions get merged into a single epilogue, but I could be wrong.

I feel like the result is potentially more fragile, though, so I hesitate.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think this is fine! It's just a thought that occurred to me.

Ok(())
}

pub fn enable_irq(n: u32, also_clear_pending: bool) {
pub fn enable_irq(n: u32, also_clear_pending: bool) -> Result<(), UsageError> {
// Enable the interrupt by poking the Interrupt Set Enable Register.
let nvic = unsafe { &*cortex_m::peripheral::NVIC::PTR };
let reg_num = (n / 32) as usize;
let bit_mask = 1 << (n % 32);
if also_clear_pending {
// Do this _before_ enabling.
unsafe {
nvic.icpr[reg_num].write(bit_mask);
nvic.icpr
.get(reg_num)
.ok_or(UsageError::NoIrq)?
.write(bit_mask);
}
}
unsafe {
nvic.iser[reg_num].write(bit_mask);
nvic.iser
.get(reg_num)
.ok_or(UsageError::NoIrq)?
.write(bit_mask);
}
Ok(())
}

/// Looks up an interrupt in the NVIC and returns a cross-platform
/// representation of that interrupt's status.
pub fn irq_status(n: u32) -> abi::IrqStatus {
pub fn irq_status(n: u32) -> Result<abi::IrqStatus, UsageError> {
let mut status = abi::IrqStatus::empty();

let nvic = unsafe { &*cortex_m::peripheral::NVIC::PTR };
Expand All @@ -1292,25 +1309,30 @@ pub fn irq_status(n: u32) -> abi::IrqStatus {

// See if the interrupt is enabled by checking the bit in the Interrupt Set
// Enable Register.
let enabled = nvic.iser[reg_num].read() & bit_mask == bit_mask;
let iser_reg = nvic.iser.get(reg_num).ok_or(UsageError::NoIrq)?;
let enabled = iser_reg.read() & bit_mask == bit_mask;
status.set(abi::IrqStatus::ENABLED, enabled);

// See if the interrupt is pending by checking the bit in the Interrupt
// Set Pending Register (ISPR).
let pending = nvic.ispr[reg_num].read() & bit_mask == bit_mask;
status.set(abi::IrqStatus::PENDING, pending);

status
Ok(status)
}

pub fn pend_software_irq(InterruptNum(n): InterruptNum) {
pub fn pend_software_irq(
InterruptNum(n): InterruptNum,
) -> Result<(), UsageError> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was about to say "hmm, shouldn't we update the documentation for the software_irq KIPC to reflect that it will fault you if you ask for a nonexistent IRQ...but then, I realized that the Hubris reference docs for KIPCs doesn't even mention software_irq, and instead suggests that find_faulted_task is KIPC 8, which it isn't (it's KIPC 9, and software_irq is 8). This is almost certainly my fault back when I added the software IRQ KIPC, so I should probably fix that...

let nvic = unsafe { &*cortex_m::peripheral::NVIC::PTR };
let reg_num = (n / 32) as usize;
let bit_mask = 1 << (n % 32);

// Pend the IRQ by poking the corresponding bit in the Interrupt Set Pending
// Register (ISPR).
unsafe { nvic.ispr[reg_num].write(bit_mask) };
let ispr_reg = nvic.ispr.get(reg_num).ok_or(UsageError::NoIrq)?;
unsafe { ispr_reg.write(bit_mask) };
Ok(())
}

#[repr(u8)]
Expand Down
7 changes: 5 additions & 2 deletions sys/kern/src/kipc.rs
Original file line number Diff line number Diff line change
Expand Up @@ -5,12 +5,13 @@
//! Implementation of IPC operations on the virtual kernel task.

use abi::{FaultInfo, Kipcnum, SchedState, TaskState, UsageError};
use core::mem::size_of;
use unwrap_lite::UnwrapLite;

use crate::arch;
use crate::err::UserError;
use crate::task::{current_id, ArchState, NextTask, Task};
use crate::umem::USlice;
use core::mem::size_of;

/// Message dispatcher.
pub fn handle_kernel_message(
Expand Down Expand Up @@ -462,7 +463,9 @@ fn software_irq(
)))?;

for &irq in irqs.iter() {
crate::arch::pend_software_irq(irq);
// Any error here would be a problem in our dispatch table, not the
// caller, so we panic because we want to hear about it.
crate::arch::pend_software_irq(irq).unwrap_lite();
Comment on lines +466 to +468
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm --- this is the only call site of pend_software_irq, right? So, if we're going to unwrap here, we could just leave that function using infallible array indexing and let it panic, instead. I dunno if that has a meaningful impact on binary size, though, and this is maybe nicer in the event that we add other uses of pend_software_irq later, which don't want to panic. I think having it be the caller's decision whether or not to panic is almost certainly the ideologically nicer factoring...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'd want to provide callers with some way of checking if an IRQ is valid for the architecture then, since it's in the arch module. The current factoring makes it responsible for the check, using its architecture-specific knowledge, which I suspect results in simpler code at the use sites.

But as you say, there is only one use site atm, and I'm not sure how likely that is to change.

}

tasks[caller].save_mut().set_send_response_and_length(0, 0);
Expand Down
11 changes: 7 additions & 4 deletions sys/kern/src/syscalls.rs
Original file line number Diff line number Diff line change
Expand Up @@ -783,7 +783,9 @@ fn irq_control(
UsageError::NoIrq,
)))?;
for i in irqs.iter() {
operation(i.0, also_clear_pending);
// Because the IRQ numbers are coming from our own table, any error here
// would indicate a kernel bug, _not_ bad syscall arguments.
operation(i.0, also_clear_pending).unwrap_lite();
}
Ok(NextTask::Same)
}
Expand Down Expand Up @@ -923,9 +925,10 @@ fn irq_status(
)))?;

// Combine the platform-level status of all the IRQs in the notification set.
let mut status = irqs.iter().fold(IrqStatus::empty(), |status, irq| {
status | crate::arch::irq_status(irq.0)
});
let mut status =
irqs.iter().try_fold(IrqStatus::empty(), |status, irq| {
crate::arch::irq_status(irq.0).map(|n| status | n)
})?;
Comment on lines +929 to +931
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's nice how this was so trivially converted to fallible with try_fold! :)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I don't reach for the fold family of functions often, but sometimes they're just the ticket!


// If any bits in the notification mask are set in the caller's notification
// set, then a notification has been posted to the task and not yet consumed.
Expand Down
Loading