Adds fault monitors #199

aidnem · 2024-05-02T16:33:22Z

Purpose
The purpose of this addition is to add an easily expansible system of monitors to detect various faults on the robot.
One use of this is to create performance monitors for detecting performance degradation.

In autonomous systems, a "monitor" is something that keeps track of variables of interest to ensure they remain within acceptable ranges. If the variable goes outside the acceptable range for a persistence, the monitor goes into alarm and typically takes an action.

For example, let's say you have a battery system that could experience a thermal runaway. To mitigate this risk and protect the system, a monitor could be used to shut off power if the system exceeds a certain temperature. The following monitor could be appropriate for this application:

Monitor Type: Below Threshold Monitor
Variable: Temperature
Threshold: User defined max temperature
Persistence: User defined
Action: Turn off system

In this addition we should add some basic monitors that we can use on our robot, such as:

Variable Below / Above threshold
Variable equal / not equal monitor
Variable inside/outside range (can be implemented using the above two monitors)
Potentially a custom condition (pass the monitor a function which will return a boolean for whether or not the monitored is currently outside of an acceptable state). This will allow for more complex monitors, such as detecting aimer encoder being unplugged.

Project Scope

Add base class Monitor that takes a generic supplier (variable to monitor), a threshold, a persistence, and a generic consumer (action to take if the monitor is in alarm). There should also be a getter function for if the monitor is in alarm.
Make specific classes that inherit the monitor class for the threshold and equality check monitors
Make Monitor types that combine these logical monitors (OR, AND for now).
Implement a monitor for the aimer encoder being unplugged to as a first test that the monitor system works and it is easy to write monitors. Should satisfy requirements in Add auto shutoff to shooter arm if encoder unplugged #192.
Begin implementing monitors for performance as per Add monitoring to detect performance degradation #193.

jkleiber · 2024-05-02T17:11:37Z

Can we incorporate some ideas from #193 and then close #193

aidnem · 2024-05-02T17:50:17Z

I've incorporated #193 into this issue. If the new description of this issue is deemed satisfactory, we can go ahead and close #193.

aidnem · 2024-05-03T22:00:37Z

@jkleiber Should I make the custom condition monitor class use generics to accept any type, or can we reasonably assume that all values dealt with will be doubles?

jkleiber · 2024-05-03T22:32:11Z

Let's go as generic as possible - there's always a chance we'll want to monitor integers/strings/booleans for example

aidnem · 2024-05-04T15:28:54Z

Awesome. Would it be preferable to have a callback when the fault is triggered or do we want to only poll the monitors and act upon a fault later? I'm thinking the callback route could look something like the following:

MonitorManager.addMonitor(
    new CustomMonitor(
        "armEncoderUnplugged", // Name to log callback under
        true, // Sticky fault (remain faulted even after conditions become acceptable again
        () -> !(DriverStation.isEnabled()
                && (aimerInputs.aimAppliedVolts > ScoringConstants.aimerkS * 2
                        && aimerInputs.aimVelocityRadPerSec
                                < ScoringConstants.aimerMovementThresholdRadPerSec)), // isStateValid function to check whether the state is currently valid or not
        ScoringConstants.maxAimUnresponsiveTimeSeconds, // timeToFault (how long can state be invalid before a fault occurs)
        () -> { // faultCallback, the function run every tick when the monitor has detected a fault.
            ScoringSubsystem.setOverrideMode(true);
            ScoringSubsystem.setOverrideVolts(0.0);
        }));

I'm not sure entirely how I'd incorporate MonitorManager into the robot, but basically, it would just keep track of all of the monitors and run their periodic functions every tick.

jkleiber · 2024-05-04T19:28:00Z

@aidnem yeah this looks good!

If you want to avoid the complications of managing the singleton MonitorManager (mapping all the variables/suppliers/consumers might be a mess in RobotContainer), you could make a MonitoredSubsystem that inherits from Subsystem and then turn the ScoringSubsystem into a MonitoredSubsystem. The periodic method of MonitoredSubsystem would then run all registered monitors at the start, and then call a method that can be overridden by the user. Then the scope of all these variables would be contained in the subsystem they are relevant to

aidnem · 2024-05-04T20:45:19Z

@jkleiber Good idea. I had no idea how I would make the MonitorManager singleton yet so this will probably be a better solution. I'll look into doing that when I have time.

aidnem · 2024-05-05T17:40:02Z

@jkleiber just one question: wouldn't we want the monitors to be checked before the periodic function runs but then acted upon at the end of periodic? This way their behavior would override any behavior from the periodic function of the subsystem itself.

jkleiber · 2024-05-05T17:44:10Z

Yeah you could also just check and act at the end

aidnem · 2024-05-05T17:51:53Z

Yeah, but I wanted the monitors to be able to view the IO and the state of the physical world in the same timeframe. If we check at the end, it'll be the next frame's IO but the state of the robot after being controlled by the previous frame's IO. At a glance, this seems like it would cause false positives. I'll just check and act at the end, and if this becomes an issue later we can swap to check at the start, act at the end.

jkleiber · 2024-05-05T18:18:44Z

We could also make this subsystem do things in a defined order:

CopperheadSubsystem execution order:

Update inputs
User defined code
Poll Monitors
Update outputs

Then we could also register IO classes as a list and define them with a common interface such that they get update inputs/outputs called automatically

aidnem · 2024-05-05T18:36:45Z

If you don't mind, I'd rather push that to a future issue to prevent this one from getting overwhelming. If you want, we could replace this issue with a CopperheadSubsystem feature instead.

I have one other question/note: The multiple different types of monitors listed above seem unnecessary. Wouldn't it be simpler to make the base monitor class accept a function, and then define all monitors using suppliers? If you have reasoning for having multiple different monitor classes I'll gladly do it, I was just wondering about the thought process behind it.

jkleiber · 2024-05-05T18:49:47Z

Implementing the monitors via user defined functions/lambdas is totally valid. We can go with that, I mainly had this idea as someone who works in c++ a lot where that design pattern isn't common.

I think the copperhead subsystem could be follow on work, but getting the initial set up might be very useful for this issue

aidnem · 2024-05-05T22:13:30Z

Basic Monitor and MonitoredSubsystem implementations are committed and pushed. I didn't create a PR yet since it's so early but I can do that soon if you like.

aidnem · 2024-10-07T12:59:49Z

@jkleiber we now have an arm on high key as well. Want to move this over to coppercore so we can (hopefully) not destroy that one as well?
Also we could consider using elastic 2025's notification feature to send alerts when faults are triggered.

jkleiber · 2024-10-07T13:05:56Z

@aidnem yes let's move it to coppercore. We can consider the elastic feature as well, but at first I just want a simple "if violation for persistence then take an action" general monitor architecture

jkleiber · 2024-10-07T13:08:43Z

@aidnem if you create a ticket I can drop some ideas in there and we can discuss at shop

aidnem · 2024-10-07T13:08:56Z

Yeah, that's fair. We could also extend the Monitor class in future to add a MonitorWithNotification class or something similar.

I'll create a coppercore issue for that to be included under our wpi interface.

aidnem · 2024-10-07T13:12:45Z

@jkleiber issue is created now.

aidnem added the feature New feature or request label May 2, 2024

aidnem mentioned this issue May 2, 2024

Add fault detection and auto-shutoff to shooter if encoder is unplugged #198

Draft

2 tasks

jkleiber mentioned this issue May 2, 2024

Add monitoring to detect performance degradation #193

Closed

3 tasks

aidnem mentioned this issue Oct 7, 2024

Add Monitor and MonitoredSubystem to WPILIb Interface team401/coppercore#24

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Adds fault monitors #199

Adds fault monitors #199

aidnem commented May 2, 2024 •

edited

Loading

jkleiber commented May 2, 2024

aidnem commented May 2, 2024

aidnem commented May 3, 2024

jkleiber commented May 3, 2024

aidnem commented May 4, 2024

jkleiber commented May 4, 2024

aidnem commented May 4, 2024

aidnem commented May 5, 2024

jkleiber commented May 5, 2024

aidnem commented May 5, 2024

jkleiber commented May 5, 2024

aidnem commented May 5, 2024

jkleiber commented May 5, 2024

aidnem commented May 5, 2024

aidnem commented Oct 7, 2024

jkleiber commented Oct 7, 2024

jkleiber commented Oct 7, 2024

aidnem commented Oct 7, 2024

aidnem commented Oct 7, 2024

Adds fault monitors #199

Adds fault monitors #199

Comments

aidnem commented May 2, 2024 • edited Loading

jkleiber commented May 2, 2024

aidnem commented May 2, 2024

aidnem commented May 3, 2024

jkleiber commented May 3, 2024

aidnem commented May 4, 2024

jkleiber commented May 4, 2024

aidnem commented May 4, 2024

aidnem commented May 5, 2024

jkleiber commented May 5, 2024

aidnem commented May 5, 2024

jkleiber commented May 5, 2024

aidnem commented May 5, 2024

jkleiber commented May 5, 2024

aidnem commented May 5, 2024

aidnem commented Oct 7, 2024

jkleiber commented Oct 7, 2024

jkleiber commented Oct 7, 2024

aidnem commented Oct 7, 2024

aidnem commented Oct 7, 2024

aidnem commented May 2, 2024 •

edited

Loading