Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds fault monitors #199

Open
5 tasks
aidnem opened this issue May 2, 2024 · 19 comments
Open
5 tasks

Adds fault monitors #199

aidnem opened this issue May 2, 2024 · 19 comments
Labels
feature New feature or request

Comments

@aidnem
Copy link
Contributor

aidnem commented May 2, 2024

Purpose
The purpose of this addition is to add an easily expansible system of monitors to detect various faults on the robot.
One use of this is to create performance monitors for detecting performance degradation.

In autonomous systems, a "monitor" is something that keeps track of variables of interest to ensure they remain within acceptable ranges. If the variable goes outside the acceptable range for a persistence, the monitor goes into alarm and typically takes an action.

For example, let's say you have a battery system that could experience a thermal runaway. To mitigate this risk and protect the system, a monitor could be used to shut off power if the system exceeds a certain temperature. The following monitor could be appropriate for this application:

  • Monitor Type: Below Threshold Monitor
  • Variable: Temperature
  • Threshold: User defined max temperature
  • Persistence: User defined
  • Action: Turn off system

In this addition we should add some basic monitors that we can use on our robot, such as:

  • Variable Below / Above threshold
  • Variable equal / not equal monitor
  • Variable inside/outside range (can be implemented using the above two monitors)
  • Potentially a custom condition (pass the monitor a function which will return a boolean for whether or not the monitored is currently outside of an acceptable state). This will allow for more complex monitors, such as detecting aimer encoder being unplugged.

Project Scope

  • Add base class Monitor that takes a generic supplier (variable to monitor), a threshold, a persistence, and a generic consumer (action to take if the monitor is in alarm). There should also be a getter function for if the monitor is in alarm.
  • Make specific classes that inherit the monitor class for the threshold and equality check monitors
  • Make Monitor types that combine these logical monitors (OR, AND for now).
  • Implement a monitor for the aimer encoder being unplugged to as a first test that the monitor system works and it is easy to write monitors. Should satisfy requirements in Add auto shutoff to shooter arm if encoder unplugged #192.
  • Begin implementing monitors for performance as per Add monitoring to detect performance degradation #193.
@jkleiber
Copy link
Member

jkleiber commented May 2, 2024

Can we incorporate some ideas from #193 and then close #193

@aidnem
Copy link
Contributor Author

aidnem commented May 2, 2024

I've incorporated #193 into this issue. If the new description of this issue is deemed satisfactory, we can go ahead and close #193.

@aidnem
Copy link
Contributor Author

aidnem commented May 3, 2024

@jkleiber Should I make the custom condition monitor class use generics to accept any type, or can we reasonably assume that all values dealt with will be doubles?

@jkleiber
Copy link
Member

jkleiber commented May 3, 2024

Let's go as generic as possible - there's always a chance we'll want to monitor integers/strings/booleans for example

@aidnem
Copy link
Contributor Author

aidnem commented May 4, 2024

Awesome. Would it be preferable to have a callback when the fault is triggered or do we want to only poll the monitors and act upon a fault later? I'm thinking the callback route could look something like the following:

MonitorManager.addMonitor(
    new CustomMonitor(
        "armEncoderUnplugged", // Name to log callback under
        true, // Sticky fault (remain faulted even after conditions become acceptable again
        () -> !(DriverStation.isEnabled()
                && (aimerInputs.aimAppliedVolts > ScoringConstants.aimerkS * 2
                        && aimerInputs.aimVelocityRadPerSec
                                < ScoringConstants.aimerMovementThresholdRadPerSec)), // isStateValid function to check whether the state is currently valid or not
        ScoringConstants.maxAimUnresponsiveTimeSeconds, // timeToFault (how long can state be invalid before a fault occurs)
        () -> { // faultCallback, the function run every tick when the monitor has detected a fault.
            ScoringSubsystem.setOverrideMode(true);
            ScoringSubsystem.setOverrideVolts(0.0);
        }));

I'm not sure entirely how I'd incorporate MonitorManager into the robot, but basically, it would just keep track of all of the monitors and run their periodic functions every tick.

@jkleiber
Copy link
Member

jkleiber commented May 4, 2024

@aidnem yeah this looks good!

If you want to avoid the complications of managing the singleton MonitorManager (mapping all the variables/suppliers/consumers might be a mess in RobotContainer), you could make a MonitoredSubsystem that inherits from Subsystem and then turn the ScoringSubsystem into a MonitoredSubsystem. The periodic method of MonitoredSubsystem would then run all registered monitors at the start, and then call a method that can be overridden by the user. Then the scope of all these variables would be contained in the subsystem they are relevant to

@aidnem
Copy link
Contributor Author

aidnem commented May 4, 2024

@jkleiber Good idea. I had no idea how I would make the MonitorManager singleton yet so this will probably be a better solution. I'll look into doing that when I have time.

@aidnem
Copy link
Contributor Author

aidnem commented May 5, 2024

@jkleiber just one question: wouldn't we want the monitors to be checked before the periodic function runs but then acted upon at the end of periodic? This way their behavior would override any behavior from the periodic function of the subsystem itself.

@jkleiber
Copy link
Member

jkleiber commented May 5, 2024

Yeah you could also just check and act at the end

@aidnem
Copy link
Contributor Author

aidnem commented May 5, 2024

Yeah, but I wanted the monitors to be able to view the IO and the state of the physical world in the same timeframe. If we check at the end, it'll be the next frame's IO but the state of the robot after being controlled by the previous frame's IO. At a glance, this seems like it would cause false positives. I'll just check and act at the end, and if this becomes an issue later we can swap to check at the start, act at the end.

@jkleiber
Copy link
Member

jkleiber commented May 5, 2024

We could also make this subsystem do things in a defined order:

CopperheadSubsystem execution order:

  1. Update inputs
  2. User defined code
  3. Poll Monitors
  4. Update outputs

Then we could also register IO classes as a list and define them with a common interface such that they get update inputs/outputs called automatically

@aidnem
Copy link
Contributor Author

aidnem commented May 5, 2024

If you don't mind, I'd rather push that to a future issue to prevent this one from getting overwhelming. If you want, we could replace this issue with a CopperheadSubsystem feature instead.

I have one other question/note: The multiple different types of monitors listed above seem unnecessary. Wouldn't it be simpler to make the base monitor class accept a function, and then define all monitors using suppliers? If you have reasoning for having multiple different monitor classes I'll gladly do it, I was just wondering about the thought process behind it.

@jkleiber
Copy link
Member

jkleiber commented May 5, 2024

Implementing the monitors via user defined functions/lambdas is totally valid. We can go with that, I mainly had this idea as someone who works in c++ a lot where that design pattern isn't common.

I think the copperhead subsystem could be follow on work, but getting the initial set up might be very useful for this issue

@aidnem
Copy link
Contributor Author

aidnem commented May 5, 2024

Basic Monitor and MonitoredSubsystem implementations are committed and pushed. I didn't create a PR yet since it's so early but I can do that soon if you like.

@aidnem
Copy link
Contributor Author

aidnem commented Oct 7, 2024

@jkleiber we now have an arm on high key as well. Want to move this over to coppercore so we can (hopefully) not destroy that one as well?
Also we could consider using elastic 2025's notification feature to send alerts when faults are triggered.

@jkleiber
Copy link
Member

jkleiber commented Oct 7, 2024

@aidnem yes let's move it to coppercore. We can consider the elastic feature as well, but at first I just want a simple "if violation for persistence then take an action" general monitor architecture

@jkleiber
Copy link
Member

jkleiber commented Oct 7, 2024

@aidnem if you create a ticket I can drop some ideas in there and we can discuss at shop

@aidnem
Copy link
Contributor Author

aidnem commented Oct 7, 2024

Yeah, that's fair. We could also extend the Monitor class in future to add a MonitorWithNotification class or something similar.

I'll create a coppercore issue for that to be included under our wpi interface.

@aidnem
Copy link
Contributor Author

aidnem commented Oct 7, 2024

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature New feature or request
Projects
None yet
Development

When branches are created from issues, their pull requests are automatically linked.

2 participants