Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[wpilib] Tracer Overhaul #7099

Open
wants to merge 64 commits into
base: main
Choose a base branch
from
Open

Conversation

oh-yes-0-fps
Copy link
Contributor

@oh-yes-0-fps oh-yes-0-fps commented Sep 19, 2024

@oh-yes-0-fps oh-yes-0-fps requested a review from a team as a code owner September 19, 2024 05:11
@oh-yes-0-fps oh-yes-0-fps requested a review from a team as a code owner September 19, 2024 05:12
@rzblue
Copy link
Member

rzblue commented Sep 19, 2024

I don't think Tracer should be a static global, as it is used in multiple places that shouldn't have mixed outputs.

@oh-yes-0-fps
Copy link
Contributor Author

oh-yes-0-fps commented Sep 19, 2024

I don't think Tracer should be a static global, as it is used in multiple places that shouldn't have mixed outputs.

It's being used differently here, it's similar to a flame graph with a cascading list of durations.
There are no mixed outputs, the tracer covers the whole robot and everything that happens during its runtime.
Also the tracer is more of ThreadLocal static and supports tracing multiple threads at once or restricting it to 1 thread.
It also has some tooling to attempt to isolate gc time from execution time but that can be turned off on a per thread basis.
image

@oh-yes-0-fps
Copy link
Contributor Author

If performance is a concern we can make it so after a certain number of nests it is sent to only datalog instead.

@rzblue
Copy link
Member

rzblue commented Sep 19, 2024

Ok, that's pretty cool.

A couple things:

  • Tracer is part of the public API, so we will need to do a deprecation cycle on the old interface

  • I think we should talk about how we want to handle telemetry like this; much like alerts we don't want it to be printed to the console, but it needs to be obvious how to access it. I also don't usually like tying things to NT directly, but maybe that's fine for now.

  • have you done any benchmarks on the rio? I doubt performance will be a significant regression from the current implementation but it'd be good to double check.

@oh-yes-0-fps
Copy link
Contributor Author

Performance wise it's quite good, we ran a version close to this all of the 2024 season based on some cursory testing it seems to use less than a ms per loop. The current implementation is more refined with 1/3 as many string allocs so it should be retested.

I don't like doing 1 topic per duration but there isn't really a way around it.

In terms of displaying to user im unsure of better ways to convey information. The watchdog will still print when overruns happen but not the epoch times. With glass, ascope and maybe elastic being packaged with wpilib I think teams should be expected to be able to use a dashboard to take full advantage of wpilib.

@oh-yes-0-fps
Copy link
Contributor Author

oh-yes-0-fps commented Sep 19, 2024

In terms of deprecation of the old tracer api, we could just name this something else while still removing tracer from watchdog. Profiler isn't a terrible name but also not great.

Whats the etiquette on just leaving the old api with deprecation warnings and just having all the functions no-op or smthn until we can remove it.

@rzblue
Copy link
Member

rzblue commented Sep 19, 2024

Whats the etiquette on just leaving the old api with deprecation warnings and just having all the functions no-op or smthn until we can remove it.

That's worse than removing it without deprecation.

@KangarooKoala
Copy link
Contributor

Generally the route for changing to a new API is that you make the new API with a different name and deprecate the old one (without doing any other changes to the old version)- The point of deprecation is that you can still use it without breakage. When it comes time to remove the old API (usually a year after deprecation), you completely remove it.
I can't think of any case where you'd want to change methods to no-op, since that's a silent breakage.

@oh-yes-0-fps
Copy link
Contributor Author

So I guess this is a broader question but what is the worry of changing the public API for something when adding it in a new major version? Especially something that is currently not used very much by users.

@calcmogul
Copy link
Member

calcmogul commented Sep 19, 2024

So I guess this is a broader question but what is the worry of changing the public API for something when adding it in a new major version?

Ostensibly, it lets teams migrate their code gradually over the course of a year rather than breaking it all at once. In my experience though, teams either fix the deprecation warnings immediately because they think they're errors, or they don't migrate until we force them to by removing the old API.

At least deprecation warnings provide guidance on how to upgrade though; immediate removal or API change gives no guidance whatsoever and can be frustrating, especially if we broke working code from the previous year.

I'm not sure what we'd name the new class instead though, because Tracer is the most obvious name.

@oh-yes-0-fps
Copy link
Contributor Author

This class's functionality is now static, it wouldn't be hard to make an instance of it to keep the functionality of the old tracer. It would just be confusing but using proper docs and deprecation warnings could help with that.

@oh-yes-0-fps oh-yes-0-fps changed the title [wpilibj] Tracer Overhaul [wpilib] Tracer Overhaul Sep 23, 2024
@oh-yes-0-fps
Copy link
Contributor Author

Ok I added more overloads to addPeriodic

I need help porting the changes I made to TimedRobot.Callback over to C++

Another thought I had was that each periodic Callback now has a name. The name could be used when printing out errors to make debugging from which Callback the error is coming from easier.

Copy link
Contributor

@KangarooKoala KangarooKoala left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need help porting the changes I made to TimedRobot.Callback over to C++

You can use std::optional instead of java.util.Optional, with the exception that you can't use map() or ifPresentOrElse() (which aren't supported on the compiler versions we use)- Instead, you'll have to write the logic manually. (name.map(SubstitutiveTracer::new) -> name ? frc::Tracer::SubstitutiveTracer{name} : std::nullopt or m_tracer.ifPresentOrElse(sub -> sub.subWith(m_func), m_func) -> if (m_tracer) { sub.subWith(m_func); } else { m_func(); })

Comment on lines +399 to +400
* Decorates this Command with a name. Is an inline function for
* Command::SetName(std::string_view);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated?


if constexpr (IsSimulation()) {
HAL_SimPeriodicBefore();
SimulationPeriodic();
Tracer::TraceFunc("SimulationPeriodic",
std::bind(&IterativeRobotBase::SimulationPeriodic, this));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is techinically a behavior change. Previously, everything in the if statement would be timed, but now it's only SimulationPeriodic. Not necessarily bad, just noting.

// Warn on loop time overruns
if (m_watchdog.IsExpired()) {
m_watchdog.PrintEpochs();
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there functionality to alert the user when an overrun happens?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The watchdog already does that with the callback

}

std::string Tracer::TracerState::PopTraceStack() {
m_stackSize > 0 ? m_stackSize-- : m_stackSize;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could use an if statement here.

Comment on lines 32 to 33
EXPECT_TRUE(test1Entry.GetDouble(0.0) - 500.0 < 1.0);
EXPECT_TRUE(test2Entry.GetDouble(0.0) - 400.0 < 1.0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the default value is zero, the value will be negative and will satisfy the condition. If you need some tolerance, use EXPECT_NEAR, or just EXPECT_EQ if you want to be exact.

auto test2Entry = nt::NetworkTableInstance::GetDefault().GetEntry(
"/Tracer/" + newThreadName + "/ThreadTest1");

EXPECT_TRUE(test2Entry.GetDouble(0.0) - 400.0 < 1.0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto

auto test1Entry = nt::NetworkTableInstance::GetDefault().GetEntry(
"/Tracer/SingleThreadTest1");

EXPECT_TRUE(test1Entry.GetDouble(0.0) - 100.0 < 1.0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto.

wpilibc/src/test/native/cpp/TracerTest.cpp Outdated Show resolved Hide resolved
@@ -421,18 +432,11 @@ protected void loopFunc() {
NetworkTableInstance.getDefault().flushLocal();
}

// Warn on loop time overruns
if (m_watchdog.isExpired()) {
m_watchdog.printEpochs();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there functionality to alert the user of overruns?

Copy link
Contributor

@Gold856 Gold856 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any documentation for the time units Tracer publishes in?

wpilibc/src/main/native/cpp/Tracer.cpp Outdated Show resolved Hide resolved
@oh-yes-0-fps
Copy link
Contributor Author

Is there any documentation for the time units Tracer publishes in?

Where would the best place to document those be? maybe it should even be placed in NT?

@Gold856
Copy link
Contributor

Gold856 commented Oct 3, 2024

Documentation in NT and the class doc would be good.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants