-
Notifications
You must be signed in to change notification settings - Fork 908
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add root mean squared scaled error to metrics #2031
Add root mean squared scaled error to metrics #2031
Conversation
It looks interesting but wouldn't be simpler to just add an argument to mse? Or at least reuse it in rmsse? |
Thanks for having a look @madtoinou . I think you are right about reusing rmse() in rmsse(). I changed it in the PR. Adding an argument to mse() would make mse more complicated then it needs to be. I think the way rmsse() is now included is in line with how mase is included, as well as in line with how rmse() relates to mse(). What do you think? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @Beerstabr and thanks for this, looks like a great start 🚀
You can go ahead with the PR!
Main suggestion was to refactor the MASE and RMSSE so that they both use the same functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @Beerstabber for this contribution and sorry for the late review.
I think we can indeed add this to Darts (keeping in mind that RMSSE also has its cons, e.g. when the naive score is bad).
As @madtoinou mentioned, we should try to simplify this a bit.
It is using a similar logic to MASE, can we refactor it so that both functions make use of the same logic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dennisbader I refactored the code such that mase and rmsse are wrapped similar to the other metrics. Might be possible to further refactor by for example combining multi_ts_support() and multi_ts_support_insample(). Not sure if it is worth the effort. Let me know what you think.
Also just copied the mase tests for rmsse. Will refactor it if you like the other changes I've made so far.
Would like to add the Scaled Quantile Loss metric as well if you agree that this would be a worthwhile addition.
…github.com/Beerstabr/darts into feature/add_root_mean_squared_scaled_error
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks very neat now, great job and thanks @Beerstabr 🚀
Apart from some minor suggestions, I think it makes sense to combine the multi_ts_support for the non-scaled and scaled logic. I left a comment there with a suggestions.
@@ -277,11 +282,9 @@ def test_smape(self): | |||
self.helper_test_nan(metrics.smape) | |||
|
|||
def test_mase(self): | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can parametrize this test with pytest to check both metrics:
@pytest.mark.parametrize("metric", [metrics.mase, metrics.rmsse])
def test_scaled_metrics(self, metric):
then we can replace metrics.mape
with metric
and remove the test_rmsse
test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we add a check for MASE, RMSSE that verifies an expected metric score for a prediction that is not equal to the target (e.g. MASE/RMSSE != 0)?
@@ -107,6 +107,104 @@ def wrapper_multi_ts_support(*args, **kwargs): | |||
return wrapper_multi_ts_support | |||
|
|||
|
|||
def multi_ts_support_insample(func): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice job with refactoring the scaled logic!
It would indeed be nice to have the multi ts support handled by the same wrapper, as only the insample
handling is different between the two.
You could check for example that "insample" is part of func
's signature to know which logic to use.
What do you think?
elif len(args) > 3: | ||
m = args[3] | ||
else: | ||
m = 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should use the func's default parameters here and for intersect:
m = 1 | |
m = signature(func).parameters["m"].default |
) | ||
m = 1 | ||
|
||
value_list.append(func(y_true, y_hat, x_t, m, *args[3:], **kwargs)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if m
is passed in args
, then this will give it twice to func
(once through m
, and another time through *args[3:]
).
Do we have to exclude it from args when we retrieve it at the top?
raise_if_not( | ||
not np.isclose(scale, 0), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
below might be more interpretable
raise_if_not( | |
not np.isclose(scale, 0), | |
raise_if( | |
np.isclose(scale, 0), |
raise_if_not( | ||
not np.isclose(scale, 0), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
raise_if_not( | |
not np.isclose(scale, 0), | |
raise_if( | |
np.isclose(scale, 0), |
Hi @Beerstabr, just wanted to check in whether you're still working on this PR? :) |
Added root mean squared scaled error as used during the M5 competition to metrics.
Also see this link for explanation.
Is this something you are interested in? If so, I'll add some tests.