Switch Estimator to use an exponential weighting #539
Conversation
Edit: all the issues in this comment have now been fixed. Leaving for posterity. For the sake of having a draft PR to discuss and test, I put this up early. A few notes on things that might need changing before this is merged:
In `src/state.rs` (outdated):

```rust
if divisor != 0.0 {
    batch = duration_to_secs(elapsed) / divisor;
};
let delta_steps = new - self.prev.0;
```
One trick I used in a previous job was to just straight up set the `smoothed_steps_per_sec` (and `double_smoothed_steps_per_sec`) to the new value if they were uninitialized. Maybe that would be a decent approach here? Should those values become `Option` in that case?
I think there are two separate questions here:

1. What should the estimator return when a value is requested but it is in the uninitialized state?
2. What should the original (uninitialized) internal value of the estimator be?

Because the weight of data in the estimator is always positive (there's no horizon at which the weight = 0), the only sensible answer to (2) is 0. That's because you have no way to pick the right value ahead of time, so you will want to divide out the effect of the original estimator value (I call this debiasing). This is much easier to do if it is 0, because then you just divide the naive estimate by `weight(age)`.
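A minimal sketch of that debiasing idea, with hypothetical names (this is not indicatif's actual API, and it uses an explicit per-update decay factor rather than the age-based weight function discussed here): the tally starts at 0, and reading an estimate divides by the total weight that real samples have received, so the zero initial value contributes nothing.

```rust
// Hypothetical sketch of a debiased exponential average. The tally is
// initialized to 0; `init_weight` tracks how much weight the (zero)
// initial value still carries, so we can divide it out when reading.
struct ExpAverage {
    tally: f64,       // exponentially weighted sum, starts at 0
    init_weight: f64, // remaining weight of the zero initial value
}

impl ExpAverage {
    fn new() -> Self {
        Self { tally: 0.0, init_weight: 1.0 }
    }

    /// Fold in a new sample, decaying the existing tally by `decay` (in 0..1).
    fn update(&mut self, sample: f64, decay: f64) {
        self.tally = decay * self.tally + (1.0 - decay) * sample;
        self.init_weight *= decay;
    }

    /// Debiased estimate: divide out the weight still assigned to the
    /// uninitialized starting value. Returns None before any update.
    fn estimate(&self) -> Option<f64> {
        let data_weight = 1.0 - self.init_weight;
        if data_weight > 0.0 {
            Some(self.tally / data_weight)
        } else {
            None
        }
    }
}

fn main() {
    let mut avg = ExpAverage::new();
    assert_eq!(avg.estimate(), None); // uninitialized: no rate to report
    avg.update(10.0, 0.5);
    // After the first update the debiased estimate equals the sample exactly,
    // matching the behavior described in this thread.
    assert!((avg.estimate().unwrap() - 10.0).abs() < 1e-12);
    avg.update(20.0, 0.5);
    // tally = 0.5 * 5.0 + 0.5 * 20.0 = 12.5; data_weight = 0.75
    assert!((avg.estimate().unwrap() - 12.5 / 0.75).abs() < 1e-12);
    println!("estimate = {:?}", avg.estimate());
}
```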
I don't really have an opinion on (1), so I'm happy to have it do whatever you like. If you'd like to have an uninitialized estimator return `None`, that's fine with me.
BTW in case it's unclear, after the first `update` the estimator value will be equal to the value of the first update (for both smoothing and double smoothing); it's just that prior to the first update there's not a correct rate to return.
> I think there are two separate questions here:
>
> 1. What should the estimator return when a value is requested but it is in the uninitialized state?
>
> [..]
>
> I don't really have an opinion on (1), so I'm happy to have it do whatever you like. If you'd like to have an uninitialized estimator return `None`, that's fine with me.
Having an uninitialized estimator return `None` sounds good to me in theory at least, though it might depend a bit on how much downstream complexity that requires to make work? Off the top of my head it seems like it might not be too much, and it would force us to write something like `N/A` in the progress bars, which seems good?
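A quick sketch of what that downstream formatting could look like under the `Option` approach (a hypothetical helper, not indicatif's actual API):

```rust
// Hypothetical formatter: render a missing rate as "N/A" instead of a
// bogus number, once the estimator can report None when uninitialized.
fn format_rate(rate: Option<f64>) -> String {
    match rate {
        Some(r) => format!("{r:.1}/s"),
        None => "N/A".to_string(),
    }
}

fn main() {
    assert_eq!(format_rate(None), "N/A");
    assert_eq!(format_rate(Some(12.34)), "12.3/s");
}
```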
> 2. What should the original (uninitialized) internal value of the estimator be?
>
> Because the weight of data in the estimator is always positive (there's no horizon at which the weight = 0), the only sensible answer to (2) is 0. That's because you have no way to pick the right value ahead of time, so you will want to divide out the effect of the original estimator value (I call this debiasing). This is much easier to do if it is 0, because then you just divide the naive estimate by `weight(age)`.
Fair enough.
> BTW in case it's unclear, after the first `update` the estimator value will be equal to the value of the first update (for both smoothing and double smoothing); it's just that prior to the first update there's not a correct rate to return.
Okay, good!
This is looking great, thanks for working on it!
Nice. The reason these are here is (a) for performance, calling [..]

I've addressed this in my review feedback.

Right, I think this makes sense.

Maybe we can go back to just deriving [..]

Most outstanding issues resolved. Heads up that I will be force pushing in a moment to split out the [..]

@chris-laplante have opinions on this stuff? I've felt like #394 has been the biggest open issue and this feels like a good step forward.
@djc I've fixed all the issues I had with the code, so I've marked it ready for review. Happy to make any additional changes you want to see. Wasn't totally sure how you wanted the comments on the weight function formatted - I made them doc comments, but code comments would maybe make more sense, as these are implementation details. The remaining nit I'd like to poke at is trying to extract the debiasing from the [..]

I've been slammed at work, but I will try to take some time later this week to go over this in detail.
@chris-laplante what's the chance that you'll get to it this week? Otherwise let's just go ahead and I'll take care of it, and you can follow up later?

@djc I'd like to replace the [..]

Yup, go ahead. Please make a separate commit that only does that, though.
I've now done so, and force pushed so that it is grouped with the other refactoring change before my exponential weighting work. I also added some more substantial commentary on debiasing to the [..]

Sorry, was on vacation last week. I will follow up with it later. Thanks for understanding!

I'm no statistician, but I looked this over and I think it makes sense. Your explanation of "debiasing" also makes sense. I do also wonder if there is a better term than "debias" we could use - I tried searching for it in the context of exponential smoothing and it doesn't seem to come up in the literature much. But I also don't think it's a big deal now that you've added plenty of comments to explain the approach. This definitely seems like a great step forward, so thanks again for your work!
I agree that "debias" is not an especially accurate description of what's happening. Strictly speaking, the problem with the naive estimate is that it assumes a non-zero weight for the initial value of the estimator, not that it is "biased" towards that initial value. Maybe a better way of talking about this would be to call debiased estimates "normalized estimates" instead. After all, if you write the naive estimate out as a weighted average of the samples, the problem becomes fairly obvious: the sum of the weights in our weighted average isn't 1! As you can see from my plot of the weighting function, the weights in the weighted average will only sum to 1 if the samples continue back into the infinite past. Since we set the initial value of the estimator to 0, rather than getting a bogus estimate, all the weights for times prior to initialization contribute nothing, and dividing by the total weight of the observed samples renormalizes the remaining weights to sum to 1.

@chris-laplante @djc Do you like this way of describing the concepts better? I can rewrite my explanatory comment to use the language of normalization rather than calling this debiasing. Or we could just call it a "correction", though that's a little vague. Also, please let me know if there's anything you'd still like to see, as I think this is otherwise ready to be merged.

Side note on the statistical literature and why I initially chose the "debiasing" language: I developed the approach here independently; a quick skim of the literature suggests that this is usually discussed as an "initialization" problem. The assumption is that the estimator has to take some initial value, and so you figure out a method for doing so: either you just use the first sample, or the sample mean, or an MSE-minimizing value, etc. Trying to set a reasonable initial value like this probably makes sense when doing a post-hoc analysis on data that has an (unknown) history before the first sample. After developing this method, I went looking for anyone who had taken a similar approach to the problem, and found this one StackOverflow answer which handles the initialization issue in the same way. I borrowed the "debiasing" language from that answer, as I hadn't come up with a better way to describe it.
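The claim above that the weights only sum to 1 over an infinite past can be checked numerically. This is an illustrative standalone sketch (assumed per-step decay factor `d`, not project code): the N most recent samples carry weights (1-d), (1-d)d, (1-d)d², ..., a geometric series summing to 1 - d^N, which is exactly the factor a normalized estimate divides out.

```rust
fn main() {
    let d: f64 = 0.5; // per-step decay factor, assumed for illustration
    let n: i32 = 10;
    // Total weight carried by the N most recent samples under
    // exponential smoothing with a zero-initialized tally.
    let finite_sum: f64 = (0..n).map(|k| (1.0 - d) * d.powi(k)).sum();
    // Geometric series: the finite sum is 1 - d^N, which falls short of 1.
    assert!((finite_sum - (1.0 - d.powi(n))).abs() < 1e-12);
    // Only in the limit N -> infinity do the weights sum to 1.
    println!("sum of first {n} weights = {finite_sum}");
}
```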
I've pushed a new commit that does as I suggested in the comment above, and hopefully includes enough commentary to clearly justify the behavior. I kept it as a second commit for now so you can see the change, but I'll squash it into the previous commit and force push if you want to clean up the history before merging.
Looks great! If you squash these last two commits, I'll be happy to merge this.
Would you mind adding a commit that bumps the version to 0.17.5?
It's easier to read separate state tracking fields for steps and time than to store them together in a single tuple. No functional change intended.
This is an implementation of an exponentially weighted running average with a tuning parameter (currently a constant in this implementation):

* the `ews` parameter is a number of seconds. Progress the Estimator has observed that is older than this value will receive a total weight of 0.1 in the rate estimation, and newer values will receive a total of 0.9. The default is 15 seconds.

This implementation does double smoothing by applying the running average to itself. The result avoids undesirable instantaneous movements in the estimate when large updates occur.

The exponential estimator works by keeping a running tally, where an existing tally that has aged `age` seconds is reweighted such that

    weight ^ (ews / age) = 0.1

For instance, data aged 5 seconds with a 15 second weight parameter would receive `weight = 0.1 ^ (5/15) = 0.464`. If it then ages another 10 seconds, it would receive `weight = 0.1 ^ (10/15) = 0.215`. After being multiplied by both weights, it would have a weight of `0.464 * 0.215 = 0.1`, as expected.

A couple of basic features are also implemented for higher quality estimates:

* We divide out any weight given to data before the estimator was initialized. This is called "normalization" in the code, because it renormalizes the weights in the weighted average to sum to 1.
* When returning an estimate, we include the time since the last update was received as non-progress, which means that the estimator does not freeze when progress stalls.
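The reweighting rule can be sanity-checked with the numbers from the commit message (a standalone sketch, not the crate's code): solving `weight ^ (ews / age) = 0.1` gives `weight = 0.1 ^ (age / ews)`, and two successive reweightings multiply to the single reweighting over the combined age.

```rust
fn main() {
    let ews: f64 = 15.0; // weighting parameter, in seconds
    // Weight applied to a tally that has aged `age` seconds:
    // the solution of weight ^ (ews / age) = 0.1.
    let weight = |age: f64| 0.1f64.powf(age / ews);

    let w5 = weight(5.0);   // ~ 0.464
    let w10 = weight(10.0); // ~ 0.215
    assert!((w5 - 0.464).abs() < 1e-3);
    assert!((w10 - 0.215).abs() < 1e-3);
    // Aging 5 s and then 10 s more multiplies out to the 15 s weight, 0.1.
    assert!((w5 * w10 - 0.1).abs() < 1e-12);
    println!("w5 = {w5:.3}, w10 = {w10:.3}, product = {:.3}", w5 * w10);
}
```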
Done and done. Needed to rebase because the PR was based on 0.17.3.

Thanks for all your work on this! 0.17.5 has made it out to crates.io.