-
Notifications
You must be signed in to change notification settings - Fork 49
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Functionality to Apply Constraints to Predictions #92
base: main
Are you sure you want to change the base?
Conversation
@SimonKamuk did you figure out a solution to #19 (comment) ? Sorry to comment already now, I know this is work in progress, I'm just curious about this :) Thinking about it a bit more, I realized that one solution would be to just always apply the skip-connection before any activation function. So that the skip-connection is for the non-clamped values. E.g. since both sigmoid and softplus is invertible you could do something like |
That's quite a neat way to do it! I initially applied the clamping function to the new state, f(X_t+model()) but then realized this would mess with the residual connection, so what I implemented is basically this:
I wonder if there is an argument for using this method compared to your suggestion? |
This is quite interesting and I'm trying to get some better understanding of what might be a good approach. I made this desmos interactive plot to try to wrap my head around it: https://www.desmos.com/calculator/tnrd6igkqb Your method definitely also works correctly. I realized that this (clamping the delta to new bounds) is equivalent to not having any skip-connection to the previous state. Below I'm ignoring the mean and std rescaling for simplicity, and assume we want to rescale variable to where So this practically removes the dependence on the previous state. The difference to my "invert-previous-clamping" approach is that that would equate skip connections on the logits, before any activation function (clamping here). So that does maintain some dependence. I'm not sure if this is important. A simple way to implement that approach would be to do the clamping in This really relates to if one wants to use the skip-connections over states or not. I think it would eventually be nice to have this as an option. Maybe these two clamping strategies should correspond to the selection of that then? |
Oh wow that's a good catch. I agree that we want the option to keep the skip connection, so my method is not the way to go - even if it was applied after unrolling, because then we would still be removing the final skip-connection at every ar step (although the first ones would indeed be preserved). I'll have a go at implementing your inverse method |
I implemented your suggestion, but I added the constraint that the input (previous state) to the inverse sigmoid and softplus are clamped hard to avoid the inverse functions from returning inf - this would have prevented the model from ever outputting anything other than 1 if say relative humidity was clamped to [0,1] and the previous state was already 1. But this should not be an issue, as the clamping is only applied to the previous state, not the model output itself, so the gradients can still be computed. |
Hmm, yes that's an important consideration. Good that you thought about this. I'm guessing that the situation could occur that a variable is >=0, and an initial state where it is = 0 exists. Note that gradients do go through also the previous state (we don't detach these from the computational graph), not just the model output, when we unroll during training. So the clamping does still impact gradients. However, I don't think this should be a problem in practice and this solution should work fine. In the case that the previous state comes from a model prediction during rollout, it should not be possible for it to hit exactly 0/1, so the clamping would anyhow not have an effect. |
I still don't understand why this last test is failing, could it be a resource issue? If anyone knows what is going on I'm all ears 😄 but as far as my changes are concerned I think this is ready for review |
The test failing is probably not directly related to code, but to resources. I see I've added to my TODO list to give this a proper review. A couple high-level consideration in the meantime:
What's your thoughts on this? Are there good reasons to do it the "inversion"-way? The extra unnecessary compute is quite small, so maybe not an issue really, but doing the inverse-clamping is a bit more complicated and less transparent in showing that this is applying skip connections on pre-activation representations. |
|
Added description of clamping feature in config.yaml
Thanks for the clarifications above @SimonKamuk ! You make some good points that I did not think about. In particular, as you write, we need to consider what actually goes into the model at each time step, which depends on when the clamping is applied. I need to think that over + then give this a full review. That will have to be in 2025 though, so just letting you know to not expect my review on this until after new years 😃 |
Describe your changes
This change implements a method for constraining model output to a specified valid range. This is useful to ensure reliable model output for variables the cannot physically fall outside of this range - such as absolute temperature which must be positive or relative humidity which must be between 0 and 100%.
This is implemented by using the config.yaml for specifying valid ranges for each parameter, where each variable defaults to not having a limit. A scaled sigmoid function is then applied to the prediction for variables that have both an upper and lower limit, and a scaled softplus is used for variables that must be above or below a certain threshold.
Issue Link
closes #19
Type of change
Checklist before requesting a review
pull
with--rebase
option if possible).Checklist for reviewers
Each PR comes with its own improvements and flaws. The reviewer should check the following:
Author checklist after completed review
reflecting type of change (add section where missing):
Checklist for assignee