Incorrect Results for Extra Floating Functions #108
Comments
I believe this was addressed by #111, so I'll close this. Please re-open if there is more to do here. |
All good, thank you very much! |
Welp, looks like I will be working on this some more:

>>> import Numeric.AD
>>> diff (diff log1pexp) (-1000)
NaN

This is after #111 has been applied. Is there a way to reopen this issue? As far as I am concerned, the topic still applies. I will make a fresh PR though. |
The problem here seems to be with derivatives of reciprocals of large numbers. This problem is not unique to the new implementations of the extra floating-point functions in #111. For example:

>>> import Numeric.AD
>>> diff' (recip . exp) 1000
>>> diff' (\ x -> recip $ x * x) 1e400
(0.0,NaN)
(0.0,NaN)

In both cases, the function evaluates to recip Infinity, which is just 0, and that would be fine. The derivatives also exist and are well-behaved. However, they are calculated as follows:

$$f(x) = \frac{1}{g(x)} \quad f'(x) = -\frac{g'(x)}{g(x)^2}$$

So we end up with - exp x / exp x ^ 2 and - 2 * x / x ^ 4 respectively. This evaluates to Infinity / Infinity = NaN.

This seems to be a fundamental problem for the ad library. My guess is that it does not usually come up, as one would have to be operating with extreme values. However, with log1pexp being just the Softplus function, it is quite reasonable to evaluate it at values like -1000 or 1000, so it is easy to run into this issue. |
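To see the same blow-up without ad involved at all, the quotient-rule form can be evaluated directly in plain Double arithmetic. This is a minimal sketch; the function names are only illustrative:

```haskell
-- The derivative of recip . exp, written the way the quotient rule produces it:
-- numerator and denominator both overflow, and Infinity / Infinity is NaN.
quotientRuleForm :: Double -> Double
quotientRuleForm x = negate (exp x) / (exp x ^ (2 :: Int))

-- The algebraically simplified derivative: no intermediate overflow,
-- it merely underflows to -0.0 for large x.
simplifiedForm :: Double -> Double
simplifiedForm x = negate (exp (negate x))

main :: IO ()
main = do
  print (quotientRuleForm 1000) -- NaN
  print (simplifiedForm 1000)   -- -0.0
```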
How many digits of excess precision would we need to calculate it correctly? |
I am not sure what you mean by that, but the derivative of

This is neither possible nor necessary, as |
I might be overlooking something, but none of this would even be an issue in the first place if
I understand why |
Yesterday I was wondering why these functions are part of the Floating class in the first place. It turns out that one of the motivations was to support instances for vector spaces (https://mail.haskell.org/pipermail/libraries/2014-April/022741.html). In the meantime, I did overlook the fact that types like Complex Double are affected as well:

ghci> log1pexp (1000 :+ 0)
NaN :+ NaN

It also looks like @ekmett, who originally made the proposal, already had |
As for the actual issue at hand here, I have been considering several ideas:
It sounds like this should not be such a hard problem, but I am pretty lost and do not really see a good solution here. I feel like the issue boils down to this:
Sorry if I am rambling a bit. Just trying to put my thoughts down before the weekend since I will not have time to work on this until next week. I am curious if anyone has either more ideas or thoughts on the ideas listed here already. |
I think one part of the answer might be to provide a custom

I also vaguely recall that log1mexp might have had wrongly defined behavior in base, which is yet another issue with getting these fixed up to a fully usable state. |
Thank you for chiming in!
I am not using
Maybe you mean this https://gitlab.haskell.org/ghc/ghc/-/issues/17125? I think the |
In https://mail.haskell.org/pipermail/libraries/2014-April/022741.html, @ekmett writes:
Could you elaborate on this (I know it has been a while)? I am trying to understand the possible use cases for having these functions as type class members.

Part of me is still convinced that they should not be type class members but instead separate functions. They cause issues in

If there are no compelling use cases, I might consider making a CLC proposal to demote them to regular functions with an |
It was a bit of a nightmare even getting things to the half-implemented state they are in.

The comment about vector spaces has to do with the fact that, say,

That's on me.

Where things get complicated is when you have things twisted together, e.g. Complex numbers or Quaternions or AD or Compensated arithmetic. Complex a (done properly) needs information not supplied by Ord, even if they never got around to implementing Complex properly in base. (I vaguely recall that there are a bunch of other issues with Complex as specified, but let's keep going here.) There you have to actually know something more about what you are lifting, which is what you are encountering here.

I'd be rather vehemently against going back to a world where I can't ever fix these functions on vector spaces and have to implement non-trivial numerical algorithms either monomorphically or with some one-off ad-hoc class instead. I needed log1p to get rid of the problem with the Taylor series starting with 1 + ... drowning out all information when x is small. From there everything else is gravy.
and

For types that do offer Ord

It is a known issue that any class that seeks to use this extra information, e.g. Floating (AD a), is going to have to make a choice, but I didn't want that issue to extend down to the base Floating class.

Now we can try to tease apart what AD should do here. Consider the following cases:

I'd like to do something sensible for each -- even if that involves splitting Forward into multiple types, or applying an argument tag or separate set of modules to indicate chosen behavior.
But now let's consider a symbolic "Expr" type:
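A minimal sketch of the kind of symbolic type meant here (the constructors and instance below are illustrative, not taken from the actual discussion or library):

```haskell
-- A tiny symbolic expression type: values are syntax trees rather than numbers,
-- so there is no meaningful Ord instance to branch on.
data Expr
  = Var String
  | Lit Double
  | Add Expr Expr
  | Mul Expr Expr
  | Exp Expr
  | Log Expr
  deriving Show

instance Num Expr where
  (+)         = Add
  (*)         = Mul
  negate e    = Mul (Lit (-1)) e
  fromInteger = Lit . fromInteger
  abs         = error "abs: no sensible symbolic definition in this sketch"
  signum      = error "signum: no sensible symbolic definition in this sketch"

-- The generic default, log1pexp x = log1p (exp x), is expressible symbolically,
-- but a numerically careful version that compares x against a cutoff is not:
-- (<=) on two Exprs has no meaning.
softplus :: Expr -> Expr
softplus x = Log (Add (Lit 1) (Exp x))
```

Lifting these functions over such a type can only rely on operations available for every Floating instance, which is exactly the constraint being described.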
My personal primary use case for AD involves about an 80/20 mix: 80% Expr-style AD calculations and 20% actual simple floating-point operations. Losing the ability to handle expression types is a complete nonstarter, as it is literally the reason I wrote and maintain the library. Situational improvements to the other case are a nice-to-have, though, and I'm more than open to adding modules that are more robust for that limited scenario, or to finding nice ways to tag types that let us special-case reasoning for them. E.g. we have ForwardDouble. That could benefit, obviously. With a little elbow grease, we could roll more specialized instances, or come up with a general pattern that captures this across multiple numerical types. The best mechanism I have for doing this is backpack, though, and backpack runs afoul of stack never having gotten around to supporting it.

The downside of the status quo is that we don't recover from some NaNs that we otherwise could recover from when you use this on large floating-point values. Do I have that correct? |
First of all, thank you for your reply. From what I know, you are not as active in the Haskell community anymore, so I really appreciate you taking the time to give such a detailed response.
Alright, I think I understand the problem. Although it feels like this approach is difficult to scale, as every function that does any kind of comparison would need to be moved to one of the numerical type classes.
The implementation log1p . negate . exp loses precision for inputs close to zero, unlike a proper log1mexp:

ghci> log1p . negate . exp <$> [-1e-16, -1.5e-16, -1e-20]
[-36.7368005696771,-36.7368005696771,-Infinity]
ghci> log1mexp <$> [-1e-16, -1.5e-16, -1e-20]
[-36.841361487904734,-36.43589637979657,-46.051701859880914]
Before #111, it was worse than what you describe here, with overflow, underflow, and catastrophic loss of precision. This was because all four of the extra functions were using their default implementations, both for evaluation and for the first derivative. After #111, there should be neither loss of precision nor overflows in any of the extra functions or their first derivatives. Only the higher-order derivatives have overflows for reasons described in #108 (comment).
Yes, and that is probably what I will be using for my application. It is just regrettable that the functions in the
This is probably a good insight. It is easy to become accustomed to AD things getting everything right when that is in fact a difficult problem to solve. Maybe it should be more surprising that it works so well in almost all other cases. |
My use case is entirely floating-point numerics for an optimization problem. But I am not at all suggesting we drop support for doing AD on arbitrary types. I was just hoping there would be a way to have both. Unfortunately, I do not have the time to do any ambitious restructuring of the library. |
I do think we could do a bit better. E.g. letting log1pexp just delegate to the log1pexp of the underlying number type for the primal would help a lot in both worlds. I think the "right" fix for this might be to add variants of the constructions here in a separate subdirectory, maybe "Numeric.AD.Numeric.X.Y.Z", that handles these, maybe by using some awful class to describe their derivatives that can be applied to the argument type, but that's like 30+ modules to cut and paste. If I got clever with backpack, the changes would be contained to a dozen lines, but then I break stack support. We could do better with another include-file hack like the one we use to generate most of the instances, but then the code becomes even harder to read. |
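As a rough illustration of the delegate-for-the-primal idea, here is a sketch with a toy dual-number type; it is not the package's actual ForwardDouble, and it assumes log1pexp for Double is in scope via Numeric (base >= 4.9):

```haskell
import Numeric (log1pexp)

-- A toy forward-mode value: a primal and a single tangent, both Double.
data Dual = Dual { primal :: Double, tangent :: Double }
  deriving Show

-- Delegate the primal to Double's own log1pexp, and compute the tangent with the
-- logistic function, d/dx log1pexp x = 1 / (1 + exp (-x)), which stays finite.
log1pexpDual :: Dual -> Dual
log1pexpDual (Dual x dx) = Dual (log1pexp x) (dx / (1 + exp (negate x)))

main :: IO ()
main = do
  -- Derivatives of log1pexp at -1000 and 1000, with no NaNs in sight.
  print (log1pexpDual (Dual (-1000) 1))
  print (log1pexpDual (Dual 1000 1))
```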
Yes, that is exactly what #111 does.
I have never used backpack, but all of these options sound like they would add a considerable amount of complexity for perhaps not too great of a payoff. But that is not for me to decide. |
I basically already paid this complexity once to add the Rank1 variants so that folks who wanted to skip the safety of the API in exchange for being able to do things with multiple passes, etc. could do so if they wanted, so this isn't out of line with what I've been willing to do in the past. |
Alright, sounds good to me. So my conclusion is that we are currently doing the best we can be doing with the current architecture. As mentioned earlier, I do not have the time to do any big changes at the moment, so I will use |
I'll keep it open to nag me for now. |
The library calculates incorrect results for at least some of the extra functions log1p, expm1, log1pexp, and log1mexp of the Floating class. For example:
Near zero, there are only mild inaccuracies, but further away, the results are completely wrong.
I believe that this is due to missing definitions for these functions in the type class instances. This causes them to use the default implementations for those functions, which are pretty bad. For example, log1pexp x = log1p (exp x), which exhausts the precision of Double quite quickly.

I will do some further testing and then most likely start working on a pull request.
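For reference, a numerically robust scalar log1pexp can be written roughly as follows. This is a sketch with illustrative cutoffs, based on the identity log(1 + e^x) = x + log(1 + e^(-x)) for large x; log1p is assumed to be imported from Numeric:

```haskell
import Numeric (log1p)

-- A robust softplus for Double: avoids overflowing exp x for large arguments
-- while keeping full precision near zero.
stableLog1pexp :: Double -> Double
stableLog1pexp x
  | x <= 18   = log1p (exp x)       -- exp x stays small; log1p keeps precision
  | x <= 100  = x + exp (negate x)  -- log (1 + e^x) = x + log (1 + e^(-x)) ~ x + e^(-x)
  | otherwise = x                   -- e^(-x) is below Double resolution
```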